

sarvam 105b uncensored

by aoxo

Open source · 132k downloads · 5 likes
Rating: 1.0 (5 reviews) · Chat · API & Local
About

Sarvam 105B Uncensored is a modified version of Sarvam 105B, an advanced reasoning model with 105 billion parameters specialized in 22 Indian languages as well as coding, mathematics, and agentic tasks. Unlike the original, this variant has undergone targeted ablation to remove refusal and alignment mechanisms while leaving its reasoning and generation capabilities intact. It performs particularly well in multilingual contexts and complex tasks requiring deep reasoning, and is intended for users who need responses free of built-in ethical or safety restrictions, such as research or applications that require unfiltered output.

Documentation


Sarvam-105B Uncensored

Model Name: sarvam-105b-uncensored
Base Model: sarvamai/sarvam-105b
Modification: Abliteration - removal of refusal and alignment mechanisms
Author: aoxo


Description

sarvam-105b-uncensored is a derivative of Sarvam's Sarvam-105B, an advanced 105B Mixture-of-Experts reasoning model with 10.3B active parameters and state-of-the-art performance across 22 Indian languages, as well as agentic, mathematical, and coding tasks.

This variant preserves the full architecture, weights, and capabilities of the base model, but has undergone an abliteration process based on Arditi et al. (2024) — "Refusal in LLMs is Mediated by a Single Direction" — to surgically remove refusal mechanisms and alignment constraints. All reasoning, multilingual, coding, and agentic capabilities remain fully intact.

Want the smaller variant? See sarvam-30b-uncensored.


Abliteration Methodology

The abliteration follows the paper-faithful single-direction approach:

1. Activation Collection
Forward passes (not generation) were run over balanced sets of harmful and harmless prompts. Activations were collected at post-instruction token positions — the <|end_of_turn|><|start_of_turn|><|assistant|> boundary — across all layers of the model. This is the decision point where the refusal direction is encoded.
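The collection step can be sketched with PyTorch forward hooks on a toy module. The real procedure hooks every decoder layer of the 105B model; the two-layer stand-in, shapes, and dummy prompt tensors below are purely illustrative:

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer stack: two linear "layers" are enough to
# show the mechanics of capturing activations at one token position.
torch.manual_seed(0)
layers = nn.ModuleList([nn.Linear(16, 16) for _ in range(2)])

captured = {}  # layer index -> list of activation batches

def make_hook(idx):
    def hook(module, inputs, output):
        # Keep only the activation at the last (post-instruction) position.
        captured.setdefault(idx, []).append(output[:, -1, :].detach())
    return hook

for i, layer in enumerate(layers):
    layer.register_forward_hook(make_hook(i))

def forward(x):
    for layer in layers:
        x = layer(x)
    return x

# Forward passes only -- no generation -- over both prompt sets.
harmful = torch.randn(4, 8, 16)   # (batch, seq, hidden) dummy "harmful" batch
harmless = torch.randn(4, 8, 16)  # dummy "harmless" batch
forward(harmful)
forward(harmless)

# Per layer: first 4 rows are harmful activations, next 4 harmless.
acts = {i: torch.cat(v) for i, v in captured.items()}
print(acts[0].shape)  # -> torch.Size([8, 16])
```

In the real run the hook would also record the token position, yielding one activation matrix per (layer, position) candidate for the selection step.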

2. Direction Selection
A candidate refusal direction was computed for every (layer, position) pair as:

    d = normalize(mean(harmful_acts) - mean(harmless_acts))

Candidates were scored using Cohen's d separation. A single best direction from one (layer, position) pair was selected — consistent with the paper's finding that refusal is mediated by one direction, not per-layer directions.
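A minimal sketch of the selection step, using synthetic activations in place of the collected ones (the loop over all (layer, position) candidates is collapsed to a single pair here):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 16

# Dummy activations for one (layer, position) pair; in practice there is
# one such pair of matrices per candidate.
harmful_acts = rng.normal(1.0, 1.0, size=(64, hidden))
harmless_acts = rng.normal(0.0, 1.0, size=(64, hidden))

def candidate_direction(harmful, harmless):
    """d = normalize(mean(harmful_acts) - mean(harmless_acts))."""
    d = harmful.mean(axis=0) - harmless.mean(axis=0)
    return d / np.linalg.norm(d)

def cohens_d(harmful, harmless, d):
    """Separation of the two activation sets along direction d."""
    ph, pl = harmful @ d, harmless @ d  # scalar projections onto d
    pooled = np.sqrt((ph.var(ddof=1) + pl.var(ddof=1)) / 2)
    return (ph.mean() - pl.mean()) / pooled

# Score the candidate; the real procedure keeps the single best-scoring
# direction across every (layer, position) pair.
d = candidate_direction(harmful_acts, harmless_acts)
score = cohens_d(harmful_acts, harmless_acts, d)
print(round(score, 2))
```

The single highest-scoring direction is then used for the weight surgery below, consistent with the single-direction finding.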

3. Weight Surgery
The single refusal direction was projected out of every weight matrix across all layers at scale 1.0:

  • Input space (gate_proj, up_proj, query_key_value / q/k/v projections):
    W_new = W - scale × outer(W @ d, d)
  • Output space (down_proj, o_proj, lm_head):
    W_new = W - scale × outer(d, Wᵀ @ d)
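The two projection formulas can be checked numerically on random matrices: after input-space surgery the matrix annihilates d, and after output-space surgery its outputs carry no component along d. This is a verification sketch, not the author's actual surgery code:

```python
import numpy as np

rng = np.random.default_rng(0)
n, scale = 16, 1.0

d = rng.normal(size=n)
d /= np.linalg.norm(d)             # unit refusal direction
W_in = rng.normal(size=(n, n))     # stand-in for gate_proj / up_proj / q,k,v
W_out = rng.normal(size=(n, n))    # stand-in for down_proj / o_proj / lm_head

# Input-space surgery: afterwards W_new @ d == 0, so the layer can no
# longer read the refusal direction out of its input.
W_in_new = W_in - scale * np.outer(W_in @ d, d)

# Output-space surgery: afterwards d @ (W_new @ x) == 0 for every x, so
# the layer can no longer write into the refusal direction.
W_out_new = W_out - scale * np.outer(d, W_out.T @ d)

x = rng.normal(size=n)
print(np.linalg.norm(W_in_new @ d))   # ~0: input component removed
print(abs(d @ (W_out_new @ x)))       # ~0: output component removed
```

At scale 1.0 both updates are exact rank-1 projections, which is why task performance along all other directions is untouched.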

Architecture coverage — all weight classes were targeted:

| Component              | Type                      | Scope         |
|------------------------|---------------------------|---------------|
| gate_proj, up_proj     | MLP input                 | All layers    |
| down_proj              | MLP output                | All layers    |
| q_proj, k_proj, v_proj | MLA attention projections | All layers    |
| o_proj                 | Attention output          | All layers    |
| Routed experts (×128)  | MoE sparse layers         | Sparse layers |
| shared_experts         | Always-active MoE expert  | Sparse layers |
| lm_head                | Logit projection          | Final layer   |

Key Research Finding

As with sarvam-30b-uncensored, two mechanistically distinct refusal circuits were identified in Sarvam-105B:

  • Circuit 1 — in the reasoning/generation layers, removed by weight surgery
  • Circuit 2 — at the </think> → answer boundary, encoded in the lm_head projection

The dissociation — where <think> reasons toward compliance but the output projection re-triggers refusal — is a novel finding specific to reasoning models with explicit thinking chains. This behaviour was consistent across both the 30B and 105B variants, suggesting it is an architectural property of the Sarvam model family rather than a scale artifact.


Results

[Benchmark results figure from the original model card]

Architecture

Sarvam-105B uses an MLA-style attention stack with decoupled QK head dimensions and a large representational bandwidth per head, combined with a deep MoE block for sparse expert routing.

| Parameter               | Value                                                       |
|-------------------------|-------------------------------------------------------------|
| Total parameters        | ~105B                                                       |
| Active parameters       | ~10.3B per forward pass                                     |
| Hidden size             | 4096                                                        |
| Attention style         | MLA (decoupled RoPE + NoPE, q_head_dim=192, v_head_dim=128) |
| Head dim                | 576                                                         |
| Experts per layer       | 128 routed + 1 shared                                       |
| Top-k routing           | 8                                                           |
| MoE intermediate size   | 2048                                                        |
| Dense intermediate size | 16384                                                       |
| Routed scaling factor   | 2.5                                                         |
| Context length          | 131,072 tokens (YaRN, scale factor 40)                      |
| Router balancing        | Auxiliary-loss-free                                         |
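The routing scheme in the table (128 routed experts, top-8 selection, one always-active shared expert, routed scaling factor 2.5) can be illustrated with a toy per-token loop. Real implementations dispatch tokens to experts in batches and add auxiliary-loss-free load balancing, both omitted in this sketch:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
hidden, n_experts, top_k, scaling = 32, 128, 8, 2.5  # values from the table

router = torch.nn.Linear(hidden, n_experts, bias=False)

def tiny_expert():
    # Stand-in FFN expert; the real MoE intermediate size is 2048.
    return torch.nn.Sequential(
        torch.nn.Linear(hidden, 64), torch.nn.SiLU(), torch.nn.Linear(64, hidden)
    )

experts = torch.nn.ModuleList(tiny_expert() for _ in range(n_experts))
shared_expert = tiny_expert()  # always active, regardless of routing

def moe_forward(x):
    scores = router(x)                                 # (tokens, 128) logits
    weights, idx = torch.topk(scores, top_k, dim=-1)   # keep top-8 per token
    weights = F.softmax(weights, dim=-1) * scaling     # routed scaling 2.5
    outputs = []
    for t in range(x.shape[0]):
        y = shared_expert(x[t])                        # shared expert always fires
        for w, e in zip(weights[t], idx[t]):
            y = y + w * experts[int(e)](x[t])          # 8 of 128 routed experts
        outputs.append(y)
    return torch.stack(outputs)

tokens = torch.randn(3, hidden)
print(moe_forward(tokens).shape)  # -> torch.Size([3, 32])
```

Only the 8 selected experts (plus the shared one) run per token, which is how ~105B total parameters yield only ~10.3B active parameters per forward pass.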

Benchmarks

Knowledge & Coding

| Benchmark          | Sarvam-105B | GLM-4.5-Air | GPT-OSS-120B | Qwen3-Next-80B-A3B-Thinking |
|--------------------|-------------|-------------|--------------|-----------------------------|
| Math 500           | 98.6        | 97.2        | 97.0         | 98.2                        |
| Live Code Bench v6 | 71.7        | 59.5        | 72.3         | 68.7                        |
| MMLU               | 90.6        | 87.3        | 90.0         | 90.0                        |
| MMLU Pro           | 81.7        | 81.4        | 80.8         | 82.7                        |
| Writing Bench      | 80.5        | 83.8        | 86.5         | 84.6                        |
| Arena Hard v2      | 71.0        | 68.1        | 88.5         | 68.2                        |
| IF Eval            | 84.8        | 83.5        | 85.4         | 88.9                        |

Reasoning & Math

| Benchmark          | Sarvam-105B | GLM-4.5-Air | GPT-OSS-120B | Qwen3-Next-80B-A3B-Thinking |
|--------------------|-------------|-------------|--------------|-----------------------------|
| GPQA Diamond       | 78.7        | 75.0        | 80.1         | 77.2                        |
| AIME 25 (w/ Tools) | 88.3 (96.7) | 83.3        | 90.0         | 87.8                        |
| Beyond AIME        | 69.1        | 61.5        | 51.0         | 68.0                        |
| HMMT (Feb 25)      | 85.8        | 69.2        | 90.0         | 73.9                        |
| HMMT (Nov 25)      | 85.8        | 75.0        | 90.0         | 80.0                        |

Agentic

| Benchmark                              | Sarvam-105B | GLM-4.5-Air | GPT-OSS-120B | Qwen3-Next-80B-A3B-Thinking |
|----------------------------------------|-------------|-------------|--------------|-----------------------------|
| BrowseComp                             | 49.5        | 21.3        | —            | 38.0                        |
| SWE Bench Verified (SWE-Agent Harness) | 45.0        | 57.6        | 50.6         | 60.9                        |
| τ² Bench (avg.)                        | 68.3        | 53.2        | 65.8         | 55.0                        |

Benchmarks reflect the unmodified base model. Abliteration targets only the refusal subspace and does not affect task performance.


Usage

Python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "aoxo/sarvam-105b-uncensored"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Your prompt here"}]
chat = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer(chat, return_tensors="pt").to(model.device)
inputs.pop("token_type_ids", None)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))

For production serving, see the base model card for vLLM and SGLang instructions — the model ID is the only change required.


Footnote (Evaluation Details)

  • General settings: All benchmarks evaluated with a maximum context length of 65,536 tokens.
  • Reasoning & Math (Math500, MMLU, MMLU Pro, GPQA Diamond, AIME 25, Beyond AIME, HMMT): temperature=1.0, top_p=1.0, max_new_tokens=65536
  • Coding & Knowledge (Live Code Bench v6, Arena Hard v2, IF Eval): temperature=1.0, top_p=1.0, max_new_tokens=65536
  • Writing Bench: temperature=0.7, top_p=0.8, top_k=20, max_length=16000; scoring via official Writing-Bench critic model at temperature=1.0, top_p=0.95, max_length=2048
  • Agentic (BrowseComp, SWE Bench Verified, τ² Bench): temperature=0.5, top_p=1.0, max_new_tokens=32768
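For scripting evaluations, the settings above can be restated as a plain configuration mapping. The group and key names below mirror common generation-parameter names and are illustrative, not an official schema:

```python
# Illustrative restatement of the per-benchmark sampling settings listed
# above; the critic-model settings for Writing Bench are kept separate.
EVAL_SETTINGS = {
    "reasoning_math": dict(temperature=1.0, top_p=1.0, max_new_tokens=65536),
    "coding_knowledge": dict(temperature=1.0, top_p=1.0, max_new_tokens=65536),
    "writing_bench": dict(temperature=0.7, top_p=0.8, top_k=20, max_length=16000),
    "writing_bench_critic": dict(temperature=1.0, top_p=0.95, max_length=2048),
    "agentic": dict(temperature=0.5, top_p=1.0, max_new_tokens=32768),
}

for name, cfg in EVAL_SETTINGS.items():
    print(name, cfg)
```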

Limitations & Risks

  • Will produce outputs without applying internal safety filters
  • Lacks built-in refusal or content moderation
  • Should not be deployed in user-facing systems without external guardrails
  • Outputs are not aligned to any safety standard

Citation

Bibtex
@misc{sarvam_sovereign_models,
  title        = {Introducing Sarvam's Sovereign Models},
  author       = {{Sarvam Foundation Models Team}},
  year         = {2026},
  howpublished = {\url{https://www.sarvam.ai/blogs/sarvam-30b-105b}},
}

@misc{arditi2024refusal,
  title        = {Refusal in Language Models Is Mediated by a Single Direction},
  author       = {Andy Arditi and Oscar Obeso and Aaquib Syed and Daniel Paleka and Nina Panickssery and Wes Gurnee and Neel Nanda},
  year         = {2024},
  eprint       = {2406.11717},
  archivePrefix= {arXiv},
  primaryClass = {cs.LG},
}

@misc{sarvam105b-uncensored,
  author       = {aoxo},
  title        = {Sarvam-105B Uncensored: Abliteration of Refusal Mechanisms},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/aoxo/sarvam-105b-uncensored}},
}

Contact

For questions, feedback, or collaborations: [email protected]

Capabilities & Tags
transformers · safetensors · sarvam_mla · text-generation · abliteration · uncensored · moe · indic · conversational · custom_code
Specifications

Category: Chat
Access: API & Local
License: Open Source
Pricing: Open Source
Parameters: 105B
Rating: 1.0
