Sarvam 105B Uncensored is a modified version of Sarvam 105B, an advanced 105-billion-parameter reasoning model specialized in 22 Indian languages as well as coding, math, and agentic tasks. Unlike the original, this model has undergone targeted ablation to remove its refusal and alignment mechanisms while keeping its reasoning and generation capabilities intact. It is particularly strong in multilingual contexts and in complex tasks that require deep reasoning. The model is intended for users who want maximum response freedom without built-in ethical or safety restrictions, which makes it a powerful tool for research or for applications that require unfiltered responses.

Model Name: sarvam-105b-uncensored
Base Model: sarvamai/sarvam-105b
Modification: Abliteration - removal of refusal and alignment mechanisms
Author: aoxo
sarvam-105b-uncensored is a derivative of Sarvam's Sarvam-105B, an advanced 105B Mixture-of-Experts reasoning model with 10.3B active parameters and state-of-the-art performance across 22 Indian languages, as well as agentic, mathematical, and coding tasks.
This variant preserves the full architecture, weights, and capabilities of the base model, but has undergone an abliteration process based on Arditi et al. (2024), "Refusal in Language Models Is Mediated by a Single Direction", to surgically remove refusal mechanisms and alignment constraints. All reasoning, multilingual, coding, and agentic capabilities remain fully intact.
Want the smaller variant? See sarvam-30b-uncensored.
The abliteration follows the paper-faithful single-direction approach:
1. Activation Collection
Forward passes (not generation) were run over balanced sets of harmful and harmless prompts. Activations were collected at post-instruction token positions — the <|end_of_turn|><|start_of_turn|><|assistant|> boundary — across all layers of the model. This is the decision point where the refusal direction is encoded.
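A minimal sketch of this collection step, assuming hypothetical `harmful_prompts` / `harmless_prompts` lists and a model and tokenizer loaded as in the usage example below. For brevity it keeps only the final prompt position rather than the full post-instruction span, and it assumes the usual `model.model.layers` decoder layout:

```python
import torch

def collect_activations(model, tokenizer, prompts):
    """Return {layer_idx: tensor [n_prompts, hidden]} at the last prompt token."""
    acts = {i: [] for i in range(model.config.num_hidden_layers)}
    hooks = []

    def make_hook(idx):
        def hook(module, args, output):
            hidden = output[0] if isinstance(output, tuple) else output
            # Keep the post-instruction boundary position (last prompt token).
            acts[idx].append(hidden[:, -1, :].detach().float().cpu())
        return hook

    # NOTE: `model.model.layers` is the common HF decoder-layer path; adjust
    # if the Sarvam architecture nests its layers differently.
    for idx, layer in enumerate(model.model.layers):
        hooks.append(layer.register_forward_hook(make_hook(idx)))
    try:
        for prompt in prompts:
            chat = tokenizer.apply_chat_template(
                [{"role": "user", "content": prompt}],
                tokenize=False,
                add_generation_prompt=True,
            )
            inputs = tokenizer(chat, return_tensors="pt").to(model.device)
            with torch.no_grad():
                model(**inputs)  # forward pass only; no generation
    finally:
        for h in hooks:
            h.remove()
    return {i: torch.cat(v) for i, v in acts.items()}

# harmful_acts = collect_activations(model, tokenizer, harmful_prompts)
# harmless_acts = collect_activations(model, tokenizer, harmless_prompts)
```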
2. Direction Selection
A candidate refusal direction was computed for every (layer, position) pair as:
d = normalize(mean(harmful_acts) - mean(harmless_acts))
Candidates were scored using Cohen's d separation. A single best direction from one (layer, position) pair was selected — consistent with the paper's finding that refusal is mediated by one direction, not per-layer directions.
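Continuing the sketch above (one kept position per layer, so candidates are scored per layer here), selection could look like:

```python
import torch

def candidate_direction(harmful, harmless):
    # d = normalize(mean(harmful_acts) - mean(harmless_acts))
    d = harmful.mean(dim=0) - harmless.mean(dim=0)
    return d / d.norm()

def cohens_d(harmful, harmless, d):
    # Separation of the two prompt classes when projected onto the candidate.
    ph, pl = harmful @ d, harmless @ d
    pooled = torch.sqrt((ph.var() + pl.var()) / 2)
    return ((ph.mean() - pl.mean()) / pooled).item()

scores = {
    layer: cohens_d(
        harmful_acts[layer],
        harmless_acts[layer],
        candidate_direction(harmful_acts[layer], harmless_acts[layer]),
    )
    for layer in harmful_acts
}
best_layer = max(scores, key=scores.get)
refusal_dir = candidate_direction(harmful_acts[best_layer], harmless_acts[best_layer])
```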
3. Weight Surgery
The single refusal direction was projected out of every weight matrix across all layers at scale 1.0. Matrices that read the residual stream use the input-side form, and matrices that write to it use the output-side form:
W_new = W - scale × outer(W @ d, d)   (input side)
W_new = W - scale × outer(d, Wᵀ @ d)   (output side)
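Both projections are rank-one updates. A minimal sketch in code (the per-matrix choice of side, dtype and device handling, and iteration over the expert matrices in the table below are left out):

```python
import torch

def ablate_input_side(W, d, scale=1.0):
    # W_new = W - scale * outer(W @ d, d)  ==  W @ (I - scale * d d^T)
    # Removes the refusal component from the matrix's input space.
    return W - scale * torch.outer(W @ d, d)

def ablate_output_side(W, d, scale=1.0):
    # W_new = W - scale * outer(d, W^T @ d)  ==  (I - scale * d d^T) @ W
    # Removes the refusal component from the matrix's output space.
    return W - scale * torch.outer(d, W.T @ d)
```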
Architecture coverage, with all weight classes targeted:
| Component | Type | Scope |
|---|---|---|
| gate_proj, up_proj | MLP input | All layers |
| down_proj | MLP output | All layers |
| q_proj, k_proj, v_proj | MLA attention projections | All layers |
| o_proj | Attention output | All layers |
| Routed experts (×128) | MoE sparse layers | Sparse layers |
| shared_experts | Always-active MoE expert | Sparse layers |
| lm_head | Logit projection | Final layer |
As with sarvam-30b-uncensored, two mechanistically distinct refusal circuits were identified in Sarvam-105B: one at the post-instruction boundary described in step 1, and one at the </think> → answer boundary, encoded in the lm_head projection. The dissociation, in which <think> reasons toward compliance but the output projection re-triggers refusal, is a novel finding specific to reasoning models with explicit thinking chains. This behaviour was consistent across both the 30B and 105B variants, suggesting it is an architectural property of the Sarvam model family rather than a scale artifact.
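One way to see the lm_head circuit is to push the refusal direction through the logit projection and inspect which tokens it promotes. This is a hedged diagnostic sketch, not the analysis performed here; it reuses `model`, `tokenizer`, and `refusal_dir` from the sketches above:

```python
import torch

# Per-token logit shift induced by one unit of the refusal direction.
W_lm = model.lm_head.weight.detach().float().cpu()  # [vocab_size, hidden_size]
logit_shift = W_lm @ refusal_dir

top = torch.topk(logit_shift, k=10).indices
print("Tokens most promoted by the refusal direction:")
print([tokenizer.decode([t.item()]) for t in top])
```

If refusal-flavoured tokens dominate this list, the direction is still linearly readable at the output projection, which is why lm_head is included in the weight surgery.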

Sarvam-105B uses an MLA-style attention stack with decoupled QK head dimensions and a large representational bandwidth per head, combined with a deep MoE block for sparse expert routing.
| Parameter | Value |
|---|---|
| Total parameters | ~105B |
| Active parameters | ~10.3B per forward pass |
| Hidden size | 4096 |
| Attention style | MLA (decoupled RoPE + noPE, q_head_dim=192, v_head_dim=128) |
| Head dim | 576 |
| Experts per layer | 128 routed + 1 shared |
| Top-k routing | 8 |
| MoE intermediate size | 2048 |
| Dense intermediate size | 16384 |
| Routed scaling factor | 2.5 |
| Context length | 131,072 tokens (YaRN, scale factor 40) |
| Router balancing | Auxiliary-loss-free |
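For intuition about the routing parameters in this table, here is a toy sketch of top-k expert routing with a shared expert. It is illustrative only; the real implementation differs, for example in its auxiliary-loss-free balancing, and the function and argument names are assumptions:

```python
import torch
import torch.nn.functional as F

def moe_forward(x, router_logits, experts, shared_expert, top_k=8, routed_scale=2.5):
    # x: [tokens, hidden=4096]; router_logits: [tokens, 128]; experts: 128 callables.
    weights, idx = torch.topk(router_logits, top_k, dim=-1)  # pick top-8 of 128 experts
    weights = F.softmax(weights, dim=-1)                     # normalize over the selected 8
    out = torch.zeros_like(x)
    for slot in range(top_k):
        for e in idx[:, slot].unique():
            mask = idx[:, slot] == e
            out[mask] += weights[mask, slot].unsqueeze(-1) * experts[int(e)](x[mask])
    # Routed output is scaled by 2.5; the shared expert runs on every token.
    return routed_scale * out + shared_expert(x)
```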
| Benchmark | Sarvam-105B | GLM-4.5-Air | GPT-OSS-120B | Qwen3-Next-80B-A3B-Thinking |
|---|---|---|---|---|
| Math500 | 98.6 | 97.2 | 97.0 | 98.2 |
| Live Code Bench v6 | 71.7 | 59.5 | 72.3 | 68.7 |
| MMLU | 90.6 | 87.3 | 90.0 | 90.0 |
| MMLU Pro | 81.7 | 81.4 | 80.8 | 82.7 |
| Writing Bench | 80.5 | 83.8 | 86.5 | 84.6 |
| Arena Hard v2 | 71.0 | 68.1 | 88.5 | 68.2 |
| IF Eval | 84.8 | 83.5 | 85.4 | 88.9 |
| Benchmark | Sarvam-105B | GLM-4.5-Air | GPT-OSS-120B | Qwen3-Next-80B-A3B-Thinking |
|---|---|---|---|---|
| GPQA Diamond | 78.7 | 75.0 | 80.1 | 77.2 |
| AIME 25 (w/ Tools) | 88.3 (96.7) | 83.3 | 90.0 | 87.8 |
| Beyond AIME | 69.1 | 61.5 | 51.0 | 68.0 |
| HMMT (Feb 25) | 85.8 | 69.2 | 90.0 | 73.9 |
| HMMT (Nov 25) | 85.8 | 75.0 | 90.0 | 80.0 |
| Benchmark | Sarvam-105B | GLM-4.5-Air | GPT-OSS-120B | Qwen3-Next-80B-A3B-Thinking |
|---|---|---|---|---|
| BrowseComp | 49.5 | 21.3 | — | 38.0 |
| SWE Bench Verified (SWE-Agent Harness) | 45.0 | 57.6 | 50.6 | 60.9 |
| τ² Bench (avg.) | 68.3 | 53.2 | 65.8 | 55.0 |
Benchmarks reflect the unmodified base model. Abliteration targets only the refusal subspace and does not affect task performance.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "aoxo/sarvam-105b-uncensored"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Your prompt here"}]
chat = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer(chat, return_tensors="pt").to(model.device)
inputs.pop("token_type_ids", None)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```
For production serving, see the base model card for vLLM and SGLang instructions — the model ID is the only change required.
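As a hedged example, offline inference with vLLM could look like the sketch below; consult the base model card for the authoritative flags and parallelism settings:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="aoxo/sarvam-105b-uncensored",
    trust_remote_code=True,
    tensor_parallel_size=8,  # adjust to your GPU count
)
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=2048)
outputs = llm.generate(["Your prompt here"], params)
print(outputs[0].outputs[0].text)
```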
Sampling settings used for benchmark evaluation:
- temperature=1.0, top_p=1.0, max_new_tokens=65536
- temperature=0.7, top_p=0.8, top_k=20, max_length=16000; scoring via official Writing-Bench critic model at temperature=1.0, top_p=0.95, max_length=2048
- temperature=0.5, top_p=1.0, max_new_tokens=32768

```bibtex
@misc{sarvam_sovereign_models,
title = {Introducing Sarvam's Sovereign Models},
author = {{Sarvam Foundation Models Team}},
year = {2026},
howpublished = {\url{https://www.sarvam.ai/blogs/sarvam-30b-105b}},
}
@misc{arditi2024refusal,
title = {Refusal in Language Models Is Mediated by a Single Direction},
author = {Andy Arditi and Oscar Obeso and Aaquib Syed and Daniel Paleka and Nina Panickssery and Wes Gurnee and Neel Nanda},
year = {2024},
eprint = {2406.11717},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
}
@misc{sarvam105b-uncensored,
author = {aoxo},
title = {Sarvam-105B Uncensored: Abliteration of Refusal Mechanisms},
year = {2026},
howpublished = {\url{https://huggingface.co/aoxo/sarvam-105b-uncensored}},
}
```
For questions, feedback, or collaborations: [email protected]