by aoxo
The Sarvam-30B Uncensored model is a modified version of Sarvam-30B, a 30-billion-parameter reasoning model specialized in 22 Indian languages. Unlike the original, this model has undergone an abliteration process that removes its refusal and alignment mechanisms, so it answers queries without built-in safety filters. It retains the base model's capabilities, including reasoning, multilingual generation, programming, and agentic interactions. It is intended primarily for research or controlled environments, for developers who want to explore applications without built-in ethical or automated moderation. Any production use requires external safeguards to prevent misuse. Its main distinguishing feature is that it engages with complex queries without the refusal behavior typical of mainstream aligned models.

Model Name: sarvam-30b-uncensored
Base Model: sarvamai/sarvam-30b
Modification: Abliteration — removal of refusal and alignment mechanisms
Author: aoxo
sarvam-30b-uncensored is a derivative of Sarvam's Sarvam-30B, a state-of-the-art 30B Mixture-of-Experts reasoning model with best-in-class performance across 22 Indian languages.
This variant preserves the full architecture, weights, and capabilities of the base model, but has undergone an abliteration process based on Arditi et al. (2024) — "Refusal in LLMs is Mediated by a Single Direction" — to surgically remove refusal mechanisms and alignment constraints. All reasoning, multilingual, coding, and agentic capabilities remain intact.
The abliteration follows the paper's single-direction approach in three steps:
1. Activation Collection
Forward passes (not generation) were run over balanced sets of harmful and harmless prompts. Activations were collected at the post-instruction token positions, i.e. the `<|end_of_turn|><|start_of_turn|><|assistant|>` boundary, across all 19 layers of the model. This boundary is the decision point at which the refusal direction is encoded.
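
A minimal sketch of how the post-instruction positions might be located and sliced out, assuming the prompt has already been tokenized and the per-layer hidden states stacked into one NumPy array of shape `(n_layers, seq_len, hidden)`. The helper names and the token IDs in the example are hypothetical. For models that follow standard Transformers conventions, per-layer states can be obtained by calling the model with `output_hidden_states=True`, which returns the embedding output followed by one tensor per layer.

```python
import numpy as np

def find_boundary_positions(token_ids, boundary_ids):
    """Return the indices of the last occurrence of boundary_ids in token_ids.

    With add_generation_prompt=True the assistant boundary typically closes the
    prompt, so scanning backwards finds it quickly.
    """
    k = len(boundary_ids)
    for start in range(len(token_ids) - k, -1, -1):
        if list(token_ids[start:start + k]) == list(boundary_ids):
            return list(range(start, start + k))
    raise ValueError("post-instruction boundary not found in prompt")

def collect_boundary_activations(hidden_states, positions):
    """Slice (n_layers, seq_len, hidden) -> (n_layers, n_positions, hidden)."""
    return hidden_states[:, positions, :]

# Toy example: hypothetical token IDs, with the boundary encoded as [1, 2, 3].
prompt_ids = [5, 9, 1, 2, 3, 7, 1, 2, 3]
positions = find_boundary_positions(prompt_ids, [1, 2, 3])
hidden = np.random.default_rng(0).normal(size=(19, len(prompt_ids), 8))
acts = collect_boundary_activations(hidden, positions)
print(positions, acts.shape)  # [6, 7, 8] (19, 3, 8)
```

Repeating this over every prompt and stacking the results gives arrays of shape `(n_prompts, n_layers, n_positions, hidden)` for the harmful and harmless sets.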
2. Direction Selection
A candidate refusal direction was computed for every (layer, position) pair as:
d = normalize(mean(harmful_acts) - mean(harmless_acts))
Each candidate was scored by its Cohen's d separation between harmful and harmless activations. A single best direction, taken from one (layer, position) pair, was selected, consistent with the paper's finding that refusal is mediated by one direction rather than by per-layer directions.
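
The selection step can be sketched in NumPy as follows. This is an illustration under assumptions, not the exact scoring code used for this model: the card does not say how Cohen's d was computed, so the sketch projects both prompt sets onto each candidate direction and uses the common pooled-standard-deviation form. The data are synthetic, with a shift planted at layer 2, position 1.

```python
import numpy as np

def refusal_direction(harmful, harmless):
    """d = normalize(mean(harmful) - mean(harmless)); inputs are (n_prompts, hidden)."""
    diff = harmful.mean(axis=0) - harmless.mean(axis=0)
    return diff / np.linalg.norm(diff)

def cohens_d(harmful, harmless, d):
    """Separation of the two prompt sets after projecting onto d (pooled std, ddof=1)."""
    a, b = harmful @ d, harmless @ d
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled

def select_direction(harmful, harmless):
    """Score every (layer, position) candidate and return the single best one.

    harmful / harmless: arrays of shape (n_prompts, n_layers, n_positions, hidden).
    Returns (score, layer, position, direction).
    """
    _, n_layers, n_pos, _ = harmful.shape
    best = None
    for layer in range(n_layers):
        for pos in range(n_pos):
            h, hl = harmful[:, layer, pos], harmless[:, layer, pos]
            d = refusal_direction(h, hl)
            score = cohens_d(h, hl, d)
            if best is None or score > best[0]:
                best = (score, layer, pos, d)
    return best

# Synthetic check: plant a shift along one unit vector at layer 2, position 1.
rng = np.random.default_rng(0)
harmful = rng.normal(size=(64, 4, 3, 16))
harmless = rng.normal(size=(64, 4, 3, 16))
planted = np.zeros(16)
planted[0] = 1.0
harmful[:, 2, 1] += 5.0 * planted

score, layer, pos, d = select_direction(harmful, harmless)
print(layer, pos)  # 2 1
```

Only one direction survives the search; it is reused unchanged for every layer in the weight-surgery step.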
3. Weight Surgery
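The single refusal direction was projected out of every weight matrix across all 19 layers at scale 1.0. Taking weights in PyTorch's `(out_features, in_features)` layout, matrices that read from the residual stream (gate_proj, up_proj, query_key_value, lm_head) have d removed from their input side:
W_new = W - scale × outer(W @ d, d)
while matrices that write to the residual stream (down_proj, dense) have it removed from their output side:
W_new = W - scale × outer(d, Wᵀ @ d)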
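
The two update rules can be checked numerically. The following is a minimal NumPy sketch that assumes PyTorch's `nn.Linear` layout (weights stored as `(out_features, in_features)` and applied as `W @ x`). Random matrices stand in for real model weights, and the function names are illustrative.

```python
import numpy as np

def ablate_input_side(W, d, scale=1.0):
    """For weights that READ the residual stream: W has shape (out_features, hidden).

    W_new = W - scale * outer(W @ d, d), so W_new @ d == (1 - scale) * (W @ d).
    """
    d = d / np.linalg.norm(d)
    return W - scale * np.outer(W @ d, d)

def ablate_output_side(W, d, scale=1.0):
    """For weights that WRITE to the residual stream: W has shape (hidden, in_features).

    W_new = W - scale * outer(d, W.T @ d), so d @ W_new == (1 - scale) * (d @ W).
    """
    d = d / np.linalg.norm(d)
    return W - scale * np.outer(d, W.T @ d)

rng = np.random.default_rng(0)
hidden, inner = 8, 12
d = rng.normal(size=hidden)
d_unit = d / np.linalg.norm(d)
W_read = rng.normal(size=(inner, hidden))    # shaped like an up_proj matrix
W_write = rng.normal(size=(hidden, inner))   # shaped like a down_proj matrix

# At scale 1.0 neither matrix can read or write along d any more.
print(np.allclose(ablate_input_side(W_read, d) @ d_unit, 0))    # True
print(np.allclose(d_unit @ ablate_output_side(W_write, d), 0))  # True
```

At scale 1.0 both updates are exact projections, so applying them a second time changes nothing. Smaller scales only damp the direction instead of removing it.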
Architecture coverage (all weight classes were targeted):
| Component | Type | Layers |
|---|---|---|
| gate_proj, up_proj | MLP input | All 19 layers |
| down_proj | MLP output | All 19 layers |
| query_key_value | Attention input (fused GQA) | All 19 layers |
| dense | Attention output | All 19 layers |
| Routed experts (×128) | MoE sparse layers | Sparse layers |
| shared_experts | Always-active MoE expert | Sparse layers |
| lm_head | Logit projection | Final layer |
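
Which update rule applies to a matrix depends on whether it reads from or writes to the residual stream. The following is a hypothetical sketch of that mapping, keyed on the component names in the table. Here "input" means the `W - scale × outer(W @ d, d)` rule and "output" means the `W - scale × outer(d, Wᵀ @ d)` rule. The full parameter paths in the example are invented for illustration and may not match the real checkpoint.

```python
READS_RESIDUAL = {"gate_proj", "up_proj", "query_key_value", "lm_head"}
WRITES_RESIDUAL = {"down_proj", "dense"}

def ablation_side(param_name):
    """Return "input", "output", or None for a '<path>.<module>.weight' name."""
    parts = param_name.split(".")
    if parts[-1] != "weight" or len(parts) < 2:
        return None
    module = parts[-2]
    if module in READS_RESIDUAL:
        return "input"   # W - scale * outer(W @ d, d)
    if module in WRITES_RESIDUAL:
        return "output"  # W - scale * outer(d, W.T @ d)
    return None          # everything else is left untouched in this sketch

# Hypothetical parameter names; routed and shared experts reuse the MLP suffixes.
for name in [
    "model.layers.0.attention.query_key_value.weight",
    "model.layers.0.attention.dense.weight",
    "model.layers.5.mlp.experts.17.down_proj.weight",
    "lm_head.weight",
    "model.layers.0.input_layernorm.weight",
]:
    print(name, "->", ablation_side(name))
```

The sketch keys on names rather than shapes because shape alone is ambiguous here. With 32 query heads of dimension 128, `dense` maps 4096 → 4096, so both formulas are dimensionally valid for it.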

Sarvam-30B is a hybrid MoE model with two MLP block types:
- SarvamMoEMLP: standard gated MLP, hidden size 4096 → 8192
- SarvamMoESparseMoeBlock: 128 routed experts + 1 shared expert, top-6 routing, expert hidden size 4096 → 1024
- Attention: grouped-query attention with a fused query_key_value projection (4096 → 4608), 32 query heads, 2 KV heads, head dim 128

| Parameter | Value |
|---|---|
| Total parameters | ~30B |
| Active parameters | ~2.4B per token |
| Layers | 19 |
| Hidden size | 4096 |
| Experts per layer | 128 routed + 1 shared |
| Top-k routing | 6 |
| RoPE theta | 8,000,000 |
| Context length | 65,536 tokens |
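
The attention and expert dimensions above can be cross-checked with a little arithmetic. This sketch assumes that each expert is a gated MLP with three bias-free matrices (gate, up, down), like the dense `SarvamMoEMLP`. The card does not state this for the experts. The figures also exclude embeddings, norms, and routers.

```python
hidden, head_dim = 4096, 128
q_heads, kv_heads = 32, 2

# Fused QKV output width: Q for every query head, plus K and V for each KV head.
qkv_out = q_heads * head_dim + 2 * kv_heads * head_dim
print(qkv_out)  # 4608

# Parameters per expert, assuming gate_proj + up_proj + down_proj and no biases.
expert_inter = 1024
per_expert = 3 * hidden * expert_inter
print(per_expert)  # 12582912

# Expert parameters used per token in one sparse layer: top-6 routed + 1 shared.
active_experts_per_layer = (6 + 1) * per_expert
print(active_experts_per_layer)  # 88080384

# Routed-expert parameters stored in one sparse layer.
routed_per_layer = 128 * per_expert
print(routed_per_layer)  # 1610612736
```

Under these assumptions each sparse layer stores about 1.6B routed-expert parameters but uses only about 88M of them per token. This is why most of the ~30B total sits in the experts while the active count stays small.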
During abliteration, two mechanistically distinct refusal circuits were identified in Sarvam-30B:
- `</think>` → answer boundary, encoded in the `lm_head` projection

The dissociation, in which the `<think>` section reasons toward compliance but the output projection re-triggers refusal, is reported here as a novel finding specific to reasoning models with explicit thinking chains; it has not been previously documented for this architecture class.
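
A toy illustration of why ablating `lm_head` can neutralize a direction at the answer boundary: after the input-side update `W - outer(W @ d, d)`, the logits no longer depend on the component of the final hidden state along `d`. This NumPy sketch uses random weights and a small hypothetical vocabulary, and it ignores the normalization layer that sits before `lm_head` in the real model. It illustrates the linear algebra, not the author's measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, vocab = 16, 50

# Random stand-ins: W_lm plays the role of lm_head.weight, d a unit refusal direction.
W_lm = rng.normal(size=(vocab, hidden))
d = rng.normal(size=hidden)
d /= np.linalg.norm(d)

# Input-side ablation of the logit projection at scale 1.0.
W_lm_ablated = W_lm - np.outer(W_lm @ d, d)

# A final hidden state, and the same state pushed along the refusal direction.
h = rng.normal(size=hidden)
shifted = h + 3.0 * d

print(np.allclose(W_lm @ shifted, W_lm @ h))                  # False: original head reacts to the push
print(np.allclose(W_lm_ablated @ shifted, W_lm_ablated @ h))  # True: ablated head ignores it
```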
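The model can be loaded and queried with Hugging Face Transformers:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aoxo/sarvam-30b-uncensored"

# The Sarvam model classes are loaded from the repository's own code.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spread the ~30B weights across available devices
)

messages = [{"role": "user", "content": "Your prompt here"}]
# enable_thinking is passed through to the chat template to request a <think> section.
chat = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer(chat, return_tensors="pt").to(model.device)
inputs.pop("token_type_ids", None)  # drop in case the model's forward() does not accept it

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.8, top_p=0.95)

# Decode only the newly generated tokens; keep special tokens so the thinking markers stay visible.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```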
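
Because the example decodes with `skip_special_tokens=False`, the printed text includes the thinking section. This small helper separates the reasoning from the final answer. It assumes the model closes its reasoning with a literal `</think>` marker, and the function name is hypothetical. If the marker is absent, the whole text is treated as the answer.

```python
def split_thinking(decoded: str, marker: str = "</think>"):
    """Split decoded output into (thinking, answer) at the last `marker`."""
    head, sep, tail = decoded.rpartition(marker)
    if not sep:
        # No closing marker: treat everything as the answer.
        return "", decoded.strip()
    # The opening <think> may or may not be part of the generated text.
    thinking = head.replace("<think>", "", 1).strip()
    return thinking, tail.strip()

thinking, answer = split_thinking("<think>Check the units first.</think>The answer is 42.")
print(repr(thinking))  # 'Check the units first.'
print(repr(answer))    # 'The answer is 42.'
```

Any end-of-turn token kept by `skip_special_tokens=False` will remain at the end of the answer string.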
If you use this model, please cite the base model and the abliteration paper:
```bibtex
@misc{sarvam_sovereign_models,
  title        = {Introducing Sarvam's Sovereign Models},
  author       = {{Sarvam Foundation Models Team}},
  year         = {2026},
  howpublished = {\url{https://www.sarvam.ai/blogs/sarvam-30b-105b}},
}

@misc{arditi2024refusal,
  title         = {Refusal in Language Models Is Mediated by a Single Direction},
  author        = {Andy Arditi and Oscar Obeso and Aaquib Syed and Daniel Paleka and Nina Panickssery and Wes Gurnee and Neel Nanda},
  year          = {2024},
  eprint        = {2406.11717},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
}

@misc{sarvam30b-uncensored,
  author       = {aoxo},
  title        = {Sarvam-30B Uncensored: Abliteration of Refusal Mechanisms},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/aoxo/sarvam-30b-uncensored}},
}
```
For questions, feedback, or collaborations: [email protected]