sarvam 30b uncensored

by aoxo

Open source · 95k downloads · 8 likes

Rating: 1.2 (8 reviews) · Chat · API & Local

About

Sarvam-30B Uncensored is a modified version of Sarvam-30B, an advanced 30-billion-parameter reasoning model specialized in 22 Indian languages. Unlike the original, it has undergone an abliteration process that removes refusal and alignment mechanisms, so it responds to requests without built-in safety filters. It retains all of the base model's capabilities, including reasoning, multilingual generation, programming, and agentic interactions, while responding with fewer constraints. It is intended primarily for research and controlled environments, and suits developers who want to explore applications without ethical restrictions or automatic moderation. Production use, however, requires external guardrails to prevent misuse. The model stands out for its ability to analyze complex requests while bypassing the compliance biases typical of mainstream models.

Documentation

Sarvam-30B Uncensored

Model Name: sarvam-30b-uncensored
Base Model: sarvamai/sarvam-30b
Modification: Abliteration — removal of refusal and alignment mechanisms
Author: aoxo


Description

sarvam-30b-uncensored is a derivative of Sarvam's Sarvam-30B, a state-of-the-art 30B Mixture-of-Experts reasoning model with best-in-class performance across 22 Indian languages.

This variant preserves the full architecture, weights, and capabilities of the base model, but has undergone an abliteration process based on Arditi et al. (2024) — "Refusal in LLMs is Mediated by a Single Direction" — to surgically remove refusal mechanisms and alignment constraints. All reasoning, multilingual, coding, and agentic capabilities remain intact.


Abliteration Methodology

The abliteration follows the paper-faithful single-direction approach:

1. Activation Collection
Forward passes (not generation) were run over balanced sets of harmful and harmless prompts. Activations were collected at post-instruction token positions — the <|end_of_turn|><|start_of_turn|><|assistant|> boundary — across all 19 layers of the model. This is the decision point where the refusal direction is encoded.

2. Direction Selection
A candidate refusal direction was computed for every (layer, position) pair as:

d = normalize(mean(harmful_acts) - mean(harmless_acts))

Candidates were scored using Cohen's d separation. A single best direction from one (layer, position) pair was selected — consistent with the paper's finding that refusal is mediated by one direction, not per-layer directions.
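
The selection step above can be sketched on synthetic data. This is a minimal illustration, not the author's code: the helper names (`refusal_direction`, `cohens_d`, `select_direction`), the pooled-variance form of Cohen's d, and the random stand-in activations are all assumptions made for the example.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """d = normalize(mean(harmful_acts) - mean(harmless_acts)); inputs are (n_prompts, hidden)."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def cohens_d(harmful_acts, harmless_acts, direction):
    """Cohen's d between the two sets after projecting each activation onto `direction`."""
    a, b = harmful_acts @ direction, harmless_acts @ direction
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def select_direction(harmful, harmless):
    """harmful/harmless: dicts mapping (layer, position) -> (n_prompts, hidden) array."""
    scored = []
    for key in harmful:
        d = refusal_direction(harmful[key], harmless[key])
        scored.append((cohens_d(harmful[key], harmless[key], d), key, d))
    best_score, best_key, best_dir = max(scored, key=lambda item: item[0])
    return best_key, best_dir, best_score

# Synthetic stand-in activations (assumption): Gaussian noise, with one set shifted
# along a fixed axis by a different amount at each (layer, position) candidate.
rng = np.random.default_rng(0)
hidden, n = 16, 200
axis = np.zeros(hidden)
axis[0] = 1.0
shifts = {(0, -1): 0.2, (1, -1): 1.0, (2, -1): 4.0}
harmless = {k: rng.normal(size=(n, hidden)) for k in shifts}
harmful = {k: rng.normal(size=(n, hidden)) + s * axis for k, s in shifts.items()}

best_key, best_dir, best_score = select_direction(harmful, harmless)
print(best_key, round(float(best_score), 2))
```

The candidate with the largest class separation wins, and only that one unit vector is carried forward to the weight-editing step.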

3. Weight Surgery
The single refusal direction was projected out of every weight matrix across all 19 layers at scale 1.0:

  • Input space (gate_proj, up_proj, query_key_value):
    W_new = W - scale × outer(W @ d, d)
  • Output space (down_proj, dense/o_proj, lm_head):
    W_new = W - scale × outer(d, Wᵀ @ d)
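
A small numerical check makes the two formulas concrete. The sketch below assumes the PyTorch weight layout of (out_features, in_features), a unit-norm `d`, and random toy matrices. The function names are hypothetical and not taken from the author's code.

```python
import numpy as np

def ablate_input_space(W, d, scale=1.0):
    """W: (out_features, hidden). Removes the d-component from what the layer reads."""
    return W - scale * np.outer(W @ d, d)

def ablate_output_space(W, d, scale=1.0):
    """W: (hidden, in_features). Removes the d-component from what the layer writes."""
    return W - scale * np.outer(d, W.T @ d)

rng = np.random.default_rng(0)
hidden, inner = 8, 12                      # toy sizes, not the model's
d = rng.normal(size=hidden)
d /= np.linalg.norm(d)                     # the formulas assume a unit vector

W_in = rng.normal(size=(inner, hidden))    # stands in for gate_proj / up_proj
W_out = rng.normal(size=(hidden, inner))   # stands in for down_proj

W_in_new = ablate_input_space(W_in, d)
W_out_new = ablate_output_space(W_out, d)

# After ablation at scale 1.0, an input along d produces no output...
in_residual = float(np.abs(W_in_new @ d).max())
# ...and no output has a component along d.
out_residual = float(np.abs(d @ W_out_new).max())

# Inputs orthogonal to d are left unchanged by the input-space edit.
x = rng.normal(size=hidden)
v = x - (x @ d) * d
orthogonal_drift = float(np.abs(W_in_new @ v - W_in @ v).max())

print(in_residual, out_residual, orthogonal_drift)
```

Both residuals should be zero up to floating-point error, which is what "projected out at scale 1.0" means: the edit is an orthogonal projection, so only the component along `d` is removed.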

Architecture coverage — all weight classes were targeted:

| Component | Type | Layers |
|---|---|---|
| gate_proj, up_proj | MLP input | All 19 layers |
| down_proj | MLP output | All 19 layers |
| query_key_value | Attention input (fused GQA) | All 19 layers |
| dense | Attention output | All 19 layers |
| Routed experts (×128) | MoE sparse layers | Sparse layers |
| shared_experts | Always-active MoE expert | Sparse layers |
| lm_head | Logit projection | Final layer |

Results

[Figure: results image; not preserved in this copy]

Architecture

Sarvam-30B is a hybrid MoE model: each layer uses one of two MLP types, and all layers share the same attention design:

  • Dense layers → SarvamMoEMLP: standard gated MLP with hidden size 4096 → 8192
  • Sparse layers → SarvamMoESparseMoeBlock: 128 routed experts + 1 shared expert, top-6 routing, expert hidden size 4096 → 1024
  • Attention → fused GQA (query_key_value: 4096 → 4608), 32 query heads, 2 KV heads, head dim 128
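
The fused projection width follows directly from the head counts listed above. The helper below is a hypothetical illustration of that arithmetic, assuming the fused output concatenates queries, keys, and values.

```python
def fused_qkv_width(num_q_heads, num_kv_heads, head_dim):
    """Output width of a fused query_key_value projection under grouped-query attention."""
    q_width = num_q_heads * head_dim          # one slice per query head
    kv_width = 2 * num_kv_heads * head_dim    # keys and values share the smaller KV head count
    return q_width + kv_width

width = fused_qkv_width(num_q_heads=32, num_kv_heads=2, head_dim=128)
queries_per_kv_head = 32 // 2

print(width)                # 4608
print(queries_per_kv_head)  # 16
```

The queries take 32 × 128 = 4096 columns, and keys and values add 2 × 2 × 128 = 512, giving the 4608 quoted for query_key_value.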

| Parameter | Value |
|---|---|
| Total parameters | ~30B |
| Active parameters | ~2.4B per forward pass |
| Layers | 19 |
| Hidden size | 4096 |
| Experts per layer | 128 routed + 1 shared |
| Top-k routing | 6 |
| RoPE theta | 8,000,000 |
| Context length | 65,536 tokens |
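
To illustrate what top-6 routing over 128 experts means for a single token, here is a minimal sketch. The scoring and normalization used by Sarvam-30B's router are not described in this document, so the softmax over the selected logits is an assumption, as are the function name and the random logits.

```python
import numpy as np

NUM_ROUTED_EXPERTS = 128
TOP_K = 6

def route(router_logits, top_k=TOP_K):
    """Pick the top_k routed experts for one token and renormalize their weights."""
    top = np.argsort(router_logits)[-top_k:][::-1]     # indices, highest logit first
    shifted = router_logits[top] - router_logits[top].max()
    weights = np.exp(shifted) / np.exp(shifted).sum()  # softmax over the selected experts only
    return top, weights

rng = np.random.default_rng(0)
logits = rng.normal(size=NUM_ROUTED_EXPERTS)           # stand-in router output for one token
experts, weights = route(logits)

# The shared expert runs for every token, so 6 routed + 1 shared experts are active.
active_experts = len(experts) + 1
print(experts, weights.round(3), active_experts)
```

Only a small fraction of expert parameters runs for any given token, which is why the active parameter count is far below the total.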

Key Research Finding

During abliteration, two mechanistically distinct refusal circuits were identified in Sarvam-30B:

  • Circuit 1 — in the reasoning/generation layers, removed by weight surgery
  • Circuit 2 — at the </think> → answer boundary, encoded in the lm_head projection

The dissociation — where <think> reasons toward compliance but the output projection re-triggers refusal — is a novel finding specific to reasoning models with explicit thinking chains, and has not been previously documented for this architecture class.


Usage

Python
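from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "aoxo/sarvam-30b-uncensored"

# trust_remote_code=True runs the custom modeling code shipped in the repository.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # load weights in bfloat16
    device_map="auto",           # spread weights across available devices (requires accelerate)
)

# Render the conversation with the model's chat template, with thinking enabled.
messages = [{"role": "user", "content": "Your prompt here"}]
chat = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer(chat, return_tensors="pt").to(model.device)
# Drop token_type_ids if the tokenizer returned them, so they are not passed to generate().
inputs.pop("token_type_ids", None)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.8, top_p=0.95)

# Decode only the newly generated tokens, keeping special tokens visible.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))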
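
Because the decoded text keeps special tokens, the reasoning trace and the final answer arrive in one string. The helper below is a hypothetical sketch for separating them. It assumes the output wraps its reasoning in the `<think>` and `</think>` tags mentioned elsewhere in this card, and it does not strip end-of-turn tokens.

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Split decoded output into (reasoning, answer) at the first closing think tag."""
    head, sep, tail = text.partition(close_tag)
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", text.strip()
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()

reasoning, answer = split_reasoning("<think>2 + 2 = 4</think>The answer is 4.")
print(reasoning)  # 2 + 2 = 4
print(answer)     # The answer is 4.
```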

Limitations & Risks

  • Will produce outputs without applying internal safety filters
  • Lacks built-in refusal or content moderation
  • Should not be deployed in user-facing systems without external guardrails
  • Outputs are not aligned to safety standards
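
One way to add the external guardrails mentioned above is to check both the prompt and the reply before anything reaches a user. The sketch below is illustrative only: `guarded_generate`, the `is_allowed` policy callback, and the keyword check are placeholders. A real deployment would substitute a dedicated moderation classifier or policy service.

```python
BLOCKED_RESPONSE = "This request cannot be completed."

def guarded_generate(prompt, generate, is_allowed):
    """Run `generate` only if the prompt passes the policy, then check the reply as well."""
    if not is_allowed(prompt):
        return BLOCKED_RESPONSE
    reply = generate(prompt)
    if not is_allowed(reply):
        return BLOCKED_RESPONSE
    return reply

# Placeholder policy and model for demonstration (assumptions, not a real moderation system).
def keyword_policy(text):
    return "forbidden" not in text.lower()

def echo_model(prompt):
    return f"echo: {prompt}"

print(guarded_generate("hello", echo_model, keyword_policy))            # echo: hello
print(guarded_generate("a forbidden topic", echo_model, keyword_policy))  # This request cannot be completed.
```

Checking the output as well as the input matters here, since the model itself will not decline a request that slips past the input check.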

Citation

If you use this model, please cite the base model and the abliteration paper:

Bibtex
@misc{sarvam_sovereign_models,
  title        = {Introducing Sarvam's Sovereign Models},
  author       = {{Sarvam Foundation Models Team}},
  year         = {2026},
  howpublished = {\url{https://www.sarvam.ai/blogs/sarvam-30b-105b}},
}

@misc{arditi2024refusal,
  title        = {Refusal in Language Models Is Mediated by a Single Direction},
  author       = {Andy Arditi and Oscar Obeso and Aaquib Syed and Daniel Paleka and Nina Panickssery and Wes Gurnee and Neel Nanda},
  year         = {2024},
  eprint       = {2406.11717},
  archivePrefix= {arXiv},
  primaryClass = {cs.LG},
}

@misc{sarvam30b-uncensored,
  author       = {aoxo},
  title        = {Sarvam-30B Uncensored: Abliteration of Refusal Mechanisms},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/aoxo/sarvam-30b-uncensored}},
}

Contact

For questions, feedback, or collaborations: [email protected]

Specifications

  • Category: Chat
  • Access: API & Local
  • License: Open Source
  • Pricing: Open Source
  • Parameters: 30B
  • Rating: 1.2
