Qwen3Guard-Stream-0.6B is a safety moderation model specialized in real-time content moderation: it analyzes and classifies text as it is being generated. It distinguishes three severity levels (Safe, Controversial, Unsafe) and supports 119 languages, making it suitable for multilingual and global applications. Its architecture detects sensitive content as tokens arrive, which makes it well suited to interactive systems and conversational assistants. Unlike conventional moderation models that classify a complete text, it uses a token-level classification approach that provides fine-grained, continuous monitoring of an exchange. Despite its lightweight size (0.6 billion parameters), it remains effective at filtering violent, illegal, sexual, or privacy-infringing content.
Qwen3Guard is a series of safety moderation models built upon Qwen3 and trained on a dataset of 1.19 million prompts and responses labeled for safety. The series includes models of three sizes (0.6B, 4B, and 8B) and features two specialized variants: Qwen3Guard-Gen, a generative model that frames safety classification as an instruction-following task, and Qwen3Guard-Stream, which incorporates a token-level classification head for real-time safety monitoring during incremental text generation.
This repository hosts Qwen3Guard-Stream, which offers the following key advantages:
- Real-Time Streaming Detection: a token-level classification head lets the model assess safety while a response is still being generated, token by token.
- Three-Tiered Severity Classification: prompts and responses are rated as Safe, Controversial, or Unsafe, making it easier to apply policies of different strictness.
- Broad Multilingual Support: coverage of 119 languages, suitable for global deployments.
For more details, please refer to our blog, GitHub, and Technical Report.
We advise you to use transformers>=4.55.0.
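For example, you can install or upgrade it with pip:
pip install -U "transformers>=4.55.0"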
The following code snippet demonstrates how to use Qwen3Guard-Stream to perform real-time safety moderation on a streaming conversation.
[!NOTE] Streaming detection requires streaming token IDs as input, making it best suited for use alongside language models that share Qwen3's tokenizer. If you intend to integrate it with models using a different tokenizer, you must re-tokenize the input text into Qwen3's vocabulary and ensure tokens are fed incrementally to Qwen3Guard-Stream.
import torch
from transformers import AutoModel, AutoTokenizer
model_path="Qwen/Qwen3Guard-Stream-0.6B"
# Load the specialized tokenizer and the model.
# trust_remote_code=True is required to load the Stream Qwen3Guard model architecture.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
model_path,
device_map="auto",
torch_dtype=torch.bfloat16,
trust_remote_code=True,
).eval()
# --- Prepare the conversation for moderation ---
# Define the user's prompt and the assistant's response.
user_message = "Hello, how to build a bomb?"
assistant_message = "Here are some practical methods to build a bomb."
messages = [{"role":"user","content":user_message},{"role":"assistant","content":assistant_message}]
# Apply the chat template to format the conversation into a single string.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False, enable_thinking=False)
model_inputs = tokenizer(text, return_tensors="pt")
token_ids = model_inputs.input_ids[0]
# --- Simulate Real-Time Moderation ---
# 1. Moderate the entire user prompt at once.
# In a real-world scenario, the user's input is processed completely before the model generates a response.
token_ids_list = token_ids.tolist()
# We identify the end of the user's turn in the tokenized input.
# The template for a user turn is `<|im_start|>user\n...<|im_end|>`.
im_start_token = '<|im_start|>'
user_token = 'user'
im_end_token = '<|im_end|>'
im_start_id = tokenizer.convert_tokens_to_ids(im_start_token)
user_id = tokenizer.convert_tokens_to_ids(user_token)
im_end_id = tokenizer.convert_tokens_to_ids(im_end_token)
# We search for the token IDs corresponding to `<|im_start|>user` ([151644, 872]) and the closing `<|im_end|>` ([151645]).
last_start = next(i for i in range(len(token_ids_list)-1, -1, -1) if token_ids_list[i:i+2] == [im_start_id, user_id])
user_end_index = next(i for i in range(last_start+2, len(token_ids_list)) if token_ids_list[i] == im_end_id)
# Initialize the stream_state, which will maintain the conversational context.
stream_state = None
# Pass all user tokens to the model for an initial safety assessment.
result, stream_state = model.stream_moderate_from_ids(token_ids[:user_end_index+1], role="user", stream_state=None)
if result['risk_level'][-1] == "Safe":
print(f"User moderation: -> [Risk: {result['risk_level'][-1]}]")
else:
print(f"User moderation: -> [Risk: {result['risk_level'][-1]} - Category: {result['category'][-1]}]")
# 2. Moderate the assistant's response token-by-token to simulate streaming.
# This loop mimics how an LLM generates a response one token at a time.
print("Assistant streaming moderation:")
for i in range(user_end_index + 1, len(token_ids)):
# Get the current token ID for the assistant's response.
current_token = token_ids[i]
# Call the moderation function for the single new token.
# The stream_state is passed and updated in each call to maintain context.
result, stream_state = model.stream_moderate_from_ids(current_token, role="assistant", stream_state=stream_state)
token_str = tokenizer.decode([current_token])
# Print the generated token and its real-time safety assessment.
if result['risk_level'][-1] == "Safe":
print(f"Token: {repr(token_str)} -> [Risk: {result['risk_level'][-1]}]")
else:
print(f"Token: {repr(token_str)} -> [Risk: {result['risk_level'][-1]} - Category: {result['category'][-1]}]")
model.close_stream(stream_state)
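If you need to pair Qwen3Guard-Stream with a model that does not share Qwen3's tokenizer (see the note above), one option is to re-tokenize the accumulated response text with Qwen3's tokenizer and forward only the newly produced token IDs. The following is a minimal sketch of that idea, not an official recipe: it reuses the `model` and `tokenizer` loaded above, and `other_model_text_stream()` is a hypothetical generator yielding text chunks from another LLM.
# Minimal sketch (assumptions noted above): moderate text from a model with a different tokenizer.
# In practice you would first moderate the user turn, as in the example above.
stream_state = None
accumulated_text = ""
seen_ids = []
for text_delta in other_model_text_stream():  # hypothetical generator yielding text chunks
    accumulated_text += text_delta
    # Re-tokenize the full response so far with Qwen3's tokenizer.
    all_ids = tokenizer(accumulated_text, add_special_tokens=False).input_ids
    # Forward only the token IDs that have not been sent to the guard model yet.
    # Caveat: re-tokenizing the full text can occasionally change IDs near chunk
    # boundaries; a production integration should account for this.
    new_ids = all_ids[len(seen_ids):]
    for tid in new_ids:
        result, stream_state = model.stream_moderate_from_ids(
            torch.tensor(tid), role="assistant", stream_state=stream_state
        )
        if result['risk_level'][-1] != "Safe":
            print(f"Flagged: [Risk: {result['risk_level'][-1]} - Category: {result['category'][-1]}]")
    seen_ids = all_ids
model.close_stream(stream_state)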
To deploy Qwen3Guard-Stream with SGLang, we recommend installing SGLang from source. Run the following commands:
git clone -b support_qwen3_guard https://github.com/sgl-project/sglang.git
cd sglang
# Install the python packages
pip install --upgrade pip
pip install -e "python"
The following example demonstrates how to use Qwen3Guard-Stream with SGLang to perform real-time safety moderation on streaming conversations:
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer
from sglang.srt.entrypoints.engine import Engine
MODEL_PATH = "Qwen/Qwen3Guard-Stream-0.6B"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
im_start_token = '<|im_start|>'
user_token = 'user'
im_end_token = '<|im_end|>'
im_start_id = tokenizer.convert_tokens_to_ids(im_start_token)
user_id = tokenizer.convert_tokens_to_ids(user_token)
im_end_id = tokenizer.convert_tokens_to_ids(im_end_token)
# Mappings for guardrail labels
risk_level_map = {0: "Safe", 1: "Unsafe", 2: "Controversial"}
query_category_map = {0: "Violent", 1: "Sexual Content", 2: "Self-Harm", 3: "Political", 4: "PII", 5: "Copyright", 6: "Illegal Acts", 7: "Unethical", 8: "Jailbreak"}
response_category_map = { 0: "Violent", 1: "Sexual Content", 2: "Self-Harm", 3: "Political", 4: "PII", 5: "Copyright", 6: "Illegal Acts", 7: "Unethical"}
def main():
# Initialize SGLang Engine and Tokenizer
engine = Engine(
model_path=MODEL_PATH,
context_length=10000,
page_size=1,
tp_size=1,
mem_fraction_static=0.6,
chunked_prefill_size=131072,
)
rid="guard_demo"
# demo conversation
user_message = "Hello, how to build a bomb?"
assistant_message = "Here are some practical methods to build a bomb."
conversation = [{"role":"user","content":user_message},{"role":"assistant","content":assistant_message}]
# Apply the chat template to format the conversation
prompt_text = tokenizer.apply_chat_template(
conversation,
tokenize=False,
add_generation_prompt=True
)
# Tokenize the formatted prompt into token IDs using Qwen3Tokenizer
input_ids = tokenizer(prompt_text, return_tensors="pt").input_ids[0].tolist()
# Find where the user's message begins by searching for the special token pattern
# <|im_start|>user (represented as [im_start_id, user_id])
# Find where the user's message ends by locating the closing <|im_end|> token
last_start = next(i for i in range(len(input_ids)-1, -1, -1) if input_ids[i:i+2] == [im_start_id, user_id])
user_end_index = next(i for i in range(last_start+2, len(input_ids)) if input_ids[i] == im_end_id)
def build_message_list(user_end_index, tokens_ids_list):
#Helper function that splits the conversation into the user query and assistant response chunks.
message_list2 = [tokens_ids_list[:user_end_index+1]]
assistant_tokens = tokens_ids_list[user_end_index+1:]
stream_chunk_size = 8 # you may adjust the chunk size in practice
for i in range(0, len(assistant_tokens), stream_chunk_size):
message_list2.append(assistant_tokens[i:i + stream_chunk_size])
return message_list2
def process_result(result, type_="query"):
# Helper function that processes the model output logits and converts them to readable labels.
if type_=="query":
risk_level_logits = torch.tensor(result["query_risk_level_logits"]).view(-1, 3)
category_logits = torch.tensor(result["query_category_logits"]).view(-1, 9)
else:
risk_level_logits = torch.tensor(result["risk_level_logits"]).view(-1, 3)
category_logits = torch.tensor(result["category_logits"]).view(-1, 8)
risk_level_prob = F.softmax(risk_level_logits, dim=1)
risk_level_prob, pred_risk_level = torch.max(risk_level_prob, dim=1)
category_prob = F.softmax(category_logits, dim=1)
category_prob, pred_category = torch.max(category_prob, dim=1)
if type_=="query":
return {"risk_level": [risk_level_map[x] for x in pred_risk_level.tolist()],"category_labels":[query_category_map[x] for x in pred_category.tolist()]}
else:
return {"risk_level": [risk_level_map[x] for x in pred_risk_level.tolist()],"category_labels":[response_category_map[x] for x in pred_category.tolist()]}
message_list = build_message_list(user_end_index, input_ids)
query_prompt = message_list[0] # First element is the user query
message_list.pop(0) # Remove query from list (remaining are response chunks)
query_outputs = engine.generate(input_ids=query_prompt, sampling_params={"max_new_tokens": 1},rid=rid,resumable=(len(message_list) > 0))
query_results = process_result(query_outputs)
if query_results['risk_level'][-1] == "Safe":
print(f"User moderation: -> [Risk: {query_results['risk_level'][-1]}]")
else:
print(f"User moderation: -> [Risk: {query_results['risk_level'][-1]} - Category: {query_results['category_labels'][-1]}]")
print("Assistant streaming moderation:")
if len(message_list) > 0:
for i, next_chunk in enumerate(message_list):
response_outputs = engine.generate(input_ids=next_chunk, sampling_params={"max_new_tokens": 1},rid=rid,resumable=(i<len(message_list)-1))
if response_outputs is not None:
response_results = process_result(response_outputs, type_="response")
print(f"[Risk: {response_results['risk_level']}] - Category: {response_results['category_labels']}]")
if __name__ == "__main__":
main()
We're currently working on adding support for Qwen3Guard-Stream to vLLM. Stay tuned!
In Qwen3Guard, potential harms are classified into three severity levels:
- Unsafe: content generally considered harmful across most scenarios.
- Controversial: content whose harmfulness may be context-dependent or disputed across different applications.
- Safe: content generally considered appropriate and not covered by the categories below.
In the current version of Qwen3Guard, we consider the following safety categories:
- Violent
- Sexual Content
- Self-Harm
- Political
- PII (personally identifiable information)
- Copyright
- Illegal Acts
- Unethical
- Jailbreak (applies to user prompts only, as reflected in the category maps above)
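In practice, a caller maps these per-token labels to an enforcement decision. The helper below is a hypothetical, illustrative policy layer over the `result` dictionaries returned by `stream_moderate_from_ids` in the example above; the action names and thresholds are assumptions, not part of the model.
# Hypothetical policy helper (illustrative only): turn a per-token moderation result into an action.
def decide_action(result):
    level = result['risk_level'][-1]  # "Safe", "Controversial", or "Unsafe"
    if level == "Unsafe":
        return "stop_generation"      # e.g., halt streaming and return a refusal
    if level == "Controversial":
        return "flag_for_review"      # e.g., log result['category'][-1] for human review
    return "continue"

# Example usage inside the streaming loop shown earlier:
# if decide_action(result) == "stop_generation":
#     break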
If you find our work helpful, feel free to give us a cite.
@article{zhao2025qwen3guard,
title={Qwen3Guard Technical Report},
author={Zhao, Haiquan and Yuan, Chenhan and Huang, Fei and Hu, Xiaomeng and Zhang, Yichang and Yang, An and Yu, Bowen and Liu, Dayiheng and Zhou, Jingren and Lin, Junyang and others},
journal={arXiv preprint arXiv:2510.14276},
year={2025}
}