by Jackrong
The Qwen3.5 4B Claude 4.6 Opus Reasoning Distilled model is a fine-tuned version of Qwen3.5-4B, designed for structured reasoning and complex analytical tasks. Trained by distilling the reasoning of Claude 4.6 Opus, it breaks problems down into clear logical steps inside `<think>` tags before delivering a precise, nuanced response. Its training on diverse reasoning datasets, spanning science, mathematics, and instruction-following, aims to keep its thought process both deep and efficient, with fewer redundant steps. The model balances performance with accessibility, making it a practical choice for applications that require detailed analysis and structured problem-solving. It suits developers, researchers, and other users who want to integrate high-quality reasoning into their projects, with use cases ranging from technical assistance to advanced education.
🔥 Update (April 5): I’ve released the complete training notebook, codebase, and a comprehensive PDF guide to help beginners and enthusiasts understand and reproduce this model's fine-tuning process.
❤️ Special thanks to the Unsloth open-source library and @KyleHessling1 for their support.
👉 GitHub Repository: Jackrong-llm-finetuning-guide. Visit the repo to dive into the codebase and reproduce the results locally or on Colab.
🔗 Qwopus3.5-27b Complete Fine-Tuning Guide (PDF)
A Note: My goal isn't just to detail a workflow, but to demystify LLM training. Beyond the social media hype, fine-tuning isn't an unattainable ritual—often, all you need is a Google account, a standard laptop, and relentless curiosity.
No one starts as an expert, but every expert was once brave enough to begin.
All training and testing for this project were self-funded. If you find this model or guide helpful, a Star ⭐️ on GitHub would be the greatest encouragement. Thank you! 🙏
[!Note] The Claude-series optimized models are released under the Qwopus3.5 name; the latest version is 🌟Qwopus3.5-v3.
Update: This model has been further enhanced with additional reasoning data distilled from Qwen3.5-27B.
The new training data introduces higher-quality reasoning trajectories across domains such as science, instruction-following, and mathematics.
Part of the data comes from Jackrong/Qwen3.5-reasoning-700x, a curated dataset designed to improve structured step-by-step reasoning and reasoning diversity.

Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled is a capable reasoning model fine-tuned on top of the dense Qwen3.5-4B architecture. Its training relies on Chain-of-Thought (CoT) distillation data sourced primarily from Claude 4.6 Opus interactions.
Through Supervised Fine-Tuning (SFT) focused specifically on structured reasoning logic, the model excels at breaking down complex user problems, planning step-by-step methodologies within strictly formatted `<think>` tags, and ultimately delivering precise, nuanced solutions.
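The card does not ship a helper for separating the reasoning from the final answer. As a minimal sketch, assuming completions follow the `<think> {internal reasoning} </think>` layout described in this card, the hypothetical `split_reasoning` function below splits a completion on the first `</think>`. It also tolerates a missing opening tag, on the assumption that a chat template may already have emitted `<think>` as part of the prompt.

```python
from typing import NamedTuple


class ParsedReply(NamedTuple):
    reasoning: str  # text between <think> and </think>
    answer: str     # text after </think>


def split_reasoning(text: str) -> ParsedReply:
    """Hypothetical helper: split a completion into reasoning and final answer."""
    head, closed, tail = text.partition("</think>")
    reasoning = head.strip()
    # The opening tag may be absent if the prompt already ended with "<think>".
    if reasoning.startswith("<think>"):
        reasoning = reasoning[len("<think>"):].strip()
    if not closed:
        # No closing tag: generation probably stopped mid-thought, so no answer yet.
        return ParsedReply(reasoning=reasoning, answer="")
    return ParsedReply(reasoning=reasoning, answer=tail.strip())


reply = split_reasoning("<think>\nLet me analyze this request carefully:\n1. ...\n</think>\n\nThe answer is 42.")
print(reply.answer)  # The answer is 42.
```

Splitting on the first closing tag with `str.partition` keeps the helper dependency-free and treats a truncated generation as unfinished reasoning rather than as an answer.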
Base Model (Qwen3.5-4B)
│
▼
Supervised Fine-Tuning (SFT) + LoRA
(Response-Only Training masked on "<|im_start|>assistant\n<think>")
│
▼
Final Model, Text-Only (Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled)
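Response-only training means the loss is computed only on tokens that follow the assistant marker (`<|im_start|>assistant\n<think>` in the pipeline above). Unsloth's `train_on_responses_only` handles this internally; the sketch below only illustrates the idea on plain token-ID lists and is not Unsloth's implementation. The token IDs and the function name `mask_non_responses` are hypothetical, and `-100` is used because it is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
from typing import List, Sequence

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def find_subsequence(seq: Sequence[int], pattern: Sequence[int], start: int = 0) -> int:
    """Index of the first occurrence of `pattern` in `seq` at or after `start`, else -1."""
    n = len(pattern)
    for i in range(start, len(seq) - n + 1):
        if list(seq[i:i + n]) == list(pattern):
            return i
    return -1


def mask_non_responses(input_ids: Sequence[int],
                       response_marker: Sequence[int],
                       instruction_marker: Sequence[int]) -> List[int]:
    """Build labels that keep only tokens after each response marker.

    Tokens up to and including a response marker, and from the next
    instruction marker onward, get IGNORE_INDEX and contribute no loss.
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    pos = 0
    while True:
        start = find_subsequence(input_ids, response_marker, pos)
        if start == -1:
            break
        resp_begin = start + len(response_marker)
        nxt = find_subsequence(input_ids, instruction_marker, resp_begin)
        resp_end = len(input_ids) if nxt == -1 else nxt
        labels[resp_begin:resp_end] = input_ids[resp_begin:resp_end]
        pos = resp_end
    return labels


# Hypothetical IDs: 1 = "<|im_start|>", 2 = "user\n", 3 = "assistant\n", 4 = "<think>"
INSTRUCTION = [1, 2]
RESPONSE = [1, 3, 4]
ids = [1, 2, 10, 11, 1, 3, 4, 20, 21, 22]
print(mask_non_responses(ids, RESPONSE, INSTRUCTION))
# → [-100, -100, -100, -100, -100, -100, -100, 20, 21, 22]
```

Because the loop resumes searching at the end of each kept span, multi-turn samples get every assistant turn supervised and every user turn masked.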
The model includes targeted optimizations addressing Qwen3.5’s tendency toward excessive transitional or repetitive reasoning on simple queries. Through deep distillation and structural imitation of Claude-4.6-Opus reasoning chains, the model adopts a more efficient structured thinking pattern:
“Let me analyze this request carefully: 1..2..3...”.
This streamlined reasoning pattern reduces redundant reasoning loops while preserving analytical depth, improving inference efficiency.
Let me analyze this request carefully:
1. Identify the core objective of the problem.
2. Break the task into clearly defined subcomponents.
3. Evaluate constraints and edge cases.
4. Formulate a step-by-step solution plan.
5. Execute the reasoning sequentially and verify consistency.
...
Training uses the `train_on_responses_only` strategy, masking instructions so the loss is calculated only over the generated `<think>` sequences and the subsequent solutions. The target output structure is `<think> {internal reasoning} </think>\n {final answer}`.
The training loss decreased steadily throughout the run, from an initial 0.74356 to a final 0.23984, indicating effective knowledge distillation: the model internalized the structured `<think>` reasoning patterns from the Claude 4.6 Opus teacher data.
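To put the reported loss values (0.74356 to 0.23984) in perspective, the arithmetic below computes the absolute and relative drop. It also converts each value to per-token perplexity via `exp(loss)`, under the assumption (not stated in the card) that the logged loss is mean token-level cross-entropy in nats.

```python
import math

initial_loss = 0.74356  # reported starting training loss
final_loss = 0.23984    # reported final training loss

absolute_drop = initial_loss - final_loss
relative_drop = absolute_drop / initial_loss

# Assumption: loss is mean per-token cross-entropy in nats, so exp(loss) is perplexity.
initial_ppl = math.exp(initial_loss)
final_ppl = math.exp(final_loss)

print(f"absolute drop: {absolute_drop:.5f}")              # absolute drop: 0.50372
print(f"relative drop: {relative_drop:.1%}")              # relative drop: 67.7%
print(f"perplexity: {initial_ppl:.2f} -> {final_ppl:.2f}")  # perplexity: 2.10 -> 1.27
```

These figures describe fit to the training data only; held-out behavior is what the benchmark table measures.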
The dataset consists of high-quality, filtered reasoning distillation data:
| Dataset Name | Description / Purpose |
|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | Provides comprehensive Claude 4.6 Opus reasoning trajectories. |
| TeichAI/claude-4.5-opus-high-reasoning-250x | Injects high-intensity, structured reasoning instances. |
| Jackrong/Qwen3.5-reasoning-700x | Additional curated reasoning samples designed to strengthen structured step-by-step problem solving and improve reasoning diversity. |
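The card names the datasets but not their schemas or the exact rendering template. As a minimal sketch, the hypothetical `to_chatml` function below turns one record into a training string. It assumes a ChatML layout consistent with the `<|im_start|>assistant\n<think>` marker used for response-only masking, and hypothetical field names `problem`, `thinking`, and `solution`. The real Qwen3.5 chat template may place whitespace differently.

```python
from typing import Mapping


def to_chatml(record: Mapping[str, str]) -> str:
    """Render one reasoning sample as a ChatML string (hypothetical field names)."""
    return (
        f"<|im_start|>user\n{record['problem'].strip()}<|im_end|>\n"
        # Under response-only training, only tokens after this marker receive loss.
        f"<|im_start|>assistant\n<think>\n{record['thinking'].strip()}\n</think>\n\n"
        f"{record['solution'].strip()}<|im_end|>\n"
    )


sample = {"problem": "What is 2 + 3?", "thinking": "Add the integers: 2 + 3 = 5.", "solution": "5"}
print(to_chatml(sample))
```

Normalizing every source into one template like this is what lets a single response marker separate instruction tokens from reasoning-plus-answer tokens across all three datasets.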
| Benchmark | Baseline (4B) | Distilled (4B) |
|---|---|---|
| GPQA Diamond (0-shot) | 33.82 | 38.88 |
| AI2 ARC-Challenge (25-shot) | 64.59 | 66.38 |
These evaluation results were originally reported by khitsly.
Evaluation was conducted using EleutherAI's lm-evaluation-harness with 8-bit inference and temperature 0. Higher scores indicate better performance.
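The benchmark table reports raw scores; the short calculation below derives the absolute gain in points and the gain relative to the baseline for each benchmark, using only the numbers from that table.

```python
# Scores from the benchmark table: (baseline 4B, distilled 4B)
scores = {
    "GPQA Diamond (0-shot)": (33.82, 38.88),
    "AI2 ARC-Challenge (25-shot)": (64.59, 66.38),
}

gains = {}
for name, (base, distilled) in scores.items():
    delta = distilled - base  # absolute gain in points
    rel = delta / base        # gain relative to the baseline score
    gains[name] = (delta, rel)
    print(f"{name}: +{delta:.2f} pts ({rel:.1%} relative)")
# GPQA Diamond (0-shot): +5.06 pts (15.0% relative)
# AI2 ARC-Challenge (25-shot): +1.79 pts (2.8% relative)
```

Viewed relatively, most of the improvement comes from GPQA Diamond, while the ARC-Challenge gain is small.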
The model works through its `<think>` block sequentially rather than through exploratory "trial-and-error" self-doubt.
Significant thanks to the Unsloth AI team for making rapid fine-tuning of large language models accessible. Additionally, we acknowledge the Qwen team and the open-source community developers producing exceptional distilled datasets (nohurry and TeichAI).