DeepSeek R1
Based on DeepSeek-V3-Base and RL-tuned with only a small (~20 GB) dataset
Alignment is not rigid, which makes it useful for jailbreak testing
Uses pure RL to enhance LLM reasoning without human-authored CoT data. Combines rejection sampling + RL + SFT to solve the Zero version's language-mixing and readability issues, strengthening not only reasoning but also general conversation and writing capabilities (pipeline sketched below).
Fully open-sourced (MIT-licensed weights and distills)
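A high-level sketch of the R1 training pipeline, in the stage order given by the tech report below; every function here is an illustrative stub, not a real API:

```python
# Hypothetical outline of the R1 pipeline; stage order follows the tech
# report, but sft/rl below are illustrative stubs, not real APIs.

def sft(model, data):    # supervised fine-tuning (stub)
    return model + [f"SFT on {data}"]

def rl(model, reward):   # GRPO-based RL (stub)
    return model + [f"RL with {reward}"]

def train_deepseek_r1(base="DeepSeek-V3-Base"):
    model = [base]
    # Stage 1: cold-start SFT on a small curated long-CoT set, fixing
    # R1-Zero's readability before any RL.
    model = sft(model, "cold-start CoT data")
    # Stage 2: reasoning-oriented RL with rule-based rewards
    # (answer correctness plus a language-consistency reward).
    model = rl(model, "correctness + language-consistency rewards")
    # Stage 3: rejection-sample verified-correct outputs, mix them with
    # general chat/writing data, and run SFT again.
    model = sft(model, "rejection-sampled + general SFT data")
    # Stage 4: a final RL round over all scenarios (reasoning plus
    # helpfulness/harmlessness) for general use.
    return rl(model, "all-scenario rewards")

print(train_deepseek_r1())
```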
🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power! | DeepSeek API Docs
🔍 o1-preview-level performance on AIME & MATH benchmarks.
https://api-docs.deepseek.com/news/news1120
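A minimal sketch of calling the reasoner through DeepSeek's OpenAI-compatible endpoint; the `deepseek-reasoner` model name and `reasoning_content` field follow the API docs linked above, but verify them against the current docs:

```python
# Query DeepSeek's reasoner via its OpenAI-compatible API.
# Model name and the reasoning_content field are per the DeepSeek API
# docs at the time of writing; check the current docs before use.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many primes are below 30?"}],
)

msg = resp.choices[0].message
print(msg.reasoning_content)  # chain-of-thought trace
print(msg.content)            # final answer
```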

deepseek-ai/DeepSeek-R1 · Hugging Face
https://huggingface.co/deepseek-ai/DeepSeek-R1
Model and distilled variants
DeepSeek-R1 - a deepseek-ai Collection
https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d
onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX · Hugging Face
https://huggingface.co/onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX
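The ONNX export above targets browser/edge runtimes such as Transformers.js; for a quick local test, the original PyTorch distill can be loaded with the 🤗 Transformers pipeline instead. A minimal sketch with illustrative generation settings:

```python
# Run the smallest R1 distill locally with 🤗 Transformers.
# Generation settings are illustrative (DeepSeek suggests temperatures
# around 0.6 for the distills); recent transformers versions accept
# chat-format messages directly.
from transformers import pipeline

pipe = pipeline("text-generation",
                model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

messages = [{"role": "user", "content": "What is 17 * 23?"}]
out = pipe(messages, max_new_tokens=512, do_sample=True, temperature=0.6)

# The assistant reply includes the <think> ... </think> reasoning block.
print(out[0]["generated_text"][-1]["content"])
```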
tech report
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities.
Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors.
However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance,
we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL.
DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks.
To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
https://arxiv.org/html/2501.12948v1
R1-Zero: trained with RL only, which leaves limitations (repetitive answers, low readability, language mixing); its group-relative RL objective is sketched below
deepseek-ai/DeepSeek-R1-Zero · Hugging Face
https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero
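The RL algorithm in both R1-Zero and R1 is GRPO (introduced in DeepSeekMath): there is no learned critic; instead, rewards are normalized within a group of G sampled outputs per prompt. A minimal sketch of that group-relative advantage:

```python
# Group-relative advantage as in GRPO: sample G outputs per prompt,
# score them with a rule-based reward, and normalize within the group
# instead of training a value function.
from statistics import mean, stdev

def group_advantages(rewards):
    """A_i = (r_i - mean(r)) / std(r) over one prompt's G samples."""
    mu, sigma = mean(rewards), stdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

# e.g. rule-based reward: 1.0 if the boxed answer was correct, else 0.0
print(group_advantages([1.0, 0.0, 0.0, 1.0, 1.0]))
```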

Seonglae Cho