Deepseek R1

DeepSeek Reasoner-1

Based on DeepSeek-V3-Base and RL-tuned with only a 20GB dataset

Alignment is not rigid, so it is a good target for jailbreak testing

open-r1

huggingface • Updated 2025 Oct 29 0:6

Reasoning Reward model

Synthetic Data Generation with Rejection Sampling

DeepSeek-R1-Zero uses pure RL to enhance LLM reasoning without human-authored CoT data. DeepSeek-R1 combines rejection sampling, RL, and SFT to solve the Zero version's language mixing and readability issues, strengthening not only reasoning but also general conversation and writing capabilities.
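The rejection-sampling step can be sketched roughly as follows: sample several completions per prompt, keep only those a checker accepts, and use the survivors as SFT data. This is a minimal illustration, not DeepSeek's actual pipeline. The names `rejection_sample`, `toy_generate`, and `exact_match`, the "Answer:" convention, and the toy arithmetic prompts are all assumptions made for the example; a real setup would call a model checkpoint instead of `toy_generate`.

```python
import random
from typing import Callable, Dict, List


def rejection_sample(
    prompts: List[Dict[str, str]],
    generate: Callable[[str, random.Random], str],
    accept: Callable[[str, str], bool],
    samples_per_prompt: int = 8,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Keep at most one accepted completion per prompt (hypothetical helper)."""
    rng = random.Random(seed)
    kept = []
    for item in prompts:
        for _ in range(samples_per_prompt):
            completion = generate(item["question"], rng)
            if accept(completion, item["answer"]):
                kept.append({"prompt": item["question"], "completion": completion})
                break  # stop at the first accepted sample for this prompt
    return kept


def toy_generate(question: str, rng: random.Random) -> str:
    """Stand-in for a model: answers 'a+b' questions, sometimes incorrectly."""
    a, b = (int(x) for x in question.split("+"))
    guess = a + b + rng.choice([0, 0, 1, -1])
    return f"<think>{a} plus {b}</think> Answer: {guess}"


def exact_match(completion: str, reference: str) -> bool:
    """Accept only if the text after the last 'Answer:' equals the reference."""
    return completion.rsplit("Answer:", 1)[-1].strip() == reference


prompts = [
    {"question": "2+3", "answer": "5"},
    {"question": "10+7", "answer": "17"},
]
sft_data = rejection_sample(prompts, toy_generate, exact_match)
```

Every entry in `sft_data` has passed the checker, so the resulting set is smaller but cleaner than the raw samples. Stricter filters (for example on language consistency or readability) would slot into `accept` in the same way.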

www.nature.com

https://www.nature.com/articles/s41586-025-09422-z

fully open sourced

🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power! | DeepSeek API Docs

🔍 o1-preview-level performance on AIME & MATH benchmarks.

https://api-docs.deepseek.com/news/news1120

deepseek-ai/DeepSeek-R1 · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

https://huggingface.co/deepseek-ai/DeepSeek-R1

Full model or distilled variants

DeepSeek-R1 - a deepseek-ai Collection

https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d

onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX · Hugging Face

https://huggingface.co/onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX

tech report

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.

https://arxiv.org/html/2501.12948v1

R1-Zero: trained with RL only, with known limitations (repetitive answers, low readability, language mixing)

deepseek-ai/DeepSeek-R1-Zero · Hugging Face

https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero
