DeepSeek R1

Creator
Seonglae Cho
Created
2025 Jan 21 12:2
Edited
2025 Oct 24 1:5

DeepSeek Reasoner-1

Based on DeepSeek-V3-Base and tuned with RL on only a 20GB dataset.
Its alignment is not rigid, which makes it useful for jailbreak testing.
 
 
The Zero version (R1-Zero) uses pure RL to enhance LLM reasoning without human-authored CoT data. R1 combines Rejection Sampling + RL + SFT to solve the Zero version's language mixing and readability issues, strengthening not only reasoning but also general conversation and writing capabilities.
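As a rough illustration of the rejection-sampling step in that pipeline, the sketch below samples several candidate answers per prompt, keeps only those that a rule-based checker accepts, and collects the survivors as (prompt, answer) pairs that could later feed SFT. Everything here is hypothetical: `toy_generate` stands in for sampling from a real model, `is_correct` for a real verifier, and none of the names or settings come from DeepSeek's code or report.

```python
import random
from typing import Callable, List, Tuple


def toy_generate(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for model sampling: answers 'a+b' prompts, sometimes wrongly."""
    a, b = (int(x) for x in prompt.split("+"))
    noise = rng.choice([0, 0, 1, -1])  # about half of the samples are off by one
    return str(a + b + noise)


def is_correct(prompt: str, answer: str) -> bool:
    """Hypothetical rule-based verifier: exact match against the true sum."""
    a, b = (int(x) for x in prompt.split("+"))
    return answer == str(a + b)


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str, random.Random], str],
    accept: Callable[[str, str], bool],
    n_samples: int = 8,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Draw up to n_samples candidates per prompt and keep the first accepted one."""
    rng = random.Random(seed)
    kept: List[Tuple[str, str]] = []
    for prompt in prompts:
        for _ in range(n_samples):
            candidate = generate(prompt, rng)
            if accept(prompt, candidate):
                kept.append((prompt, candidate))
                break  # at most one accepted sample per prompt
    return kept


# Accepted pairs form a small, filtered dataset for a later SFT stage.
sft_data = rejection_sample(["1+2", "10+5"], toy_generate, is_correct)
```

Prompts whose candidates are all rejected simply contribute nothing, which is why the filtered set can be smaller than the prompt list. A real pipeline would likely keep several accepted samples per prompt and apply additional readability filters; this sketch keeps one to stay short.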
Fully open sourced
Model and distilled versions
Tech report
R1-Zero: RL only, with limitations (repetitive answers, low readability, language mixing)
 
 
