Deepseek R1

Creator: Seonglae Cho
Created: 2025 Jan 21 12:2
Edited: 2025 Oct 24 1:5

DeepSeek Reasoner-1

Based on DeepSeek-V3-Base and RL-tuned with only a 20GB dataset.
Its alignment is not rigid, so it is useful for jailbreak testing.
Uses pure RL to enhance LLM reasoning without human-authored CoT data. Combines Rejection Sampling + RL + SFT to solve the Zero version's language mixing and readability issues, strengthening not only reasoning but also general conversation and writing capabilities.
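The rejection-sampling step above can be sketched as follows: sample several candidate completions per prompt, keep only those whose final answer verifies against a reference, and reuse the survivors as SFT data. This is a minimal sketch with hypothetical helper names (`toy_generate`, `is_correct`, and the `####` answer delimiter are illustrative assumptions, not DeepSeek's actual pipeline).

```python
import random

def is_correct(answer: str, reference: str) -> bool:
    # Hypothetical checker: exact match on the final answer string.
    return answer.strip() == reference.strip()

def rejection_sample(generate, prompt: str, reference: str, k: int = 8):
    """Sample k candidate CoT completions for a prompt and keep only the
    verifiably correct ones as (prompt, completion) SFT pairs."""
    kept = []
    for _ in range(k):
        completion = generate(prompt)
        # Assume the final answer follows a '####' delimiter (illustrative).
        final_answer = completion.split("####")[-1]
        if is_correct(final_answer, reference):
            kept.append((prompt, completion))
    return kept

# Toy stand-in for the RL-tuned model's sampler.
def toy_generate(prompt: str) -> str:
    return random.choice([
        "2 plus 2 gives 4. #### 4",
        "2 plus 2 gives 5. #### 5",
    ])

sft_pairs = rejection_sample(toy_generate, "What is 2+2?", "4")
```

Only the correct-answer completions survive, so the subsequent SFT round trains on verified reasoning traces rather than raw samples.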
www.nature.com
Fully open sourced.
🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power! | DeepSeek API Docs
🔍 o1-preview-level performance on AIME & MATH benchmarks.
deepseek-ai/DeepSeek-R1 · Hugging Face
model and distilled variants
DeepSeek-R1 - a deepseek-ai Collection
onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX · Hugging Face
tech report
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
R1-Zero: RL only, with limitations (repetitive answers, low readability, language mixing)
deepseek-ai/DeepSeek-R1-Zero · Hugging Face

Recommendations