Paper link: https://arxiv.org/abs/2501.12948
By: https://x.com/vinodcodes
<aside>
🔖 We’ll try to understand the DeepSeek paper at a high level, picking up some terminology along the way.
PS: This is my first blog, so if you spot any misinterpretations, please let me know; my Twitter profile is linked above. I hope to improve as I write more of these. So yeah, let’s get it. 💪🏻
</aside>
This aged well, didn’t it?

Let’s set the context first:
Overview of DeepSeek R1-Zero:

Let’s look at what the abstract covers:
- Introduces DeepSeek’s first-generation reasoning models, R1-Zero and R1.
- R1-Zero is trained via large-scale Reinforcement Learning (RL), without supervised fine-tuning (SFT) as a preliminary step.
- However, R1-Zero has limitations such as poor readability and language mixing.
- To address these, they release R1, which adds multi-stage training and cold-start data before RL.
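To make “large-scale RL” a bit more concrete: the paper describes R1-Zero’s reward signal as rule-based (an accuracy reward for a correct final answer plus a format reward for putting the reasoning between `<think>` tags), and it trains with GRPO, which scores each sampled answer relative to the other samples for the same prompt. Below is a minimal Python sketch of just those two ideas. It assumes exact string matching for answers, equal weighting of the two rewards, a `<think>…</think><answer>…</answer>` template checked by a simple regex, and population standard deviation; the function names are hypothetical and this is not DeepSeek’s actual code.

```python
import re
import statistics

# Assumed template: reasoning inside <think>...</think>, final answer inside <answer>...</answer>.
TEMPLATE = re.compile(r"^\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL)

def format_reward(completion: str) -> float:
    """Hypothetical format reward: 1.0 if the completion follows the template, else 0.0."""
    return 1.0 if TEMPLATE.match(completion) else 0.0

def accuracy_reward(completion: str, reference: str) -> float:
    """Hypothetical accuracy reward: exact match of the extracted answer (a simplification)."""
    match = TEMPLATE.match(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(2).strip() == reference.strip() else 0.0

def total_reward(completion: str, reference: str) -> float:
    # Assumption: the two rule-based rewards are simply added with equal weight.
    return accuracy_reward(completion, reference) + format_reward(completion)

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: (r_i - mean(group)) / std(group), so no learned critic is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    if std == 0:
        return [0.0 for _ in rewards]  # every sample scored the same: no learning signal
    return [(r - mean) / std for r in rewards]

if __name__ == "__main__":
    # A toy "group" of sampled completions for the prompt "What is 2 + 3?"
    group = [
        "<think>2 + 3 = 5</think> <answer>5</answer>",
        "<think>2 + 3 = 6</think> <answer>6</answer>",
        "The answer is 5",
        "<think>adding gives 5</think><answer>5</answer>",
    ]
    rewards = [total_reward(c, "5") for c in group]
    print(rewards)  # [2.0, 1.0, 0.0, 2.0]
    print(group_relative_advantages(rewards))  # positive for the correct samples, negative otherwise
```

The nice property this illustrates is that rule-based rewards are cheap to compute at scale, and the group-relative normalization means samples are only pushed up or down relative to their siblings for the same prompt.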
Introduction:
- Gives a shout-out to other LLMs (from OpenAI, Anthropic, Google) and their rapid progress towards AGI
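- Highlights the importance of **post-training** (fine-tuning, RL, distillation, etc.), which improves accuracy on reasoning tasks and aligns models with user preferences while needing far less compute than pre-training.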
- OpenAI’s o1 models were the first to introduce inference-time scaling, i.e. letting the model “think” first and then answer, by increasing the length of its Chain-of-Thought (CoT) reasoning. In technical terms, this is called generating long CoTs, and it was a major reason for o1’s strong performance on reasoning tasks such as math, coding, and science.