TheaterFire

DeepSeek-R1’s paper was updated 2 days ago, expanding from 22 pages to 86 pages and adding a substantial amount of detail.

Posted by Nunki08@reddit | LocalLLaMA | 55 comments

arXiv:2501.12948 [cs.CL]: https://arxiv.org/abs/2501.12948

55 Comments

redditonc3again@reddit

This should significantly benefit Open-R1, right? Is that project still active?

ResidentPositive4122@reddit

New arch about to drop? dsv4 + r2? Packing all the goodies learned from last year. Hopefully they try smaller sizes as well. Would be interesting to see how the arch improvements work at several sizes.

swaglord1k@reddit

that would make sense. we'll finally see whether deepseek was a one-trick pony or if they'll manage to drop another market-disrupting model

ForsookComparison@reddit

R1-0528 and V3.2 were both equally SOTA-threatening, especially when cost comes into play. The big shocker was *"you can do this without being a USA-based hyperscaler!?"*

Zestyclose839@reddit

I can 100% see a lab dropping something market-disrupting with a new architecture. World models would be a big one, letting an LLM imagine how something would play out in the real world instead of just guessing based on what's described in its training text. JEPA (Joint Embedding Predictive Architecture) is a great example, as it gets the model thinking in abstract concepts. Research labs are better able to "move fast and break things" in this way, as there aren't many commercial consequences for a failed model. Consider the success of DeepMind's releases vs. Bard, for instance. If DeepSeek were to drop an LLM with a world model and a real-world use case, they could absolutely sweep the competition.

espadrine@reddit

> especially when cost comes into play

To emphasize this point, look at this graph from https://metabench.organisons.com/. There is one dot in the middle that stands out.

https://preview.redd.it/n7mwph9w8ybg1.png?width=1598&format=png&auto=webp&s=f7ce556084b0fa4cb2282aac6e2b3f78a8c79558

sdkgierjgioperjki0@reddit

That website has some wrong information. It doesn't account for caching consistently: it uses the cache price for some models but not for others. Also, it lists exact parameter counts for Gemini 3, which aren't publicly known, and the size it gives is way too small to be plausible.

espadrine@reddit

Unknown values are guessed from known values, so some information is uncertain (with uncertainty estimated in the table tab), but overall it gives a good picture IMO.

sdkgierjgioperjki0@reddit

The pricing is still very wrong: Kimi, for example, has its price calculated from the cache-hit rate, while DeepSeek 3.2 uses its cache-miss price.
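The cache-hit vs. cache-miss mix-up is easy to see with a little arithmetic. A minimal sketch, assuming input pricing is split into a cache-hit and a cache-miss rate per million tokens; the two models and all prices below are hypothetical placeholders, not real Kimi or DeepSeek prices.

```python
def blended_input_price(hit_price: float, miss_price: float, hit_rate: float) -> float:
    """Expected cost per 1M input tokens when a fraction `hit_rate` of them hit the cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


# Hypothetical prices (USD per 1M input tokens), for illustration only.
MODEL_A = {"hit": 0.10, "miss": 1.00}
MODEL_B = {"hit": 0.05, "miss": 0.50}

# Apples to oranges: A's cache-hit price vs. B's cache-miss price makes A look 5x cheaper.
mismatched_ratio = MODEL_B["miss"] / MODEL_A["hit"]

# Apples to apples: at the same assumed 50% hit rate, B actually costs half as much as A.
a = blended_input_price(MODEL_A["hit"], MODEL_A["miss"], 0.5)
b = blended_input_price(MODEL_B["hit"], MODEL_B["miss"], 0.5)
print(f"mismatched B/A = {mismatched_ratio:.1f}x; blended at 50% hits: A = {a:.3f}, B = {b:.3f}")
```

The real hit rate depends on the workload (long shared system prompts cache well, one-off prompts don't), so a fair chart has to pick one assumption and apply it to every model.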

espadrine@reddit

You're right, wrong data input. Fixed!

No_Afternoon_4260@reddit

Idk, but if they have the balls to release a 100-200B dense model, that would be a revolution, to me at least x)

Bakoro@reddit

To a smaller extent, we already had a moment like that with HRM/TRM models; it just didn't crash the stock market because they aren't LLMs. The small recursive models proved that small, task-specific models were viable in a new way, and that's revived a whole branch of inquiry.

Ultimately scale is still king, and everyone knows that DeepSeek has a ton of compute, even if it's not all top-of-the-line stuff. Research has been making some big moves toward more disciplined architecture and training, so I think we'll have at least one more "oh shit" moment where someone finds a way to make models more sample efficient.

My money is on truly multimodal models providing highly structured latent spaces, and on moving past raw transformers as a linear sequence of layers toward multiple modules with cross attention. The problem with text-only models is that they have to reconstruct everything about the world purely from the relationships between words. There is no ground truth beyond the adjacency and frequency of text. We've seen that Reinforcement Learning with Verifiable Rewards helps define the relationships in a more concrete way, and RL can refine the knowledge a pretrained model already has. There's a bunch of research into the geometric representations models make, and the manifolds of data, which tend to be low dimensional.

My intuition is that, by anchoring latent dimensions with multimodal biases, we'll get models that can learn from a few examples rather than thousands, after the initial structure is defined. That'll be the big "oh shit" moment, where we'll be able to take a base model, do serious training on single GPUs, and do extensive fine-tuning with minimal risk of catastrophic forgetting.
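For readers new to the term, a "verifiable reward" is computed by a program that checks the output, not by a learned judge. The sketch below is loosely modeled on the rule-based accuracy and format rewards the R1 paper describes (reasoning enclosed in `<think>`...`</think>` tags, math answers given in a box). The regexes, the exact string match, and the `format_weight` parameter are illustrative assumptions, not DeepSeek's code.

```python
import re

# Assumed format: the completion opens with a <think>...</think> block.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
# Assumed answer format: \boxed{...} without nested braces (so \boxed{\frac{1}{2}} is not handled).
BOXED_ANSWER = re.compile(r"\\boxed\{([^{}]*)\}")


def format_reward(completion: str) -> float:
    """1.0 if the completion starts with a closed <think> block, else 0.0."""
    return 1.0 if THINK_BLOCK.match(completion.lstrip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the last boxed answer exactly matches the reference string, else 0.0."""
    answers = BOXED_ANSWER.findall(completion)
    if not answers:
        return 0.0
    return 1.0 if answers[-1].strip() == reference.strip() else 0.0


def verifiable_reward(completion: str, reference: str, format_weight: float = 1.0) -> float:
    """Hypothetical combination: accuracy plus a weighted format bonus."""
    return accuracy_reward(completion, reference) + format_weight * format_reward(completion)


sample = "<think>2 + 2 = 4, double-checking... yes.</think> The answer is \\boxed{4}."
print(verifiable_reward(sample, "4"))  # → 2.0 (correct answer and correct format)
print(verifiable_reward(sample, "5"))  # → 1.0 (format only)
```

A real checker would normalize answers (e.g. `0.5` vs `1/2`) before comparing; exact matching keeps the sketch short but under-rewards equivalent answers.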

swaglord1k@reddit

considering what they managed to do by squeezing v3, i think we might.

Few_Painter_5588@reddit

Uhm, no. Deepseek were always market disrupting: Deepseek v2, v3, V3 0324, R1, 3.2. They've only had three flops: V1, R1 0528 and V3.2 Speciale.

SlowFail2433@reddit

R1 0528 was very widely used

my_name_isnt_clever@reddit

Speciale beat Gemini 3 Pro at pure reasoning, and it was never intended to be a standalone product. Not sure how that could be considered a flop.

-dysangel-@reddit

I really liked R1 0528 - it's the only one of the Deepseek models I've tried that still worked well even at Q2

swaglord1k@reddit

disagree, only r1 was disruptive, and even that was barely on par with the closed-source sota. i wouldn't be surprised if they've been cooking a new architecture that will mog existing llms

Few_Painter_5588@reddit

Well, deepseek doesn't really compete with the frontier. But from a cost-effectiveness standpoint, they disrupted the markets quite significantly. ChatGPT mini, Claude Haiku and Gemini Flash basically got destroyed by Deepseek, and that's the market where the real money is made. I think deepseek v4 with their new architecture will disrupt the market, especially if they follow up on their theory and raise the parameter count, and if they integrate their work on Janus, Deepseek OCR and Deepseek VL into v4.

panic_in_the_galaxy@reddit

You don't update a paper with new results. You would just write a new paper. These are just explanations and clarifications.

ab2377@reddit

but why so late?

autoencoder@reddit

red herring for competitors?

layer4down@reddit

I’m not sure how it would benefit DeepSeek.

linkillion@reddit

A year is forever in the AI space right now, but not all that long in scientific research. Most papers take anywhere from a couple of months to a year from first draft to publication in a peer-reviewed journal (and producing a draft takes months to years of experimentation). I'm not saying this is what's happening here (it's just an arXiv update), but it's not at all surprising if you're doing good research.

QuackerEnte@reddit

"slow and steady wins the race"

panic_in_the_galaxy@reddit

Because they probably got a lot of questions from other researchers and tried to answer them. Writing papers takes time.

ResidentPositive4122@reddit

No, I mean they've updated this with everything they had and are preparing for a new arch. Like closing a chapter.

Elven77AI@reddit

This seems like it. Dumping dozens of pages means it's no longer relevant to their current research and they've moved on to something far more effective (i.e., releasing it gives competitors no advantage), likely a new reasoning architecture built from https://huggingface.co/papers/2512.24880

qtvivies@reddit

https://preview.redd.it/t6ic0x3nywbg1.png?width=1965&format=png&auto=webp&s=ae5ef60128b8a0cf89351e8673ea41eddafb037c

Something interesting towards the end. Looks like someone forgot about this.

Intelligent-Time7704@reddit

real openai

Mikasa0xdev@reddit

86 pages? That's a weekend read.

Crisis_Averted@reddit

layman here: no idea what that implies.

qtvivies@reddit

They intended to release the (mostly synthetic? generated by r1-zero?) data used for post-training R1, except I assume `xxx` was accidentally left in as a placeholder where a link should be. Or they changed their mind and just forgot to remove it.

Dany0@reddit

Supervised fine-tuning and reinforcement learning data, used to post-train.

Eyelbee@reddit

It's crazy it's only been one year, feels like ages

badgerbadgerbadgerWI@reddit

The level of detail they're releasing is remarkable. This kind of transparency is what pushes the whole field forward. Really interested in their distillation approach - getting smaller models to match larger ones' reasoning is key for edge deployment.

CryptoUsher@reddit

honestly the fact they went back and added 60+ pages is kind of wild. most papers just release and call it a day, maybe a small erratum if something's broken. wonder if this was all stuff they had internally but couldn't publish initially, or if they're responding to community feedback and trying to explain their approach better. either way it's good for reproducibility. the original paper was already dense but felt like it was missing implementation details. if they're actually filling in those gaps this could be huge for people trying to replicate or build on their work.

Imaginary-Bit-3656@reddit

I think it was part of getting the paper accepted into Nature, where the same additional information was included with the work as supplementary material (I think the standards for Nature are considered quite high). Many of the papers we see on arXiv are preprints: works that have not been peer reviewed and may never be published in a journal.

timfduffy@reddit

I think all this info was previously released as a [supplement to their R1 paper in Nature](https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-025-09422-z/MediaObjects/41586_2025_9422_MOESM1_ESM.pdf).

yoshiK@reddit

I did quickly throw the two papers into Gemini. It's really fun to live in the future.

# Gemini summary:

The paper **"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"** (arXiv:2501.12948) marks a significant milestone in open-source AI by demonstrating that advanced reasoning (similar to OpenAI’s o1) can be achieved through large-scale Reinforcement Learning (RL) with minimal human-annotated data. The two versions you provided represent the initial release (**v1**, Jan 22, 2025) and the latest updated version (which has been significantly expanded to **86+ pages** as of January 2026).

### 1. Overall Paper Summary

The paper introduces two primary models:

* **DeepSeek-R1-Zero:** A model trained via "pure RL" (using the GRPO algorithm) starting directly from a base model without any Supervised Fine-Tuning (SFT). It demonstrates that reasoning behaviors like self-correction and reflection can emerge purely from reward signals.
* **DeepSeek-R1:** A more "user-friendly" version that uses a multi-stage pipeline (Cold-start SFT → Reasoning RL → Rejection Sampling/SFT → General RL) to fix the "readability" and "language mixing" issues of R1-Zero while maintaining state-of-the-art reasoning performance.
* **Distillation:** The authors show that the reasoning patterns discovered by the 671B model can be distilled into smaller models (1.5B to 70B), allowing a 14B model to outperform much larger ones on math and coding benchmarks.

---

### 2. Comparison: Extensions in the New Version

The newer version is a massive technical expansion (growing from roughly 22 pages to over 85 pages). The key additions and extensions include:

#### A. The "Aha Moment" Expansion (Section 2.2.1)

The new version provides a much deeper analysis of the **"Aha Moment"**, the point during RL training where the model unexpectedly learns to "re-think" its approach. The extension includes more qualitative examples and internal data showing the model's transition from linear solving to iterative self-correction without being prompted to do so.

#### B. Detailed 4-Stage Training Pipeline

While v1 outlined the stages, the new version details the specific composition of the **800k total training samples**:

* **Stage 1 (Cold Start):** Expanded details on the ~5,000-10,000 long CoT (Chain of Thought) samples used to "prime" the model.
* **Stage 3 (Rejection Sampling):** A deeper dive into how 600k reasoning-related and 200k non-reasoning samples were filtered and used to improve the model's general chat capabilities and prevent "forgetting" during the reasoning-heavy RL stages.

#### C. Comprehensive Ablation Studies

The new version adds extensive "What if?" scenarios that were absent or brief in v1:

* **Distillation vs. RL:** New evidence explaining *why* distilling a large model's reasoning traces into a small model is more effective than training that small model directly with its own RL.
* **Base Model Impact:** Analysis of how different base models (DeepSeek-V3 vs. Qwen vs. Llama) respond to the R1 training recipe.

#### D. Expanded "Unsuccessful Attempts" (Section 4.2)

One of the most valuable additions for researchers is the expanded section on what **did not work**. The new version elaborates on their failures with:

* **Process Reward Models (PRM):** Detailed reasons why step-level rewards were difficult to scale or prone to "reward hacking" compared to the outcome-based rewards used in R1.
* **Monte Carlo Tree Search (MCTS):** Technical explanation of why MCTS didn't provide the expected gains over simple RL in the context of LLM reasoning.

#### E. New Benchmarks & Technical Specs

* **Updated Results:** Includes more recent evaluations on benchmarks like **AIME 2025**, **LiveCodeBench**, and specialized medical/legal reasoning tests.
* **Hyperparameters:** The new version includes exhaustive tables of training hyperparameters (learning rates, GRPO group sizes, KL divergence coefficients) which were previously withheld or summarized.

### Summary Table

| Feature | v1 (Original) | Latest Version (Extension) |
| :--- | :--- | :--- |
| **Page Count** | ~22 Pages | **86+ Pages** |
| **Methodology** | High-level 4-stage overview | Granular detail on each stage (SFT, RL, Rejection Sampling) |
| **Behaviors** | Mentions "self-correction" | Deep dive into "Aha Moment" with case studies |
| **Failed Paths** | Brief mention of PRM/MCTS | Exhaustive analysis of why PRM and MCTS underperformed |
| **Distillation** | Introduced 1.5B to 70B models | Added deep ablation on distillation efficiency and data filtering |
| **Hyperparameters** | Partial/Summary | **Complete Technical Specs** for reproducibility |
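Since GRPO comes up in the summary, here is a minimal sketch of its core idea as described in the DeepSeekMath paper that introduced it: sample a group of completions for the same prompt, score each one, and use each reward's z-score within the group as its advantage, so no separate value model is needed. This covers only the outcome-reward advantage step, not the clipped policy-gradient loss or the KL term. Using the population standard deviation and an `eps` guard are choices made here; implementations differ on both.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the other samples drawn for the same prompt.

    advantage_i = (r_i - mean(r)) / (std(r) + eps)
    """
    if not rewards:
        raise ValueError("need at least one reward in the group")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of 4 completions for one prompt: two correct (1.0), two wrong (0.0).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
# If every completion gets the same reward, all advantages are 0 and the prompt gives no signal.
print(group_relative_advantages([1.0, 1.0, 1.0, 1.0]))
```

The zero-advantage case is one reason group size matters: a prompt the model always solves (or always fails) contributes nothing to the update.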

pigeon57434@reddit

i don't understand people who post complete AI slop summaries into comment sections. if we wanted a summary we would ask a model ourselves, since it requires no effort

Chickenbuttlord@reddit

Nobody asked for your opinion, buddy. Yes, we very much do need a summary posted into a comment in case we're too lazy to do it ourselves. Thank you OP!

pigeon57434@reddit

i guess maybe if it was a good summary it would be ok, but this one is literally so bad, and in some cases just objectively, factually wrong

menictagrib@reddit

The new version of someone with no background knowledge answering a technical question for dopamine by repeating whatever they see on the first page of Google

arguingwithabot@reddit

I hear you, but at the end of the day they saved you a few clicks, keystrokes and tokens.

DigThatData@reddit

there is no section 2.2.1.

yoshiK@reddit

The discussion of the a-ha moment is actually in section 2.3. Though I didn't check too closely, since the takeaway from the summary is that I probably need to sit down and read the whole thing at some point anyhow.

warnerbell@reddit

The original paper was light on implementation specifics. If they've added more on how they got the reasoning behavior to emerge, that's valuable.

TelloLeEngineer@reddit

does arxiv have a diff UI?

Freonr2@reddit

Don't think so, but you can download the raw TeX source of each version of the paper. In the new one, appendix.tex is a new file and the largest .tex file; you can also find other changes by diffing the raw .tex files directly.
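A minimal sketch of the diffing approach, assuming you have already downloaded the TeX source of two versions from arXiv and extracted each into its own local directory. It lists `.tex` files that are new (largest first, which is how a new appendix would stand out), files that disappeared, and unified diffs for files present in both. The function name is hypothetical; only the standard library is used.

```python
import difflib
from pathlib import Path


def diff_tex_trees(old_dir: str, new_dir: str):
    """Compare the .tex files of two extracted arXiv source trees.

    Returns (added, removed, diffs):
      added   - [(relative_path, size_in_bytes)] for new files, largest first
      removed - sorted relative paths that no longer exist
      diffs   - {relative_path: unified diff text} for files that changed
    """
    old_root, new_root = Path(old_dir), Path(new_dir)
    old_files = {p.relative_to(old_root) for p in old_root.rglob("*.tex")}
    new_files = {p.relative_to(new_root) for p in new_root.rglob("*.tex")}

    added = sorted(
        ((rel, (new_root / rel).stat().st_size) for rel in new_files - old_files),
        key=lambda item: item[1],
        reverse=True,
    )
    removed = sorted(old_files - new_files)

    diffs = {}
    for rel in sorted(old_files & new_files):
        old_text = (old_root / rel).read_text(encoding="utf-8", errors="replace")
        new_text = (new_root / rel).read_text(encoding="utf-8", errors="replace")
        diff = "".join(
            difflib.unified_diff(
                old_text.splitlines(keepends=True),
                new_text.splitlines(keepends=True),
                fromfile=f"old/{rel}",
                tofile=f"new/{rel}",
            )
        )
        if diff:  # skip files that are byte-for-byte unchanged
            diffs[rel] = diff
    return added, removed, diffs
```

Usage, with placeholder directory names: `added, removed, diffs = diff_tex_trees("r1_v1_src", "r1_latest_src")`, then print `added` to see the new files by size and `diffs` for what changed in the existing ones.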

Ok_Technology_5962@reddit

Current research is linear attention. Save.2 with the cache, and now they've had a massive paper come out that puts the linear into the whole model, making it possible to train more than 60 layers. So yes, this one is done. The compute and thinking will now happen internally in 1000 layers.

jeffwadsworth@reddit

I was using the online chat version last night to improve a large Java class (40K tokens) with multiple methods. It did so beautifully, with zero issues, in one shot. The same task in the Gemini 3 Pro chat interface I subscribe to failed over a few attempts due to hallucinations. They have really improved that model a lot from a year ago.

Aggressive-Bother470@reddit

New GRPO details, perhaps? Reading the HF page, it seemed the original was maybe light in that regard?

WithoutReason1729@reddit

Your post is getting popular and we just featured it on our Discord! [Come check it out!](https://discord.gg/PgFhZ8cnWW) You've also been given a special flair for your contribution. We appreciate your post! *I am a bot and this action was performed automatically.*