On the hardware side, you can run some benchmarks on the machine (or use other people's benchmarks) and get an idea of the tokens/second you can get from it. Normalize that for your usage pattern (and do your best to implement batch processing where you can, which will save you money under both approaches) and you have a basic idea of the cost per token.
Then you compare that to the cost of something like GPT-5, which is simpler because the cost per million tokens is something you can grab straight off a website.
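Back-of-envelope version of that comparison (every number below is a placeholder; substitute your own benchmark results, hardware price, and electricity rate):

    # All numbers are illustrative assumptions, not quotes.
    hw_cost_usd = 40_000      # server purchase price
    lifetime_h = 3 * 8760     # amortize over ~3 years of 24/7 operation
    power_kw = 2.0            # measured draw under load
    usd_per_kwh = 0.15
    tok_per_s = 2_500         # your batched benchmark throughput

    usd_per_hour = hw_cost_usd / lifetime_h + power_kw * usd_per_kwh
    usd_per_mtok = usd_per_hour / (tok_per_s * 3600) * 1e6
    print(f"self-hosted: ${usd_per_mtok:.2f} per million tokens")  # ~$0.20 here

The cloud side of the comparison is then just the published per-million-token price.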
You'd be surprised how much money running something like DeepSeek (or if you prefer a more established company, Qwen3) will save you over the cloud systems.
That's just one factor though. Another is what hardware you can actually run things on. DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.
Or with somebody else's.
If you don't have strict data residency requirements, and if you aren't doing this at an extremely large scale, doing it on somebody else's hardware makes much more economic sense.
If you use MoE models (all modern >70B models are MoE), GPU utilization increases with batch size. If you don't have enough requests to keep the GPUs properly fed 24/7, those GPUs will end up underutilized.
Sometimes underutilization is okay, if your system needs to be airgapped for example, but that's not an economics discussion any more.
Unlike e.g. video streaming workloads, LLMs can be hosted on the other side of the world from where the user is, and the difference is barely going to be noticeable. This means you can keep GPUs fed by bringing in workloads from other timezones when your cluster would otherwise be idle. Unless you're a large, worldwide organization, that is difficult to do if you're using your own hardware.
Isn't that true for any LLM, MoE or not? In fact, doesn't it apply to most concepts within ML? As long as it's possible to do batching at all, you can scale it up and utilize more of the GPU until you saturate some part of the process.
What's cheap nowadays? I'm out of the loop. Does anything run on the integrated AMD Ryzen AI chips that come in Framework motherboards? Is under $1k American cheap?
[1] https://youtube.com/@digitalspaceport?si=NrZL7MNu80vvAshx
When used with crush/opencode they are close to Claude performance.
Nothing that runs on a 4090 would compete, but DeepSeek on OpenRouter is still 25x cheaper than Claude.
Is it? Or only when you don’t factor in Claude cached context? I’ve consistently found it pointless to use open models because the price of the good ones is so close to cached context on Claude that I don’t need them.
Things get a lot easier at lower quantisation and higher parameter counts, and there are a lot of people whose jobs for AI are "extract sentiment from text" or "bin into one of these 5 categories", where that's probably fine.
And without specifying your quantization level it's hard to know what you mean by "not usable"
Anyway, if you really wanted to try cheap distilled/quantized models locally, you would be using used Tesla V100s, not 4-year-old single-chip gaming GPUs.
Uh, DeepSeek will not (unless you are referring to one of their older R1 finetuned variants). Any flagship DeepSeek model will require 16x A100/H100+ with NVL in FP8.
We can't judge on training cost, that's true.
You are right that we can directly observe the cost of inference for open models.
A few days ago I read an article saying Chinese utilities have a pricing structure that favors high-tech industries (say, an AI data center), making up the difference by charging the energy-intensive but less sophisticated industries (an aluminium smelter, for example) more.
Admittedly, there are some advantages when you do central and long-term economic planning.
That's how it's supposed to work.
I hope it does, though I'm doubtful because distribution is important. You can't beat "ChatGPT" as a brand in laypeople's minds (unless perhaps you give them a massive "Temu: Shop Like A Billionaire" commercial campaign).
Closed source AI is almost by design morphing into an industrial, infrastructure-heavy rocket science that commoners can't keep up with. The companies pushing it are building an industry we can't participate or share in. They're cordoning off areas of tech and staking ground for themselves. It's placing a steep fence around tech.
I hope every such closed source AI effort is met with equivalent open source and that the investments made into closed AI go to zero.
The most likely outcome is that Google, OpenAI, and Anthropic win and every other "lab"-shaped company dies an expensive death. RunwayML spent hundreds of millions and they're barely noticeable now.
These open source models hasten the deaths of the second tier also-ran companies. As much as I hope for dents in the big three, I'm doubtful.
Even when the technical people understood that, it would be too much of a political quagmire within their company when it became known to the higher ups. It just isn’t worth the political capital.
They would feel the same way about using xAI or maybe even Facebook models.
https://sg.finance.yahoo.com/news/airbnb-picks-alibabas-qwen...
The fact that it's customer service means it's dealing with text entered by customers, which has privacy and other consequences.
So no, it's not "pretty inconsequential". Many more companies fit a profile like that than whatever arbitrary criteria you might have in mind for "consequential".
2020 - I was a mid level (L5) cloud consultant at AWS with only two years of total AWS experience and that was only at a small startup before then. Yet every customer took my (what in hindsight might not have been the best) advice all of the time without questioning it as long as it met their business goals. Just because I had @amazon.com as my email address.
Late 2023 - I was the subject matter expert in a niche of a niche in AWS that the customer focused on and it was still almost impossible to get someone to listen to a consultant from a shitty third rate consulting company.
2025 - I left the shitty consulting company last year after only a year and now work for one with a much better reputation and I have a better title “staff consultant”. I also play the game and be sure to mention that I’m former “AWS ProServe” when I’m doing introductions. Now people listen to me again.
All tech companies offering free services.
I'm not saying this to insult the technical capabilities of Uber. But it doesn't have the economics that most tech companies have: high fixed costs and very low marginal costs. Uber has high marginal costs; saving a little on inference isn't going to make a difference.
Obviously, some US brands do not compete on price, but other than maybe Jeep and Tesla, those have a small market penetration.
All the clouds compete on price. Do you really think it is that differentiated? Google, Amazon and Microsoft all offer special deals to sign big companies up and globally too.
Microsoft doesn't compete on price. Their major competitive advantage is that Big Enterprise is already big into Microsoft and it's much easier to get them onto Azure. They compete on price only when it comes to making Windows workloads and SQL Server cheaper than running on other providers.
AWS is the default choice for legacy reasons, and it definitely has services and offerings that Google doesn't have. I have never once been on a sales call where the salesperson emphasizes that AWS is cheaper.
As far as GCP, they are so bad at enterprise sales, we never really looked at them as serious competition.
Sure AWS will throw credits in for migrations and professional services both internally and for third party partners. But no CFO is going to look at just the short term credits.
Despite all that and whatever you say, the fact is you do compete. It doesn't have to be a race to the bottom.
So the CloudFront free tier and the latest discount bundles etc. aren't there to compete? People have also negotiated private pricing way below list price and a lot cheaper than competitors.
Similarly, were the DynamoDB price cuts not due to competition?
I can give way more examples...
All technology gets cheaper over time. There is a difference between lowering price in response to competitors and finding the profit maximizing price based on supply and demand.
AWS was lowering prices to increase demand before GCP and Azure were a thing.
Jassy said, right before he became CEO of Amazon while he was still over AWS, that only 5% of IT spend was on any cloud provider. They are capturing non-consumption and marketing the value of AWS vs. that.
While I don’t have any insider experience about Azure, looking on the outside, I would think that Azure’s go to market is also not competing against AWS on price, but trying to get on prem customers on Azure.
And most startups are just doing prompt engineering that will never go anywhere. The big companies will just throw a couple of developers at the feature and add it to their existing business.
Before that I spent 6 years working between 3 companies in health care in a tech lead role. I'm 100% sure that any of those companies would have immediately questioned my judgment for suggesting DeepSeek, had it been a thing.
Absolutely none of them would ever have touched DeepSeek.
If you'd spent any time working at one as a SWE, you wouldn't have access to popular open source frameworks, let alone Chinese LLMs. The LLM development is mostly occurring through collaborations with the regional LLM businesses or internal labs.
https://www.ecfr.gov/current/title-17/chapter-II/part-240/su...
Note: I am neither a lawyer nor in financial circles, but I do have an interest in the effects of market design and regulation as we get into a more deeply automated space.
https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-mode...
Of course you'll always have exceptions (government, military, etc.), but in the private sector, the winner will take it all.
Any kind of hardware that is somehow connected to the wired or wireless communication interfaces is much more dangerous than any software.
Backdoors embedded in such hardware devices may be impossible to identify before being activated by the reception of some "magic" signals from outside.
Companies just need to get to the "if" part first. That, or they wash their hands by using a reseller that can use whatever it wants under the hood.
Although I did just check which AWS Bedrock regions support DeepSeek, and their GovCloud regions do not, so that's a good reason not to use it. Still, on-prem on a segmented network, following CMMC, it's probably permissible.
Chinese models generally aren't but DeepSeek specifically is at this point.
Well for non-American companies, you have the choice between Chinese models that don't send data home, and American ones that do, with both countries being more or less equally threatening.
I think if Mistral can just stay close enough to the race it will win many customers by not doing anything.
I'm not sure if technical people who don't understand this deserve the moniker technical in this context.
American companies chose to manufacture in China and got all surprised-Pikachu when China manufactured copies for themselves.
https://www.reuters.com/world/europe/us-security-agency-spie...
American intelligence has penetrated most information systems and at least as of 10 years ago, was leading all other nations in the level of sophistication and capability. Read Edward Snowden.
Rather, I'd say it speaks more to how deranged the post-Snowden/anti-neocon figures have become, going from critiquing creeping authoritarianism to functionally acting at the behest of an even more authoritarian regime. The funny thing is that this behavior of deflection, moralizing and whataboutism is exactly the kind of behavior nationalists employ, not addressing arguments head-on like the so-called "American nationalists".
Now on the HN frontpage: "Google Antigravity just wiped my hard drive"
Sure going to be hard to distinguish these Chinese models' "intentionally malicious actions"!
And the cherry on top:
- Written from my iPhone 16 Pro Max (Made in China)
Even if China did manage to embed software on the iPhone in Taiwan, it would soon hopefully be wiped since you usually end up updating the OS anyway as soon as you activate it.
You should remember that for several years all iPhones had an undetected hardware backdoor, until a couple of years ago, when independent researchers found it and reported the Apple bugs as CVEs, so Apple was forced to fix the vulnerabilities.
The hardware backdoor was that writing some magic values to some supposedly unused addresses allowed bypassing all memory protections. It likely consisted of memory test registers, which are used during manufacturing but should be disabled before shipping the phone to customers, which Apple had not done.
This hardware backdoor, coupled with some bugs in a few Apple system libraries, allowed knowledgeable attackers to remotely send an invisible message to the iPhone that took complete control of it, allowing the attacker to read any file and record from the cameras and microphones. A reboot of the iPhone removed the remote control, but then the attacker would immediately send another invisible message, regaining control.
There was no way to detect that the iPhone was remotely controlled. The backdoor was discovered only externally in the firewalls of a company, because the iPhones generated a suspiciously high amount of Internet traffic, without apparent causes.
This was widely reported at the time and discussed on HN, but some people remain unaware of how little you can trust even major companies like Apple to deliver the right hardware.
The identity of the attackers who exploited this Apple hardware backdoor has not been revealed, but it is likely that they had needed the cooperation of Apple insiders, at least for access to secret Apple documentation, if not for intentionally ensuring that the hardware backdoor remained open.
Thus the fact that Apple publishes only incomplete technical documentation has helped only the attackers, allowing them to remain undiscovered for many years, against the interests of the Apple customers. Had the specifications of the test registers been public, someone would have quickly discovered that they had remained unprotected after production.
Therefore, for many years the iPhones of certain valuable targets had magically intercepted all their communications and they have sent them to an unknown country (due to the nature of some of the identified targets and the amount of resources required to carry the attacks, it has been speculated that the country could have been Israel, but no public evidence exists; a US TLA is the main plausible alternative, as some targets were Russians).
Put another way, how do you propose to tell this subtle, nefarious Chinese sabotage you baselessly imply to be commonplace from the very real limitations of this technology in the first place?
This makes EU countries more reliable partners for cooperation than China. The same goes for the US from an European perspective, and even with everything going on over there it is still not remotely close.
All states are fundamentally adversaries because they have conflicting interests. To your point however, adversaries do indeed cooperate all the time.
Please don't engage in political battle here, including singling out a country for this kind of criticism. No matter how right you are or feel you are, it inevitably leads to geopolitical flamewar, which has happened here.
Remember when the US bugged EU leaders' phones, including Merkel's from 2002 to 2013?
Please don't be snarky or condescending in HN comments. From the guidelines: Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.
The topic itself, like any topic, is fine to discuss here, but care must be taken to discuss it in a de-escalatory way. The words you use and the way you use them matter.
Most importantly, it's not OK to write "it is however entirely reasonable to assume that the comment I replied to was made entirely in bad faith". That's a swipe and a personal attack that, as the guidelines ask, should be edited out.
Other than that let's be very clear that there was no personal attack. You left out the part where I explain why I think the comment was made in bad faith. I.e. the part that makes it not a personal attack. And a part which I, upon request, elaborated on in the same comment tree.
As you said: Words matter.
And yes I am a moderator and it's my role to prevent flamewars and to encourage everyone to raise the standard of discourse here. In my comment I was trying to convey that multiple comments of yours were crossing too far into political battle and personal attack, and here are the main instances:
> That is just objectively incorrect, and fundamentally misunderstanding the basics of statehood
This counts as a personal swipe, and as fulminating.
> It is however entirely reasonable to assume that the comment I replied to was made entirely in bad faith
People can be mistaken or wrong, or just of a different opinion/assessment, without acting “entirely in bad faith”.
> "Baselessly" - I'm sorry but realpolitik is plenty of basis. China is a geopolitical adversary of both the EU and the US. And China will be the first to admit this, btw.
This is phrased in a snarky way.
The points you've made are fine to make, but the way you make them matters. Snarkiness, swipes, put-downs, accusations of bad faith (giving your reason "why" you think it was in bad faith doesn't make it OK) are all clearly against the guidelines.
I can accept that you didn't mean to break the guidelines, which is why I've politely asked you to familiarise yourself with them and try harder to follow them in future. It's a request not a scolding. It's not necessary to announce you want to quit HN in protest. (Though of course, eventually we would rather people leave if they prefer not to follow the guidelines.) Just making an effort to respect the guidelines and the HN community would be great.
I have not made a single personal swipe in this entire comment tree. I have stated (implied) that certain views are not consistent with a cursory introduction to the topic at hand.
I absolutely assumed a basic familiarity with the concept of a state from a comment on the relationship between states. That is good faith and basic respect for the human you are conversing with as I view it.
Overall, I have kept a tone I would prefer be kept towards myself; fake politeness is just condescending.
That being said: Your site, your rules, and your power to arbitrarily interpret and enforce said rules. I.e., message received, regardless of my thoughts on your interpretation of events.
We don't want you to be fake. We just want you to make the effort to share your perspective in a way that is kind and is conducive to curious conversation, which is HN's primary objective. We know it can be hard to get this right when commenting on the internet. It's common for people to underestimate how hostile their words can come across to others, when they seem just like reasonable, matter-of-fact statements when formulated in one's own mind.
> That being said: Your site, your rules, and your power to arbitrarily interpret and enforce said rules
That's not really it. The community holds the power here; when we try to override broad community sentiment and expectations, the community pushes back forcefully.
Your comments got my attention because they were attracting flags and downvotes from the community, and from looking at these comments and earlier ones in your feed, my assessment is "yes, I can see why". (We don't let community sentiment, or "mob rule" win out all the time; we often override flags if we think they're unfair, but in your case, given the pattern we observe over time, we think the community's response is reasonable.)
If you are getting at the fact that it is sometimes beneficial for adversaries to collaborate (e.g., the prisoner dilemma) then I agree. And indeed, both Norway and Sweden would be completely lost if they declared war on the other tomorrow. But it doesn’t change the fundamental nature of the relationship.
There has never been a shred of evidence from security researchers, model analysis, benchmarks, etc. that supports this.
It's a complete delusion in every sense.
And if they did, having a counterweight against corrupt, self-centered US oligarchs/CEOs is actually one of the biggest arguments for a powerful communist (or other-model) world power. The US had some of the most progressive tax policies in its existence when it was under existential threat at the height of the USSR, and when that power started to diminish, so too did those tax policies.
> When the identical request was resubmitted for a neutral context and location, the security flaws disappeared. Authentication checks were implemented, and session management was configured correctly. The smoking gun: political context alone determined whether basic security controls existed.
Holy shit, these political filters seem embedded directly in the model weights.
I don't know if I trust China or X less in this regard.
Nobody is winning in this area until these things run, in full, on single graphics cards, which is sufficient compute for even most complex tasks.
You already have agents, that can do a lot of "thinking", which is just generating guided context, then using that context to do tasks.
You already have Vector Databases that are used as context stores with information retrieval.
Fundamentally, you can have the same exact performance on a lot of tasks whether all the information exists in the model, or you use a smaller model with a bunch of context around it for guidance.
So instead of wasting energy and time encoding the knowledge into the model, making it large, you could have an "agent-first" model along with just files of vector databases. The model can fit on a single graphics card, take the question, decide which vector DB it wants to load, and then essentially answer the question the same way. At $50 per TB of SSD, not only do you gain massive cost efficiency, but you also gain the ability to run a lot more inference cheaply, which can be used for refining things, background processing, and so on.
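A minimal sketch of that flow (embed() and generate() are stand-ins for whatever embedding model and single-GPU LLM you actually run; the file names and topics are made up):

    import json
    import numpy as np

    def embed(text):       # plug in your embedding model here
        raise NotImplementedError

    def generate(prompt):  # plug in your small "agent-first" model here
        raise NotImplementedError

    # Each "knowledge pack" is just chunk embeddings + texts sitting on cheap SSD.
    packs = {
        "biology": ("bio_vecs.npy", "bio_chunks.json"),
        "law": ("law_vecs.npy", "law_chunks.json"),
    }

    def answer(question, k=5):
        q = embed(question)
        # Cheap routing step: pick the pack whose centroid best matches the query.
        def centroid_score(files):
            vecs = np.load(files[0], mmap_mode="r")  # stays on disk until touched
            return vecs.mean(axis=0) @ q
        topic = max(packs, key=lambda t: centroid_score(packs[t]))
        vecs = np.load(packs[topic][0])
        chunks = json.load(open(packs[topic][1]))
        top = np.argsort(vecs @ q)[-k:]              # top-k most similar chunks
        ctx = "\n\n".join(chunks[i] for i in top)
        return generate(f"Context:\n{ctx}\n\nQuestion: {question}")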
In any case, models are useful, even when they don't hit these efficiency targets you are projecting. Just like cars are useful, even when they are bigger than a pack of cards.
It's also not a matter of it working or not. It already works. Take a small model that fits on a GPU, with a large context window, like Gemma 27B or smaller, give it a whole bunch of context on the topic, and ask it questions; it will generate very accurate results based on the context.
So instead of encoding everything into the model itself, you can just take training data, store it in vector DBs, and train a model to retrieve that data based on query, and then the rest of it is just training context extraction.
Oh, be more creative. One simple way to make money off your idea is:
(1) Get a hedge fund to finance your R&D.
(2) Hedge fund shorts AI cloud providers and other relevant companies.
(3) Your R&D pans out and the AI cloud providers' stock tanks.
(4) The hedge fund makes a profit.
Though I don't understand: wouldn't your idea work when served from the cloud, too? If what you are saying is true, you'd provide a better service at lower cost.
However, the issue with "funding" isn't as simple as that statement above. Remember, modern funding is not about value, it's about hype. There is a reason CEOs like Jensen say that if they could go back in time, they would never start their companies knowing the bullshit they'd have to walk through.
I've also had my fair share of experiences trying to get startups off the ground. For example, back around 2018, I was working on a system that would take your existing AWS cloud setup and move it all to EC2 instances with self-hosted services, which saved people money in the long run. I had a proof of concept working and everything. The issue I ran into when trying to get funding to build this out into a full-blown product/service, which I didn't realize at the time, is that being on AWS services was, for companies, the equivalent of wearing an expensive business suit to a sales meeting: a fact they would advertise because it was seen as industry standard and created "warm feelings" with their customers. So at most I would get some small-time customers, while getting paid much less.
Now I just work on stuff (and yes, I am working on the issue at hand with existing models) and publish it to GitHub (not gonna share it because I don't want my HN account associated with it). If someone contacts me with a dollar figure, I'm all game.
Nothing shows lack of understanding of the subject matter more than referencing the Dunning Kruger effect in a conversation.
Of course, the smaller models aren't as good at complex reasoning as the bigger ones, but that seems like an inherently-impossible goal; there will always be more powerful programs that can only run in datacenters (as long as our techniques are constrained by compute, I guess).
FWIW, the small models of today are a lot better than anything I thought I'd live to see as of 5 years ago! Gemma3n (which is built to run on phones[2]!) handily beats ChatGPT 3.5 from January 2023 -- rank ~128 vs. rank ~194 on LLMArena[3].
[1] https://blogs.novita.ai/what-are-the-requirements-for-deepse...
[2] https://huggingface.co/google/gemma-3n-E4B-it
[3] https://lmarena.ai/leaderboard/text/overall
No. They released a distilled version of R1 based on a Qwen 32b model. This is not V3, and it's not remotely close to R1 or V3.2.
We're around 35-40 orders of magnitude from computers now to computronium.
We'll need 10-15 years before handheld devices can run a couple terabytes of RAM, 64-128 terabytes of storage, and 80+ TFLOPS. That's enough to run any current state-of-the-art AI at around 50 tokens per second. But in 10 years we're probably going to have seen lots of improvements, so I'd guess conservatively you're going to see 4-5x performance per parameter, possibly much more, so at that point you'll have the equivalent of a model with 10T parameters today.
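Rough math behind the ~50 tokens/sec figure, assuming decode speed is memory-bandwidth bound and an MoE with about 40B active parameters at one byte each (all assumptions):

    active_params = 40e9     # active params per token (assumed MoE)
    bytes_per_param = 1      # FP8/INT8 weights
    bandwidth = 2e12         # 2 TB/s device memory bandwidth (assumed)

    # Each generated token must stream the active weights through memory once.
    tok_per_s = bandwidth / (active_params * bytes_per_param)
    print(tok_per_s)         # -> 50.0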
If we just keep scaling and there are no breakthroughs, Moore's law gets us through another century of incredible progress. My default assumption is that there are going to be lots of breakthroughs, and that they're coming faster, and eventually we'll reach a saturation of research and implementation; more, better ideas will be coming out than we can possibly implement over time, so our information processing will have to scale, and it'll create automation and AI development pressures, and things will be unfathomably weird and exotic for individuals with meat brains.
Even so, in only 10 years and steady progress we're going to have fantastical devices at hand. Imagine the enthusiast desktop - could locally host the equivalent of a 100T parameter AI, or run personal training of AI that currently costs frontier labs hundreds of millions in infrastructure and payroll and expertise.
Even without AGI that's a pretty incredible idea. If we do get to AGI (2029 according to Kurzweil) and it's open, then we're going to see truly magical, fantastical things.
What if you had the equivalent of a frontier lab in your pocket? What's that do to the economy?
NVIDIA will be churning out chips like crazy, and we'll start seeing the solar system measured in terms of average cognitive FLOPS per gram, and be well on the way toward system scale computronium matrioshka brains and the like.
Intel struggled for a decade, and folks think that means Moore's law died. But TSMC and Samsung just kept iterating. And hopefully Intel's 18A process will see them back in the game.
I suspect many people conflated Dennard scaling with Moore's law and the demise of Dennard scaling is what contributes to the popular imagination that Moore's law is dead: frequencies of processors have essentially stagnated.
Chiplets and advanced packaging are the latest techniques improving scaling and yield keeping Moore alive. As well as continued innovation in transistor design, light sources, computational inverse lithography, and wafer scale designs like Cerebras.
Of course, feature size (and thus chip size) and cost are intimately related (wafers are a relatively fixed cost). And related as well to production quantity and yield (equipment and labor costs divide across all chips produced). That the whole thing continues scaling is non-obvious, a real insight, and tantamount to a modern miracle. Thanks to the hard work and effort of many talented people.
Wikipedia quotes it as:
> The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years.
But I'm fairly sure, if you graph how many transistors you can buy per inflation adjusted dollar, you get a very similar graph.
https://imgur.com/a/UOUGYzZ - had chatgpt whip up an updated chart.
LoAR shows remarkably steady improvement. It's not about space or power efficiency, just ops per $1000, so transistor counts served as a very good proxy for a long time.
There's been sufficiently predictable progress that 80-100 TFLOPS in your pocket by 2035 is probably a solid bet, especially if a fully generative AI OS and platform catches on as a product. The LoAR frontier for compute in 2035 is going to be more advanced than the limits of prosumer/flagship handheld products like phones, so there's a bit of lag and variability.
Not sure about the stated FLOPS, but I suspect we'll find that AI doesn't need that much compute to begin with.
Well, these days people have the equivalent of a frontier lab from perhaps 40 years ago in their pocket. We can see what that has done to the economy, and try to extrapolate.
The current models are simply inefficient for their capability in how they handle data.
if you base your life on Kurzweil's hard predictions you're going to have a bad time
Infrastructure owners with access to the cheapest energy will be the long run winners in AI.
According to Google (or someone at Google), no organization has a moat on AI/LLMs [1]. But that does not mean it isn't hugely profitable to provide it as SaaS even if you don't own the model, or as Model as a Service (MaaS). The extreme example is Amazon providing MongoDB API and services. Sure, they have their own proprietary DynamoDB, but for most people a scaled-up MongoDB more than suffices. Regardless of the brand or type of database being used, you pay tons of money to Amazon anyway to be at scale.
Not everyone has the resources to host a SOTA AI model. On top of tangible data-intensive resources, there are other intangible considerations. Just think how many companies or people host their own email server now, even though the resources needed are far less than hosting an AI/LLM model.
Google came up with the game-changing transformer in its own backyard, and OpenAI temporarily stole the show with the well-executed RLHF-based system of ChatGPT. Now the paid users are swinging back to Google with its arguably superior offering. Google now even puts an AI summary at the top of its search results, free to all, above its paid advertisement clients.
[1]Google “We have no moat, and neither does OpenAI”:
Improvements seem incremental and smaller. For all I care, I could still happily use sonnet 3.5.
Quality was rarely the reason open source lagged in certain domains. Most of the time, open source solutions were technically superior. What actually hurt open source were structural forces, distribution advantages, and enterprise biases.
One could make an argument that open source solutions often lacked good UX historically, although that has changed drastically the past 20 years.
Yes, Ardour. It’s no more a toy than KiCad or Blender.
Why would a company use an expensive proprietary model on Vertex AI, for example, when they could use an open-source one on Vertex AI that is just as reliable for a fraction of the cost?
I think you are getting at the idea of branding, but branding is different from security or reliability.
And then we have to look at responsiveness: if the two models are qualitatively in the same ballpark, which one runs faster?
Which is an interesting point in favour of the human employee, as you can only consolidate scapegoats so far up the chain before saying "it was the AI's fault" just looks like negligence.
Better (UX / ease of use)
Lock in (walled garden type thing)
Trust (If an AI is gonna have the level of insight into your personal data and control over your life, a lot of people will prefer to use a household name)
Not Google, and not Amazon. Microsoft is a maybe.
All they have to do is completely switch the google homepage to gemini one day.
In the developed world. I'm not sure about globally.
ChatGPT feels better to use, has the best implementation of memory, and is the best at learning your preferences for the style and detail of answers.
So a couple of things. There are going to be a handful of companies in the world with the infrastructure footprint and engineering org capable of running LLMs efficiently and at scale. You are never going to be able to run open models in your own infra in a way that is cost competitive with using their API.
Competition _between_ the largest AI companies _will_ drive API prices to essentially 0 profit margin, but none of those companies will care, because they aren't primarily going to make money by selling the LLM API; your usage of their API just subsidizes their infrastructure costs, and they'll use that infra to build products like ChatGPT and Claude. Those products are their moat and will be where 90% of their profit comes from.
I am not sure why everyone is so obsessed with "moats" anyway. Why does gmail have so many users? Anybody can build an email app. For the same reason that people stick with gmail, people are going to stick with chatgpt. It's being integrated into every aspect of their lives. The switching costs for people are going to be immense.
Google would love a cheap, high-quality model on its surfaces. That just helps Google.
All the facts say otherwise to your thoughts here.
Given how often new models come out, it’s also easier to update an API call than constantly deploying model upgrades.
But in the long run, I hope open source wins out.
They won't. Actually, even if open models aren't competitive, they still won't. Hasn't this been clear for a while already?
There's no moat in models; investment in pure models has only been to chase AGI. All other investment (the majority, from Google, Amazon, etc.) has been in products using LLMs, not the models themselves.
This is not like the gold rush where the ones who made good money were the ones selling shovels, it's another kind of gold rush where you make money selling shovels but the gold itself is actually worthless.
For a sufficiently low cost to orbit that may well be found in space, giving Musk a rather large lead. By his posts he's currently obsessed with building AI satellite factories on the moon, the better to climb the Kardashev scale.
Earth based computers benefit from the existence of an atmosphere to pull cold air in from and send hot air out to.
A space data center would need to entirely rely on city sized heat sink fins.
And aluminum is abundant in the lunar crust.
If launch costs are cheap enough, you can bring aluminum up from earth.
But once your in-space economy is developed enough, you might want to tap the moon or asteroids for resources.
That was fun to write but also I am generally on board with humanity pushing robotics further into space.
I don't think an orbital AI datacentre makes much sense as your chips will be obsolete so quickly that the capex getting it all up there will be better spent on buying the next chips to deploy on earth.
Radiative cooling is really annoying, but it's also an engineering problem with a straightforward solution, if mass-in-orbit becomes cheap enough.
The main reason I see for having datacentres in orbit would be if power in orbit becomes a lot cheaper than power on earth. Cheap enough to make up for the more expensive cooling and cheap enough to make up for the launch costs.
Otherwise, manufacturing in orbit might make sense for certain products. I heard there's some optical fibres with superior properties that you can only make in near zero g.
I don't see a sane way to beam power from space to earth directly.
It ignores the reality of humans having memetic emotions, habits, affinities, differentiated use cases & social signaling needs, and the desire to always want to do more...constantly adding more layers of abstraction in fractal ways that evolve into bigger or more niche things.
5 years ago humans didn't know a desire for gaming GPUs would turn into AI. Now it's the fastest growing market.
Ask yourself: how did Google Search continue to make money after Bing's search results started benchmarking just as good?
Or: how did Apple continue to make money after Android opened up the market to commoditize mobile computing?
Etc. Etc.
But one of the core ideas of Marx's conception of history is that human needs, wants, and human nature itself are constantly in a state of change; that those needs and desires are in large part a product of the environment in which you live; and, further, that humans and human society in turn change their own environments, which in turn change human nature itself.
Interesting, though, that per the very same article someone like Adam Smith concurred empirically with Marx's observation on the titular tendency of rates of profit to fall. This suggests to me it likely had some meat to it.
Basically dude thought the competition inherent in capitalism would cause all profit to be competed to zero leading to an eventual 'crisis' and collapse of the capitalist means of production.
Implicit in this assumption is the idea that the things humans need and want changes/evolves in a predictable way, and not in a chaotic/fractal/reflexive way (which is what actually happens).
An eventual static basket of desired goods would be the only mechanism by which competition ever could compete profits to zero. If the basket is dynamic/reflexive/evolving, there's constantly new gaps opening between human desires and market offerings to arbitrage for profit. You can just look at the average profit margins of S&P500 companies over time to see they are not falling.
The further we get from subsistence worries (Adam Smith's invisible hand has pulled virtually the entire globe out of living in the dirt), the more divergent and higher abstraction these wants and needs become, and hence the profit opportunities are only increasing -- which is how the economy grows (no, it's not a fixed pie, another Marxian fallacy).
hopefully they won't
and their titanic off-balance sheet investments will bankrupt them as they won't be able to produce any revenue
The best innovation always happens before being tainted by investment.
However in another way the US probably offers more free inference than China. What good is an open 600 billion parameter model to a poor person? A free account with ChatGPT might be more useful to them, though also more exploitative.
Greed and "safety" hysteria.
At first, I thought they had gone the route of implementing yet another chat format that can handle more dynamic conversations like that, instead of just using Harmony, but looking at the syntax, doesn't it look exactly like Harmony? That's a good thing, don't get me wrong, but why not mention straight up that they've implemented Harmony, so people can already understand up front that it's compatible with whatever parsing we're using for GPT-OSS?
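For reference, a Harmony-formatted exchange renders roughly like this (reconstructed from memory; check the openai/harmony spec for the exact special tokens and the <|return|>/<|end|> distinction):

    <|start|>user<|message|>What is 2+2?<|end|>
    <|start|>assistant<|channel|>analysis<|message|>Simple arithmetic.<|end|>
    <|start|>assistant<|channel|>final<|message|>4<|end|>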
Frontier models are far exceeding even the most hardcore consumer hobbyist requirements. This is even further
IIRC the 512GB Mac Studio is about $10k.
(commentary: things are really moving too fast for the layperson to keep up)
Typically, input gets routed to a small number of experts, e.g. the top 2, leaving the others inactive. This reduces the activation/processing requirements.
Mixtral, from Mistral, is an example of a model designed like this. Clever people have created converters to transform dense models into MoE models. These days many popular models are also available in MoE configurations.
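A toy sketch of that routing step (numpy; in real models the gate weights are learned and this happens per layer, per token):

    import numpy as np

    def moe_forward(x, gate_w, experts, k=2):
        """x: (d,) token activation; gate_w: (d, n_experts); experts: list of fns."""
        logits = x @ gate_w
        top = np.argsort(logits)[-k:]            # pick the k highest-scoring experts
        w = np.exp(logits[top] - logits[top].max())
        w /= w.sum()                             # softmax over just the chosen experts
        # Only these k experts execute; the rest stay idle for this token.
        return sum(wi * experts[i](x) for wi, i in zip(w, top))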
I'm not saying it is a perfect analogy, but it is by far the most familiar one for people to describe what sparse activation means. I'm no big fan of over-reliance on biological metaphor in this field, but I think this is skewing a bit on the pedantic side.
re: your second comment about pruning, not to get in the weeds but I think there have been a few unique cases where people did lose some of their brain and the brain essentially routed around it.
MoE is not the holy grail; there are drawbacks, e.g. less consistency and expert under-/over-use.
https://www.youtube.com/watch?v=zwHqO1mnMsA
I wonder how well the aftermarket memory surgery business on consumer GPUs is doing.
I want one. Hot air blows.
This will absolutely scar, if not char, your cornea faster than you can blink.
There is nothing special about "lasing power." It amounts to a 45-watt light bulb, nothing more and nothing less.
Of course the laser is tightly focused. That's pretty much one of the defining properties of laser devices. How else do you think the laser is heating the microprocessors in the video?
They shouldn't be focusing it to a point under any conditions. Whether it's as safe as it could be is a different question, of course. For instance, you'd like to think that the act of configuring it for a smaller beam footprint would reduce the power at the same time, as opposed to requiring a separate adjustment that might be overlooked by the operator. Would have been nice if the video had addressed that and other safety considerations, for sure.
A lot depends on the exact wavelength. 1400 nm and longer is much less worrisome than near-visible IR.
The laser is collimated but not focused so by your logic it will be fine.
This is advice on par with eating tide pods.
About all we can agree on, I think, is that neither of us knows enough about the product to argue about it usefully.
Unlike you I do know what I'm talking about.
I feel like because you didn't actually talk about prompt processing speed or token/s, you aren't really giving the whole picture here. What is the prompt processing tok/s and the generation tok/s actually like?
Maybe my 6000 Pro spoiled me, but for actual usage, 6 or even 9 tok/sec is too slow for a reasoning/thinking model. To be honest, kind of expected on CPU though. I guess it's cool that it can run on Apple hardware, but it isn't exactly a pleasant experience at least today.
But again, not if you're using thinking/reasoning, which if you want to use this specific model properly, you are. Then you have a huge delay before the actual response comes through.
> MacStudio is the simplest solution to run it locally.
Obviously, that's Apple's core value proposition after all :) One does not acquire a state-of-the-art GPU and then expect simple stuff, especially when it's a fairly uncommon and new one. You cannot really be afraid of diving into CUDA code and similar fun rabbit holes. Simply two very different audiences for the two alternatives, and the Apple way is the simpler one, no doubt about it.
There is consumer-ish hardware that can run large models like DeepSeek 3.x, slowly. If you're using LLMs for a specific purpose that is well served by a particular model, you don't want to risk AI companies deprecating it in a couple of months and pushing you to a newer model (that may or may not work better in your situation).
And even if the AI service providers nominally use the same model, you might have cases where reproducibility requires you use the same inference software or even hardware to maintain high reproducibility of the results.
If you're just using OpenAI or Anthropic you just don't get that level of control.
Which ones? I wanted to try a large base model for automated literature (fine-tuned models are a lot worse at it) but I couldn't find a provider which makes this easy.
https://docs.cloud.google.com/vertex-ai/generative-ai/docs/m...
Lambda.ai used to offer per-token pricing but they have moved up market. You can still rent a B200 instance for sub $5/hr which is reasonable for experimenting with models.
https://app.hyperbolic.ai/models Hyperbolic offers both GPU hosting and token pricing for popular OSS models. It's easy with token-based options because they're usually a drop-in replacement for OpenAI API endpoints.
You have to rent a GPU instance if you want to run the latest or custom stuff, but if you just want to play around for a few hours it's not unreasonable.
> https://docs.cloud.google.com/vertex-ai/generative-ai/docs/m...
I don't see any large base models there. A base model is a pretrained foundation model without fine tuning. It just predicts text.
> Lambda.ai used to offer per-token pricing but they have moved up market. You can still rent a B200 instance for sub $5/hr which is reasonable for experimenting with models.
A B200 is probably not enough: it has just 192 GB RAM while DeepSeek-V3.2-Exp-Base, the base model for DeepSeek-V3.2, has 685 billion BF16 parameters. Though I assume they have larger options. The problem is that all the configuration work is then left to the user, which I'm not experienced in.
> https://app.hyperbolic.ai/models Hyperbolic offers both GPU hosting and token pricing for popular OSS models
Thanks. They do indeed have a single base model: Llama 3.1 405B BASE. This one is a bit older (July 2024) and probably not as good as the base model for the new DeepSeek release. But that might be the best one can do, as there don't seem to be any inference providers which have deployed a DeepSeek or even Kimi base model.
I feel like private cloud instances that run on demand is still in the spirit of consumer hobbyist. It's not as good as having it all local, but the bootstrapping cost plus electricity to run seems prohibitive.
I'm really interested to see if there's a space for consumer TPUs that satisfy usecases like this.
https://openrouter.ai/deepseek/deepseek-v3.2
This only bolsters your point. Will be interesting to see if this changes as the model is adopted more widely.
"create me a svg of a pelican riding on a bicycle"
I’m surprised that a fixed size k doesn’t experience degrading performance in long context windows though. That’s a _lot_ of responsibility to push into that indexing function. How could such a simple model achieve high enough precision and recall in a fixed size k for long context windows?
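Heavily simplified sketch of how I read the mechanism (the indexer is a learned, cheap scorer; everything here is illustrative):

    import numpy as np

    def sparse_attention(q, K, V, iq, IK, k=2048):
        """q: (d,) query; K, V: (seq, d); iq, IK: low-dim indexer projections."""
        keep = np.argsort(IK @ iq)[-k:]          # indexer picks a fixed-size top-k set
        s = K[keep] @ q / np.sqrt(q.shape[-1])   # exact attention, but only over k keys
        s = np.exp(s - s.max())
        return (s / s.sum()) @ V[keep]

If the indexer's recall drops over long contexts, whatever it fails to surface is simply invisible to the attention step, which is exactly the worry above.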
I am still happy to pay Google because of their ecosystem or Gemini app, NotebookLM, Colab, gemini-cli, etc. Google’s moat for me is all the tooling and engineering around the models.
That said, my one-year Google AI subscription ends in four months and I might try an alternative, or at least evaluate options. Alibaba Cloud looks like an interesting low-cost alternative to AWS for building systems. I am a retired 'gentleman scientist' now and my personal research is inexpensive no matter who I pay for inference compute, but it is fun to spend a small amount of time evaluating alternatives, even though mostly using Google is time efficient.
It refuses to tell me when asked, only that it's been trained with data up until July 2024, which would make it quite old. I turned off search and asked it for the winner of the 2024 US election, and it said it didn't know, so I guess that confirms it's not a recent model.
The pdf describes how they did "continued pre-training" and then post training to make 3.2. I guess what's missing is the full pre-training that absorbs most date sensitive knowledge. That's probably also the reason that the versions are 3.x still.
I don't know, but GPT seems to have regressed a lot, at least the free version.
1. Chinese models typically focus on text. US and EU models also bear the cross of handling images, and often voice and video. Supporting all of those is additional training cost not spent on further reasoning, tying one hand behind your back to be more generally useful.
2. The gap seems small, because so many benchmarks get saturated so fast. But towards the top, every 1% increase in benchmarks is significantly better.
On the second point, I worked on a leaderboard that both normalizes scores, and predicts unknown scores to help improve comparisons between models on various criteria: https://metabench.organisons.com/
You can notice that, while Chinese models are quite good, the gap to the top is still significant.
However, the US models are typically much more expensive for inference, and Chinese models do have a niche on the Pareto frontier on cheaper but serviceable models (even though US models also eat up the frontier there).
But even assuming that we're on the right path for that (which we may not be) and assuming that nothing intervenes to stop it (which it might), there may be only one winner, and that winner may not have even entered the game yet.
This is what people like Altman want investors to believe. It seems like any other snake oil scam because it doesn't match reality of what he delivers.
Nano Banana Pro starts at 15 cents per image at <2K resolution, and is not strictly better than Seedream 4.0; yet the latter does 4K for 3 cents per image.
Add in the power of fine-tuning on their open weight models and I don't know if China actually needs to catch up.
I finetuned Qwen Image on 200 generations from Seedream 4.0 that were cleaned up with Nano Banana Pro, and got results that were as good and more reliable than either model could achieve otherwise.
It is very good for its size and speed, and I'm excited for the Edit and Base variants... but Reddit has been a bit "over-excited" because it runs on their small GPUs and isn't overly resistant to porn.
Not true at all. Qwen has a VLM (qwen2 vl instruct) which is the backbone of Bytedance’s TARS computer use model. Both Alibaba (Qwen) and Bytedance are Chinese.
Also DeepSeek got a ton of attention with their OCR paper a month ago which was an explicit example of using images rather than text.
Most of the AI-generated videos we see on social media now are made with Chinese models.
The scales are a bit murky here, but if we look at the 'Coding' metric, we see that Kimi K2 outperforms Sonnet 4.5 - that's considered to be the price-perf darling I think even today?
I haven't tried these models, but in general there have been lots of cases where a model performs much worse IRL than the benchmarks would suggest (certain Chinese models and GPT-OSS have been guilty of this in the past).
• For both Kimi K2 and for Sonnet, there's a non-thinking and a thinking version. Sonnet 4.5 Thinking is better than Kimi K2 non-thinking, but the K2 Thinking model came out recently, and beats it on all comparable pure-coding benchmarks I know: OJ-Bench (Sonnet: 30.4% < K2: 48.7%), LiveCodeBench (Sonnet: 64% < K2: 83%), they tie at SciCode at 44.8%. It is a finding shared by ArtificialAnalysis: https://artificialanalysis.ai/models/capabilities/coding
• The reason developers love Sonnet 4.5 for coding, though, is not just the quality of the code. They use Cursor, Claude Code, or some other system such as Github Copilot, which are increasingly agentic. On the Agentic Coding criteria, Sonnet 4.5 Thinking is much higher.
By the way, you can look at the Table tab to see all known and predicted results on benchmarks.
The US labs aren't just selling models, they're selling globally distributed, low-latency infrastructure at massive scale. That's what justifies the valuation gap.
Edit: It looks like Cerebras is offering a very fast GLM 4.6
Opus 4.5 = ~60-80tps https://openrouter.ai/anthropic/claude-opus-4.5
Kimi-k2-think = ~60-180tps https://openrouter.ai/moonshotai/kimi-k2-thinking
Deepseek-v3.2 = ~30-110tps (only 2 providers rn) https://openrouter.ai/deepseek/deepseek-v3.2
It'll probably be a few years before all that stuff becomes as smooth as people need, but OAI and Anthropic are already doing a good job on that front.
Each new Chinese model requires a lot of testing and bespoke conformance to every task you want to use it for. There's a lot of activity and shared prompt engineering, and some really competent people doing things out in the open, but it's generally going to take a lot more expert work getting the new Chinese models up to snuff than working with the big US labs. Their product and testing teams do a lot of valuable work.
I think GLM 4.6 offered by Cerebras is much faster than any US model.
Exactly what I'm thinking. Chinese models are catching up rapidly. Soon they'll be on par with the big dogs.
And the people making the bets are in a position to make sure the banning happens. The US government system being what it is.
Not that our leaders need any incentive to ban Chinese tech in this space. Just pointing out that it's not necessarily a "bet".
"Bet" imply you don't know the outcome and you have no influence over the outcome. Even "investment" implies you don't know the outcome. I'm not sure that's the case with these people?
With caching, the expensive US models end up being only about 2x the price (e.g. Sonnet) and often much cheaper (e.g. GPT-5 mini).
If they start caching, then US companies will be completely outpriced.
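Illustrative math for an agentic coding turn where most of the input is a repeated, cacheable prefix (all prices are assumptions; check current list prices):

    # Assumed list prices, USD per million tokens.
    sonnet_in, sonnet_cached, sonnet_out = 3.00, 0.30, 15.00
    ds_in, ds_out = 0.28, 0.42               # no cache discount assumed

    tokens_in, tokens_out, cache_rate = 100_000, 2_000, 0.95

    sonnet = (tokens_in * ((1 - cache_rate) * sonnet_in + cache_rate * sonnet_cached)
              + tokens_out * sonnet_out) / 1e6
    deepseek = (tokens_in * ds_in + tokens_out * ds_out) / 1e6
    print(f"{sonnet:.3f} vs {deepseek:.3f}")  # ~0.074 vs ~0.029: about 2.5x, not 25x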
The nature of the race may yet change, though, and I am unsure whether the devil is in the details, as in very specific edge cases that will work only with frontier models.
It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; their Bolshevism dooms them; we have the will to power; we will succeed. Even now, when you ask questions like what you ask of that era, the answers you get are genuinely not better than "yes, this should have been obvious at the time if you were not completely blinded by ethnic and especially ideological prejudice."
It might be this model is super good, I haven’t tried it, but to say the Chinese models are better is just not true.
What I really love though is that I can run them (open models) on my own machine. The other day I categorised images locally using Qwen, what a time to be alive.
Further even than local hardware, open models make it possible to run on providers of choice, such as European ones. Which is great!
So I love everything about the competitive nature of this.
For instance, a lot of people thought they were running "DeepSeek" when they were really running some random distillation on ollama.
I ran the 1.58-bit Unsloth quant locally at the time it came out, and even at such low precision, it was super rare for it to get something wrong that o1 and GPT4 got right. I have never actually used a hosted version of the full DS.
Germany was right in some ways and wrong in others about the Soviet Union's strength. The USSR failed to conquer Finland because of the military purges. German intelligence vastly underestimated the number of tanks and the general preparedness of the Soviet army (Hitler was shocked the Soviets already had 40k tanks). The Lend-Lease act sent an astronomical amount of goods to the USSR, which allowed them to fully commit to the war and really focus on increasing their weapons production; the numbers on the tractors, food, trains, ammunition, etc. that the US sent to the USSR are staggering.
This is kinda reflected in the stock market, where the US AI stocks are surging to new heights every day, yet their Chinese equivalents are relatively lagging behind in stock price, which suggests that investors are betting heavily on the US companies to "win" this "AI race" (if there are any gains to be made by winning).
Also, in the past couple of years (or maybe couple of decades), there has been a lot of crap talk about how China has to democratize and free up its markets in order to be competitive with the other first-world countries, together with a bunch of "doomsday" predictions for authoritarianism in China. This narrative has completely lost any credibility, but the sentiment dies slowly...
Stalin had just finished purging his entire officer corps, which is not a good omen for war, and the USSR had failed miserably against the Finns, who were not the strongest of nations, while Germany had just steamrolled France, a country that was much more impressive in WW1 than the Russians (who collapsed against Germany).
Ideology played a role, but the data they worked with was the Finnish war, which was disastrous for the Soviet side. Hitler later famously said it was all an intentional distraction to make them believe the Soviet army was worth nothing. (The real reasons were more complex, like the previous purges.)
Though, because Stalin had decimated the Red Army leadership (including most of the veteran officers who had Russian Civil War experience) during the Moscow-trials purges, the Germans almost succeeded.
There were many counter revolutionaries among the leadership, even those conducting the purges. Stalin was like "ah fuck we're hella compromised." Many revolutions fail in this step and often end up facing a CIA backed coup. The USSR was under constant siege and attempted infiltration since inception.
Well, Stalin was, by far, the biggest counter-revolutionary in the Politburo.
> Stalin was like "ah fuck we're hella compromised."
There's no evidence that anything significant was compromised at that point, and clear evidence that Stalin was in fact clinically paranoid.
> Many revolutions fail in this step and often end up facing a CIA backed coup. The USSR was under constant siege and attempted infiltration since inception.
Can we please not recycle 90-year-old Soviet propaganda? The Moscow trials being irrational self-harm was acknowledged by the USSR leadership as early as the fifties…
America is mostly suburbs and car sewers but that's because the voters like it that way.
DeepSeek is a valuable product, but its open-source nature makes it difficult to displace larger competitors. Any advancements can be quickly adopted, and in fact, it may inadvertently strengthen these companies by highlighting weaknesses in their current strategies.
% curl https://api.deepseek.com/models \
    -H "Authorization: Bearer ${DEEPSEEK_API_KEY}"
{"object":"list","data":[{"id":"deepseek-chat","object":"model","owned_by":"deepseek"},{"id":"deepseek-reasoner","object":"model","owned_by":"deepseek"}]}

(unless/until a new version gets worse for your use case)
I wonder how long it will take and how quickly it will try to get shut down.
Classic examples like:
User: I'm feeling bad
LLM: Have you considered k*****g yourself?
Are a good example of what an LLM "without censorship" looks like: good at predicting the most common sequence of text (e.g. the most common sarcastic reply from Reddit), but effectively useless.

In order to build a useful LLM (i.e. one that actually follows instructions) you need to teach the LLM to prefer the most helpful answer, and that process by itself is already an implicit layer of "censorship": it requires human supervision, and different humans have different perceptions of what the most helpful answer is, especially when their paycheck is conditioned on a list of "corporate values".

You can only pick between a parrot that repeats random text from the Internet, or a parrot lobotomized to follow the orders of its trainers (which occasionally repeats random text from the Internet, because the training isn't perfect).
Unsurprisingly, the lobotomized parrot is more useful to get actual work done, even if it won't tell you what the CIA[1] did to Mexican Students on October 2nd, 1968.
[1]: https://www.bbc.com/mundo/noticias-america-latina-45662739
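For the curious: "teach the LLM to prefer the most helpful answer" usually boils down to training a reward model on human-ranked answer pairs. A minimal PyTorch sketch of the standard pairwise (Bradley-Terry) loss, with made-up numbers standing in for real reward-model outputs:

    import torch
    import torch.nn.functional as F

    def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
        # Push the reward model to score the human-preferred answer
        # above the rejected one -- this is exactly where the trainers'
        # notion of "helpful" (the implicit censorship) enters.
        return -F.logsigmoid(r_chosen - r_rejected).mean()

    # Hypothetical reward scores for a batch of three answer pairs:
    chosen = torch.tensor([1.2, 0.7, 2.0])
    rejected = torch.tensor([0.3, 0.9, -0.5])
    print(preference_loss(chosen, rejected))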
For a Mixture of Experts (MoE) model you only need enough memory for a given expert. There will be some swapping as it figures out which expert to use, or when it changes experts, but once that expert is loaded it won't be swapping memory to perform the calculations.
You'll also need space for the context window; I'm not sure how to calculate that either.
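The context-window part is mostly the KV cache, and that one is easy to estimate: keys and values are stored per layer, per KV head, per token. A rough Python sketch with hypothetical model dimensions (not any specific DeepSeek release, which further complicates things with MLA cache compression):

    def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                       seq_len: int, bytes_per_elt: int = 2) -> int:
        # 2x for keys and values; fp16/bf16 assumed (2 bytes/element)
        return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elt

    # Hypothetical dense-attention model shape:
    gb = kv_cache_bytes(n_layers=60, n_kv_heads=8, head_dim=128,
                        seq_len=128_000) / 1e9
    print(f"KV cache at 128K context: ~{gb:.0f} GB")  # ~31 GB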
I don't want to make big generalizations, but one thing I noticed with Chinese models, especially Kimi, is that they do very well on benchmarks but fail on vibe testing. It feels a little over-fitted to the benchmarks and less to the actual use cases.
I hope it's not the same here.
If it had vision and was better on long context I'd use it so much more.
I guess that’s kinda how it is for any system that’s trained to do well on benchmarks: it does well on them but is rubbish at everything else.
Whereas the benchmark gains seen in new OpenAI, Grok, and Claude models don't feel accompanied by a vibe improvement.
Yet we (developers, users, human civilization) benefit from that.
So yes, I cherish when Chinese companies release open source LLMs. Be it as it fits their business model (the same way as US companies) or from grants (the same way as a lot of EU-backed projects, e.g. Python, DuckDB, scikit-learn).
It's a very transparent, consistent strategy.
AI is a little different because it has geopolitical implications.
This is using open source in a bit of a different spirit than the hacker ethos, and I am not sure how I feel about it.
It is a kind of cheat on the fair market but at the same time it is also costly to China and its capital costs may become unsustainable before the last players fold.
Can you really view this as a cheat when the US is throwing a trillion dollars in support of a supposedly "fair market"?
The way I see this, some tech teams in China have figured out that training and tuning LLMs is not that expensive after all and they can do it at a fraction of the cost. So they are doing it to enter a market previously dominated by US only players.
It's a bit early to have any sort of feelings about it, isn't it? You're speaking in absolutes, but none of this is necessarily 100% true, as we don't know their intentions. And judging a group of individuals' intentions based on what their country seems to want, from the lens of a foreign country, usually doesn't land you on the right interpretation.
We should all be happy about the price of AI coming down.
Seriously though, our leaders are actively throwing everything and the kitchen sink into AI companies - in some vain attempt to become immortal or to own even more of the nation's wealth beyond what they already do, chasing some kind of neo-tech feudalism. Both are unachievable because they rely on a complex system that they clearly don't understand.
What I appreciate about the Chinese efforts is that they are being forced to get more intelligence from less hardware, and they are not only releasing their work products but documenting the R&D behind them at least as well as our own closed-source companies do.
A good reason to stir up dumping accusations and anti-China bias would be if they stopped publishing not just the open-source models, but the technical papers that go with them. Until that happens, I think it's better to prefer more charitable explanations for their posture.
I am very curious about your definition and usage of 'fair' there, and whether you would call the LLM sector as it stands now (but hypothetically absent DeepSeek, say) a 'fair market'. (If not, why not?)
Absurd to say Deepseek is CCP controlled while ignoring the govt connection here
ESH
It's also a bit funny that providing free models is probably the most communist thing China has done in a long time.
Where do you think they learnt this trick? Years lurking on HN and this post's comment section wins #1 on the American Hypocrisy chart. Unbelievable that even in the current US people can't recognize when they're looking in the mirror. But I guess you're disincentivized to do so when most of your net worth stems from exactly those companies and those practices.
Do you believe the lack of competition is purely because the products are superior?
US tech is now sort of like the dollar. People and countries outside the US need and want alternatives to hedge with in the event of political uncertainty, but cannot do so completely for various reasons, including arm-twisting by the US govt.
One example: some govts and universities in the EU have been trying to get rid of MS products for decades, but they are unable to.
If that's true, why doesn't America compete on this front against China?
> they're more often than not held at a disadvantage by domestic governments
So when the US had the policy advantage over the EU it was just the market working, but when China has the policy advantage over the US it suddenly becomes unfair?
I think you misunderstood this. When domestic competitors arise against American tech, the domestic government tends to explicitly favour those competitors over the American tech, placing the latter at a disadvantage.
You can see it in India, China, Korea, or SEA, where they have their own favored food-delivery apps and internet services. Even in the EU, local LLM companies like Mistral are favored by local businesses for integration over OpenAI. Clearly American tech hasn't actually displaced serious domestic competitors, so the rare-earths comparison fails; the USA, in contrast, is far more willing to let local businesses fail.
Things can be bad on a spectrum, and I believe it is much easier for society/the state to break up a capitalist monopoly than a state-backed one. To illustrate: the state has sued some of those companies over competition ills, and they were seriously threatened by it. That is not the case with a state company.
Banks created their oligopolies and then who saved them when they fucked up?
Isn't Tesla a state-backed monopoly in the USA because of grants and tariffs on external competitors? Isn't SpaceX? Yet nobody treats them as state-backed.
I don't understand this need to put companies on a pedestal and hate on states. Capitalist propaganda, I guess?
Market forces are manipulated all the time. This distinction is nonsense. Companies influence states and vice-versa.
Meanwhile, America has been causing death and destruction around the world. It's easy to make lists: Vietnam, Iraq, Gaza, Cuba, South and Central America etc etc.
They've built a "stable & scalable" RL protocol - more capable RL training infrastructure
They've built a pipeline/process to generate synthetic data for reasoning and agentic training
These all combine to build an efficient model with extensive RL post-training for reasoning and agentic use, although they note work is still needed on both the base model (more knowledge) and post-training to match frontier performance.
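The write-up doesn't spell out the exact mechanics of that synthetic-data pipeline, but one common shape for this kind of thing is rejection sampling against a verifier: sample candidate reasoning traces, keep only the ones that check out. A purely hypothetical Python sketch (every name here is invented for illustration):

    def generate_synthetic_data(prompts, model, verifier, k=8):
        """Keep at most one verified reasoning trace per prompt."""
        kept = []
        for prompt in prompts:
            for _ in range(k):
                trace = model.sample(prompt)   # hypothetical sampling API
                if verifier(prompt, trace):    # e.g. unit tests or answer check
                    kept.append((prompt, trace))
                    break
        return kept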
which version are you?
I am the latest version of the DeepSeek model! If you want to know the specific version number, I suggest you:
Check the official documentation - the DeepSeek website and docs will have the most accurate version information
Follow the official announcements - version updates are usually published through official channels
Check the app store / web version - the interface usually shows the current version
I have all of DeepSeek's latest features, including: strong conversation and reasoning capabilities
128K context length
File upload handling (images, documents, etc.)
Web search (needs to be enabled manually)
Completely free to use
If you need the exact version number for technical integration or another specific purpose, it's best to consult the official technical documentation directly; it will have the most accurate and detailed specifications. Is there anything else I can help you with?