Hacker News Clone

ChyzwarMay 26, 2026, 2:43 PM

When discussing LLM pricing, people are missing the plot. The subscription token price is 10x-40x cheaper than API pricing. Your $90 Claude subscription gives you close to $1,000 to $4,000 in equivalent API token pricing.
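A quick check of that multiplier using the commenter's own figures: a $90/month plan and $1,000 to $4,000 of API-equivalent usage. Both figures are the commenter's claims and are not verified here.

```python
def api_equivalent_multiplier(subscription_usd: float, api_equivalent_usd: float) -> float:
    """Dollars of list-price API usage bought per subscription dollar."""
    return api_equivalent_usd / subscription_usd

# Figures taken from the comment above (unverified claims).
subscription = 90.0
low, high = 1000.0, 4000.0

print(f"{api_equivalent_multiplier(subscription, low):.1f}x to "
      f"{api_equivalent_multiplier(subscription, high):.1f}x")
```

That works out to roughly 11x to 44x, in line with the "10x-40x" claim.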

The second issue is that the quality of the model “operator” makes a massive difference in the outcomes. Highly skilled senior devs who know how to prompt and have high agency will outperform teams of people who lack motivation and foundational skills.

Lastly, there is a massive difference in capabilities, determinism, and error handling between 5T SOTA models like Opus and tiny distillations from DeepSeek that perform well only in benchmarks.

stymaarMay 26, 2026, 3:31 PM

> Lastly, there is a massive difference in capabilities, determinism, and error handling between 5T SOTA models like Opus

What's your source for Opus being a 5T model?

> and tiny distillations from DeepSeek that perform well only in benchmarks.

I don't think you know what you're talking about. Local models aren't “distillations from Deepseek”.

And they don't perform well “only in benchmarks”, Qwen 3.6 is a very decent model (obviously it's not Opus, but it's also much faster and speed is a quality of its own).

gpugregMay 26, 2026, 3:41 PM

> What's your source for Opus being a 5T model?

Elon Musk tweeted that Grok is 0.5T or 1/10th the size of Opus. https://xcancel.com/elonmusk/status/2042123561666855235#m

While this source's reliability is certainly debatable, the size matches the results of this paper, in which researchers estimated the parameter count from model knowledge. https://01.me/research/ikp/

stymaarMay 26, 2026, 4:02 PM

> While this source's reliability is certainly debatable

Massive understatement. Nowadays it has become hard to find a single Musk statement that doesn't contain at least one lie.

> the size matches the results of this paper, in which researchers estimated the parameter count from model knowledge. https://01.me/research/ikp/

Thanks for the pointer. This estimation has Grok 6 times bigger than Musk claims it is, so maybe that's where the lie is.

(I'm quite skeptical about that number though. It would be quite disappointing for US tech if their flagship models had to be that much larger than the Chinese ones for such a small edge in performance. Because I don't think US labs are incompetent, I'd bet that US flagships aren't more than 2 or 3 times bigger than Chinese flagships. Otherwise it really doesn't bode well.)

strikingMay 26, 2026, 4:21 PM

In tiny gray text right above the table is written "90% PI ≈ ±3.00× either side." Is GPT-5.5-Pro 3.4T or 30.8T in size, or somewhere in between? We just don't know.
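A "±3.00× either side" interval is multiplicative, so the point estimate sits at the geometric mean of the two bounds, not at their midpoint. A minimal sketch using the two GPT-5.5-Pro bounds quoted above (in trillions of parameters):

```python
import math

def point_from_multiplicative_interval(lower: float, upper: float) -> float:
    """Geometric mean of the bounds: the centre of a symmetric x-factor interval."""
    return math.sqrt(lower * upper)

def factor_from_interval(lower: float, upper: float) -> float:
    """The 'x either side' factor implied by the bounds."""
    return math.sqrt(upper / lower)

lower_t, upper_t = 3.4, 30.8  # trillions, as quoted above
print(f"point ~ {point_from_multiplicative_interval(lower_t, upper_t):.1f}T, "
      f"factor ~ {factor_from_interval(lower_t, upper_t):.2f}x")
```

The implied centre is about 10T, and the interval spans roughly 9x from end to end.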

UltraSaneMay 27, 2026, 4:53 AM

Elon Musk has absolutely no credibility anymore. I'm more likely to believe the opposite of what he claims to be true.

ChyzwarMay 26, 2026, 6:56 PM

https://arxiv.org/abs/2604.24827

From this paper

stymaarMay 26, 2026, 7:25 PM

That's not what the paper says though:

    Claude Opus 4.6 Anthropic 68.0% ∼5.3T [1.8–15.6T]
    Claude Opus 4.7 Anthropic 66.4% ∼4.0T [1.4–12.0T]
    Claude Opus 4.5 Anthropic 65.2% ∼3.4T [1.1–10.0T]
    Claude Opus 4.1 Anthropic 64.9% ∼3.2T [1.1–9.5T]
    Claude Opus 4 Anthropic 59.7% ∼1.4T [478B–4.2T]

According to their estimation, Opus is likely between 1T and 15T, which really doesn't tell you much that you couldn't have guessed otherwise. It doesn't say “Opus is a 5T model”.
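If you take the five quoted rows at face value, every interval spans nearly the same ratio, about 9x, with the point estimate about 3x from each bound. That is the signature of a fixed ±3x prediction interval, and it holds regardless of the model:

```python
# Rows copied from the table quoted above: (model, point, lower, upper), in trillions.
ROWS = [
    ("Claude Opus 4.6", 5.3, 1.8, 15.6),
    ("Claude Opus 4.7", 4.0, 1.4, 12.0),
    ("Claude Opus 4.5", 3.4, 1.1, 10.0),
    ("Claude Opus 4.1", 3.2, 1.1, 9.5),
    ("Claude Opus 4",   1.4, 0.478, 4.2),
]

for model, point, lower, upper in ROWS:
    # End-to-end span, plus how far the point estimate sits from each bound.
    print(f"{model:16} span {upper / lower:4.1f}x  "
          f"below {point / lower:.2f}x  above {upper / point:.2f}x")
```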

The fact that there's absolutely no consistency in the predicted size between models from the same lab should tell you all you need to know about the predictive power of this method (and they aren't really lying about their numbers: their confidence interval is wide enough to fit anything in it, but their prose is making very strong claims out of their statistical nothingburger).

(Somebody already posted this paper earlier, and I spent some time reading it. It is really not that good, even though there are a bunch of interesting ideas in it.)

ath3ndMay 26, 2026, 10:59 PM

[dead]

layer8May 26, 2026, 3:47 PM

> What's your source for Opus being a 5T model?

Probably Elon Musk: https://eu.36kr.com/en/p/3760679047267075

UltraSaneMay 27, 2026, 4:54 AM

I don't know why stymaar's comment is flagged and dead, he is 100% correct.

stymaarMay 26, 2026, 3:50 PM

[flagged]

ramesh31May 26, 2026, 6:03 PM

People can simultaneously be reprehensible idiots while being a reliable expert on something they have personally invested billions of dollars into and operate at scale.

overfeedMay 26, 2026, 6:31 PM

> ...while being a reliable expert on something they have personally invested billions of dollars into and operate at scale.

Like "Full Self-Driving" from coast-to-coast by 2016?

awkwardpotatoMay 26, 2026, 6:16 PM

He's also invested billions of dollars in SpaceX and Tesla... which he regularly makes wild claims about that are untrue.

amanaplanacanalMay 26, 2026, 6:55 PM

I'm not saying he actually is an expert, but he could be an expert and still lie for any number of reasons.

stymaarMay 26, 2026, 6:56 PM

Elon is a specialist in lying about stuff he invested billions in to make it look more valuable than it is (he's been doing that for Tesla for years). It's not a lack of expertise, it's the lack of any sense of integrity (and self-respect).

He's lagging the AI race despite having tons of compute available, so he tries to make a narrative about how it's not that the model is behind, it's just smaller than the competition.

xbmcuserMay 26, 2026, 4:14 PM

It's not like the non-frontier models aren't improving. If someone can use DeepSeek to get 90% of the work done for $100, then pay another $100 to Anthropic or OpenAI to complete it, I think they would rather do that than pay Anthropic or OpenAI $1,000.
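The comment's arithmetic, written out. It assumes, as the comment implicitly does, that frontier cost scales linearly with the share of work it handles, so finishing the last 10% costs 10% of the $1,000 all-frontier price. All dollar figures are the commenter's illustrative numbers.

```python
def hybrid_cost(cheap_cost: float, frontier_full_cost: float, frontier_share: float) -> float:
    """Cheap model does most of the work; a frontier model finishes the rest.

    Assumption: frontier cost is linear in the share of work it handles.
    """
    return cheap_cost + frontier_full_cost * frontier_share

all_frontier = 1000.0
hybrid = hybrid_cost(cheap_cost=100.0, frontier_full_cost=all_frontier, frontier_share=0.10)
print(f"hybrid ${hybrid:.0f} vs all-frontier ${all_frontier:.0f} "
      f"({all_frontier / hybrid:.0f}x cheaper)")
```

Under those assumptions the hybrid route costs $200, a fifth of the all-frontier price.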

protocoltureMay 27, 2026, 12:45 AM

>When discussing LLM pricing, people are missing the plot. The subscription token price is 10x-40x cheaper than API pricing. Your $90 Claude subscription gives you close to $1,000 to $4,000 in equivalent API token pricing.

These are loss leaders that will not be maintained over the long term. Already we see moves to restrict their usage and redirect people back to API pricing.

try-workingMay 27, 2026, 12:13 AM

DeepSeek and Xiaomi are so cheap there's no need to get a plan. Just use the API.

jason_sMay 27, 2026, 3:42 AM

something something something China something something intellectual property something something....

noman-landMay 27, 2026, 4:51 AM

You can just say the words instead of implying their meaning and letting everyone fill in the gaps themselves.

cyanydeezMay 26, 2026, 3:10 PM

Isn't the plot that it's like an infinite bikeshed, but 10% of the bikesheds are actually trailer parks, and by the time you realize it's a trailer park and not a bikeshed you're down $10-100 because its token generation is faster than you can actually validate?

Some might say the price wouldn't be great if you could actually process and validate it...

kelseyfrogMay 26, 2026, 3:17 PM

> The quality of the model “operator” makes a massive difference in the outcomes.

My hunch is that this is the source of much of the variability in outcomes upstream of HN commenters claiming extremes of, "This model changes everything!" to "This[same] model is crap."

We haven't operationalized what it means to "be good at prompting," nor developed proxies/heuristics/shibboleths for assessing prompting skill. There's community skepticism over whether prompting skill even exists. Besides, even if prompting skill is real, who wants to hear, "Actually you kinda suck at prompting."

danielmarkbruceMay 26, 2026, 4:20 PM

It's 100% this. Many people suck at prompting. It's likely that habits from search are ingrained. But in general some people are just so bad at it.

jyounkerMay 26, 2026, 7:16 PM

Prompting is just writing specification documents. A lot of people are very bad at this. I suppose that more to the point, a lot of people are just bad at writing.

danielmarkbruceMay 26, 2026, 11:07 PM

This is probably correct. Perhaps prompting just brings out the very worst in specification.

FireCrackMay 27, 2026, 3:56 AM

IDK if it's just me, but I also find Claude, whether it be the model or the harness, is a lot more "forgiving" of poor prompts than many of the open models

latexrMay 26, 2026, 6:19 PM

According to Google, “there’s no wrong way to prompt”.

https://www.youtube.com/watch?v=9bBfYX8X5aU&t=48s

knollimarMay 26, 2026, 9:21 PM

No wrong way to [consume thing I sell that you'll consume more of if you do it poorly]

djeastmMay 26, 2026, 9:41 PM

Ehhh, their incentive in their marketing is to get normal people to not be intimidated by the big bad AI.

Power users are always going to have to take the messaging companies send out to the masses with a grain of salt.

jyounkerMay 26, 2026, 7:28 PM

The problem with outsourcing, as opposed to remote developers, is that it takes a really good manager and tech lead to make it work.

My experience is that you have to write extremely detailed design documents and work specifications in order to get effective results. These generally have to be as detailed as most effective prompts.

Once you've written specs that detailed, why do you need outsourced developers and frontier models?

treisMay 26, 2026, 2:29 PM

I think this misses the forest for the trees. Working with ChatGPT is eerily similar to working with offshore Indian devs back in my enterprise days. Productive if guided explicitly but if let run wild there's lots of WTF moments.

LLMs are likely to replace outsourced devs because your employees that know the context can use LLMs to do what offshore devs did before.

freediddyMay 26, 2026, 2:49 PM

My friend is an exec at a US software company, and they are preparing to lay off a few teams of programmers in their Eastern European locations and replace them with a small number of US programmers + AI. He said they are much more productive and produce new features much faster.

causalMay 26, 2026, 6:55 PM

This makes more sense to me. The bottleneck for me is less "understanding code" and more "understanding users". Validating the latter is a task non-programmers can do.

NevermarkMay 27, 2026, 3:27 AM

That's an interesting reversal of the dynamic.

Implication for manufacturing: going robots-first shouldn't aim at just re-localizing manufacturing, but aim higher: become the new outsourced manufacturing destination.

piskovMay 27, 2026, 12:38 AM

How long do you think it will be before that guy comes back to reality and lays off a bunch of agents? :-)

repeekadMay 26, 2026, 3:04 PM

I think the article is right about outsourcing, but not to cheap offshore contractors. Good experts will become more independent and be able to support more clients with AI, meaning small and medium businesses won't need as many internal engineers, finance, marketing, etc.

CuriouslyCMay 26, 2026, 7:38 PM

The future of American frontier AI isn't API calls, it's you taking your task to OAI/Anthropic like a consultant/external entity, then getting a product or whatever back, without ever seeing a large volume of intermediate work. This is inevitable because of the combination of distillation threat and proprietary harness development effort required to push performance at the bleeding edge.

OAI/Anthropic are 100% going to try to take everyone's jobs, and "own" labor. The Chinese are the good guys here.

illusive4080May 26, 2026, 8:16 PM

No, because handing a project over the wall almost always ends in disaster. The requirements are never clear enough.

yandieMay 26, 2026, 11:06 PM

> it's you taking your task to OAI/Anthropic like a consultant/external entity

Good luck with that. This reminds me of the inspiration behind declarative programming languages such as Prolog: you're supposed to declare the problem in such a way that the machine can solve it, rather than the imperative way where you tell the machine what to do. What they didn't realize is that the definition is harder than the solution itself.

ecshaferMay 26, 2026, 3:06 PM

I have really been trying to get local models to work. I have tried different harnesses, tooling, skills, prompts, etc. But when I compare claude code with anthropic models or codex with gpt 5.5, vs qwen, glm or gemma and the same harnesses, the frontier models come out massively ahead. I am at the point where I just don't see the point of the non-frontier models, they waste more time than they save.

himata4113May 26, 2026, 10:37 PM

The more likely scenario is that the bottom will disappear while the top becomes more productive via frontier models.

The weaker a developer is, the more capable the AI they need. The entire premise of this article does not work because it assumes that weak developers with weaker AI will do better than strong developers with near-autonomous AI. Weak developers with frontier AI already produce products that are worse than those of a capable developer paired with a weak (2-year-old) AI.

To clarify: strong developers 2 years ago could already leverage AI to produce high-quality products. With the latest and greatest AI, weaker developers still struggle, whereas strong developers can now delegate more of the work to the stronger AI, increasing productivity further.

steve-atx-7600May 27, 2026, 6:00 AM

I am so happy that I currently work at a job with mostly competent senior engineers, for once in my life. The nightmare of contractors or over-hired new grads without supervision would be so much more devastating to an organization these days.

joegibbsMay 27, 2026, 4:51 AM

Why would you ever offshore again now that we have LLMs? Offshored work was famous for its terrible quality and high prices. You'd have to go back over everything, sit in a ton of useless meetings, and make sure that you had very, very detailed design documents with every little piece accounted for.

Now you can put those detailed documents into the LLM and get a better result back in a couple of hours rather than weeks for a tenth or hundredth of the cost.

And the offshore devs are going to be using the LLMs themselves, so why add another layer of bureaucracy and a language barrier between your requirements and the result?

steve-atx-7600May 27, 2026, 5:54 AM

There are plenty of folks that either don’t have a sense of pride or ownership for the products they are associated with. Or, they just do not have a deep sense of what it actually takes to ship quality products. You’d be surprised at how some less desirable places to practice software engineering are run.

zuzululuMay 26, 2026, 4:42 PM

I keep seeing this narrative involving DeepSeek as an example of OSS LLMs, but they are subsidizing a huge amount of tokens at cost, and one can easily understand why they are doing it if one is not lazy and thinks critically.

It's still far too costly and not effective to use local AI that can match what the frontier models can offer, especially when the inference hardware is being heavily restricted due to geopolitical risks. Claims about local LLMs somehow giving these frontier companies a run for their money I find especially doubtful in the long run.

Tokens are getting expensive because they are beginning to corner the market and will use that advantage to limit hardware distribution within and beyond the borders.

It's more likely that some workflows will see more local LLMs, but those will never be the ones that require frontier-model-level capability, nor will they beat the price that a lighter, smaller version of a frontier model will offer to capture that tail end.

throwa356262May 26, 2026, 5:15 PM

Do you have a source for your first claim?

My impression is that DeepSeek designed V4 specifically for cheap inference, and they are not losing money even at a 75% lower price.

zuzululuMay 26, 2026, 10:06 PM

Do you ? Did you audit deepseek?

logicchainsMay 26, 2026, 5:04 PM

>they are subsidizing a huge amount of tokens at cost

This is absolutely false, because other providers serving the Deepseek models on OpenRouter are also able to offer very low prices, and they don't have the money to subsidize anything.

leonidasvMay 27, 2026, 4:14 AM

Sure, but they didn't spend on training the model. If DeepSeek is providing the model for the same price as third parties, then it's probably still losing money when you account for the training.

throwa356262May 27, 2026, 6:40 AM

DeepSeek bypasses CUDA and has a few other optimisations that neither llama.cpp nor vLLM supports.

Furthermore, V4 pro was designed to run on 4 Huawei Ascend GPUs which are much cheaper than the nvidia setup others use, and deepseek probably also got some free hardware for their collab.

Hence it is entirely possible their inference costs are significantly lower than other providers.

zuzululuMay 26, 2026, 10:06 PM

That makes no sense....OpenRouter didn't create Deepseek

NortySpockMay 26, 2026, 11:13 PM

I don't think your counterpart is arguing that OpenRouter created DeepSeek. Rather, I suspect their argument is that there are 13 providers listed on OpenRouter for DeepSeek V4 Pro that are competing on price. (That's the default balancing algorithm in OpenRouter, roughly: weighted towards the lowest price among providers that were available in the last 30 seconds.)

If any providers are able to sustainably turn a profit, OpenRouter allows them to compete in an open market to process your tokens (or anyone else's tokens).

Thus anyone subsidizing tokens bears the brunt of the compute load and gains not much more than name recognition and tokens to train on, but since switching to a different provider is a matter of changing one setting in the config panel (and can be set to auto-switch based on price), switching costs are very low. Providers of open models via OpenRouter have almost zero ability to lock-in users.

So this claim that all 13 providers are selling subsidized inference is... a tough claim to swallow. Maybe some of them are, but all of them? I assume at least some providers want to show profitability, and are pricing their service accordingly.

https://openrouter.ai/deepseek/deepseek-v4-pro/pricing

https://openrouter.ai/docs/guides/routing/provider-selection
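A toy model of the routing described above: skip providers that failed recently, then pick randomly with cheaper providers favored. The inverse-square-of-price weighting, the provider names, and the prices are all assumptions for illustration; the provider-selection guide linked above is the authority on what OpenRouter actually does. The point it illustrates is that traffic shifts to whoever is cheapest and healthy, with no lock-in.

```python
import random
import time
from dataclasses import dataclass

@dataclass
class Provider:
    name: str                # hypothetical provider name
    usd_per_mtok: float      # hypothetical blended price per million tokens
    last_failure_ts: float   # unix time of the most recent failure (0 = never)

def pick_provider(providers, now=None, outage_window_s=30.0, rng=random):
    """Choose a provider, skipping ones that failed within the outage window.

    Among healthy providers, weight each by 1 / price**2 (an assumption),
    so cheaper providers win most, but not all, of the traffic.
    """
    now = time.time() if now is None else now
    healthy = [p for p in providers if now - p.last_failure_ts > outage_window_s]
    if not healthy:
        healthy = list(providers)  # everyone failed recently: fall back to all of them
    weights = [1.0 / p.usd_per_mtok ** 2 for p in healthy]
    return rng.choices(healthy, weights=weights, k=1)[0]

# Usage with made-up providers: "c" is cheapest but failed 10 s ago.
now = 1_000_000.0
pool = [
    Provider("a", 2.0, 0.0),
    Provider("b", 1.0, 0.0),
    Provider("c", 0.5, now - 10),
]
rng = random.Random(0)
counts = {}
for _ in range(10_000):
    name = pick_provider(pool, now=now, rng=rng).name
    counts[name] = counts.get(name, 0) + 1
print(counts)  # "c" never appears; "b" gets roughly 4x the traffic of "a"
```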

sourcecodeplzMay 26, 2026, 4:48 PM

Don't think so. From what I've heard, DeepSeek isn't losing money on inference.

illusive4080May 26, 2026, 8:14 PM

The author doesn’t address: A good engineer spends little comparative time coding versus other tasks for established projects. A good engineer understands the system end to end. Offshore developers are worse than Llama3.

digitaltreesMay 27, 2026, 3:22 AM

This is a really interesting point. It does seem that in recent days, probably driven by the approaching IPOs, Anthropic and OpenAI have been testing increased prices and burning tokens to see just how much people are willing to spend, and in all honesty they seem to be targeting the salary of an SF engineer as their ceiling. They seem to think that people will be willing to spend nearly as much on tokens as on a human engineer.

domrdyMay 26, 2026, 2:47 PM

For sure true for specialized ones like MedGemma (healthcare). In my testing, the 27b model is at least on the same level as frontier, and in some cases outperforms them. 4B is insanely good too for some lighter workloads. Thanks G for working on this!

leonidasvMay 27, 2026, 3:55 AM

We shouldn't take free open models for granted. They're a byproduct of the current AI craze, but the economics aren't on their side. It's not sustainable. Alibaba already stopped releasing the weights for their best models, for instance.

jmullMay 26, 2026, 4:46 PM

> (Human + an almost frontier LLM) vs Frontier LLM

I'm curious, who/what is operating the frontier LLM in this scenario?

The rest of the article is equally incoherent.

regexorcistMay 26, 2026, 5:02 PM

I've been saying this for a couple months now since I got decent hardware and started using my local Qwen 3.6 exclusively. I have no doubt the future for individuals and medium-sized companies is local private AI.

bobimMay 26, 2026, 8:00 PM

Could you share some of your hardware details for Qwen 3.6? And are you using the dense or MoE variant?

regexorcistMay 26, 2026, 9:31 PM

Sure, I have a 64G MBP with an M1 Ultra. The best model for me by far has been the 35B A3B, in particular the 8Q_KL Unsloth variant. The dense model works, but it's much slower, and I don't really see a difference in quality with a good harness.

koyoteMay 27, 2026, 1:01 AM

What do you use as a harness?

hypferMay 26, 2026, 9:41 PM

Qwen3.6-27B-UD-Q4_K_XL can run at 45t/s with 131k q8 context on an RTX 4090.

That is pretty usable. You could get 65t/s or more with MTP, but only if you drop the context size, which I would advise against.

Results are better with 256k context and a larger quant, however, that's not going to fit on the 4090 you already had lying around for playing cyberpunk 2077.
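Back-of-the-envelope weight memory for the setups in this subthread: parameters × bits per weight ÷ 8. The effective bits-per-weight values (about 4.5 for a Q4 K-quant, about 8.5 for a Q8 one) are rough assumptions, and the KV cache for a 131k or 256k context comes on top, so treat these as lower bounds.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate quantized weight size in GB: billions of params x bits / 8 bits per byte."""
    return params_billion * bits_per_weight / 8

RTX_4090_GB = 24  # VRAM on an RTX 4090

for label, params, bits in [
    ("27B dense @ ~Q4", 27, 4.5),    # bits/weight are assumed averages, not exact
    ("27B dense @ ~Q8", 27, 8.5),
    ("35B-A3B MoE @ ~Q8", 35, 8.5),  # MoE: all experts are stored, ~3B active per token
]:
    gb = weight_gb(params, bits)
    verdict = "fits" if gb < RTX_4090_GB else "does not fit"
    print(f"{label:20} ~{gb:5.1f} GB of weights, {verdict} in 24 GB before KV cache")
```

On these assumptions the Q4 dense model leaves roughly 9 GB for context on a 4090, a Q8 dense model does not fit at all, and the 35B MoE at Q8 needs unified memory like the 64 GB Mac mentioned above.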

The MoE models make me rather unhappy. Idk. They feel braindead to me, but YMMV.

mark_l_watsonMay 26, 2026, 4:21 PM

Great article that reinforces my own opinion, with the clever addition of low-cost human labor to the equation. Nice.

I spent a month comparing Gemini Ultra plan to using much lower cost DeepSeek v4 with open source coding harnesses and, spoiler alert: I was happier using the much cheaper and more environmentally friendly open models: https://marklwatson.substack.com/p/my-evaluation-of-ai-agent...

kridsdale1May 26, 2026, 7:15 PM

FYI, Gemini is the most environmentally friendly model.

https://share.google/aimode/a0O95wzk2UUhIXLUI

https://cloud.google.com/blog/products/infrastructure/measur...

mark_l_watsonMay 26, 2026, 9:11 PM

Thanks for the references.

BTW, Google is my pick for the winner in the USA tech giants AI race. I worked at Google about 12 years ago and was impressed by their use of renewable energy, etc.

jillesvangurpMay 26, 2026, 2:34 PM

I've been pretty happy sticking with codex 5.4 medium. I don't see a good case for switching to 5.5 at the cost of going through my token budget quicker.

There are misaligned incentives here between users just trying to get stuff done and AI companies competing on having the "smartest" model that passes benchmarks and continuously does some Nobel Prize-winning stuff. It's mostly overkill for the more mundane stuff normal people actually do with them. It's nice to have the option when you need it. But defaulting to that is not economical and a bit unnecessary.

There's also a difference between smart models and bigger context windows. Most of the progress in the last year was simply the context windows getting big enough to fit all/most of the stuff needed to solve issues. Before then, you had to carefully manage the context to not run out of space and they wouldn't fit much more than small hobby projects.

With sub-agents, the parent agent doesn't need to be a frontier model. It can delegate to smarter agents. And most stuff it delegates shouldn't need a frontier model. Wouldn't it be nice if it could decide on a case-by-case basis?

The walled gardens offered by OpenAI, Anthropic, and others currently default to one-size-fits-all "frontier" models. This is not sustainable. They should evolve to using smaller, effective models most of the time, with complexity-based escalation as needed, triggered either by estimated complexity or by the small models failing. I'm guessing some open-source alternatives to these walled gardens are probably already heading in that direction.
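One way the "small models by default, escalate when they fail" idea could look. Everything here is hypothetical: the tier names and relative costs are made up, and `fake_call`/`fake_check` are stand-ins for a real model client and a real acceptance check (tests passing, a linter, a verifier model).

```python
from typing import Callable, Sequence, Tuple

# (model name, relative cost per call); hypothetical names and costs.
Tier = Tuple[str, float]

def escalate(
    task: str,
    tiers: Sequence[Tier],
    call_model: Callable[[str, str], str],
    looks_ok: Callable[[str, str], bool],
) -> Tuple[str, str, float]:
    """Try tiers from cheapest to most capable; stop at the first acceptable answer.

    Returns (model used, answer, total cost including failed attempts).
    """
    if not tiers:
        raise ValueError("need at least one tier")
    spent = 0.0
    answer = ""
    for name, cost in tiers:
        answer = call_model(name, task)
        spent += cost
        if looks_ok(task, answer):
            return name, answer, spent
    return tiers[-1][0], answer, spent  # nothing passed: return the top tier's attempt

TIERS = [("small-local", 0.01), ("mid-open", 0.1), ("frontier", 1.0)]

def fake_call(model: str, task: str) -> str:
    # Stand-in: pretend only the frontier tier can handle "hard" tasks.
    if "hard" in task and model != "frontier":
        return "wrong"
    return f"{model} answer"

def fake_check(task: str, answer: str) -> bool:
    return answer != "wrong"

print(escalate("summarize this email", TIERS, fake_call, fake_check))
print(escalate("hard refactor", TIERS, fake_call, fake_check))
```

The easy task stops at the cheapest tier; the hard one pays for two failed attempts before reaching the frontier tier. That overhead is the trade-off between escalating on failure and estimating complexity up front.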

The irony here is that with a walled garden, these companies are selling a premium experience. But in the current market that boils down to burning billions of investor cash to keep the GPUs going without much hope of profitability. Eventually, the surviving companies are going to have to compete on quality, cost and margins. The smart approach would be to dynamically adapt token and context window sizes instead of blindly defaulting everything to the best possible. Don't boil the oceans for a simple email summary or a simple web UI. That stuff already worked well enough with models even a few years ago.

prasoon2211May 26, 2026, 2:49 PM

I used to be on 5.4 high for most of my work. I have switched completely to 5.5 medium now. I would highly recommend trying it out

- 5.5 is significantly more token efficient than 5.4 - the same task takes often a third of the tokens

- because of this, it is also much faster to do the task

- you get high "intelligence" per token even after accounting for token efficiency - 5.5 medium is just under 5.4 pro levels of intelligence (imo). It has found tricky bugs for me that all other models failed at

So overall, ideally you will end up with more intelligent, faster model for slightly cheaper.
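The break-even arithmetic behind this subthread: cost per task is tokens used × price per token, so a model that needs a third of the tokens stays cheaper per task unless its per-token price is more than 3x higher. The token counts and prices below are placeholders, not actual GPT-5.4/5.5 pricing.

```python
def cost_per_task(tokens: float, usd_per_mtok: float) -> float:
    """Dollar cost of one task given tokens used and price per million tokens."""
    return tokens / 1e6 * usd_per_mtok

def break_even_price_ratio(token_ratio: float) -> float:
    """Highest price multiple the newer model can charge before it costs more per task.

    token_ratio = newer model's tokens per task / older model's tokens per task.
    """
    return 1.0 / token_ratio

old_tokens, new_tokens = 300_000, 100_000  # "a third of the tokens", per the comment
old_price = 10.0                           # hypothetical $/M tokens
for new_price in (15.0, 30.0, 45.0):       # hypothetical newer-model prices
    old = cost_per_task(old_tokens, old_price)
    new = cost_per_task(new_tokens, new_price)
    print(f"new at ${new_price:.0f}/M: ${new:.2f} vs ${old:.2f} per task")

print(break_even_price_ratio(new_tokens / old_tokens))
```

This is also how both reports in this subthread can be true: if the token savings on your own queries are smaller than the price increase, the higher cost wipes out the efficiency gain.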

thisisembarMay 26, 2026, 3:08 PM

This is embarrassing but I find 5.4-mini on Low covers a substantial part of my and my colleagues work.

Back when it became expensive I learned to live with it and I find my "AI skills" (mainly communication) have a substantial impact on the efficiency of the model. Not saying my work is difficult, it's not, but I find there is quite a bit of wiggle room. Smaller models can still perform useful work, but you have to do the heavy lifting yourself. It saves a ton of money.

I used to burn through 75% of my tokens in an hour or two. Now I can work all day and hit maybe 50-60% if I use it heavily.

wd021May 26, 2026, 10:34 PM

[dead]

dawnerdMay 26, 2026, 3:07 PM

We trialed 5.5 and the same queries produced worse results. Not worth the cost increase. Even if there’s a token efficiency gain the higher cost wipes that out.

samtheprogramMay 26, 2026, 2:31 PM

$1100/m for an outsourced engineer… am I missing something? That’s far too low. Even juniors in South America tend to ask for at least double that number before factoring in the DeepSeek cost.

ShalomboyMay 26, 2026, 3:06 PM

I thought the same thing. The author's reference point for LCOL developer seems a bit outdated. With what we pay our teammates in Colombia, the model pushed out to 22 months before crossover.

ZeroCool2uMay 26, 2026, 5:38 PM

A crucial factor tech industry folks tend to ignore is how much executives value predictable costs. Cloud migrations got away with this, but still had to argue fiercely, because 'the cloud' and its serverless tech had the potential to significantly decrease overall spend for unpredictable, bursty workloads.

The usual counter-argument is the operational burden, but human capital is also a relatively fixed cost. A dedicated team of 3-5 FTEs could probably handle inference ops for a F500 company.

Meanwhile, the capability delta is shrinking fast. We have more evidence that local open-source is viable with the release of DeepSeek v4, and the industry is only trending further in this direction. Especially as we rely more on test-time compute and task-specific harnesses rather than model size.

So, if you're an executive looking at a marginal but fixed operations cost, added flexibility, and a rapidly closing gap in capability, why wouldn't you just run open-source models on your own infrastructure to get those highly predictable costs? Plus, you decrease the risk of one of the frontier

bitmasher9May 26, 2026, 5:42 PM

Do you really want to buy the 3rd or 4th most intelligent AI?

There’s so much uncertainty, it seems like the safe option is to give everyone a Claude or OpenAI subscription/api key until the frontier isn’t changing every six months.

qudatMay 26, 2026, 10:27 PM

I'm still thinking this through, but I was arguing this position to colleagues, to some shock: LLMs are a race to the bottom, and frontier models will not be able to afford to work on coding-specific models (or coding features at all) in the very near future.

27B is already really good at coding-specific tasks. Fundamentally, there is little innovation on the core architecture: LLMs are all designed essentially the same, with minor differences in how they are trained. They are all feed-forward multi-headed attention models; it doesn't matter if it's a 4B model or a 1T model, that's just scale.

Further, the frontier models cannot afford to innovate: they have to scale as quickly as possible to "beat out" their competition. The frontier models fundamentally will not create the next "attention is all you need" monumental jump in AI.

Frontier companies are stuck on scale with zero capacity to innovate. You cannot point capitalism at "basic science research" and expect any ROI. This is a known reality. Innovation is much more indirect and a "random walk" style of knowledge acquisition.

Finally, these LLMs are quite literally designed with a human in the loop, and we do not give ourselves enough credit for how well we ourselves tool-call. We are doing a lot of heavy lifting to make these models useful, and you cannot simply remove us from the equation without also removing ourselves from the training pipeline.

luguMay 26, 2026, 11:32 PM

There has never been more incentive to innovate in human history than today, and you think the best labs won't innovate? That is crazy. It is like anyone can do AI research. Of course there will be new architectures. We just discovered the steam engine, and the combustion engine is coming.

bob1029May 26, 2026, 7:52 PM

> The current closed source frontier models are more capable than the latest from DeepSeek. But is the capability difference enough to justify a 30x price difference?

We talk about capability like it's some kind of linear scale. I am not paying 30x for 30x performance. I am paying 30x so that my use case goes from "haha nope" to a signed contract with the client. Works 0% of the time => works 3% of the time is an infinite improvement in capability. That is what the premium is paying for.
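One way to formalize "paying for the use case to work at all": what matters is the expected cost per successful outcome, cost per attempt divided by success rate (the expected number of independent attempts until a success is 1/p). At a 0% success rate that cost is unbounded, however cheap each attempt is. The prices and rates below are hypothetical.

```python
import math

def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successful result, assuming independent retries."""
    if success_rate <= 0:
        return math.inf  # never succeeds: no number of retries pays off
    return cost_per_attempt / success_rate

cheap = cost_per_success(cost_per_attempt=1.0, success_rate=0.0)      # hypothetical
premium = cost_per_success(cost_per_attempt=30.0, success_rate=0.03)  # hypothetical
print(cheap, premium)
```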

upcoming-sesameMay 26, 2026, 8:36 PM

Deepseek works just fine

lmeyerovMay 26, 2026, 3:35 PM

Fwiw, the cost per answer, which is what ultimately matters, is going down. In a competitive market with oss and multiple frontier labs, it is hard to maintain a premium long-term.

The big question is how subsidies vs technology improvement will play out. As we saw with Uber, selling at a loss can happen for a very long time, and technology improves relentlessly.

For reference, we publish https://botsbench.com/ that shows time and cost per answer are going down while quality is going up.

LetsGetTechniclMay 26, 2026, 8:24 PM

So all that hype for the AI revolution just for it to be... taking advantage of cheap overseas labor after all?

swader999May 26, 2026, 5:32 PM

I'm finding that sound judgment, common sense, technical depth and breadth, and a feel for the UX are skills that amplify agentic coding. Deep knowledge of the problem domain and time with the customer (or SMEs or end users) are what build these. Outsourcing this will never work; you can't put someone 12 hours ahead of the timezone you're serving in front of the customer.

DonsDiscountGasMay 27, 2026, 2:03 AM

The article assumes 5% monthly growth in token prices continuously. That seems aggressive.
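For scale, 5% a month compounds to roughly 80% a year. Below is a quick sketch of that, plus a generic months-to-crossover helper. The starting cost and threshold in the usage line are made-up placeholders, not the article's inputs.

```python
import math

def annual_growth(monthly_rate: float) -> float:
    """Total growth over 12 months of compounding at monthly_rate."""
    return (1 + monthly_rate) ** 12 - 1

def months_until_exceeds(start_cost: float, threshold: float, monthly_rate: float) -> int:
    """First whole month in which a cost growing at monthly_rate reaches threshold."""
    if start_cost >= threshold:
        return 0
    return math.ceil(math.log(threshold / start_cost) / math.log(1 + monthly_rate))

print(f"{annual_growth(0.05):.0%} per year")
print(months_until_exceeds(500.0, 1100.0, 0.05))  # hypothetical $500/mo cost vs a $1,100/mo threshold
```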

cautiouscatMay 26, 2026, 2:26 PM

The dark mode version of the site makes the tables unreadable.

the_arunMay 26, 2026, 2:30 PM

Agreed, but same data is listed right below the table.

GodelNumberingMay 26, 2026, 2:32 PM

Thanks for flagging, fixed

nyxtomMay 26, 2026, 4:47 PM

I've seen the $1000/mo engineer salary thrown around a bit and I'm not even sure where it comes from.

ianhxuMay 26, 2026, 2:29 PM

>frontier models are more capable than the latest from DeepSeek. But is the capability difference enough to justify a 30x price difference?

The contradiction here is that without frontier models, there'd be no foundation for models like DeepSeek to reference and catch up to. Is there an economic model that captures this kind of dynamic?

aftbitMay 26, 2026, 2:41 PM

Free market competition? This is a pretty classic pattern. Leaders capture market with quality but run into trouble scaling, followers compete on price and availability. Given time, leaders eventually run out of upgrade runway and find themselves swallowed up by followers. Or alternatively, leaders think their lead is inevitable and miss a sea change or iterative upgrade path. Think IBM PCs before Compaq and other cheap clones ate their lunch.

bee_riderMay 26, 2026, 3:17 PM

I guess they’d be hoping for very protective IP laws in that case.

throwa356262May 26, 2026, 5:29 PM

Hold on mate, do you realize that a significant number of recent major advances in AI came from deepseek?

hmokiguessMay 26, 2026, 3:53 PM

I think the biggest pull is yet to come. Legislation around sovereignty and the US CLOUD Act is something of a challenge for the US hyperscalers, so these local models may have not just a price advantage over frontier labs but also a policy and lobbying one.

rastrojero2000May 26, 2026, 3:31 PM

It's particularly funny to me, but a minor point, that this post requires me to go through some kind of Cloudflare armed checkpoint to dare read about AI.

A bigger issue is that this thing calls AIs better coders than people, and for the past 4 months I have tried to get one of the several I looked into to consistently produce a simple event-bus-backed Java monorepo, with exactly zero success. Claude even repeatedly wanted to put my login logic in the actual event bus, for some reason.

What does "better coder" _exactly_ mean at this point?

Our_BenefactorsMay 26, 2026, 8:07 PM

AI has well and finally killed the idea that outsourcing saves cost. Local AI, sure, but not outsourcing. Been there, done that, doesn’t work.

alansaberMay 26, 2026, 2:22 PM

Always has been. People pay for the (not so) marginal performance gains.

economistbobMay 26, 2026, 6:17 PM

Deliberately combining hallucinations with a smaller fund of localized knowledge with which to spot said hallucinations seems like a bad business decision.

the_arunMay 26, 2026, 2:33 PM

Premium services need to allow enterprises to self-host the services to reduce the cost of inference. Another advantage is that data doesn't leave the VPNs.

endofreachMay 26, 2026, 9:16 PM

"I phrase can this words a way not make sense. seems! but point across still!"

AI can turn it into a pseudo-poem or a 4-page document. Or it can just fix the grammar. But it doesn't really change the point of the sentence, nor does it fix the actual issue with it.

Similarly for code: there are codebases with lots of smells and really dirty parts that are still better than methodically clean ones that just don't "get to the point".

I am so sick of all the AI bloat. People were able to hide their incapability behind unnecessarily complex frameworks or obscure it through "clean code" concepts. Now LLMs give those uninspired people the option to invest even less of what makes software worthy and hide it behind more abstraction.

Just: AHA! (AI won't)

rightlaneMay 26, 2026, 3:07 PM

I disagree with every part of this.

Local LLMs are great and very useful but if you are claiming that their code quality is in the same ballpark as Claude Code or Codex with their best models I cannot consider you a serious person. I feel like this is analogous to the folks arguing that The Cloud is "someone else's computer." As if billions of dollars of spend gives these companies zero benefit over a Mac mini.

Regarding offshore, at least in my experience, better coding agent output comes down to two factors. The first is subject matter expertise. Providing the right context to the coding agent based on the tech you are building for is beyond critical. That's the issue with the vibe-coded slop projects. No expertise in a technology means no awareness of its gotchas. React is the most obvious example, because the LLM default is to reach for useEffect endlessly.

The bigger issue is that, by their very nature, LLMs are very sensitive to the quality of prompting in English. I have seen offshore devs fail endlessly because they don't have the English skills to successfully prompt the machine. That has created more work for my US-based devs, who either have to carefully tune the work ticket so it is basically a coding agent prompt, or go through multi-day exercises to enforce better prompting.

A single US dev with Claude Code is orders of magnitude better than typical offshore. Adding local models into the mix would make offshore completely useless. I'm sure many companies will see ballooning AI bills and expensive onshore devs and be very tempted to go to TCS or similar. I hope so, because that will give startups plenty of easy targets to disrupt.

hypferMay 26, 2026, 9:34 PM

The cloud is in fact someone else's computer(s).

ForHackernewsMay 26, 2026, 3:09 PM

Local models will improve. They will get smaller, faster, cheaper. This is already happening https://dev.to/bspann/bitnet-microsofts-1-bit-llms-that-run-...

AI will become a commodity technology the same way virtual machines are a commodity.

themafiaMay 27, 2026, 2:04 AM

What if.. and this is a big if.. but.. what if we just paid people what they're actually worth?

NitpickLawyerMay 26, 2026, 2:45 PM

> But is the capability difference enough [..]

This is the (m/b)illion dollar question, isn't it? I think there's also a question of what you think capability is, exactly, and how the difference manifests itself.

On the one hand, when something becomes "good enough" that's a clear capability threshold. On the other hand, what's the limit of those capabilities, and equally as important, how does capability reflect on reliability?

We've seen "local models" lately improve on capabilities to the point where they're "good enough" for some tasks. Reliability in solving those tasks is a bit harder to measure/benchmark/test. It'll get better as more people work with those models. But something I've noticed over the past ~6 months is that the frontier models are gaining a lot in both the breadth of their capabilities and the reliability with which they solve the tasks they're capable of solving. I think this is where scaling (both compute and data) is showing, and where having more compute is simply better (more parallel exploration, more training data output, broader data, etc.).

There's also the problem of benchmarking true capabilities. The popular ones are getting old, and aren't as reliable as they used to be (not even touching on the subject of benchmaxxing, just thinking about their saturation, even with honest intentions).

So the question then becomes: what will users prefer? Do you get the best of the best, or the one that's good enough? There might be a market for both, honestly. Not everyone does SotA stuff. And a lot of what people used to do in a company is probably mundane enough that a "good enough" model with "good enough" reliability can handle it (w/ some supervision ofc).

What I'm more interested in is whether things like Taalas succeed and get to provide local hardware that runs models "burned in silicon". That would be interesting, because speed and all the other advantages of local models are a "quality" of their own. For example, right now I'd pay ~$1k for an external HDD-sized block that can run a currently popular ~32B model, even knowing that it can only run that model. I have no idea if that's feasible, or if it makes sense from a financial pov. But I'd buy one. And local inference on dedicated chips doesn't need to be "OSS only". I'm sure oAI / etc would probably take the risk of licensing one of their -mini / -lite models, provided the risk of the weights leaking is small enough (and it probably is).

> This keeps a ceiling on how much or how fast the frontier labs can raise prices.

I generally agree, but from a different perspective. Up till now we've seen the 3 labs influence each other's price points. When GPT-5 came out at a radically lower price, the others lowered theirs as well. Now, with Opus being SotA for coding and 5.5 close behind, they've raised them back up. Google seems to follow slowly. But there's hope that, with 3 top labs + 2 trailing (xAI & Meta), there'll be pressure once again. If either of those trailing labs manages to get to SotA again, prices will drop once more. Some people say that open source also provides pressure here, but I'm not yet convinced of this. There's still the question of who'll serve the models, at what scale, etc.

jqpabc123May 26, 2026, 12:38 PM

The current closed source frontier models are more capable than the latest from DeepSeek. But is the capability difference enough to justify a 30x price difference?

"Frontier models" are caught in a financial dilemma of their own making: they have spent such huge sums on development that they may have inadvertently priced themselves out of the market.

Energy costs are a huge factor for AI. Whoever has the lowest energy costs will likely be able to dictate market prices. And fossil fuel dependence doesn't look to be advantageous for AI.

scotty79May 26, 2026, 7:50 PM

> Human + an almost frontier LLM

I tried this. My role as a human boiled down to recognizing when I need to switch to frontier model for the last mile.

lowbloodsugarMay 26, 2026, 6:17 PM

If IT is a cost center, then a company has likely already outsourced it (and if it's called IT, it probably is). If you are a software development company that makes money from software, then a local team of SDEs using whatever AI they want is a competitive advantage over a local team of SDEs trying to deal with an 11.5-hour gap to India. AI is coming for software developer jobs, and it's coming for a/ the low-skill ones and b/ the high-skill ones where turnaround and iteration matter. I've worked with great engineers in India, but the time difference was brutal for our fast-moving business.

crimsoneerMay 26, 2026, 2:26 PM

I think this is a compelling argument, but I see 2 issues:

1. I remain unconvinced that LocalAI can work well for the majority of businesses. It looks vaguely comparable on benchmarks, but in reality it tends to be fragile and involves a lot of management overhead.

2. Similarly, while Deepseek is comparable to Opus/Codex on benchmarks, for agentic work at scale I definitely notice the difference. That's not to say it's not economical, just that I definitely miss the big boys when I swap.

I kind of wish this was true, because the UK would be in a great place to compete with the US. But somehow people are happy to pay 3x the salary for an engineer in SF.

hobofanMay 26, 2026, 2:44 PM

> It looks vaguely comparable on benchmarks, but it tends to be fragile and a lot of management overhead in reality.

I'm working on a self-hostable LLM (web) UI[0] that aims to provide UX comparable to e.g. ChatGPT, and you are right that there is a decent amount of fragility involved, and more management overhead than most people would expect.

However, we usually find that those issues come up a lot more in e.g. the harness (= our application), or in some prompt tuning that's required for each of the models, rather than in model quality itself. We have seen customers using self-hosted LLMs achieve user satisfaction across their organization similar to that of other customers who lean heavily on the latest GPT-5 models on Azure. Especially given that you have to do some level of tuning and setup anyway, you might as well invest it in "local"/self-hosted AI (if you can make the financials of the inference cost work out for you).

I think it should also be noted that the inference providers on hyperscalers also tend to be quite fragile, each in their own way (e.g. Google with a horrible rate limit system or Azure with almost weekly intermittent 500-error incidents).

[0]: https://github.com/EratoLab/erato

GodelNumberingMay 26, 2026, 2:42 PM

Fair points. I thought the same until a few months ago, but the latest generation of OSS models is surprisingly good. Plus, maybe it is the way I work, but I find myself constantly overriding the decisions of frontier LLMs (because they start degenerating towards god objects and spaghettification), so most of the use I have gotten out of AI agents is really their ability to code quickly and with correct syntax.

Also worth noting that it doesn't have to be a strict either-or: there can be a two-tier enterprise deployment that routes to a locally hosted or a frontier model, and over time more and more use cases could get routed to the local LLM.
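A minimal sketch of that two-tier idea, assuming routing is decided per use case. The class, method, and use-case names are hypothetical. Requests go to the local model only for use cases that have been validated there, and everything else falls through to the frontier API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TwoTierRouter:
    """Hypothetical router: validated use cases go local, the rest go to a frontier API."""

    local_use_cases: set[str] = field(default_factory=set)

    def route(self, use_case: str) -> str:
        # Default to the frontier tier for anything not yet proven on the local model.
        return "local" if use_case in self.local_use_cases else "frontier"

    def promote(self, use_case: str) -> None:
        # Move a use case to the local tier once it performs well enough there.
        self.local_use_cases.add(use_case)


if __name__ == "__main__":
    router = TwoTierRouter({"summarize_ticket"})
    print(router.route("summarize_ticket"))  # local
    print(router.route("refactor_module"))   # frontier
    router.promote("refactor_module")
    print(router.route("refactor_module"))   # local
```

Defaulting to the frontier tier keeps quality as the fallback, and each `promote` call is how "more and more use cases" shift to the local LLM over time.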

aftbitMay 26, 2026, 2:43 PM

I wish Deepseek could read images. I've been having good luck guiding it around on personal projects, but anything that needs to render to a screen really needs to be looked at to see bugs.

dyauspitrMay 26, 2026, 2:24 PM

Only if you don’t allow construction of local data centers

rgbrennerMay 26, 2026, 2:29 PM

US has over 10x the number of data centers as China; and produces 2x more energy per capita than China.

chrisweeklyMay 26, 2026, 2:37 PM

what about energy consumption per capita?

aftbitMay 26, 2026, 2:45 PM

What about it? Energy production basically has to equal energy consumption in the medium term, so if the grandparent comment is correct, it is 2x per capita.

Dunno how trustworthy this source is, but it says ~35 MWh/person in China and 77 MWh/person in USA.

https://ourworldindata.org/grapher/per-capita-energy-use

joe_mambaMay 26, 2026, 2:26 PM

I can name one big country that won't disallow data centers.

theologanMay 26, 2026, 3:07 PM

This is bogus.

mahmedalamMay 26, 2026, 3:13 PM

First fix your website's navbar and hero on mobile. They were broken, and it shows that you vibe coded slop!!!

byrohitrajanMay 27, 2026, 4:28 AM

[flagged]

78DegreesMay 26, 2026, 8:46 PM

[flagged]

stelsmindMay 26, 2026, 10:02 PM

[flagged]

falcons-edgeMay 26, 2026, 8:41 PM

[flagged]

codecharmhqMay 27, 2026, 5:09 AM

[flagged]

skeeganMay 26, 2026, 10:34 PM

[dead]

MagicMoonlightMay 26, 2026, 7:47 PM

[dead]

StevvoMay 26, 2026, 3:01 PM

I don't see local AI taking off. Memory costs make it impractical. Deepseek API pricing is not a suitable analogue because it's not local.

Outsourcing plus local AI will soon become more economical vs. frontier labs
