On the hardware side, you can run some benchmarks on the machine (or use other people's benchmarks) and get an idea of the tokens/second you can get from it. Normalize that for your usage pattern (and do your best to implement batch processing where you can, which will save you money under both approaches) and you have a basic idea of the cost per token.
Then you compare that to the cost of something like GPT-5, which is simpler because the cost per million tokens is something you can grab straight off a website.
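Back-of-envelope version of that comparison (every number below is a placeholder; substitute your own benchmark results, hardware price, and electricity rate):

    # All numbers are illustrative assumptions, not quotes.
    hw_cost_usd = 40_000      # server purchase price
    lifetime_h = 3 * 8760     # amortize over ~3 years of 24/7 operation
    power_kw = 2.0            # measured draw under load
    usd_per_kwh = 0.15
    tok_per_s = 2_500         # your batched benchmark throughput

    usd_per_hour = hw_cost_usd / lifetime_h + power_kw * usd_per_kwh
    usd_per_mtok = usd_per_hour / (tok_per_s * 3600) * 1e6
    print(f"self-hosted: ${usd_per_mtok:.2f} per million tokens")  # ~$0.20 here

The cloud side of the comparison is then just the published per-million-token price.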
You'd be surprised how much money running something like DeepSeek (or if you prefer a more established company, Qwen3) will save you over the cloud systems.
That's just one factor though. Another is what hardware you can actually run things on. DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.
Or with somebody else's.
If you don't have strict data residency requirements, and if you aren't doing this at an extremely large scale, doing it on somebody else's hardware makes much more economic sense.
If you use MoE models (all modern >70B models are MoE), GPU utilization increases with batch size. If you don't have enough requests to keep the GPUs properly fed 24/7, those GPUs will end up underutilized.
Sometimes underutilization is okay, if your system needs to be airgapped for example, but that's not an economics discussion any more.
Unlike e.g. video streaming workloads, LLMs can be hosted on the other side of the world from where the user is, and the difference is barely going to be noticeable. This means you can keep GPUs fed by bringing in workloads from other timezones when your cluster would otherwise be idle. Unless you're a large, worldwide organization, that is difficult to do if you're using your own hardware.
Isn't that true for any LLM, MoE or not? In fact, doesn't it apply to most concepts within ML? As long as it's possible to do batching at all, you can scale it up and utilize more of the GPU until you saturate some part of the process.
What's cheap nowadays? I'm out of the loop. Does anything run on the integrated AMD Ryzen AI chips that come in Framework motherboards? Is under $1k American cheap?
[1] https://youtube.com/@digitalspaceport?si=NrZL7MNu80vvAshx
When used with crush/opencode they are close to Claude performance.
Nothing that runs on a 4090 would compete, but DeepSeek on OpenRouter is still 25x cheaper than Claude.
Is it? Or only when you don’t factor in Claude cached context? I’ve consistently found it pointless to use open models because the price of the good ones is so close to cached context on Claude that I don’t need them.
Things get a lot easier at lower quantisation and higher parameter counts, and there are a lot of people whose jobs for AI are "extract sentiment from text" or "bin into one of these 5 categories", where that's probably fine.
And without specifying your quantization level it's hard to know what you mean by "not usable"
Anyway, if you really wanted to try cheap distilled/quantized models locally, you would be using used Tesla V100s, not 4-year-old single-chip gaming GPUs.
Uh, DeepSeek will not (unless you are referring to one of their older R1 finetuned variants). Any flagship DeepSeek model will require 16x A100/H100+ with NVL in FP8.
We can't judge on training cost, that's true.
You are right that we can directly observe the cost of inference for open models.
A few days ago I read an article saying Chinese utilities have a pricing structure that favors high-tech industries (say, an AI data center), making up the difference by charging the energy-intensive but less sophisticated industries (an aluminium smelter, for example) more.
Admittedly, there are some advantages when you do central and long-term economic planning.
That's how it's supposed to work.
I hope it does, though I'm doubtful because distribution is important. You can't beat "ChatGPT" as a brand in laypeople's minds (unless perhaps you give them a massive "Temu: Shop Like A Billionaire" commercial campaign).
Closed source AI is almost by design morphing into an industrial, infrastructure-heavy rocket science that commoners can't keep up with. The companies pushing it are building an industry we can't participate or share in. They're cordoning off areas of tech and staking ground for themselves. It's placing a steep fence around tech.
I hope every such closed source AI effort is met with equivalent open source and that the investments made into closed AI go to zero.
The most likely outcome is that Google, OpenAI, and Anthropic win and every other "lab"-shaped company dies an expensive death. RunwayML spent hundreds of millions and they're barely noticeable now.
These open source models hasten the deaths of the second tier also-ran companies. As much as I hope for dents in the big three, I'm doubtful.
Even when the technical people understood that, it would be too much of a political quagmire within their company when it became known to the higher ups. It just isn’t worth the political capital.
They would feel the same way about using xAI or maybe even Facebook models.
https://sg.finance.yahoo.com/news/airbnb-picks-alibabas-qwen...
The fact that it's customer service means it's dealing with text entered by customers, which has privacy and other consequences.
So no, it's not "pretty inconsequential". Many more companies fit a profile like that than whatever arbitrary criteria you might have in mind for "consequential".
2020 - I was a mid level (L5) cloud consultant at AWS with only two years of total AWS experience and that was only at a small startup before then. Yet every customer took my (what in hindsight might not have been the best) advice all of the time without questioning it as long as it met their business goals. Just because I had @amazon.com as my email address.
Late 2023 - I was the subject matter expert in a niche of a niche in AWS that the customer focused on and it was still almost impossible to get someone to listen to a consultant from a shitty third rate consulting company.
2025 - I left the shitty consulting company last year after only a year and now work for one with a much better reputation and I have a better title “staff consultant”. I also play the game and be sure to mention that I’m former “AWS ProServe” when I’m doing introductions. Now people listen to me again.
All tech companies offering free services.
I'm not saying this to insult the technical capabilities of Uber. But it doesn't have the economics that most tech companies have: high fixed costs and very low marginal costs. Uber has high marginal costs; saving a little on inference isn't going to make a difference.
Obviously, some US brands do not compete on price, but other than maybe Jeep and Tesla, those have a small market penetration.
All the clouds compete on price. Do you really think it is that differentiated? Google, Amazon and Microsoft all offer special deals to sign big companies up and globally too.
Microsoft doesn't compete on price. Their major competitive advantage is that Big Enterprise is already big into Microsoft and it's much easier to get them onto Azure. They compete on price only when it comes to making Windows workloads and SQL Server cheaper than running on other providers.
AWS is the default choice for legacy reasons, and it definitely has services and offerings that Google doesn't have. I have never once been on a sales call where the salesperson emphasizes that AWS is cheaper.
As far as GCP, they are so bad at enterprise sales, we never really looked at them as serious competition.
Sure AWS will throw credits in for migrations and professional services both internally and for third party partners. But no CFO is going to look at just the short term credits.
Despite all that and whatever you say, the fact is you do compete. It doesn't have to be a race to the bottom.
So the CloudFront free tier and the latest discount bundles etc. aren't there to compete? People have also negotiated private pricing way below list price and a lot cheaper than competitors.
Similarly, were the DynamoDB price cuts not due to competition?
I can give way more examples...
All technology gets cheaper over time. There is a difference between lowering price in response to competitors and finding the profit maximizing price based on supply and demand.
AWS was lowering prices to increase demand before GCP and Azure were a thing.
Jassy said, right before he became CEO of Amazon while he was still over AWS, that only 5% of IT spend was on any cloud provider. They are capturing non-consumption and marketing the value of AWS vs. that.
While I don’t have any insider experience about Azure, looking on the outside, I would think that Azure’s go to market is also not competing against AWS on price, but trying to get on prem customers on Azure.
And most startups are just doing prompt engineering that will never go anywhere. The big companies will just throw a couple of developers at the feature and add it to their existing business.
Before that I spent 6 years working between 3 companies in health care in a tech lead role. I'm 100% sure that any of those companies would have immediately questioned my judgment for suggesting DeepSeek, had it been a thing.
Absolutely none of them would ever have touched DeepSeek.
If you'd spent any time working at one as a SWE, you wouldn't have access to popular open source frameworks, let alone Chinese LLMs. The LLM development is mostly occurring through collaborations with the regional LLM businesses or internal labs.
https://www.ecfr.gov/current/title-17/chapter-II/part-240/su...
Note: I am neither a lawyer nor in financial circles, but I do have an interest in the effects of market design and regulation as we get into a more deeply automated space.
https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-mode...
Of course you'll always have exceptions (government, military, etc.), but in the private sector, the winner will take it all.
Any kind of hardware that is somehow connected to the wired or wireless communication interfaces is much more dangerous than any software.
Backdoors embedded in such hardware devices may be impossible to identify before being activated by the reception of some "magic" signals from outside.
Companies just need to get to the "if" part first. That, or they wash their hands by using a reseller that can use whatever it wants under the hood.
Although I did just check which AWS Bedrock regions support DeepSeek, and their GovCloud regions do not, so that's a good reason not to use it. Still, on-prem on a segmented network, following CMMC, it's probably permissible.
Chinese models generally aren't but DeepSeek specifically is at this point.
Well for non-American companies, you have the choice between Chinese models that don't send data home, and American ones that do, with both countries being more or less equally threatening.
I think if Mistral can just stay close enough to the race it will win many customers by not doing anything.
I'm not sure if technical people who don't understand this deserve the moniker technical in this context.
American companies chose to manufacture in China and got all surprised-Pikachu when China manufactured copies for themselves.
https://www.reuters.com/world/europe/us-security-agency-spie...
American intelligence has penetrated most information systems and at least as of 10 years ago, was leading all other nations in the level of sophistication and capability. Read Edward Snowden.
Rather, I'd say it speaks more to how deranged the post-Snowden/anti-neocon figures have become, going from critiquing creeping authoritarianism to functionally acting at the behest of an even more authoritarian regime. The funny thing is that this behavior of deflection, moralizing and whataboutism is exactly the kind of behavior nationalists employ, not addressing arguments head-on like the so-called "American nationalists".
Now on the HN frontpage: "Google Antigravity just wiped my hard drive"
Sure going to be hard to distinguish these Chinese models' "intentionally malicious actions"!
And the cherry on top:
- Written from my iPhone 16 Pro Max (Made in China)
Even if China did manage to embed software on the iPhone in Taiwan, it would soon hopefully be wiped since you usually end up updating the OS anyway as soon as you activate it.
You should remember that for several years all iPhones had an undetected hardware backdoor, until a couple of years ago, when independent researchers found it and reported the Apple bugs as CVEs, so Apple was forced to fix the vulnerabilities.
The hardware backdoor was that writing some magic values to some supposedly unused addresses allowed bypassing all memory protections. It likely consisted of memory test registers, which are used during manufacturing but should be disabled before shipping the phone to customers, which Apple had not done.
This hardware backdoor, coupled with some bugs in a few Apple system libraries, allowed knowledgeable attackers to remotely send an invisible message to the iPhone that took complete control of it, allowing the attacker to read any file and record from the cameras and microphones. A reboot of the iPhone removed the remote control, but then the attacker would immediately send another invisible message, regaining control.
There was no way to detect that the iPhone was remotely controlled. The backdoor was discovered only externally in the firewalls of a company, because the iPhones generated a suspiciously high amount of Internet traffic, without apparent causes.
This was widely reported at the time and discussed on HN, but some people remain unaware of how little you can trust even major companies like Apple to deliver the right hardware.
The identity of the attackers who exploited this Apple hardware backdoor has not been revealed, but it is likely that they had needed the cooperation of Apple insiders, at least for access to secret Apple documentation, if not for intentionally ensuring that the hardware backdoor remained open.
Thus the fact that Apple publishes only incomplete technical documentation has helped only the attackers, allowing them to remain undiscovered for many years, against the interests of the Apple customers. Had the specifications of the test registers been public, someone would have quickly discovered that they had remained unprotected after production.
Therefore, for many years the iPhones of certain valuable targets had magically intercepted all their communications and they have sent them to an unknown country (due to the nature of some of the identified targets and the amount of resources required to carry the attacks, it has been speculated that the country could have been Israel, but no public evidence exists; a US TLA is the main plausible alternative, as some targets were Russians).
Put another way, how do you propose to tell this subtle, nefarious Chinese sabotage you baselessly imply to be commonplace from the very real limitations of this technology in the first place?
This makes EU countries more reliable partners for cooperation than China. The same goes for the US from an European perspective, and even with everything going on over there it is still not remotely close.
All states are fundamentally adversaries because they have conflicting interests. To your point however, adversaries do indeed cooperate all the time.
Please don't engage in political battle here, including singling out a country for this kind of criticism. No matter how right you are or feel you are, it inevitably leads to geopolitical flamewar, which has happened here.
Remember when the US bugged EU leaders' phones, including Merkel's from 2002 to 2013?
Please don't be snarky or condescending in HN comments. From the guidelines: Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.
The topic itself, like any topic, is fine to discuss here, but care must be taken to discuss it in a de-escalatory way. The words you use and the way you use them matter.
Most importantly, it's not OK to write "it is however entirely reasonable to assume that the comment I replied to was made entirely in bad faith". That's a swipe and a personal attack that, as the guidelines ask, should be edited out.
Other than that let's be very clear that there was no personal attack. You left out the part where I explain why I think the comment was made in bad faith. I.e. the part that makes it not a personal attack. And a part which I, upon request, elaborated on in the same comment tree.
As you said: Words matter.
And yes I am a moderator and it's my role to prevent flamewars and to encourage everyone to raise the standard of discourse here. In my comment I was trying to convey that multiple comments of yours were crossing too far into political battle and personal attack, and here are the main instances:
> That is just objectively incorrect, and fundamentally misunderstanding the basics of statehood
This counts as a personal swipe, and as fulminating.
> It is however entirely reasonable to assume that the comment I replied to was made entirely in bad faith
People can be mistaken or wrong, or just of a different opinion/assessment, without acting “entirely in bad faith”.
> "Baselessly" - I'm sorry but realpolitik is plenty of basis. China is a geopolitical adversary of both the EU and the US. And China will be the first to admit this, btw.
This is phrased in a snarky way.
The points you've made are fine to make, but the way you make them matters. Snarkiness, swipes, put-downs, accusations of bad faith (giving your reason "why" you think it was in bad faith doesn't make it OK) are all clearly against the guidelines.
I can accept that you didn't mean to break the guidelines, which is why I've politely asked you to familiarise yourself with them and try harder to follow them in future. It's a request not a scolding. It's not necessary to announce you want to quit HN in protest. (Though of course, eventually we would rather people leave if they prefer not to follow the guidelines.) Just making an effort to respect the guidelines and the HN community would be great.
I have not made a single personal swipe in this entire comment tree. I have stated (implied) that certain views are not consistent with a cursory introduction to the topic at hand.
I absolutely assumed a basic familiarity with the concept of a state from a comment on the relationship between states. That is good faith and basic respect for the human you are conversing with as I view it.
Overall, I have kept a tone I would prefer be kept towards myself; fake politeness is just condescending.
That being said: Your site, your rules, and your power to arbitrarily interpret and enforce said rules. I.e., message received, regardless of my thoughts on your interpretation of events.
We don't want you to be fake. We just want you to make the effort to share your perspective in a way that is kind and is conducive to curious conversation, which is HN's primary objective. We know it can be hard to get this right when commenting on the internet. It's common for people to underestimate how hostile their words can come across to others, when they seem just like reasonable, matter-of-fact statements when formulated in one's own mind.
> That being said: Your site, your rules, and your power to arbitrarily interpret and enforce said rules
That's not really it. The community holds the power here; when we try to override broad community sentiment and expectations, the community pushes back forcefully.
Your comments got my attention because they were attracting flags and downvotes from the community, and from looking at these comments and earlier ones in your feed, my assessment is "yes, I can see why". (We don't let community sentiment, or "mob rule" win out all the time; we often override flags if we think they're unfair, but in your case, given the pattern we observe over time, we think the community's response is reasonable.)
If you are getting at the fact that it is sometimes beneficial for adversaries to collaborate (e.g., the prisoner dilemma) then I agree. And indeed, both Norway and Sweden would be completely lost if they declared war on the other tomorrow. But it doesn’t change the fundamental nature of the relationship.
There has never been a shred of evidence from security researchers, model analysis, benchmarks, etc. that supports this.
It's a complete delusion in every sense.
And if they did, having a counterweight against corrupt, self-centered US oligarchs/CEOs is actually one of the biggest arguments for a powerful communist (or other-model) world power. The US had some of the most progressive tax policies in its existence when it was under existential threat at the height of the USSR, and when that power started to diminish, so too did those tax policies.
> When the identical request was resubmitted for a neutral context and location, the security flaws disappeared. Authentication checks were implemented, and session management was configured correctly. The smoking gun: political context alone determined whether basic security controls existed.
Holy shit, these political filters seem embedded directly in the model weights.
I don't know if I trust China or X less in this regard.
Nobody is winning in this area until these things run, in full, on single graphics cards, which is sufficient compute for even most complex tasks.
You already have agents, that can do a lot of "thinking", which is just generating guided context, then using that context to do tasks.
You already have Vector Databases that are used as context stores with information retrieval.
Fundamentally, you can have the same exact performance on a lot of tasks whether all the information exists in the model, or you use a smaller model with a bunch of context around it for guidance.
So instead of wasting energy and time encoding the knowledge into the model, making it large, you could have an "agent-first" model along with just files of vector databases. The model can fit on a single graphics card, take the question, decide which vector DB it wants to load, and then essentially answer the question the same way. At $50 per TB of SSD, not only do you gain massive cost efficiency, but you also gain the ability to run a lot more inference cheaply, which can be used for refining things, background processing, and so on.
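A minimal sketch of that flow (embed() and generate() are stand-ins for whatever embedding model and single-GPU LLM you actually run; the file names and topics are made up):

    import json
    import numpy as np

    def embed(text):       # plug in your embedding model here
        raise NotImplementedError

    def generate(prompt):  # plug in your small "agent-first" model here
        raise NotImplementedError

    # Each "knowledge pack" is just chunk embeddings + texts sitting on cheap SSD.
    packs = {
        "biology": ("bio_vecs.npy", "bio_chunks.json"),
        "law": ("law_vecs.npy", "law_chunks.json"),
    }

    def answer(question, k=5):
        q = embed(question)
        # Cheap routing step: pick the pack whose centroid best matches the query.
        def centroid_score(files):
            vecs = np.load(files[0], mmap_mode="r")  # stays on disk until touched
            return vecs.mean(axis=0) @ q
        topic = max(packs, key=lambda t: centroid_score(packs[t]))
        vecs = np.load(packs[topic][0])
        chunks = json.load(open(packs[topic][1]))
        top = np.argsort(vecs @ q)[-k:]              # top-k most similar chunks
        ctx = "\n\n".join(chunks[i] for i in top)
        return generate(f"Context:\n{ctx}\n\nQuestion: {question}")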
In any case, models are useful, even when they don't hit these efficiency targets you are projecting. Just like cars are useful, even when they are bigger than a pack of cards.
It's also not a matter of it working or not. It already works. Take a small model that fits on a GPU, with a large context window, like Gemma 27B or smaller, give it a whole bunch of context on the topic, and ask it questions; it will generate very accurate results based on the context.
So instead of encoding everything into the model itself, you can just take training data, store it in vector DBs, and train a model to retrieve that data based on query, and then the rest of it is just training context extraction.
Oh, be more creative. One simple way to make money off your idea is:
(1) Get a hedge fund to finance your R&D.
(2) Hedge fund shorts AI cloud providers and other relevant companies.
(3) Your R&D pans out and the AI cloud providers' stock tanks.
(4) The hedge fund makes a profit.
Though I don't understand: wouldn't your idea work when served from the cloud, too? If what you are saying is true, you'd provide a better service at lower cost.
However, the issue with "funding" isn't as simple as that statement above. Remember, modern funding is not about value, it's about hype. There is a reason CEOs like Jensen say that if they could go back in time, they would never start their companies knowing the bullshit they'd have to walk through.
I've also had my fair share of experiences trying to get startups off the ground. For example, back around 2018, I was working on a system that would take your existing AWS cloud setup and move it all to EC2 instances with self-hosted services, which saved people money in the long run. I had a proof of concept working and everything. The issue I ran into when trying to get funding to build this out into a full-blown product/service, which I didn't realize at the time, is that being on AWS services was, for companies, the equivalent of wearing an expensive business suit to a sales meeting: a fact they would advertise because it was seen as industry standard and created "warm feelings" with their customers. So at most I would get some small-time customers, while getting paid much less.
Now I just work on stuff (and yes, I am working on the issue at hand with existing models) and publish it to GitHub (not gonna share it because I don't want my HN account associated with it). If someone contacts me with a dollar figure, I'm all game.
Nothing shows lack of understanding of the subject matter more than referencing the Dunning Kruger effect in a conversation.
Of course, the smaller models aren't as good at complex reasoning as the bigger ones, but that seems like an inherently-impossible goal; there will always be more powerful programs that can only run in datacenters (as long as our techniques are constrained by compute, I guess).
FWIW, the small models of today are a lot better than anything I thought I'd live to see as of 5 years ago! Gemma3n (which is built to run on phones[2]!) handily beats ChatGPT 3.5 from January 2023 -- rank ~128 vs. rank ~194 on LLMArena[3].
[1] https://blogs.novita.ai/what-are-the-requirements-for-deepse...
[2] https://huggingface.co/google/gemma-3n-E4B-it
[3] https://lmarena.ai/leaderboard/text/overall
No. They released a distilled version of R1 based on a Qwen 32b model. This is not V3, and it's not remotely close to R1 or V3.2.
We're around 35-40 orders of magnitude from computers now to computronium.
We'll need 10-15 years before handheld devices can run a couple terabytes of RAM, 64-128 terabytes of storage, and 80+ TFLOPS. That's enough to run any current state-of-the-art AI at around 50 tokens per second. But in 10 years we're probably going to have seen lots of improvements, so I'd guess conservatively you're going to see 4-5x performance per parameter, possibly much more, so at that point you'll have the equivalent of a model with 10T parameters today.
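Rough math behind the ~50 tokens/sec figure, assuming decode speed is memory-bandwidth bound and an MoE with about 40B active parameters at one byte each (all assumptions):

    active_params = 40e9     # active params per token (assumed MoE)
    bytes_per_param = 1      # FP8/INT8 weights
    bandwidth = 2e12         # 2 TB/s device memory bandwidth (assumed)

    # Each generated token must stream the active weights through memory once.
    tok_per_s = bandwidth / (active_params * bytes_per_param)
    print(tok_per_s)         # -> 50.0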
If we just keep scaling and there are no breakthroughs, Moore's law gets us through another century of incredible progress. My default assumption is that there are going to be lots of breakthroughs, and that they're coming faster, and eventually we'll reach a saturation of research and implementation; more, better ideas will be coming out than we can possibly implement over time, so our information processing will have to scale, and it'll create automation and AI development pressures, and things will be unfathomably weird and exotic for individuals with meat brains.
Even so, in only 10 years and steady progress we're going to have fantastical devices at hand. Imagine the enthusiast desktop - could locally host the equivalent of a 100T parameter AI, or run personal training of AI that currently costs frontier labs hundreds of millions in infrastructure and payroll and expertise.
Even without AGI that's a pretty incredible idea. If we do get to AGI (2029 according to Kurzweil) and it's open, then we're going to see truly magical, fantastical things.
What if you had the equivalent of a frontier lab in your pocket? What's that do to the economy?
NVIDIA will be churning out chips like crazy, and we'll start seeing the solar system measured in terms of average cognitive FLOPS per gram, and be well on the way toward system scale computronium matrioshka brains and the like.
Intel struggled for a decade, and folks think that means Moore's law died. But TSMC and Samsung just kept iterating. And hopefully Intel's 18A process will see them back in the game.
I suspect many people conflated Dennard scaling with Moore's law and the demise of Dennard scaling is what contributes to the popular imagination that Moore's law is dead: frequencies of processors have essentially stagnated.
Chiplets and advanced packaging are the latest techniques improving scaling and yield keeping Moore alive. As well as continued innovation in transistor design, light sources, computational inverse lithography, and wafer scale designs like Cerebras.
Of course, feature size (and thus chip size) and cost are intimately related (wafers are a relatively fixed cost). And related as well to production quantity and yield (equipment and labor costs divide across all chips produced). That the whole thing continues scaling is non-obvious, a real insight, and tantamount to a modern miracle. Thanks to the hard work and effort of many talented people.
Wikipedia quotes it as:
> The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years.
But I'm fairly sure, if you graph how many transistors you can buy per inflation adjusted dollar, you get a very similar graph.
https://imgur.com/a/UOUGYzZ - had chatgpt whip up an updated chart.
LoAR shows remarkably steady improvement. It's not about space or power efficiency, just ops per $1000, so transistor counts served as a very good proxy for a long time.
There's been sufficiently predictable progress that 80-100 TFLOPS in your pocket by 2035 is probably a solid bet, especially if a fully generative AI OS and platform catches on as a product. The LoAR frontier for compute in 2035 is going to be more advanced than the limits of prosumer/flagship handheld products like phones, so there's a bit of lag and variability.
Not sure about the stated FLOPS, but I suspect we'll find that AI doesn't need that much compute to begin with.
Well, these days people have the equivalent of a frontier lab from perhaps 40 years ago in their pocket. We can see what that has done to the economy, and try to extrapolate.
The current models are simply inefficient for their capability in how they handle data.
if you base your life on Kurzweil's hard predictions you're going to have a bad time
Infrastructure owners with access to the cheapest energy will be the long run winners in AI.
According to Google (or someone at Google), no organization has a moat on AI/LLMs [1]. But that does not mean it isn't hugely profitable to provide it as SaaS even if you don't own the model, or as Model as a Service (MaaS). The extreme example is Amazon providing MongoDB API and services. Sure, they have their own proprietary DynamoDB, but for most people a scaled-up MongoDB more than suffices. Regardless of the brand or type of database being used, you pay tons of money to Amazon anyway to be at scale.
Not everyone has the resources to host a SOTA AI model. On top of tangible data-intensive resources, there are other intangible considerations. Just think how many companies or people host their own email server now, even though the resources needed are far less than hosting an AI/LLM model.
Google came up with the game-changing transformer in its own backyard, and OpenAI temporarily stole the show with the well-executed RLHF-based system of ChatGPT. Now the paid users are swinging back to Google with its arguably superior offering. Google now even puts an AI summary at the top of its search results, free to all, above its paid advertisement clients.
[1]Google “We have no moat, and neither does OpenAI”:
Improvements seem incremental and smaller. For all I care, I could still happily use sonnet 3.5.
Quality was rarely the reason open source lagged in certain domains. Most of the time, open source solutions were technically superior. What actually hurt open source were structural forces, distribution advantages, and enterprise biases.
One could make an argument that open source solutions often lacked good UX historically, although that has changed drastically the past 20 years.
Yes, Ardour. It’s no more a toy than KiCad or Blender.
Why would a company use an expensive proprietary model on Vertex AI, for example, when they could use an open-source one on Vertex AI that is just as reliable for a fraction of the cost?
I think you are getting at the idea of branding, but branding is different from security or reliability.
And then we have to look at responsiveness: if the two models are qualitatively in the same ballpark, which one runs faster?
Which is an interesting point in favour of the human employee, as you can only consolidate scapegoats so far up the chain before saying "it was the AI's fault" just looks like negligence.
Better (UX / ease of use)
Lock in (walled garden type thing)
Trust (If an AI is gonna have the level of insight into your personal data and control over your life, a lot of people will prefer to use a household name)
Not Google, and not Amazon. Microsoft is a maybe.
All they have to do is completely switch the google homepage to gemini one day.
In the developed world. I'm not sure about globally.
ChatGPT feels better to use, has the best implementation of memory, and is the best at learning your preferences for the style and detail of answers.
So a couple of things. There are going to be a handful of companies in the world with the infrastructure footprint and engineering org capable of running LLMs efficiently and at scale. You are never going to be able to run open models in your own infra in a way that is cost competitive with using their API.
Competition _between_ the largest AI companies _will_ drive API prices to essentially 0 profit margin, but none of those companies will care, because they aren't primarily going to make money by selling the LLM API; your usage of their API just subsidizes their infrastructure costs, and they'll use that infra to build products like ChatGPT and Claude. Those products are their moat and will be where 90% of their profit comes from.
I am not sure why everyone is so obsessed with "moats" anyway. Why does gmail have so many users? Anybody can build an email app. For the same reason that people stick with gmail, people are going to stick with chatgpt. It's being integrated into every aspect of their lives. The switching costs for people are going to be immense.
Google would love a cheap, high-quality model on its surfaces. That just helps Google.
All the facts say otherwise to your thoughts here.
Given how often new models come out, it’s also easier to update an API call than constantly deploying model upgrades.
But in the long run, I hope open source wins out.
They won't. Actually, even if open models aren't competitive, they still won't. Hasn't this been clear for a while already?
There's no moat in models; investment in pure models has only been to chase AGI. All other investment (the majority, from Google, Amazon, etc.) has been in products using LLMs, not the models themselves.
This is not like the gold rush where the ones who made good money were the ones selling shovels, it's another kind of gold rush where you make money selling shovels but the gold itself is actually worthless.
For a sufficiently low cost to orbit that may well be found in space, giving Musk a rather large lead. By his posts he's currently obsessed with building AI satellite factories on the moon, the better to climb the Kardashev scale.
Earth based computers benefit from the existence of an atmosphere to pull cold air in from and send hot air out to.
A space data center would need to entirely rely on city sized heat sink fins.
And aluminum is abundant in the lunar crust.
If launch costs are cheap enough, you can bring aluminum up from earth.
But once your in-space economy is developed enough, you might want to tap the moon or asteroids for resources.
That was fun to write but also I am generally on board with humanity pushing robotics further into space.
I don't think an orbital AI datacentre makes much sense as your chips will be obsolete so quickly that the capex getting it all up there will be better spent on buying the next chips to deploy on earth.
Radiative cooling is really annoying, but it's also an engineering problem with a straightforward solution, if mass-in-orbit becomes cheap enough.
The main reason I see for having datacentres in orbit would be if power in orbit becomes a lot cheaper than power on earth. Cheap enough to make up for the more expensive cooling and cheap enough to make up for the launch costs.
Otherwise, manufacturing in orbit might make sense for certain products. I heard there's some optical fibres with superior properties that you can only make in near zero g.
I don't see a sane way to beam power from space to earth directly.
It ignores the reality of humans having memetic emotions, habits, affinities, differentiated use cases & social signaling needs, and the desire to always want to do more...constantly adding more layers of abstraction in fractal ways that evolve into bigger or more niche things.
5 years ago humans didn't know a desire for gaming GPUs would turn into AI. Now it's the fastest growing market.
Ask yourself: how did Google Search continue to make money after Bing's search results started benchmarking just as good?
Or: how did Apple continue to make money after Android opened up the market to commoditize mobile computing?
Etc. Etc.
But one of the core ideas of Marx's conception of history is that human needs, wants, and human nature itself are constantly in a state of change; that those needs and desires are in large part a product of the environment in which you live; and, further, that humans and human society in turn change their own environments, which in turn change human nature itself.
Interesting, though, that per the very same article someone like Adam Smith concurred empirically with Marx's observation on the titular tendency of rates of profit to fall. This suggests to me it likely had some meat to it.
Basically dude thought the competition inherent in capitalism would cause all profit to be competed to zero leading to an eventual 'crisis' and collapse of the capitalist means of production.
Implicit in this assumption is the idea that the things humans need and want changes/evolves in a predictable way, and not in a chaotic/fractal/reflexive way (which is what actually happens).
An eventual static basket of desired goods would be the only mechanism by which competition ever could compete profits to zero. If the basket is dynamic/reflexive/evolving, there's constantly new gaps opening between human desires and market offerings to arbitrage for profit. You can just look at the average profit margins of S&P500 companies over time to see they are not falling.
The further we get from subsistence worries (Adam Smith's invisible hand has pulled virtually the entire globe out of living in the dirt), the more divergent and higher abstraction these wants and needs become, and hence the profit opportunities are only increasing -- which is how the economy grows (no, it's not a fixed pie, another Marxian fallacy).
hopefully they won't
and their titanic off-balance sheet investments will bankrupt them as they won't be able to produce any revenue
The best innovation always happens before being tainted by investment.
However in another way the US probably offers more free inference than China. What good is an open 600 billion parameter model to a poor person? A free account with ChatGPT might be more useful to them, though also more exploitative.
Greed and "safety" hysteria.
At first, I thought they had gone the route of implementing yet another chat format that can handle more dynamic conversations like that, instead of just using Harmony, but looking at the syntax, doesn't it look exactly like Harmony? That's a good thing, don't get me wrong, but why not mention straight up that they've implemented Harmony, so people can already understand up front that it's compatible with whatever parsing we're using for GPT-OSS?
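For reference, a Harmony-formatted exchange renders roughly like this (reconstructed from memory; check the openai/harmony spec for the exact special tokens and the <|return|>/<|end|> distinction):

    <|start|>user<|message|>What is 2+2?<|end|>
    <|start|>assistant<|channel|>analysis<|message|>Simple arithmetic.<|end|>
    <|start|>assistant<|channel|>final<|message|>4<|end|>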
Frontier models are far exceeding even the most hardcore consumer hobbyist requirements. This is even further
IIRC the 512GB Mac Studio is about $10k.
(commentary: things are really moving too fast for the layperson to keep up)
Typically, input gets routed to a small number of experts, e.g. the top 2, leaving the others inactive. This reduces the activation/processing requirements.
Mixtral, from Mistral, is an example of a model designed like this. Clever people have created converters to transform dense models into MoE models. These days many popular models are also available in MoE configurations.
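A toy sketch of that routing step (numpy; in real models the gate weights are learned and this happens per layer, per token):

    import numpy as np

    def moe_forward(x, gate_w, experts, k=2):
        """x: (d,) token activation; gate_w: (d, n_experts); experts: list of fns."""
        logits = x @ gate_w
        top = np.argsort(logits)[-k:]            # pick the k highest-scoring experts
        w = np.exp(logits[top] - logits[top].max())
        w /= w.sum()                             # softmax over just the chosen experts
        # Only these k experts execute; the rest stay idle for this token.
        return sum(wi * experts[i](x) for wi, i in zip(w, top))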
I'm not saying it is a perfect analogy, but it is by far the most familiar one for people to describe what sparse activation means. I'm no big fan of over-reliance on biological metaphor in this field, but I think this is skewing a bit on the pedantic side.
re: your second comment about pruning, not to get in the weeds but I think there have been a few unique cases where people did lose some of their brain and the brain essentially routed around it.
MoE is not the holy grail; there are drawbacks, e.g. less consistency and expert under-/over-use.
https://www.youtube.com/watch?v=zwHqO1mnMsA
I wonder how well the aftermarket memory surgery business on consumer GPUs is doing.
I want one. Hot air blows.
This will absolutely scar, if not char, your cornea faster than you can blink.
There is nothing special about "lasing power." It amounts to a 45-watt light bulb, nothing more and nothing less.
Of course the laser is tightly focused. That's pretty much one of the defining properties of laser devices. How else do you think the laser is heating the microprocessors in the video?
They shouldn't be focusing it to a point under any conditions. Whether it's as safe as it could be is a different question, of course. For instance, you'd like to think that the act of configuring it for a smaller beam footprint would reduce the power at the same time, as opposed to requiring a separate adjustment that might be overlooked by the operator. Would have been nice if the video had addressed that and other safety considerations, for sure.
A lot depends on the exact wavelength. 1400 nm and longer is much less worrisome than near-visible IR.
The laser is collimated but not focused so by your logic it will be fine.
This is advice on par with eating tide pods.
About all we can agree on, I think, is that neither of us knows enough about the product to argue about it usefully.
Unlike you I do know what I'm talking about.
I feel like because you didn't actually talk about prompt processing speed or token/s, you aren't really giving the whole picture here. What is the prompt processing tok/s and the generation tok/s actually like?
Maybe my 6000 Pro spoiled me, but for actual usage, 6 or even 9 tok/sec is too slow for a reasoning/thinking model. To be honest, kind of expected on CPU though. I guess it's cool that it can run on Apple hardware, but it isn't exactly a pleasant experience at least today.
But again, not if you're using thinking/reasoning, which if you want to use this specific model properly, you are. Then you have a huge delay before the actual response comes through.
> MacStudio is the simplest solution to run it locally.
Obviously, that's Apple's core value proposition after all :) One does not acquire a state-of-the-art GPU and then expect simple stuff, especially when it's a fairly uncommon and new one. You cannot really be afraid of diving into CUDA code and similar fun rabbit holes. Simply two very different audiences for the two alternatives, and the Apple way is the simpler one, no doubt about it.
There is consumer-ish hardware that can run large models like DeepSeek 3.x, slowly. If you're using LLMs for a specific purpose that is well served by a particular model, you don't want to risk AI companies deprecating it in a couple of months and pushing you to a newer model (that may or may not work better in your situation).
And even if the AI service providers nominally use the same model, you might have cases where reproducibility requires you use the same inference software or even hardware to maintain high reproducibility of the results.
If you're just using OpenAI or Anthropic you just don't get that level of control.
Which ones? I wanted to try a large base model for automated literature (fine-tuned models are a lot worse at it) but I couldn't find a provider which makes this easy.
https://docs.cloud.google.com/vertex-ai/generative-ai/docs/m...
Lambda.ai used to offer per-token pricing but they have moved up market. You can still rent a B200 instance for sub $5/hr which is reasonable for experimenting with models.
https://app.hyperbolic.ai/models Hyperbolic offers both GPU hosting and token pricing for popular OSS models. It's easy with token-based options because they're usually a drop-in replacement for OpenAI API endpoints.
You have to rent a GPU instance if you want to run the latest or custom stuff, but if you just want to play around for a few hours it's not unreasonable.
> https://docs.cloud.google.com/vertex-ai/generative-ai/docs/m...
I don't see any large base models there. A base model is a pretrained foundation model without fine tuning. It just predicts text.
> Lambda.ai used to offer per-token pricing but they have moved up market. You can still rent a B200 instance for sub $5/hr which is reasonable for experimenting with models.
A B200 is probably not enough: it has just 192 GB RAM while DeepSeek-V3.2-Exp-Base, the base model for DeepSeek-V3.2, has 685 billion BF16 parameters. Though I assume they have larger options. The problem is that all the configuration work is then left to the user, which I'm not experienced in.
> https://app.hyperbolic.ai/models Hyperbolic offers both GPU hosting and token pricing for popular OSS models
Thanks. They do indeed have a single base model: Llama 3.1 405B BASE. This one is a bit older (July 2024) and probably not as good as the base model for the new DeepSeek release. But that might be the best one can do, as there don't seem to be any inference providers which have deployed a DeepSeek or even Kimi base model.
I feel like private cloud instances that run on demand is still in the spirit of consumer hobbyist. It's not as good as having it all local, but the bootstrapping cost plus electricity to run seems prohibitive.
I'm really interested to see if there's a space for consumer TPUs that satisfy usecases like this.
https://openrouter.ai/deepseek/deepseek-v3.2
This only bolsters your point. Will be interesting to see if this changes as the model is adopted more widely.
"create me a svg of a pelican riding on a bicycle"
I’m surprised that a fixed size k doesn’t experience degrading performance in long context windows though. That’s a _lot_ of responsibility to push into that indexing function. How could such a simple model achieve high enough precision and recall in a fixed size k for long context windows?
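Heavily simplified sketch of how I read the mechanism (the indexer is a learned, cheap scorer; everything here is illustrative):

    import numpy as np

    def sparse_attention(q, K, V, iq, IK, k=2048):
        """q: (d,) query; K, V: (seq, d); iq, IK: low-dim indexer projections."""
        keep = np.argsort(IK @ iq)[-k:]          # indexer picks a fixed-size top-k set
        s = K[keep] @ q / np.sqrt(q.shape[-1])   # exact attention, but only over k keys
        s = np.exp(s - s.max())
        return (s / s.sum()) @ V[keep]

If the indexer's recall drops over long contexts, whatever it fails to surface is simply invisible to the attention step, which is exactly the worry above.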
I am still happy to pay Google because of their ecosystem or Gemini app, NotebookLM, Colab, gemini-cli, etc. Google’s moat for me is all the tooling and engineering around the models.
That said, my one-year Google AI subscription ends in four months and I might try an alternative, or at least evaluate options. Alibaba Cloud looks like an interesting low-cost alternative to AWS for building systems. I am a retired 'gentleman scientist' now and my personal research is inexpensive no matter who I pay for inference compute, but it is fun to spend a small amount of time evaluating alternatives, even though mostly using Google is time efficient.
It refuses to tell me when asked, only that it's been trained with data up until July 2024, which would make it quite old. I turned off search and asked it for the winner of the 2024 US election, and it said it didn't know, so I guess that confirms it's not a recent model.
The pdf describes how they did "continued pre-training" and then post training to make 3.2. I guess what's missing is the full pre-training that absorbs most date sensitive knowledge. That's probably also the reason that the versions are 3.x still.
I don't know, but GPT seems to have regressed a lot, at least the free version.
1. Chinese models typically focus on text. US and EU models also bear the cross of handling images, and often voice and video. Supporting all of those is additional training cost not spent on further reasoning, tying one hand behind your back to be more generally useful.
2. The gap seems small, because so many benchmarks get saturated so fast. But towards the top, every 1% increase in benchmarks is significantly better.
On the second point, I worked on a leaderboard that both normalizes scores, and predicts unknown scores to help improve comparisons between models on various criteria: https://metabench.organisons.com/
You can notice that, while Chinese models are quite good, the gap to the top is still significant.
However, the US models are typically much more expensive for inference, and Chinese models do have a niche on the Pareto frontier on cheaper but serviceable models (even though US models also eat up the frontier there).
But even assuming that we're on the right path for that (which we may not be) and assuming that nothing intervenes to stop it (which it might), there may be only one winner, and that winner may not have even entered the game yet.
This is what people like Altman want investors to believe. It seems like any other snake oil scam because it doesn't match reality of what he delivers.
Nano Banana Pro starts at 15 cents per image at <2K resolution, and is not strictly better than Seedream 4.0; yet the latter does 4K for 3 cents per image.
Add in the power of fine-tuning on their open weight models and I don't know if China actually needs to catch up.
I finetuned Qwen Image on 200 generations from Seedream 4.0 that were cleaned up with Nano Banana Pro, and got results that were as good and more reliable than either model could achieve otherwise.
It is very good for its size and speed, and I'm excited for the Edit and Base variants... but Reddit has been a bit "over-excited" because it runs on their small GPUs and isn't overly resistant to porn.
Not true at all. Qwen has a VLM (qwen2 vl instruct) which is the backbone of Bytedance’s TARS computer use model. Both Alibaba (Qwen) and Bytedance are Chinese.
Also DeepSeek got a ton of attention with their OCR paper a month ago which was an explicit example of using images rather than text.
Most of the AI-generated videos we see on social media now are made with Chinese models.
The scales are a bit murky here, but if we look at the 'Coding' metric, we see that Kimi K2 outperforms Sonnet 4.5 - that's considered to be the price-perf darling I think even today?
I haven't tried these models, but in general there have been lots of cases where a model performs much worse IRL than the benchmarks would suggest (certain Chinese models and GPT-OSS have been guilty of this in the past).
• For both Kimi K2 and for Sonnet, there's a non-thinking and a thinking version. Sonnet 4.5 Thinking is better than Kimi K2 non-thinking, but the K2 Thinking model came out recently, and beats it on all comparable pure-coding benchmarks I know: OJ-Bench (Sonnet: 30.4% < K2: 48.7%), LiveCodeBench (Sonnet: 64% < K2: 83%), they tie at SciCode at 44.8%. It is a finding shared by ArtificialAnalysis: https://artificialanalysis.ai/models/capabilities/coding
• The reason developers love Sonnet 4.5 for coding, though, is not just the quality of the code. They use Cursor, Claude Code, or some other system such as Github Copilot, which are increasingly agentic. On the Agentic Coding criteria, Sonnet 4.5 Thinking is much higher.
By the way, you can look at the Table tab to see all known and predicted results on benchmarks.
The US labs aren't just selling models, they're selling globally distributed, low-latency infrastructure at massive scale. That's what justifies the valuation gap.
Edit: It looks like Cerebras is offering a very fast GLM 4.6
Opus 4.5 = ~60-80tps https://openrouter.ai/anthropic/claude-opus-4.5
Kimi-k2-think = ~60-180tps https://openrouter.ai/moonshotai/kimi-k2-thinking
Deepseek-v3.2 = ~30-110tps (only 2 providers rn) https://openrouter.ai/deepseek/deepseek-v3.2
It'll probably be a few years before all that stuff becomes as smooth as people need, but OAI and Anthropic are already doing a good job on that front.
Each new Chinese model requires a lot of testing and bespoke conformance to every task you want to use it for. There's a lot of activity and shared prompt engineering, and some really competent people doing things out in the open, but it's generally going to take a lot more expert work getting the new Chinese models up to snuff than working with the big US labs. Their product and testing teams do a lot of valuable work.
I think GLM 4.6 offered by Cerebras is much faster than any US model.
Exactly what I'm thinking. Chinese models are catching up rapidly. Soon they'll be on par with the big dogs.
And the people making the bets are in a position to make sure the banning happens. The US government system being what it is.
Not that our leaders need any incentive to ban Chinese tech in this space. Just pointing out that it's not necessarily a "bet".
"Bet" imply you don't know the outcome and you have no influence over the outcome. Even "investment" implies you don't know the outcome. I'm not sure that's the case with these people?
With caching, the expensive US models end up being only about 2x the price (e.g. Sonnet) and often much cheaper (e.g. GPT-5 mini).
If they start caching, then US companies will be completely outpriced.
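Illustrative math for an agentic coding turn where most of the input is a repeated, cacheable prefix (all prices are assumptions; check current list prices):

    # Assumed list prices, USD per million tokens.
    sonnet_in, sonnet_cached, sonnet_out = 3.00, 0.30, 15.00
    ds_in, ds_out = 0.28, 0.42               # no cache discount assumed

    tokens_in, tokens_out, cache_rate = 100_000, 2_000, 0.95

    sonnet = (tokens_in * ((1 - cache_rate) * sonnet_in + cache_rate * sonnet_cached)
              + tokens_out * sonnet_out) / 1e6
    deepseek = (tokens_in * ds_in + tokens_out * ds_out) / 1e6
    print(f"{sonnet:.3f} vs {deepseek:.3f}")  # ~0.074 vs ~0.029: about 2.5x, not 25x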
The nature of the race may yet change, though, and I am unsure whether the devil is in the details, as in very specific edge cases that will work only with frontier models.
It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; their Bolshevism dooms them; we have the will to power; we will succeed. Even now, when you ask questions like what you ask of that era, the answers you get are genuinely not better than "yes, this should have been obvious at the time if you were not completely blinded by ethnic and especially ideological prejudice."
It might be this model is super good, I haven’t tried it, but to say the Chinese models are better is just not true.
What I really love though is that I can run them (open models) on my own machine. The other day I categorised images locally using Qwen, what a time to be alive.
Further even than local hardware, open models make it possible to run on providers of choice, such as European ones. Which is great!
So I love everything about the competitive nature of this.
For instance, a lot of people thought they were running "DeepSeek" when they were really running some random distillation on ollama.
I ran the 1.58-bit Unsloth quant locally at the time it came out, and even at such low precision, it was super rare for it to get something wrong that o1 and GPT4 got right. I have never actually used a hosted version of the full DS.
Germany was right in some ways and wrong in others about the Soviet Union's strength. The USSR failed to conquer Finland because of the military purges. German intelligence vastly underestimated the number of tanks and the general preparedness of the Soviet army (Hitler was shocked the Soviets already had 40k tanks). The Lend-Lease act sent an astronomical amount of goods to the USSR, which allowed them to fully commit to the war and really focus on increasing their weapons production; the numbers on the tractors, food, trains, ammunition, etc. that the US sent to the USSR are staggering.
This is kinda reflected in the stock market, where the US AI stocks are surging to new heights every day, yet their Chinese equivalents are relatively lagging behind in stock price, which suggests that investors are betting heavily on the US companies to "win" this "AI race" (if there are any gains to be made by winning).
Also, in the past couple of years (or maybe couple of decades), there has been a lot of crap talk about how China has to democratize and free up its markets in order to be competitive with the other first-world countries, together with a bunch of "doomsday" predictions for authoritarianism in China. This narrative has completely lost any credibility, but the sentiment dies slowly...
Stalin had just finished purging his entire officer corps, which is not a good omen for war, and the USSR had failed miserably against the Finns, who were not the strongest of nations, while Germany had just steamrolled France, a country that was much more impressive in WW1 than the Russians (who collapsed against Germany).
Ideology played a role, but the data they worked with was the Finnish war, which was disastrous for the Soviet side. Hitler later famously said it was all an intentional distraction to make them believe the Soviet army was worth nothing. (The real reasons were more complex, like the previous purges.)
Though, because Stalin had decimated the Red Army leadership (including most of the veteran officers who had Russian Civil War experience) during the Moscow-trials purges, the Germans almost succeeded.
There were many counter revolutionaries among the leadership, even those conducting the purges. Stalin was like "ah fuck we're hella compromised." Many revolutions fail in this step and often end up facing a CIA backed coup. The USSR was under constant siege and attempted infiltration since inception.
Well, Stalin was, by far, the biggest counter-revolutionary in the Politburo.
> Stalin was like "ah fuck we're hella compromised."
There's no evidence that anything significant was compromised at that point, and clear evidence that Stalin was in fact clinically paranoid.
> Many revolutions fail in this step and often end up facing a CIA backed coup. The USSR was under constant siege and attempted infiltration since inception.
Can we please not recycle 90-year-old Soviet propaganda? The Moscow trials being irrational self-harm was acknowledged by the USSR leadership as early as the fifties…
America is mostly suburbs and car sewers but that's because the voters like it that way.
DeepSeek is a valuable product, but its open-source nature makes it difficult to displace larger competitors. Any advancements can be quickly adopted, and in fact, it may inadvertently strengthen these companies by highlighting weaknesses in their current strategies.
% curl https://api.deepseek.com/models \
    -H "Authorization: Bearer ${DEEPSEEK_API_KEY}"
{"object":"list","data":[{"id":"deepseek-chat","object":"model","owned_by":"deepseek"},{"id":"deepseek-reasoner","object":"model","owned_by":"deepseek"}]}

(unless/until a new version gets worse for your use case)
I wonder how long it will take and how quickly it will try to get shut down.
Classic examples like:
User: I'm feeling bad
LLM: Have you considered k*****g yourself?
Are a good example of what an LLM "without censorship" looks like: good at predicting the most common sequence of text (e.g. the most common sarcastic reply from Reddit), but effectively useless.

In order to build a useful LLM (i.e. one that actually follows instructions) you need to teach the LLM to prefer the most helpful answer, and that process by itself is already an implicit layer of "censorship": it requires human supervision, and different humans have different perceptions of what the most helpful answer is, especially when their paycheck is conditioned on a list of "corporate values".

You can only pick between a parrot that repeats random text from the Internet, or a parrot lobotomized to follow the orders of its trainers (which occasionally repeats random text from the Internet, because the training isn't perfect).
Unsurprisingly, the lobotomized parrot is more useful to get actual work done, even if it won't tell you what the CIA[1] did to Mexican Students on October 2nd, 1968.
[1]: https://www.bbc.com/mundo/noticias-america-latina-45662739
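For the curious: "teach the LLM to prefer the most helpful answer" usually boils down to training a reward model on human-ranked answer pairs. A minimal PyTorch sketch of the standard pairwise (Bradley-Terry) loss, with made-up numbers standing in for real reward-model outputs:

    import torch
    import torch.nn.functional as F

    def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
        # Push the reward model to score the human-preferred answer
        # above the rejected one -- this is exactly where the trainers'
        # notion of "helpful" (the implicit censorship) enters.
        return -F.logsigmoid(r_chosen - r_rejected).mean()

    # Hypothetical reward scores for a batch of three answer pairs:
    chosen = torch.tensor([1.2, 0.7, 2.0])
    rejected = torch.tensor([0.3, 0.9, -0.5])
    print(preference_loss(chosen, rejected))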
For a Mixture of Experts (MoE) model you only need enough memory for a given expert. There will be some swapping as it figures out which expert to use, or when it changes experts, but once that expert is loaded it won't be swapping memory to perform the calculations.
You'll also need space for the context window; I'm not sure how to calculate that either.
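The context-window part is mostly the KV cache, and that one is easy to estimate: keys and values are stored per layer, per KV head, per token. A rough Python sketch with hypothetical model dimensions (not any specific DeepSeek release, which further complicates things with MLA cache compression):

    def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                       seq_len: int, bytes_per_elt: int = 2) -> int:
        # 2x for keys and values; fp16/bf16 assumed (2 bytes/element)
        return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elt

    # Hypothetical dense-attention model shape:
    gb = kv_cache_bytes(n_layers=60, n_kv_heads=8, head_dim=128,
                        seq_len=128_000) / 1e9
    print(f"KV cache at 128K context: ~{gb:.0f} GB")  # ~31 GB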
I don't want to make big generalizations, but one thing I noticed with Chinese models, especially Kimi, is that they do very well on benchmarks but fail on vibe testing. It feels a little over-fitted to the benchmarks and less to the actual use cases.
I hope it's not the same here.
If it had vision and was better on long context I'd use it so much more.
I guess that’s kinda how it is for any system that’s trained to do well on benchmarks: it does well on them but is rubbish at everything else.
Whereas the benchmark gains seen in new OpenAI, Grok, and Claude models don't feel accompanied by a vibe improvement.
Yet we (developers, users, human civilization) benefit from that.
So yes, I cherish when Chinese companies release open source LLMs. Be it as it fits their business model (the same way as US companies) or from grants (the same way as a lot of EU-backed projects, e.g. Python, DuckDB, scikit-learn).
It's a very transparent, consistent strategy.
AI is a little different because it has geopolitical implications.
This is using open source in a bit of a different spirit than the hacker ethos, and I am not sure how I feel about it.
It is a kind of cheat on the fair market but at the same time it is also costly to China and its capital costs may become unsustainable before the last players fold.
Can you really view this as a cheat when the US is throwing a trillion dollars in support of a supposedly "fair market"?
The way I see this, some tech teams in China have figured out that training and tuning LLMs is not that expensive after all and they can do it at a fraction of the cost. So they are doing it to enter a market previously dominated by US only players.
It's a bit early to have any sort of feelings about it, isn't it? You're speaking in absolutes, but none of this is necessarily 100% true, as we don't know their intentions. And judging a group of individuals' intentions based on what their country seems to want, from the lens of a foreign country, usually doesn't land you on the right interpretation.
We should all be happy about the price of AI coming down.
Seriously though, our leaders are actively throwing everything and the kitchen sink into AI companies - in some vain attempt to become immortal or to own even more of the nation's wealth beyond what they already do, chasing some kind of neo-tech feudalism. Both are unachievable because they rely on a complex system that they clearly don't understand.
What I appreciate about the Chinese efforts is that they are being forced to get more intelligence from less hardware, and they are not only releasing their work products but documenting the R&D behind them at least as well as our own closed-source companies do.
A good reason to stir up dumping accusations and anti-China bias would be if they stopped publishing not just the open-source models, but the technical papers that go with them. Until that happens, I think it's better to prefer more charitable explanations for their posture.
I am very curious about your definition and usage of 'fair' there, and whether you would call the LLM sector as it stands now (but hypothetically absent DeepSeek, say) a 'fair market'. (If not, why not?)
Absurd to say Deepseek is CCP controlled while ignoring the govt connection here
ESH
It's also a bit funny that providing free models is probably the most communist thing China has done in a long time.
Where do you think they learnt this trick? Years lurking on HN and this post's comment section wins #1 on the American Hypocrisy chart. Unbelievable that even in the current US people can't recognize when they're looking in the mirror. But I guess you're disincentivized to do so when most of your net worth stems from exactly those companies and those practices.
Do you believe the lack of competition is purely because the products are superior?
US tech is now sort of like the dollar. People and countries outside the US need and want alternatives to hedge with in the event of political uncertainty, but cannot do so completely for various reasons, including arm-twisting by the US govt.
One example: some govts and universities in the EU have been trying to get rid of MS products for decades, but they are unable to.
If that's true, why doesn't America compete on this front against China?
> they're more often than not held at a disadvantage by domestic governments
So when the US had the policy advantage over the EU it was just the market working, but when China has the policy advantage over the US it suddenly becomes unfair?
I think you misunderstood this. When domestic competitors arise against American tech, the domestic government tends to explicitly favour those competitors over the American tech, placing the latter at a disadvantage.
You can see it in India, China, Korea, or SEA, where they have their own favored food-delivery apps and internet services. Even in the EU, local LLM companies like Mistral are favored by local businesses for integration over OpenAI. Clearly American tech hasn't actually displaced serious domestic competitors, so the rare-earths comparison fails; the USA, in contrast, is far more willing to let local businesses fail.
Things can be bad on a spectrum, and I believe it is much easier for society/the state to break up a capitalist monopoly than a state-backed one. To illustrate: the state has sued some of those companies over competition ills, and they were seriously threatened by it. That is not the case with a state company.
Banks created their oligopolies and then who saved them when they fucked up?
Isn't Tesla a state-backed monopoly in the USA because of grants and tariffs on external competitors? Isn't SpaceX? Yet nobody treats them as state-backed.
I don't understand this need to put companies on a pedestal and hate on states. Capitalist propaganda, I guess?
Market forces are manipulated all the time. This distinction is nonsense. Companies influence states and vice-versa.
Meanwhile, America has been causing death and destruction around the world. It's easy to make lists: Vietnam, Iraq, Gaza, Cuba, South and Central America etc etc.
They've built a "stable & scalable" RL protocol - more capable RL training infrastructure
They've built a pipeline/process to generate synthetic data for reasoning and agentic training
These all combine to build an efficient model with extensive RL post-training for reasoning and agentic use, although they note work is still needed on both the base model (more knowledge) and post-training to match frontier performance.
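The write-up doesn't spell out the exact mechanics of that synthetic-data pipeline, but one common shape for this kind of thing is rejection sampling against a verifier: sample candidate reasoning traces, keep only the ones that check out. A purely hypothetical Python sketch (every name here is invented for illustration):

    def generate_synthetic_data(prompts, model, verifier, k=8):
        """Keep at most one verified reasoning trace per prompt."""
        kept = []
        for prompt in prompts:
            for _ in range(k):
                trace = model.sample(prompt)   # hypothetical sampling API
                if verifier(prompt, trace):    # e.g. unit tests or answer check
                    kept.append((prompt, trace))
                    break
        return kept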
which version are you?
I am the latest version of the DeepSeek model! If you want to know the specific version number, I suggest you:
Check the official documentation - the DeepSeek website and docs will have the most accurate version information
Follow the official announcements - version updates are usually published through official channels
Check the app store / web version - the interface usually shows the current version
I have all of DeepSeek's latest features, including: strong conversation and reasoning capabilities
128K context length
File upload handling (images, documents, etc.)
Web search (needs to be enabled manually)
Completely free to use
If you need the exact version number for technical integration or another specific purpose, it's best to consult the official technical documentation directly; it will have the most accurate and detailed specifications. Is there anything else I can help you with?