Still, I agree that self-hosting is probably a part of the decrease.
Even if you pretend that the classifier respects anonymity, if I pay for inference I expect it to be a closed tube with my privacy respected. If it were at least for "safety" checks, I wouldn't like it but I could almost understand; now it is for them to have "marketing data".
Imagine, and given the state of the world it might come soon, that WhatsApp or Telegram inspected all the messages you send to produce reports like:
- 20% of our users speak about their health issues
- 30% of messages are about annoying coworkers
- 15% are messages comparing dick sizes
I'd feel a lot better if "OpenRouter" were open source.
If you don’t like middlemen, just go to one of the providers directly.
Why imagine? The world already functions exactly like that. Talk on Tg as if every chat is summarized every 24 hours and monthly (with a cheap LLM first, then with strong ones if signals are found), and reported to all kinds of interested intel agencies.
Same for OpenRouter. Everything that leaves your device in plaintext = public. Period. No hopes.
Even better: they send all the data to GoogleTagClassifier, which means Google now has a copy of the sample as well
Lol hahaha
"Everyone knows in {{BUSINESS_TYPE}} there is no real privacy".
Be it fintech, AI or social media. You give them a free pass by being flippant about companies respecting privacy.
Being flippant about anyone being careless with our privacy does us as a society an injustice. We should demand privacy, not laugh at the notion of privacy.
>We should demand privacy, not laugh at the notion of privacy.
Recently got m3 ultra 512gb studio. LM Studio runs frontier models routinely. Going local is the ONLY way. That's all you can do. "Demanding privacy" is security theater. Act accordingly.
Also interesting how the 'roleplaying' category is so dominant, makes me wonder if Google's classifier sees a system prompt with "Act as a X" and classifies that as roleplay vs the specific industry the roleplay was intended to serve.
[1] https://situational-awareness.ai/
[2] https://www.darioamodei.com/essay/machines-of-loving-grace
[3] https://www.darioamodei.com/post/on-deepseek-and-export-cont...
I'm pretty surprised by that, but I guess that also selects for people who would use openrouter
I hope those are not classified as “roleplaying”. The “roleplay” here is just a trick to get a better answer from the model, often in a professional setting that has nothing to do with creative writing or NSFW stuff.
There are many, many people, and websites, dedicated to roleplaying, and those people will often have conversations lasting thousands of messages with different characters. I know people whose personal 'roleplay AI' budget is $1,000/month, as they want the best quality AIs.
> This indicates that users turn to open models primarily for creative interactive dialogues (such as storytelling, character roleplay, and gaming scenarios) and for coding-related tasks. The dominance of roleplay (hovering at more than 50% of all OSS tokens) underscores a use case where open models have an edge: they can be utilized for creativity and are often less constrained by content filters, making them attractive for fantasy or entertainment applications. Roleplay tasks require flexible responses, context retention, and emotional nuance - attributes that open models can deliver effectively without being heavily restricted by commercial safety or moderation layers. This makes them particularly appealing for communities experimenting with character-driven experiences, fan fiction, interactive games, and simulation environments.
I could imagine something like D&D or other types of narrative adventures on demand with a machine that never tires of exploring subplots or rewriting sections to be a bit different is a pretty cool thing to have. Either that, or writing fiction, albeit hopefully not entire slop books that are sold, but something to draw inspiration from and do a back and forth.
In regards to NSFW stuff, a while back people were clowning on OpenAI for suggesting that they'd provide adult writing content to adults, but it might as well be a bunch of money that's otherwise left on the table. Note: I'm all for personal freedom, though one also has to wonder about the longer term impact of those "AI girlfriend/boyfriend" trends, you sometimes see people making videos about those subreddits. Oh well, not my place to judge.
Edit: oh hey, there is more data there after all
> Among the highest-volume categories, roleplay stands out for its consistency and specialization. Nearly 60% of roleplay tokens fall under Games/Roleplaying Games, suggesting that users treat LLMs less as casual chatbots and more as structured roleplaying or character engines. This is further reinforced by the presence of Writers Resources (15.6%) and Adult content (15.4%), pointing to a blend of interactive fiction, scenario generation, and personal fantasy. Contrary to assumptions that roleplay is mostly informal dialogue, the data show a well-defined and replicable genre-based use case.
... I just don't get why LLMs are affected by this kind of nonsense -- is it due to training rewards?
The business model is likely built upon the assumption that most people aren't going to max out their limits every day, because if they were, it likely wouldn't be profitable.
It definitely does. OpenRouter is pretty popular among roleplayers and creative writers due to having a wide variety of models available, sometimes providing free access to quality models such as DeepSeek, and lacking any sort of rules against generating "adult" content.
The fact that one account can have such a noticeable effect on token usage is kind of insane. And also raises the question of how much token usage is coming from just one or five or ten sizeable accounts.
According to their charts they're at a throughput of something like 7T tok/week total now. At 1$/Mtok, that's 7M$ per week. Less than half a billion per year. How much is that compared to the total inference market? And yet again, their throughput went like 20x in one year, who knows what's to come...
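For anyone who wants to rerun that back-of-the-envelope estimate with a different price, here is a minimal sketch. The 7T tok/week figure and the $1/Mtok price are the rough assumptions from the comment above, not measured values, and `weekly_spend_usd` is just an illustrative helper name.

```python
# Rough assumptions taken from the comment above, not measured values.
TOKENS_PER_WEEK = 7e12       # ~7T tokens/week, read off the charts
PRICE_PER_MTOK_USD = 1.0     # assumed blended price per million tokens
WEEKS_PER_YEAR = 52


def weekly_spend_usd(tokens_per_week: float, price_per_mtok: float) -> float:
    """Token spend per week at a flat price per million tokens."""
    return tokens_per_week / 1e6 * price_per_mtok


weekly = weekly_spend_usd(TOKENS_PER_WEEK, PRICE_PER_MTOK_USD)
yearly = weekly * WEEKS_PER_YEAR

print(f"weekly: ${weekly:,.0f}")
print(f"yearly: ${yearly:,.0f}")
```

At these inputs the yearly figure lands under half a billion dollars, which matches the estimate above. The result scales linearly with whatever blended price you plug in.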
I'd have liked to see a chart of all tokens broken down by category rather than just percentages, but what this data seems to be saying is that growth isn't exponential, and is being dominated by growth in programming. A lot of the spending in AI is being driven by the assumption that it'll be used for everything everywhere. Perhaps it's just OpenRouter's user base, but if this data is representative then it implies AI adoption isn't growing all that fast outside of the tech industry (especially as "science" is nearly all AI related discussion).
This feels intuitively likely. I haven't seen many obvious signs of AI adoption around me once I leave the office. Microsoft has been struggling to sell its Copilot offerings to ordinary MS Office users, who apparently aren't that keen. The big wins are going to be existing apps and data pipelines calling out to AI, and it'll just take time to figure out what those use cases are and integrate them. Integrating even present-day AI into the long tail of non-tech industries is probably going to take decades.
Also odd: no category for students cheating on homework? I notice that "editing services" is a big chunk of the "academia" category. Probably most of that traffic goes direct to chatgpt.com and bypasses OpenRouter entirely.
This completely changes infrastructure requirements: KV-caching becomes a necessity, and prefill time becomes a critical metric, often more important than generation speed. That's exactly why models with cheap long context (Gemini, DeepSeek) are winning the race against "smarter" but expensive models. Inference economics are now dictated by context length
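A toy latency model shows why prefill and caching start to matter once prompts get long. This is only a sketch: the throughput numbers are made up for illustration, `request_latency_s` is a hypothetical helper, and real serving stacks behave in more complicated ways (batching, chunked prefill, and so on).

```python
# Made-up throughput numbers, for illustration only.
PREFILL_TOK_PER_S = 5_000   # assumed prompt-processing speed
DECODE_TOK_PER_S = 50       # assumed generation speed


def request_latency_s(prompt_tokens: int, output_tokens: int,
                      cached_tokens: int = 0) -> tuple[float, float]:
    """Return (prefill_seconds, decode_seconds) for one request.

    cached_tokens is the part of the prompt already in a KV cache,
    which this toy model treats as free to reuse.
    """
    prefill = (prompt_tokens - cached_tokens) / PREFILL_TOK_PER_S
    decode = output_tokens / DECODE_TOK_PER_S
    return prefill, decode


short_chat = request_latency_s(prompt_tokens=500, output_tokens=500)
long_context = request_latency_s(prompt_tokens=100_000, output_tokens=500)
long_cached = request_latency_s(prompt_tokens=100_000, output_tokens=500,
                                cached_tokens=90_000)

for name, (prefill, decode) in [("short chat", short_chat),
                                ("long context", long_context),
                                ("long context, 90% cached", long_cached)]:
    print(f"{name}: prefill {prefill:.1f}s, decode {decode:.1f}s")
```

With these assumed rates, the short chat is almost entirely decode time, while the uncached long-context request spends more time in prefill than in generation, and the cache hit removes most of that.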
I'd be interested in a clarification on the reasoning vs non-reasoning metric.
Does this mean the reasoning total is (input + reasoning + output) tokens? Or is it just (input + output)?
Obviously the reasoning tokens would add a ton to the overall count. So it would be interesting to see it on an apples to apples comparison with non reasoning models.
- Take in the user query (input tokens)
- Break that into a game plan. Ex: "Based on user query: {query} generate a plan of action." (reasoning tokens)
- Answer (output tokens)
Because the reasoning step runs in a loop until it has worked through its action plan, it frequently uses way more tokens than the input/output steps.
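To make the counting question concrete, here is a small sketch of the two ways a total could be computed from the three buckets above. The `Usage` class and the token counts are hypothetical; this is not how the report or any provider necessarily does its accounting.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical per-request token counts, split into the three buckets."""
    input_tokens: int
    reasoning_tokens: int
    output_tokens: int

    def total(self, include_reasoning: bool = True) -> int:
        base = self.input_tokens + self.output_tokens
        return base + self.reasoning_tokens if include_reasoning else base


# Made-up numbers: same prompt and answer length, with and without reasoning.
reasoning_call = Usage(input_tokens=500, reasoning_tokens=4000, output_tokens=300)
plain_call = Usage(input_tokens=500, reasoning_tokens=0, output_tokens=300)

print(reasoning_call.total())                         # input + reasoning + output
print(reasoning_call.total(include_reasoning=False))  # input + output only
print(plain_call.total())
```

With these made-up counts, the two requests look identical if reasoning tokens are excluded and differ by 6x if they are included, which is why the definition matters for an apples-to-apples comparison.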
I have sooo many issues with the naming scheme of this """""AI"""" industry", it's crazy!
So the LLM gets a prompt, then creates a scheme to pull pre-weighted tokens post-user-phrasing, the constituents of which (the scheme) are called reasoning tokens, which it only explicitly distinguishes as such because there are hundreds or even thousands of output tokens to the hundreds and/or thousands of potential reasoning input tokens that were (almost) equal to the actually chosen reasoning input tokens based on the more or less adequately phrased question/prompt given ... as input ... by the user ...
It is essentially a way to expand the prompt further. You can achieve the same exact thing by turning off the “thinking” feature and just being more detailed and step by step in your prompt but this is faster.
My guess is that the next evolution of this will be models that do an edit or review step afterwards to catch whether any of the constraints were broken. But as best I can tell, a reasoning model can be approximated by doing two passes of a non-reasoning model: in the first pass you give it the user prompt with instructions that boil down to “make sense of this prompt and formulate a plan”, and in the second pass you give it the original prompt, the plan, and an instruction to carry out the original prompt using the plan.
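The two-pass idea can be sketched like this. It is only an approximation of the approach described in the comment, not how any actual reasoning model works internally. `complete` is a hypothetical stand-in for whatever function sends a prompt to a non-reasoning model and returns its text; the prompt wording is also made up.

```python
from typing import Callable


def two_pass(user_prompt: str, complete: Callable[[str], str]) -> str:
    """Approximate a 'reasoning' call with two calls to a plain model."""
    # Pass 1: ask only for a plan.
    plan_prompt = (
        "Make sense of the following prompt and formulate a plan. "
        "Do not answer it yet.\n\n"
        f"Prompt:\n{user_prompt}"
    )
    plan = complete(plan_prompt)

    # Pass 2: give the original prompt plus the plan and ask for the answer.
    answer_prompt = (
        f"Original prompt:\n{user_prompt}\n\n"
        f"Plan:\n{plan}\n\n"
        "Carry out the original prompt by following the plan."
    )
    return complete(answer_prompt)


# Stub model for demonstration: records prompts and returns numbered replies.
calls: list[str] = []


def fake_complete(prompt: str) -> str:
    calls.append(prompt)
    return f"<reply {len(calls)}>"


result = two_pass("Summarize this thread in one line.", fake_complete)
print(result)
```

Swapping `fake_complete` for a real client call is all that is needed to try it. Note that the plan text is billed as ordinary output tokens in pass 1 and again as input tokens in pass 2.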
Most of the high-volume enterprise use cases go through their cloud providers (e.g., Azure).
What we have here is mostly from smaller players. Good data but obviously a subset of the inference universe.
I've been testing free models for coding hobby projects after I burnt through way too many expensive tokens on Replit and Claude. Grok wasn't great, kept getting into loops for me. I had better results using KAT coder on opencode (also free).
Because the people behind it and myself have at least some standards
So yeah, their statistics are inflated quite a bit, since most of that usage was not paid for, or at least not by the end user.
I'm not seeing that. All I'm seeing is them analyzing metadata.
>The classifier is deployed within OpenRouter's infrastructure, ensuring that classifications remain anonymous and are not linked to individual customers.
OpenRouter has to have access to your prompts in order to route it somewhere else. The researchers don't get access to these prompts. They only get access to the metadata being generated from routing a prompt.
> OpenRouter performs internal categorization on a random sample comprising approximately 0.25% of all prompts
How can you arrive at any conclusion with such a small random sample size?
[^1] This is a simplification. I should say that it depends on the standard error of your statistic, i.e., the thing you're trying to measure (if you're estimating the max of a population, that's going to require more samples than if you're estimating the mean). This standard error, in turn, will depend on the standard deviation of the dimension you're measuring. For example, if you're estimating the mean height, the relevant quantity is the standard deviation of height in the population.
For example, even 300 truly random people is enough to ascertain the distribution of some measurement in the population (say, some personality feature).
That’s the basis of all polls and what have you
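A quick way to see why the absolute sample count matters more than the sampled percentage is the usual margin-of-error formula for a proportion. This is a sketch under textbook assumptions: a simple random sample, a population much larger than the sample, and the normal approximation. The function name is illustrative.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an estimated proportion.

    p=0.5 is the worst case; z=1.96 is the normal quantile for 95%.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (30, 300, 3000):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

For n=300 this comes out to roughly 5.7 percentage points in the worst case. Nothing in the formula depends on what fraction of the population was sampled, so a 0.25% sample of a very large number of prompts can still give tight estimates of category shares, as long as it really is random.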
(OK, on rereading, you did link to a WP article about CLT, so 30 it is!)
I remembered 300 as a safe bet for skewed distributions like log-normal, exponential, etc.
[1] https://stats.stackexchange.com/questions/166/how-do-you-dec...
Nvidia could keep delivering record-breaking numbers, and we may well see multiple companies hit six, seven, or even eight trillion dollars in market cap within a couple of years. I am skeptical of claims like "AI will make programming obsolete", but it’s clear that adoption is still going like crazy, and it's hard to anticipate when the plateau will happen.
[1]: https://openrouter.ai/state-of-ai#open-vs_-closed-source-mod...
Also, growth seems to be linear, not exponential.
All this data confirms that OpenRouter’s enterprise ambitions will fail. It’s a nice product for running Chinese models tho
What it does have I think is a problem that TaskRabbit had: you can hire a house cleaner through TR but once you find a good one you can just work directly with them and save the middleman fee. So OR is great for experimenting with a ton of models to see what is the cheapest one that still performs the tasks you need but then you no longer need OR unless it is for reliability.
I do question this finding:
> the small model category as a whole is seeing its share of usage decline.
It's important to remember that this data is from OpenRouter... an API service. Small models are exactly those that can be self-hosted.
It could be the case that total small model usage has actually grown, but people are self-hosting rather than using an API. OpenRouter would not be in a position to determine this.