LLMs learn what programmers create, not how programmers work

I ran an experiment to see whether a CLI really is the most intuitive format for tool calling, as claimed by an ex-Manus AI backend engineer. I gave my model random scenarios and a single tool, "run", and told it that it worked like a CLI. Then I told it to guess commands.

It guessed great commands, but it always formatted them with a leading colon, like `:help`, `:browser`, `:search`, `:curl`.

It was trained on how terminals look, not on what you actually type (you don't type the ":").

I have since updated my agent's tool code to stop fighting this intuition.
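A minimal sketch of what that change could look like, assuming the `run` tool receives the command as a single string. The `normalize_command` helper is hypothetical, not the actual agent code: it accepts an optional leading colon (and strips stray markdown backticks) instead of rejecting the call.

```python
import shlex


def normalize_command(raw: str) -> list[str]:
    """Turn a model-emitted command such as ':search "llm tools"' into argv.

    Hypothetical helper: treats a leading colon as optional instead of as an
    error, and strips markdown backticks wrapped around the command.
    """
    text = raw.strip()
    # Remove markdown code ticks the model sometimes wraps commands in.
    text = text.strip("`").strip()
    # Accept the colon prefix the model naturally produces (":help" -> "help").
    if text.startswith(":"):
        text = text[1:].lstrip()
    if not text:
        raise ValueError("empty command")
    # Split like a POSIX shell so quoted arguments stay together.
    return shlex.split(text)


if __name__ == "__main__":
    for cmd in [":help", "help", ':search "llm tools"', "`:curl https://example.com`"]:
        print(cmd, "->", normalize_command(cmd))
```

Normalizing on the tool side keeps the system prompt free of negative instructions about colons.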

LLMs learn what commands look like in documentation and other artifacts, not what a human actually typed on the keyboard.

It seems so obvious. This is why you have to test your LLM and see how it naturally behaves, so you don't have to fight it with your system prompt.

This is Kimi K2.5, btw.

Comments

tucaz · Mar 27, 2026, 9:48 PM
What changed after this discovery? What got better or worse, and by how much?
KalskiTheDan · Mar 27, 2026, 10:35 AM
I ran into this building data pipelines for LLMs. I kept feeding in structured JSON with numbers, and the model would do bizarre arithmetic on them.

The moment I shifted to computing the analysis upstream and describing the results in plain English, the outputs became much more coherent.

The model doesn't inherently know what numbers or spreadsheets of data mean; the LLM does not actually compute a math equation or formula internally. Instead, it calls or writes calculation code on the side and then reads off the results. And even with that side work, the code it writes tends to give the numbers contextual wording: highest, average, x% growth, consolidating at a y% rate, etc.
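A minimal sketch of the compute-upstream approach, assuming the data is a simple numeric series. The `describe_series` helper and its sentence templates are hypothetical; the idea is that all arithmetic happens in ordinary code and the model only receives the resulting plain-English summary.

```python
from statistics import mean


def describe_series(name: str, values: list[float]) -> str:
    """Summarize a numeric series as plain-English sentences for an LLM prompt.

    Hypothetical helper: all arithmetic happens here, so the model reads
    results instead of doing math on raw JSON numbers.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to describe a trend")
    first, last = values[0], values[-1]
    peak = max(values)
    peak_period = values.index(peak) + 1  # 1-based period of the first peak
    avg = mean(values)
    if last > first:
        direction = "grew"
    elif last < first:
        direction = "fell"
    else:
        direction = "was flat"
    # Overall change from first to last period; undefined if starting at zero.
    if first:
        change = f"{(last - first) / first * 100:+.1f}% overall"
    else:
        change = "change undefined from a zero start"
    return (
        f"{name} {direction} from {first:g} to {last:g} ({change}). "
        f"The highest value was {peak:g} in period {peak_period}, "
        f"and the average was {avg:.1f}."
    )


if __name__ == "__main__":
    print(describe_series("Revenue", [100, 120, 150, 140]))
```

For `[100, 120, 150, 140]` the model receives "Revenue grew from 100 to 140 (+40.0% overall). The highest value was 150 in period 3, and the average was 127.5." rather than the raw array.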

mememememememo · Mar 27, 2026, 8:25 AM
I'm not sure it's true that LLMs don't see code or CLI commands directly in training. They go through reinforcement learning and could easily be trained on a command line; people are paid to give human feedback. See https://huyenchip.com/2023/05/02/rlhf.html
mpalmer · Mar 24, 2026, 3:49 AM
The novice came to the master. "I have figured it out, the rules for how LLMs understand CLIs. It gives the right commands, but adds colons. It was trained on the visual shape of terminals, not keystrokes."

"Clear the session," the master said. "Run the same prompt again."

The novice pressed return. The model output: `ls -R /tmp`

"The colons are gone," the novice said. "But my theory explained them perfectly."

"You built a cage for a cloud," the master said. "Do not mistake a single roll of the dice for the rulebook."

noemit · Mar 24, 2026, 7:43 AM
I ran tests of 100 attempts with different prompt/scenario combinations. Each attempt/theory had 3 different system prompt wordings. Most of the prompts did not mention a colon, but it kept appearing. When I added negative instructions against using a colon, quality went down: most of the tool calls were malformed, and one common issue was markdown ticks in front. It was only when my system prompt treated colons as normal that I consistently got 100/100 perfect expected tool calls. I ranked my system prompts by which returned the most consistent commands.
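The ranking step could be sketched like this, assuming the raw outputs for each system prompt have already been collected and aligned with the expected command per scenario. The names `score_prompt` and `rank_prompts`, the well-formedness regex, and the prompt labels in the example are hypothetical; "consistency" is approximated here as the share of outputs that match the expected command, with a leading colon treated as valid.

```python
import re

# Hypothetical well-formedness rule: optional leading colon, a lowercase
# command word, then optional arguments on the same line with no backticks.
WELL_FORMED = re.compile(r"^:?[a-z][a-z0-9_-]*(?: [^\n`]+)?$")


def score_prompt(outputs: list[str], expected: list[str]) -> tuple[float, float]:
    """Return (well-formed rate, expected-match rate) for one system prompt.

    A leading colon is ignored when comparing output to the expected command.
    """
    if not outputs or len(outputs) != len(expected):
        raise ValueError("outputs and expected must be non-empty and aligned")
    cleaned = [o.strip() for o in outputs]
    well_formed = sum(bool(WELL_FORMED.match(o)) for o in cleaned)
    matches = sum(
        o.removeprefix(":") == e.removeprefix(":") for o, e in zip(cleaned, expected)
    )
    n = len(outputs)
    return well_formed / n, matches / n


def rank_prompts(
    results: dict[str, list[str]], expected: list[str]
) -> list[tuple[str, float, float]]:
    """Rank prompts by expected-match rate, breaking ties on well-formed rate."""
    scored = [(name, *score_prompt(outs, expected)) for name, outs in results.items()]
    return sorted(scored, key=lambda row: (row[2], row[1]), reverse=True)


if __name__ == "__main__":
    expected = [":search weather", ":curl https://example.com"]
    results = {
        "colons-normal": [":search weather", ":curl https://example.com"],
        "no-colons-please": ["`search weather`", "curl https://example.com"],
    }
    for name, wf, match in rank_prompts(results, expected):
        print(f"{name}: well-formed {wf:.0%}, matches {match:.0%}")
```

Scoring against a fixed expected command keeps the comparison between prompt wordings deterministic; a real run would feed in the 100 collected outputs per prompt instead of the two toy ones here.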
shomp · Mar 24, 2026, 12:05 AM
Great observation. The brain of a programmer is still a "black box" to the feed-forward network of nodes. But in theory, if you pumped a lot of live-coding videos from somewhere like YouTube into the training process, you could get a bit of that "what's your approach" quality to bleed into the model. There might not be enough material there to truly "train it to think", but it would be interesting to try to fill the black-box gaps in the LLM with supplemental "here was the process that got us there" video feeds. The next natural move might be recording thousands of hours of developers working with LLMs directly, in Cursor or another IDE with live LLM pair programming (calling it "pair programming" may be generous). That might be a reasonable foray into teaching the next generation of LLMs the "thought process" behind things. In reality, you'd be teaching it which files to inspect, which windows to open and close, and which tools to switch to and focus on. It might be imperfect, but it might just be enough.
paulwelty · Mar 25, 2026, 2:50 PM
You found an interesting example of the fallacy of following and defining rules. You can't define a system with only rules. Eventually, you run out of rules for how to apply the rules, and you need judgment and interpretation.
Areena_28 · Mar 24, 2026, 6:55 AM
We hit the same thing too, building internal security tooling. Our model kept formatting output like documentation, not the way we (or anyone in our place) would want to read it in a terminal at 2am during an incident.

I'm curious: did you find this behavior consistent across models, or is it more pronounced with certain ones?

noemit · Mar 24, 2026, 7:51 AM
I ran into it while building, and I should have tested different temperatures too. I was just trying to make CLI-style tool calls more reliable.
Areena_28 · Mar 27, 2026, 10:14 AM
Yeah, temperature is probably worth a run. We noticed even small adjustments changed how the model interpreted formatting expectations quite a bit.
stuaxo · Mar 24, 2026, 10:28 AM
Literate programming is about to become mainstream in the funniest way possible.
Areena_28 · Mar 27, 2026, 10:16 AM
Oh yesss, except literate programming was still a human explaining intent to other humans. This is more like a human explaining intent to a machine that then explains it back to other humans. Hahaha, this is actually funny.
acters · Mar 24, 2026, 6:50 AM
Instead of telling the LLM that "run" works like a CLI, maybe just tell it that "run" executes sh/bash/zsh/etc. scripts?
noemit · Mar 24, 2026, 7:32 AM
I tried over 20 system prompt variations. Once I changed my tool to expect the colon, it also felt like it was running and calling tools faster, but I need to do a larger test to be sure.
shivang2607 · Mar 26, 2026, 5:07 PM
And then they say AGI is coming.
infosecphoenix · Mar 24, 2026, 8:06 PM
Not just for coding: in any profession, LLMs only learned what people created, not how professionals work.
seertaak · Mar 24, 2026, 5:49 AM
Is that really true? I would have expected AI companies to be doing RL on git histories by now, not just on HEAD.
noemit · Mar 24, 2026, 7:36 AM
I also expected this. Please run some experiments; maybe other models are different.
muzani · Mar 24, 2026, 10:10 AM
Claude definitely does.
Art9681 · Mar 24, 2026, 3:22 AM
Is "how programmers work" a useful and provable metric? No? Then it belongs in philosophy discussions. How you work and how I work is different. Your work may have ended up in the LLM training and my work did not. Or vice versa.

Can you objectively analyze how VSCode adapts to your way of working without our interference?

Did you test your theory with actual frontier LLMs? (Kimi K2.5 is not one, BTW.)