" the original Luddites were primarily protesting against machinery used to "fraudulently and deceitfully" manufacture inferior goods, bypass labor standards, and strip skilled artisans of their livelihoods."
Of course hand made tables are expensive. They service a sliver of the market. Ikea serves the rest of us who'd prefer not to eat off the floor.
Fundamentally, Luddites didn't like being replaced by a machine. They were skilled workers, who used to have very desirable skills. Most people didn't need their standard of quality (but customers had no choice.)
Their name is well known today because we never stopped replacing people with machines. Every industry as been "optimized" over and over again since the Luddite times.
AI is the first threat to the Artisans of today (ie programmers). We are just the most recent in a long history of Luddites.
In every change of this nature, some move on embracing the change, others do not. Some will find other jobs, possibly new jobs, others won't. Carriage drivers became Chauffeurs, some grooms became mechanics.
So sure, I'm a Luddite - I don't want to see my skills become cheap - but I'm also pragmatic. The change is here. I'd rather adapt than die.
This is only true in the beginning, when machines are still primitive (e.g. first automatic looms). Nowadays machines mostly yield much better quality than any human can produce (e.g. automated welding, anything CNC controlled). Many things are only possible to build with machines (e.g. semiconductors).
> A hand-crafted solid wood table is still superior to something from Ikea.
This is by choice. Ikea chooses to produce the cheapest furniture possible, using cheap, crappy materials. Other manufacturers still produce high quality furniture, which is much more expensive.
Have a look at this if you're interested:
This may simply be due to a lack of demand, but regardless, I assure you that machine-produced furniture can’t touch human-produced at the apex of fine craftsmanship.
False. Ikea is not representative of machine-made products, just their own brand of cheap and poorly-designed machine-made products which outsource assembly to the customer for novelty value. Machines are much more precise than humans and do tedious and complex work without cutting corners or getting tired. Buying handcrafted wooden furniture is a class and wealth flex, just like buying any other customized product. The superiority, if any, is entirely in the materials and bespoke design rather than the work - machines can do almost any of the work involved much better than humans.
In that there are a dozen shades of grey between "fully-automated, no humans assembly line" and "person using only hand tools maintained by hand". Tools _are_ machines, are technology, they've just been around so long we forgot that industrially forged steel needs an industrial steel forge.
Probably the best quality product you'll get is from a person who cares, sourcing materials they care about, working with it using their expertise and discernment, and using the most effective tools to get the job done - most, but not all of which will be power or machine tools.
But the point is that the human needs to direct the machines - sometimes that's just a thinking task, and sometimes it's a bunch more physically hands-on.
The reason is that knotted wood is dangerously unwieldy to machine without a lot of additional preparation. End grain work is just hard to automate. Lots of gluing into a shape that can’t be planed easily and prone to exploding if one is careless.
If you want to peek in to the weird long tail, the guys over at Sawmill Creek love to one-up each other in a never ending contest to be the king of traditional wood working.
And yet my working class grandparents didn't eat off the floor, they had great quality tables.
Mine is made of disguised cardboard.
This is a big part of the problem, there is zero trust that any potential improvement in cost or access will reach consumers. Companies don't even bother telling us it will.
We will just be slowly moved into accepting a degradation as the new normal.
You see them in the fact that every single home you'll visit to buy or rent has a fully equipped kitchen including a fridge, oven, likely a microwave, dishwasher and even a washing machine (which alone has a huge economic impact: https://www.youtube.com/watch?v=_gvsz_vc7B0)
You see it in the fact that your home is safer from fires than it ever has been. That hot water is a cheap passive thing you don't even think about, rather than something you have to plan for. That a TV is a nice add-on to it all, rather than a huge deal to get.
Your grandparents' table was more expensive because they had less things, and the massive wood table that they saved months for was what was kept and stood the test of time for you to see today. Because let's not forget, this is also what furnishing 100 years ago can look like:
https://hips.hearstapps.com/hmg-prod/images/furnishing-for-h...
Because that requires manufacturers ready to give up stealth corner cutting as the cornerstone of their earnings in favour of the hard and long task of developing an image of reliability.
------
Three cases I know enough about: cars, loudspeakers and computer monitors.
You can still buy some Mazda/Toyota models to really get more thoughtful engineering and QC for your money, but the Germans with a similar image of quality (Mercedes, BMW) have partially or fully shed the underlying quality.
Genelec remains the only (non-PA) loudspeaker manufacturer you can sincerely trust to take reliability, performance and transparency seriously. There was also Klein + Hummel (K+H) but since being bought by Sennheiser and integrated with Neumann, things have been going downhill... to the point where some curious people found CapXon caps (bottom of the barrel) in their KH80s.
Computer monitors? Since Panasonic (Eizo's supplier of yore) exited the panel market and left it as LG vs Samsung, it's been a complete disaster. Oh, you wanna pay 1~2k $currency for a fancy OLED monitor? Get used to appalling panel QC (banding, uniformity), VRR flicker and DSC crap.
The available choice for "pay more to get better" continues to dwindle...
You do at the bottom of unregulated markets. For dishwashers and ovens, safety regs generally impose a high floor on the market. There is no $40 oven, because it's physically impossible to make a safety-compliant oven for $40. If it weren't for market regulation, $40 death-trap ovens would be a thing for sure.
The very cheapest compliant unit isn't _much_ worse than a mid-market unit, it might be a bit flimsier and wear out sooner; high-end luxury units aren't much better than mid-market units - because there's not much innovation driving progress at the top end. AEG and Bosch are still generally solid engineering, but there's not much point in paying more than that unless you like the aesthetics.
Mercedes and BMW - small-volume performance models aside - are like the big fashion brands, Vuitton etc., they're selling the idea of luxury to people who aren't even nouveau-riche, more like borrowing money to cosplay loudly as nouveau-riche. Compare old 1970s Merc convertibles with today's, the modern ones are just kind of ugly, aggressive and sad.
ADAM Audio loudspeakers are pretty good or were last time I bought a pair. They're designed as studio monitors but great for listening too. Perhaps they've gone downhill since being bought by a listed company a few years ago?
The Focusrite buyout (unless there was another after it) seem to have improved quality and transparency (i.e. publicly available official measurements for their current range). Still, performance remains lacking for the asking price of the A/S models; the A7V has a massive port resonance near 650 Hz, for example.
Interesting post about an old Adam engineer reminiscing about A5X issues: https://www.audiosciencereview.com/forum/index.php?threads/a...
Affordable quality is perhaps harder to find in the US than in some other countries. Because professional salaries are so high, the top 10% is responsible for ~50% of consumer spending. That makes middle spenders a less lucrative market than in countries with more equal incomes.
Sure, the LACK coffee table is recycled honeycombed cardboard. That's on purpose, and totally fine. I hope people don't eat their dinner of those outside of student life though.
The truth is people mostly want not an improved version but cheaper one.
I can get a ultra cheap mdf Walmart table, a slightly less cheap Ikea table, maybe a midrange crate and barrel table, or a very expensive hard wood table from the local furniture store. Here in the Midwest we even have hand made Amish furniture available. So buy what you want!
Customers don't really have a choice either way. Good luck finding quality clothing, services with decent customer support, etc.
Supply chains supporting quality work are destroyed when an industry gets commoditized, and whenever a company doing quality stuff emerges, it eventually gets bought out and the product gets watered down in order to milk its reputation with inferior product.
I would argue that this is quite the opposite. We may have this perception due to how mass-manufactured product are pushed to insane cost-saving measure due to harsh competition. But machine are far, far more accurate than human, and have been for years. A commercial CNC has insane tolerance, a pick and place machine can accurately place parts that human can barely see, a miter saw can make straight and angle cut that would be very hard using hand tools, ...
And I would argue that your example is even wrong. Almost all Ikea furniture use MDF, which is very dimensionally stable, and once protected with a veneer, is decently resistant to moisture. A solid wood table will contract, warp, etc, depending on the grain of the wood, the humidity, ... And will require much more care and regular use of surface treatment. Of course, "real" wood has its own advantage, but it is a matter of requirement. And even that "hand-craft" table is not hand-crafted. Any woodworking shop today use machines. Circular saw of many types, power drill, planing machines, ... Which are faster and more accurate than hand tool (although hand tool still have their place).
> Fundamentally, Luddites didn't like being replaced by a machine. They were skilled workers, who used to have very desirable skills. Most people didn't need their standard of quality (but customers had no choice.)
And that's the thing. As you mentioned, very few go to a woodworker to buy a several k$ furniture. Most go to mass-produces-cheap-but-decent furniture companies like Ikea because they don't have a whole month of salary to put in furniture. Machine can absolutely help create far better quality product. But the way of the world has always been to favor cheap but good enough goods.
The big difference with LLMs and "IA" is that they are not a circular saw, they are not a CNC, etc. They are not a tool made for a specific purpose and optimize for it that can reach insane tolerances that no human could match (and especially not as fast). They are, as the post mentioned, "a highly sophisticated statistical model designed to mimic the distribution of programming". There is not really any equivalent in human history. This is a bullshit machine that is scarily good at producing valid output.
It's why I think it is so controversial and why the dust still hasn't settled and why the usefulness of LLM are still subject of such heated debate. A miter saw will cut your plank at a 45-degree angle very fast and very accurately. If you do a lot of that, the benefit is obvious. But if you had a “magical” woodworking tool that could cut at an angle, drill counter-sink, glue veneer, etc, all-in-one but the tolerances are completely random, how useful would that be ? How much time would it save you ? It would be really tough to say.
That's pants as opposed to skirts. It's a gender implication, not a scarcity one.
They were not against more clothes for everyone - quite the opposite. They were against fast fashion bad quality clothes made in horrible conditions by people (or children) who had no other choice.
I thought this was a pretty good post about it: https://www.verysane.ai/p/against-the-luddites
In other words. The Luddites as elite group that fought to preserve status view is not entirely wrong, but also misses a lot.
In context of audience here - programmers (elite group). You can say that programmers fighting with AI do it to preserve their own status. Or we might trust those (like geohot) who are angry because it's just leading to bad results. It's enshitification - of the result, the working conditions, the ethics. The whole chain.
Economic competition should prove either position right in the long term. If AI really is BAD and is just waiting energy, not providing a real benefit and not really making reasonable quality software/content cheaper produce, then the hype will eventually crash. If its SWEs trying to protect their jobs, then one country or the other is going to lean into it for an overwhelming economic advantage and the rest will be forced to follow regardless of what they feel.
The same thing happened during the industrial revolution or even the neolithic revolution. Superior technology will diffuse no matter what barring an overwhelming geographical barrier.
You could translate it as "a pair of pants", and that's the appropriate way to put it in English, but really it says "one pants".
(It's not the case, however, that all measure words require nouns. 天 ("day") and 年 ("year") are measure words that are almost always used on their own. There might be an implicit notion of "one day of time" or similar.)
I figure the most natural mistake to make when counting pants is to use 件, since that's the measure word that applies to "clothes", but you'd have to already know at least some Chinese in order to make that mistake.
I'm thinking more Kouyu than writing. If I didn't think hard about it, I might say yi shuang kuzi or even, for some reason, yi jian kuzi.
There is too much money involved for any rational debate.
Black and white thinking is not *limited to* monetary concerns.
For the Sam Altmans of this world, sure, but how much money is the average AI booster commenting on HN actually standing to make?
The other side is the stability of your job or job prospects, and we are adversely affected by that instead.
The current state of the stock market is not exactly inspiring confidence about stability over the next few years. Number goes up over sufficient timescales, but if we get a Dotcom-level bust when AI investment slows, there may be a ways to climb back to current levels...
It is batshit insane to me that these days we put a microphone to CEOs, ask them to predict the future, and believe it. The only correct answer that they can say into that microphone is whatever is beneficial to their company, nothing less or more.
> the adoption of AI agents into software development will be one of the most costly mistakes in the field’s history
I don’t use agents, myself. I use a simple chat interface, and a running dialogue, to build software at a function-building level. The resulting workflow is quite “chimerical,” and benefits greatly from my own experience and expertise. The LLM simply lubricates the process.
In my case, it seems to be working well. I would not want to go back.
But the big question is "where will '80-95% of the way' get you?"
Do you grind-out the last 5-20% in a period that's disappointingly long compared to the initial step? Or do you another 80% complete thing on top, and another and another until the whole structure collapses?
The post is talking about what groups might go what directions, which seems fair.
On a tangent, this often gets misinterpreted as "LLMs reduce the time it takes to do the thing by 80-95%". That's not what it means.
That's just those of us with longer memory holding the AI companies to the standards they declared themselves. Nobody forced Sam Altman to blab about a team of pocket PhDs, did they? I don't want the crap that does it correct 60℅ of the time - where is the god damn nation of PhDs in a datacemter already? Where is the AI doing all the SWE work "in 3-6 months"?
That's my experience too, but it's 60-95% solutions in my case[1], with about 120-140% of lines of code required. I wish there was a harness that would let me mask code it should/n't change, because prompt-based refactors fail from the same over-eagerness.
1. I try faster, smaller models first.
The Luddites were (violent) activists. They were more than just "non-believers"
Generally, those being labeled "Luddites" in today's "discourse" are people who dare to question the "AI" hype. GGenerally, these people are not activists
Why on earth would we ever remotely compare a 'tool' to 'a software engineer' ?
The 'great delusion' is not that 'AI can't code' - because obviously it can, and very well.
The problem is the 'anthropomorphism' and all this AGI nonsense.
If we called it 'Stochastic Mechanisms' and did not 'personalize' our prompts, refer to them as 'chat' or give them 'personalities' but remained in the domain of 'Stochastic Language CLI' ... then our metaphors would pbably not cloud our judgments.
Let the philosophers argue about AGI.
By anthropomorphizing it, we give it some sort of authorship, which clears our collective conscience from what's really happening.
For all I know you're the same guy that says we don't need to talk about nuclear weapons in 1937 because we're nowhere near them.
Edit: I don't mean tool as a perjoritive.
The 'tool of the system' analogy is not an unreasonable point of discussion but it does not help us in this scenario.
Both people and AI make mistakes. Perhaps the AI makes more, a lot more, but its so fast, and works around the clock, and has no ego, there is a chance that the benefits outweigh the costs.
Are they are all suddenly turning into zombies? No. Do they have any real idea what that is going to do to their body a few years down the line? Also no. Could it be catastrophic? Maybe!
I think about this when I think about how violently much of the industry has pivoted into AI being the primary generator of code in the last 6ish months. AI is the peptide, your codebase[2] is the body. Literally no one knows how maintainable this approach is, because there simply hasn't been enough time to find out. It could be fine. It could be a complete mess, with your entire engineering team falling asleep at the wheel, lulled into thinking they understand what is being built when they don't, completely impotent to fix or maintain it once the LLM is no longer able to.
[1] https://www.bbc.co.uk/news/articles/cdr268m5pxro
[2] Well, _their_ codebase. I've stopped doing it with my own personal codebases, unless I genuinely don't care about maintainability or longevity
It is huge for token usage also, Claude grepping the codebase for context it doesn't have is the main consumer of input tokens from what I can see.
- humans and companies somehow stop being greedy, selfish and cutting corners to optimize for revenue and time-to-market
- LLMs are the path to artificial superintelligences that will be able to deal with the exponential increase in tech debt from throwing AI slop at the wall (vibecoding) because no one has time to do things “the proper way”
The former is impossible. The latter is extremely unlikely and an existential threat to humanity.
The so called Luddites are the only ones to have even engaged at all with these concerns. Everybody else is just focused on the selfish game (see bet #1) of staying afloat in a rapidly changing ecosystem.
Programmers are rewriting and reinventing the same techniques more often than any other vocation I can think of, and so we were primed for a really good search over prior art. The fact that AI can also adapt that prior art to your particular use case makes it even more powerful.
Much like how great success never came from cobbling together various bits of copy-pasted code from Stack Overflow though, current AI can't really build your whole project.
And the answer to that is clearly a tool that makes rewriting/reinventing cheaper than actually packaging nice reusable libraries
I think in this instance, the only thing worse than a zero day in your dependency tree, is a zero day you don't know your LLM vendored directly into your codebase...
If I were to use it against a legacy, rather poorly written codebase, where the code may be hard to understand without some in-depth analysis. I could certainly ask an AI agent to read the code (How does application X do Y, for example), but I wouldn't have it start hammering out features or have it do any type of refactoring. That would cause far too many commits and confusion amongst the development team, leading to even more slop than whatever we'd already be dealing with.
Just leaving this comment here so I can come back to your comment. I've been getting a bit discouraged by AI lately, but this sums up my experience with it well enough.
We're currently using it to build out a full-scale application. It does as well as you care to coax into doing tbh. You have to invest heavily in harness engineering, and at least my experience has been that as you do that, the results improve.
That is also my experience.
When starting a project I observe how the agent fails, I add new rules to the harness to prevent it from falling and repeat the process until I am happy with the output.
https://www.anthropic.com/engineering/harness-design-long-ru...
https://www.anthropic.com/engineering/effective-harnesses-fo...
These were some of the first major articles on it. It's becoming a popular topic, so there's more content on it all the time.
I learned by reading articles, success stories failure stories and mostly by doing, trying stuff, see how it works and adjusting it and burning a lot of tokens along the way.
What I would do in your shoes, I would ask an AI chat to find new articles on the matter (including on HN), explain how Codex, Claude, Pi are managing agents.
My compressed view is: you need to have a great specification both business and architecture wise that doesn't leave anything important for the model to guess because chances are it will make the wrong choices. That comprehensive spec should not be in one huge chunk. Have your plan divided in phases that each fit in a context window and have the spec for each phase. Use TDD, strive for 100% coverage. Force the model to behave: if it doesn't do what is supposed to, give it feedback and ask it to retry and don't allow it to progress to the next stage unless everything is perfect. I also like to write comprehensive integration tests before building anything. The agents are not allowed to touch or read the integration tests, only run them and they will get feedback where the tests fail. I like to build the integration tests in a different language than the software I am building, to make sure there isn't something platform specific that the tests rely on. I use C#, Go, Rust and Zig for development and Python for the integration tests.
For now, to get good results, I can't just copy and paste the setup from a project to another, I have to work a lot to tailor the process for each new codebase.
And that's why I am working on an agent harness to try to force the agents to do the right things in most common development scenarios without wasting much tokens. By common development scenarios I mean that is a large goal, right now I am working towards backend web development and microservices.
Well that's what everyone is claiming anyway
The hardest thing in software engineering is solving the right problem. The ability to identify the right problem to solve, is IMO, what distinguishes the top senior engineers. And we could have endless discussions about what constitutes the right problem, but for the sake of this discussion, let's reduce it to: the problem whose resolution adds the most value to the product for the amount of complexity and afferent costs that it incurs.
Once upon a time, long ago, I worked on a Web product whose original junior designer had figured it would be neat to be able to manage the backend with LDAP tools. So the database schema and structure that the product used mimicked that of OpenLDAP, with compound CN keys, and the entire codebase had to deal with that structure whenever reading from or writing to the DB. LDAP compatibility was not the right problem to solve when designing the DB schema.
But software that solves the right problems can be hard to identify because, quite often, how it does things seems so obvious that it's not readily apparent what other designs might have been chosen.
Now, the thing that usually keeps the blast radius of wrong-problem designs limited over time, is the very friction that they introduce. Development slows down, including the development of more wrong-problem designs. It's a self-limiting phenomenon.
And that's one major thing which worries me about LLM coding agents:
They paper over this friction. They don't repair it; they just make it so its cost is deferred.
So you gradually end up with codebases that grow unboundedly complex for the value they provide, with no controlling mechanisms.
You end up with juniors who never face the feedback loop from which they'd develop the engineering instincts and the taste for what makes a problem the right problem to solve in a given design.
At scale, as a field, you might end up forgetting there ever was such a thing as solving the right problem.
And I don't know what to do about that. Plan for an early retirement, maybe.
At the same time in no part of his post is any code snippet or anything to latch on to of "the model performed poorly here when it should have done this" - this style of criticism seems to be a pattern of most of these "the LLMs will never work" style posts on blogs and twitter.
They obviously can perform better than autocomplete and in my own day to day development build out huge portions of a codebase that I would have expected a junior or midlevel engineer to perform at.
How are we really supposed to grasp their actual capabilities when no one will actually cite specifically what mistakes they are making.
The mistakes they make are pretty subtle. Coding with LLMs can be like that scene in Whiplash – <excellent drumming >, not quite my tempo, <excellent drumming >, downbeat on 18, <excellent drumming>, you’re rushing, <excellent drumming>, dragging, …
Like yeah it produces working code almost always and the code usually does what you asked. And yet it makes you want to throw a chair because it’s not quite right in frustrating ways and it doesn’t even have the taste to know how it’s wrong.
Why are we not showing the bad choices? On my computer I have hundreds of diffs stored by my agent code review tool that point to style/architecture failures (and in the end, the result of that iteration on the AI output)
I'm not quite sure how people are generating unsalvageable outputs. I'd never ship the result of a first AI pass, either. I review all the code and the architecture, within reason (eg: in Rust I don't preoccupy myself anymore with precisely scoping pub, or whatever, unless I'm making a library crate). I sent a "changes requested" prompt+json to my agent, and it interactively fixes everything (even style, even comments with manual patches with my in-review-tool editor)
I feel like with LLMs, it's like a situation where you are close to some feature or project and have a pretty good idea in your head already of how you'd implement it yourself "I'd do this and have an API with that and a database table foo for storing bar with index on baz" and you're keen to get started on it ...but then someone else gets assigned to work on it not you.
They do it a totally different way than you would have thought of doing it, and the code feels alien and weird because it doesn't follow your "design" and decisions you already had in your head before they started work on it. Is it "bad" or just not how you'd have done it?
I think that is ok. So long as the code works and meets all stated requirements and is secure and performant and uses good abstractions and is not full of hacks, then it's ok to let go. Sure maybe you'd have done it a different way but ultimately that doesn't matter.
That is the problem. The code often is full of hacks and bad abstractions. LLMs write code like a junior or mid-level engineer – perfectly overfitted to today’s request. Oh you need to work on this code tomorrow and there’s a laundry list of future requirements? Throw away and rewrite, I guess.
You can most easily see this when you ask LLMs to write tests. They have a tendency to write convoluted tests that absolutely definitely pass. Even when you know the code has a bug, they’ll write the test in a way that fits the code as written and passes. Because they know tests should pass.
Getting an LLM to write a failing test against a currently working function because you know the business requirements have changed is like pulling teeth.
You don’t see writing about this stuff because it doesn’t neatly fit in an article or video (I’ve tried). Plus it goes against the zeitgeist so you’d never get traction (even if people write these posts, we don’t see them)
So we can't make arguments by citing specific examples, and also can't make arguments by not citing specific examples. Whelp, I guess that's the ball game.
(yes yes, I'm committing a group attribution error, but still)
I have no doubt the top nth percent of coders could write circles around Claude or Codex, but how much worse are they than your average schnook?
The more experience you bring to the table, the more value you get from these tools.
Look, about 12 years ago articles about how if you're not pair programming you're doing it wrong were on HN's home page every day. Doing well prompted plan -> agent -> debug cycles is like pair programming with someone that knows every SDK and API intuitively and doesn't have to pick up their kids from daycare at 4pm.
When I think of how much money gets wasted on gambling apps and how much human potential gets wasted watching reality television and compare that to Steve going full Alexander Shulgin with LLMs, the comparison really falls flat.
My main issue has been the inconsistent quality across between model releases and the tendency to insert older APIs or documentation, especially with command line tools.
I can understand if the model struggles with a million line monolithic codebase with a decade of cruft but can't think of why it'd be too much of a pain with new codebases.
How long do you think it will be before you can't write any code because you're out of practice?
One of the dangers of engineering management is that it can turn you into a person that can no longer do the thing.
Does that even matter?
Having said that, I won't use AI for production system if I don't understand the programming constructs in enough detail.
And how much is that?
Me: Do it anyway
10 minutes later
AI: Perfect!
After a few hours of this I still look at the codebase and think "wtf is this?".
I think writing the code is a very important part of understanding it. LLM driven development is like doing maintenance programming from day one.
I’m a little more hopeful than the author though. I feel like it’s possible to manage the process so that does not happen.
I've also managed to use LLMs to cut a lot of manual duplication in code where we typically didn't do enough investment: "Claude, evaluate code duplication in the functional test suite" will have no problem finding things like insufficient helpers, or tests that are testing simpler things as prerequisites, so they can rely on each other. So I am not seeing my codebases growing all that much. There's some risks of functional changes that before would be rejected due to cost which now are not, but I am not all that sure of how much that is controllable without being relatively antagonistic with management.
This is the gold, right here.
It doesn't engineer. It writes code. Enthusiastically. Usually without thinking about the bigger picture, the design, the architecture, the trade-offs, etc.
It's up to us to manage that process.
It's why senior engineers are finding LLMs a really useful tool - because we've learned to think about all that other stuff before opening the text editor. Writing the actual code was always the easy (and least valuable) bit.
https://blog.k10s.dev/im-going-back-to-writing-code-by-hand/
However I find their claim "I've lead teams of really competent engineers and I can leave them without supervision for months and come back and not feel like throwing away the entire code base." dubious. We all know how much effort it is to keep the quality of even small patches consistent.
Design, architecture, style and refactoring still require significant involvement. Providing only a description and a criteria will likely produce hopelessly messy code, which is also what you get with most corporate dev teams.
Coding isn't very hard, so it's often easier to just code than read and write English. I write Haskell exclusively though so this might bias me.
Seems like it is completely hopeless at doing anything netcode consistency and performance related in game dev. Seems like unique game mechanics it doesn't do well either.
Seems like asking it specific UI stylistic changes is basically like throwing darts at a board and hoping it sticks.
Ah I see your org hasnt yet had an outage caused by a bad LLM code push.
This is the wrong mental model.
The way to think about an LLM is like a human: prone to following bad examples if it sees them, needs guardrails to catch mistakes, needs code review. It also needs access to what "correct" looks like: architectural design documents, skills that explain each type of change, etc. It needs prompting/skills telling it to follow a safe workflow, telling it to consider how a safe rollout would work, what a safe rollback would look like, what the performance implications are - just like a human.
The nice thing is that you now have a very knowledgable assistant that can help write additional guardrails that would have always ended at the bottom of your backlog. Perhaps it used to take many hours to research and understand how to write a custom linter to catch a specific coding pattern. Today, ask Claude to do it and an hour later you'll have a custom linter rule for your language of choice, guaranteeing the same mistake can't happen again because CI will block it.
"We went back to shovelling by hand because someone ran over the pole with the front-loader, even though he had no experience driving it."
This is definitely user error; obviously it's a hard tool to wrangle but it's entirely possible to use it safely.
I've had a lot of success in using LLMs in smaller tasks that I peal off myself, but it's largely due to having an existing architecture that makes sense. The times in which I've tried to let the agent loose to make architectural decisions, it tends to wildly-overcomplicate stuff (which I'm only able to recognize because it's a problem I suffer from too).
That's not to say those smaller tasks aren't useful or time-consuming. They're important, but I try to remove the critical assumptions that might need to be made.
If everyone who uses it correctly finds it frustrating, and if the only people who love it produce a mountain of unmantainable slop, we will quickly abandon it to the dustbin of history.
A lot of things have "potential" but never amount to anything.
We're going to keep using LLMs but the utility of agentic coding has already peaked in my opinion.
When I got into agentic coding a year or two ago I was sure it was only good at autocomplete. Something happened earlier this year where the models hit a new level of capability.
Everyone I know now just does agentic coding, and it’s really amazing. I think we should just try pushing this as far as we can possibly go, it really feels like the acceleration of the human race is upon us.
Besides, I have been hearing "this is the limit" since the doomers of "this is just a markov chain and can't be useful".
Yet the limits keep being broken.
Yes, something happened, it got better at autocomplete. What else could be? The underlying model hasn't changed.
>acceleration of the human race
Please just stop with this bullshit. Nobody's curing cancer, climate change, inequality or whatever important real problem there is with LLMs. Nobody.
If this tech is good enough to make you more productive is just because you're not working in anything new or cutting edge or innovative. The only reason a LLM knows how to do your job is because that code has been literally written before enough times to appear in the training data. Try to use llms to write C++26, some HDL or in any niche stack and you'll get a nice reality check about LLMs.
Why do you think that is actually a good argument against? Most “business” problems have already been solved in some way and the times I had to write really novel code in my career have been very very few.
Also sure LLMs haven’t solved cancer or unequality in the few years they exist - but humans also failed here in the last couple thousand
being able to program is not the only skill required to be a successful software engineer, so no ai agents cannot be software engineers
very important distinction - i personally like the radiologist example - looking at scans is a part of a radiologist job, AI can do it better than most of them, but looking at scans is a small part of the job, most of it communicating with doctors to help their patients
I always wonder whether HN suffers from periodic influxes of newbies who don't get it yet and rile up the regulars.
Eternal September is the kind of trivia that let me know if someone is a grey beard or not.
I hope that professionalism still matters as these new ways of doing things strikes me as unprofessional as f...
Yeah, the next macOS will be worse... time to place bet on prediction market
But it can write working code much faster than I can.
And in a lot of cases, unfortunately, faster beats better.
I think you have just written the epitaph for corporate software.
I am at 2 for 2 at the moment in the "infrastructure as code" arena (I wasn't involved with choosing to use a consultant, just dealing with the output). Which is an area that AI was supposed to eat for lunch. And it seems like it should, DSLs with a narrower scope seem perfect for an LLM, but I'm not convinced.
I think the issue is, infrastructure DSLs like Terraform or Azure Bicep are distilling down an architecture that has complex interactions and often needs a lot of "inside baseball" knowledge from outside of the code to create a congruent result. Unless you feed a bible of markdown files to the LLM to guide it in the right direction the output goes off the rails fast. The time spend creating the bible might as well be spent creating the code.
Of course there are areas where an LLM will definitely help, like re-factoring, stamping out boiler-plate or even building on a solid base. But attempt to create even a semi-complex architecture from scratch using a few paragraphs of prompt and you are asking for trouble.
The trouble with the consultants I have interacted with is they don't write the bible first, as far as i can tell they just iterate on slop and you end up with multiple 800+ line PowerShell scripts in IaC pipelines and other craziness that is almost impossible to unpick after they have gone.
Agents now are writing extremely consistent, normalized canonical code, that usually compiles the first time.
Right out of the 'textbook'.
For what it's trying to do - it writes nearly perfect code.
The only thing you could nominally disagree with are some of the conventions and idioms.
It 'writes a perfect novel, in perfect prose'.
What it will not do however, is 'write the novel that's in your head'.
And that's the crux of it.
It's not even your job to 'write code' at this point, but rather to be the storyteller - and a very good editor who has enough taste and grasp of gammar to be able to know when it's going awry.
It will make mostly what you tell it too, the quality of the output is the quality of your guidance, but at the lowest levels it's generating extremely high quality syntactic prose.
Those matrix multiplications aren't a divine perfect thing. They suffer from floating point precision issues and training data issues and there's still debate if adversarial examples are just an unsolveable property of our linear-algebra based neural network architecture.
Can they do things way faster than a human? No doubt. Can they do very complex tasks? Yes. Do they do things with perfection? Not by our human definition of perfect.
"Not by our human definition of perfect."?
'Human definition' has nothing to do with it.
Your job is to define what you want, to the extent you can do that, the AI does really well at a certain scale, at the 'functional' scale, nearly perfectly.
Modelling a problem is what I'm concerned about. And I'm currently better than any AI agent at doing that, given enough time.
Even with the few 'tell tale' patterns it's been leaving ... that threshold is being moved past quite quickly. Within not even 6 months, works will be identical for all bus some specific activities.
[1] https://www.sciencedirect.com/science/article/pii/S147738802... [2] https://link.springer.com/article/10.1186/s41077-025-00396-6...
I wish you just started with the copout.
https://www.commitstrip.com/en/2016/08/25/a-very-comprehensi...
Holding LLMs to the standards of human contextual understanding without communicating sufficient context can be dispensed with, as fantasy.
Using LLMs you quickly learn how much can be inferred from existing, which is dynamic and particular to each model today. There will always be a gap in how much instruction is needed due to mismatches of existing versus intent. Current state, you don't need much to get a lot out that can be verified prior to merge.
LLMs + Harnesses are incredibly effective, as evidenced by the literal millions of people who are paying quite a lot to use them, who speak glowingly of them and would 'never go back'.
Whatever 'shape they take' - they are obviously useful - ergo - 'you're doing something wrong' if you can't make use of them for most tasks.
Mileage varies, there are downsides, but it's the same with anything.
Anecdotes from sunk cost users aren't evidence.
I've heard this said so many times, but my experience has just been so dramatically the opposite that it rings false. But geohot seems to be a pretty productive and smart guy, so it's hard to just dismiss what he's saying.
I get the sense that he's truly one of the 10x engineers. And maybe he can do it faster and better manually. But for those of us who aren't 10x, I think it lets us bridge that gap. Now we're getting back to "status anxiety": is this an attack on his ego, if the average becomes 10x?
Anecdote: Over 2 weeks of spare time, I used AI tooling to build a fairly sophisticated debian package caching proxy server (~72KLOC, 27K implementation, 45K tests). This would have easily taken me 6 months of focused time to implement by hand. I literally couldn't have done it because I can't take that much time off work and I have other weekend/evening obligations.
This is dangerously incorrect. AI summaries of search results consistently return incorrect information and grossly oversimplified and thus misleading summaries, neither of which are detectable unless one either has prior domain knowledge or spends time drilling into search results to validate the AI output.
I chalk this up to primary two reasons. First, I cared a lot more about the implementation details of the C program than I did the Python one, and second, it's just better at simple stand-alone python programs than it is at C programs.
The criteria I know use is "do I care about the implementation details of this?". If I do (because for example it's going to be long-term code that I need to maintain) then the agent likely isn't worth it. But if I don't, there are huge efficiency gains to be had using the agent.
The quality of the codebase decays precisely at the rate you stop reading the results. This is not an issue of AI writing the code. This is an issue of unreviewed code. geohot's issue is entirely valid. This problem does exist. But this isn't dependent on the generation phase.
If the answer is yes, the argument doesn’t matter: you just run the loop and wait for llm analog of moore’s law to get costs down.
The issue is people mix up complexity, novelty, repeatability and scale.
Well documented complex problems can easily be solved by LLMs.
Doing the same thing over and over again is easy for an LLM.
Novelty and scale is very hard for an LLM.
Even small novel problems confuse LLMs.
When you start a new code base the LLM smashes through the boilerplate work. Then when it gets to scale it struggles with context rot plus novelty.
The AI agents are great, and any expert can prompt them correctly to get good code. LLMs occasionally pick wrong patterns and start digging a hole, but this is why an expert is required. The code itself is just not worth writing when a detailed prompt can get you the same code typing 20x less text.
Where I agree with the post is:
The adoption of AI agents into software engineering is a problem. Solo projects are great, but our teams have not adjusted to the speed-of-change to a mental model of a project. So I see orgs making a choice to either: slow down or forgo the shared mental model.
Anybody choosing to forgo the mental model is building crooked legacy slop at scale. You can and should save the mental model to an AGENTS.md, but devs need it in their brain to prevent the digging a hole behavior.
To be fair the digging a hole behavior is something humans do just as well. But in teams you'd communicate enough to catch it - hopefully^1. It's the combination of higher speeds and teams that's creating a bit of a disaster.
I'm not sure what a good solution is either. There is a case for solo devs running for 2-month sprints with much more freedom. Perhaps we'll have an "AI Agile manifesto" within a year.
[1] Though you should not underestimate the amount of poor code being created before LLMs. There are enough teams for whom LLMs are practically all upsides. Stay very far away from those.
I'm lucky to be part of different codebases, +200 engineers codebase in a 10 years old company and code, +5 engineers on fresh code. My personal projects, that are beyond POC's, real users, hundreds of commits.
The LLM agent sweet spot is the last one, they are perfect, as I can contain most of the knowledge in my brain of how it works in/out. Speed is insane as a solo developer.
Then the 5 engineers codebase, is also really good, but here you already start to see the problems, thanks to agents you don't even need to care how it works, I have been working on it for +6 months, it uses TRPC and I don't even know (I don't care) how TRPC works. You feel that no one in the team really knows how stuff works at 100% (fresh codebase, we have build this ourselves!!).
Then there is the old codebase with +200 engineers, this is the worst of all of them, you described it perfectly, a bottomless pit of tech debt. This codebase before agents was an old non-typescript one, it was not perfect, but you could build a mental model and understand it perfectly after a few weeks working on it. Now, is a hot-mess of code duplication and the quality is degrading faster and faster as the code gets worse and the Claude Code adoption increases within the engineering team.
Not sure what will be the outcome of all this, but I wouldn't be surprised if some company wakes up in 2027 with a codebase that maintenance and development has increased by x100 fold thanks to Agents.
Once Humans just had oral language, and we could us words to pass ideas from one human mind to another. Then with writing ideas could pass to minds that weren't immediately close together in space or time.. and with this we made complext global spanning civilization. When words just become noise, that one has to be suspect of each one as to whither they'er coming from another human mind, or just a statistical process, can this civilization even survive?
How much will it take for AI agents to pass from distilling decades of collective wisdom to copying each other's worst mistakes?
Here's a sample of my work using digital cameras, not a food picture in sight.
https://flickr.com/photos/---mike---/albums/7217772029640662...
The thing about having the ability to take effectively free photographs is that it really lets you experiment and learn the edges of what's possible.
I was inspired by Stanford's camera array, and wound up doing virtual focus synthetic aperture photography. I'm hoping to build a rig to do it on near real time, instead of the manual process I used to do on my train rides to and from work.
Sure, the removal of cost lead to a flood of the mundane, but it also means we can capture our lives in ways that even kings couldn't afford in the past. I have thousands of good photos, and even some video, of friends and family.
I used to be deep into photography in the 00s, until I came to the conclusion that I was spending lots of money, spending hours trying to take the perfect picture and then hours reworking them on the computer... for what?
Sure, I had fun while doing all of this, but to me art is about sharing. And given the sheer amount of pictures that were published daily on Flickr at the time, I basically had to spend more time in trying to reach out to people to share with them, than in creating something, which was my initial goal.
Or I could strong-harm my relatives, but I myself hate people that are forcing me to watch at their kids/holidays pictures, so...
And looking at the figures on your pictures, taking them at absolute value (not considering what personal value you can give them), it seems that your art is not seen by many. Contrary to thousands of random pictures of take-away than can reach thousands of eyes in a mater of hours.
Which is, to me, depressing. You put your soul in something and it has almost no reach, while random snapshots got millions of views.
And I see how the same thing will apply to software: flood of vibecoded trash that can be used by many, while deeply thought after software will be used by a few.
The good news for me here is that I never thought software as art but only a tool. I can vibecode the hell out of an issue, and the code will live on my private repos. Don't care. Sometimes I put stuff on Github but I don't care if someone (or something) uses it. I'm focussed on solving my issues.
Oh, there's a story there about assumptions and bad UI. I had fairly large numbers, and tried to whittle down the thousands of photos I had posted to the ones that were favorited, and in the process erased everyone elses favorite tags, leading to rage quitting Flickr for a while. It's all now a mere hint of what it was.
I don't understand how it's remotely reasonable to try to make the comparison.
First of all define productive. Would someone using AI to build software at a startup which is likely to fail be considered productive? What if there is already similar software available that solves the same problems? What about the broad use of LLMs to draft emails or make silly memes?
It’s funny how everyone’s concerns around climate change just disappeared when they realised AI was useful to them.
Could AI technology change the world? Sure. Will it? That depends on so much more than what the technology can do. Why are we all still working 40 hours a week? Why are people still hungry? We could have radically changed our world with the technology we have had for decades. Yet, we have not, we have continued, nothing has really changed.
The internet is a great example. What is the most impactful part of the internet today? Social media. Social media has radically changed our culture. What is social media? A database, a few endpoints and an app? The technology is the least consequential part, the consequence comes from how we use it.
Nerds focus on what is possible with the technology, not what society is likely to do with it. What evidence is there that AI is going to change the world? What change is going to come from... being able to generate plausible sounding text? From being able to instruct agents? How many companies are using garbage software from 20 years ago despite dozens of revolutionarily better equivalents being available out there today that could have drastically reshaped their workforce? What are agents if not better macros? How many businesses have hundreds of employees doing the same tasks over and over again that could have been replaced by a few macros? How much of the code that you and I have written in our careers has already been written before?
The fundamental usefulness is the least important part of a technology when discussing how that technology will impact the world.
Cryptocurrency is more popular and intertwined with the financial system than it ever was so while the claim isn’t currently true it doesn’t mean it won’t be on a long enough timeframe.
If you are old enough then you would be aware that similar claims were made about email but only one country that I know of (the Netherlands) no longer processes mail. Still if we had to guess I would say that we are still early and email will replace the worlds postal systems.
Which has a good track record of being right, most of the time.
I agree!
However, this isn't a plug to be using AI for coding everything, but a more general plug that AI should be integrated to a lot more things outside of the mainstay of chatbots.
There is a lot of merit to using AI to establish a new abstraction layer.
You guide the AI with some prompts and give it some guidance on how to scenario-test it. It makes some classes, test methods. Maybe ~2000 lines and you do a quick verification, check if the overall idea looks okay. Ask it to fix a few design things and then merge it.
Its much easier than doing it yourself with all the boilerplate and understanding each esoteric language specific thing. Which library do I use for UDP communication in golang? The agent might have made a good assumption. These kind of things is where it speeds it up.
We have our core code in a weird dialect of C and rust. C I know well, but not rust. Our tests are in Python. The pipeline descriptions are in Yaml.
Outside of the core code there are so many arcana to learn. Writing syntactically and semantically correct yaml/Python test code would be a nightmare. The Agents have flaws, but they provide a huge leg up in improving the tests.
And they are great at providing a first pass review of the core code before bothering a human reviewer. Lastly we run some of our test failures through AI triage, which often enough finds the root cause or rules out simple failures.
This shows up in a higher checkin rate. I'm curious to see whether this will lead to quality end product since we have more support for the more manually written and reviewed core product code.
LLM's are directionally right and if their answer "fits" then I take it at face value.
I wrote a blog detailing the computational difference between "generation" and "verification" and why it matters for LLM's: https://simianwords.bearblog.dev/the-generation-vs-verificat...
As an example: I asked the LLM "synonym for "provides" that also means "places" on you" and it gave me 5 answers and I immediately knew the right one was "confers". How? It just fits. Just like most things.
I’m skeptical because I’ve seen this exact situation and I’ve seen the result be something that anyone experienced wouldn’t do.
The point is, it’s a game of chance and yet good players beat bad players in the long run. Your job in the new era of software engineering is to design the process so LLMs doing your code monkeying avoid the losses (including discarding bad changes) and take the wins. Win often enough and you’ll come out ahead.
I think what you are saying is that people should learn and appreciate working in high variance environments and still exploit small gains. This is clearly not something that is easily digestible to people so they end up rejecting LLMs.
now, what if you asked for the synonym for "provides" in a language that has gender differences (e.g. spanish/portuguese) as well as societal nuances (e.g. japanese) and it gives you "confers", how would you now know that's correct?
ah, so you say you tell it to take into consideration gender differences, as well as societal nuances. What are those, if you were not already familiar with the language?
The extent to which LLMs help is determined by how well acquainted you are with the domain. But it will always push you directionally in the right direction.
In your case, you used a language example and this is one where LLMs have natural strength in. I don’t need to be an expert in Spanish to trust it because I know that LLMs are specifically good at catching these problems.
But again there are limits and good to understand it.
LeCun thinks that LLMs are a bad fit for AI that understands the physical, dynamical systems that we inhabit, and that understanding this is necessary for AGI/ASI.
I don't know that Hutter is bearish on LLMs, but Hutter is interested in AI that can reason exceptionally well given infinite compute, and approximations of such a reasoning AI. I think he is open to the idea that LLMs can be such an approximation.
> Without fully endorsing all their ideas, I’m now in the LeCun/Marcus camp on LLMs.
I'm pretty sure he means "Yann LeCun and Gary Marcus" not "Yann LeCun and Marcus Hutter".
I'm very interested in APIs that allow client-side context construction rather than relying on opaque APIs concatenating strings from your JSON messages and injecting tool prompts. I found that generally, you can craft the entire context as a unicode string and just stuff it in the system message. This works best with models where the chat template is published.
If you are already comfortable with letting other devs work on features then it's easier, because it's similar (arguably you have more control with AI, because what you say goes regardless of hierarchy).
I've been running into this experience with non-code artifacts, like slideshows and documents.
This line which he wrote, will override any quality gaps, because the cost to produce that shitty software will be lower than the cost to produce good software.
I can't agree with this. You tend to get one point of view, often without any actual resources and references so you have to go look it up yourself, on [insert search engine]. Plus, what does it say when we consider an AI the one stop for our data intakes.
I think the discussion about methods (coding agents included) depends on answering those questions. Seems pointless to claim these agents [dont] make you more productive.
Although, at a first glance, the productivity increase does seem like nothing I’ve seen before. Even more than the transition of making webapps in plain js -> jquery -> frameworks or going from something like Flask to using Rails.
Problem is this is not evidence based. I just feel prototyping has speed up 100x. So the number of iterations/attempts has gone up. Transforming specs into a test suite takes a fraction of the time. Dunno, feels weird not to be able to be overall more productive (do more with less time) if you have these new tools.
But each time I suspected I could have done it better and faster manually
There is a class of tasks that can't be done faster manually, unless you're some sort of colour-smells-like-chicken-and-numbers-have-taste genius. And there is other class (my suspicion now is any non-standard task+framework) that are slower than using agents. So I can imagine you have excellent experience with some tasks like USB hacking and would do it faster than LLM. On the other hand for me, as a Java developer, hacking a USB is finally possible with LLM. Otherwise I'd need to stop-and-learn for some time, which I wouldn't, so either I'd by a more expensive hardware that fulfills my requirements, or put the USB reverse engineering project to my 100 acre todo listIf your work was previously googling stack overflow, it can be incredibly useful at working through that. Which let’s face it, that’s what a lot of us do.
I hate how both the for and against case for LLMs are just so bloody terrible at addressing these things.
You have to use something like superpowers, the key is that the humans need to make the important decisions.
You have to review the code - just like you had to review the code humans wrote. There will be iterations.
You have to give the LLM skills and patterns to follow, access to architectural documents, etc, just like humans needed to be onboarded at a company and do the same.
If you get all of these right with today's LLMs, you will never write code at all because it is so obviously not the best use of your time. If you feel that you are still better at writing the code manually, you have not done the above right, fix your workflow and try again.
To use a Geohot-inspired analogy, what we have now is like the Google self-driving car of 2010. It works most of the time, yet sometimes fails in unpredictable ways. So you need a safety driver behind the wheel to constantly watch what it's doing (the code review).
A real AI agent would not need a safety driver. We don't have that but many people are basically saying "fuck it, I'm just going to set this car off on its own and see what happens". And sure if you're prototyping it's not dangerous. But for production systems that is dangerous.
There is some very cool tech it just needs continued refinement, there is a path forwards even if it isn't always the clearest. This is happening but it is taking years and a lot of work to get done.
Nailed it!
At my last place this was encouraged (by non-technical leadership driving the AI adoption policies, as well as setting salaries) and seen as a huge win.
The "step change in number of created PR's" was celebrated (cult-style), and by one of the (co) CEO's praised as a paradigm shift of the same magnitude as the personal computer. Meanwhile, I was stuck finding insta-reject level bugs in pull requests from people one-shotting 6000 line PR's "finally solving" long-standing issues from the backlog. Needless to say I left.
There aren't many truly general purpose tools so viewing things this way seems like either a fantasy or an over-reaction. And if nothing else the processes we use will have to change along with the tools.
It's the early days so we still have a lot to figure out but one of the most significant is which tools are appropriate for what sort of tasks. I've had good luck refactoring a small code base, building some small hobby projects and building features for our company's product. But, I've also dodged bullets doing greenfield development on some features where Claude (my default) has made what seemed like sound choices early on, and which I approved of, only to build something fragile or with unforseen consequences. I haven't quite figured out what distinguished those situations from the successful ones but I'm trying. But it's complicated by the fact that things are evolving quickly and yesterday's failure mode isn't the same as today's and, for that matter, yesterday's successes aren't guaranted to be repeatable today.
See, the project actually has a well thought out structure that I design carefully, but more and more of it gets filled out by Codex. Codex is not smart enough to remember all the high-level design considerations, some of which had not been documented because I was just implicitly assuming them. So the fix was to use Codex to isolate the error, think about in terms of the high-level design, and fix the problem, which was partially an implementation problem, and partially a problem of the high-level design.
I fixed the high-level design with discussions with Codex, and documenting this, and then let Codex implement the fixes. The discussion took me more than an hour, the implementation was done in a few minutes.
This working style is similar to doing math: You have a high-level idea of what you are doing, and let that guide you, and Codex assumes the role of something that fills out all of the details you take for granted. Often it turns out your high-level idea had flaws, and this shows up in your code not working as expected. So you revise your high-level idea, refactor the code to reflect the modified high-level design, rinse and repeat.
Working this way is still really hard, but it allows me to do things I could not have done before. Getting your ideas validated (or refuted) in minutes instead of days is huge, and makes it possible to march through stuff that would have turned into a deadly swamp before, at least for me.
Now. Do I think that most corporate programmers will use Codex or CC in this way? I don't know, but I think probably not. So what will stop them going into the swamp until it swallows them, instead of backing up in time and marching around it?
For something to take "longer and longer" to realise, doesn't they imply that it's been realised at least once before or that there was an expected deadline for the realisation?
Okay, that's a nitpick.
So what he is telling us? That agents are not infaillable and they are not capable to one shot complex software and they do not produce perfect code?
We know what and the solution is to use agents for what they are good at and work around their limitations and we have a human in the loop.
>not some RLVR shit that comments out the failing test and tells you all the tests are now passing
That's what harnesses should be about: detect when the agent is misbehaving and force it to take the right approach.
This example in particular should be easy to solve if we generated the tests before coding and we have a workflow or state machine that doesn't allow the agent to disable tests and doesn't allow it to reach the next stage unless all tests are passing.
Saving money is the wrong reason to use AI now. AI is expensive if you want good results.
But what AI is good for, is it allows you to build fast.
Also, I don't see everything being automated. To get good results you have to drive the AI.
The factories still have workers supervising the process and doing some high value manual processes even if most of the production is done by machines.
This is effectively what's happening to software. We are getting some forms of automation but I believe there's plenty of manual work and coordination left for humans to do.
For me, the AI is essentially "faster hands" that can type what I am thinking way faster than I can do it. I tell it what I want, I give it the broad architecture and design patterns/types to use, and any specific test conditions, and let it write all of that usually by the time I have responded to a single email or chat message or two. Custom instructions etc build overtime to address model blind spots or my own personal taste so I don't have to repeat myself in every prompt for cross-cutting things.
Does it "one shot it"? Almost never - we go around the cycle a few times, treating it like pair programming a junior or intern by keeping a close eye on the broad direction and making sure it is acceptable - course-correcting where it matters, but cutting some slack where it doesn't. Sometimes I ask it why it picked a particular approach (that I wouldn't have necessarily) and it gives me a cogent explanation and we go with it, so I actually sometimes learn new things from it too which is great.
The other use case is just it's sheer capacity to research a codebase and hold everything in it's attention at once. It can comprehend unfamiliar code way faster and way more in-depth than I can. So if you are in an unfamiliar code base or a language or framework you are not that familiar with, it absolutely shines because it can just absorb all that info in seconds, and then you can just drill it with questions and what-abouts and how does it do this and what technique is used for that and that, what are the existing patterns and norms in this codebase when it comes to foo or bar? Etc etc
What I am not doing is deferring everything off to the AI unless it really doesn't matter (e.g. disposable one-off or prototype code). Same that I would not expect a junior or intern to make big architectural decisions when implementing something - you keep them on a fairly close leash and watch what they are up to.
Sloptember is clearly a reference to this - the similarity being that masses of AI generated content, from social media posts to open source contributions are replacing the human internet. In a way this is related to the "dead internet theory", an idea I previously found hard to believe, but these days could easily be true.
If the history of the internet interests you, both these are worth looking up.
I mean, this has been the trend for decades really, before LLMs were a thing. The incentive is skewed toward quantity rather than quality. The new tools just add more fuel to the fire.
Code quality is also really lacking in much of the industry. The truth is, these LLM models, as limited as they are, program at a level above that of the median junior programmer.
I don't get how anybody who has used the SOTA models in the last 3-4 months can write a sentence like this?
They most certainly can program. And usually better than 90% of my coworkers.
The question is really.. Can they engineer? By which I mean handle the duties of a software engineer working in a team, managing a large complex system, making reviewable pieces, forward progress in incremental steps, etc.
No, that part I'm definitely more skeptical about. That requires slave driving by the person in front of the prompt.
But this is a useful distinction to make. Because making overly pessimistic claims about the coding capabilities of the models makes me question the author's experiences with them.
I think agentic tools are toxic to team programming culture and engineering that produces reliable stable results. But I wouldn't for the life of me question their ability to write programs.
-- I think this article is COPE, if I'm being quite honest. I thought of putting cute analogies, like the C programmers saying the Python and Javascript programmers are not "hardcore" enough... but the truth should be obvious to anyone using LLMs effectively.
-- Current AI is a much better programmer than 100% of people and when directed by someone in that top 10%, it's a force majeur.
I assume you meant something like 'force multiplier"? Force Majeure is an uncontrollable event that prevents a party from fulfilling a contract. Which some may argue is what AI will also deliver. :)
EDIT: To people downvoting me, please come up with a reasonable bet and lets try to work it out.
EDIT 2: $500 bet paid to your account on whether LLM's are going to still be used productively or not. No one?
EDIT 3: Any bet that would express the author's argument in a way that can be disproven in the future
He is just making a moment-in-time assessment of how he feels about current state of LLM coding. I don't think it's a super-strongly held opinion, I'm sure he would be willing to change his opinion if the next two years of LLM development produce exponentially better results.
> I’m calling it now, the adoption of AI agents into software development will be one of the most costly mistakes in the field’s history
> Agents _will_ end up hurting large organizations more than high performing individuals or small orgs
> Do you think macOS will get better or worse in the next 2 years?
> I don’t think models like this will _ever_ be able to program, I think the process matters.
From this quote especially how can you say he only thinks about the "current" state of LLM coding?
You are wrong. I don't think this is a point in time assessment. I don't think it is not supper strongly held. I do think he will change his mind though.
That's why I asking for terms of the bet.
That could easily be true and Apple "will use even more tokens and spend even more money".
Three years ago I mentioned to co-workers I was most concerned about juniors not being able to build skills to become mid-level or senior. In the past year others have started talking about the same. But also I had thought people who were already mid-level or senior could resist and control themselves enough to use it well - but in the past six months two different co-workers have independently said they've noticed their own skills atrophying. And with those skills atrophied, they'll have less and less input/direction for the AI tools.
My suspicion is that for everyone who has gone all-in, a few years from now, they will still see a productivity increase from their own baseline - but their baseline will have dropped from where it is now as they "get used to it".
Are we really still doing this?
However there's still a distinction. Unless I'm responding to an LLM, you had a childhood. You learned about the world and space and agency before you ever learned how to program. And you didn't learn it from billions of examples, you learned from a few examples, some self directed experiments, some feedback from teachers, etc...
I'm saying that's what matters. The process matters. You didn't learn to mimic a distribution, you learned to program. Of course in the perfect mathematical limit it's the same, but in practice it's not.
1. It only accurately describes pre-training 2. It ignores the existence of generalization
Next token prediction is just a training task, not "what the model does internally" in any meaningful sense
However, 99.9999% of coding is not like that. Non-coders don't care about the code at all. They just care about outcomes. People don't care if it's "slop" if it works. Similar to bug prevalence, the optimal level of slop is not zero and will be decided by the market, not by coders.
I do not want a $10M - $100M dollar issue (lawsuits) because I admitted that I don't understand why a breach happened after using a coding agent. Responsiblity and reputation can't be vibe-coded.
So:
> However, 99.9999% of coding is not like that. Non-coders don't care about the code at all. They just care about outcomes. People don't care if it's "slop" if it works. Similar to bug prevalence, the optimal level of slop is not zero and will be decided by the market, not by coders.
There's a vast difference between code that works as a prototype vs how it works in production. I don't think you would trust anyone with no experience to fly a commercial plane with them vibe-coding a flight simulator without knowing the process of becoming a pilot.
But since "it works", it is ok right?
Its fundamentally how LLMs work.
Is this any different than how a PM gets a programmer to work on a project? They think, then they deliver. If given more time, maybe they deliver something better. Maybe they consult some text and try to apply a design pattern.
The LLM in this use case is perfect because almost everything involved is text based, and the model is able to take in all the expressive that is language.
Yes, it's very different. You seem to be suggesting that the current frontier LLMs, when tied to their tools and harnesses, have emergent properties that are similar to human consciousness. If you truly believe that, I'm not sure how to have a productive discussion here.
They will not save you from every pitfall, but that isn't the point; engineers walk into pitfalls all the time. This can get you in, and out, much much quicker.
Agents code extremely well.
They're not particularly good at 'architecture' and I think that's where his specific concerns about 'not being able to see the problems' arise - the issues are are almost never in the syntax, because the AI writes perfect code. The issue is that it's not doing exactly what you intended.
Instead of 'missing the target' ... it's 'hit the wrong target perfectly'.
Any senior developer working with AI daily should be able to have a baseline intuition for all of this, and would therefore reject the hyperbole of the premise 'it can't code!'.
Of course it's producing gargantuan amounts of slop - that's not because 'it can't code', that's something else entirely.
That is precisely because it can't code! Or rather, it's because it can't reason, or understand things, which in turn means it can't code. The output of LLMs is sloppy because they have no understanding of what they are doing.
People are hallucinating on this thread.
That people will say 'Code Me An App' and expect some kind of magical results, will be more common than not, but it's no way evidence that the AI can't code.
Given a sufficiently detailed prompt, the AI will produce almost whatever you ask it within a certain scale.
As sure as the sky is blue.
And it will make perfectly compilable code usually on the first prompt.
Obviously, it can code.
Obviously, it can 'synthetically reason' about the code.
You can point it an arbitrary code base and it will give a better overall assessment than most humans.
Is it fallible? Obviously. Is it limited in scope? Obviously.
In other words - they can program, and probably better than you.
I don't like being too critical but this is a really superficial post - as if either 'AI is a Software Engineer - or - It must be Fraud'
It's an extremely powerful tool that is very 'pattern oriented' and with guidance can absolutely write great code - and even across modules given the right basis.
It's also great at so many other tasks - finding bugs in big code bases, doing migrations etc.
It's not going to make very goo architectural decisions for you, and if you're doing anything novel you have to read most of the code ... but that's too be expected.
https://en.wikipedia.org/wiki/George_Hotz
In fact, he’s done several things that are truly hard, and has a well-deserved engineering reputation.
It's ridiculous to suggest that 'AI can't code' - when the entire development world has moved into agentic coding, including all of the best developers in the world, and it's yielding positive results in most scenarios.
It's a callow 'bad twitter take' the length of an article.
He's not wrong to suggest that IA is a 'stochastic mechanism' over all the code that's ever been written, but that's evidence of the mechanism, and frankly, describe how it is able to code.
And yes - organizations will misappropriate AI at scale as they do with everything.
His premise is so far out of proportion and misguided, it's tantamount to 'fake moon landing' conspiracy theory.
Careful, your bubble is showing.
In most cases, LLMs can get you 80-95% of the way, sometimes less, sometimes more. And heck, sometimes, it just gets you somewhere wrong.
But it seems everyone is arguing about whether LLMs can be perfect software engineers in isolation running in a closet, and using that to say that LLMs do not have a massive potential in other scenarios.
Sometimes, I like to imagine how much more productive most organizations could be from the things that the internet gave us, even to this day. Most companies never really do even a fraction of what is possible. That helps to ground my view of LLMs as well.
The fault dear Brutus isn't in our language models, but in ourselves.