Hacker News Clone

facundo_olano May 23, 2026, 10:18 PM

Author here. I'm surprised to see this surfacing now. I just wanted to clarify, since apparently the post doesn't do a good job at it, that what I discussed there is not a methodology I advocate for. The point of the post was: ok, since there are organizations mandating to maximize speed by reducing time spent on typing code (or even mandating to maximize agents usage), is there a way we can meet that requirement while still preserving the rigor somewhere else?

This was a follow up to a previous article[1] and the pair tried to express what I still think today (using AI daily at work): every time I use AI for coding, to some capacity I'm sacrificing system understanding and stability in favor of programming speed. This is not necessarily always a bad tradeoff, but I think it's important to constantly remind ourselves we are making it.

[1] https://olano.dev/blog/tactical-tornado/

khasan222 May 24, 2026, 9:01 AM

For sure every time you use ai you’re sacrificing understanding if you don’t plan out and understand how exactly the ai is going to do the work you asked it to do.

The same output that is such a bad thing in this article can also be used to gain context, by making a thorough plan with your ai first, reading through the plan and proposing changes just like you would with a real developer.

You can also use this output to have the ai write a journal as well. The journal can be as detailed as possible, essentially a ledger of all the changes your ai has made to the code. This not only gives teammates reviewing your pr greater context, but can also be used by you, or even the ai itself, to figure out why a particular implementation was done the way it was, even far into the future.

Lastly how many of us ever deploy code without actually checking the feature works e2e? I would gather not many of us do, I don’t, because even though we may have a greater understanding of the code, we can make mistakes in the code or in our logic. And I keep coming back to why would we treat llms any differently? I believe we should be spending our energy thoroughly manually testing a feature to make sure when we brainstormed we actually did get every edge case, and it works well.
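One way to picture the journal idea above is an append-only log with one JSON object per line. This is only a minimal sketch: the function names (`append_journal_entry`, `read_journal`), the field names, and the file name `journal.jsonl` are all assumptions made for illustration, not something the comment or any particular agent tool prescribes.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_journal_entry(path: Path, files: list[str], summary: str, rationale: str) -> dict:
    """Append one change record to the journal (hypothetical format) and return it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "files": files,
        "summary": summary,      # what changed
        "rationale": rationale,  # why it was done this way
    }
    # Open in append mode so earlier entries are never rewritten.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def read_journal(path: Path) -> list[dict]:
    """Load every entry back, oldest first."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    journal = Path(tmp) / "journal.jsonl"
    append_journal_entry(
        journal,
        files=["billing/invoice.py"],
        summary="Round totals once, at the end",
        rationale="Per-line rounding drifted from the stored total",
    )
    for item in read_journal(journal):
        print(item["files"], "-", item["rationale"])
```

Because each entry records a rationale next to the files it touched, a reviewer (or a later agent run) can search the log for a file name and read why it looks the way it does.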

sokoloff May 24, 2026, 10:04 AM

I think most people test at least a happy path of their code end to end. I think we can all agree that your last sentence is far more aspirational than bare minimum standard practice. (“I believe we should be spending our energy thoroughly manually testing a feature to make sure when we brainstormed we actually did get every edge case, and it works well.”)

I did one small side web project by only writing spec tests and prompts and testing the results in a browser, never reading nor editing a single line of generated code. It was something for home and so low stakes, but it worked remarkably well and was much better tested than the typical 2022-era home project of mine.

khasan222 May 25, 2026, 5:06 PM

My last sentence is definitely aspirational; it is how I try to go about it, but for sure I make mistakes. However, your comment about writing spec tests was interesting to me.

Honestly I don’t even write tests manually because of coverage checks. Being that the coverage check is not something easily manipulated, I always tell the ai, don’t ever change configs, and make the coverage pass whatever I set it to, most times > 95%. I just tell the AI, make this coverage pass.

I find tremendous success with this technique, or anytime really I can find an objective way for the ai to test its work.
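A minimal sketch of the kind of objective check described above, assuming the covered and total line counts already come from whatever coverage tool the project uses. The helper name `coverage_gate` and the 95% default are illustrative assumptions only.

```python
def coverage_gate(covered_lines: int, total_lines: int, threshold: float = 95.0) -> bool:
    """Return True when line coverage (in percent) meets the threshold.

    Hypothetical helper: the inputs are assumed to come from a coverage report.
    """
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    if not 0 <= covered_lines <= total_lines:
        raise ValueError("covered_lines must be between 0 and total_lines")
    percent = 100.0 * covered_lines / total_lines
    return percent >= threshold


# Made-up numbers for illustration.
print("962/1000 passes:", coverage_gate(962, 1000))
print("940/1000 passes:", coverage_gate(940, 1000))
```

In practice the threshold usually lives in the tool's own configuration (coverage.py, for example, has a `fail_under` setting), which is why telling the agent never to touch configs matters: the gate is only objective if the agent cannot lower it.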

gcr May 24, 2026, 11:59 AM

What does “much better tested” mean to you?

If you don’t read the tests to check they convey your intent or specifications, they’re more like tautologies than tests, you know?

sokoloff May 24, 2026, 12:18 PM

I don't understand the comment.

I wrote the tests. That was how I expressed the spec and my intentions.

gcr May 24, 2026, 8:25 PM

Ah. Most Claude users let Claude write their tests for them and I assumed you were too. Sorry.

ignoreusernames May 23, 2026, 10:58 PM

Don’t you think that the provider of the LLM is also a dimension in these discussions about responsibility? We often talk about the tech itself (LLM-driven development), but how we access it is just as important imo. It’s either locked behind a non-trivial amount of hardware (for open models) or behind some monopolistic provider like OpenAI or Anthropic. In the provider case, it’s not really the LLM that will “own” the code, it’s the provider itself, and we’ll be at the mercy of whatever pricing model they shove down our throats.

LelouBil May 23, 2026, 11:33 PM

I don't like the premise of the article, but I agree that if you accept the premise, the contents of the article are a good way to do it.

dapperdrake May 24, 2026, 3:57 PM

It gets better:

(NOT a lawyer)

Previously, liability and indemnification could be bureaucratically laundered to "engineers", because it was a huge diffuse set of people.

Now the bag is left with the top of the chain for authorizing LLMs. Jia Tan went the hard way with xz. LLM-trolling is the new social engineering.

huflungdung May 24, 2026, 7:12 PM

[dead]

Uptrenda May 23, 2026, 11:40 PM

[flagged]

AlexCoventry May 24, 2026, 1:28 AM

He was establishing the context of the current blog post. Very unlikely that he was doing it for Google juice.

throwaw12 May 23, 2026, 7:32 PM

> my first bet would be specifications and tests

You are missing another dimension: how easy it would be to migrate if adding a new feature hits a ceiling and the LLM keeps breaking the system.

Imagine all tests are passing and the code conforms to the spec, but everything is denormalized because the LLM thought this was a nice idea at the beginning, since no one mentioned that requirement in the spec. After a while you want to add a feature which requires a normalized table and the LLM keeps failing, but you also have no idea how this complex system works.

Don't forget that a very, very detailed spec is actually the code.

iloveoof May 23, 2026, 10:49 PM

Software engineering has always worked this way, just not to ICs.

“The LLMs produce non-deterministic output and generate code much faster than we can read it, so we can’t seriously expect to effectively review, understand, and approve every diff anymore. But that doesn’t necessarily mean we stop being rigorous, it could mean we should move rigor elsewhere.”

Direct reports, when delegated tasks by managers, produce non-deterministic outputs much faster than team leads/managers can review, understand, or approve every diff. Being a manager of software developers has always been a non-deterministic form of software engineering.

alabut May 24, 2026, 12:38 AM

Simon Willison made a similar parallel recently:

https://simonwillison.net/2026/May/6/vibe-coding-and-agentic...

  “The thing that really helps me is thinking back to when I’ve worked at larger organizations where I’ve been an engineering manager. Other teams are building software that my team depends on.

  If another team hands over something and says, “hey, this is the image resize service, here’s how to use it to resize your images”... I’m not going to go and read every line of code that they wrote.

  I’m going to look at their documentation and I’m going to use it to resize some images. And then I’m going to start shipping my own features. And if I start running into problems where the image resizer thing appears to have bugs or the performance isn’t good, that’s when I might dig into their Git repositories and see what’s going on. But for the most part I treat that as a semi-black box that I don’t look at until I need to.”

simoncion May 24, 2026, 3:30 AM

> Being a manager of software developers has always been a non-deterministic form of software engineering.

I disagree. Being a manager of programmers requires that you trust your programmers and have some way of occasionally verifying the correctness and efficacy of what they build to make sure that that trust is still properly placed.

But on top of that, the user of an LLM isn't really akin to a manager of programmers. Human programmers are responsible for what they write [0], and even the ones that only cost ~50% of a senior's total comp [1] are still going to be able to fairly reliably explain to you why they made the decisions they did, and fairly reliably be able to follow instruction. LLMs just aren't there yet, and the major LLM providers may never care to get them there.

A programmer who's using LLMs is a programmer who's using LLMs... not a manager of other programmers. I'm not going to say that the tech will never advance to that point, but it's simply not there yet.

[0] Unless management decides otherwise, of course.

[1] Nvidia's CEO recently mentioned that he'd be "deeply alarmed" if senior staff aren't spending at least half of their total compensation [2] on LLM providers, so I'm going to use that as my benchmark for "expected annual LLM spend".

[2] ...meaning that each senior programmer costs their employer at least 50% more than their total compensation....

devmor May 23, 2026, 11:44 PM

> Being a manager of software developers has always been a non-deterministic form of software engineering.

Unless the manager is also a principal/architect, I don’t find this to be agreeable.

It’s similar to saying that you are a non-deterministic chef when you order food from a restaurant.

camgunz May 24, 2026, 1:21 PM

No, because those direct reports can use tools to build deterministic software. LLMs can't, because they themselves are non-deterministic. They will say they did, and they will be wrong. And the LLM you have check will also say it did, and it will be wrong. Etc etc.

These things just can't be in the critical path. They are ridiculously unreliable.

insanitybit May 25, 2026, 6:30 PM

What? Software being deterministic is not a feature of who wrote it. And how the hell is a human "deterministic"?

camgunz May 26, 2026, 1:44 PM

At the bottom of each of these arguments is a "who is accountable" question. You can tell an LLM "hey, use Lean to verify this" or "hey, do a code review" or "hey, write some tests and run them". It might do these things, it might not. You can then tell another LLM "hey, check that LLM A did these things". It might do these things, it might not. Repeat.

You can tell a human (an IC) the same things. You can then tell another human (a manager) "hey, check that IC A did these things". So far, these are the same. But there's now a critical difference: you can then hold those people accountable if they don't. They can be in the critical path. LLMs can't. People can improve. LLMs can't. People can work together. LLMs can't.

This doesn't always matter. You don't need things like accountability or improvement or teamwork all the time. But you do in reliable software.

teaearlgraycold May 24, 2026, 12:14 AM

Well yes but if no humans at the company understand the code then no one is truly responsible for it.

Npovview May 24, 2026, 7:30 AM

what about the artifacts that were supposed to test the correctness of the code? are they passing willy nilly?

ekidd May 24, 2026, 10:59 AM

No amount of testing will save a large program with a dogshit architecture. Roughly, this is because tests increase coverage linearly with the number of tests, but weird interactions increase exponentially with code size.

This might be fine if you're building a tiny app, or if you're building a medium-sized app that follows a strict existing architecture (like a web app consisting mostly of forms). In which case, have fun.

But if you're building something slightly novel and interesting, then Claude is surprisingly bad at architecture and taste, and it tends to "fix" problems by spewing more slop. What you need instead is actual insight that leads to simplifying principles. This, in turn, allows breaking up the exponential complexity into disciplined patterns. This allows your code complexity to scale far more slowly, allowing an essentially linear number of tests to provide coverage.

I actually download and try people's vibe-coded developer tools. And frankly, those tools are some of the worst software I've used in my life, worse than even Unix-vendor Motif implementations from the early 90s.

Like, I'm super happy that people can vibe-code themselves simple, one-off personal tools. That's incredibly empowering. But that doesn't mean you can build big, novel stuff the same way without a competent human actively in the loop.
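To put rough numbers on the linear-versus-exponential point above, here is a small sketch. It assumes a simplified model that is not in the comment itself: a program made of `n` components where any group of two or more components might interact in a way that needs its own test.

```python
from math import comb


def pairwise_interactions(n: int) -> int:
    """Number of distinct pairs among n components: n choose 2."""
    return comb(n, 2)


def multiway_interactions(n: int) -> int:
    """Number of groups of two or more components: 2^n minus the
    empty set and the n single-component sets."""
    return 2**n - n - 1


for n in (5, 10, 20):
    print(n, pairwise_interactions(n), multiway_interactions(n))
# prints:
# 5 10 26
# 10 45 1013
# 20 190 1048555
```

Under this model, a test suite that grows by a fixed number of tests per component falls behind almost immediately unless the architecture rules out most of those groupings, which is the role the "simplifying principles" play in the argument.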

theshrike79 May 24, 2026, 11:11 AM

> those tools are some of the worst software I've used in my life

Is the code bad or don't they do what they claim they do? Both are very different issues.

ekidd May 24, 2026, 11:24 AM

They do what they claim to do maybe 20% of the time. The other 80% of the time is spent trying to figure out why they aren't working, why they corrupted their data, why they crash every 10 minutes, etc.

And I want to be clear that this isn't some non-technical novice vibe coding this garbage. This is often extremely talented developers with decades of experience who have apparently decided that they don't need to look at their code anymore.

You can get very good results out of AI agents. But mostly the people who get good results are the ones who still read the LLM output in detail, and who introduce the structure the LLMs are missing. But like I said, this distinction mostly becomes apparent past a certain size and novelty level.

theshrike79 May 24, 2026, 7:15 PM

Where do you find these apps that fail to work 80% of the time?

I must be an anomaly because all of the vibe coded apps I'm running 24/7 don't keep crashing or stop suddenly working.

Npovview May 24, 2026, 11:13 AM

Would Antirez with LLMs make the same mistakes a novice would make? You are comparing your strongest contender with my weakest contender.

ramoz May 23, 2026, 3:43 PM

> If I had to roll out such a development process today, I’d make a standardized Markdown specification the new unit of knowledge for the software project. Product owners and engineers could initially collaborate on this spec and on test cases to enforce business rules. Those should be checked into the project repositories along with the implementing code. There would need to be automated pull-request checks verifying not only that tests pass but that code conforms to the spec. This specification, and not the code that materializes it, is what the team would need to understand, review, and be held accountable for.

The constant urge I have today is for some sort of spec or simpler facts to be continuously verified at any point in the development process; Something agents would need to be aware of. I agree with the blog and think it's going to become a team sport to manage these requirements. I'm going to try this out by evolving my open source tool [1] (used to review specs and code) into a bit more of a collaborative & integrated plane for product specs/facts - https://plannotator.ai/workspaces/

[1] https://github.com/backnotprop/plannotator

EFLKumo May 24, 2026, 1:42 PM

> just like we don’t read assembly, or bytecode, or transpiled JavaScript

This makes sense because a given piece of higher-level code produces a specific piece of lower-level code, while an LLM offers no such guarantee. If the transpiled JS code doesn't work, we can track down the bug in the minifier, etc., but one cannot figure out why an LLM fails at a task, especially considering that LLMs, even SOTA ones, can be strongly affected by even small prompt changes. Taking this into consideration, I don't think this is sound reasoning for why we don't need to review AI-generated code.

> The LLMs produce non-deterministic output and generate code much faster than we can read it, so we can’t seriously expect to effectively review, understand, and approve every diff anymore.

Exactly. However, this could also point to a weaker review standard instead of dropping review altogether. For example, devs could mainly review code design or interfaces, leveraging their *taste*, while leaving strict logical reasoning, validation, and testing to other tools or approaches. I'm not persuaded that the nature of LLM code generation must lead to abandoning code review completely.

Anyway, I'm not opposing this article, and its thinking about how things will shift in the future is really good.

trimethylpurine May 24, 2026, 1:57 PM

Couldn't we slowly add guardrails that eventually lead to code generation becoming more and more deterministic over time?

I'm seeing in my experience that Claude has become better with every version at producing uniformity in its code output. Especially where the architecture is clear and documented. And even more so in languages with built in uniformity (Go, HTMX, SQL) where there is intentionally only one or two ways of doing things. In such environments, the output is nearly deterministic.

EFLKumo May 25, 2026, 4:10 AM

I once thought about this and found that n-shot examples have a greater influence on LLMs. In other words, in a repo with good code quality and architecture (which offers good n-shot examples) and on a task with clear instructions and goals, the LLM's output seems reliable enough, which matches your experience. And n-shot examples are always better than relying on instruction following, which is what the article's "specifications" amount to as an approach to handling LLM productivity, so imo the idea you suggested is another possibility to set against the article's as well.

ninalanyon May 23, 2026, 4:08 PM

> Rework is almost free

Is it? All the electricity and capital investment in computing hardware costs real money. Is this properly reflected in the fees that AI companies charge or is venture capital propping each one up in the hope that they will kill off the competition before they run out of (usually other people's) money?

fractaled May 24, 2026, 12:30 AM

Even ignoring the AI costs, 'rework' is going to be more expensive as soon as you have customers. For example any sort of data migration. Or UX expectations. Or public API interface. None of these can change without some thought, so one would be leaning on these specs quite a lot.

gessha May 23, 2026, 5:02 PM

Yeah, a lot of Claude Code users (me included) found out in March whether rework is free or not.

k3vinw May 23, 2026, 8:58 PM

> We can stop reading LLM-generated code just like we don’t read assembly, or bytecode, or transpiled JavaScript; our high-level language source would now be another form of machine code

This is too weird for me. At least with programming languages I can consult the documentation and if the programming language isn’t behaving as documented, it’s obviously a defect and if you’re savvy enough you often have open channels that accept contributions. Can we say the same for Claude or other AI solutions?

Sevii May 23, 2026, 8:59 PM

If you run a local LLM and an open source agent harness you are pretty close to that.

throawayonthe May 23, 2026, 9:20 PM

can you explain how? with a compiler you can rely on the adage "it's never a compiler bug" (until it is! and then you can fix it)

how can a local LLM with an open source agent harness provide the same trustworthiness?

zoogeny May 23, 2026, 9:27 PM

> ... then you can fix it

I recall working on a project that used (MSVC) VC++ and a coworker found a bug in the compiler. We reported the issue to Microsoft and they eventually patched it.

You may find yourself arguing explicitly for open source dev tools if you continue down this line. There are many commercial cases where "you can fix it" does not apply to the dev toolchain and you will find yourself reliant on a provider. At that point, the trustworthiness of "compiler provider" and "local LLM provider" is the pertinent discussion (e.g. provider vs. provider instead of LLM vs compiler).

skydhash May 24, 2026, 12:46 PM

> There are many commercial cases where "you can fix it" does not apply to the dev toolchain and you will find yourself reliant on a provider.

That’s only at the hobbyist level. At the enterprise level, there are lots of contracts involved that require speedy bug fixes.

throawayonthe May 24, 2026, 11:06 PM

> You may find yourself arguing explicitly for open source dev tools

well sure, of course i would :) but ig i meant more so "can be fixed" in a way it can't with llms, open source or not

tyleo May 23, 2026, 3:00 PM

The underlying mechanism is still the same: humans type and products come out.

So something which must be true if this author is right is that whatever the new language is (the thing people are typing into markdown), it must be able to express the same rigor in fewer words than existing source code.

Otherwise the result is just legacy coding in a new programming language.

zoogeny May 23, 2026, 7:54 PM

>... my first bet would be specifications ... and tests ... If I had to roll out such a development process today, I’d make a standardized Markdown specification the new unit of knowledge for the software project.

I've found that adopting RFC Keywords (e.g. RFC 2119 [1]; MUST, SHOULD, MAY) at least makes the LLM report satisfaction. I'd love to see a proper study on the usage of RFC keywords and their effect on compliance and effectiveness.

1. https://www.rfc-editor.org/info/rfc2119/
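As a rough illustration of how RFC 2119 keywords in a Markdown spec could feed an automated pull-request check, the sketch below extracts keyword-bearing requirement lines and reports any MUST-level requirement that no test mentions. Everything specific here is an assumption made for the example: the `REQ-nnn:` bullet convention, the function names, the sample spec text, and the rule that tests cite requirement IDs in comments. It checks traceability only; it does not verify that the code actually conforms to the spec.

```python
import re

# Assumed convention: "- REQ-001: The service MUST ..." (not a standard).
REQ_LINE = re.compile(r"^\s*[-*]\s*(REQ-\d+):\s*(.+)$")
# Subset of RFC 2119 keywords; the two-word forms come first so they win.
KEYWORD = re.compile(r"\b(MUST NOT|SHOULD NOT|MUST|SHOULD|MAY)\b")


def extract_requirements(markdown: str) -> dict[str, tuple[str, str]]:
    """Map requirement ID -> (keyword, requirement text)."""
    requirements = {}
    for line in markdown.splitlines():
        match = REQ_LINE.match(line)
        if not match:
            continue
        req_id, text = match.groups()
        keyword = KEYWORD.search(text)
        if keyword:
            requirements[req_id] = (keyword.group(1), text.strip())
    return requirements


def unreferenced_musts(requirements: dict[str, tuple[str, str]], test_source: str) -> list[str]:
    """IDs of MUST / MUST NOT requirements that the test source never mentions."""
    return sorted(
        req_id
        for req_id, (keyword, _) in requirements.items()
        if keyword.startswith("MUST") and req_id not in test_source
    )


SPEC = """
# Image resize service (made-up example)
- REQ-001: The service MUST reject images larger than 10 MB.
- REQ-002: The service SHOULD respond within 500 ms.
- REQ-003: The service MUST NOT upscale images.
- REQ-004: Clients MAY request WebP output.
"""

TESTS = """
def test_rejects_large_images():  # covers REQ-001
    ...
"""

requirements = extract_requirements(SPEC)
print(unreferenced_musts(requirements, TESTS))  # prints ['REQ-003']
```

A check like this could fail the build whenever the returned list is non-empty, leaving SHOULD and MAY items as warnings.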

kortex May 23, 2026, 10:42 PM

That's literally what OpenSpec does (https://openspec.dev/). It's quite nice. I've only exceptionally rarely seen claude do something wrong based on spec docs when it's fully spec'd out. More often it's because something wasn't nailed down and claude was forced to make assumptions.

The downside is the ospx markdown specs sometimes end up too granular, focusing on the wrong or less important details, so reading the specs feels like a slog.

Also, at times, aspects of the English-language spec end up way more verbose than just giving a code example would be.

jonnytran May 24, 2026, 2:44 PM

Is it time for the literate programming renaissance?

ricardobeat May 23, 2026, 11:03 PM

> We can’t leverage agents if our unit of work is still “add a new endpoint to the RESTful API”

Why not? You just make every task faster. Not everything has to be an uncontrollable rocket launch.

> We need a virtually infinite supply of requirements, engineers acting as pseudo-product designers, owning entire streams of work

Why? To build what? You can only build as fast as you understand the business and your users.

charcircuit May 24, 2026, 12:32 AM

>You can only build as fast as you understand the business and your users.

It should be possible to go faster by having AI understand the business and users.

arcwhite May 24, 2026, 12:52 AM

It doesn't do that though. Understand. That's not how LLMs work.

charcircuit May 24, 2026, 1:45 AM

LLMs are not the only possible AI models to use and create.

paulryanrogers May 24, 2026, 3:33 AM

Aren't they state of the art though?

DavidVoid May 23, 2026, 7:58 PM

> Product owners and engineers could initially collaborate on this spec and on test cases to enforce business rules. Those should be checked into the project repositories along with the implementing code. There would need to be automated pull-request checks verifying not only that tests pass but that code conforms to the spec. This specification, and not the code that materializes it, is what the team would need to understand, review, and be held accountable for.

This just sounds like typical requirements management software (IBM DOORS for example, which has been around since the 90s).

It's kind of funny how AI evangelists keep re-discovering the need for work methods and systems that have existed for decades.

When I worked as a software developer at a big telecom company, I had no say in what the software was supposed to do; that was up to the software design people. They were the ones responsible for designing the software and defining all the requirements. I was just responsible for implementing that behavior in code.

bitwize May 23, 2026, 8:54 PM

Spec-driven development is basically PRIDE, the first proven commercial software methodology dating back to 1971. In fact it may be the culmination of PRIDE because PRIDE's creators realized coding wasn't the hard part; the hard part was systems analysis, determining what problem needed to be solved and what to build. Coding comes last and when you did it right, was simply a translation step.

And now that step can be 100% automated.

Information systems design was a solved problem in the 1970s. PRIDE turned it from an art into a proven, repeatable science. Programmers, afraid of losing their perceived importance, resisted the discipline it imposes as the mustang resists the bit, but now that they're going the way of buggy-whip makers, maybe systems design as a science will make a comeback after 50 years.

irishcoffee May 23, 2026, 8:06 PM

One of my first tasks at my first job out of college required me to learn dxl (doors extension language) and implement some really intricate requirements management features.

It was gratifying to build the confidence of learning a new language quickly that I had never even heard of before. DXL was also pretty awful.

Opened a lot of doors for me though, no pun intended.

CraigJPerry May 24, 2026, 9:11 AM

I prefer "the bottleneck is understanding" framing.

The author is nibbling at the same problem ultimately, but i don't think "hey one strategy is we could just let cognitive debt accumulate so we can go faster!" is a particularly insightful tool in the toolbox. Don't misread me, i'm not denying it can be a valid strategy.

Instead i want to read about insightful strategies for optimising that system-wide bottleneck we have: understanding.

Tell me about how you managed to shift to a higher level of abstraction, tell me about how and when that abstraction leaks. Tell me how you reduced the amount of information that has to flow through the system bottleneck.

jmull May 23, 2026, 4:47 PM

The lesson I've learned from our new AI age is how little a large number of people who've worked in software development their entire careers understand software development.

I suppose all the money floating around AI helps dummify everything, as people glom on to narratives, regardless of merit, that might position them to partake.

What we actually have now is the ability to bang out decent quality code really fast and cheaply.

This is massive, a huge change, one which upends numerous assumptions about the business of software development.

...and it only leaves us to work through every other aspect of software development.

The approach this article advocates is to essentially pretend none of this exists. Simple, but will rarely produce anything of value.

This paragraph from the post gives you the gist of it:

> ...we need to remove humans-in-the-loop, reduce coordination, friction, bureaucracy, and gate-keeping. We need a virtually infinite supply of requirements, engineers acting as pseudo-product designers, owning entire streams of work, with the purview to make autonomous decisions. Rework is almost free so we shouldn’t make an effort to prevent incorrect work from happening.

As if the only reason we ever had POs or designers or business teams, or built consensus between multiple people, or communicated with others, or reviewed designs and code, or tested software, was because it took individual engineers too long to bang out decent code.

AI has just gotten people completely lost. Or I guess just made it apparent they were lost the whole time?

evolve-maz May 23, 2026, 9:28 PM

All the talking points and techniques are those which were used when pushing outsourcing: give better specs, write detailed tests, accept bad code because it works so who cares, we can just rewrite from scratch later, and my favorite "they will get better with more exposure to your code base". None of these takes is wrong, but what they neglect is doing all that work is way more effort than if I wrote the original code myself.

Using an LLM to one-shot a small function (something I would do with a very specific search on Google or SO) is handy. Giving it a harness and free access to a code base leads to some terrible code, and doubling down with more instructions and agents in the loop means more time writing the Rube Goldberg orchestration rather than just opening up an editor and writing code.

dasil003 May 23, 2026, 7:29 PM

Yeah this article is in a real uncanny valley for me where it has some insight, but it also throws out some wild ideas that don't pass the sniff test for me.

To me what AI is doing is changing the economics of human thought, but the change is happening way faster than individuals, let alone organizations, can absorb the implications. What I've seen is that AI magnifies the judgment of individuals who know how to use it, and so far it's mostly software engineers who have learned to use it most effectively because they are the ones able to develop an intuition about its limitations.

The idea of removing the human from the loop is nonsense. The question is more what loops matter, and how can AI speed them up. For instance, building more prototypes and one-off hacky tools is a great use of vibe coding, changing the core architecture of your critical business apps is not. AI has simultaneously increased my ability to call bullshit, while amplifying the amount of bullshit I have to sift through.

When the dust settles I don't really see that the value or importance of reading code has changed much. The whole reason agentic coding is successful is because code provides a precise specification that is both human and machine readable. The idea that we'll move from code to some new magical form of specification is just recycling the promise of COBOL, visual programming, Microsoft Access, ColdFusion, no-code tools, etc, to simplify programming. But actually the innovations that have moved the state of the art of professional programming forward, are the same ones that make agentic coding successful.

vinnymac May 23, 2026, 5:57 PM

I appreciate your insights in a sea of psychosis comments. I find it strange how many people think we have achieved the likes of Y2K flying cars 20 years ago, or the dream of having every car on the road be an electric fully self driving car by now (a promise made at least over a decade ago by several of these types).

The point I’m making is that we give the spotlight to people who are making absurd claims. We have not achieved the ability to remove the human from the loop and continually produce valuable outputs. Until we do, I don’t see how any of the claims made in this article are even close to anything more than simply gate-keeping slop.

ninalanyon May 23, 2026, 6:13 PM

And if we do remove the human from the loop? What then, what are humans for? Do we get Keynes' idea that we only need to work a few hours a week or do we get a continuation and intensification of what we already have: a few high 'earners' and a sea of people struggling to make ends meet?

andrekandre May 24, 2026, 2:01 PM

  > What then, what are humans for?

why, to make money for the boss of course!

retinaros May 23, 2026, 7:20 PM

markdown became the language I hate the most thanks to LLMs and the specs-driven approach. everything feels so dumb right now in agentic coding. looping blindly and aimlessly until it compiles, then until the playwright server or whatever devtools shows that it somehow works. push the code, have a llm autoreview/autofix, push to prod, run a mythos (perfect name) to identify the bug that opus 4.7 created. loops on loops on loops of some kind of zombie processes running to a "goal" that everyone seems to mystify in talks to just hide the fact that we do nothing anymore. the bottleneck never was code. it was the gate that was keeping away the Elizabeth Holmes and SBF from software engineering and it just opened.

abalashov May 23, 2026, 10:09 PM

A colleague and I have taken to use of the verb "meatspin", from another era in Internet shock humour, to describe what it is that coding agents actually do 99% of the time.

wizzwizz4May 23, 2026, 1:30 PM

> There would need to be automated pull-request checks verifying not only that tests pass but that code conforms to the spec.

As I understand, this is an unsolved problem.

phyzix5761May 23, 2026, 5:38 PM

I wonder if with the speed of iteration with AI the industry will switch back to waterfall. Clear documentation first so the LLM can easily produce what's being asked with a round of testing before going back to the documentation stage and running it again. History does repeat itself.

yibersMay 23, 2026, 5:54 PM

We already switched

humbleharbingerMay 23, 2026, 9:16 PM

My Amazon org's leadership has been obsessed with spec-driven development, while individual engineers tell me the only use they have for it is to placate leadership. I'm tired.

culiMay 23, 2026, 9:53 PM

How does spec driven development differ from test driven development?

invalidatorMay 23, 2026, 11:21 PM

TDD is done in a tight loop (minutes) while coding. For every little micro-feature/fix, you write a test for the new behavior you want, implement the minimal ugly fix to get the test to pass, then rely on the tests so you don't regress as you clean up.

LLMs struggle with TDD. They want to generate a bunch of code and tests in large passes. You can instruct them to do red/green TDD, but the results aren't great.

SDD starts before implementation, and formalizes intent and high-level design. LLMs eat it up. The humans can easily reinvent the worst parts of waterfall if they're not careful.

They're not mutually exclusive.
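
To make the tight red/green loop described above concrete, here is a minimal Python sketch. The `slugify` function and both tests are hypothetical, invented only for illustration: each test is written first for one micro-feature, the implementation is extended just enough to pass it, and the tests then stay behind as a regression guard during cleanup.

```python
import re


def slugify(title: str) -> str:
    """Hypothetical function under test, grown one micro-feature at a time."""
    # Iteration 1 made the first test pass: lowercase and hyphenate words.
    # Iteration 2 made the second test pass: punctuation is dropped as well.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def test_lowercases_and_hyphenates():
    # Written before iteration 1; it failed (red) until slugify existed.
    assert slugify("Hello World") == "hello-world"


def test_drops_punctuation():
    # Written before iteration 2; it failed against the iteration 1 code.
    assert slugify("Hello, World!") == "hello-world"


if __name__ == "__main__":
    test_lowercases_and_hyphenates()
    test_drops_punctuation()
    print("all tests pass")
```

The point of the sketch is the ordering, not the function: every behaviour change is preceded by a failing test, and each loop takes minutes.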

culiMay 24, 2026, 4:11 PM

In many frameworks the tests are referred to as the "spec". I guess that's where my confusion arises from.

> SDD starts before implementation

No different from TDD.

nullsexMay 23, 2026, 10:12 PM

[dead]

adelksMay 24, 2026, 12:57 AM

"A sufficiently precise spec is code". I've read somewhere here before.

So guardrails, i.e. sufficiently precise spec and tests, will need to be as strict as the LLM is bad at getting the right context and asking back the right questions. I suppose at that point there's not much difference between it and a human engineer.

niruiMay 24, 2026, 12:32 AM

> We can stop reading LLM-generated code just like we don’t read assembly, or bytecode, or transpiled JavaScript; our high-level language source would now be another form of machine code.

My opinion is very close to this. Currently the reason it's bad not to review/test the code LLMs generate is that the LLMs can sometimes generate bad code. But that's a bug that can be fixed. One day you'll have LLMs generating code consistently better than what a human could write. And then you just stop needing to review it. (And that's probably also the time when most programmers/developers get fired too)

Don't be surprised if one day LLMs start generating binaries directly. THAT will be impossible to read and cost more time to analyze.

furyofantaresMay 24, 2026, 3:20 AM

> Currently the reason it's bad not to review/test the code LLMs generate is that the LLMs can sometimes generate bad code.

Sometimes?

I am heavily into vibe coding and I think they almost always generate bad code. At least as soon as you're distant enough from the code to call it vibe coding.

When you're still in touch with the code, have at least been recently talking to it about code rather than 100% about features, and its context is filled with good code, it can generate good code.

9029May 24, 2026, 12:45 AM

Is it possible to reason about or prove the correctness of an LLM?

montroserMay 23, 2026, 1:32 PM

This could very well be a pattern that some teams evolve into. Specs are the new source -- they describe the architectural approach, as well as the business rules and user experience details. End-to-end tests are described here too. All of this is what goes through PRs and the review process, and the code becomes a build artifact.

vips7LMay 23, 2026, 3:09 PM

It just doesn’t work though. Anthropic couldn’t even get Claude to build a working C compiler which has a way better specification than any team can write and multiple reference implementations.

jelmersnoeckMay 24, 2026, 3:34 AM

> "I'd make a standardized Markdown specification the new unit of knowledge for the software project. ... There would need to be automated pull-request checks verifying not only that tests pass but that code conforms to the spec."

Agree, this is how you make the development loop more deterministic and ultimately autonomous. It's how I've been using coding agents myself for the past few months (by building my own to support this natively [1]).

If you have a spec you approve/agree on, have an agent code against it, and then have a review phase verify the implementation didn't drift from the spec (either by adding or removing features), you get to a position where you can trust the outcome.
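
A very rough sketch of the drift check described above, in Python. It assumes, purely for illustration, that both the spec and the implementation can be reduced to sets of feature identifiers (the names `spec_drift`, `spec`, and `impl` are hypothetical). It only catches drift at that granularity and says nothing about whether the behaviour itself is correct.

```python
def spec_drift(spec_features: set[str], implemented_features: set[str]) -> dict[str, set[str]]:
    """Report drift in both directions between a spec and an implementation."""
    return {
        # Declared in the spec but not found in the implementation.
        "missing": spec_features - implemented_features,
        # Found in the implementation but never declared in the spec.
        "unspecified": implemented_features - spec_features,
    }


# Hypothetical feature identifiers for a small auth module.
spec = {"login", "logout", "password-reset"}
impl = {"login", "logout", "remember-me"}

drift = spec_drift(spec, impl)
for kind, features in drift.items():
    print(kind, sorted(features))
```

A review phase could fail the change whenever either set is non-empty, which covers both removed and added features.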

There's still a lot to be said about spec definition and about what happens if gaps are discovered during implementation, and that's where HITL comes into play.

[1] https://github.com/jelmersnoeck/forge

hombre_fatalMay 23, 2026, 4:38 PM

Yeah, this has been my process for months now.

I might even start my own blog to write about things I've found.

1. Always get the agent to create a plan file (spec). Whatever prompt you were going to yolo into the agent, do it in Plan Mode first so it creates a plan file.

2. Get agents to iterate on the plan file until it's complete and thorough. You want some sort of "/review-plan <file>" skill. You extend it over time so that the review output is better and better. For example, every finding should come with a recommended fix.

3. Once the plan is final, have an agent implement it.

4. Check the plan in with the impl commit.

The plan is the unit of work really since it encodes intent. Impl derives from it, and bugs then become a desync from intent or intent that was omitted. It's a nicer plane to work at.
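
Step 4 above could be enforced mechanically. The following Python sketch assumes a hypothetical convention where plan files live under a `plans/` directory as Markdown and source files use a known set of extensions; neither convention comes from the comment itself.

```python
from pathlib import PurePosixPath

PLAN_DIR = "plans"                     # assumed location of plan files
CODE_SUFFIXES = {".py", ".ts", ".go"}  # assumed source file extensions


def commit_has_plan(changed_paths: list[str]) -> bool:
    """Return True if the commit touches no code, or touches code and a plan file."""
    paths = [PurePosixPath(p) for p in changed_paths]
    touches_code = any(p.suffix in CODE_SUFFIXES for p in paths)
    touches_plan = any(
        p.parts[:1] == (PLAN_DIR,) and p.suffix == ".md" for p in paths
    )
    return touches_plan or not touches_code


if __name__ == "__main__":
    print(commit_has_plan(["src/tabs.py", "plans/tabs.md"]))
    print(commit_has_plan(["src/tabs.py"]))
```

The list of changed paths would come from the version control system in practice; it is passed in as plain strings here so the check stays easy to test.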

From this extends more things: PRs should be plan files, not code. Impl is trivial. The hard part is the plan. The old way of deriving intent from code sucked. Why even PR code when we haven't agreed on a plan/intent?

This process also makes me think about how code implementation is just a more specific specification about what the computer should do. A plan is a higher level specification. A one-line prompt into an LLM is the highest level specification. It's kinda weird to think about.

Finally, this is why I don't have to read code anymore. Over time, my human review of the code unearthed fewer and fewer issues and corrections to the point where it felt unnecessary. I only read code these days so I can impose my preferences on it and get a feel for the system, but one day you realize that you can accumulate your preferences (like, use TDD and sum types) in your static prompt/instructions. And you're back to watching this thing write amazing code, often better than what you would have written unless you have maximum time + attention + energy + focus no matter how uninteresting the task, which you don't.

geraneumMay 23, 2026, 8:55 PM

> PRs should be plan files, not code. Impl is trivial.

Doesn’t it bother you that the outcome of each PR is different every time you/CI “run it”?

hombre_fatalMay 25, 2026, 5:17 PM

No, because consider the pre-AI status quo where a human PR will come in like "Added tab support", maybe scribbles out some guiding ideas, maybe references some issue where we kinda hashed out some ideas of how it could work, and then we must derive all of the intentions/assumptions/decisions of the implementor from the PR's code changes.

Basically zero plan. Or rather, the "internal" plan that the human implementor used while writing the code is hidden from us because it's a mix of ideas they held in their head, jotted in some notes, existed in a sequence of commits that were lost when squashed into a PR, etc. There's zero reproducibility in the implementation.

So take my idea and pretend we still don't have AI yet: the main point is that we move to a pipeline where we work on a first-class plan first before we begin implementation. This gets us closer to reproducible implementation no matter who is implementing it.

It just so happens that now with implementation becoming automated, we have more attention and energy freed up to focus on this plan-based model.

moritzwarhierMay 23, 2026, 4:23 PM

Entertaining flag name!

React team seems to really have set a precedent with their "dangerouslySetInnerHTML" idea.

Or did they borrow it somewhere?

I'm just curious about that etymology, of course the idea is not universally helpful: for example, for dd CLI parameters, it would only make a mess.

But when there's a flag/option that really requires you to be vigilant and understand the input and output and all edge cases, calling it "dangerous" is quite a feat!

wrxdMay 23, 2026, 5:23 PM

I’m pretty sure this comes from Claude Code’s --dangerously-skip-permissions

saulpwMay 23, 2026, 5:36 PM

which sounds like it came from React's "dangerouslySetInnerHTML", per the comment you replied to.

brabelMay 23, 2026, 8:17 PM

I think people used similar prefixes for a long time. For example, Haskell has had `unsafePerformIO` since the 90's... and MSFT's Hungarian notation was also similar, though it used abbreviations for things like "unsafe" (not "dangerous"). Perhaps React was the most famous case of using "dangerously" though.

culiMay 23, 2026, 9:56 PM

"unsafe" seems quite different from the "dangerously [...]" phrasal template. I don't think it's a stretch to suppose it was inspired by React. Still waiting for this one to catch on:

  React.__SECRET_INTERNALS_DO_NOT_USE_OR_YOU_WILL_BE_FIRED

https://github.com/reactjs/react.dev/issues/3896

urbandw311erMay 23, 2026, 10:42 PM

This is sometimes exposed in front end browser code and I had an actual (non technical) end user email our support team last month asking if it was something they should be concerned about! God knows how they found it, I suspect everyone is now an AI-enabled expert at these things…

culiMay 24, 2026, 2:04 AM

Oh I've stumbled upon it myself while coding. You can ctrl/cmd-click into any React type and it will take you to its explicit source definition. I'm not sure if it's still the case but they used to have all of the types organized into a single file. Since I was the go-to TypeScript person at one job I had I made sure to familiarize myself with every type (less than you'd expect)

moritzwarhierMay 25, 2026, 8:14 PM

The point of that scary flag name was the stance that source maps should never be deployed in production.

There are stances that say they should, browse a large SPA with complex working source maps enabled, DevTools open, cache disabled and a long session (relevant because of HMR in dev), and you can see why this matters.

Browsers only fetch and process source maps in a development environment in production, that's why this flag name exists.

That being said, I still have a hobby project with an (in my opinion) sensible (at the time) Webpack configuration, and glossed over this being in the minified bundle, after 1-2 days at the time.

But if my hobby project had been something production-relevant, I'd have continued to hunt down this artifact.

I think, with Vite et al this should not appear anymore in current JS bundles ready for prod, so the name is apt.

But the underlying problem is still a neverending source of frustration: minification is (by definition, when it's statically verifiable), not equipped to change object property names without provoking breakage.

stuaxoMay 24, 2026, 9:48 AM

In short:

We will have code full of unknown bugs, that is unfixable.

The solution is to replace it with more of the same but with some new specification (fix some bug, add some new feature).

And this will be done by using astounding amounts of compute in massive new data centres.

debesylaMay 23, 2026, 3:55 PM

I found that adding "philosophy" descriptions helps guide the tooling. No specs, just the general vibe of what the point is, because we can't make everyone happy and that's not the goal of a good tool (I believe).

Technology and implementation may change, but the general point of "why!?" stays.

blacobMay 23, 2026, 11:36 PM

> Then where does the rigor go? Similar to the Thoughtworks report, my first bet would be specifications (which is not the same as prompts) and tests (which is not the same as TDD).

This is what we're building for at Saldor (https://saldor.com). It's a hard problem, to get a team in the habit of writing good specs. Probably because it's a hard thing to do: thinking of the behavior of your program, especially at the edges. But I agree (biased) that this is probably the way forward for writing code in the near future. I'm excited to see other people thinking about it.

TerrettaMay 24, 2026, 3:32 AM

Saldor pitch is on point.

I have my team do this using CLAUDE.md, telling Claude to do it in a set of interconnected steps, but in brief: they are to make it write every aspect of the transcript somewhere: PRD, research notes, spec, dev log and debate log, break/fix/retro notes, commit log, PR, release notes, README, docs .mds... heavy emphasis on the edges in our thinking, and just as important, the edges in its ability to provide good leverage.

It needs a core set of guidance on the ordering and how to write "as of" a given phase or release so context stays current, trusting the old info is in git history it can navigate for the story of how we got here.

CC's /insights claims I have 10:1 md edits to code edits, and we both note this way of working is resulting in far fewer error loops per higher quality outcome.

// So yes, interested in your product. Baking something more broadly battle tested in so we don't have to reinvent it makes sense.

QuantumNoodleMay 24, 2026, 11:39 PM

It's too early for me to have a firm opinion one way or another.

Just a data point: this month I had a gnarly bug in generated bpf code. The C code was correct but the compiler produced a bug that corrupted packets. I spent around 8 hours debugging _where_ the issue was and how to work around it, never really understanding what went wrong. That knowledge came with several more days on and off looking at it--after I had mitigated the production issue.

So if I extrapolate this experience to LLMs (which are not deterministic and will build larger systems): what we trade for velocity we will pay for with hours of debugging because we won't understand how things work. I think this is unavoidable.

Another way I'm looking at it: after some time of not writing code, it will be analogous to instructing the LLM and the output being assembly--where I simply don't have the muscle to grok the output. How do I mitigate that knowledge gap? I see microservices coming back. Today it is easy to slop up disposable scripts. Our services need to be modular so we can dispose of broken things--so they are only coupled with each other by strict APIs.

immanuwellMay 24, 2026, 8:34 AM

it's the most honest framing I've seen, but specs as the new source of truth is exactly what we promised ourselves with UML, then WSDL, then OpenAPI. the graveyard of "just make the artifact above the code authoritative" is long

Ozzie-DMay 23, 2026, 7:25 PM

the irony is that AI is making this exact problem worse. ppl are generating entire codebases now without reading any of it -- the flag might as well be the default. the skill thats actually becoming scarce isnt writing code, its reading code you didnt write and knowing if its correct.

nlitenedMay 24, 2026, 12:03 PM

I feel like people who program in JavaScript or whose projects pull megabytes of dependencies, don’t get a moral right to complain about this. You guys just sit and calm down this time, you already said what you could.

Your app takes 20 seconds to load, pulling 50 megabytes of minified JS. Your backend is a mess of 20 Rust microservices, 300 megabytes docker image each.

Nobody has actually been reading and understanding code in your org for the past 15 years. And nobody has ever been responsible, everybody has just been job hopping for a 15% total comp bump.

Now the secret is out.

okandshipMay 24, 2026, 9:02 AM

making the review artifact explicit feels like the part teams skip

testplzignoreMay 23, 2026, 4:52 PM

> Product owners and engineers could initially collaborate on this spec and on test cases to enforce business rules.

LOL. I had to check if this was published on April 1st.

UptrendaMay 23, 2026, 10:13 PM

Does this post mark the top of the hype train or is there still more to come?

0xpgmMay 24, 2026, 12:20 PM

Still more to come I think. Until all the major AI companies IPO starting this year.

farmerbbMay 23, 2026, 7:03 PM

I legit can't tell if this article is satire, or not.

EcysMay 23, 2026, 1:34 PM

very true. and we already know and agree with this.

user experience/what the app actually does >>> actually implementing it.

elon musk said this a looong time ago. we move from layer 1 (coding, how do we implement this?) to layer 2 thinking (what should the code do? what do we code? should we implement this? (what to code to get the most money?))

this is basic knowledge

duskdozerMay 23, 2026, 3:43 PM

Elon Musk has been saying Teslas would have fully autonomous self-driving within 1-3 years since 2013

vinnymacMay 23, 2026, 6:13 PM

I left a similar comment elsewhere in this thread. I still remember when so many people hallucinated that we would suddenly have flying cars by 2002 at the latest. If we achieve several more major improvements on current technology, these thoughts are interesting to consider. But not before that occurs.

We need the pragmatic engineer more than ever.

lesscodeMay 23, 2026, 5:15 PM

Instead of accepting 20,000 lines of slop per PR (and never-ending combinatorial complexity), maybe we should aim to think about abstractions and how to steer LLMs to generate code similar to that of a skilled human developer. Then it could actually be a maintainable artifact by humans and LLMs alike.

crnkofeMay 23, 2026, 8:16 PM

I don't get why every AI article is so hyper-focused on coding speed. If the coding is so fast, doesn't it make sense to invest more time into quality, learning, documentation, testing, refactoring, making a better product? I'm beginning to think that the slopcoders are evaluated by kLOCs written in addition to LLM token usage, and they're just maximising the measured metrics. Whether that actually ends up in production or is used by any real person is seemingly irrelevant. Likely the more bugs that are produced, the more agents can be spun up in parallel to simulate busywork.

donbventuresMay 23, 2026, 9:54 PM

[flagged]

3vo-aiMay 24, 2026, 8:27 AM

[flagged]

dundunUpMay 24, 2026, 1:14 AM

[flagged]

fijiolMay 23, 2026, 4:55 PM

[flagged]

--dangerously-skip-reading-code

Comments
