I love ruby - but surely it's closer to Perl and "There's more than one way to do it" - than python which generally strives for "There should be one-- and preferably only one --obvious way to do it."?
I tried to run a Ruby script recently and got an error about Fixnum. Apparently they made some breaking change to how integer types are referenced in version 3. I had to modify the script to get it to work on a modern parser. How is this not equivalent to the Python 2-3 jump? I don't know the first thing about Ruby but this already told me that it's a language with breaking changes between versions.
(It was the ruby scripts here if anyone is curious: https://github.com/haberman/vtparse/ )
The Rails apps I’ve programmed with LLMs seem to work a LOT better than arbitrary Python or Ruby or JavaScript apps. I chalk that up to “there are so many examples of OmniAuth in Rails that the LLM can’t really stray off the path. It just works.”
That means I let the agent do things the way it wants to, not because I have a preference. So we’re using turbo and Hotwire and whatever it is it’s doing. And I’m using React for some other problems. Not because I know React, but because the LLM does.
In golf it is said to “let the club do the work”. Over-control leads to disaster. Same with LLMs. Not saying let it do whatever, but if there are widely adopted, baked-in conventions you’ll be far better off letting those do the work.
Correct in that Ruby never had a schism and is still massively productive and widely deployed (e.g. Shopify + Stripe alone represent billions/trillions of dollars through Ruby hot paths).
Python's general lack of success in this domain is telling and embodies what I was trying to communicate in the article -- languages with low entropy in syntax, features, ecosystem, and toolchain compound slowly.
I doubt Python is significantly better.
The problem isn't that there's more than one way to write working code.
It's that there are infinitely many ways to write working code, and nearly all of them are bad, and any dynamically typed language allows for a lot of type slop that you don't want.
There's a lot more opportunity for LLMs to write terrible working code in Ruby or Python than there is in Rust, for example.
I posit that Rust is the optimal language to emit from LLMs unless you have to target web, a specific platform, or a legacy project:
- The required error handling for Option<T>, Result<T,E>, and required destructuring of sum types naturally reduces errors by an order of magnitude
- If it compiles, chances are higher the code is correct. Especially if you're using strong typing.
- The training data for Rust is likely of a higher quality than, say, Javascript
- The resulting code is fast and portable
- You get really nice threading and async, and you don't have to think about the silly "color problem" because the LLM handles it for you.
- Using an LLM takes away any trouble you'd have with the borrow checker or refactoring, or otherwise working in a slightly more difficult language.
- Applications are single binary executables.
Since LLMs let you generate and manipulate Rust code as fast as you would Python, why not just emit Rust instead? It's the least brittle language, and it's incredibly performant.
What would you suggest is optimal for targeting web?
Usually once the project is already established and has good patterns, both Claude and GPT will continue the good patterns, but you may still want to add a review pass step to remove bad practices (usually hacks around concurrency instead of doing it properly).
Most models come up with the least effective solutions when writing Python.
For one-shot responses, the majority of failures are environmental/syntax, which naturally favors interpreted languages. For longer agentic coding sessions, models solve the environment issues quickly and it becomes a fair comparison of who comes up with the smarter solution. You can filter for that here: https://gertlabs.com/rankings?mode=agentic_coding
Very little difference between TypeScript and JavaScript, which are essentially the same language, just one has more tokens.
Functional languages like Clojure and OCaml are pretty dense, I would have expected them to feature lower.
Kotlin is in some ways a more token dense version of Java, yet Kotlin leads, and Java is almost last.
This article doesn't seem to mention it either https://martinalderson.com/posts/which-programming-languages...
even though it states Clojure to be the most token efficient. I personally, honestly don't care much. In my opinion (using LLMs with multiple different languages), specifics of PLs don't matter to the point of stating a "clear winner". It's not the language that matters but the "stories you tell" with the language. And the greatest stories are sometimes told in languages people have long forgotten.
I wonder how LLMs would do with something like an image based system - it seems like you could pin the image and get a perfectly reproduced environment to get the LLM to make changes to, each time.
I'm sure a REPL helps out with this sort of thing quite a bit!
How do you work with LLM and repl?
Smalltalk has system images - which AFAIK clojure lacks (as does python, ruby).
I wonder if it would be possible to pair python ZODB with storing python code alongside the pickled objects... And effectively create an unholy image-like workflow with IPython and ZODB?
But at any rate, I was more curious about how you mix repl, clojure and LLMs in practice?
The parallelism issue in particular was also not something I noticed agents struggling with in JavaScript, although JavaScript's concurrency model is clearly fundamentally different.
The concurrency issues that I saw LLMs face were one reason why I created freelang, which uses a very boring and auditable concurrency model of OS processes that use the file system to talk instead of IPC, shared state, or anything like that. Higher overhead, lower throughput, but more boring and hopefully fewer bugs: https://github.com/DO-SAY-GO/freelang
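To make the "processes that talk through the file system" idea concrete, here is a minimal Python sketch. It is not freelang code and makes no claim about how freelang works internally; the worker script, the file names, and `run_worker` are all hypothetical. The parent and the worker are separate OS processes, and the only things they exchange are two JSON files.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical worker program: reads a request file, writes a result file.
# It shares no memory with the parent; the two files are the whole interface.
WORKER_SOURCE = """
import json, sys
from pathlib import Path
request = json.loads(Path(sys.argv[1]).read_text())
result = {"sum": sum(request["numbers"])}
Path(sys.argv[2]).write_text(json.dumps(result))
"""


def run_worker(numbers):
    """Run the worker as a separate OS process and exchange data via files only."""
    with tempfile.TemporaryDirectory() as tmp:
        request_path = Path(tmp) / "request.json"
        result_path = Path(tmp) / "result.json"
        request_path.write_text(json.dumps({"numbers": numbers}))
        # check=True raises if the worker exits with a non-zero status.
        subprocess.run(
            [sys.executable, "-c", WORKER_SOURCE, str(request_path), str(result_path)],
            check=True,
        )
        return json.loads(result_path.read_text())
```

Calling `run_worker([1, 2, 3])` starts a fresh interpreter for the worker. Both files can be inspected after the fact, which is the auditability the comment is after, at the cost of process startup and disk I/O on every call.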
The more assumptions I can move to compile time the better models are at dealing with emerging complexity.
I would go the other way with LLMs and I wish for liquid types and effects in Rust to make type specifications even more strict.
P.S. effects and liquid types and type specifications in general add a lot of busywork, but models have higher level of tolerance to busywork compared to developers.
Which is why I think it's silly to suggest creating a new language "for agents". Unless one or more of the frontier AI companies commit to creating a language and the training corpus for a new language, there's no good way to bootstrap a language that is ideal for agents. You need the huge pile of high quality code as a prerequisite for a language being good for agents. And, the argument applies similarly poorly for some language that looks like it has a good shape for agents, if it doesn't have a lot of human written code from the past decade or whatever. It's not a good language for agents unless agents already know the language really, really, well because of a huge pile of code in that language in its training data.
In the OP article, they mention 'don't need to worry about thread cause the concept does not exist' - well, & does not exist in Python.
Those things are related to low-level computational issues (memory management), not problem-space issues (moving money, transcoding the file, checking the spreadsheet), so a lot of &/&mut etc. and all that extra thinking slows down AI for the same reason it slows down you and me.
In particular, building in Rust requires us to think a bit differently about how we create the program in the first place, and I don't think AI is very good at architecture yet.
Probably ... eventually none of this will really matter though, it will just be like 'compiler pedantry' for the small number of people who work on those things.
Similarly there is an important semantic difference between something you can change and something you shouldn’t change. Again - this can be used to express business logic constraints, similarly to how you can use static types to enforce other properties like e.g. „age must be a number”.
While you may say you can get away with just sharing references everywhere like Java or JS do, and not care about immutability, that’s like saying the only type you need is string and hashmap and you can code everything. And you just document in comments when strings contain numbers. I saw code like that in PHP once. Fun.
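As a rough illustration in Python (the comment itself is about Rust-style references, so this is only an analogy), a frozen dataclass can mark "this shouldn't change" in the type itself and reject an age that is not a number. `Person`, `birthday`, and `is_mutation_blocked` are hypothetical names for this sketch.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Person:
    name: str
    age: int

    def __post_init__(self):
        # Type hints are not enforced at runtime, so check explicitly.
        if not isinstance(self.age, int):
            raise TypeError("age must be a number")


def birthday(person):
    """Return a new Person one year older; the original stays untouched."""
    return replace(person, age=person.age + 1)


def is_mutation_blocked(person):
    """Return True if assigning to a field of a frozen instance is rejected."""
    try:
        person.age = 0
    except FrozenInstanceError:
        return True
    return False
```

The constraint lives in the definition, not in a comment, so a reader or an agent cannot quietly mutate a `Person` that other code assumes is stable.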
Therefore the best language for agents is likely the one that, on one hand erases all irrelevant details (ie. raises the level of abstraction and does not force focusing on eg. memory management), and on the other hand encodes any domain-relevant details in the code (eg. using advanced type systems, annotations, contracts, spec-like tests eg. property-based).
Human readability is a separate concern and still relevant, but the two mentioned properties actually generally improve on that as well (at least for engineers persistent enough to scale the tower of abstraction).
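For readers who haven't seen a property-based test, here is a minimal stdlib-only sketch in Python. Real projects would more likely use a dedicated library; the run-length encoder and the `check_roundtrip` helper are hypothetical examples, not something from the article. The test states a property ("decoding an encoding gives back the input") and checks it on many random inputs.

```python
import random


def run_length_encode(text):
    """Encode 'aaab' as [('a', 3), ('b', 1)]."""
    pairs = []
    for ch in text:
        if pairs and pairs[-1][0] == ch:
            pairs[-1] = (ch, pairs[-1][1] + 1)
        else:
            pairs.append((ch, 1))
    return pairs


def run_length_decode(pairs):
    """Inverse of run_length_encode."""
    return "".join(ch * count for ch, count in pairs)


def check_roundtrip(trials=200, seed=0):
    """Property: decode(encode(s)) == s for randomly generated strings."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 20)
        text = "".join(rng.choice("ab ") for _ in range(length))
        if run_length_decode(run_length_encode(text)) != text:
            return False
    return True
```

The property reads like a small spec, which is the kind of domain-relevant detail the comment wants encoded in the code.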
Based on this, it seems Go is certainly not that "agent endgame" language. It has large amounts of boilerplate, a general lack of safety around concurrency features, a pretty middling static safety story overall with a generally underpowered type system.
I don't think the perfect language exists yet, but just wildly imagining, it would probably be something like a cross between Scala, Elixir and Lean (or equivalents). Unfortunately none of those languages has the large training corpus required to make it perform well in all agentic engineering situations (yet).
For any language comparison, one must separate the expressiveness of the language, which limits the long-term possibilities for agents, and the training corpus, which is what mostly gives it the current standing. I think we are still in the phase where the languages are separated by essentially random non-design factors such as the amount of training environments the frontier labs are willing to create for them.
Given that, the syntax does not matter all that much, as long as the base language itself is flexible enough - as another wild idea, it's also possible that eg. Python could mostly swallow all these features through external tools (eg. the pre-existing type checkers or linters), and if the frontier labs bother to RL on those tools, that would also work (see also: Mojo).
And yeah, if Clojure had a better static safety story, it would actually rank high, since it's high-abstraction with metaprogramming capabilities, excellent runtime specifications, and has a good "garden path" ie. natural code is good code.
(As a devil's advocate though, REPLs can be replaced with gluing together scripts with ad-hoc state/caching via the file system, which LLMs seem to be pretty good at already...)
So as an addendum, introspectability is also important ie. allowing to discover the state and composition of the system at any moment - though that's partially a language (eg. reflection) and partially a tooling (eg. debuggers) issue.
Here's one of many practical examples I can give you - my WM on Mac is Yabai, it is hooked up to Hammerspoon, which can be scripted with Lua, which means I can use Fennel, which means I can have Lispy REPL. And from here, it is a bit difficult to explain the difference. The challenge is that the magic is invisible to people who haven't felt it.
The key insight is "the image vs the file". In most languages, your program is a description that gets turned into a running thing. The REPL is bolted on - a convenience wrapper around that same lint/compile/run/restore-the-state cycle. Python, Ruby, Lua, C#, etc. REPLs work that way. You're still fundamentally working with files that produce processes.
In a Lisp, the running system is the environment. There's no gap between "the code" and "the live thing". When I connect to Hammerspoon's Lua runtime via Fennel, I'm not sending scripts to a subprocess - I'm reaching into a living system and reshaping it while it runs.
The missing vocabulary here is "liveness", not "fast feedback" - that's a pale shadow of it. Liveness means the environment has no opinion about what's "done" versus "in progress". Everything is always mid-flight and accessible.
So I can actually reach out in live REPL session to let's say Slack app window and extract the data about every single element in the app, get the content, compare and continuously reiterate, without having to restart anything, without even saving the code - just pure data extraction without compiling, dealing with state changes, etc. I can interactively move the window, resize it, hide, or maximize it - all that programmatically. Imagine DevTools on steroids, only it works for everything, not just web apps.
Learning Lisp and Clojure allowed me to truly experience the genuine joy of programming, because it makes it feel like you're playing a video game. And now, can you even imagine what happens when you open up access to all that awesomeness and grant it to an LLM? Most people have zero idea what it feels and looks like, when you can point an agent to a REPL running in a k8s cluster and it introspects things on the fly, while you let another agent poke through the UI and they work as a team to fix something or develop a new feature.
> if Clojure had a better static safety story, it would actually rank high
This framing reveals an assumption - the primary value of a type system is catching errors early, and that Clojure is just a dynamically typed language that would be improved by adding that. In a Lisp image, "early" and "late" barely exist as meaningful categories. The feedback loop isn't compile-time vs runtime - it's just... now. You evaluate a form and you know immediately. The error is right there, in context, with the live data that caused it.
Static types are, in a real sense, a compensation for the gap I just described - the gap between the description and the running thing. When you can't easily inspect or reshape the live system, you want the compiler to tell you as much as possible before you cross that gap.
Clojure has Spec (which can do things most other type systems would struggle to express), it has instrumentation, it has a rich data inspection story. You said "debugger", Clojure has Flowstorm, which is one of the best debugger experiences I have ever encountered in any language, and I have used more than a few.
I think this underrates static typing. For me the biggest value add of a static type system is that while doing a big, breaking-change refactor, I can near-instantly see all the places I need to update callers. Getting the code to work in the place I was actually working on is easy, I was already focused there. Static types pay off by helping me know when my change broke other parts of the system I wasn't even thinking about.
What the static typing camp can't acknowledge is that not all dynamically typed languages are equal. I agree with your sentiment e.g., on Python codebases - it's legit pain, but PLs like Clojure and Elixir are a different story.
Data-orientation flattens the call graph. Most Clojure functions take and return plain maps/vectors/seqs. A "breaking change" to a data shape doesn't ripple through type signatures across the codebase the way it does when every layer has its own nominal type. You change the shape at the edges (parsing, validation via Spec/Malli) and most intermediate code keeps working.
Fewer, more general functions. map, filter, reduce, get-in, update, etc. replace dozens of bespoke typed methods. There's just less surface area to break. But then again, I would say `(map)`, and someone familiar with Javascript's `map()` would have some wrong assumptions.
You really just can't evaluate ANY language by picking a single aspect of it without wholly understanding the holistic picture in practical, battleground scenarios. Yes, Clojure is dynamic, but it doesn't mean it's harder to maintain or more difficult to build with, or that you just can't write robust software in it. It genuinely has qualities that shine in some domains.
My claim wasn't "static types are useless for refactoring" - it was that they're a compensation for the gap between description and running system. Your refactor scenario fits that exactly: you need the compiler to tell you about distant breakage because you can't cheaply ask the live system "who calls this, and does it still make sense?"
In a live image, that question is answerable directly. find-usages, instrument the function, exercise the path, watch what flows through. The "places I wasn't thinking about" announce themselves the moment they're touched, with real data in hand - not as a list of locations I now have to go read and reason about cold.
This is annoying but only needs to be solved once at the start, either by the LLM or the human guiding it. A single prompt of "Set up a uv project in this directory with Python 3.13" is enough that it's never an issue again for that repo.
> Goroutines are a far more tractable primitive for coding agents than threads, callbacks, async/await, or any of the colored-function regimes that dominate elsewhere.
I disagree with this. Goroutines, along with threads, callbacks, and traditional async, are all in the same category: spaghetti of unbounded background tasks. Structured concurrency [1] on the other hand is dramatically easier to reason about. Python has support for this (in Trio and asyncio.TaskGroup) as do other languages like Kotlin and Swift. Function colouring is a red herring; if anything, it's useful because it highlights the scheduling/cancellation points in your code.
[1] https://vorpus.org/blog/notes-on-structured-concurrency-or-g...
-----
This really does read as "Go is my favourite language". In fairness, that's a good reason to choose a language to use with an LLM (so long as it's powerful enough and not too obscure). But let's not pretend it's the best language for everyone.
Because it always is that.
People advocating for boring languages always advocate for their boring language. For instance, if you tell a gopher that you agree with the point, and therefore the project is going to use java, they won’t be happy about it.
> everyone knows goto was bad.
Absolutely hard disagree. You can write extremely clean and resilient C with C89, goto, and a handful of rules. Telling people `goto` is bad is how we get shitty C programs and paradigms where goto would have been better.
Goto isn't bad, its misuse is bad. Beginners will write shit code regardless of whether you tell them they can or can't use goto. That's also exactly what Dijkstra was arguing, if you read past the much misquoted "goto considered harmful", which he never said (it was an editorialized title, and not even the full version).
> Absolutely hard disagree. You can write extremely clean and resilient C with C89, goto, and a handful of rules.
That's a different goto. The one in C89 can only jump around within functions, but the article is talking about goto that can jump between any two points in the whole codebase arbitrarily. It stresses that point a bit more later on in the article, but you can already see it from the FLOW-MATIC code quoted above (which doesn't even have functions).
Your point actually still stands: it's theoretically possible to write clean code using even the more general goto. (Probably by building abstractions with it like "function" and "for loop".) But would you be happy doing that with someone else - or especially with a coding agent? It's better that the "handful of rules" are enforced by the language, in my opinion.
---
Edit:
> That's also exactly what Dijkstra was arguing, if you read past the much misquoted "goto considered harmful", which he never said
I just re-read the original "GOTO considered harmful" article (it's short and clear) and, while the title might not have been his, Dijkstra was definitely making a very plain argument that goto is bad for everyone and should be scrapped. He says in the introduction:
> I [have] become convinced that the go to statement should be abolished from all "higher level" programming languages (i.e. everything except, perhaps, plain machine code).
And in the conclusion:
> The go to statement as it stands is just too primitive; it is too much an invitation to make a mess of one's program.
// Syntax: { ...; go_to state1(x, ...); }
// Meaning: Cross-codebase GOTO w/continuation values
// Implementation: tail call
#define go_to return
// Syntax: { ... y = go_do state2(x, ...); ... }
// Meaning: Cross-codebase sub-task/sub-state
// Implementation: normal call
#define go_do
// Syntax: { ... go_terminate(y); }
// Meaning: State machine termination
// Implementation: normal return
#define go_terminate return
// Syntax: int state3 state(int x, ...) { ... }
// Meaning: Structured state definition
// Implementation: normal function
#define state
// Syntax: if (GOTO_NOT_HARMFUL) { ... };
// Meaning: GOTO is now cleaned up
// Derivation: Achieved
#define GOTO_NOT_HARMFUL true
Example:
int state1 state() { ...; go_to state2(m); }
int state2 state(int m) { ...; y = go_do substate2a(); go_to state3(); }
int substate2a state() { ... ; go_to substate2b(q); }
int substate2b state(int q) { ... ; go_terminate(q); }
int state3 state() { ...; switch (...) { case 1: go_to state4(v); case 2: go_to state5(); ...} }
int state4 state(int v) { ...; go_terminate(r); }
int state5 state() { ...; go_to state3(); } // State cycle
So now you can have your #include <go_to>
EDIT: Compressed/cleaned up my mess
Any time you see anyone overly fixating on "function coloring" for any context other than ancient versions of Javascript it's a clue that the speaker has no idea what they're talking about.
Goroutines are literally threads. Yeah, this really is a "go is my fav" article.
This could be because our go code is typically smaller more defined services but I don't really believe that since even the isolated python services are pretty spaghetti looking.
You can be pedantic and say they aren't technically threads but that doesn't really matter from a programming perspective.
> You can be pedantic and say they aren't technically threads but that doesn't really matter from a programming perspective.
They are technically threads: they are independently scheduled, concurrent units of execution sharing an address space. They're just not OS (or kernel) threads. Hell, technically userspace threads (generally cooperatively scheduled) are the original, they predate kernel threads by a decade or two.
That being said, the whole `tractable primitive` thing used in the article sounds somewhat sloppy to me. I don't quite get it. Yeah, they could be easier for an agent to write than async/await, but threads are also trivial in that regard, and you'd still need a mutex with goroutines.
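The mutex point is independent of whether the unit is a goroutine or a thread. As a small Python sketch with plain threads (not goroutines; `count_with_lock` is a hypothetical helper), a shared counter still needs a lock around the read-modify-write:

```python
import threading


def count_with_lock(thread_count=4, increments=10_000):
    """Increment a shared counter from several threads, guarded by a mutex."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(increments):
            # Without the lock, this read-modify-write is not guaranteed
            # to be atomic and updates could be lost.
            with lock:
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Starting the concurrent units is the easy part in either model; the shared state is where the care goes.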
This isn't true. I'm mostly using python and UV with claude and it periodically decides to try to run scripts directly instead of using UV.
in your mind, you think a harness and prompt is sufficient framing to keep the LLM output to design goals. but no matter your context size, as it grows, anomalous gradients appear that try to normalize competing patterns of development.
the only real way is directly training out unwanted crosswise options.
https://github.com/Tencent-Hunyuan/AutoCodeBenchmark/blob/ma...
Each process (with the Node runtime engine) consumes about 50–100 MB of RAM. One of the first things I tried was using large language models to help port them over to Go. Since the model has a point of reference (the original Node program plus its unit tests) it’s been easy to use a test-driven approach and ensure the Go version maintains parity with the originals.
Memory usage usually drops from around ~50 MB to about ~5 MB per process, which really adds up when you’ve got a dozen of these programs running at once.
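Spelling out the arithmetic with the rough figures above (a dozen processes, about 50 MB each under Node and about 5 MB each in Go; these are the comment's approximations, not measurements):

```python
PROCESS_COUNT = 12      # "a dozen of these programs"
NODE_MB_PER_PROCESS = 50  # approximate figure from the comment
GO_MB_PER_PROCESS = 5     # approximate figure from the comment


def total_mb(per_process_mb, count=PROCESS_COUNT):
    """Total resident memory if every process uses the same amount."""
    return per_process_mb * count


node_total = total_mb(NODE_MB_PER_PROCESS)  # 600 MB
go_total = total_mb(GO_MB_PER_PROCESS)      # 60 MB
saved = node_total - go_total               # 540 MB
```

So the fleet goes from roughly 600 MB to roughly 60 MB, and the saving grows linearly with the number of processes.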
There's a lot of stuff in Python's favor in regard to coding with LLMs: it's wildly popular so there's a lot of references for the right and wrong ways to use it, it can be typed using included libraries - it's as simple as telling the LLM "use typing for this", and there are several great lint and unit testing tools to cover the hallucinations and poor decisions. The flexibility seems like an advantage to me personally, but I've always been a Python stan.
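A small sketch of what "use typing for this" tends to produce (`parse_port` and `describe` are hypothetical names). One caveat: the standard `typing` module only supplies the annotations. Python does not enforce them at runtime, so this assumes a separate type checker such as mypy or pyright is run over the code.

```python
from typing import Optional


def parse_port(text: str) -> Optional[int]:
    """Return a port number, or None if the text is not a valid port."""
    if text.isascii() and text.isdigit() and 0 < int(text) < 65536:
        return int(text)
    return None


def describe(port: Optional[int]) -> str:
    # A type checker would flag using `port` as an int before this None check.
    if port is None:
        return "no port configured"
    return f"listening on {port}"
```

The `Optional[int]` return type makes the "might be missing" case visible to both the checker and the LLM, which gives the lint step something concrete to catch.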
It’s increasingly obvious that whole swaths of developers will just continue using the language they did before LLMs “just cause”
It’s more identity based at this point. My LLMs write Rust for me and I couldn’t tell you the difference outside of it being way faster and more reliable
Rust is a language I would like to adopt long term, but it's not one I can easily grok, so my output would be worse for it.
From what I can tell, LLMs know patterns that sit above the syntax and idioms of specific languages, they know the syntax and idioms of specific languages, and they know how to apply the former to the latter.
The bottleneck isn't what languages the LLM can handle, but what I can handle coming out of the LLM. The general advice, then, is to use the language (and related setup/environment) you're familiar with.
But for the rest of us who aren't vibe coding, but pairing with LLMs and actively steering them, correcting things and iterating while deeply reviewing the code, design and architecture, then yes, I agree a lot: it matters more that you're familiar with the language than that the LLM is. They can pick up new programming languages in a message, so it doesn't really matter; their knowledge seems to come from programming in general and isn't locked into a specific language's syntax.
I'm biased, I preferred it this way before AI. But even so I think there is real merit. Firm guardrails and clear feedback seem to benefit AI.
Anecdotally, the worst AI performance I've seen was with gdscript, which is basically python minus the huge corpus of training data. Best results I'm getting with rust, which is in the opposite end of the strictness spectrum.
Every single one I ask always happily says yes, and starts claiming C# has local classes (a feature of Java)
The compiler is incredibly helpful because it catches errors and gives clear explanations and the LLM can iterate over it. I’ve also added the elm-review package with the default configuration, which is fantastic for ensuring code quality.
I have worked on extending the Elm compiler, and Opus 4.6, GPT 5.4 and GLM 5 all had no issues with either the Elm compiler (written in Haskell) or my extended Elm.
I didn't see them hallucinate much, not more than on mainstream languages.
Times change, and I work more in R&D space than on legacy codebases, but I still ask it to write something in Python then convert it to the actual language on occasion. I don't know if I'm tricking the context window, forcing alternate pathways, or both, but it works.
My experience with LLMs is that they perform best in one of two modes - either one carefully scoped context or translating between two different contexts without modification - so this modality lines up with that fairly nicely: think in the programming language the LLM thinks "best" in and then translate that to the one you want.
That said, there's often enough structural and conceptual differences between languages that a direct "transliteration" between, say, Python and Go is going to result in some fairly crummy Go, so I'm curious what you see in terms of the fidelity of that translation - do you mostly get "Python written in Go," or does the LLM really do a proper conversion from one language to the other?
I don't really buy the intuition (aka Goroutines are more 'clear' than 'coloured' functions or threads), and there's no evidence presented for this either.
Although this could very well be true, I'm doubtful without seeing some real world data points.
The 'general premise' aka 'cosine similarity' may have been true before, but it may not be anymore.
AI is just pretty good at anything it's 'seen enough' of, and that's it. I think it's more likely a 'threshold' problem than an ability problem, at least for most things.
'Rust' may represent a different domain, given the very detailed nature of notation and the vast possibilities that arise from that.
In the last year or so, I have been using LLMs, to assist my work, with generally, excellent results.
I have noticed that the LLM delivers much better PHP, than Swift. I seldom need to rewrite or correct, the PHP code I get from it, and am constantly correcting the Swift. Part of the reason, may be that I am a much better Swift programmer, than PHP programmer, and there’s just a lot more Swift code. I haven’t really taken the time to analyze it.
I have my theories, as to why, but it’s not something I’m really into researching. I’ve just noted the trend.
If you ever have the time and inclination to try Axiom (https://charleswiltgen.github.io/Axiom/), I'd really appreciate knowing if you feel it quantitatively changes the Swift experience with your LLM/coding harness of choice, especially in regards to Swift concurrency.
Honestly, not sure that I'll be able to use it, right now (no telling, in the future). Looks like you really did a good job, though!
And UI code quality tends to be technically pretty crummy/low-discipline. Your UI code doesn't need much consideration around data races, for example.
A lot of the code I need to tear out, looks like that.
Most of the code I write is UI. It's actually fairly intense work, but relies on the underlying SDK, rather than language tricks.
I find the UIKit code I get, is a lot more robust and performant, than the SwiftUI code.
Hard not to think that's a major part of it. IME you make loads more corrections in languages you're more opinionated about (and opinionated usually follows more experience & confidence).
I correct AI Python all the time. When it cranks out TypeScript I just check it works.
I feel that the reason posited by another poster is more likely. There's a ton of mature, well-written, shipping, PHP out there, due to the open nature of most PHP, as opposed to Swift; where the more robust and mature implementation is likely behind proprietary walls. Most of the public Swift that I see, tends to be folks showing off its fancier features, in relatively small code samples.
We have decades of compiler research, static code analysis etc, why do these extremely complicated black boxes of billions of parameters have to produce readable source code as their main output?
Presumably because LLMs are trained on corpora read, and for now still probably mostly written, by humans, rather than on corpora consisting mostly of ASTs or graphs?
Without any typechecking, LLMs obviously find it harder to work agentically and validate their work.
With too much typechecking (I'm looking at you, rust), I've found agents get themselves stuck in local "architectural minima" and end up doing insane shit to mitigate ownership/borrow-checker issues inherent in the design they ended up with.
That said, if you're hands-on I think rust is a fantastic language for pairing with an LLM.
So I think the author is saying that Go is a simple language that tends to have fewer solutions to the same problem. I personally agree with that to a degree.
What I don't agree on is that we can choose what "low variance" is. There is a lot of Go code out there; its shape may have little "noise", but the variance is massive.
Or you hire a team of specialists for the language you want. Perhaps niche languages should have fine-tuned LLMs in the same way.
I've also been doing quite a bit of Rust for web services and wasm targets, which has worked exceedingly well... similarly with Tokio + Axum, etc.
I have seen very few issues with either of the above... that said, C# has been a bit more painful by comparison... I often rely on FastEndpoints for services and Grate for database migrations, and LLMs often get a bit tangled with those libraries in practice.
But as someone who has been working in Python for ages, I'd say it is pretty easy too, and not as hard as you described. LOL, but whatever, this post of yours was really amazing
On the other hand, even if that were true, I don’t know how important it would actually be since LLMs can generalise across languages well.
It might be best to pick languages where it’s just harder to screw up, the canonical example being to prefer TypeScript over JavaScript.
This is another way of saying that the tools you equip the LLM with affect its effectiveness. In other words, the harness you build around it matters, and matters a lot.
At the end of the day, the language you pick enriches the harness with the toolchain, libraries, etc. it offers. This is most evident with the toolchain, as the author mentioned, but if you think about it, picking a specific framework that constrains the choices the model can make (e.g. the Ruby on Rails example) also affects its behavior.
The best language I have seen an LLM use was Kotlin. It actually surprised me how well it wrote the language. I wrote a project in it and I think I didn't have to correct it once. Like I was seriously impressed. I just wish Kotlin had better tooling so I didn't have to use gradle or maven lol.
Additionally, fault-tolerant languages such as Erlang/Elixir allow me to not worry about the billions of edge-cases, and let Claude aggressively implement a mostly good-enough application. With LLMs, accepting a limited amount of failure may be a necessity (depending on the business/domain), and that's exactly what the BEAM enables.
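To make the "accept a limited amount of failure" idea concrete, here is a minimal sketch of a restart-on-crash loop. It is written in Python only for illustration and is a loose, single-process analogy to what a BEAM supervisor does, not a reimplementation of OTP; `supervise`, `RestartLimitExceeded`, and `max_restarts` are hypothetical names I made up.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")


class RestartLimitExceeded(RuntimeError):
    """Raised when a supervised task keeps crashing past its restart budget."""


def supervise(task: Callable[[], T], max_restarts: int = 3) -> T:
    """Run `task`, restarting it after a crash up to `max_restarts` times.

    Hypothetical helper: a rough analogy to a supervisor's restart strategy,
    with none of the process isolation the BEAM actually provides.
    """
    attempts = 0
    while True:
        try:
            return task()
        except Exception as exc:  # the task crashed: log it and start over
            attempts += 1
            if attempts > max_restarts:
                # Restart budget exhausted: escalate instead of looping forever.
                raise RestartLimitExceeded(
                    f"gave up after {max_restarts} restarts"
                ) from exc
            logging.warning(
                "task crashed (%s); restart %d/%d", exc, attempts, max_restarts
            )
```

The point is that the happy path stays simple and the failure policy lives in one place, rather than every edge case being handled inline.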
They seem quite good at figuring this out in my experience
It would have been an interesting language had it been released at the time of any of its influences: Oberon in 1987 or Limbo in 1995.
Back when the type system ideas from CLU, Standard ML, Cedar were still taking off among industrial programming languages.
That generates plenty of excitement.
It was intentionally designed for programmers with limited skill.
Go language creator Rob Pike:
> The key point here is our programmers are Googlers, they’re not researchers. They’re typically, fairly young, fresh out of school, probably learned Java, maybe learned C or C++, probably learned Python. They’re not capable of understanding a brilliant language but we want to use them to build good software. So, the language that we give them has to be easy for them to understand and easy to adopt.
No. That is not true. It was designed as a language so programmers of all levels can be productive at the scale of what Google does and across possibly many different teams, no matter your prior background. Google does a lot at scale and a language that is easy to pick up and handles concurrency seamlessly is definitely a helpful tool.
Thanks to Docker pivoting from Python into Go, and Kubernetes from Java into Go, while it was still pre-1.0, it managed to take off, and has more users outside Google than at Google itself, where Java, Kotlin, C++, and Python still dominate most projects outside the Kubernetes ecosystem.
There is a certain irony that Google would need a language like Go, given their hiring process.
Also, proper namespaces from the start, Unicode, 9p even under Emacs and who knows what. Oh, and for sure far fewer exploits, and with no Kubernetes or Docker nonsense. Half of the VCs would be bankrupt today because damn namespaces would do 90% of today's backends seamlessly.
And maybe we would be using some Inferno-based smartphone with custom UIs and programming in Limbo for that. Oh, and batteries lasting a week for sure.
What a coincidence, since Rob Pike wasn't capable of designing a brilliant language.
- Durable, 'enterprise grade' software patterns are baked into the runtime and into common, stable libraries that everyone uses
- You can use Ash, which pretty much entirely solves architectural considerations for many types of backends
- The tooling for inspecting and enforcing style (tidewave, credo, dialyzer, Dan's "vibe" ecosystem tools) is far beyond what I see in other ecosystems
- Ecosystem coverage for pretty much everything you need, including numerical software
- Excellent performance escape hatches (NIFs)
And, as has been shown in various benchmarks, agents are quite good at it.
My one problem in practice has been that getting tests right is hard. LLMs need a lot of cajoling to not build flaky tests with all the concurrency, and I find myself spending hours rewriting parts of the test suite once or twice a week.
* Chances are that fewer people (maybe even none) will look at the code when it's LLM-generated
* Amount of code being written isn't all that critical anymore
* Keeping patches small isn't that big of a deal anymore (because it's now the LLM's job to maintain it, not the human's)
All of this implies: boilerplate isn't a good reason to avoid a language anymore. (I hate this conclusion, because I hate boilerplate).
Then the question is: what kind of language can you use that buys safety with boilerplate? Probably a statically typed one, possibly with lots of asserts... Eiffel? I don't know if there's enough Eiffel code around the Internet to train LLMs, so maybe a more popular one would be better.
Maybe Java or C#? Haskell? OCaml?
The article suggests golang, and I think there are use cases where golang would be a good candidate.
It would be quite interesting to run an experiment: give separate instances of the same LLM coding agent the task of implementing a specific application, each in a different language. Then compare quality, code size, runtime performance and token cost. Ideally it would be a multi-stage development that better simulates a real development workflow (bug reports and new feature requests coming in over time).
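For what it's worth, the bookkeeping for such an experiment could be quite small. Below is a rough sketch in Python, assuming each agent run is recorded as one row per language and stage; `RunResult`, `summarize`, and every field name are hypothetical, and using test pass rate as the proxy for "quality" is my assumption, not something established above.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class RunResult:
    """One agent run for one language at one stage (all fields hypothetical)."""
    language: str
    stage: str              # e.g. "initial build", "bug report 1"
    tests_passed: int       # assumed proxy for quality
    tests_total: int
    lines_of_code: int
    runtime_seconds: float
    tokens_used: int


def summarize(results: Iterable[RunResult]) -> Dict[str, Dict[str, float]]:
    """Average each metric per language across all recorded stages."""
    by_language: Dict[str, List[RunResult]] = {}
    for r in results:
        by_language.setdefault(r.language, []).append(r)

    summary: Dict[str, Dict[str, float]] = {}
    for language, runs in by_language.items():
        summary[language] = {
            # Guard against stages that shipped with no tests at all.
            "pass_rate": mean(
                r.tests_passed / r.tests_total if r.tests_total else 0.0
                for r in runs
            ),
            "lines_of_code": mean(r.lines_of_code for r in runs),
            "runtime_seconds": mean(r.runtime_seconds for r in runs),
            "tokens_used": mean(r.tokens_used for r in runs),
        }
    return summary
```

Averaging over stages is the crude part: a real comparison would probably want to look at how each metric drifts as bug reports and feature requests pile up, not just the mean.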
Just narrow your window of thought to easier problems for the LLM, and all of a sudden the LLMs do everything you want!
Reminds me of playing around with image generation models. Someone who's been practicing can crank out prompts for really impressive images back to back. But you try to use an everyday object or concept the model isn't trained on? Everybody will race to show off how smart they are by saying "just don't hold it like that."
Python __is__ a boring language (it is mature and well supported) with a somewhat convoluted package manager that has gotten a lot better since that xkcd came out.
Yeah, I get it, Go is better for distributing your code-- just one binary you can copy. But what does that have to do with "boring"?
The specifics of Python were chosen only due to the language ecosystem being fragmented and inconsistent while Python remains an essential learning, research, and now ML programming language (it was my first language and I still love it).
My thoughts on LLM-generated code have changed immensely in the last 9 months as I've taken on teams and projects through my consulting work [1] as a fractional CTO. Python remains a difficult, flaky, and inconsistent programming language for complex production systems. Most other programming languages suffer from fragmented toolchains and ecosystems: JavaScript (famously), PHP, and even C/C++ to a degree.
Languages with a single way to do things benefit the most: Ruby, Rust, even Swift. Low entropy is the way to go, and convention > configuration seems to pay off with LLMs.
Mean cost of management is more important than specific edge examples like "X company runs on Y language". I think that 'boring' languages with rock-solid compilers, toolchains, testing frameworks, and package managers make for a high return on engineering time and production maintenance.
[1]: sancho.studio