My bet is that eventually we'll end up with a powerful agentic tool that uses the browser environment to plan and execute personal agents or to deploy business agents that doesn't access system resources any more than browsers do at the moment.
Not that I'm not excited about the possibilities in personal productivity, but I don't think this is the way--if it was, we wouldn't have lost, say, the ability to have proper desktop automation via AppleScript, COM, DDE (remember that?) across mainstream desktop operating systems.
And today this is... not sufficient. What we require today is to run applications so that they are protected from each other. For quite some time I tried to use Unix permissions for this (one user per application I run), but it's totally unworkable. You need a capabilities model, not a user permission model.
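To make the distinction concrete, here is a minimal sketch of the capability idea, assuming a hypothetical `DirCapability` class (not from any real library). Under a user permission model the program can touch everything its user can; under a capability model it can only touch what it was explicitly handed. Python itself cannot enforce this (the code could still import `os`), so this only illustrates the shape of the API, not a real sandbox.

```python
from pathlib import Path


class DirCapability:
    """Hypothetical handle that grants access to one directory tree only."""

    def __init__(self, root: Path) -> None:
        self._root = root.resolve()

    def _resolve(self, relative: str) -> Path:
        # Resolve "..", absolute paths and symlinks, then refuse anything
        # that ends up outside the granted directory.
        target = (self._root / relative).resolve()
        if not target.is_relative_to(self._root):  # Python 3.9+
            raise PermissionError(f"{relative!r} escapes the granted directory")
        return target

    def read_text(self, relative: str) -> str:
        return self._resolve(relative).read_text()

    def write_text(self, relative: str, data: str) -> None:
        self._resolve(relative).write_text(data)


def untrusted_app(files: DirCapability) -> None:
    # The app receives only the capability, never a raw path to $HOME.
    files.write_text("notes.txt", "hello")
```

The caller decides what to grant, e.g. `untrusted_app(DirCapability(Path("/tmp/app-data")))`, instead of the app inheriting everything the user account can reach.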
Anyway I already linked this elsewhere in this thread but in this comment it's a better fit https://xkcd.com/1200/
Locking down features to have a unified experience is what a browser should do, after all, no matter the performance. Of course there are various vendors who tried to break this by introducing platform-specific stuff, but that's also why IE, and later Edge (the non-Chromium one), died a horrible death.
There were external sandbox escapes such as Adobe Flash, ActiveX, Java applets and Silverlight, though those external escapes were often sandboxes of their own, horrible as all of them were...
But with the stabilization of asm.js and later WebAssembly, all of them are gone with the wind.
Sidenote: Flash's scripting language, ActionScript, is also directly responsible for the generational design of Java-ahem-ECMAScript later on, and of TypeScript too.
I feel like I am the only one who absolutely loved ActionScript, especially AS3. I wrote a video aggregator (chime.tv[1]) back in the day using AS3 and it was such a fun experience.
1. https://techcrunch.com/2007/06/12/chimetv-a-prettier-way-to-...
There is the universal hate for Flash because it was used for ads and had shitty security, but anyone I know who actually used AS3 loved it.
At its peak, with Flex Builder, we also had a full-blown UI editor, where you could just add your own custom elements designed directly with Flash ... and then it was all killed because Adobe did not dare to open source it, or put serious effort of their own into improving the technical base of the Flash player (which had acquired lots of technical debt).
I never really worked with it, but it seems whenever it comes up here or on Reddit, people who did, miss it. I think the authoring side of Flash is remembered very positively.
Silverlight was nice, pity it got discontinued.
Thus it isn't as if the browser plugins story is directly responsible for its demise.
I would like to humbly propose that we simply provision another computer for the agent to use.
I don't know why this needs to be complicated. A nano EC2 instance is like $5/month. I suspect many of us currently have the means to do this on-prem without resorting to virtualization.
https://developer.chrome.com/blog/persistent-permissions-for...
On my desktop Chrome on Ubuntu, it seems to be persistent, but on my Android phone in Chrome, it loses the directory if I refresh.
At the risk of sounding obvious:
- Chrome (and Chromium) is a product made and driven by one of the largest advertising companies (Alphabet, formerly Google) as a strategic tool for its business model
- Chrome is one browser among many, it is not a de facto "standard" just because it is very popular. The fact that there are a LOT of people unable to use it (iOS users) even if they wanted to proves the point.
It's quite important not to amalgamate some experimental features put in place by some vendors (yes, even the most popular ones) as "the browser".
What I find most compelling about this framing is the maturity argument. Browser sandboxing has been battle-tested by billions of users clicking on sketchy links for decades. Compare that to the approach of spinning up a fresh container every time you want to run untrusted code.
The tradeoff is obvious though: you're limited to what browsers can do. No system calls, no arbitrary binaries, no direct hardware access. For a lot of AI coding tasks that's actually fine. For others it's a dealbreaker.
I'd love to see someone benchmark the actual security surface area. "Browsers are secure" is true in practice, but the attack surface is enormous compared to a minimal container.
Browsers are closer to operating systems than to sandboxes, so giving access of any kind to an agent seems dangerous. In the post I can see it's talking about the file access API; perhaps a better phrasing is that the browser has a sandbox?
The point is that most people won't do that. Just like with backups, strong passwords, 2FA, hardware tokens etc. Security and safety features must be either strictly enforced or enabled by default and very simple to use. Otherwise you leave "the masses" vulnerable.
Something more like a TEE inside the browser of sorts. Not sure if there is anything like this.
It seems he started his blog in 2003: https://simonwillison.net/2003/Jun/12/oneYearOfBlogging/
And then you see the recent vulnerabilities in opencode for example. The current model is unsustainable
It would be great if desktop Linux adopted a better security model (maybe inspired by Android). So far we got this https://xkcd.com/1200/ and it's not sufficient
Browser sandboxes are swiss cheese. In 2024 alone, Google reported 75 zero-day exploits that break out of their browser's sandbox.
Browsers are the worst security paradigm. They have tens of millions of lines of code, far more than operating system kernels. The more lines of code, the more bugs. They include features you don't need, with no easy way to disable them or opt-in on a case-by-case basis. The more features, the more an attacker can chain them into a usable attack. It's a smorgasbord of attack surface. The ease with which the sandbox gets defeated every year is proof.
So why is everyone always using browsers, anyway? Because they mutated into an application platform that's easy to use and easy to deploy. But it's a dysfunctional one. You can't download and verify the application via signature, like every other OS's application platform. There's no published, vetted list of needed permissions. The "stack" consists of a mess of RPC calls to random remote hosts, often hundreds if not thousands required to render a single page. If any one of them gets compromised, or is just misconfigured, in any number of ways, so does the entire browser and everything it touches. Oh, and all the security is tied up in 350 different organizations (CAs) around the world, which if any are compromised, there goes all the security. But don't worry, Google and Apple are hard at work to control them (which they can do, because they control the application platform) to give them more control over us.
This isn't secure, and there's really no way to secure it. And Google knows that. But it's the instrument making them hundreds of billions of dollars.
Also, the double iframe technique is important for preventing exfiltration through navigation, but you have to make sure you don't allow top navigation. The outer iframe will prevent the inner iframe from loading something outside of the frame-src origins. This could mean restricting it to a single server, which would still allow sending data to that server, but if it's your server or a server you trust that might be OK. Or it could mean srcdoc and/or data: URLs for local-only navigation.
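A rough sketch of what that nesting could look like, written as a Python helper that only builds the markup. The function name `build_double_iframe` and the default `frame-src data:` policy are my assumptions for illustration; whether a given browser enforces it exactly as described should be verified before relying on it.

```python
from html import escape


def build_double_iframe(untrusted_html: str, allowed_frame_src: str = "data:") -> str:
    """Return markup that embeds untrusted_html two iframes deep (illustrative)."""
    # Inner frame: scripts may run, but the sandbox omits allow-top-navigation
    # and allow-same-origin, so it cannot navigate or read the host page.
    inner = (
        '<iframe sandbox="allow-scripts" '
        f'srcdoc="{escape(untrusted_html, quote=True)}"></iframe>'
    )
    # Outer document: a CSP meta tag limits where frames nested inside it
    # may load from, which is what constrains the inner frame's navigation.
    outer_doc = (
        '<meta http-equiv="Content-Security-Policy" '
        f'content="frame-src {allowed_frame_src}">'
        + inner
    )
    # Outer frame: also sandboxed without allow-top-navigation. Its srcdoc is
    # escaped again, so the inner document ends up escaped twice.
    return (
        '<iframe sandbox="allow-scripts" '
        f'srcdoc="{escape(outer_doc, quote=True)}"></iframe>'
    )
```

The host page would insert the returned string wherever the untrusted content should appear. Swapping `allowed_frame_src` for a trusted origin gives the "only my server" variant mentioned above.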
I find the WebAssembly route a lot more likely to be able to produce true sandboxen.
Can you believe that if you download a calculator app it can delete your $HOME? What kind of idiot designed these systems?
The problem discussed by both Simon and Paul, where the browser can absolutely trash any directory you give it, is perhaps the paradigmatic example of where git worktree is useful.
Because you can check out the branch for the browser/AI agent into a worktree, and the only file there that halfway matters is the single file in .git which explains where the worktree comes from.
It's really easy to fix that file up if it gets trashed, and it's really easy to use git to see exactly what the AI did.
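For anyone who hasn't used worktrees, the workflow is roughly the following. This is a sketch, not a hardened tool: the helper names are made up for illustration, the git subcommands are real, and `git worktree repair` needs git 2.30 or newer (how well it recovers a completely deleted `.git` file may vary).

```python
import subprocess
from pathlib import Path


def add_worktree_cmd(repo: Path, sandbox: Path, branch: str) -> list[str]:
    # Check out a new branch into its own directory; only `sandbox`
    # is handed to the browser/agent.
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(sandbox)]


def review_cmd(sandbox: Path) -> list[str]:
    # List every file the agent added, changed, or deleted in the worktree.
    return ["git", "-C", str(sandbox), "status", "--porcelain"]


def repair_cmd(repo: Path, sandbox: Path) -> list[str]:
    # Ask git, from the main checkout, to fix up the worktree's `.git`
    # pointer file if the agent damaged it.
    return ["git", "-C", str(repo), "worktree", "repair", str(sandbox)]


def run(cmd: list[str]) -> str:
    # Run one of the commands above and return its stdout; raises on failure.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

Typical use would be `run(add_worktree_cmd(repo, sandbox, "agent-work"))` before the session and `print(run(review_cmd(sandbox)))` afterwards, with `repair_cmd` first if the `.git` file got clobbered.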
1) WebContainer allows Node.js frontend and backend apps to be run in the browser. This is readily demonstrated by the (now sadly unmaintained) bolt.diy project.
2) The JSLinux and x86 Linux examples allow running a complete Linux environment in wasm, with two-way communication. A thin extension adds networking support to Linux.
So it's theoretically possible to run a pretty full-fledged agentic system with the simple UX of visiting a URL.