Describe a dreamscape like "flying through clouds at sunset" or upload an image, and it finds visually similar scenes from the films.
Live demo: https://ghibli-search.anini.workers.dev/
Full Cloudflare stack: Workers, AI Search, R2, Workers AI
Open source: https://github.com/aninibread/ghibli-search
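For anyone curious how the matching works conceptually: the descriptions and film frames are presumably embedded into a shared text/image vector space and ranked by similarity. A minimal local sketch of that idea using an open CLIP-style model (not the actual Workers AI / AI Search implementation; file paths are placeholders):

# Conceptual sketch only -- the real app runs on Cloudflare Workers AI + AI Search.
# pip install sentence-transformers pillow
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP-style model that embeds text and images into the same vector space
model = SentenceTransformer("clip-ViT-B-32")

# Pretend these are pre-extracted film frames (placeholder paths)
frame_paths = ["frames/clouds_01.jpg", "frames/forest_02.jpg"]
frame_embs = model.encode([Image.open(p) for p in frame_paths])

# Embed the user's description and rank frames by cosine similarity
query_emb = model.encode("flying through clouds at sunset")
scores = util.cos_sim(query_emb, frame_embs)[0]
best = max(range(len(frame_paths)), key=lambda i: float(scores[i]))
print("closest frame:", frame_paths[best])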
Would love feedback on the search quality and any ideas for improvements!
The big differentiator: on BrowserOS you can use local LLMs or BYOK and run the agent entirely on the client side, so your company/sensitive data stays on your machine!
Today we're launching filesystem access... just like Claude Cowork, our browser agent can read files, write files, run shell commands! But honestly, we didn't plan for this. It turns out the privacy decision we made 9 months ago accidentally positioned us for this moment.
The architectural bet we made 9 months ago: Unlike other AI browsers (ChatGPT Atlas, Perplexity Comet) where the agent loop runs server-side, we decided early on to run our agent entirely on your machine (client side).
But building everything on the client side wasn't smooth. We initially built our agent loop inside a Chrome extension, but we kept hitting walls: the extension's service worker is single-threaded JS, and we had no access to Node.js libraries. So two months ago we made the hard decision to throw everything away and start from scratch.
In the new architecture, our agent loop sits in a standalone binary that we ship alongside our Chromium. And we use gemini-cli for the agent loop with some tweaks! We wrote a neat adapter to translate between Gemini format and Vercel AI SDK format. You can look at our entire codebase here: https://git.new/browseros-agent
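To give a flavor of what that adapter does, here is a simplified Python sketch of the translation (the real adapter is TypeScript, lives in the repo above, and also handles tool calls, images, and streaming):

# Simplified sketch of the format translation -- illustrative only.
def gemini_to_ai_sdk(messages: list[dict]) -> list[dict]:
    """Gemini format: {"role": "user"|"model", "parts": [{"text": ...}]}
    -> AI SDK format: {"role": "user"|"assistant", "content": "..."}"""
    role_map = {"model": "assistant"}
    out = []
    for msg in messages:
        text = "".join(part.get("text", "") for part in msg.get("parts", []))
        out.append({"role": role_map.get(msg["role"], msg["role"]), "content": text})
    return out

def ai_sdk_to_gemini(messages: list[dict]) -> list[dict]:
    """Inverse direction: AI SDK-style messages back into Gemini format."""
    role_map = {"assistant": "model"}
    return [
        {"role": role_map.get(m["role"], m["role"]), "parts": [{"text": m["content"]}]}
        for m in messages
    ]

print(gemini_to_ai_sdk([{"role": "model", "parts": [{"text": "hi"}]}]))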
How we give the browser filesystem access: When Claude Cowork launched, we realized something: because Atlas and Comet run their agent loops server-side, there's no good way for their agents to access your files without uploading them to the server first. But our agent was already local. Adding filesystem access meant just... opening the door (with your permission, ofc). Our agent can now read and write files just like Claude Code.
What you can actually do today:
a) Organize files in my desktop folder https://youtu.be/NOZ7xjto6Uc
b) Open top 5 HN links, extract the details and write summary into a HTML file https://youtu.be/uXvqs_TCmMQ
--- Where we are now

If you haven't tried us since the last Show HN (https://news.ycombinator.com/item?id=44523409), give us another shot. The new architecture unlocked a ton of new features, and we've grown to 8.5K GitHub stars and 100K+ downloads:
c) You can now build more reliable workflows using an n8n-like graph https://youtu.be/H_bFfWIevSY
d) You can also use BrowserOS as an MCP server in Cursor or Claude Code https://youtu.be/5nevh00lckM
We're very bullish on the browser being the right platform for a Claude Cowork-like agent. The browser is the most commonly used app among knowledge workers (email, docs, spreadsheets, research, etc.). Even Anthropic recognizes this -- Claude Cowork's browser integration is a janky Chrome extension. Owning the entire stack lets us build differentiated features that wouldn't be possible otherwise, e.g. browser ACLs.
Agents can do dumb or destructive things, so we're adding browser-level guardrails (think IAM for agents): "role(agent): can never click buy" or "role(agent): read-only access on my bank's homepage."
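A purely hypothetical sketch of what evaluating such a rule could look like (names and structure invented for illustration; this is not BrowserOS code):

# Hypothetical sketch of a browser-level ACL check -- not BrowserOS's actual implementation.
from dataclasses import dataclass
from fnmatch import fnmatch

@dataclass
class AclRule:
    url_pattern: str   # e.g. "https://*.mybank.com/*"
    action: str        # e.g. "read", "click", "click:buy", "navigate"
    allow: bool

RULES = [
    AclRule("https://*.mybank.com/*", "read", True),    # read-only on the bank's site
    AclRule("https://*.mybank.com/*", "click", False),
    AclRule("*", "click:buy", False),                   # agent can never click "buy"
]

def is_allowed(url: str, action: str) -> bool:
    """Return True if the agent may perform `action` on `url` (first matching rule wins)."""
    for rule in RULES:
        if fnmatch(url, rule.url_pattern) and fnmatch(action, rule.action):
            return rule.allow
    return True  # default-allow only for this sketch; a real system would default-deny

print(is_allowed("https://www.mybank.com/accounts", "click"))  # False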
Curious to hear your take on this and the overall thesis.
We’ll be in the comments. Thanks for reading!
GitHub: https://github.com/browseros-ai/BrowserOS
Download: https://browseros.com (available for Mac, Windows, Linux!)
Between us, we've spent years working on satellite operations at SpaceX, Blue Origin, and NASA. At SpaceX, we managed constellation health for Starlink. At Blue, we worked on next-gen test infra for New Glenn. At NASA, we dealt with deep space communications. The same problem kept coming up: by the time you notice a link is degrading, you've often already lost data.
The core issue is that satellite RF links are affected by dozens of interacting variables. A satellite passes overhead, and you need to predict whether the link will hold for the next few minutes. That depends on: the orbital geometry (elevation angle changes constantly), tropospheric attenuation (humidity affects signal loss via ITU-R P.676), rain fade (calculated via ITU-R P.618 - rain rates in mm/hr translate directly to dB of loss at Ka-band and above), ionospheric scintillation (we track the Kp index from magnetometer networks), and network congestion on top of all that.
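To make that concrete, here is a toy version of the kind of calculation involved (heavily simplified; real systems use the full ITU-R recipes and measured inputs, and every number below is made up for illustration):

# Toy link-margin calculation -- illustrative only; not our production models.
import math

def free_space_path_loss_db(distance_km: float, freq_ghz: float) -> float:
    """Free-space path loss in dB for a slant range in km and frequency in GHz."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_ghz) + 92.45

def link_margin_db(eirp_dbw: float, gt_db_per_k: float, distance_km: float,
                   freq_ghz: float, rain_fade_db: float, required_cn0_dbhz: float) -> float:
    """Simplified margin: C/N0 = EIRP + G/T - path loss - rain fade - k, minus the requirement."""
    boltzmann_dbw = -228.6  # Boltzmann's constant in dBW/K/Hz
    cn0 = (eirp_dbw + gt_db_per_k
           - free_space_path_loss_db(distance_km, freq_ghz)
           - rain_fade_db
           - boltzmann_dbw)
    return cn0 - required_cn0_dbhz

# Example: LEO pass at roughly 1500 km slant range, Ka-band downlink, 4 dB rain fade.
# All parameter values are placeholders, not real link parameters.
print(round(link_margin_db(eirp_dbw=45, gt_db_per_k=10, distance_km=1500,
                           freq_ghz=20, rain_fade_db=4, required_cn0_dbhz=80), 1), "dB margin")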
The traditional approach is reactive. Operators watch dashboards, and when SNR drops below a threshold, they manually reroute traffic or switch to a backup link. With 10,000 satellites in orbit today and 70,000+ projected by 2030, this doesn't scale. Our system ingests telemetry at around 100,000 messages per second from satellites, ground stations, weather radar, IoT humidity sensors, and space weather monitors. We run physics-based models in real-time - the full link budget equations, ITU atmospheric standards, orbital propagation - to compute what should be happening. Then we layer ML models on top, trained on billions of data points from actual multi-orbit operations.
The ML piece is where it gets interesting. We use federated learning because constellation operators (understandably) don't want to share raw telemetry. Each constellation trains local models on their own data, and we aggregate only the high-level patterns. This gives us transfer learning across different orbit types and frequency bands - learnings from LEO Ka-band links help optimize MEO or GEO operations. We can predict most link failures 3-5 minutes out with >90% accuracy, which gives enough time to reroute traffic before data loss. The system is fully containerized (Docker/Kubernetes) and deploys on-premise for air-gapped environments, on GovCloud (AWS GovCloud, Azure Government), or standard commercial clouds.
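For readers unfamiliar with the pattern: the aggregation step looks roughly like federated averaging, where each operator trains locally and only model weights (never raw telemetry) cross the boundary. A stripped-down sketch of that idea, not our actual pipeline:

# Minimal federated-averaging sketch with a toy linear model -- illustrates the
# pattern only; it is not our production training or aggregation code.
import numpy as np

def local_update(weights, X, y, lr=0.05, epochs=5):
    """Each operator runs a few gradient steps on its own (private) data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_average(local_weights, sizes):
    """The server aggregates only weights, weighted by each operator's sample count."""
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(local_weights, sizes))

# Two "operators" with private data; only their trained weights are shared.
rng = np.random.default_rng(0)
global_w = np.zeros(3)
for _round in range(10):
    updates, sizes = [], []
    for _ in range(2):
        X = rng.normal(size=(100, 3))
        y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    global_w = federated_average(updates, sizes)
print("aggregated weights:", np.round(global_w, 2))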
Right now we're testing with defense and commercial partners. The dashboard shows real-time link health, forecasts at 60/180/300 seconds out, and root cause analysis (is this rain fade? satellite setting below horizon? congestion?). We expose everything via API - telemetry ingestion, predictions, topology snapshots, even an LLM chat endpoint for natural language troubleshooting.
The hard parts we're still working on: prediction accuracy degrades for longer time horizons (beyond 5 minutes gets dicey), we need more labeled failure data for rare edge cases, and the federated learning setup requires careful orchestration across different operators' security boundaries. We'd love feedback from anyone who's worked on satellite ops, RF link modeling, or time-series prediction at scale. What are we missing? What would make this actually useful in a production NOC environment?
Happy to answer any technical questions!
Capital One statement: https://investor.capitalone.com/news-releases/news-release-d...
Brex statement: https://www.brex.com/journal/brex-and-capital-one-join-force...
Qwen3 Omni looks perfect on paper (“real-time”, speech-to-speech, etc). But I’ve been poking around and I can’t find a single reproducible “here’s how I got the open weights doing real speech-to-speech locally” writeup. Lots of “speech in → text out” or “audio out after the model finishes”, but not a usable realtime voice loop. Feels like either (a) the tooling isn’t there yet, or (b) I’m missing the secret sauce.
What are people actually using in 2026 if they want open + local voice?
Is anyone doing true end-to-end speech models locally (streaming audio out), or is the SOTA still “streaming ASR + LLM + streaming TTS” glued together?
If you did get Qwen3 Omni speech-to-speech working: what stack (transformers / vLLM-omni / something else), what hardware, and is it actually realtime?
What’s the most “works today” combo on a single GPU?
Bonus: rough numbers people see for mic → first audio back
Would love pointers to repos, configs, or “this is the one that finally worked for me” war stories.
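For clarity, the "glued together" baseline I'm comparing against looks roughly like this; all three stages are stubs here, so swap in whatever local ASR/LLM/TTS you actually run:

# Skeleton of the "streaming ASR + LLM + streaming TTS" loop described above.
import time

def transcribe_stream(audio_chunks):
    """Stub ASR: yields a growing transcript per incoming audio chunk."""
    transcript = ""
    for chunk in audio_chunks:
        transcript += " ..."  # a real ASR would append recognized words here
        yield transcript

def generate_reply(transcript: str):
    """Stub LLM: yields reply tokens as they're generated."""
    yield from ["Sure,", " here's", " an", " answer."]

def synthesize_stream(token_stream):
    """Stub TTS: yields audio frames per text chunk so playback can start early."""
    for _text in token_stream:
        yield b"\x00" * 320  # placeholder audio frame

def voice_loop(mic_chunks, play_audio):
    start = time.time()
    transcript = ""
    for transcript in transcribe_stream(mic_chunks):
        pass  # a real loop detects end-of-utterance (VAD) instead of draining the mic
    first_audio_at = None
    for frame in synthesize_stream(generate_reply(transcript)):
        if first_audio_at is None:
            first_audio_at = time.time() - start
        play_audio(frame)
    print(f"mic -> first audio back: {first_audio_at * 1000:.0f} ms (stubs, so not meaningful)")

voice_loop(mic_chunks=[b"\x00" * 3200] * 5, play_audio=lambda frame: None)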
Examples
Simple function:
from polymcp.polymcp_toolkit import expose_tools_http
def add(a: int, b: int) -> int:
    """Add two numbers"""
    return a + b
app = expose_tools_http([add], title="Math Tools")
Run with:
uvicorn server_mcp:app --reload
Now add is exposed via MCP and can be called directly by AI agents.
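If you want to poke at the server without an agent in the loop, you can send MCP's JSON-RPC tools/call yourself. Note that the /mcp path and the skipped initialize handshake below are simplifying assumptions -- check polymcp's docs for the actual transport details:

# Manual MCP "tools/call" over HTTP. The /mcp path is an assumption about polymcp's
# transport, and a real MCP client performs an initialize handshake first -- this
# only shows the shape of the tool-call payload.
import requests

payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "add", "arguments": {"a": 2, "b": 3}},
}
resp = requests.post("http://127.0.0.1:8000/mcp", json=payload)
print(resp.json())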
API function:
import requests
from polymcp.polymcp_toolkit import expose_tools_http

def get_weather(city: str):
    """Return current weather data for a city"""
    response = requests.get(f"https://api.weatherapi.com/v1/current.json?q={city}")
    return response.json()
app = expose_tools_http([get_weather], title="Weather Tools")
AI agents can call get_weather("London") to get real-time weather data instantly.
Business workflow function:
import pandas as pd
from polymcp.polymcp_toolkit import expose_tools_http

def calculate_commissions(sales_data: list[dict]):
    """Calculate sales commissions from sales data"""
    df = pd.DataFrame(sales_data)
    df["commission"] = df["sales_amount"] * 0.05
    return df.to_dict(orient="records")
app = expose_tools_http([calculate_commissions], title="Business Tools")
AI agents can now generate commission reports automatically.
Why it matters for companies

• Reuse existing code immediately: legacy scripts, internal libraries, APIs.
• Automate complex workflows: AI can orchestrate multiple tools reliably.
• Plug-and-play: multiple Python functions exposed on the same MCP server.
• Reduce development time: no custom wrappers or middleware needed.
• Built-in reliability: input/output validation and error handling included.
Polymcp makes Python functions immediately usable by AI agents, standardizing integration across enterprise software.