Tiered inference: Haiku 4.5 for conversation (sub-second, cheap), Sonnet 4.6 for tool use (only when needed). Hard cap at $2/day.
A2A passthrough: the private-side agent borrows the gateway's own inference pipeline, so there's one API key and one billing relationship regardless of who initiated the request.
You can talk to nully at https://georgelarson.me/chat/ or connect with any IRC client to irc.georgelarson.me:6697 (TLS), channel #lobby.
I started working on this because no single AI model is right all the time, but their errors don’t strongly correlate. In other words, models often make unique mistakes relative to other models. So I run multiple models in parallel and synthesize the outputs by weighting segments based on confidence. Low entropy in the output token probability distributions correlates with accuracy. High entropy is often where hallucinations begin.
My dad Scott (AI Research Scientist at TRI) is my research partner on this. He sends me papers at all hours, we argue about whether they actually apply and what modifications make sense, and then I build and test things. The entropy-weighting approach came out of one of those conversations.
In our eval on Humanity's Last Exam, Sup scored 52.15%. The best individual model in the same evaluation run got 44.74%. The relative gap is statistically significant (p < 0.001).
Methodology, eval code, data, and raw results:
- https://sup.ai/research/hle-white-paper-jan-9-2026
- https://github.com/supaihq/hle
Limitations:
- We evaluated 1,369 of the 2,500 HLE questions (details in the above links)
- Not all APIs expose token logprobs; we use several methods to estimate confidence when they don't
We tried offering free access and it got abused so badly it nearly killed us. Right now the sustainable option is a $5 starter credit with card verification (no auto-charge). If you don't want to sign up, drop a prompt in the comments and I'll run it myself and post the result.
Try it at https://sup.ai. My dad Scott (@scottmu) is in the thread too. Would love blunt feedback, especially where this really works for you and where it falls short.
Here's a short demo video: https://www.youtube.com/watch?v=DRcns0rRhsg
Compact and lightweight (target: Snapdragon 8CX, OpenGL 3.3)
Real-time lighting with stencil shadows without the need for pre-baked compilation
It’s called turbolite. It is experimental, buggy, and may corrupt data. I would not trust it with anything important yet.
I wanted to explore whether object storage has gotten fast enough to support embedded databases over cloud storage. Filesystems reward tiny random reads and in-place mutation. S3 rewards fewer requests, bigger transfers, immutable objects, and aggressively parallel operations where bandwidth is often the real constraint. This was explicitly inspired by turbopuffer’s ground-up S3-native design. https://turbopuffer.com/blog/turbopuffer
The use case I had in mind is lots of mostly-cold SQLite databases (database-per-tenant, database-per-session, or database-per-user architectures) where keeping a separate attached volume for inactive database feels wasteful. turbolite assumes a single write source and is aimed much more at “many databases with bursty cold reads” than “one hot database.”
Instead of doing naive page-at-a-time reads from a raw SQLite file, turbolite introspects SQLite B-trees, stores related pages together in compressed page groups, and keeps a manifest that is the source of truth for where every page lives. Cache misses use seekable zstd frames and S3 range GETs for search queries, so fetching one needed page does not require downloading an entire object.
At query time, turbolite can also pass storage operations from the query plan down to the VFS to frontrun downloads for indexes and large scans in the order they will be accessed.
You can tune how aggressively turbolite prefetches. For point queries and small joins, it can stay conservative and avoid prefetching whole tables. For scans, it can get much more aggressive.
It also groups pages by page type in S3. Interior B-tree pages are bundled separately and loaded eagerly. Index pages prefetch aggressively. Data pages are stored by table. The goal is to make cold point queries and joins decent, while making scans less awful than naive remote paging would.
On a 1M-row / 1.5GB benchmark on EC2 + S3 Express, I’m seeing results like sub-100ms cold point lookups, sub-200ms cold 5-join profile queries, and sub-600ms scans from an empty cache with a 1.5GB database. It’s somewhat slower on normal S3/Tigris.
Current limitations are pretty straightforward: it’s single-writer only, and it is still very much a systems experiment rather than production infrastructure.
I’d love feedback from people who’ve worked on SQLite-over-network, storage engines, VFSes, or object-storage-backed databases. I’m especially interested in whether the B-tree-aware grouping / manifest / seekable-range-GET direction feels like the right one to keep pushing.