The project explores whether we can get agents (any agent, any model) to propose 'knowledge units' (KUs) in a standard schema based on gotchas they run into during use, and to proactively query for existing KUs to get insights they can verify and confirm when they prove useful.
It's currently very much a PoC, with a loftier proposal in the repo; we're trying to iterate from local use up to team level, and ideally eventually to some kind of public commons.
At the team level (see our Docker Compose example), you configure your coding agent to point to the team's API address so KUs are sent there instead, where a human in the loop (HITL) can review them via a UI in the browser before they're allowed to appear in queries by other agents on your team.
We're learning a lot even from using it locally on various repos internally, not just about the kind of KUs it generates, but also from a UX perspective on making it easy to get started and to approve KUs in the browser dashboard. There are bigger, complex problems to solve in the future around data privacy, governance, etc., but for now we're super focused on building something people can get value from really quickly in their day-to-day.
Tech stack:
* Skills - markdown
* Local Python MCP server (FastMCP) - managing a local SQLite knowledge store
* Optional team API (FastAPI, Docker) for sharing knowledge across an org
* Installs as a Claude Code plugin or OpenCode MCP server
* Local-first by default; your knowledge stays on your machine unless you opt into team sync by setting the address in config
* OSS (Apache 2.0 licensed)
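As a rough illustration of the local-first design, here's what a minimal SQLite knowledge store could look like. This is a sketch only: the table name, fields, and status values are assumptions for illustration, not cq's actual schema.

```python
import sqlite3

# Hypothetical KU schema -- field names are illustrative, not cq's actual schema.
conn = sqlite3.connect(":memory:")  # cq stores data in ~/.cq/local.db
conn.execute("""
    CREATE TABLE knowledge_units (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        gotcha TEXT NOT NULL,           -- what the agent ran into
        resolution TEXT NOT NULL,       -- what fixed it
        confidence REAL DEFAULT 0.5,    -- bumped when another agent confirms
        status TEXT DEFAULT 'proposed'  -- proposed -> approved (HITL) -> queryable
    )
""")
conn.execute(
    "INSERT INTO knowledge_units (title, gotcha, resolution) VALUES (?, ?, ?)",
    ("GitHub Actions versions",
     "Training data suggests outdated major versions of actions",
     "Check the action's repo for the latest major version before pinning"),
)
conn.commit()

# At the team level, only approved KUs would be visible to other agents.
row = conn.execute(
    "SELECT title, confidence, status FROM knowledge_units"
).fetchone()
print(row)  # ('GitHub Actions versions', 0.5, 'proposed')
```

The point is just that a KU pairs a gotcha with its resolution plus review metadata, which is what makes it queryable and confirmable later.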
Here's an example of something that seemed straightforward: when asked to write a GitHub Action, Claude Code often used actions that were multiple major versions out of date because of its training data. In this case I told the agent what I saw when reviewing the GitHub Action YAML file it created, and it proposed a knowledge unit to be persisted. The next time, in a completely different repo using OpenCode and an OpenAI model, the cq skill was used up front before the task started: the agent got the information about the major-version gotcha in training data, proactively checked GitHub, and used the correct, latest major versions. It then confirmed the KU, increasing its confidence score.
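The confirmation step at the end could be as simple as a bounded confidence update. The function below is a sketch of the idea under that assumption, not cq's actual scoring logic:

```python
def confirm(confidence: float, weight: float = 0.2) -> float:
    """Move confidence toward 1.0 on each successful confirmation,
    with diminishing returns so it never exceeds 1.0.
    (Illustrative formula only -- cq's real scoring may differ.)"""
    return confidence + weight * (1.0 - confidence)

c = 0.5
for _ in range(3):  # three agents independently confirm the KU
    c = confirm(c)
print(round(c, 3))  # 0.744
```

A diminishing-returns update like this keeps early confirmations meaningful while preventing a well-worn KU from being treated as infallible.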
I guess some folks might say: well, there's a CLAUDE.md in your repo, or in ~/.claude/. But we're looking further than that: we want this to be available to all agents and all models. Maybe more importantly, we don't want to stuff AGENTS.md or CLAUDE.md with loads of rules that lead to unpredictable behaviour; this is targeted information about a particular task, which seems a lot more useful.
Right now it can be installed locally as a plugin for Claude Code and OpenCode:
```
claude plugin marketplace add mozilla-ai/cq
claude plugin install cq
```
This allows you to capture data in your local ~/.cq/local.db (the data doesn't get sent anywhere else).
We'd love feedback on this; the repo is open and public, so GitHub issues are welcome. We've posted on some of our social media platforms with a link to the blog post (below), so feel free to reply to us if you found it useful or ran into friction. We want to make this something that's accessible to everyone.
Blog post with the full story: https://blog.mozilla.ai/cq-stack-overflow-for-agents/ GitHub repo: https://github.com/mozilla-ai/cq
Thanks again for your time.
Edit: Just to clarify, this has been accepted into the community extensions repo. So you can use it like:
```
INSTALL hnsw_acorn FROM community;
LOAD hnsw_acorn;
```
Over the past year we tried many different platforms/frameworks to build out agent systems, and we hit some sort of problem with all of them, so we decided to have a go at it ourselves. Jon has worked with Kubernetes and Terraform for years and has always liked their declarative nature, so he took patterns and concepts from both to build Orloj.
Orloj treats agents the way infrastructure-as-code treats cloud resources. You write a manifest that declares an agent's model, tools, permissions, and execution limits. You compose agents into directed graphs (pipelines, hierarchies, or swarm loops).
Governance is often overlooked, so we built resource policies (AgentPolicy, AgentRole, and ToolPermission) that are evaluated inline during execution, before every agent turn and tool call. Instead of prompt instructions that the model might ignore, these policies are a runtime gate. Unauthorized actions fail closed with structured errors and full audit trails. You can set token budgets per run, whitelist models, block specific tools, and scope policies to individual agent systems.
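Conceptually, an inline policy gate works like the sketch below. The shape of ToolPermission and the error codes here are my assumptions for illustration; Orloj's real evaluation logic lives in its runtime:

```python
from dataclasses import dataclass, field

@dataclass
class ToolPermission:
    # Hypothetical shape: an allow-list of tools plus a per-run token budget.
    allowed_tools: set = field(default_factory=set)
    token_budget: int = 100_000

class PolicyViolation(Exception):
    """Fail closed with a structured error rather than trusting the prompt."""
    def __init__(self, code: str, detail: str):
        super().__init__(detail)
        self.code = code

def gate_tool_call(policy: ToolPermission, tool: str, tokens_used: int) -> None:
    # Evaluated before every tool call, not after the fact.
    if tool not in policy.allowed_tools:
        raise PolicyViolation("TOOL_NOT_ALLOWED", f"{tool} is not whitelisted")
    if tokens_used > policy.token_budget:
        raise PolicyViolation("BUDGET_EXCEEDED", "token budget exhausted")

policy = ToolPermission(allowed_tools={"github.search"}, token_budget=50_000)
gate_tool_call(policy, "github.search", tokens_used=1_200)  # allowed, no error
try:
    gate_tool_call(policy, "shell.exec", tokens_used=1_200)  # not whitelisted
except PolicyViolation as exc:
    blocked = exc.code
print(blocked)  # TOOL_NOT_ALLOWED
```

The key property is that the gate sits in the execution path, so an unauthorized tool call never reaches the tool at all, and the structured error code can feed an audit trail.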
For reliability, we built lease-based task ownership (so crashed workers don't leave orphaned tasks), which lets you run workers on different machines with whatever compute is needed. That helps when a task needs a GPU (as some of ours did). The scheduler also supports cron triggers and webhook-driven task creation.
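The idea behind lease-based ownership can be sketched in a few lines. This is a toy in-memory version under my own assumptions (fixed lease duration, explicit clock); Orloj's scheduler handles this server-side:

```python
# Hypothetical lease table: task_id -> (worker_id, lease_expiry_timestamp).
leases: dict[str, tuple[str, float]] = {}
LEASE_SECONDS = 30.0

def claim(task_id: str, worker_id: str, now: float) -> bool:
    """A worker claims a task only if it is unclaimed or its lease expired,
    i.e. the previous worker crashed and stopped renewing the lease."""
    holder = leases.get(task_id)
    if holder is None or holder[1] < now:
        leases[task_id] = (worker_id, now + LEASE_SECONDS)
        return True
    return False

t0 = 1000.0
first = claim("task-1", "worker-a", now=t0)           # worker-a takes the lease
second = claim("task-1", "worker-b", now=t0 + 5)      # still held by worker-a
# worker-a crashes and never renews; after expiry the task is reclaimable
reclaimed = claim("task-1", "worker-b", now=t0 + 60)
print(first, second, reclaimed)  # True False True
```

Because ownership expires rather than being held forever, a dead worker's tasks automatically become claimable again without any manual cleanup.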
The architecture is a server/worker split, like Kubernetes. orlojd hosts the API, the resource store (in-memory for dev, Postgres for production), and the task scheduler. orlojworker instances claim and execute tasks, route model requests through a gateway (OpenAI, Anthropic, Ollama, etc.), and run tools in configurable isolation (direct, sandboxed, container, or WASM).
We work with a lot of MCP servers, so we wanted to make MCP integration as easy as possible. You register an MCP server (stdio or HTTP), Orloj auto-discovers its tools, and they become first-class resources with governance applied. So you can connect something like the GitHub MCP server and still have policy enforcement over what agents are allowed to do with it.
It ships with a built-in UI to manage all your workflows, plus a topology view to see everything working in real time. There are a few examples and starter templates in the repo to play around with and get a feel for what's possible.
More info in the docs: https://docs.orloj.dev
We're a small team and this is v0.1.0, so there's a lot still on the roadmap, but the full runtime is open source today and we'd love feedback on what we've built so far. What would you use this for? What's missing?
I used this to build a CLI that indexes hours of footage into ChromaDB, then searches it with natural language and auto-trims the matching clip. Demo video on the GitHub README. Indexing costs ~$2.50/hr of footage. Still-frame detection skips idle chunks, so security camera / sentry mode footage is much cheaper.
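The still-frame skip can be as simple as thresholding the mean pixel difference between consecutive frames. This is a sketch of that general technique with made-up frame data and threshold; the actual tool's detection may work differently:

```python
def is_still(frame_a: list[int], frame_b: list[int], threshold: float = 2.0) -> bool:
    """Treat two frames (flattened grayscale pixel lists) as 'still' when
    their mean absolute pixel difference falls below a threshold.
    Illustrative only -- threshold and representation are assumptions."""
    diff = sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)
    return diff < threshold

idle = [100] * 16               # sentry-mode frame with nothing happening
moved = [100] * 8 + [180] * 8   # something entered half the frame

print(is_still(idle, idle))   # True  -> chunk skipped, no indexing cost
print(is_still(idle, moved))  # False -> chunk gets embedded and indexed
```

Skipping chunks whose frames barely change is why mostly-static footage like security camera recordings indexes for much less than the ~$2.50/hr figure.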