Optio is an open-source orchestration system that turns tickets into merged pull requests using AI coding agents. You point it at your repos, and it handles the full lifecycle:
- Intake — pull tasks from GitHub Issues or Linear, or create them manually
- Execution — spin up isolated K8s pods per repo, run Claude Code or Codex in git worktrees
- PR monitoring — watch CI checks, review status, and merge readiness every 30s
- Self-healing — auto-resume the agent on CI failures, merge conflicts, or reviewer change requests
- Completion — squash-merge the PR and close the linked issue
The key idea is the feedback loop. Optio doesn't just run an agent and walk away — when CI breaks, it feeds the failure back to the agent. When a reviewer requests changes, the comments become the agent's next prompt. It keeps going until the PR merges or you tell it to stop.
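That loop can be sketched roughly as a decision step the monitor runs on every poll. This is a minimal TypeScript sketch; the type and function names here are illustrative, not Optio's actual API:

```typescript
// What the 30s poll might observe about a PR.
type PrStatus = "ci_failed" | "changes_requested" | "mergeable" | "pending";

// What the orchestrator does in response.
type Action =
  | { kind: "resume_agent"; prompt: string }
  | { kind: "merge" }
  | { kind: "wait" };

// Core of the feedback loop: failures and review comments are fed back
// to the agent as its next prompt; a green, approved PR gets merged.
function nextAction(status: PrStatus, detail: string): Action {
  switch (status) {
    case "ci_failed":
      return { kind: "resume_agent", prompt: `CI failed:\n${detail}` };
    case "changes_requested":
      return { kind: "resume_agent", prompt: `Reviewer requested changes:\n${detail}` };
    case "mergeable":
      return { kind: "merge" };
    default:
      return { kind: "wait" }; // nothing to do; poll again in 30s
  }
}
```

The point is that the agent never sees a dead end: every terminal-looking state (red CI, a "request changes" review) is translated into a new prompt instead of a stop.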
Built with Fastify, Next.js, BullMQ, and Drizzle on Postgres. Ships with a Helm chart for production deployment.
You can afford a lot of extra guardrails and process to ensure sufficient quality when the result is a system that gets improved autonomously 24/7.
I'm on my way home from a client, and meanwhile another project has spent the last 10 hours improving with no involvement from me. I spent a few minutes reviewing things this morning after it ran unattended all night.
I am all for delegating everything to AI agents, but it just becomes a mess over time if you don’t steer things often enough.
EDIT: I'll add that you can't expect it to guess what you want, but you can let it manage how it delivers it. We don't expect e.g. a product manager to dictate how developers deliver the code, just what the acceptance criteria are, and that's where I'm headed.
From the project: "The plugin enqueues the input and a daemon picks it up - planning, building, reviewing, and validating autonomously."
The part that is not clear to me (and causes most problems for me) is the "validating". It makes a mistake, or decides mocking an interface is fine, etc., declares success, and moves on to the next task. The bigger the project, the more those small mistakes compound. It sounds like the agent is doing the validation. What's the approach here for validation?
Right now nothing special happens, so claude/codex can access their normal tools and make web calls. I suppose that also means they could figure out they're running in a k8s pod and do service discovery and start calling things.
What kind of features would you be interested in seeing around this? Maybe a toggle to disable internet connections or other connections outside of the container?
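One shape such a toggle could take — purely hypothetical, not a shipped feature — is rendering a deny-by-default egress NetworkPolicy for the agent pod, leaving only DNS open, so the agent can neither reach the internet nor discover and call other in-cluster services:

```typescript
// Hypothetical: build a Kubernetes NetworkPolicy manifest that blocks all
// egress from the agent pod except DNS. Field structure follows the
// networking.k8s.io/v1 NetworkPolicy schema.
const denyEgressPolicy = (namespace: string, podLabel: string) => ({
  apiVersion: "networking.k8s.io/v1",
  kind: "NetworkPolicy",
  metadata: { name: "agent-deny-egress", namespace },
  spec: {
    podSelector: { matchLabels: { app: podLabel } },
    policyTypes: ["Egress"],
    egress: [
      // Only DNS is allowed; every other destination (internet,
      // in-cluster services found via discovery) is dropped.
      { ports: [{ protocol: "UDP", port: 53 }] },
    ],
  },
});
```

A Helm value like `agent.isolateNetwork: true` could then decide whether this manifest gets applied alongside the pod (again, an assumption about how it might be wired, not the current behavior).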
I want to run my agents fully isolated with headless mode. To achieve that safely you have to run a proxy.
Like, can an AI agent use a browser, attempt to use the software, find bugs and create a ticket? Can an AI agent use a browser, try to use the software and suggest new features?
As far as humans in the loop, the only human we ultimately cannot get rid of is the user. But I think with a combo of user feedback forms and automated metrics we can give AI a lot of feedback about how good software is just from users using the software.
I started tasking subagents for each remaining chunk of work, and then found I was really just repeating the need for a normal sprint tasking cycle but where subagents completed the tasks with the unit tests as exit criteria. So optio came to my mind, where I asked an agent to run the test suite, see what was failing, and make tickets for each group of remaining failures. Then I use optio to manage instances of agents working on and closing out each ticket.
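That "group the failures into tickets" step could look something like this — an illustrative sketch, not Optio's code: parse the failing tests, bucket them by suite, and emit one ticket per bucket:

```typescript
// Hypothetical shapes for a parsed test failure and a ticket to enqueue.
interface Failure { suite: string; test: string; message: string; }
interface Ticket { title: string; body: string; }

// One ticket per failing suite, with each failure listed in the body so
// the agent picking up the ticket has concrete exit criteria.
function ticketsFromFailures(failures: Failure[]): Ticket[] {
  const bySuite = new Map<string, Failure[]>();
  for (const f of failures) {
    const group = bySuite.get(f.suite) ?? [];
    group.push(f);
    bySuite.set(f.suite, group);
  }
  return [...bySuite.entries()].map(([suite, group]) => ({
    title: `Fix ${group.length} failing test(s) in ${suite}`,
    body: group.map((f) => `- ${f.test}: ${f.message}`).join("\n"),
  }));
}
```

The nice property is that "the suite passes" becomes each ticket's exit criterion, which is exactly the unit-test gate from the sprint-tasking analogy.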
I am not sure what the AI agent variation of that joke would look like. Every now and then some blog post lands on HN asking "Where are all the new apps created thanks to the LLM productivity boost?". I am more surprised there is no news about serious fuck-ups that can be traced back to LLM usage in code.
a) you can create CI/build checks that run in GitHub, and the agents will make sure they pass before anything is merged
b) you can configure a review agent with any prompt you'd like to make sure any specific rules you have are followed
c) you can disable all the auto-merge settings and review all the agent code yourself if you'd like.
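Taken together, those three knobs suggest a per-repo config shape like the following (hypothetical names; the project's real settings may differ):

```typescript
// Illustrative guardrail config mirroring points (a)-(c) above.
interface RepoGuardrails {
  requiredChecks: string[];    // (a) GitHub checks that must pass before merge
  reviewAgentPrompt?: string;  // (b) custom rules the review agent enforces
  autoMerge: boolean;          // (c) false = a human reviews and merges every PR
}

// A conservative setup: strict checks, a review agent targeting a known
// failure mode, and no auto-merge.
const conservative: RepoGuardrails = {
  requiredChecks: ["build", "test", "lint"],
  reviewAgentPrompt:
    "Reject any PR that mocks an interface instead of exercising real behavior.",
  autoMerge: false,
};
```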
you've really got to be careful with absolute language like this in reference to LLMs. A review agent provides no guarantees whatsoever, just shifts the distribution of acceptable responses, hopefully in a direction the user prefers.
A software engineer takes a spec which "shifts the distribution of acceptable responses" for their output. If they're 100% accurate (snort), how good does an LLM have to be for you to accept its review as reasonable?
It's not like a human being always pushes correct code, my risk assessment for an LLM reading a small bug and just making a PR is that thinking too hard is a waste of time. My risk assessment for a human is very similar, because actually catching issues during code review is best done by tests anyways. If the tests can't tell you if your code is good or not then it really doesn't matter if it's a human or an LLM, you're mostly just guessing if things are going to work and you WILL push bad code that gets caught in prod.
How is this part tackled when all you have is GH issues? Doesn't this work only for the most trivial issues?