I agree, it isn't for everyone.
“It’s not working well enough!” we tell them. They respond with “Have you tried using it more?”
Soooo... It's not just greed. There is something there.
The point was: when they do agree, it is a very strong signal.
Sometimes they'll announce the changes, and they'll even try to spin it as improving services or increasing value.
Local AI capabilities are improving at a rapid pace. At some point soon we'll have an RWKV or a 4B LLM that performs at GPT-5 level, with reasoning and all the bells and whistles, and hopefully that will shake out most of the deceptive and shady tactics the big platforms are using.
I do find Codex very good at reviewing work marked as completed by Claude, especially when I get Claude to write up its work in a why, where & how doc.
It’s very rare that Claude has fully completed the task and Codex doesn’t find any issues.
What if we had an agent-to-agent network that contacted the human as a source of truth whenever it needed one? Keep a list of employees who are experts in a given skill, then let each of them answer 1-2 questions.
Or are we speeding up our replacement like this?
I like this idea, I’ll experiment with it as part of a brainstorming skill to make the agents ask clarifying questions (to each other and to the human in the loop).
What are you suggesting instead? To share the prompt in order to capture the intent? Usually I expect the plan to reflect the prompt.
I find it interesting when I create a PR after a quick session: the description really captures the intent instead of focusing on the actual implementation. I think it’s because the context is still intact, and that’s very useful.
The good thing is that it establishes a direct connection so it's already much better than having one agent spawn the other and wait for its output, or read/write to a shared .md file -- but it would be cool to make it work for all agent harnesses.
Open to ideas! The repo is open-source.
If "more changes than expected" means "out of scope", then I disagree. Those types of changes are exactly one of the things that's best to avoid whether code is being written by a person or an LLM.
That’s why I’m wondering if we should instruct the agents to act more like humans would: if the change can be done in a follow-up PR, this is probably what an experienced engineer would do.
What makes loop different is that it lets Claude and Codex talk to each other directly (receiving messages from Claude via the Codex App Server and from Codex via Claude Code Channels). I believe this approach works even better than having one agent spawn the other and wait for its output, or read/write to a shared file.
The interesting thing here is agents working together to get better at a single task, not agents integrated into a workflow. There's a lot of opportunity in "if this then that" scenarios that have nothing to do with two agents communicating on a single element of a problem; it's just agent detect -> agent solve (-> agent review? agent deploy? etc.)
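The detect -> solve -> review chaining above can be sketched as a plain sequential pipeline. The stage functions here are stubs standing in for LLM calls; `run_pipeline` and all the names are hypothetical, not any agent harness's real API:

```python
# Sketch of "if this then that" agent chaining: each stage consumes
# the previous stage's output, with no back-and-forth between agents.
from typing import Callable

def run_pipeline(event: str, stages: list[Callable[[str], str]]) -> str:
    """Pass one problem element through the stages in order."""
    result = event
    for stage in stages:
        result = stage(result)
    return result

# Stub agents; in practice each would be a separate LLM/agent call.
detect = lambda e: f"issue({e})"
solve = lambda i: f"patch({i})"
review = lambda p: f"approved({p})"

out = run_pipeline("sentry-alert-123", [detect, solve, review])
# -> "approved(patch(issue(sentry-alert-123)))"
```

The point of the shape: each arrow is a hand-off, not a conversation, so stages can be added (deploy, notify) without the agents ever talking to each other.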
Also implemented this as a GitHub Action; it works well for going from Sentry to GitHub to auto-triage to a fix PR.
Currently I’m authoring with codex and reviewing with opus.
Even with the same model (--self-review), that makes a huge difference, and immediately highlights how bad the first iterations of an LLM output can be.
Personally, I have tried pair programming, and it hasn't really felt like something that works, for various reasons - the main one being that my partner and I each have complex thought processes in our heads that are difficult and cumbersome to articulate, so to an onlooker it looks like I'm randomly changing code.