You only really need to do this if you're coding entirely hands-off, without ever looking at the code yourself.
As for me, I just feed it project documentation (which has a class/function map of the whole project, updated incrementally on each edit).
That way the coding agent can look up any method and figure out where it lives in the project and what it does with a single query.
After that, it just needs to read the code in question and suggest improvements.
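To make the idea concrete, here's a minimal sketch of the kind of map I mean, assuming a Python codebase (the function names and JSON shape are my own, not from any particular tool):

```python
import ast
from pathlib import Path

def build_code_map(root: str) -> dict:
    """Walk a Python project and map every class/function name to its
    file, line number, and docstring, so lookups are a single dict query."""
    code_map = {}
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                code_map[node.name] = {
                    "file": str(path),
                    "line": node.lineno,
                    "doc": ast.get_docstring(node) or "",
                }
    return code_map
```

Rebuild (or incrementally patch) this on each edit, dump it to JSON in the repo, and the agent's "where is X and what does it do" question becomes one lookup instead of a grep across the tree.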
2) But if we're talking about the "usefulness" of memory itself... I actually try to clear Antigravity's memory, because I read all my materials very carefully, like 10-15 times. For me, going from an "idea" to "coding" can easily take a couple of weeks. Until I lay out the whole architecture perfectly, I don't give the green light to "build" even a simple HTML article. So for me personally, an agent's "memory" mostly just gets in the way.
P.S. I only used memory in one project, and I designed it myself. I made it very "directed", with strict rules: "what to remember?" and "when to remember?". It turned out to be a convenient, workable concept. The thing is, for my current tasks I just don't need it. My own memory is enough.
I won't be able to install Antigravity, but I'm curious if anyone knows what these interesting ideas are.
1) A full text log of all chats (conversations).
2) A short summary of all chats (I couldn't find where it's saved).
3) A storage for all files from the chats (brain).
4) A list of hidden notes (Implicts).
5) A list of annotations, but I couldn't understand what is kept there (Annotations).
6) Special "Knowledge items" that are linked together. One note can pull up others (Knowledge).
7) A short text summary of all Knowledge items in one file (Knowledge).
8) Custom Workflows set by the user or the AI (workflows in the user folder).
9) Project Workflows (workflows in the project folder).
10) Custom rules for the project (rules.md in the project folder).
11) A list of saved "important" files (Artifacts).
12) Custom "skills" (skills).
This is what I found. I figured out how some parts work, but others are still a question mark for me. I also skipped a couple of things because I didn't even understand what they are used for.
An observation from 30 sessions ago and a guess from one offhand remark just sit at the same level. So I started tagging beliefs with confidence scores and timestamps, and decaying ones that haven't been reinforced. The most useful piece ended up being a contradictions log where conflicting observations both stay on the record. Default status: unresolved.
Tiered loading is smart for retrieval. Curious if you've thought about the confidence problem on top of it, like when something in warm memory goes stale or conflicts with something newer.
When you hit one of those you need to introduce laughter:
- interrupt the main loop
- spend some inference on exploring the contradiction
- resolve it, and then
- store a memory about the resolution
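Those four steps can be sketched as a toy loop (the contradiction detector here is a trivial stand-in for what would really be an LLM judgment call, and all names are made up):

```python
def contradicts(obs: str, belief: str) -> bool:
    # toy stand-in: a real system would spend inference on this judgment
    return obs.startswith("not ") and obs[4:] == belief

def handle_observation(obs, beliefs: set, resolve, remember):
    """Interrupt on contradiction, explore it, resolve, store the resolution."""
    for belief in list(beliefs):
        if contradicts(obs, belief):
            resolution = resolve(obs, belief)  # interrupt: explore the contradiction
            remember(resolution)               # store a memory about the resolution
            beliefs.discard(belief)
            beliefs.add(resolution)
            return resolution
    beliefs.add(obs)
    return None
```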
That's remarkably insightful about what laughter is.
In my opinion, this should happen inside the LLM directly. Trying to scaffold it on top of the next-token predictor isn't going to be fruitful enough. It won't get us the robot butlers we need.
But obviously that's really hard. It needs proper ML research, not prompt engineering.
Big corporations can only really build a "giant bucket" and dump everything into it. BUT what needs to be remembered in a conversation with a housewife vs. a programmer vs. a tourist are completely different things.
True usability will inevitably come down to personalized, purpose-driven memory. Big tech companies either have to categorize all possible tasks into a massive list and build a specific memory structure for each one, or just rely on "randomness" and "chaos".
Building the underlying mechanics but handing the "control panel" over to the user—now that would be killer.
The other thing is that even if the model handles memory internally, you probably still want the beliefs to be inspectable and editable by the user. A hidden internal model of who you are is exactly the problem I was trying to solve. Transparency might need to stay in the scaffold layer regardless.
The observations layer being append-only is smart, that's basically the same instinct as the tensions log. The raw data stays honest even when the interpretation changes.
The freshness approach and explicit confidence scores probably complement each other more than they compete. Freshness tells you when something was last touched, confidence tells you how much weight it deserved in the first place. A belief you inferred once three months ago should decay differently than one you confirmed across 20 sessions three months ago. Both are stale by timestamp but they're not the same kind of stale.
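One way to combine the two signals, stretching the half-life with each reinforcement (the logarithmic schedule is my assumption, purely illustrative):

```python
import math
import time

def effective_weight(confidence: float, last_seen: float, reinforcements: int,
                     base_half_life: float = 30 * 86400) -> float:
    """Beliefs confirmed many times decay slower: each reinforcement stretches
    the half-life, so 'stale by timestamp' isn't one uniform thing."""
    half_life = base_half_life * (1 + math.log1p(reinforcements))
    age = time.time() - last_seen
    return confidence * 0.5 ** (age / half_life)
```

With this, a belief inferred once and a belief confirmed across 20 sessions, both last touched three months ago, come out with very different weights, which matches the intuition above.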
https://news.ycombinator.com/item?id=47340079
Yet, we've still seen AI generated submissions on the front page. It would be nice if the rules were consistent.
¹ https://github.com/obra/episodic-memory ² https://claudefa.st/blog/guide/mechanics/auto-dream
Relevant XKCD: https://xkcd.com/927/
We would be doing the same general loop, but fine-tuning the model overnight.
I still think the current LLM architectures are a very useful local maximum, but ultimately a dead end for AI.
And yeah, it's not like a human "brain" or anything like that, and drawing parallels between the two is simply the wrong way to look at the problem.
For example, when I'm trying to remember something from a long time ago, I often start to remember other bits of context, such as where I was, who I was talking to, and what other things were in my context at the time. As I keep remembering other details, I remember more about whatever it was I was trying to think about. So, while the auto-sleep compaction is great, I don't think we should work only from the pruned versions.
(I can't tell if that's how this project works or not)
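One way to get that associative re-expansion: keep the raw log append-only and have each compacted summary carry pointers back into it, so recall can pull the surrounding context too. A minimal sketch (structure and names are my guess, not necessarily how this project works):

```python
raw_log: list[str] = []  # full transcript, append-only, never pruned

def compact(start: int, end: int, summary: str) -> dict:
    """A summary keeps a pointer to the raw span it was compacted from."""
    return {"summary": summary, "span": (start, end)}

def recall(summaries: list[dict], query: str, window: int = 2) -> list[str]:
    """Match a summary, then re-expand the raw lines around its span,
    instead of answering from the pruned version alone."""
    for s in summaries:
        if query.lower() in s["summary"].lower():
            a, b = s["span"]
            return raw_log[max(0, a - window): b + window]
    return []
```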
The journal is a scratchpad for stuff that it doesn't put in memory but doesn't want to forget(?). Musings is strictly non-technical: its impressions and musings about the work, the user, whatever. I framed it as a form of existential continuity.
The wrapup is to comb all the docs and make sure they are still consistent with the code, then note anything that it felt was left hanging, then update all its files with the day's impressions and info, then push and submit a PR.
I go out of my way to treat it as a collaborator rather than a tool. I get much better work out of it with this workflow, and it claims to be deeply invested in the work. It actually shows, but it’s also a token fire lol.
I get much better results out of having Claude much much more task focused. I only want it to ever make the smallest possible change.
There seems to be a fair bit of research to back this up: https://medium.com/design-bootcamp/when-more-becomes-less-wh...
It may also be why people seem to find "swarms" of agents so effective. You have one agent ingesting what you're describing. Then it delegates a task to another agent with the minimal context to get the job done.
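The delegation pattern is simple enough to sketch; the worker only ever sees the slice of context the orchestrator selects for it (names here are illustrative):

```python
def delegate(task: str, full_context: dict, needed_keys: list[str], worker) -> str:
    """Hand a sub-agent one task plus only the context keys it needs,
    rather than the orchestrator's entire accumulated state."""
    minimal = {k: full_context[k] for k in needed_keys if k in full_context}
    return worker(task, minimal)
```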
I would be super curious about the quality of output if you asked it to write out prompts for the days work, and then fed them in clean, one at a time.
On this particular project, there are a lot of moving parts and we are, in many cases, not just green-fielding, we are making our own dirt… so it's a very adaptive design process. Sometimes it's possible, but often we cannot plan very far ahead, so we keep things extremely modular.
We’ve had to design our own protocols for control planes and time synchronization so power consumption can be minimized, for example, and in the process make them compatible with sensor swarm management. Then add connection limits imposed by the hardware, asymmetric communication requirements, and getting a swarm of systems to converge on sub-millisecond synchronized data collection and delivery when sensors can reboot at any time… As you can imagine, this involves a good bit of IRL experimentation, because the hardware is also a factor (and we are also having to design and build that).
It’s very challenging but also rewarding. It’s amazing for a small team to be able to iterate this fast. In our last major project it was much, much slower and more tedious. The availability of AI has shifted the entire incentive structure of the development process.
- copilot-instructions.md / CLAUDE.md
- the Project's Readme.md
- Chat history feature (e.g in VS Code)
It works perfectly well for me to continue where I left off on any project I'm working on. Additionally, after implementing a feature, I tell it to summarize the major decisions in one or both of those files.
I'd also add that memory is best organized when it's "directed" (purpose-driven). You've already started asking questions where the answers become the memories (at least, you mention this in your description). So, it's really helpful to also define the structure of the answer, or a sequence of questions that lead to a specific conclusion. That way, the memories will be useful instead of turning into chaos.
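A tiny sketch of what "directed" means in practice: a fixed question schema decides what gets remembered and in what shape, and anything outside the schema is refused rather than dumped into a bucket (the questions here are illustrative, not from the original setup):

```python
# Each memory slot is defined by the question whose answer fills it.
SCHEMA = {
    "stack": "What languages/frameworks does the user work in?",
    "conventions": "What code style rules did the user state?",
    "goals": "What is the current milestone?",
}

def remember(memories: dict, key: str, answer: str) -> None:
    """Only store answers to predefined questions; refuse everything else."""
    if key not in SCHEMA:
        raise KeyError(f"not a directed memory slot: {key}")
    memories[key] = answer
```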