The idea started as a pretty simple question: text chatbots are everywhere, but they rarely feel present. I wanted something closer to a call, where the character actually reacts in real time (voice, timing, expressions), not just “type, wait, reply”.
Beni is basically:
A Live2D avatar that animates during the call (expressions + motion driven by the conversation)
Real-time voice conversation (streaming response, not “wait 10 seconds then speak”)
Long-term memory so the character can keep context across sessions
The hardest part wasn’t generating text, it was making the whole loop feel synchronized: mic input, model response, TTS audio, and Live2D animation all need to line up or it feels broken immediately. I ended up spending more time on state management, latency and buffering than on prompts.
Some implementation details (happy to share more if anyone’s curious):
Browser-based real-time calling, with audio streaming and client-side playback control
Live2D rendering on the front end, with animation hooks tied to speech / state
A memory layer that stores lightweight user facts/preferences and conversation summaries to keep continuity
Current limitation: sign-in is required today (to persist memory and prevent abuse). I’m adding a guest mode soon for faster try-out and working on mobile view now.
What I’d love feedback on:
Does the “real-time call” loop feel responsive enough, or still too laggy?
Any ideas for better lip sync / expression timing on 2D/3D avatars in the browser?
Thanks, and I’ll be around in the comments.
In the near term, this creates a noisy and uncomfortable phase. AI not only launders weak or outcome-engineered studies by polishing them, it also exposes how much questionable work already existed. Retractions, skepticism, and institutional anxiety increase. This phase delays reform because institutions instinctively defend existing processes—but it also makes their failure undeniable at scale.
The likely future is a shift from narrative-centric publishing to verification-centric science. As AI makes large-scale checking, replication, statistical forensics, and cross-paper analysis cheaper than traditional review, authority migrates from prose to data, code, and reproducible methods. Single papers matter less; replicated clusters matter more. What looks like breakdown today is better seen as a phase transition: the slow collapse of narrative authority and the emergence of evidence-first norms shaped as much for machine readers as for human ones.
I started out following Crafting Interpreters, but gradually branched off that until I had almost nothing left in common.
Tech stack: Rust, Cranelift (JIT compilation), LALRPOP (parser).
Original title: "A small programming language where everything is a value" (edited based on comments)