> The world is unpredictable. If you try to build a generative model that predicts every detail of the future, it will fail. JEPA is not generative AI. It is a system that learns to represent videos really well. The key is to learn an abstract representation of the world and make predictions in that abstract space, ignoring the details you can’t predict. That’s what JEPA does. It learns the underlying rules of the world from observation, like a baby learning about gravity. This is the foundation for common sense, and it’s the key to building truly intelligent systems that can reason and plan in the real world. The most exciting work so far on this is coming from academia, not the big industrial labs stuck in the LLM world.
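The core idea in that quote, predicting in an abstract representation space instead of pixel space, can be sketched in a few lines. This is a minimal toy illustration, not JEPA itself: the `encode`/`predict` functions and all dimensions here are made-up stand-ins for what would be learned networks in a real system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: in a real JEPA these would be trained
# neural networks, not fixed random projections.
def encode(frame, W):
    """Project a high-dim frame into a low-dim abstract representation."""
    return np.tanh(frame @ W)

def predict(z, P):
    """Predict the next representation from the current one."""
    return z @ P

d_pixels, d_latent = 64, 8
W = rng.normal(scale=0.1, size=(d_pixels, d_latent))
P = rng.normal(scale=0.1, size=(d_latent, d_latent))

frame_t = rng.normal(size=d_pixels)                         # observed frame
frame_t1 = frame_t + rng.normal(scale=0.5, size=d_pixels)   # noisy future frame

# Generative-style loss: try to predict every pixel of the future,
# so irreducible noise is penalized along with real structure.
pixel_loss = np.mean((frame_t - frame_t1) ** 2)

# JEPA-style loss: compare predictions only in the abstract space,
# where the encoder can discard unpredictable detail.
z_t, z_t1 = encode(frame_t, W), encode(frame_t1, W)
latent_loss = np.mean((predict(z_t, P) - z_t1) ** 2)
```

The point of the contrast is where the error is measured: the first loss is charged for every unpredictable pixel, while the second is only charged in a representation space that can learn to ignore that noise.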
I think anything less than that is just a parlor trick.
The counterpoint would be that when they started building LLMs they must have clearly seen the limitations of the approach, proceeded regardless, and still achieved quite a bit. So an approach that introduces continuous (in vivo, if you will) self-guided training AND multiple sensors and actuators would still be limited, but might yield some interesting results nevertheless.
The current approach, guided pre-training followed by inference on what is essentially a "dead brain", clearly imposes limitations.