World models: how close are we to something usable in a real product?
Posted by bruhagan@reddit | LocalLLaMA | 2 comments
I'm a dad of two (8 and 10) building a voice-first learning game for kids aged 6-12. Think Carmen Sandiego, but the kid is inside the adventure, talking to characters and solving the plot as they learn.
Today I'm using 2D Rive animations driven by LLM reactions. Kids engage, but the ceiling is low. What I actually want is a real-time rendered character and world that the agent can direct moment to moment.
So I've been tracking Genie 3, Odyssey, World Labs, and the avatar side (Runway, Anam). My working thesis is that within 18 months, the convergence of interactive real-time world models and real-time avatars hits something usable in production. But today it still feels premature.
Three things I'd love input on:
1. Is anyone here actually shipping or prototyping on a world model today, outside of demos?
2. Does 12-18 months feel reasonable, or am I being optimistic?
3. For a scripted-adventure use case (known characters, a recurring world, narrative beats), is a world model the right primitive, or is it overkill compared with stitched pre-generated assets plus a real-time avatar layer?
ClearApartment2627@reddit
Apparently, it is "coming soon":
https://huggingface.co/tencent/HY-World-2.0
Disclaimer: I never tried any of the bits they have released so far.
-dysangel-@reddit
I'd guess you're being optimistic about Genie running on consumer hardware within 1.5 years, even if it were available to the public (though I could be surprised). I also think it will be a while before any model can fully create and maintain video-game-level complexity purely in its cache. So these world models would probably work best combined with offline data files or databases: the fundamental structure is saved to disk, and the model dreams the graphics and the interaction with that structure into existence.