LeWorldModel
Joint Embedding Predictive Architectures (JEPAs) offer a compelling framework for learning world models in compact latent spaces, yet existing methods remain fragile, relying on complex multi-term losses, exponential moving averages, pretrained encoders, or auxiliary supervision to avoid representation collapse. In this work, we introduce LeWorldModel (LeWM), the first JEPA that trains stably end-to-end from raw pixels using only two loss terms: a next-embedding prediction loss and a regularizer enforcing Gaussian-distributed latent embeddings. This reduces tunable loss hyperparameters from six to one compared to the only existing end-to-end alternative. With 15M parameters trainable on a single GPU in a few hours, LeWM plans up to 48× faster than foundation-model-based world models while remaining competitive across diverse 2D and 3D control tasks. Beyond control, we show that LeWM’s latent space encodes meaningful physical structure through probing of physical quantities. Surprise evaluation confirms that the model reliably detects physically implausible events.
<https://arxiv.org/pdf/2603.19312>
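The two-term objective described in the abstract can be sketched as follows. This is a minimal illustration under assumptions, not the paper's exact formulation: the prediction term is taken to be a mean-squared error between predicted and target next-step embeddings, and the Gaussian regularizer is implemented as one plausible choice, matching the batch's first and second moments to a standard normal. The function name and the single `weight` hyperparameter are illustrative.

```python
import numpy as np

def lewm_loss(pred_next, target_next, weight=1.0):
    """Sketch of a two-term JEPA-style objective (assumed form).

    pred_next, target_next: (batch, dim) latent embeddings.
    Term 1: next-embedding prediction error (MSE).
    Term 2: regularizer pushing the batch of target embeddings toward
            a standard Gaussian by penalizing deviation of the batch
            mean from zero and the batch covariance from identity --
            one way to "enforce Gaussian-distributed latents".
    `weight` stands in for the single tunable loss hyperparameter
    mentioned in the abstract.
    """
    pred_loss = np.mean((pred_next - target_next) ** 2)

    mu = target_next.mean(axis=0)            # per-dimension batch mean
    cov = np.cov(target_next, rowvar=False)  # batch covariance matrix
    eye = np.eye(target_next.shape[1])
    gauss_reg = np.sum(mu ** 2) + np.sum((cov - eye) ** 2)

    return pred_loss + weight * gauss_reg
```

Because both terms are computed directly from the embeddings, there is no stop-gradient, EMA target network, or pretrained encoder in this sketch, which is the property the abstract emphasizes.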
Involution Hell © 2026 by Community under CC BY-NC-SA 4.0
Programmer's Burnout Recovery Guide
A burnout recovery guide for programmers: unpacks the chronic psychological state caused by prolonged stress and emotional exhaustion, covering triggers such as excessive conscientiousness, overwork, and workplace gaslighting (PUA), along with three core principles for recovery. Essential reading for programmers and job seekers experiencing insomnia, brain fog, or self-doubt under sustained high-pressure conditions.
Prompt Repetition Improves Non-Reasoning LLMs
Learn how repeating input prompts boosts accuracy for non-reasoning LLMs like GPT, Gemini, and Claude without extra latency—ideal for AI engineers and researchers.
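The repetition trick above amounts to a one-line prompt transformation. A minimal sketch, assuming a simple duplicate-before-answering scheme (the function name, repetition count, and separator are illustrative, not taken from the article): the repeated copies are consumed during prefill, which is why the article can claim no extra output latency.

```python
def repeat_prompt(prompt: str, times: int = 2) -> str:
    """Repeat the input prompt `times` times, separated by blank lines.

    Illustrative helper for the prompt-repetition technique: duplicating
    the input in the context of a non-reasoning LLM can improve accuracy,
    and since the extra tokens are processed in the (parallel) prefill
    stage, generation latency is unchanged.
    """
    return "\n\n".join([prompt] * times)
```

Usage: send `repeat_prompt(user_question)` to the model instead of `user_question` itself.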