Bridging human motion modeling, visual world models, and embodied AI for socially intelligent perception and action.
Human motion and activity provide a critical signal for understanding, predicting, and interacting with dynamic environments. In recent years, computer vision has made significant progress in human motion perception, activity understanding, and motion generation, laying a strong foundation for modeling human behavior and dynamics. Integrating these advances into visual world models that support prediction, planning, and decision-making is a key next step toward enabling embodied intelligent systems to reason and act effectively in human-populated scenes.
This workshop focuses on the challenge of integrating rich models of human motion and behavior into world models, an increasingly important yet still under-explored direction in visual world modeling. Topics include:
1. Modeling human motion and activity in complex, interactive scenes;
2. Learning dynamic world models that incorporate human behavior to represent scenes, objects, and affordances;
3. Enabling efficient and robust real-world deployment, including integrated perception-planning, safe navigation and autonomous driving, and improved generalization under noise, occlusions, and distribution shifts.
This topic is closely aligned with recent progress in visual world modeling and generative simulation, which aims to capture vision-based representations of scene structure, object relations, and human dynamics. The workshop will bring together research on human motion and activity modeling, dynamic scene understanding, and world models that explicitly account for human behavior as a central component of the environment. By uniting perspectives from computer vision, embodied AI, robotics, and graphics, the workshop provides a forum to explore human-centered world models that enable socially intelligent perception and action in applications such as dynamic scene understanding, predictive navigation, autonomous driving, and human-robot interaction.
Half-day program
Selected paper oral presentations
Open Q&A with speakers
EPFL, Switzerland
3D localization & trajectory forecasting
EPFL, Switzerland
World models for autonomous driving
EPFL, Switzerland
Motion prediction & 3D perception
KTH / Scania, Sweden
Prediction & planning in traffic
Örebro University, Sweden
Probabilistic human motion patterns
TU Munich, Germany
Motion prediction & HRI
The Chinese University of Hong Kong, China
HRI & robotic manipulation
TU Munich, Germany
Multi-modal perception & navigation
Örebro University, Sweden
3D mapping & autonomous systems
Meta Reality Labs, USA
Generative AI for digital humans
Bosch Center for Artificial Intelligence, Germany
Dynamic perception & 3D scene graphs
Bosch Center for Artificial Intelligence, Germany
Predictive navigation & RL