Bridging human motion modeling, visual world models, and embodied AI for socially intelligent perception and action.
Human motion and activity provide a critical signal for understanding, predicting, and interacting with dynamic environments. In recent years, computer vision has made significant progress in human motion perception, activity understanding, and motion generation, providing a strong foundation for modeling human behavior and dynamics. Integrating these advances into visual world models that support prediction, planning, and decision-making is a key next step toward enabling embodied intelligent systems to reason and act effectively in human-populated scenes.
This workshop focuses on the challenge of integrating rich models of human motion and behavior into world models, an increasingly important yet still under-explored direction in visual world modeling. Topics include: 1. modeling human motion and activity in complex, interactive scenes; 2. learning dynamic world models that incorporate human behavior to represent scenes, objects, and affordances; 3. enabling efficient and robust real-world deployment, including integrated perception-planning, safe navigation and autonomous driving, and improved generalization under noise, occlusions, and distribution shifts.
This topic is closely aligned with recent progress in visual world modeling and generative simulation, which aim to capture vision-based representations of scene structure, object relations, and human dynamics. The workshop will bring together research on human motion and activity modeling, dynamic scene understanding, and world models that explicitly account for human behavior as a central component of the environment. By uniting perspectives from computer vision, embodied AI, robotics, and graphics, the workshop provides a forum to explore human-centered world models that enable socially intelligent perception and action in applications such as dynamic scene understanding, predictive navigation, autonomous driving, and human-robot interaction.
We welcome both archival and non-archival submissions. All submissions must use the official ECCV format and be at most 7 pages long, excluding references.
Archival: Original, unpublished work. Accepted archival papers will be included in the official ECCV 2026 Workshop Proceedings.
Non-Archival: Work in progress, preliminary results, or relevant previously published work. These will be presented at the workshop but will not appear in the proceedings.
All submissions must use the official ECCV 2026 LaTeX template and follow the ECCV submission guidelines. The review process is double-blind: author identities will not be visible to reviewers, and reviewer identities will not be visible to authors. Please ensure your manuscript is properly anonymized.
All papers should be submitted through OpenReview. Please select "Archival" or "Non-Archival" as the submission type when submitting your paper.




Half-day program
Selected paper oral presentation
Open Q&A with speakers

EPFL, Switzerland
3D localization & trajectory forecasting

EPFL, Switzerland
World models for autonomous driving

EPFL, Switzerland
Motion prediction & 3D perception

KTH / Scania, Sweden
Prediction & planning in traffic

Örebro University, Sweden
Spatio-temporal data modeling

Postdoc, TU Munich, Germany
Motion prediction & HRI

Postdoc, The Chinese University of Hong Kong, China
Robotic manipulation

Full Professor & MIRMI Deputy Director, TU Munich, Germany
Multi-modal perception & navigation

Full Professor in Computer Science, Örebro University, Sweden
3D mapping & autonomous systems

Senior Research Scientist, Meta Reality Labs, USA
Generative AI for digital humans

Senior Research Scientist, Bosch Center for Artificial Intelligence, Germany
Dynamic perception & 3D scene graphs

Group Leader and Research Scientist, Bosch Center for Artificial Intelligence, Germany
Predictive navigation & RL