

4th Workshop on Open-World 3D Scene Understanding with Foundation Models
13:45 - 14:00 | Welcome & Introduction
14:00 - 14:30 | Keynote 1: Jeannette Bohg (Stanford), "Challenges and Opportunities of Mobile Manipulation"
14:30 - 15:00 | Keynote 2: Laura Leal-Taixé (NVIDIA), "Towards a Foundation Model for 4D Lidar"
15:00 - 15:45 | Coffee Break & Poster Session in Hall D (Level 3, Boards 452-470)
15:45 - 16:15 | Keynote 3: Afshin Dehghan (Apple), "3D Scene Intelligence"
16:15 - 16:45 | Keynote 4: Lukas Schmid (MIT), "Hierarchical Methods for Task-driven and Dynamic Scene Understanding"
16:45 - 17:15 | Keynote 5: Björn Ommer (LMU), "Efficient Repurposing of T2I Representations Across Modalities"
17:15 - 17:45 | Challenge Winner: Jaime Corsetti (FBK), "Functionality Understanding and Segmentation in 3D Scenes"
17:45 - 18:00 | Concluding Remarks
Dr. Laura Leal-Taixé is a Senior Research Manager at NVIDIA and an Adjunct Professor at the Technical University of Munich (TUM), where she leads the Dynamic Vision and Learning group. From 2018 to 2022, she was a tenure-track professor at TUM. Before that, she spent two years as a postdoctoral researcher at ETH Zurich, Switzerland, and a year as a senior postdoctoral researcher in the Computer Vision Group at TUM. She obtained her PhD from Leibniz University Hannover in Germany, spending a year as a visiting scholar at the University of Michigan, Ann Arbor, USA. She received her B.Sc. and M.Sc. in Telecommunications Engineering from the Technical University of Catalonia (UPC) in her native city of Barcelona, and completed her Master's thesis at Northeastern University in Boston, USA, with a fellowship from the Vodafone Foundation. She is a recipient of the Sofja Kovalevskaja Award of 1.65 million euros in 2017, a Google Faculty Award in 2019, and an ERC Starting Grant in 2021.
Lukas Schmid is a Research Scientist in the MIT SPARK Lab, led by Prof. Luca Carlone at the Massachusetts Institute of Technology (MIT). Before that, he was a Postdoctoral Fellow at MIT SPARK and, briefly, a Postdoctoral Researcher at the Autonomous Systems Lab (ASL) led by Prof. Roland Siegwart at ETH Zürich (ETHZ). He earned his PhD in 2022 from ASL at ETHZ, where he also obtained his M.Sc. in Robotics, Systems, and Control (RSC) in 2019, and was a visiting researcher at the Microsoft Spatial AI Lab led by Prof. Marc Pollefeys in 2022. His work has been recognized by several honors, including RSS Pioneers 2025, the RSS 2024 Outstanding Systems Paper Award, two ETH Medals for outstanding PhD and M.Sc. theses, the Willi Studer Prize for the best graduate of the year at ETHZ, first place in the 2024 Hilti SLAM Challenge, and a Swiss National Science Foundation (SNSF) Postdoc Fellowship. His research focuses on active perception and understanding of complex, dynamic, human-centric environments for robot autonomy and augmented reality. This includes dense geometric and semantic scene representations and abstraction; detection, prediction, and understanding of moving and changing entities; and lifelong learning for continuous improvement and adaptation to the robot's environment, embodiment, and human preferences.
Jeannette Bohg is an Assistant Professor of Computer Science at Stanford University. She was a group leader at the Autonomous Motion Department (AMD) of the Max Planck Institute for Intelligent Systems until September 2017. Before joining AMD in January 2012, she was a PhD student in the Division of Robotics, Perception and Learning (RPL) at KTH in Stockholm. In her thesis, she proposed novel methods for multi-modal scene understanding for robotic grasping. She also studied at Chalmers in Gothenburg and at the Technical University of Dresden, where she received her Master's in Art and Technology and her Diploma in Computer Science, respectively. Her research focuses on perception and learning for autonomous robotic manipulation and grasping. She is specifically interested in developing methods that are goal-directed, real-time, and multi-modal, so that they can provide meaningful feedback for execution and learning. She has received several Early Career and Best Paper awards, most notably the 2019 IEEE Robotics and Automation Society Early Career Award and the 2020 Robotics: Science and Systems Early Career Award.
Björn Ommer is a full professor of computer science at LMU Munich, where he leads the Computer Vision & Learning Group. Before joining LMU, he was a full professor at Heidelberg University and a director at both the Interdisciplinary Center for Scientific Computing (IWR) and the Heidelberg Collaboratory for Image Processing (HCI). He holds a Ph.D. from ETH Zurich and a diploma from the University of Bonn, and he was a postdoctoral researcher at UC Berkeley. His research focuses on generative AI, visual understanding, and explainable neural networks. His group developed several influential approaches in generative modeling, such as Stable Diffusion, which has seen broad adoption across academia, industry, and beyond. Björn is a director of the Bavarian AI Council, an ELLIS Fellow, and has served in senior roles at major conferences such as CVPR, ICCV, ECCV, and NeurIPS. His most recent recognitions include the German AI Prize 2024, the Eduard Rhein Technology Award, and a nomination for the German Future Prize by the President of Germany.
Afshin Dehghan leads the Multimodal Intelligence group in Hardware Technology at Apple, where he drives research and development in multimodal technologies that bridge vision, language, and spatial understanding. His team developed RoomPlan, Apple's 3D parametric room mapping solution, which set a new benchmark in spatial computing by leveraging LiDAR for high-fidelity scene understanding. His group has shipped core 2D and 3D perception technologies across iOS and Apple Vision Pro, and is now advancing Apple Intelligence through work on visual foundation models.
Challenge Winners
Our workshop challenge is proudly supported by: