

4th Workshop on Open-World 3D Scene Understanding with Foundation Models
13:45 - 14:00 | Welcome & Introduction
14:00 - 14:30 | Keynote 1: Jeannette Bohg (Stanford)
14:30 - 15:00 | Keynote 2: Laura Leal-Taixé (NVIDIA)
15:00 - 15:45 | Coffee Break & Poster Session in Hall D (Level 3, Boards 452-470)
15:45 - 16:15 | Keynote 3: Afshin Dehghan (Apple)
16:15 - 16:45 | Keynote 4: Lukas Schmid (MIT)
16:45 - 17:15 | Keynote 5: Björn Ommer (LMU)
17:15 - 17:45 | Challenge Winners
17:45 - 18:00 | Concluding Remarks
Dr. Laura Leal-Taixé is a Senior Research Manager at NVIDIA and an Adjunct Professor at the Technical University of Munich (TUM), where she leads the Dynamic Vision and Learning group. From 2018 until 2022, she was a tenure-track professor at TUM. Before that, she spent two years as a postdoctoral researcher at ETH Zurich, Switzerland, and a year as a senior postdoctoral researcher in the Computer Vision Group at TUM. She obtained her PhD from the Leibniz University of Hannover in Germany, spending a year as a visiting scholar at the University of Michigan, Ann Arbor, USA. She completed her B.Sc. and M.Sc. in Telecommunications Engineering at the Technical University of Catalonia (UPC) in her native city of Barcelona, and wrote her Master's thesis at Northeastern University in Boston, USA, supported by a fellowship from the Vodafone Foundation. She is a recipient of the Sofja Kovalevskaja Award of 1.65 million euros in 2017, a Google Faculty Award in 2019, and an ERC Starting Grant in 2021.
Lukas Schmid is a Research Scientist at the MIT SPARK Lab, led by Prof. Luca Carlone at the Massachusetts Institute of Technology (MIT). Before that, he was a Postdoctoral Fellow at MIT SPARK and, briefly, a Postdoctoral Researcher at the Autonomous Systems Lab (ASL) led by Prof. Roland Siegwart at ETH Zürich (ETHZ). He earned his PhD in 2022 from ASL at ETHZ, where he also obtained his M.Sc. in Robotics, Systems, and Control (RSC) in 2019; during his PhD, he was a visiting researcher at the Microsoft Spatial AI Lab led by Prof. Marc Pollefeys in 2022. His work has been recognized by several honors, including RSS Pioneers 2025, the RSS 2024 Outstanding Systems Paper Award, two ETH Medals for outstanding PhD and M.Sc. theses, the Willi Studer Prize for the best graduate of the year at ETHZ, first place in the 2024 Hilti SLAM Challenge, and a Swiss National Science Foundation (SNSF) Postdoc Fellowship. His research focuses on active perception and understanding of complex, dynamic, human-centric environments for robot autonomy and augmented reality. This includes research on dense geometric and semantic scene representations and abstraction; on detection, prediction, and understanding of moving and changing entities; and on lifelong learning for continuous improvement and adaptation to the robot's environment, embodiment, and human preferences.
Jeannette Bohg is an Assistant Professor of Computer Science at Stanford University. She was a group leader at the Autonomous Motion Department (AMD) of the MPI for Intelligent Systems until September 2017. Before joining AMD in January 2012, she was a PhD student in the Division of Robotics, Perception and Learning (RPL) at KTH in Stockholm. In her thesis, she proposed novel methods for multi-modal scene understanding for robotic grasping. She also studied at Chalmers in Gothenburg and at the Technical University of Dresden, where she received her Master's in Art and Technology and her Diploma in Computer Science, respectively. Her research focuses on perception and learning for autonomous robotic manipulation and grasping. She is specifically interested in developing methods that are goal-directed, real-time, and multi-modal, so that they can provide meaningful feedback for execution and learning. She has received several Early Career and Best Paper awards, most notably the 2019 IEEE Robotics and Automation Society Early Career Award and the 2020 Robotics: Science and Systems Early Career Award.
Björn Ommer is a full professor at LMU, where he heads the Computer Vision & Learning Group. He studied computer science with physics as a minor at the University of Bonn, Germany, and then pursued his doctoral studies in computer science at ETH Zurich. He received his Ph.D. from ETH Zurich for his dissertation “Learning the Compositional Nature of Objects for Visual Recognition”, which was awarded the ETH Medal. Thereafter, Björn held a postdoctoral position in the Computer Vision Group of Jitendra Malik at UC Berkeley. He serves on the Bavarian AI Council and as an associate editor for the journal IEEE T-PAMI, having previously served in that role for Pattern Recognition Letters. Björn is an ELLIS Fellow, a faculty member of the ELLIS unit Munich, affiliated with the Helmholtz foundation, and a PI of the Munich Center for Machine Learning (MCML). He delivered the opening keynote at NeurIPS’23 and was awarded the German AI Prize 2024 and the Technology Prize of the Eduard Rhein Foundation 2024; the work leading to Stable Diffusion has been nominated for the German Future Prize of the President of Germany (“Deutscher Zukunftspreis des Bundespräsidenten für Technik und Innovation”).
Afshin Dehghan leads the Multimodal Intelligence group in Hardware Technology at Apple, where he drives research and development in multimodal technologies that bridge vision, language, and spatial understanding. His team developed RoomPlan, Apple’s 3D parametric room mapping solution, which set a new benchmark in spatial computing by leveraging LiDAR for high-fidelity scene understanding. His group has shipped core 2D and 3D perception technologies across iOS and Vision Pro, and is now advancing Apple Intelligence through cutting-edge work in visual foundation models.
Challenge Winners
Our workshop challenge is proudly supported by: