OpenSUN3D

4th Workshop on Open-World 3D Scene Understanding with Foundation Models

in conjunction with CVPR 2025, June 12 (afternoon), in Nashville, USA.

Introduction

The ability to perceive, understand, and interact with arbitrary 3D environments is a long-standing research goal with applications in AR/VR, healthcare, robotics, and beyond. Current 3D scene understanding models are largely limited to low-level recognition tasks such as object detection or semantic segmentation, and do not generalize well beyond a pre-defined set of training labels. More recently, large vision-language models (VLMs), such as CLIP, have demonstrated impressive capabilities when trained solely on internet-scale image-language pairs. Initial works have shown that these models have the potential to extend 3D scene understanding not only to open-set recognition, but also to additional applications such as affordances, materials, activities, and properties of unseen environments. The goal of this workshop is to bundle these efforts, and to discuss and establish clear task definitions, evaluation metrics, and benchmark datasets.

Schedule

13:45 - 14:00 Welcome & Introduction
14:00 - 14:30 Keynote 1      Jeannette Bohg (Stanford)
14:30 - 15:00 Keynote 2      Laura Leal-Taixé (NVIDIA)
15:00 - 15:45 Poster Session & Coffee Break
15:45 - 16:15 Keynote 3      Afshin Dehghan (Apple)
16:15 - 16:45 Keynote 4      Lukas Schmid (MIT)
16:45 - 17:15 Keynote 5      Vasileios Balntas (NTUA)
17:15 - 17:45 Challenge Winners
17:45 - 18:00 Concluding Remarks

Keynote Speakers


Dr. Laura Leal-Taixé is a Senior Research Manager at NVIDIA and also an Adjunct Professor at the Technical University of Munich (TUM), leading the Dynamic Vision and Learning group. From 2018 until 2022, she was a tenure-track professor at TUM. Before that, she spent two years as a postdoctoral researcher at ETH Zurich, Switzerland, and a year as a senior postdoctoral researcher in the Computer Vision Group at the Technical University of Munich. She obtained her PhD from the Leibniz University of Hannover in Germany, spending a year as a visiting scholar at the University of Michigan, Ann Arbor, USA. She pursued her B.Sc. and M.Sc. in Telecommunications Engineering at the Technical University of Catalonia (UPC) in her native city of Barcelona, and completed her Master's thesis at Northeastern University in Boston, USA, with a fellowship from the Vodafone Foundation. She is a recipient of the Sofja Kovalevskaja Award of 1.65 million euros in 2017, the Google Faculty Award in 2019, and the ERC Starting Grant in 2021.


Lukas Schmid is a Research Scientist at the MIT SPARK Lab, led by Prof. Luca Carlone at the Massachusetts Institute of Technology (MIT). Before that, he was a Postdoctoral Fellow at MIT SPARK, and briefly a Postdoctoral Researcher at the Autonomous Systems Lab (ASL) led by Prof. Roland Siegwart at ETH Zürich (ETHZ). He earned his PhD in 2022 from ASL at ETHZ, where he was a visiting researcher at the Microsoft Spatial AI Lab led by Prof. Marc Pollefeys in 2022, and also obtained his M.Sc. in Robotics, Systems, and Control (RSC) in 2019. His work has been recognized by several honors, including RSS Pioneers 2025, the RSS 2024 Outstanding Systems Paper Award, two ETH Medals for outstanding PhD and M.Sc. theses, the Willi Studer Prize for the best graduate of the year at ETHZ, first place in the 2024 Hilti SLAM Challenge, and a Swiss National Science Foundation (SNSF) Postdoc Fellowship. His research focuses on active perception and understanding of complex, dynamic, human-centric environments for robot autonomy and augmented reality. This includes research on dense geometric and semantic scene representations and abstraction; on detection, prediction, and understanding of moving and changing entities; and on lifelong learning for continuous improvement and adaptation to the robot's environment, embodiment, and human preferences.


Jeannette Bohg is an Assistant Professor of Computer Science at Stanford University. She was a group leader at the Autonomous Motion Department (AMD) of the MPI for Intelligent Systems until September 2017. Before joining AMD in January 2012, Jeannette Bohg was a PhD student at the Division of Robotics, Perception and Learning (RPL) at KTH in Stockholm. In her thesis, she proposed novel methods for multi-modal scene understanding for robotic grasping. She also studied at Chalmers in Gothenburg and at the Technical University of Dresden, where she received her Master's in Art and Technology and her Diploma in Computer Science, respectively. Her research focuses on perception and learning for autonomous robotic manipulation and grasping. She is specifically interested in developing methods that are goal-directed, real-time, and multi-modal, such that they can provide meaningful feedback for execution and learning. Jeannette Bohg has received several Early Career and Best Paper awards, most notably the 2019 IEEE Robotics and Automation Society Early Career Award and the 2020 Robotics: Science and Systems Early Career Award.



Challenge

This year, we host a challenge on the SceneFun3D benchmark, which focuses on fine-grained functionality and affordance understanding in 3D indoor environments. The challenge consists of two tracks: functionality segmentation and open-vocabulary 3D affordance grounding. Below, you will find key resources and important dates.

Our workshop challenge is proudly supported by:

Poster Presentations

All accepted paper submissions will be presented as posters during the workshop. Please follow the official CVPR 2025 guidelines for poster preparation. Note the early-bird poster printing deadline of May 25, 2025.

Paper Track

We invite 8-page full papers for inclusion in the proceedings, as well as 4-page extended abstracts. Extended abstracts may present either new or previously published work, but will not be included in the proceedings. 4-page extended abstracts generally do not conflict with the dual submission policies of other conferences, whereas 8-page full papers, if accepted, will be part of the proceedings and are therefore subject to the dual submission policy (i.e., they cannot be under review at, or already accepted by, another conference at the same time). All submissions should be anonymous and follow the official CVPR 2025 guidelines.

Organizers