OpenSUN3D

5th Workshop on Open-World 3D Scene Understanding with Foundation Models

in conjunction with ICCV 2025, on October 19 (afternoon), in Honolulu, USA.

Introduction

The ability to perceive, understand, and interact with 3D scenes is a long-standing research goal with applications in AR/VR, health, robotics, and beyond. Current 3D scene understanding models are largely limited to low-level recognition tasks such as object detection or semantic segmentation, and do not generalize well beyond a pre-defined set of training labels. More recently, large vision-language models (VLMs), such as CLIP, have demonstrated impressive capabilities trained solely on internet-scale image-language pairs. Initial works have shown that these models have the potential to extend 3D scene understanding not only to open-set recognition, but also to additional applications such as affordances, materials, activities, and properties of unseen environments. The goal of this workshop is to bundle these efforts and to discuss and establish clear task definitions, evaluation metrics, and benchmark datasets.

Schedule

13:30 - 13:40  Welcome & Introduction
13:40 - 14:20  Keynote 1: Mackenzie W. Mathis (EPFL)
14:20 - 15:00  Keynote 2: Saining Xie (NYU)
15:00 - 16:00  Coffee Break & Poster Session
16:00 - 16:40  Keynote 3: Julian Straub (Meta RLR)
16:40 - 17:20  Keynote 4: Angel X. Chang (SFU)
17:20 - 18:00  Keynote 5: Iro Armeni (Stanford)
18:00 - 18:10  Concluding Remarks

Keynote Speakers

Prof. Mackenzie W. Mathis is the Bertarelli Foundation Chair of Integrative Neuroscience and an Assistant Professor at the Swiss Federal Institute of Technology, Lausanne (EPFL). After completing her PhD at Harvard University in 2017 with Prof. Naoshige Uchida, she was awarded the prestigious Rowland Fellowship at Harvard to start her independent laboratory (2017-2020). Before starting her group, she worked with Prof. Matthias Bethge at the University of Tübingen in the summer of 2017 with the support of the Women & the Brain Project ALS Fellowship. She is an ELLIS Scholar, a Vallee Scholar, and a former NSF Graduate Fellow, and her work has been featured in Bloomberg Businessweek, Nature, and The Atlantic. She was awarded the FENS EJN Young Investigator Prize 2022, the Eric Kandel Young Neuroscientist Prize in 2023, the Robert Bing Prize 2024, and the Swiss Science Prize Latsis 2024.


Saining Xie is an Assistant Professor of Computer Science at the Courant Institute of Mathematical Sciences at New York University and is affiliated with the NYU Center for Data Science. Before joining NYU in 2023, he was a research scientist at FAIR, Meta. In 2018, he received his Ph.D. in computer science from the University of California San Diego. He works in computer vision and machine learning, with a particular interest in scalable visual representation learning. His contributions have been recognized with several honors, including a Marr Prize Honorable Mention, CVPR Best Paper finalist selections, the AISTATS Test of Time Award, an Amazon Research Award, an NSF CAREER Award, and the PAMI Young Researcher Award.


Julian Straub is a Lead Spatial AI Research Scientist at Meta Reality Labs Research (RLR) working on computer vision and 3D perception. Before joining RLR, Julian obtained his PhD on Nonparametric Directional Perception from MIT, where he was advised by John W. Fisher III and John Leonard within the Computer Science and Artificial Intelligence Laboratory (CSAIL). On his way to MIT, Julian graduated from the Technische Universität München (TUM) and the Georgia Institute of Technology with an M.Sc. He did his Diploma thesis in Eckehard Steinbach's group with the NavVis founding team, in particular with Sebastian Hilsenbeck. At Georgia Tech, Julian had the pleasure of working with Frank Dellaert's group.


Angel X. Chang is an Associate Professor at Simon Fraser University. Prior to this, she was a visiting research scientist at Facebook AI Research and a research scientist at Eloquent Labs working on dialogue. She received her Ph.D. in Computer Science from Stanford, where she was part of the Natural Language Processing Group and advised by Chris Manning. Her research focuses on connecting language to 3D representations of shapes and scenes and on grounding language for embodied agents in indoor environments. She has worked on methods for synthesizing 3D scenes and shapes from natural language, as well as on various datasets for 3D scene understanding. More generally, she is interested in the semantics of shapes and scenes, the representation and acquisition of common-sense knowledge, and reasoning using probabilistic models. Her group also works on using machine learning for biodiversity monitoring, specifically with DNA barcodes as part of the larger BIOSCAN project.


Iro Armeni is an Assistant Professor at Stanford University. Her research focuses on developing quantitative, data-driven methods that learn from real-world visual data to generate, predict, and simulate new or renewed built environments that place humans at the center. Her goal is to create sustainable, inclusive, and adaptive built environments that can support our current and future physical and digital needs. As part of her research vision, she is particularly interested in creating spaces that blend from the 100% physical (real reality) to the 100% digital (virtual reality) and anything in between, with the use of Mixed Reality.

Challenge

This year, we host a challenge based on the SceneFun3D benchmark and the Articulate3D dataset. The challenge focuses on fine-grained functionality, affordance, and interaction understanding in 3D indoor environments. It consists of three tracks:

🏆 Prize for each winning team: €1,000

Below are key resources and important dates for each track.

Tracks 1 & 2 [SceneFun3D]

Track 3 [Articulate3D]

Our workshop challenge is proudly supported by:

Paper Track

We invite 8-page full papers for inclusion in the proceedings, as well as 4-page extended abstracts. Extended abstracts may present either new or previously published work, but they will not be included in the proceedings. 4-page extended abstracts generally do not conflict with the dual-submission policies of other conferences, whereas 8-page full papers, if accepted, become part of the proceedings and are therefore subject to the dual-submission policy (i.e., they may not be under review at or already accepted by another conference). All submissions should be anonymous and follow the official ICCV 2025 guidelines.

Organizers