Open☀️3D

Introduction

The ability to perceive, understand and interact with 3D scenes is a long-standing goal in research with applications in AR/VR, health, robotics and so on. Current 3D scene understanding models are largely limited to low-level recognition tasks such as object detection or semantic segmentation, and do not generalize well beyond the a pre-defined set of training labels. More recently, large visual-language models (VLM), such as CLIP, have demonstrated impressive capabilities trained solely on internet-scale image-language pairs. Some initial works have shown that these models have the potential to extend 3D scene understanding not only to open set recognition, but also offer additional applications such as affordances, materials, activities, and properties of unseen environments. The goal of this workshop is to bundle these efforts and to discuss and establish clear task definitions, evaluation metrics, and benchmark datasets.

Schedule (Room 306 B)

13:30 - 13:40	Welcome & Introduction
13:40 - 14:20	Keynote 1 Mackenzie W. Mathis (EPFL)
14:20 - 15:00	Keynote 2 Saining Xie (NYU)
15:00 - 16:00	Coffee Break & Poster Session (Hall II, Boards 148-157)
16:00 - 16:40	Keynote 3 Julian Straub (Meta RLR)
16:40 - 17:20	Keynote 4 Angel X. Chang (SFU)
17:20 - 18:00	Keynote 5 Iro Armeni (Stanford)
18:00 - 18:10	Concluding Remarks

Keynote Speakers

Mackenzie W. Mathis

Assistant Professor at EPFL

Prof. Mackenzie W. Mathis is the Bertarelli Foundation Chair of Integrative Neuroscience and an Assistant Professor at the Swiss Federal Institute of Technology, Lausanne (EPFL). Following the award of her PhD at Harvard University in 2017 with Prof. Naoshige Uchida, she was awarded the prestigious Rowland Fellowship at Harvard to start her independent laboratory (2017-2020). Before starting her group, she worked with Prof. Matthias Bethge at the University of Tübingen in the summer of 2017 with the support of the Women & the Brain Project ALS Fellowship. She is an ELLIS Scholar, Vallee Scholar, a former NSF Graduate Fellow, and her work has been featured in the news at Bloomberg BusinessWeek, Nature, and The Atlantic. She was awarded the FENS EJN Young Investigator Prize 2022, the Eric Kandel Young Neuroscientist Prize in 2023, The Robert Bing 2024 Prize, and the National Swiss Science Prize Latsis 2024.

Saining Xie

Assistant Professor at New York University

Saining Xie is an Assistant Professor of Computer Science at the Courant Institute of Mathematical Sciences at New York University and is affiliated with NYU Center for Data Science. Before joining NYU in 2023, he was a research scientist at FAIR, Meta. In 2018, he received his Ph.D. degree in computer science from the University of California San Diego. He works in computer vision and machine learning, with a particular interest in scalable visual representation learning. His contributions have been recognized with several honors, including the Marr Prize Honorable Mention, CVPR Best Paper Finalists, AISTATS Test of Time Award, Amazon Research Award, NSF CAREER Award, and the PAMI Young Researcher Award.

Julian Straub

Spatial AI Research Scientist at Meta Reality Labs Research

Julian Straub is a Lead Spatial AI Research Scientist at Meta Reality Labs Research (RLR) working on Computer Vision and 3D Perception. Before joining RLR, Julian obtained his PhD on Nonparametric Directional Perception from MIT, where he was advised by John W. Fisher III and John Leonard within the CS and AI Laboratory (CSAIL). On his way to MIT, Julian graduated from the Technische Universität München (TUM) and the Georgia Institute of Technology with a M.Sc. He did his Diploma thesis in Eckehard Steinbach’s group with the NavVis founding team and in particular with Sebastian Hilsenbeck. At Georgia Tech Julian had the pleasure to work with Frank Daellart’s group.

Angel X. Chang

Associate Professor at Simon Fraser University

Angel Change is an Associate Professor at Simon Fraser University. Prior to this, she was a visiting research scientist at Facebook AI Research and a research scientist at Eloquent Labs working on dialogue. She received her Ph.D. in Computer Science from Stanford, where she was part of the Natural Language Processing Group and advised by Chris Manning. Her research focuses on connecting language to 3D representations of shapes and scenes and grounding of language for embodied agents in indoor environments. She has worked on methods for synthesizing 3D scenes and shapes from natural language, and various datasets for 3D scene understanding. In general, she us interested in the semantics of shapes and scenes, the representation and acquisition of common sense knowledge, and reasoning using probabilistic models. Her group also works on using machine learning for biodiversity monitoring, specifically with DNA barcodes as part of the larger BIOSCAN project.

Iro Armeni

Assistant Professor at Stanford University

Iro Armeni is an Assistant Professor at Stanford University. Her area of focus is on developing quantitative and data-driven methods that learn from real-world visual data to generate, predict, and simulate new or renewed built environments that place the human in the center. Her goal is to create sustainable, inclusive, and adaptive built environments that can support our current and future physical and digital needs. As part of her research vision, she is particularly interested in creating spaces that blend from the 100% physical (real reality) to the 100% digital (virtual reality) and anything in between, with the use of Mixed Reality.

Challenge

This year, we host a challenge based on the SceneFun3D benchmark and the Articulate3D dataset. The challenge focuses on fine-grained functionality, affordance, and interaction understanding in 3D indoor environments. It consists of three tracks:

🏆 Prize per challenge winner team: €1,000

Track 1: Functionality Segmentation [SceneFun3D] Benchmark
Track 2: Open-Vocabulary 3D Affordance Grounding [SceneFun3D] Benchmark
Track 3: Interaction Understanding [Articulate3D]

Below are key resources and important dates for each track.

Tracks 1 & 2 [SceneFun3D]

SceneFun3D Documentation
Benchmark Submission Portal
Benchmark Submission Instructions
Submission Opens: June 25, 2025
Submission Deadline: October 14, 2025

Track 3 [Articulate3D]

Track 3 Documentation
Articulate3D
Track 3 Submission Portal
Submission Opens: August 1, 2025
Submission Deadline: October 14, 2025

Our workshop challenge is proudly supported by:

Poster Presentations

All paper submission that are accepted have to option to be presented as posters during the workshop.

For poster presenters: Please use poster boards 148-157 in Hall D (Level 3) to put your poster.

Paper Track

We invite 8-page full papers for inclusion in the proceedings, as well as 4-page extended abstracts. Extended abstracts may present either new or previously published work but will not be included in the proceedings. 4-page extended abstracts generally do not conflict with the dual submission policies of other conferences, whereas 8-page full papers, if accepted, will be part of the proceedings and are therefore subject to the dual submission policy (i.e., they cannot be under review for another conference at the same time or already accepted at another conference). All submissions should be anonymous and follow the official ICCV 2025 guidelines.

Submission portal: OpenReview
Paper submission opens: June 23, 2025

SUBMISSION DEADLINE 1 - Full Papers in Proceedings

Submission deadline 1 (full-papers in proceedings) : July 4, 2025 (23:59 GMT) (Countdown: )
Notification to authors (full-papers in proceedings): July 10, 2025
Camera ready submission: August 10, 2025

SUBMISSION DEADLINE 2 - Extended Abstracts and Non-Archival Full Papers (No Proceedings!)

Submission deadline 2 : September 1, 2025 (23:59 GMT) (Countdown: )
Notification to authors: September 15, 2025

Accepted Papers (Proceedings)

Open-Ended 3D Point Cloud Instance Segmentation (Poster #148)
Phuc Nguyen, Minh Luu, Anh Tuan Tran, Cuong Pham, Khoi Nguyen
Sparse Multiview Open-Vocabulary 3D Detection (Poster #149)
Olivier Moliner, Viktor Larsson, Kalle Åström

Extended Abstracts and Non-Archival Full Papers

RefAV: Towards Planning-Centric Scenario Mining (Poster #150)
Cainan Davidson, Deva Ramanan, Neehar Peri
2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos (Poster #151)
Marvin Heidinger, Snehal Jauhri, Vignesh Prasad, Georgia Chalvatzaki
Learning 3D Scene Analogies with Neural Contextual Scene Maps (Poster #152)
Junho Kim, Gwangtak Bae, Eun Sun Lee, Young Min Kim
Text-Embedded 3DGS as Enhanced Language Embeddings (Poster #153)
Dahye Lee, Kim Yu-Ji, GeonU Kim, Jaesung Choe, Tae-Hyun Oh
PanSt3R: Multi-View Consistent Panoptic Segmentation (Poster #154)
Lojze Zust, Yohann Cabon, Juliette Marrie, Leonid Antsfeld, Boris Chidlovskii, Jerome Revaud, Gabriela Csurka
BlendCLIP: Bridging Synthetic and Real Domains for Open-Set 3D Object Classification with Multimodal Pretraining (Poster #155)
Ajinkya Khoche, Gergő László Nagy, Maciej Wozniak, Thomas Gustafsson, Patric Jensfelt
VISUAL-FF: Visual and Spatial Uncertainties in Any-Scene via Learned Feature Fields (Poster #156)
Christian Maurer, Snehal Jauhri, Sophie Lueth, Georgia Chalvatzaki
Shape-Based Features Complement CLIP Features And Features Learned from Voxels in 3D Object Classification (Poster #157)
Zhi Ji, Michael Guerzhoy