β˜€οΈ OpenSUN 3D 🌍

2nd Workshop on Open-Vocabulary 3D Scene Understanding

in conjunction with CVPR 2024, Seattle, USA.

Tuesday, June 18 (1:30 PM – 5:30 PM) in Arch 211

Motivation πŸ’‘

The ability to perceive, understand, and interact with arbitrary 3D environments is a long-standing goal in both academia and industry, with applications in AR/VR as well as robotics. Current 3D scene understanding models are largely limited to recognizing a closed set of pre-defined object classes. Recently, large vision-language models such as CLIP have demonstrated impressive capabilities when trained solely on internet-scale image-language pairs. Initial works have shown that such models have the potential to extend 3D scene understanding not only to open-set recognition, but also to additional applications such as affordances, materials, activities, and properties of unseen environments. The goal of this workshop is to consolidate these currently siloed efforts and to discuss and establish clear task definitions, evaluation metrics, and benchmark datasets.

Schedule ⏰ (tentative)

13:30 - 13:45 Welcome & Introduction
13:45 - 14:15 Keynote 1 Kristen Grauman (Uni. of Texas at Austin)
14:15 - 14:45 Keynote 2 Chung Min Kim, Justin Kerr (UC Berkeley)
14:45 - 15:00 Winner Presentations (Track 1: VinAI-3DIS, Track 2: PICO-MR)
15:00 - 15:45 Poster Session & Coffee Break
15:45 - 16:15 Keynote 3 Jiajun Wu (Stanford University)
16:15 - 16:45 Keynote 4 Dave Gausebeck (Matterport)
16:45 - 17:00 Closing

Challenge πŸš€

This year, our challenge consists of two tracks: open-vocabulary 3D object instance search, and open-vocabulary 3D functionality grounding.
  • Challenge Track 1: Open-vocabulary 3D Object Instance Search
    • Submission Portal: EvalAI
    • Data Instructions & Helper Scripts: April 17, 2024
    • Dev Phase Start: April 17, 2024
    • Submission Portal Start: April 19, 2024
    • Test Phase Start: May 1, 2024
    • Test Phase End: June 14, 2024 (14:00 Pacific Time)
  • Challenge Track 2: Open-vocabulary 3D Functionality Grounding
    • Submission Portal: EvalAI
    • Data Instructions & Helper Scripts: April 17, 2024
    • Dev Phase Start: April 17, 2024
    • Submission Portal Start: April 19, 2024
    • Test Phase Start: May 4, 2024
    • Test Phase End: June 14, 2024 (14:00 Pacific Time)

Please check out this page for an overview of last year's challenge results. We have also published a technical report that provides an overview of our ICCV 2023 workshop challenge.

Our workshop challenge is proudly supported by:

Invited Speakers πŸ§‘β€πŸ«

Kristen Grauman

University of Texas at Austin

Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research Director at Facebook AI Research (FAIR). Her research in computer vision and machine learning focuses on video, visual recognition, and action for perception or embodied AI. Before joining UT-Austin in 2007, she received her Ph.D. at MIT. She is an IEEE Fellow, AAAI Fellow, Sloan Fellow, and Microsoft Research New Faculty Fellow, and a recipient of NSF CAREER and ONR Young Investigator awards, the PAMI Young Researcher Award in 2013, the 2013 Computers and Thought Award from the International Joint Conference on Artificial Intelligence (IJCAI), and the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2013. She was inducted into the UT Academy of Distinguished Teachers in 2017. She and her collaborators have been recognized with several Best Paper awards in computer vision, including a 2011 Marr Prize and a 2017 Helmholtz Prize (test of time award).

Jiajun Wu

Stanford University

Jiajun Wu is an Assistant Professor of Computer Science at Stanford University, working on computer vision, machine learning, and computational cognitive science. Before joining Stanford, he was a Visiting Faculty Researcher at Google Research. He received his PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. Wu's research has been recognized through the Young Investigator Programs (YIP) by ONR and by AFOSR, paper awards and finalists at ICCV, CVPR, SIGGRAPH Asia, CoRL, and IROS, dissertation awards from ACM, AAAI, and MIT, the 2020 Samsung AI Researcher of the Year, and faculty research awards from J.P. Morgan, Samsung, Amazon, and Meta.

Chung Min Kim

University of California, Berkeley

Chung Min Kim is a PhD student at UC Berkeley, where she is advised by Ken Goldberg and Angjoo Kanazawa. She received her dual B.S. degree in EECS (Electrical Engineering and Computer Science) and Mechanical Engineering from UC Berkeley in 2021. She is currently funded by the NSF GRFP. Her research interests include 3D scene understanding for computer vision and robotics. In particular, she is interested in modeling multi-scale semantics in 3D using large vision-language models. Her goal is to apply these models to robots in the real world, which is challenging due to the lack of structure and large variability of real-world environments.

Justin Kerr

University of California, Berkeley

Justin Kerr is a PhD student at UC Berkeley, co-advised by Ken Goldberg and Angjoo Kanazawa, working primarily on NeRF for robot manipulation, 3D scene understanding, and visuo-tactile representation learning. Recently, Justin has been interested in leveraging NeRF for language grounding and how it could change the way we interact with 3D. His work is supported by the NSF GRFP. He previously completed his bachelor's degree at CMU, where he worked with Howie Choset on multi-agent path planning, and spent time at Berkshire Grey and NASA's JPL.

Dave Gausebeck

Matterport

Dave Gausebeck is the co-founder and Chief Scientist of Matterport, leading the company's technological research and operations. As one of Matterport's founders, Dave developed much of the computer vision technology that makes Matterport tick, and he continues to develop and improve those algorithms alongside a team of vision researchers and software engineers. Dave was at PayPal when the whole company fit into a single conference room, building core back-end and security systems as well as developing the first commercial implementation of a CAPTCHA. He earned a BS in Computer Science from the University of Illinois at Urbana-Champaign.

Related Works πŸ§‘β€πŸ€

Below is a collection of concurrent and related works in the field of open-set 3D scene understanding. Please feel free to get in touch if you would like to suggest additional works.

Important Dates πŸ—“οΈ

Paper Track: We accept full 8-page papers presenting novel work for publication in the proceedings, as well as shorter 4-page extended abstracts or 8-page papers (of novel or previously published work) that will not be included in the proceedings. All submissions must follow the CVPR 2024 author guidelines.

Accepted Papers πŸ“„

AffordanceLLM: Grounding Affordance from Vision Language Models
Shengyi Qian, Weifeng Chen, Min Bai, Xiong Zhou, Zhuowen Tu, Li Erran Li

Zero-shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation
Tri Ton, Ji Woo Hong, SooHwan Eom, Jun Yeop Shim, Junyeong Kim, Chang D. Yoo

Auto-Vocabulary Segmentation for LiDAR Points
Weijie Wei, Osman Ülger, Fatemeh Karimi Nejadasl, Theo Gevers, Martin R. Oswald

Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships
Sebastian Koch, Narunas Vaskevicius, Mirco Colosi, Pedro Hermosilla Casajus, Timo Ropinski

ODIN: A Single Model for 2D and 3D Segmentation
Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki

QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding
Kumaraditya Gupta, Rohit Jayanti, Yash Mehan, Anirudh Govil, Sourav Garg, Madhava Krishna

Situational Awareness Matters in 3D Vision Language Reasoning
Yunze Man, Liang-Yan Gui, Yu-Xiong Wang

Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation
Taisei Hanyu, Kashu Yamazaki, Benjamin R Runkle, Ngan Le

Organizers