☀️ OpenSUN 3D 🌍

1st Workshop on Open-Vocabulary 3D Scene Understanding

in conjunction with ICCV 2023, Paris, France.

Room E06 - Oct. 3rd Tuesday Afternoon

Motivation 💡

The ability to perceive, understand, and interact with arbitrary 3D environments is a long-standing goal in both academia and industry, with applications in AR/VR as well as robotics. Current 3D scene understanding models are largely limited to recognizing a closed set of pre-defined object classes. Recently, large vision-language models such as CLIP have demonstrated impressive capabilities when trained solely on internet-scale image-language pairs. Initial works have shown that these models have the potential to extend 3D scene understanding beyond closed-set recognition, enabling additional applications such as predicting affordances, materials, activities, and properties of unseen environments. The goal of this workshop is to bring together these initially siloed efforts and to discuss and establish clear task definitions, evaluation metrics, and benchmark datasets.

Schedule ⏰

13:20 - 13:30 Welcome & Introduction
13:30 - 14:00 Keynote: Jen Jen Chung
14:00 - 14:30 Keynote: Vishal Patel
14:30 - 14:45 Oral Sessions / Challenge Winners
14:45 - 15:15 Keynote: Thomas Funkhouser
15:15 - 16:00 Poster Session & Coffee Break
16:00 - 16:30 Keynote: Angela Dai
16:30 - 17:00 Keynote: Manolis Savva
17:00 - 17:30 Panel Discussion

Invited Speakers 🧑‍🏫

Professor Vishal Patel

Johns Hopkins University

Vishal M. Patel is an associate professor of electrical and computer engineering and a member of the Vision and Image Understanding Lab. His research interests are focused on computer vision, machine learning, image processing, medical image analysis, and biometrics. Patel is an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence journal and chairs the conference subcommittee of IAPR Technical Committee on Biometrics (TC4). He has received a number of awards including the 2021 IEEE Signal Processing Society (SPS) Pierre-Simon Laplace Early Career Technical Achievement Award, the 2021 NSF CAREER Award, the 2021 IAPR Young Biometrics Investigator Award (YBIA), the 2016 ONR Young Investigator Award, and the 2016 Jimmy Lin Award for Invention.

Professor Angela Dai

Technical University of Munich

Angela Dai is an assistant professor at the Technical University of Munich (TUM) where she leads the 3D AI Lab. Her research focuses on understanding how the 3D world around us can be modeled and semantically understood. Prof. Dai is the creator of the seminal ScanNet benchmark that sparked the development of numerous 3D scene understanding works.

Professor Manolis Savva

Simon Fraser University

Manolis Savva is an assistant professor in the School of Computing Science at Simon Fraser University, and a Canada Research Chair in Computer Graphics. His research focuses on analysis, organization and generation of 3D content. The methods that he works on are stepping stones towards holistic 3D scene understanding revolving around people, with applications in computer graphics, computer vision, and robotics. Prof. Savva contributed highly influential works towards embodied AI including Matterport and Habitat.

Professor Thomas Funkhouser

Google / Princeton University

Thomas Funkhouser is a full professor at Princeton University and a senior research scientist at Google. His research focuses on computer graphics, computer vision, and in particular 3D machine perception. In recent years, Professor Funkhouser has greatly impacted the field of 3D scene understanding.

Professor Jen Jen Chung

University of Queensland

Jen Jen Chung is an associate professor in Mechatronics within the School of Information Technology and Electrical Engineering at the University of Queensland. Her current research interests include perception, planning and learning for robotic mobile manipulation, algorithms for robot navigation through human crowds, informative path planning and adaptive sampling.

Important Dates 🗓️

  • Paper Track: We accept full 8-page papers presenting novel work, which will be published in the proceedings, and shorter 4-page extended abstracts of either novel or previously published work, which will not be included in the proceedings. All submissions shall follow the ICCV 2023 author guidelines.
    • Submission Portal: CMT
    • Paper Submission Deadline: July 31, 2023 (23:59 Pacific Time)
    • Notification to Authors: August 9, 2023
    • Camera-ready submission: August 21, 2023
  • Challenge Track:
    • Submission Portal: EvalAI
    • Data Instructions & Helper Scripts: GitHub
    • Dev Phase Start: July 13, 2023
    • Submission Portal Start: July 17, 2023
    • Test Phase Start: August 16, 2023
    • Test Phase End: September 30, 2023 (Winners are decided on this date)

Accepted Papers 📄

CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition
Deepti B. Hegde, Jeya Maria Jose Valanarasu, Vishal Patel

CLIP-FO3D: Learning Free Open-world 3D Scene Representations from 2D Dense CLIP
Junbo Zhang, Runpei Dong, Kaisheng Ma

The Change You Want to See (Now in 3D)
Ragav Sachdeva, Andrew Zisserman

Learning to Prompt CLIP for Monocular Depth Estimation: Exploring the Limits of Human Language
Dylan Auty, Krystian Mikolajczyk

SAM3D: Segment Anything in 3D Scenes
Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu

POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
Antonin Vobecky, Oriane Siméoni, David Hurych, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Josef Sivic

OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data
Shiyang Lu, Haonan Chang, Eric P. Jing, Yu Wu, Abdeslam Boularias, Kostas Bekris

Challenge Results

We have published a technical report providing an overview of our workshop challenge, results, and the methods of the winning teams!

Top-3 ranking teams from our workshop challenge are listed below:

Rank 1
Hongbo Tian¹˒², Chunjie Wang¹, Xiaosheng Yan¹, Bingwen Wang¹, Xuanyang Zhang¹, Xiao Liu¹
¹PICO, ByteDance, Beijing      ²Beijing University of Posts and Telecommunications
Method: -      mAP (↑): 6.08      AP_50 (↑): 14.08      AP_25 (↑): 17.67

Rank 2: VinAI-3DIS
Phuc Nguyen¹, Khoi Nguyen¹, Anh Tran¹, Cuong Pham¹
¹VinAI Research
Method: GitHub      mAP (↑): 4.13      AP_50 (↑): 12.14      AP_25 (↑): 39.41

Rank 3
Zhening Huang¹, Xiaoyang Wu², Xi Chen², Hengshuang Zhao², Lei Zhu³, Joan Lasenby¹
¹University of Cambridge      ²HKU      ³HKUST (Guangzhou)
Method: -      mAP (↑): 2.67      AP_50 (↑): 5.06      AP_25 (↑): 13.98


Program Committee

  • Alexander Hermans (RWTH)
  • Alexey Nekrasov (RWTH)
  • Ayush Jain (CMU)
  • Dávid Rozenberszki (TUM)
  • Francis Engelmann (ETH)
  • Ji Hou (Meta)
  • Jonas Schult (RWTH)
  • Or Litany (NVIDIA)
  • Songyou Peng (ETH)
  • Yujin Chen (TUM)