Hang Zhao

Selected Projects

G0.5: One Autoregressive Stream for Reasoning and Action
Galaxea Team
Technical Report 2026
"A unified VLA that reasons, plans, and acts in one autoregressive token stream."
Report Project

Fast-WAM: Do World Action Models Need Test-time Future Imagination?
Tianyuan Yuan, Zibin Dong, Yicheng Liu, Hang Zhao
Preprint 2026
"Video co-training without test-time future imagination: a real-time world action model at 190 ms latency, 4x+ faster."

Paper Code Project

Deep Whole-body Parkour
Ziwen Zhuang*, Shaoting Zhu*, Mengjie Zhao, Hang Zhao
In Submission
Paper Project

Humanoid Parkour Learning
Ziwen Zhuang, Shenzhe Yao, Hang Zhao
CoRL 2024
Paper Project

SLAM-Former: Putting SLAM into One Transformer
Yijun Yuan, Zhuoguang Chen, Kenan Li, Weibang Wang, Hang Zhao
Preprint 2025
"SLAM in the new age: frontend and backend in one transformer."

Paper Code Project

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Yang Wang, Zhiyong Zhao, Kun Zhan
Peng Jia, Xianpeng Lang, Hang Zhao
CoRL 2024
"Slow-Fast Dual System for autonomous driving!"
Paper Project

Latent Consistency Models: Synthesizing High-Resolution Images With Few-Step Inference
Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, Hang Zhao
"Generating high-resolution images in only 2-4 steps!"
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
Simian Luo, Yiqin Tan, Suraj Patil, Daniel Gu, Patrick von Platen, Apolinário Passos,
Longbo Huang, Jian Li, Hang Zhao
"Accelerating your LoRA model by 5x without training!"
LCM Paper LCM-LoRA Report Demo Project

Robot Parkour Learning
Ziwen Zhuang, Zipeng Fu, Jianren Wang, Christopher G Atkeson, Sören Schwertfeger,
Chelsea Finn, Hang Zhao
CoRL 2023 Oral, Best System Paper Finalist (Top 3)
"Robot parkour skills empowered by onboard vision and a neural network!"

Paper Code Project

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
Simian Luo, Chuanhao Yan, Chenxu Hu, Hang Zhao
NeurIPS 2023

Paper Project

Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving
Xiaoyu Tian, Tao Jiang, Longfei Yun, Yucheng Mao, Huitong Yang,
Yue Wang, Yilun Wang, Hang Zhao
NeurIPS Dataset Track 2023

Paper Project

ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory
Chenxu Hu, Jie Fu, Chenzhuang Du, Simian Luo, Junbo Zhao, Hang Zhao
LLM@IJCAI 2023

Paper Project

VCAD: Vision-Centric Autonomous Driving
Hang Zhao, Yue Wang, Yilun Wang, Justin Solomon, Vitor Guizilini, et al.
"A research effort pushing the frontiers of camera-centric autonomous driving technology."

Project Workshop

ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries
Junru Gu*, Chenxu Hu*, Tianyuan Zhang, Xuanyao Chen, Yilun Wang, Yue Wang, Hang Zhao
CVPR 2023
"Vision-based trajectory prediction for autonomous driving."
Paper Project

VectorMapNet: End-to-end Vectorized HD Map Learning
Yicheng Liu, Tianyuan Yuan, Yue Wang, Yilun Wang, Hang Zhao
ICML 2023
"Vectorized mapping from onboard sensors!"
Paper Project

HDMapNet: An Online HD Map Construction and Evaluation Framework
Qi Li, Yue Wang, Yilun Wang, Hang Zhao
CVPR 2021 Workshop best paper, ICRA 2022
"HD map learning from onboard sensors!"

Paper Project

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries
Yue Wang, Vitor Campagnolo Guizilini, Tianyuan Zhang, Yilun Wang, Hang Zhao, Justin Solomon
CoRL 2021
"A new paradigm of 3D object detection from 2D images!"

Paper

On Feature Decorrelation in Self-Supervised Learning
Tianyu Hua, Wenxiao Wang, Zihui Xue, Yue Wang, Sucheng Ren, Hang Zhao
ICCV 2021 Oral
"It reveals the connection between model collapse and feature correlations!"

Paper Project

TNT: Target-driveN Trajectory Prediction
Hang Zhao, Jiyang Gao, Tian Lan, Chen Sun, Benjamin Sapp,
Balakrishnan Varadarajan, Yue Shen, Yi Shen, Yuning Chai,
Cordelia Schmid, Congcong Li, Dragomir Anguelov
CoRL 2020
"A new motion prediction framework for self-driving!"

Paper

VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation
Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen,
Dragomir Anguelov, Congcong Li, Cordelia Schmid
CVPR 2020

Paper Waymo Blog

Scalability in Perception for Autonomous Driving: Waymo Open Dataset
Pei Sun et al.
CVPR 2020
Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
Scott Ettinger, et al.
ICCV 2021 Oral
CVPR Paper ICCV Paper Project Page Challenges Waymo Blog

HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
Hang Zhao, Zhicheng Yan, Lorenzo Torresani, Antonio Torralba
ICCV 2019
"A large-scale dataset for temporal action localization and recognition."

Paper (arXiv) Project Page GitHub Page

The Sound of Pixels
Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh McDermott, Antonio Torralba
ECCV 2018
"Listen to the sound of pixels!"
Paper (arXiv) Project Page Code News Coverage

Through-Wall Human Pose Estimation Using Radio Signals
Mingmin Zhao, Tianhong Li, Mohammad Alsheikh, Yonglong Tian, Hang Zhao,
Antonio Torralba, Dina Katabi
CVPR 2018
Paper Project Page News Coverage

Scene Parsing through ADE20K Dataset
Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, Antonio Torralba
CVPR 2017
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba
IJCV 2018 ILSVRC'16 MIT Scene Parsing Challenge
CVPR Paper IJCV Paper Dataset Code Benchmark Challenge

Loss Functions for Neural Networks for Image Processing
Hang Zhao, Orazio Gallo, Iuri Frosio and Jan Kautz
arXiv:1511.08861
TCI 2017
"How important are loss functions for image processing tasks in deep neural nets?"
Paper (Journal) Paper (arXiv) Project Page Code

Duckietown: an Open, Inexpensive and Flexible Platform for Autonomy Education and Research
ICRA 2017
"We are building an open-source education and research platform for autonomous driving."
Paper Video Project Page Code News Coverage

Unbounded High Dynamic Range Photography using a Modulo Camera
Hang Zhao, Boxin Shi, Christy Fernandez-Cull, Sai-Kit Yeung and Ramesh Raskar
ICCP 2015
Oral Presentation [Best Paper runner-up]

Paper Poster Video Project Page News Coverage

Teaching

[Tsinghua] Advances in Autonomous Driving and Intelligent Vehicles (Lecturer)
[Tsinghua] Introduction to Multimodal Learning (Lecturer)
[MIT 6.869] Advances in Computer Vision (Teaching Assistant)
[MIT 2.166] Autonomous Vehicles, also known as "Duckietown" (Course Developer, Teaching Assistant)
[MIT 6.870] Smartphone Vision (Teaching Assistant)
[MIT 2.007] Design and Manufacturing I (Teaching Assistant)

Professional Activities

Co-organizer of Workshop on Post-Training for Robotics Foundation Models at RSS 2026.
Co-organizer of Workshop on Sight and Sound at CVPR 2020, CVPR 2021, CVPR 2022, CVPR 2023, CVPR 2024, CVPR 2025, CVPR 2026.
Co-organizer of Workshop on Vision-Centric Autonomous Driving (VCAD) at CVPR 2023 and ECCV 2024.
Co-organizer of Workshop on Vision and Language for Autonomous Driving and Robotics at CVPR 2024.
Workshop co-chair (organizing committee) of ICLR 2023.
Co-organizer of HACS Temporal Action Localization Challenge at Workshop on International Challenge on Activity Recognition at CVPR 2020.
Co-organizer of Workshop on Multi-modal Video Analysis and Moments in Time Challenge at ICCV 2019.
Co-organizer of Weakly Supervised Learning for Real-World Computer Vision Applications and the 1st Learning from Imperfect Data (LID) Challenge at CVPR 2019.
Co-organizer of Places Challenge 2017.
Co-organizer of Joint COCO and Places Recognition Challenge Workshop at ICCV 2017.
Co-organizer of MIT Scene Parsing Challenge 2016.
Co-organizer of ILSVRC'16 challenge workshop at ECCV 2016.
Journal reviewer for TPAMI, IJCV, TIP, CVIU, TCI, OE, etc.
Conference reviewer for CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, etc.
Co-chair of MIT Vision Seminar.

Talks

Invited talk at RSS 2026 Workshop on Semantic Reasoning and Goal Understanding in Robotics, Jul 2026.
Invited talk at ICCV Workshop on Human-Robot-Scene Interaction and Collaboration, Oct 2025.
Keynote speech at CoRL 2025, Sep 2025.
Invited talk at CoRL Workshop on Robotics World Modeling, Sep 2025.
Invited talk at ECCV Workshop on Autonomous Vehicles Meet Multimodal Foundation Models, Oct 2024.
Invited talk at ICCV Workshop on Visual Learning of Sounds in Spaces (AV4D), Oct 2023.
Invited talk at CVPR Workshop on Autonomous Driving (WAD), Jun 2023.
Invited talk at VALSE Workshop on Autonomous Driving, Jun 2023.
Invited talk at ICLR Workshop on Representation for Autonomous Driving, May 2023.
Invited talk at NeurIPS Workshop on Machine Learning for Autonomous Driving (ML4AD), Dec 2022.
Invited talk at IROS Workshop on Behavior-Driven Autonomous Driving in Unstructured Environments, Oct 2022.
Invited talk at CVPR Tutorial on OpenMMLab, Jun 2022.
Invited talk at VALSE APR on Autonomous Driving, Jun 2022.
Invited talk at VALSE Workshop on Multimodal Learning, Jun 2022.
Invited talk at ICCV Workshop on Benchmarking Trajectory Forecasting Models, Oct 2021.
Invited talk at CVPR Workshop on Autonomous Driving, Jun 2021.
Invited talk at Samsung Research Lab (UK), Apr 2021.
Invited talk at Amazon Alexa, May 2019.
Invited talk at Samsung Workshop at MIT, Apr 2019.
Invited talk at Machine Intelligence Conference, Mar 2019.
Invited talk at Philips, Feb 2019.
Invited talk at VALSE, Jun 2018.
Invited talk at Harvard Vision Seminar, May 2018.
Invited talk at Google Cambridge, Apr 2018.
Invited talk at MIT Graphics Seminar, Sep 2015.

In Chinese:

Invited talk at TechBeat on BEV Perception for Vision-Centric Autonomous Driving, Mar 2022. [Recording]
Invited talk at TechBeat on Motion Prediction for Autonomous Driving, Dec 2020. [Recording]
Invited talk at TechBeat on Cross-Modal Audio-Visual Self-Supervised Learning, Jun 2018. [Recording]
