![]() |
I am a Research Scientist at Google Research. I did my Ph.D. in Computer Science at The University of Texas at Austin, supervised by Prof. Philipp Krähenbühl. Before that, I obtained my bachelor degree from School of Computer Science at Fudan University. I have interned at Microsoft Research Asia, Google Research, Intel Labs, and Facebook AI Research. My research focuses on object-level visual recognition in videos, including object captioning, detection, 3D perception, pose estimation, and tracking. CV / Google Scholar / GitHub / LinkedIn |
Last updated June 2023
Large-scale well-curated datasets are treasures in computer vision. However, most datasets only focus on one single domain with a specific task and a fixed label set. Computer vision models trained on a single dataset can not generalize to all applications in the real world.
The goal of my research is to remove the artificial barriers of datasets and make object recognition generalize in the wild. There should be one single computer vision model, not a zoo of dataset-specific models. The model should be trained on a diverse set of datasets and should be able to recognize objects from different data sources in all domains.
Towards this goal, my research focuses on three aspects: 1. How to build a unified object representation for various vision tasks. I am proposing a point-based representation for object detection, 3D detection, and pose estimation (CenterNet, CenterNet2, CenterPoint). 2. How to generalize object recognition through time. I propose a simple solution to extend our point-based detector into a local tracker (CenterTrack), and then introduce a global tracker that does recognition globally over time (GTR). 3. How to expand the vocabulary of an object detection system. I learn a unified label space for different detection datasets (UniDet) and combine classification datasets with twenty-thousand classes (Detic).
Going further, I am interested in training a unified object recognition system that performs multi-tasks using weak or incomplete supervisions.