I am a third-year Ph.D. student at The Chinese University of Hong Kong, advised by Prof. Pheng-Ann Heng and Prof. Chi-Wing Fu. Prior to my Ph.D. studies, I worked as an Algorithm Engineer at Tencent AI Lab, where I developed RL-based game agents and designed diverse reward strategies to encourage varied playstyles. I hold an M.Sc. in Big Data Technology from The Hong Kong University of Science and Technology and a B.Sc. (Hons) in Computer Science and Technology from Beijing Normal-Hong Kong Baptist University.
My research focuses on multimodal understanding and agentic reasoning, exploring native active perception to bridge passive audio-visual understanding with active, agentic decision-making in foundation models. Currently, I am translating this vision into practice as a Research Intern at Alibaba Qwen, where I design agentic RL frameworks, algorithms, and data pipelines to enhance the complex audio-visual reasoning capabilities of foundation models.
My long-term goal is to teach machines to perceive and interact like humans. I believe that video, naturally coupled with audio, serves as the essential bridge for understanding the world and grounding foundation models in real-world dynamics.
Beyond academia, I am passionate about basketball, tennis, and regular gym workouts. Sports define my mindset.
News & Announcements
- 05/2026 One paper (OmniAgent) accepted to ICML 2026!
- 03/2026 Say hello to the new Qwen3.5-Omni.
- 01/2026 One paper accepted to ICLR 2026.
- 01/2026 One paper accepted to IJCV.
- 09/2025 Qwen3-Omni released. Check it out.
- 09/2025 One paper accepted to NeurIPS 2025.
- 02/2025 One paper accepted to CVPR 2025.
- 09/2024 One paper accepted to IEEE TIP.
- 07/2024 I finally got into the dormitory from the waiting list!
Selected Publications
* marks joint first authors. The list includes peer-reviewed publications and technical reports.
- Native Active Perception as Reasoning for Omni-Modal Understanding
Zhenghao Xing*, Ruiyang Xu*, Yuxuan Wang*, Jinzheng He, Ziyang Ma, Qize Yang, Yunfei Chu, Jin Xu, Junyang Lin, Chi-Wing Fu, and Pheng-Ann Heng.
International Conference on Machine Learning (ICML), 2026.
[Conference]
- Qwen3.5-Omni Technical Report
Qwen Team
ArXiv Tech Report, 2026.
[Technical Report] [arXiv] [Qwen Chat]
- Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
Ziyang Ma*, Ruiyang Xu*, Zhenghao Xing*, Yunfei Chu, Yuxuan Wang, Jinzheng He, Jin Xu, Pheng-Ann Heng, Kai Yu, Junyang Lin, Eng Siong Chng, and Xie Chen.
The Fourteenth International Conference on Learning Representations (ICLR), 2026.
[Conference] [arXiv] [Code]
- Qwen3-Omni Technical Report
Qwen Team
ArXiv Tech Report, 2025.
[Technical Report] [arXiv] [Code] [Qwen Chat]
- EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
Zhenghao Xing, Xiaowei Hu, Chi-Wing Fu, Wenhai Wang, Jifeng Dai, and Pheng-Ann Heng.
ArXiv Tech Report, 2025.
[Technical Report] [arXiv] [Code]
- EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights
Zhenghao Xing*, Hao Chen*, Binzhu Xie, Jiaqi Xu, Ziyu Guo, Xuemiao Xu, Jianye Hao, Chi-Wing Fu, Xiaowei Hu, and Pheng-Ann Heng.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
[Conference] [Paper] [Code]
- Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era
Xiaowei Hu*, Zhenghao Xing*, Tianyu Wang, Chi-Wing Fu, and Pheng-Ann Heng.
International Journal of Computer Vision (IJCV), 2026.
[Journal] [Paper] [arXiv] [Code]
- Video Instance Shadow Detection Under the Sun and Sky
Zhenghao Xing*, Tianyu Wang*, Xiaowei Hu, Haoran Wu, Chi-Wing Fu, and Pheng-Ann Heng.
IEEE Transactions on Image Processing (IEEE TIP), 2024.
[Journal] [Paper] [arXiv] [Code]
Academic Service
Reviewer: CVPR 2026, ICML 2026 (Silver Reviewer), ECCV 2026.