I am a third-year Ph.D. student at The Chinese University of Hong Kong, advised by Prof. Pheng-Ann Heng and Prof. Chi-Wing Fu. Prior to my Ph.D. studies, I worked as an Algorithm Engineer at Tencent AI Lab, where I developed RL-based game agents and designed diverse reward strategies to encourage varied playstyles. I hold an M.Sc. in Big Data Technology from The Hong Kong University of Science and Technology and a B.Sc. (Hons) in Computer Science and Technology from Beijing Normal-Hong Kong Baptist University.

My research focuses on multimodal understanding and agentic reasoning, exploring native active perception to bridge passive audio-visual understanding with active, agentic decision-making in foundation models. Currently, I am translating this vision into practice as a Research Intern at Alibaba Qwen, where I design agentic RL frameworks, algorithms, and data pipelines to enhance the complex audio-visual reasoning capabilities of foundation models.

My long-term goal is to teach machines to perceive and interact like humans. I believe that video, naturally coupled with audio, serves as the essential bridge for understanding the world and grounding foundation models in real-world dynamics.

Beyond academia, I am passionate about basketball, tennis, and regular gym workouts. Sports define my mindset.

News & Announcements

  • 05/2026 One paper (OmniAgent) accepted to ICML 2026!
  • 03/2026 Say hello to the new Qwen3.5-Omni.
  • 01/2026 One paper accepted to ICLR 2026.
  • 01/2026 One paper accepted to IJCV.
  • 09/2025 Qwen3-Omni released. Check it out.
  • 09/2025 One paper accepted to NeurIPS 2025.
  • 02/2025 One paper accepted to CVPR 2025.
  • 09/2024 One paper accepted to IEEE TIP.
  • 07/2024 I finally got into the dormitory from the waiting list!

Selected Publications

* marks joint first authors. Includes peer-reviewed publications and technical reports.

  • Native Active Perception as Reasoning for Omni-Modal Understanding
    Zhenghao Xing*, Ruiyang Xu*, Yuxuan Wang*, Jinzheng He, Ziyang Ma, Qize Yang, Yunfei Chu, Jin Xu, Junyang Lin, Chi-Wing Fu, and Pheng-Ann Heng.
    International Conference on Machine Learning (ICML), 2026.
    [Conference]
  • Qwen3.5-Omni Technical Report
    Qwen Team
    arXiv Tech Report, 2026.
    [Technical Report] [arXiv] [Qwen Chat]
  • Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
    Ziyang Ma*, Ruiyang Xu*, Zhenghao Xing*, Yunfei Chu, Yuxuan Wang, Jinzheng He, Jin Xu, Pheng-Ann Heng, Kai Yu, Junyang Lin, Eng Siong Chng, and Xie Chen.
    The Fourteenth International Conference on Learning Representations (ICLR), 2026.
    [Conference] [arXiv] [Code]
  • Qwen3-Omni Technical Report
    Qwen Team
    arXiv Tech Report, 2025.
    [Technical Report] [arXiv] [Code] [Qwen Chat]
  • EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
    Zhenghao Xing, Xiaowei Hu, Chi-Wing Fu, Wenhai Wang, Jifeng Dai, and Pheng-Ann Heng.
    arXiv Tech Report, 2025.
    [Technical Report] [arXiv] [Code]
  • EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights
    Zhenghao Xing*, Hao Chen*, Binzhu Xie, Jiaqi Xu, Ziyu Guo, Xuemiao Xu, Jianye Hao, Chi-Wing Fu, Xiaowei Hu, and Pheng-Ann Heng.
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
    [Conference] [Paper] [Code]
  • Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era
    Xiaowei Hu*, Zhenghao Xing*, Tianyu Wang, Chi-Wing Fu, and Pheng-Ann Heng.
    International Journal of Computer Vision (IJCV), 2026.
    [Journal] [Paper] [arXiv] [Code]
  • Video Instance Shadow Detection Under the Sun and Sky
    Zhenghao Xing*, Tianyu Wang*, Xiaowei Hu, Haoran Wu, Chi-Wing Fu, and Pheng-Ann Heng.
    IEEE Transactions on Image Processing (IEEE TIP), 2024.
    [Journal] [Paper] [arXiv] [Code]

Academic Service

Reviewer: CVPR 2026, ICML 2026 (Silver Reviewer), ECCV 2026.