I am a third-year Ph.D. student at The Chinese University of Hong Kong, advised by
Prof. Pheng-Ann Heng and
Prof. Chi-Wing Fu.
Prior to my Ph.D. studies, I worked as an algorithm engineer at Tencent AI Lab, where I developed RL-based game
agents and designed diverse reward strategies to encourage varied playstyles.
I hold an M.Sc. in Big Data Technology from The Hong Kong University of Science and
Technology and a B.Sc. (Hons) in Computer Science and Technology from Beijing Normal–Hong Kong Baptist University.
My research focuses on multimodal understanding and agentic reasoning. I explore native active perception as a bridge between passive audio-visual understanding and agentic decision-making in foundation models.
Currently, I am putting this vision into practice as a Research Intern at Alibaba Qwen, where I design agentic RL frameworks, algorithms, and data pipelines that strengthen the complex audio-visual reasoning capabilities of foundation models.
My long-term goal is to teach machines to perceive and interact like humans. I believe that video, naturally coupled with audio, is the essential bridge for understanding the world and for grounding foundation models in real-world dynamics.
Beyond academia, I am passionate about basketball, tennis, and regular gym workouts; sports shape my mindset.
zhxing [at] cse.cuhk.edu.hk / harryhsing [at] outlook.com
Room 404, Academic Building 1
The Chinese University of Hong Kong
Shatin, New Territories
Hong Kong