Shiyu Huang   黄世宇

Researcher, XPENG

Haidian District, Beijing, China, 100084.
Email: huangsy1314@163.com

[OpenRL]     [知乎]     [Google Scholar]     [GitHub]     [Linkedin]     [CV]


Biography

I am a researcher at XPeng Inc. Before that, I worked as a researcher at Zhipu AI and 4Paradigm Inc. I received my B.E. and Ph.D. degrees (co-advised by Prof. Jun Zhu and Prof. Ting Chen) from the Department of Computer Science and Technology, Tsinghua University, in July 2017 and June 2022, respectively. My research focuses on deep reinforcement learning, multi-agent reinforcement learning, distributed reinforcement learning, RL for robotics, LLMs as agents, artificial general intelligence (AGI), and generative artificial intelligence (GAI). I have also spent time at RealAI Inc., Huawei Noah's Ark Lab, Tencent AI Lab, Carnegie Mellon University, and SenseTime Inc. I am also the founder and leader of the OpenRL Lab and the founder of the TARTRL group.

We are looking for self-motivated interns and full-time researchers who have a strong background in mathematics or computer science and are eager to work on cutting-edge, fundamental AI research. Please feel free to email me if you are interested in collaborating.

Publications & Preprints

(* equal contribution)

Highlight
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, Shuaiqi Duan, Weihan Wang, Yan Wang, Yean Cheng, Zehai He, Zhe Su, Zhen Yang, Ziyang Pan, Aohan Zeng, Baoxu Wang, Boyan Shi, Changyu Pang, Chenhui Zhang, Da Yin, Fan Yang, Guoqing Chen, Jiazheng Xu, Jiali Chen, Jing Chen, Jinhao Chen, Jinghao Lin, Jinjiang Wang, Junjie Chen, Leqi Lei, Letian Gong, Leyi Pan, Mingzhi Zhang, Qinkai Zheng, Sheng Yang, Shi Zhong, Shiyu Huang, Shuyuan Zhao, Siyan Xue, Shangqin Tu, Shengbiao Meng, Tianshu Zhang, Tianwei Luo, Tianxiang Hao, Wenkai Li, Wei Jia, Xin Lyu, Xuancheng Huang, Yanling Wang, Yadong Xue, Yanfeng Wang, Yifan An, Yifan Du, Yiming Shi, Yiheng Huang, Yilin Niu, Yuan Wang, Yuanchang Yue, Yuchen Li, Yutao Zhang, Yuxuan Zhang, Zhanxiao Du, Zhenyu Hou, Zhao Xue, Zhengxiao Du, Zihan Wang, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Minlie Huang, Yuxiao Dong, Jie Tang
arXiv:2507.01006, 2025
[PDF] [Code] [BibTeX]
@article{hong2025glm, title={GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning}, author={Hong, Wenyi and Yu, Wenmeng and Gu, Xiaotao and Wang, Guo and Gan, Guobing and Tang, Haomiao and Cheng, Jiale and Qi, Ji and Ji, Junhui and Pan, Lihang and others}, journal={arXiv preprint arXiv:2507.01006}, year={2025} }
Highlight
Generalizable Agent Modeling for Agent Collaboration-Competition Adaptation with Multi-Retrieval and Dynamic Generation
Chenxu Wang, Yonggang Jin, Cheng Hu, Youpeng Zhao, Zipeng Dai, Jian Zhao, Shiyu Huang, Liuyu Xiang, Junge Zhang, Zhaofeng He
Neurocomputing (2025): 130912
[PDF] [Code] [BibTeX]
@article{wang2025generalizable, title={Generalizable Agent Modeling for Agent Collaboration-Competition Adaptation with Multi-Retrieval and Dynamic Generation}, author={Wang, Chenxu and Jin, Yonggang and Hu, Cheng and Zhao, Youpeng and Dai, Zipeng and Zhao, Jian and Huang, Shiyu and Xiang, Liuyu and Zhang, Junge and He, Zhaofeng}, journal={Neurocomputing}, pages={130912}, year={2025}, publisher={Elsevier} }
Highlight
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Wenyi Hong*, Yean Cheng*, Zhuoyi Yang, Weihan Wang, Lefan Wang, Xiaotao Gu, Shiyu Huang, Yuxiao Dong, Jie Tang
The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
[Blog] [PDF] [Code] [Dataset] [Leaderboard] [BibTeX]
@article{hong2025motionbench, title={MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models}, author={Hong, Wenyi and Cheng, Yean and Yang, Zhuoyi and Wang, Weihan and Wang, Lefan and Gu, Xiaotao and Huang, Shiyu and Dong, Yuxiao and Tang, Jie}, journal={arXiv preprint arXiv:2501.02955}, year={2025} }
Highlight
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
Jiazheng Xu*, Yu Huang*, Jiale Cheng, Yuanming Yang, Jiajun Xu, Yuan Wang, Wenbo Duan, Shen Yang, Qunlin Jin, Shurun Li, Jiayan Teng, Zhuoyi Yang, Wendi Zheng, Xiao Liu, Ming Ding, Xiaohan Zhang, Xiaotao Gu, Shiyu Huang, Minlie Huang, Jie Tang, Yuxiao Dong
arXiv:2412.21059, 2024
[PDF] [Code] [Huggingface] [BibTeX]
@article{xu2024visionreward, title={VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation}, author={Xu, Jiazheng and Huang, Yu and Cheng, Jiale and Yang, Yuanming and Xu, Jiajun and Wang, Yuan and Duan, Wenbo and Yang, Shen and Jin, Qunlin and Li, Shurun and others}, journal={arXiv preprint arXiv:2412.21059}, year={2024} }
Highlight
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Junzhe Chen*, Tianshu Zhang*, Shiyu Huang, Yuwei Niu, Linfeng Zhang, Lijie Wen, Xuming Hu
The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
[PDF] [BibTeX]
@article{chen2024ict, title={ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models}, author={Chen, Junzhe and Zhang, Tianshu and Huang, Shiyu and Niu, Yuwei and Zhang, Linfeng and Wen, Lijie and Hu, Xuming}, journal={arXiv preprint arXiv:2411.15268}, year={2024} }
Highlight
DreamPolish: Domain Score Distillation With Progressive Geometry Generation
Yean Cheng*, Ziqi Cai*, Ming Ding, Wendi Zheng, Shiyu Huang, Yuxiao Dong, Jie Tang, Boxin Shi
arXiv:2411.01602, 2024
[PDF] [BibTeX]
@article{cheng2024dreampolish, title={DreamPolish: Domain Score Distillation With Progressive Geometry Generation}, author={Cheng, Yean and Cai, Ziqi and Ding, Ming and Zheng, Wendi and Huang, Shiyu and Dong, Yuxiao and Tang, Jie and Shi, Boxin}, journal={arXiv preprint arXiv:2411.01602}, year={2024} }
Highlight
CogVLM2: Visual Language Models for Image and Video Understanding
Wenyi Hong, Weihan Wang, Ming Ding, Wenmeng Yu, Qingsong Lv, Yan Wang, Yean Cheng, Shiyu Huang, Junhui Ji, Zhao Xue, Lei Zhao, Zhuoyi Yang, Xiaotao Gu, Xiaohan Zhang, Guanyu Feng, Da Yin, Zihan Wang, Ji Qi, Xixuan Song, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Yuxiao Dong, Jie Tang
arXiv:2408.16500, 2024
[PDF] [Code] [Huggingface] [BibTeX]
@article{hong2024cogvlm2, title={CogVLM2: Visual Language Models for Image and Video Understanding}, author={Hong, Wenyi and Wang, Weihan and Ding, Ming and Yu, Wenmeng and Lv, Qingsong and Wang, Yan and Cheng, Yean and Huang, Shiyu and Ji, Junhui and Xue, Zhao and others}, journal={arXiv preprint arXiv:2408.16500}, year={2024} }
Highlight
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang*, Jiayan Teng*, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Xiaohan Zhang, Xiaotao Gu, Guanyu Feng, Da Yin, Wenyi Hong, Weihan Wang, Yean Cheng, Yuxuan Zhang, Ting Liu, Bin Xu, Yuxiao Dong, Jie Tang
The Thirteenth International Conference on Learning Representations (ICLR), 2025
[PDF] [Code] [Huggingface] [BibTeX]
@article{yang2024cogvideox, title={CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer}, author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others}, journal={arXiv preprint arXiv:2408.06072}, year={2024} }
Highlight
LVBench: An Extreme Long Video Understanding Benchmark
Weihan Wang, Zehai He, Wenyi Hong, Yean Cheng, Xiaohan Zhang, Ji Qi, Shiyu Huang, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang
International Conference on Computer Vision (ICCV), 2025
[PDF] [Project] [Code] [Huggingface] [BibTeX]
@misc{wang2024lvbench, title={LVBench: An Extreme Long Video Understanding Benchmark}, author={Weihan Wang and Zehai He and Wenyi Hong and Yean Cheng and Xiaohan Zhang and Ji Qi and Shiyu Huang and Bin Xu and Yuxiao Dong and Ming Ding and Jie Tang}, year={2024}, eprint={2406.08035}, archivePrefix={arXiv}, primaryClass={cs.CV} }

Talks

Projects

Patents

Honors & Awards

Competitions

Services

Organizer for:
NeurIPS 2023 Workshop on New in ML

Reviewer for:
NeurIPS 2025, ICCV 2025, ICML 2025, CVPR 2025, ICLR 2025, AAAI 2025, NeurIPS 2024, ICML 2024, ICLR 2024, AAAI 2024, NeurIPS 2023, AISTATS 2023, AAAI 2023, ICLR 2023, NeurIPS 2022, ICML 2022, AISTATS 2022, AAAI 2022, ICLR 2022, NeurIPS 2021, ICML 2021, AAAI 2021, NeurIPS 2020

Teaching

Spring 2020, TA for Big Data and Machine Intelligence, instructed by Zhen Chen
Fall 2019, TA for Big Data and Machine Intelligence, instructed by Zhen Chen
Spring 2019, TA for Machine Learning, instructed by Prof. Jun Zhu