Hello, I am Hongyao Tang, 汤宏垚. I am an Associate Research Fellow at TJU RL Lab, College of Intelligence and Computing, Tianjin University.
Prior to this, I was a postdoctoral researcher with Professor Glen Berseth in the Robotics and Embodied AI Lab (REAL) at Mila and Université de Montréal. I obtained my Ph.D. (as well as my Master's and Bachelor's degrees) at the Deep Reinforcement Learning (DRL) Lab, Tianjin University, advised by Professors Jianye Hao, Zhaopeng Meng, and Li Wang.
My research interests lie in unveiling the learning dynamics of Deep Reinforcement Learning (DRL) and realizing new approaches and paradigms for efficient, performant, and generalizable agents. My current research focus is Learning under Nonstationarity (one intriguing property of RL); concretely, I study Continual RL and RL problems in LLMs and embodied intelligence. I am also interested in Meta RL, MARL, Offline RL, Machine Unlearning, etc.
I have experience applying DRL to practical problems such as Electronic Design Automation (EDA), drug discovery, and online games. I am eager to contribute to solving real-world problems.
Prospective students (Master's, Ph.D., RA, etc.) and collaborators are welcome to contact me by email (above) or through the TJU RL Lab homepage.
PS: Authors with equal contribution are marked by *. Corresponding authors are marked by 📮.
Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn
Hongyao Tang, Glen Berseth
NeurIPS 2024
| [Paper] [Code]
⚡️TL;DR: identifies the chain effect of churn in DRL; reducing it mitigates learning issues in DQN/PPO/SAC and leads to better stability and performance
Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
Hongyao Tang📮, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Glen Berseth
ICML 2025
| [Paper] [Code]
⚡️TL;DR: reveals the relationship between churn and plasticity loss; continually reducing churn mitigates plasticity loss and achieves the best overall performance across 24 environments
ScaleMoE: Mixture-of-Experts for Scalable Continuous Control in Actor-Critic Reinforcement Learning
Yi Ma, Chenjun Xiao, Hongyao Tang📮, Yaodong Yang, Jinyi Liu, Jing Liang, Jiye Liang
ICML 2026 (Spotlight💎)
| [Paper]
⚡️TL;DR: proposes a new way to scale deep AC methods with MoE architectures, outperforms SimBa and BRC in single-task RL and multi-task RL, respectively
Neuro-evolutionary Continual Reinforcement Learning
Pengyi Li, Hongyao Tang📮, Yifu Yuan, Yan Zheng, Jianye Hao
ICML 2026 (Spotlight💎)
| [Paper]
⚡️TL;DR: dynamically allocates, reuses, and recycles neurons in a monolithic model, guarantees no forgetting, and achieves SOTA results on ContinualWorld
Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation
Yifu Yuan, Haiqin Cui, Yaoting Huang, Yibin Chen, Fei Ni, Zibin Dong, Pengyi Li, Yan Zheng, Hongyao Tang, Jianye Hao
ICLR 2026
| [Paper] [Project Page]
⚡️TL;DR: proposes a 3B VLM called ER-1 that leverages pointing to bridge the "seeing to doing" gap; achieves SOTA performance on 11 embodied spatial and pointing benchmarks, plus zero-shot generalization in SIMPLEREnv and real-world XArm tasks
PACE: Unleashing the Power of Code Embeddings to Boost AutoML Agents
Gangyi Zhao, Hebin Liang, Hongyao Tang📮, Yi Ma, Jinyi Liu, Zhaocheng Du, Yan Zheng, Chenjun Xiao, Jianye Hao
KDD 2026
| [Paper]
⚡️TL;DR: identifies the structural locality in the multi-view low-dimensional representation space of code solutions and leverages it to build admission strategies that enable more efficient AutoML
The Ladder in Chaos: Improving Policy Learning by Harnessing the Parameter Evolving Path in A Low-dimensional Space
Hongyao Tang, Min Zhang, Chen Chen, Jianye Hao
NeurIPS 2024
| [Paper]
⚡️TL;DR: identifies the low-rank parameter space of policy evolution via temporal SVD; improves learning stability and efficiency by directly editing the SVD coefficients
Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model
Jing Liang*, Jinyi Liu*, Yi Ma*, Hongyao Tang📮, Yan Zheng, Shuyue Hu, Lei Bai, Jianye Hao
ICLR 2026
| [Paper] [Project Page]
⚡️TL;DR: revisits Generalized PPO and extends it to LLM RL post-training, achieving SOTA-level math reasoning results with a 30x-450x🚀 reduction in rollout cost
The Rank and Gradient Lost in Non-stationarity: Sample Weight Decay for Mitigating Plasticity Loss in Reinforcement Learning
Zihao Wu, Hongyao Tang📮, Yi Ma, Jiashun Liu, Yan Zheng, Jianye Hao
ICLR 2026
| [Paper]
⚡️TL;DR: theoretically analyzes the learning dynamics under non-stationarity and proposes a simple plug-and-play remedy for DRL algorithms that improves TD3/SAC with SimBa on DMC
Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies
Yi Ma*, Hongyao Tang*, Chenjun Xiao, Yaodong Yang, Wei Wei, Jianye Hao, Jiye Liang
arXiv preprint 2025
| [Paper]
⚡️TL;DR: proposes a new taxonomy for RL scaling research, from four perspectives: data, network, training, and priors
Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes
Min Zhang, Hongyao Tang📮, Jianye Hao, Yan Zheng
AAMAS 2026 (Full Paper, Oral🌈)
| [Paper]
⚡️TL;DR: as the title suggests, proposes for the first time a systematic definition and hierarchy of policy abstraction and representation, a salute to Li et al. (2006)
Efficient Morphology-Aware Policy Transfer to New Embodiments
Michael Przystupa, Hongyao Tang, Glen Berseth, Mariano Phielipp, Santiago Miret, Martin Jägersand, Matthew E. Taylor
RLC 2025
| [Paper]
⚡️TL;DR: investigates different parameter-efficient fine-tuning (PEFT) methods for multi-task policies of locomotion robots
Can We Optimize Deep RL Policy Weights as Trajectory Modeling?
Hongyao Tang
ICLR 2025 Workshop on Weight Space Learning
| [Paper]
⚡️TL;DR: since DRL policies evolve in a low-dimensional space, we can use the low-dimensional representation to convert policy optimization into a trajectory prediction problem
EvoRainbow: Combining Improvements in Evolutionary Reinforcement Learning for Policy Search
Pengyi Li, Jianye Hao, Hongyao Tang, Xian Fu, Yan Zheng
ICML 2024
| [Paper] [Code]
⚡️TL;DR: just like Rainbow (DQN), this is a high-performance ERL algorithm, accompanied by a comprehensive comparison of algorithmic choices
Reining Generalization in Offline Reinforcement Learning via Representation Distinction
Yi Ma, Hongyao Tang📮, Dong Li, Zhaopeng Meng
NeurIPS 2023
| [Paper]
⚡️TL;DR: cuts off unrestricted generalization from in-distribution (ID) to out-of-distribution (OOD) data, which simplifies algorithm design while achieving good performance in offline RL
HyAR: Addressing Discrete-Continuous Action Reinforcement Learning via Hybrid Action Representation
Boyan Li*, Hongyao Tang*, Yan Zheng, Jianye Hao, Pengyi Li, Zhen Wang, Zhaopeng Meng, Li Wang
ICLR 2022
| [Paper] [Code]
⚡️TL;DR: learns a unified self-supervised representation for hybrid actions and converts hybrid-action RL back to standard RL, thus effectively solving high-dimensional hybrid-action RL problems
What about Inputting Policy in Value Function: Policy Representation and Policy-Extended Value Function Approximator
Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Changmin Yu, Hangyu Mao, Wulong Liu, Yaodong Yang, Wenyuan Tao, Li Wang
AAAI 2022 Oral🌈 Presentation (< 5%)
| [Paper] [Code]
⚡️TL;DR: proposes a new RL paradigm that enables value generalization among policies and uses it to improve learning efficiency; learns low-dimensional representations of NN policies
RLC
2024 - 2026
NeurIPS
2021 - 2025 (Top Reviewer Award at NeurIPS 2022)
ICLR
2022 - 2026 (Highlighted Reviewer Award at ICLR 2022)
ICML
2021 - 2026
AAMAS
2021 - 2024
Nature Communications
Transactions on Machine Learning Research (TMLR)
IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
2025.03
RL in the Era of Large Models
2025 IEEE International Conference on Industrial Technology
2024.10
Where is the Road to Flawless RL? — Unsolved Problems and New Approaches
Google DeepMind, Discovery Team London (Remote)
2023.05
A Tale of Representations in Deep Reinforcement Learning
Robotics and Embodied AI Lab (REAL), Université de Montréal
2022.07
Towards Understanding The Learning Dynamics of Deep Reinforcement Learning
Huawei Noah's Ark Lab, Decision-making and Reasoning Group (during internship)
2021.10
Self-supervised Reinforcement Learning — A Perspective of Representation
2021 TJU RL Summer Seminar
2020.11
State Abstraction and State Representation Learning in Reinforcement Learning
Huawei Noah's Ark Lab, Decision-making and Reasoning Group (during internship)
2019.08
Bias and Variance in Deep Reinforcement Learning
2019 TJU RL Summer Seminar