Hongyao Tang

Associate Research Fellow @ Tianjin University

📧 tanghyyy [at] gmail.com | Google Scholar | GitHub | Xiaohongshu

"Lesson 5: The shortest shortcut is the long way around.
The long way around is my shortest shortcut."


Personal Information

Hello, I am Hongyao Tang, 汤宏垚. I am an Associate Research Fellow at TJU RL Lab, College of Intelligence and Computing, Tianjin University.

Prior to this, I was a postdoctoral researcher with Professor Glen Berseth in the Robotics and Embodied AI Lab (REAL) at Mila and Université de Montréal. I obtained my Ph.D. (as well as my Master's and Bachelor's degrees) at Tianjin University, working in the Deep Reinforcement Learning (DRL) Lab under the supervision of Professors Jianye Hao, Zhaopeng Meng, and Li Wang.

My research interests lie in uncovering the learning dynamics of DRL and developing new approaches and paradigms for efficient, performant, and generalizable agents. My current focus is Learning under Nonstationarity (an intriguing, intrinsic property of RL); concretely, I study Continual RL as well as RL problems in LLMs and embodied intelligence. I am also interested in Meta RL, MARL, Offline RL, Machine Unlearning, etc.

I have experience applying DRL to practical problems such as Electronic Design Automation (EDA), drug discovery, and online games, and I am eager to contribute to solving real-world problems.


⚡️ Highlight

Prospective students (Master's, Ph.D., RA, etc.) and potential collaborators are welcome to contact me by email (above) or through the TJU RL Lab homepage.


🌟 News


🛣 Education & Work Experiences

2025.01 - Present Associate Research Fellow TJU RL Lab, College of Intelligence and Computing, Tianjin University
2023.11 - 2024.12 Postdoctoral Research Fellow Robotics and Embodied AI Lab (REAL), Mila/UdeM (working with Glen Berseth; co-authoring with Johan Obando-Ceron, Pablo Samuel Castro, and Aaron Courville)
2019.09 - 2023.06 Ph.D. College of Intelligence and Computing, Tianjin University (advised by Jianye Hao and Zhaopeng Meng)
2020.05 - 2023.04 DRL Research Intern Noah's Ark Lab, Huawei (working with Chen Chen and Zhentao Tang)
2019.09 - 2020.04 AI Research Intern Quantum Lab, Tencent (working with Guangyong Chen)
2018.07 - 2018.10 DRL Research Intern Fuxi AI Lab, NetEase (working with Tangjie Lv)
2017.09 - 2019.07 Master College of Intelligence and Computing, Tianjin University (advised by Jianye Hao and Li Wang)
2013.09 - 2017.07 Bachelor School of Software Engineering, Tianjin University

🧠 Selected Publications & Preprints

PS: * denotes equal contribution and 📮 denotes the corresponding author.

Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn
Hongyao Tang, Glen Berseth
NeurIPS 2024 | [Paper] [Code]
⚡️TL;DR: identifies the chain effect of churn in DRL; reducing it mitigates learning issues in DQN/PPO/SAC, leading to better stability and performance

Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
Hongyao Tang📮, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Glen Berseth
ICML 2025 | [Paper] [Code]
⚡️TL;DR: reveals the relationship between churn and plasticity loss; continually reducing churn mitigates plasticity loss and achieves the best overall performance across 24 environments

ScaleMoE: Mixture-of-Experts for Scalable Continuous Control in Actor-Critic Reinforcement Learning
Yi Ma, Chenjun Xiao, Hongyao Tang📮, Yaodong Yang, Jinyi Liu, Jing Liang, Jiye Liang
ICML 2026 (Spotlight💎) | [Paper]
⚡️TL;DR: proposes a new way to scale deep actor-critic (AC) methods with MoE architectures; outperforms SimBa and BRC in single-task and multi-task RL, respectively

Neuro-evolutionary Continual Reinforcement Learning
Pengyi Li, Hongyao Tang📮, Yifu Yuan, Yan Zheng, Jianye Hao
ICML 2026 (Spotlight💎) | [Paper]
⚡️TL;DR: dynamically allocates, reuses, and recycles neurons in a monolithic model; guarantees no forgetting and achieves SOTA results on ContinualWorld

Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation
Yifu Yuan, Haiqin Cui, Yaoting Huang, Yibin Chen, Fei Ni, Zibin Dong, Pengyi Li, Yan Zheng, Hongyao Tang, Jianye Hao
ICLR 2026 | [Paper] [Project Page]
⚡️TL;DR: proposes a 3B VLM called ER-1 that leverages pointing to bridge the "seeing to doing" gap; achieves SOTA performance on 11 embodied spatial and pointing benchmarks, plus zero-shot generalization in SIMPLEREnv and real-world XArm tasks

PACE: Unleashing the Power of Code Embeddings to Boost AutoML Agents
Gangyi Zhao, Hebin Liang, Hongyao Tang📮, Yi Ma, Jinyi Liu, Zhaocheng Du, Yan Zheng, Chenjun Xiao, Jianye Hao
KDD 2026 | [Paper]
⚡️TL;DR: identifies structural locality in the multi-view, low-dimensional representation space of code solutions and leverages it to build admission strategies that enable more efficient AutoML

The Ladder in Chaos: Improving Policy Learning by Harnessing the Parameter Evolving Path in A Low-dimensional Space
Hongyao Tang, Min Zhang, Chen Chen, Jianye Hao
NeurIPS 2024 | [Paper]
⚡️TL;DR: identifies, via temporal SVD, the low-rank parameter space in which policies evolve; improves learning stability and efficiency by directly editing the SVD coefficients

Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model
Jing Liang*, Jinyi Liu*, Yi Ma*, Hongyao Tang📮, Yan Zheng, Shuyue Hu, Lei Bai, Jianye Hao
ICLR 2026 | [Paper] [Project Page]
⚡️TL;DR: revisits Generalized PPO and extends it to RL post-training of LLMs; achieves SOTA-level math reasoning results with a 30x-450x🚀 reduction in rollout cost

The Rank and Gradient Lost in Non-stationarity: Sample Weight Decay for Mitigating Plasticity Loss in Reinforcement Learning
Zihao Wu, Hongyao Tang📮, Yi Ma, Jiashun Liu, Yan Zheng, Jianye Hao
ICLR 2026 | [Paper]
⚡️TL;DR: theoretically analyzes learning dynamics under non-stationarity and proposes a simple plug-and-play remedy for DRL algorithms that improves TD3/SAC with SimBa on DMC

Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies
Yi Ma*, Hongyao Tang*, Chenjun Xiao, Yaodong Yang, Wei Wei, Jianye Hao, Jiye Liang
arXiv preprint 2025 | [Paper]
⚡️TL;DR: proposes a new taxonomy of RL scaling research from four perspectives: data, network, training, and priors

Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes
Min Zhang, Hongyao Tang📮, Jianye Hao, Yan Zheng
AAMAS 2026 (Full Paper, Oral🌈) | [Paper]
⚡️TL;DR: as the title suggests, proposes for the first time a systematic definition and hierarchy of policy abstraction and representation, a salute to Li et al. (2006)

Efficient Morphology-Aware Policy Transfer to New Embodiments
Michael Przystupa, Hongyao Tang, Glen Berseth, Mariano Phielipp, Santiago Miret, Martin Jägersand, Matthew E. Taylor
RLC 2025 | [Paper]
⚡️TL;DR: investigates different parameter-efficient fine-tuning (PEFT) methods for multi-task locomotion policies of robots

Can We Optimize Deep RL Policy Weights as Trajectory Modeling?
Hongyao Tang
ICLR 2025 Workshop on Weight Space Learning | [Paper]
⚡️TL;DR: since DRL policies evolve in a low-dimensional space, policy optimization can be cast as a trajectory prediction problem over this low-dimensional representation

EvoRainbow: Combining Improvements in Evolutionary Reinforcement Learning for Policy Search
Pengyi Li, Jianye Hao, Hongyao Tang, Xian Fu, Yan Zheng
ICML 2024 | [Paper] [Code]
⚡️TL;DR: in the spirit of Rainbow(-DQN), a high-performance ERL algorithm together with a comprehensive comparison of algorithmic choices

Reining Generalization in Offline Reinforcement Learning via Representation Distinction
Yi Ma, Hongyao Tang📮, Dong Li, Zhaopeng Meng
NeurIPS 2023 | [Paper]
⚡️TL;DR: curbs unrestricted generalization from in-distribution (ID) to out-of-distribution (OOD) data, simplifying algorithmic design while achieving good offline RL performance

HyAR: Addressing Discrete-Continuous Action Reinforcement Learning via Hybrid Action Representation
Boyan Li*, Hongyao Tang*, Yan Zheng, Jianye Hao, Pengyi Li, Zhen Wang, Zhaopeng Meng, Li Wang
ICLR 2022 | [Paper] [Code]
⚡️TL;DR: learns a unified self-supervised representation for hybrid actions, converting hybrid-action RL back into standard RL and thus effectively solving high-dimensional hybrid-action problems

What about Inputting Policy in Value Function: Policy Representation and Policy-Extended Value Function Approximator
Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Changmin Yu, Hangyu Mao, Wulong Liu, Yaodong Yang, Wenyuan Tao, Li Wang
AAAI 2022 Oral🌈 Presentation (< 5%) | [Paper] [Code]
⚡️TL;DR: proposes a new RL paradigm that enables value generalization across policies and uses it to improve learning efficiency; learns low-dimensional representations for neural-network policies

More Publications


Academic Service

Reviewer

RLC: 2024 - 2026
NeurIPS: 2021 - 2025 (Top Reviewer Award at NeurIPS 2022)
ICLR: 2022 - 2026 (Highlighted Reviewer Award at ICLR 2022)
ICML: 2021 - 2026
AAMAS: 2021 - 2024
Nature Communications
Transactions on Machine Learning Research (TMLR)
IEEE Transactions on Neural Networks and Learning Systems (TNNLS)


Invited Talks

2025.03

RL in the Era of Large Models
2025 IEEE International Conference on Industrial Technology

2024.10

Where is the Road to Flawless RL? — Unsolved Problems and New Approaches
Google DeepMind, Discovery Team London (Remote)

2023.05

A Tale of Representations in Deep Reinforcement Learning
Robotics and Embodied AI Lab (REAL), Université de Montréal

2022.07

Towards Understanding the Learning Dynamics of Deep Reinforcement Learning
Huawei Noah's Ark Lab, Decision-making and Reasoning Group (during internship)

2021.10

Self-supervised Reinforcement Learning — A Perspective of Representation
2021 TJU RL Summer Seminar

2020.11

State Abstraction and State Representation Learning in Reinforcement Learning
Huawei Noah's Ark Lab, Decision-making and Reasoning Group (during internship)

2019.08

Bias and Variance in Deep Reinforcement Learning
2019 TJU RL Summer Seminar


Updated by Hongyao Tang, May 2026.