Hongyao Tang

Associate Research Fellow @ Tianjin University

📧 tanghyyy [at] gmail.com

"Lesson 5: The shortest shortcut is the long way around.
The long way around is my shortest shortcut."

Personal Information

Hello, I am Hongyao Tang, 汤宏垚. I am an Associate Research Fellow at TJU RL Lab, College of Intelligence and Computing, Tianjin University.

Prior to this, I was a postdoctoral researcher with Professor Glen Berseth in the Robotics and Embodied AI Lab (REAL) at Mila and Université de Montréal. I obtained my Ph.D., Master's, and Bachelor's degrees from Tianjin University, where I worked in the Deep Reinforcement Learning (DRL) Lab advised by Professors Jianye Hao, Zhaopeng Meng, and Li Wang.

My research interests lie in unveiling the learning dynamics of Deep Reinforcement Learning (DRL) and developing new approaches and paradigms for efficient, performant, and generalizable agents. My current research focuses on Learning under Nonstationarity, one of the intriguing properties of RL; concretely, I study Continual RL and RL problems in LLMs and embodied intelligence. I am also interested in Meta RL, MARL, Offline RL, Machine Unlearning, etc.

I have experience applying DRL to practical problems such as Electronic Design Automation (EDA), drug discovery, and online games. I am keen to contribute to solving real-world problems.

⚡️ Highlight

Prospective students (Master's, Ph.D., research assistants, etc.) and potential collaborators are welcome to contact me via the email above or through the TJU RL Lab homepage 🚗🚗🚗

🌟 News

2026.07 ACM MM 2026 One paper accepted to ACM MM 2026 on grounded embodied reasoning via pinned chain-of-thought!
2026.06 Embodied-R1.5 We released Embodied-R1.5🤖🧠, an 8B embodied foundation model with comprehensive embodied reasoning capabilities. It achieves SOTA performance on 16 of 24 embodied VLM benchmarks and can be fine-tuned into a VLA that outperforms π_0.5 across 4 popular manipulation benchmark suites!
2026.05 KDD 2026 One paper accepted to KDD 2026 on Code Representation for AutoML!
2026.04 ICML 2026 Two papers accepted to ICML 2026 as Spotlights💎, on scaling DRL with MoE architectures and on an efficient non-forgetting network for continual RL!
2026.04 中国科学基金 Our survey report on embodied intelligence🤖 has been published in the Bulletin of National Natural Science Foundation of China (《中国科学基金》)!
2026.01 ICLR 2026 Three papers accepted to ICLR 2026 on off-policy RFT for LLM reasoning, plasticity loss in RL, and reinforced embodied reasoning for general robotic manipulation!
2025.12 AAMAS 2026 One paper accepted to AAMAS 2026 as an Oral🌈 on a unified theory of policy abstraction and representation!
2025.09 NeurIPS 2025 Four papers accepted to NeurIPS 2025 on LLM-driven reward function design, multi-objective RL, offline RL, and EDA floorplanning!
2025.05 RLC 2025 One paper accepted to RLC 2025 on efficient cross-morphology adaptation!
2025.05 ICML 2025 Two papers accepted to ICML 2025 on continual RL and LLM-driven reward design!
2025.01 Faculty Pos. Excited to join TJU RL Lab as an Associate Research Fellow!
2024.12 AAMAS 2025 One paper accepted to AAMAS 2025 on multi-agent RL with better sample efficiency!
2024.09 NeurIPS 2024 Two papers accepted to NeurIPS 2024 on network churn and learning dynamics of DRL!
2023.11 PostDoc Excited to join Mila/UdeM as a Postdoctoral Research Fellow! Working with Prof. Glen Berseth.
2023.06 Ph.D. Got my Ph.D. at Tianjin University! Grateful to Prof. Jianye Hao and Prof. Zhaopeng Meng.

🛣 Education & Work Experiences

2025.01 - Present Associate Research Fellow TJU RL Lab, College of Intelligence and Computing, Tianjin University

2023.11 - 2024.12 Postdoctoral Research Fellow Robotics and Embodied AI Lab (REAL), Mila/UdeM (work with Glen Berseth; co-authors include Johan Obando-Ceron, Pablo Samuel Castro, and Aaron Courville)

2019.09 - 2023.06 Ph.D. College of Intelligence and Computing, Tianjin University (advised by Jianye Hao and Zhaopeng Meng)

2020.05 - 2023.04 DRL Research Intern Noah's Ark Lab, Huawei (work with Chen Chen and Zhentao Tang)

2019.09 - 2020.04 AI Research Intern Quantum Lab, Tencent (work with Guangyong Chen)

2018.07 - 2018.10 DRL Research Intern Fuxi AI Lab, NetEase (work with Tangjie Lv)

2017.09 - 2019.07 Master College of Intelligence and Computing, Tianjin University (advised by Jianye Hao and Li Wang)

2013.09 - 2017.07 Bachelor School of Software Engineering, Tianjin University

🧠 Selected Publications & Preprints

PS: * denotes equal contribution. 🔥 marks papers where I am the first or co-first author. 💧 marks papers where I am a corresponding author.

Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models
Yifu Yuan, Yaoting Huang, Xianze Yao, Yutong Li, Shuoheng Zhang, Linqi Han, Pengyi Li, Jiangeng Sun, Wenting Jia, Zhao Zhang, Yuhao Liu, Ruihao Liao, Yucheng Hu, Qiyu Wu, Yuxiao Li, Zibin Dong, Fei Ni, Yan Zheng, Shuyang Gu, Yi Ma, Hongyao Tang^💧, Han Hu, Jianye Hao
arXiv preprint 2026 | [Paper] [Project Page] [Code]
⚡️TL;DR: Embodied-R1.5 is a unified embodied foundation model that integrates comprehensive embodied reasoning capabilities within a single 8B-parameter architecture. It achieves SOTA on 16 of 24 embodied VLM benchmarks and can be fine-tuned into a VLA that outperforms π_0.5 across 4 popular manipulation benchmark suites.

The Mirage of Optimizing Training Policies: Monotonic Inference Policies as the Real Objective for LLM Reinforcement Learning
Jing Liang, Hongyao Tang^🔥, Yi Ma, Yancheng He, Weixun Wang, Xiaoyang Li, Ju Huang, Wenbo Su, Jinyi Liu, Yan Zheng, Jianye Hao, Bo Zheng
arXiv preprint 2026 | [Paper] [Project Page]
⚡️TL;DR: reveals that training-inference mismatch in LLM RL creates a persistent off-policy problem rooted in objective misalignment, and proposes the MIPI objective together with the MIPU framework, which selectively accepts synchronized policy updates via an inference-side gap proxy, improving reasoning performance and training stability

Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn
Hongyao Tang^🔥, Glen Berseth
NeurIPS 2024 | [Paper] [Code]
⚡️TL;DR: identifies the chain effect of churn in DRL; reducing it mitigates learning issues in DQN/PPO/SAC and leads to better stability and performance

Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
Hongyao Tang^🔥^💧, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Glen Berseth
ICML 2025 | [Paper] [Code]
⚡️TL;DR: reveals the relationship between churn and plasticity loss; continually reducing churn mitigates plasticity loss and achieves the best overall performance across 24 environments

ScaleMoE: Mixture-of-Experts for Scalable Continuous Control in Actor-Critic Reinforcement Learning
Yi Ma, Chenjun Xiao, Hongyao Tang^💧, Yaodong Yang, Jinyi Liu, Jing Liang, Jiye Liang
ICML 2026 (Spotlight💎) | [Paper]
⚡️TL;DR: proposes a new way to scale deep actor-critic (AC) methods with MoE architectures, outperforming SimBa in single-task RL and BRC in multi-task RL

Neuro-evolutionary Continual Reinforcement Learning
Pengyi Li, Hongyao Tang^💧, Yifu Yuan, Yan Zheng, Jianye Hao
ICML 2026 (Spotlight💎) | [Paper]
⚡️TL;DR: dynamically allocates, reuses, and recycles neurons within a monolithic model, guaranteeing no forgetting and achieving SOTA results on ContinualWorld

PACE: Unleashing the Power of Code Embeddings to Boost AutoML Agents
Gangyi Zhao, Hebin Liang, Hongyao Tang^💧, Yi Ma, Jinyi Liu, Zhaocheng Du, Yan Zheng, Chenjun Xiao, Jianye Hao
KDD 2026 | [Paper]
⚡️TL;DR: identifies structural locality in the multi-view low-dimensional representation space of code solutions and leverages it to build admission strategies that make AutoML more efficient

The Ladder in Chaos: Improving Policy Learning by Harnessing the Parameter Evolving Path in A Low-dimensional Space
Hongyao Tang^🔥, Min Zhang, Chen Chen, Jianye Hao
NeurIPS 2024 | [Paper]
⚡️TL;DR: identifies, via temporal SVD, the low-rank parameter space in which policies evolve, and improves learning stability and efficiency by directly editing the SVD coefficients

Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model
Jing Liang*, Jinyi Liu*, Yi Ma*, Hongyao Tang^💧, Yan Zheng, Shuyue Hu, Lei Bai, Jianye Hao
ICLR 2026 | [Paper] [Project Page]
⚡️TL;DR: revisits Generalized PPO and extends it to LLM RL post-training, achieving SOTA-level math reasoning results with a 30x-450x🚀 reduction in rollout cost

The Rank and Gradient Lost in Non-stationarity: Sample Weight Decay for Mitigating Plasticity Loss in Reinforcement Learning
Zihao Wu, Hongyao Tang^💧, Yi Ma, Jiashun Liu, Yan Zheng, Jianye Hao
ICLR 2026 | [Paper]
⚡️TL;DR: theoretically analyzes learning dynamics under non-stationarity and proposes a simple plug-and-play remedy for DRL algorithms that improves TD3/SAC with SimBa on DMC

Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies
Yi Ma*, Hongyao Tang^🔥*, Chenjun Xiao, Yaodong Yang, Wei Wei, Jianye Hao, Jiye Liang
arXiv preprint 2025 | [Paper]
⚡️TL;DR: proposes a new taxonomy for RL scaling research from four perspectives: data, network, training, and priors

Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes
Min Zhang, Hongyao Tang^💧, Jianye Hao, Yan Zheng
AAMAS 2026 (Full Paper, Oral🌈) | [Paper]
⚡️TL;DR: as the title suggests, proposes for the first time a systematic definition and hierarchy of policy abstraction and representation, a salute to Li et al. (2006)

Efficient Morphology-Aware Policy Transfer to New Embodiments
Michael Przystupa, Hongyao Tang, Glen Berseth, Mariano Phielipp, Santiago Miret, Martin Jägersand, Matthew E. Taylor
RLC 2025 | [Paper]
⚡️TL;DR: investigates different parameter-efficient fine-tuning (PEFT) methods for multi-task policies of locomotion robots

Can We Optimize Deep RL Policy Weights as Trajectory Modeling?
Hongyao Tang^🔥
ICLR 2025 Workshop on Weight Space Learning | [Paper]
⚡️TL;DR: since DRL policies evolve in a low-dimensional space, policy optimization can be recast as a trajectory prediction problem over this low-dimensional representation

EvoRainbow: Combining Improvements in Evolutionary Reinforcement Learning for Policy Search
Pengyi Li, Jianye Hao, Hongyao Tang, Xian Fu, Yan Zheng
ICML 2024 | [Paper] [Code]
⚡️TL;DR: in the spirit of Rainbow (DQN), combines improvements into a high-performance ERL algorithm, along with a comprehensive comparison of algorithmic choices

Reining Generalization in Offline Reinforcement Learning via Representation Distinction
Yi Ma, Hongyao Tang^💧, Dong Li, Zhaopeng Meng
NeurIPS 2023 | [Paper]
⚡️TL;DR: curbs unrestricted generalization from in-distribution (ID) to out-of-distribution (OOD) data, which simplifies algorithm design while achieving good performance for offline RL

HyAR: Addressing Discrete-Continuous Action Reinforcement Learning via Hybrid Action Representation
Boyan Li*, Hongyao Tang^🔥*, Yan Zheng, Jianye Hao, Pengyi Li, Zhen Wang, Zhaopeng Meng, Li Wang
ICLR 2022 | [Paper] [Code]
⚡️TL;DR: learns a unified self-supervised representation for hybrid actions, converting hybrid-action RL back into standard RL and thus effectively solving high-dimensional hybrid-action RL problems

What about Inputting Policy in Value Function: Policy Representation and Policy-Extended Value Function Approximator
Hongyao Tang^🔥, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Changmin Yu, Hangyu Mao, Wulong Liu, Yaodong Yang, Wenyuan Tao, Li Wang
AAAI 2022 Oral🌈 Presentation (< 5%) | [Paper] [Code]
⚡️TL;DR: proposes a new RL paradigm that enables value generalization among policies and uses it to improve learning efficiency, learning low-dimensional representations of NN policies

More Publications

Academic Service

Reviewer

RLC: 2024 - 2026
NeurIPS: 2021 - 2025 (Top Reviewer Award at NeurIPS 2022)
ICLR: 2022 - 2026 (Highlighted Reviewer Award at ICLR 2022)
ICML: 2021 - 2026
AAMAS: 2021 - 2024
Nature Communications
Transactions on Machine Learning Research (TMLR)
IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

Invited Talks

2025.03 RL in the Era of Large Models
2025 IEEE International Conference on Industrial Technology

2024.10 Where is the Road to Flawless RL? Unsolved Problems and New Approaches
Google DeepMind, Discovery Team London (Remote)

2023.05 A Tale of Representations in Deep Reinforcement Learning
Robotics and Embodied AI Lab (REAL), Université de Montréal

2022.07 Towards Understanding The Learning Dynamics of Deep Reinforcement Learning
Huawei Noah's Ark Lab, Decision-making and Reasoning Group (during internship)

2021.10 Self-supervised Reinforcement Learning: A Perspective of Representation
2021 TJU RL Summer Seminar

2020.11 State Abstraction and State Representation Learning in Reinforcement Learning
Huawei Noah's Ark Lab, Decision-making and Reasoning Group (during internship)

2019.08 Bias and Variance in Deep Reinforcement Learning
2019 TJU RL Summer Seminar

Updated by Hongyao Tang, May 2026.