I am a Ph.D. candidate in Computer Science at the University of Waterloo and the Vector Institute, supervised by Prof. Pascal Poupart. My research focuses on cooperative and safe agentic AI, spanning reinforcement learning, large language models, mechanism design, information design, and game theory. I am particularly interested in designing mechanisms and algorithms that promote cooperation, alignment, and safety in mixed-motive multi-agent systems, including both RL-driven and generative agents.


You can reach me at shuhui [dot] zhu [at] uwaterloo [dot] ca.

Publications

Talk, Judge, Cooperate: Gossip-Driven Indirect Reciprocity in Self-Interested LLM Agents

Shuhui Zhu, Yue Lin, Shriya Kaistha, Wenhao Li, Baoxiang Wang, Hongyuan Zha, Gillian K Hadfield, Pascal Poupart
ICML, 2026
Paper | Code | Talk

We introduce public gossip as a decentralized reputation mechanism that enables self-interested LLM agents to cooperate in mixed-motive settings. Building on this idea, our ALIGN framework uses hierarchical gossip to assess trustworthiness, sustain reciprocity, and reliably exclude defectors.

Learning to Negotiate via Voluntary Commitment

Shuhui Zhu, Baoxiang Wang, Sriram Ganapathi Subramanian, Pascal Poupart
AISTATS, 2025
Paper | Code | Talk | Poster

We present a novel framework where RL agents can propose and voluntarily commit to actions in strategic interactions, improving cooperation among self-interested agents in challenging mixed-motive environments.

The Reciprocity Gradient

Yue Lin, Pascal Poupart, Shuhui Zhu, Dan Qiao, Wenhao Li, Yuan Liu, Hongyuan Zha, Baoxiang Wang
Working Paper
Paper

We introduce the reciprocity gradient, a novel method for learning cooperative policies in multi-agent environments that explicitly backpropagates reward gradients through private estimators of opponents' policies. This lets agents account for the complex influence of their actions on others' reputations and future rewards without relying on intrinsic rewards or reward shaping.

Policy-Conditioned Policies for Multi-Agent Task Solving

Yue Lin, Shuhui Zhu, Wenhao Li, Ang Li, Dan Qiao, Pascal Poupart, Hongyuan Zha, Baoxiang Wang
Working Paper
Paper

We introduce Policy-Conditioned Policies, a paradigm that represents multi-agent strategies as human-interpretable code and leverages large language models to iteratively synthesize and optimize these programmatic policies for adaptive task solving.

Information Bargaining: Bilateral Commitment in Bayesian Persuasion

Yue Lin, Shuhui Zhu, William A Cunningham, Wenhao Li, Pascal Poupart, Hongyuan Zha, Baoxiang Wang
Working Paper
Paper

This paper reframes Bayesian persuasion as an information bargaining problem to address its complexity in long-term interactions. Unlike one-sided commitment models, the proposed framework enables fairer and more efficient cooperation by balancing the sender's and receiver's roles. Empirical validation using LLMs confirms the framework’s predictions.

Altared Environments: The Role of Normative Infrastructure in AI Alignment

Rakshit Trivedi, Nikhil Chandak, Andrei Ioan Muresanu, Shuhui Zhu, Atrisha Sarkar, Joel Z Leibo, Dylan Hadfield-Menell, Gillian K Hadfield
Submitted to ICLR, 2024
Paper

We propose Altared Games, a novel Markov game framework that integrates a classification institution so that AI agents can adapt to dynamic norms, and we demonstrate that it enhances cooperation and social welfare in multi-agent reinforcement learning environments.

Spline Parameterization for Continuous Normalizing Flows

Shuhui Zhu
Master's Thesis, 2021
Thesis

I develop a spline-based parameterization method for Continuous Normalizing Flows using Neural ODEs, formulating the problem as an optimal control task to efficiently learn time-dependent patterns while reducing computational cost and maintaining accuracy.