Xuhui Zhou

Blog

Thinking in RL

Apr 13, 2026

An opinionated tour through the algorithm tree of modern LLM RL — PPO, GRPO, REINFORCE, REINFORCE++, DPO, and the theoretical ideas that tie them together.

The Quest of User-Effective AI Agents

Nov 2, 2025

Exploring what makes AI agents truly effective for users, beyond benchmark performance.

The overlooked "bad" word list ☠️

Dec 15, 2024

Stop using outdated bad word lists. Use ToxicTrig instead for better toxic language analysis.