An opinionated tour through the family tree of modern LLM RL algorithms: PPO, GRPO, REINFORCE, REINFORCE++, DPO, and the theoretical ideas that tie them together.
Exploring what makes AI agents truly effective for users, beyond benchmark performance.
Stop using outdated bad-word lists. Use ToxicTrig instead for better toxic-language analysis.