
- Introduction (Video・3 mins)
- Introduction to reinforcement learning (Video・7 mins)
- Benefits of reinforcement finetuning (Video・4 mins)
- Can a large language model master Wordle? (Video with Code Example・10 mins)
- Reward functions (Video with Code Example・10 mins)
- Reward functions with LLM as a judge (Video with Code Example・12 mins)
- Reward hacking (Video with Code Example・7 mins)
- Calculating loss in GRPO (Video with Code Example・18 mins)
- Putting it all together: Training Wordle (Video with Code Example・8 mins)
- Conclusion (Video・1 min)
- Appendix – Tips, Help, and Download (Code Example・1 min)