Sutton and Barto Book Implementation

https://news.ycombinator.com/rss Hits: 14
Summary

Reinforcement Learning

Installation

$ python setup.py install

Overview

This repository contains code that implements algorithms and models from Sutton and Barto's book on reinforcement learning. The book, "Reinforcement Learning: An Introduction," is a classic text that provides a comprehensive introduction to the field. The code is organized into several modules, each covering a different topic.

Methods

Multi Armed Bandits: Epsilon Greedy; Optimistic Initial Values; Gradient; α (non stationary)

Model Based: Policy Evaluation; Policy Iteration; Value Iteration

Monte Carlo estimation and control: First-visit α-MC; Every-visit α-MC; MC with Exploring Starts; Off-policy MC, ordinary and weighted importance sampling

Temporal Difference: TD(n) estimation; n-step SARSA; n-step Q-learning; n-step Expected SARSA; double Q learning; n-step Tree Backup

Planning: Dyna-Q/Dyna-Q+; Prioritized Sweeping; Trajectory Sampling; MCTS

On-policy Prediction: Gradient MC; $n$-step semi-gradient TD; ANN; Least-Squares TD; Kernel-based

On-policy Control: Episodic semi-gradient; Semi-gradient n-step Sarsa; Differential Semi-gradient n-step Sarsa

Eligibility Traces: TD($\lambda$); True Online; Sarsa($\lambda$); True Online Sarsa($\lambda$)

Policy Gradient: REINFORCE: Monte Carlo Policy Gradient w/wo Baseline; Actor-Critic (episodic) w/wo eligibility traces; Actor-Critic (continuing) with eligibility traces

All model-free solvers work once you define the states, the actions, and a transition function. The transition function takes a state and an action and returns a tuple of the next state and the reward. It also returns a boolean indicating whether the episode has terminated.
states: Sequence[Any]
actions: Sequence[Any]
transition: Callable[[Any, Any], Tuple[Tuple[Any...
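To make the interface described above concrete, here is a minimal sketch of a transition function and a generic tabular Q-learning loop that consumes it. The five-state corridor, the names `transition` and `q_learning`, and all hyperparameters are hypothetical. The return shape `((next_state, reward), terminated)` is an assumption inferred from the prose, since the signature above is truncated. The loop is a plain textbook Q-learning update and is not the repository's own solver API.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical corridor: states 0..4, where state 4 is terminal.
states: Sequence[int] = [0, 1, 2, 3, 4]
actions: Sequence[str] = ["left", "right"]


def transition(state: int, action: str) -> Tuple[Tuple[int, float], bool]:
    """Return ((next_state, reward), terminated). The nesting is an assumption."""
    step = 1 if action == "right" else -1
    next_state = min(max(state + step, 0), 4)  # walls at both ends
    terminated = next_state == 4
    reward = 1.0 if terminated else 0.0
    return (next_state, reward), terminated


def q_learning(
    states: Sequence[int],
    actions: Sequence[str],
    transition: Callable[[int, str], Tuple[Tuple[int, float], bool]],
    episodes: int = 200,
    alpha: float = 0.5,
    gamma: float = 0.9,
    epsilon: float = 0.1,
    seed: int = 0,
) -> Dict[Tuple[int, str], float]:
    """Generic one-step tabular Q-learning driven only by the transition function."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        state, terminated = states[0], False
        while not terminated:
            if rng.random() < epsilon:
                action = rng.choice(list(actions))  # explore
            else:
                # Exploit, breaking ties between equal values at random.
                best = max(q[(state, a)] for a in actions)
                action = rng.choice([a for a in actions if q[(state, a)] == best])
            (next_state, reward), terminated = transition(state, action)
            # Bootstrap from the next state unless the episode has ended.
            future = 0.0 if terminated else gamma * max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = next_state
    return q


q = q_learning(states, actions, transition)
# Greedy policy over the non-terminal states.
greedy_policy = {s: max(actions, key=lambda a: q[(s, a)]) for s in states[:-1]}
```

In this sketch the solver never inspects the environment directly; it only calls `transition`, which is the property the summary attributes to the model-free solvers.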

First seen: 2025-05-06 23:02

Last seen: 2025-05-07 12:04