Course curriculum

1. Introduction to Course and Instructor
- Overview of the Reinforcement Learning Course
- Introduction to Course and Instructor
- Introduction to Instructor

2. Motivation for Reinforcement Learning
- What is Reinforcement Learning
- What is Reinforcement Learning: Hiders and Seekers by OpenAI
- RL vs Other ML Frameworks
- Why Reinforcement Learning
- Examples of Reinforcement Learning
- Limitations of Reinforcement Learning
- Exercises

3. Terminology of Reinforcement Learning
- What is an Environment
- What is an Environment 2
- What is an Agent
- What is a State
- State Belongs to the Environment, Not the Agent
- What is an Action
- What is a Reward
- Goal
- Policy
- Summary

4. GridWorld Example
- Setup 1
- Setup 2
- Setup 3
- Policy Comparison
- Deterministic Environment
- Stochastic Environment
- Stochastic Environment 2
- Stochastic Environment 3
- Non-Stationary Environment
- GridWorld Summary
- Activity

5. Markov Decision Process Prerequisites
- Probability
- Probability 2
- Probability 3
- Conditional Probability
- Conditional Probability Fun Example
- Joint Probability
- Joint Probability 2
- Joint Probability 3
- Expected Value
- Conditional Expectation
- Modeling Uncertainty of the Environment
- Modeling Uncertainty of the Environment 2
- Modeling Uncertainty of the Environment 3
- Modeling Uncertainty of the Environment: Stochastic Policy
- Modeling Uncertainty of the Environment: Stochastic Policy 2
- Modeling Uncertainty of the Environment: Value Functions
- Running Averages
- Running Averages as Temporal Difference
- Activity

6. Elements of a Markov Decision Process
- Markov Property
- State Space
- Action Space
- Transition Probabilities
- Reward Function
- Discount Factor
- Summary
- Activity

7. More on Reward
- MOR Quiz 1
- MOR Quiz Solution 1
- MOR Quiz 2
- MOR Quiz Solution 2
- MOR Reward Scaling
- MOR Infinite Horizons
- MOR Quiz 3
- MOR Quiz Solution 3

8. Solving MDPs
- MDP Recap
- Value Functions
- Optimal Value Function
- Optimal Policy
- Bellman Equation
- Value Iteration
- Value Iteration Quiz
- Value Iteration Quiz: Gamma Missing
- Value Iteration Solution
- Problems of Value Iteration
- Policy Evaluation
- Policy Evaluation 2
- Policy Evaluation 3
- Policy Evaluation Closed Form Solution
- Policy Iteration
- State Action Values
- V and Q Comparisons

9. Value Approximation
- What Does It Mean That the MDP Is Unknown
- Why Transition Probabilities are Important
- Model Based Solutions
- Model Free Solutions
- Monte Carlo Learning
- Monte Carlo Learning Example
- Monte Carlo Learning Limitations

10. Temporal Difference Learning and Q-Learning
- Running Average
- Learning Rate
- Learning Equation
- TD Algorithm
- Exploration vs Exploitation
- Epsilon Greedy Policy
- SARSA
- Q-Learning
- Q-Learning Implementation for MAPROVER Clipped

11. TD Lambda
- N-Step Lookahead
- Formulation
- Values
- TD Eligibility Trace
- TD Q-Learning TD Lambda

12. Project: FrozenLake (OpenAI Gym)
- FrozenLake 1
- FrozenLake Implementation