Course curriculum

  • 2

    Motivation for Reinforcement Learning

    • What is Reinforcement Learning
    • What is Reinforcement Learning: Hiders and Seekers by OpenAI
    • RL vs Other ML Frameworks
    • Why Reinforcement Learning
    • Examples of Reinforcement Learning
    • Limitations of Reinforcement Learning
    • Exercises
  • 3

    Terminology of Reinforcement Learning

    • What is Environment
    • What is Environment 2
    • What is Agent
    • What is State
    • State Belongs to Environment and not to Agent
    • What is Action
    • What is Reward
    • Goal
    • Policy
    • Summary
  • 4

    GridWorld Example

    • Setup 1
    • Setup 2
    • Setup 3
    • Policy Comparison
    • Deterministic Environment
    • Stochastic Environment
    • Stochastic Environment 2
    • Stochastic Environment 3
    • Non Stationary Environment
    • GridWorld Summary
    • Activity
  • 5

    Markov Decision Process Prerequisites

    • Probability
    • Probability 2
    • Probability 3
    • Conditional Probability
    • Conditional Probability Fun Example
    • Joint Probability
    • Joint Probability 2
    • Joint Probability 3
    • Expected Value
    • Conditional Expectation
    • Modeling Uncertainty of Environment
    • Modeling Uncertainty of Environment 2
    • Modeling Uncertainty of Environment 3
    • Modeling Uncertainty of Environment Stochastic Policy
    • Modeling Uncertainty of Environment Stochastic Policy 2
    • Modeling Uncertainty of Environment Value Functions
    • Running Averages
    • Running Averages as Temporal Difference
    • Activity
  • 6

    Elements of Markov Decision Process

    • Markov Property
    • State Space
    • Action Space
    • Transition Probabilities
    • Reward Function
    • Discount Factor
    • Summary
    • Activity
  • 7

    More on Reward

    • MOR Quiz 1
    • MOR Quiz Solution 1
    • MOR Quiz 2
    • MOR Quiz Solution 2
    • MOR Reward Scaling
    • MOR Infinite Horizons
    • MOR Quiz 3
    • MOR Quiz Solution 3
  • 8

    Solving MDP

    • MDP Recap
    • Value Functions
    • Optimal Value Function
    • Optimal Policy
    • Bellman Equation
    • Value Iteration
    • Value Iteration Quiz
    • Value Iteration Quiz Gamma Missing
    • Value Iteration Solution
    • Problems of Value Iteration
    • Policy Evaluation
    • Policy Evaluation 2
    • Policy Evaluation 3
    • Policy Evaluation Closed Form Solution
    • Policy Iteration
    • State Action Values
    • V and Q Comparisons
  • 9

    Value Approximation

    • What does it mean that MDP is Unknown
    • Why Transition Probabilities are Important
    • Model Based Solutions
    • Model Free Solutions
    • Monte-Carlo Learning
    • Monte-Carlo Learning Example
    • Monte-Carlo Learning Limitations
  • 10

    Temporal Difference and Q-Learning

    • Running Average
    • Learning Rate
    • Learning Equation
    • TD Algorithm
    • Exploration vs Exploitation
    • Epsilon Greedy Policy
    • SARSA
    • Q-Learning
    • Q-Learning Implementation for MAPROVER Clipped
  • 11

    TD Lambda

    • N-Step Lookahead
    • Formulation
    • Values
    • TD Eligibility Trace
    • TD Q-Learning TD Lambda
  • 12

    Project FrozenLake (OpenAI Gym)

    • FrozenLake 1
    • FrozenLake Implementation