Modern Reinforcement Learning: Actor-Critic Algorithms

LeeAndro · Sep 14, 2022

Last updated 10/2020
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
Language: English | Size: 3.24 GB | Duration: 8h 9m

How to Implement Cutting Edge Artificial Intelligence Research Papers in the OpenAI Gym Using the PyTorch Framework

What you'll learn
How to code policy gradient methods in PyTorch
How to code Deep Deterministic Policy Gradients (DDPG) in PyTorch
How to code Twin Delayed Deep Deterministic Policy Gradients (TD3) in PyTorch
How to code actor critic algorithms in PyTorch
How to implement cutting edge artificial intelligence research papers in Python
Requirements
Understanding of college level calculus
Prior courses in reinforcement learning
Able to code deep neural networks independently
Description
In this advanced course on deep reinforcement learning, you will learn how to implement policy gradient, actor critic, deep deterministic policy gradient (DDPG), twin delayed deep deterministic policy gradient (TD3), and soft actor critic (SAC) algorithms in a variety of challenging environments from the OpenAI Gym.

There will be a strong focus on dealing with environments with continuous action spaces, which is of particular interest for those looking to do research into robotic control with deep reinforcement learning.

Rather than being a course that spoon feeds the student, here you are going to learn to read deep reinforcement learning research papers on your own, and implement them from scratch. You will learn a repeatable framework for quickly implementing the algorithms in advanced research papers. Mastering the content in this course will be a quantum leap in your capabilities as an artificial intelligence engineer, and will put you in a league of your own among students who are reliant on others to break down complex ideas for them.

Fear not: if it's been a while since your last reinforcement learning course, we will begin with a briskly paced review of core topics.

The course begins with a practical review of the fundamentals of reinforcement learning, including topics such as:
The Bellman Equation
Markov Decision Processes
Monte Carlo Prediction
Monte Carlo Control
Temporal Difference Prediction TD(0)
Temporal Difference Control with Q Learning

It then moves straight into coding up our first agent: a blackjack playing artificial intelligence. From there we will progress to teaching an agent to balance the cart pole using Q learning.

After mastering the fundamentals, the pace quickens, and we move straight into an introduction to policy gradient methods. We cover the REINFORCE algorithm, and use it to teach an artificial intelligence to land on the moon in the lunar lander environment from the OpenAI Gym. Next we progress to coding up the one step actor critic algorithm, to again beat the lunar lander.

With the fundamentals out of the way, we move on to our harder projects: implementing deep reinforcement learning research papers. We will start with Deep Deterministic Policy Gradients (DDPG), which is an algorithm for teaching robots to excel at a variety of continuous control tasks. DDPG combines many of the advances of Deep Q Learning with traditional actor critic methods to achieve state of the art results in environments with continuous action spaces.

Next, we implement a state of the art artificial intelligence algorithm: Twin Delayed Deep Deterministic Policy Gradients (TD3). This algorithm sets a new benchmark for performance in continuous robotic control tasks, and we will demonstrate world class performance in the Bipedal Walker environment from the OpenAI Gym. TD3 is based on the DDPG algorithm, but addresses a number of approximation issues that result in poor performance in DDPG and other actor critic algorithms.

Finally, we will implement the soft actor critic algorithm (SAC). SAC approaches deep reinforcement learning from a totally different angle: by considering entropy maximization, rather than score maximization, as a viable objective. This results in increased exploration by our agent, and world class performance in a number of important OpenAI Gym environments.

By the end of the course, you will know the answers to the following fundamental questions in Actor-Critic methods:
Why should we bother with actor critic methods when deep Q learning is so successful?
Can the advances in deep Q learning be used in other fields of reinforcement learning?
How can we solve the explore-exploit dilemma with a deterministic policy?
How do we get and deal with overestimation bias in actor-critic methods?
How do we deal with the inherent approximation errors in deep neural networks?

This course is for the highly motivated and advanced student. To succeed, you must have prior course work in all the following topics:
College level calculus
Reinforcement learning
Deep learning

The pace of the course is brisk and the topics are at the cutting edge of deep reinforcement learning research, but the payoff is that you will come out knowing how to read research papers and turn them into functional code as quickly as possible. You'll never have to rely on dodgy Medium blog posts again.

Overview

Section 1: Introduction

Lecture 1 What You Will Learn in this Course

Lecture 2 Required Background, Software, and Hardware

Lecture 3 How to Succeed in this Course

Section 2: Fundamentals of Reinforcement Learning

Lecture 4 Review of Fundamental Concepts

Lecture 5 Teaching an AI about Black Jack with Monte Carlo Prediction

Lecture 6 Teaching an AI How to Play Black Jack with Monte Carlo Control

Lecture 7 Review of Temporal Difference Learning Methods

Lecture 8 Teaching an AI about Balance with TD(0) Prediction

Lecture 9 Teaching an AI to Balance the Cart Pole with Q Learning
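
As a rough illustration of the Q learning control that closes this section, the tabular update can be sketched in plain Python. The function name, the dictionary-based table, and the step size and discount values are assumptions made for this sketch, not code from the course. It also assumes states have already been discretized, which a continuous environment such as the cart pole would require.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, done,
                      n_actions, alpha=0.1, gamma=0.99):
    """Apply one tabular Q learning update in place and return the TD error.

    alpha (step size) and gamma (discount) are illustrative defaults.
    """
    # Bootstrap from the greedy value of the next state;
    # a terminal next state contributes no future value.
    if done:
        best_next = 0.0
    else:
        best_next = max(q[(next_state, a)] for a in range(n_actions))
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


# Unseen (state, action) pairs start at 0.0.
q_table = defaultdict(float)
q_learning_update(q_table, state=0, action=1, reward=1.0,
                  next_state=2, done=False, n_actions=2)
```

Taking the maximum over next actions, rather than the action the agent actually takes next, is what makes the update off-policy.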

Section 3: Landing on the Moon with Policy Gradients & Actor Critic Methods

Lecture 10 What's so Great About Policy Gradient Methods

Lecture 11 Combining Neural Networks with Monte Carlo: REINFORCE Policy Gradient Algorithm

Lecture 12 Introducing the Lunar Lander Environment

Lecture 13 Coding the Agent's Brain: The Policy Gradient Network

Lecture 14 Coding the Policy Gradient Agent's Basic Functionality

Lecture 15 Coding the Agent's Learn Function

Lecture 16 Coding the Policy Gradient Main Loop and Watching our Agent Land on the Moon

Lecture 17 Actor Critic Learning: Combining Policy Gradients & Temporal Difference Learning

Lecture 18 Coding the Actor Critic Networks

Lecture 19 Coding the Actor Critic Agent

Lecture 20 Coding the Actor Critic Main Loop and Watching Our Agent Land on the Moon
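
To give a feel for the REINFORCE lectures in this section, here is a minimal sketch of the two quantities the algorithm needs: the discounted Monte Carlo return from each time step, and the loss whose gradient is the policy gradient estimate. It uses plain Python floats rather than PyTorch tensors, and the function names and the discount value are assumptions for illustration only.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns = []
    g = 0.0
    # Accumulate from the end of the episode backwards.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, returns):
    """Negative sum of log pi(a_t | s_t) * G_t over one episode.

    Minimizing this value performs gradient ascent on expected return.
    """
    return -sum(lp * g for lp, g in zip(log_probs, returns))


episode_returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
loss = reinforce_loss([-1.0, -2.0, -0.5], episode_returns)
```

In a PyTorch version the log probabilities would be tensors attached to the policy network's graph, so that calling backward on the loss yields the gradient.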

Section 4: Deep Deterministic Policy Gradients (DDPG): Actor Critic with Continuous Actions

Lecture 21 Getting up to Speed With Deep Q Learning

Lecture 22 How to Read and Understand Cutting Edge Research Papers

Lecture 23 Analyzing the DDPG Paper Abstract and Introduction

Lecture 24 Analyzing the Background Material

Lecture 25 What Algorithm Are We Going to Implement

Lecture 26 What Results Should We Expect

Lecture 27 What Other Solutions are Out There

Lecture 28 What Model Architecture and Hyperparameters Do We Need

Lecture 29 Handling the Explore-Exploit Dilemma: Coding the OU Action Noise Class

Lecture 30 Giving our Agent a Memory: Coding the Replay Memory Buffer Class

Lecture 31 Deep Q Learning for Actor Critic Methods: Coding the Critic Network Class

Lecture 32 Coding the Actor Network Class

Lecture 33 Giving our DDPG Agent Simple Autonomy: Coding the Basic Functions of Our Agent

Lecture 34 Giving our DDPG Agent a Brain: Coding the Agent's Learn Function

Lecture 35 Coding the Network Parameter Update Functionality

Lecture 36 Coding the Main Loop and Watching Our DDPG Agent Land on the Moon
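
Two of the DDPG pieces named in this section, the Ornstein-Uhlenbeck (OU) action noise and the target network parameter update, are small enough to sketch in plain Python. The class and function names, the list-of-floats representation, and the dt value are assumptions for this sketch. The defaults theta=0.15, sigma=0.2 and tau=0.001 follow the values reported in the DDPG paper.

```python
import math
import random


class OUActionNoise:
    """Temporally correlated exploration noise (discretized OU process)."""

    def __init__(self, mu, sigma=0.2, theta=0.15, dt=1e-2, x0=None, seed=None):
        self.mu = list(mu)
        self.sigma = sigma
        self.theta = theta
        self.dt = dt  # time step; an assumed value, not taken from the paper
        self.x0 = x0
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Restart the process, typically at the beginning of an episode."""
        if self.x0 is not None:
            self.x_prev = list(self.x0)
        else:
            self.x_prev = [0.0] * len(self.mu)

    def __call__(self):
        # Pull each component toward mu, then add scaled Gaussian noise.
        x = [xp + self.theta * (m - xp) * self.dt
             + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
             for xp, m in zip(self.x_prev, self.mu)]
        self.x_prev = x
        return x


def soft_update(target_params, online_params, tau=0.001):
    """Move target parameters a fraction tau of the way toward the online ones."""
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]


noise = OUActionNoise(mu=[0.0, 0.0], seed=0)
sample = noise()  # one noise vector, to be added to the actor's action
```

Because each sample depends on the previous one, the noise drifts smoothly instead of jumping independently at every step, which is why it gets its own class with a reset method.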

Section 5: Twin Delayed Deep Deterministic Policy Gradients (TD3)

Lecture 37 Some Tips on Reading this Paper

Lecture 38 Analyzing the TD3 Paper Abstract and Introduction

Lecture 39 What Other Solutions Have People Tried

Lecture 40 Reviewing the Fundamental Concepts

Lecture 41 Is Overestimation Bias Even a Problem in Actor-Critic Methods

Lecture 42 Why is Variance a Problem for Actor-Critic Methods

Lecture 43 What Results Can We Expect

Lecture 44 Coding the Brains of the TD3 Agent - The Actor and Critic Network Classes

Lecture 45 Giving our TD3 Agent Simple Autonomy - Coding the Basic Agent Functionality

Lecture 46 Giving our TD3 Agent a Brain - Coding the Learn Function

Lecture 47 Coding the Network Parameter Update Functionality

Lecture 48 Coding the Main Loop And Watching our Agent Learn to Walk
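
The overestimation bias and variance questions raised in this section are addressed in TD3 by two tricks that can be sketched with scalars: adding clipped noise to the target action (target policy smoothing) and bootstrapping from the smaller of two critic estimates (clipped double Q learning). The function names and the scalar action are assumptions for illustration. The noise standard deviation of 0.2 and clip of 0.5 follow the values reported in the TD3 paper.

```python
import random


def clip(x, low, high):
    """Clamp x to the closed interval [low, high]."""
    return max(low, min(high, x))


def smoothed_target_action(target_action, max_action, rng,
                           noise_std=0.2, noise_clip=0.5):
    """Perturb the target actor's action with clipped Gaussian noise."""
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    # Keep the perturbed action inside the environment's action bounds.
    return clip(target_action + noise, -max_action, max_action)


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Critic target using the minimum of the two target critic estimates."""
    min_q = min(q1_next, q2_next)
    # No bootstrapping past a terminal transition.
    return reward + (0.0 if done else gamma * min_q)


rng = random.Random(0)
next_action = smoothed_target_action(0.9, max_action=1.0, rng=rng)
# In a full agent, q1_next and q2_next would be the two target critics
# evaluated at (next_state, next_action); fixed numbers stand in here.
y = td3_target(reward=1.0, done=False, q1_next=5.0, q2_next=3.0)
```

The third TD3 change, updating the actor and target networks less often than the critics, lives in the training loop and is not shown here.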

Section 6: Soft Actor Critic

Lecture 49 A Quick Word on the Paper

Lecture 50 Getting Acquainted With a New Framework

Lecture 51 Checking Out What Has Been Done Before

Lecture 52 Inspecting the Foundation of this New Framework

Lecture 53 Digging Into the Mathematics of Soft Actor Critic

Lecture 54 Seeing How the New Algorithm Measures Up

Lecture 55 Coding the Neural Networks

Lecture 56 Coding the Soft Actor Critic Basic Functionality

Lecture 57 Coding the Soft Actor Critic Algorithm

Lecture 58 Coding the Main Loop and Evaluating Our Agent
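
The entropy maximization idea behind soft actor critic shows up in its targets as a log-probability penalty. A minimal scalar sketch follows, assuming a temperature parameter alpha and leaving out reward scaling. The function names are hypothetical, and the course's own implementation may organize these terms differently.

```python
def soft_value_target(q1, q2, log_prob, alpha=1.0):
    """Sample estimate of the soft state value: min(Q1, Q2) - alpha * log pi(a|s).

    Actions the policy considers unlikely have a very negative log_prob,
    so they raise the target and reward the agent for staying stochastic.
    """
    return min(q1, q2) - alpha * log_prob


def soft_q_target(reward, done, next_value, gamma=0.99):
    """Critic target that bootstraps from the soft value of the next state."""
    return reward + (0.0 if done else gamma * next_value)


v = soft_value_target(q1=2.0, q2=3.0, log_prob=-1.0, alpha=0.5)
y = soft_q_target(reward=1.0, done=False, next_value=v)
```

Setting alpha to zero removes the entropy bonus and leaves a plain clipped double Q value, which makes the difference from TD3-style targets easy to see.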

Advanced students of artificial intelligence who want to implement state of the art academic research papers

HomePage:

https://anonymz.com/https://www.udemy.com/course/actor-critic-methods-from-paper-to-code-with-pytorch/

DOWNLOAD

https://1dl.net/3m3ml5a3gtgj/TzR8IMIt__Modern_Rei.part1.rar.html
https://1dl.net/0es4ux2a4e52/TzR8IMIt__Modern_Rei.part2.rar.html
https://1dl.net/6vzvm0a5lz70/TzR8IMIt__Modern_Rei.part3.rar.html
https://1dl.net/gg4woxpvs1u0/TzR8IMIt__Modern_Rei.part4.rar.html
