
Course listing: Sample-Based Learning Methods Training
4401 followers
(78637/99817)
Course outline:

    Sample-Based Learning Methods Training

 

 

 

Welcome to the Course!
Welcome to the second course in the Reinforcement Learning Specialization:
Sample-Based Learning Methods, brought to you by the University of Alberta,
Onlea, and Coursera.
In this pre-course module, you'll be introduced to your instructors,
and get a flavour of what the course has in store for you.
Make sure to introduce yourself to your classmates in the "Meet and Greet" section!
Monte Carlo Methods for Prediction & Control
This week you will learn how to estimate value functions and optimal policies,
using only sampled experience from the environment.
This module represents our first step toward incremental learning methods
that learn from the agent’s own interaction with the world,
rather than a model of the world.
You will learn about on-policy and off-policy methods for prediction
and control, using Monte Carlo methods---methods that use sampled returns.
You will also be reintroduced to the exploration problem,
this time in the general RL setting rather than only in the bandit case.
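
For a concrete sense of what "using sampled returns" looks like, here is a minimal first-visit Monte Carlo prediction sketch in Python. It is an illustration only, not the course's starter code; generate_episode is an assumed helper that runs the given policy in the environment and returns a list of (state, reward) pairs for one episode.

from collections import defaultdict

def mc_prediction(generate_episode, policy, num_episodes, gamma=1.0):
    # First-visit Monte Carlo estimate of V under a fixed policy (illustrative sketch).
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    V = defaultdict(float)
    for _ in range(num_episodes):
        episode = generate_episode(policy)  # assumed helper: [(state, reward), ...]
        # Walk backwards so G accumulates the discounted return from each step.
        G = 0.0
        step_returns = []
        for state, reward in reversed(episode):
            G = reward + gamma * G
            step_returns.append((state, G))
        step_returns.reverse()
        seen = set()
        for state, G in step_returns:
            if state not in seen:  # first visit: average one return per state per episode
                seen.add(state)
                returns_sum[state] += G
                returns_count[state] += 1
                V[state] = returns_sum[state] / returns_count[state]
    return V

Note that the value estimates only change at the end of each episode, since the full return must be observed before it can be averaged in.
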
Temporal Difference Learning Methods for Prediction
This week, you will learn about one of the most fundamental concepts in reinforcement learning:
temporal difference (TD) learning.
TD learning combines some of the features of both Monte Carlo and Dynamic Programming (DP) methods.
TD methods are similar to Monte Carlo methods in that they can learn from the agent’s interaction with the world,
and do not require knowledge of the model.
TD methods are similar to DP methods in that they bootstrap,
and thus can learn online---no waiting until the end of an episode.
You will see how TD can learn more efficiently than Monte Carlo, due to bootstrapping.
For this module, we first focus on TD for prediction, and discuss TD for control in the next module.
This week, you will implement TD to estimate the value function for a fixed policy, in a simulated domain.
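
As a rough picture of tabular TD(0) prediction, the sketch below updates the value estimate after every step using the bootstrapped target R + gamma * V(S'). It assumes a hypothetical environment object with reset() and step(action) returning (next_state, reward, done); it is not the course's notebook code.

from collections import defaultdict

def td0_prediction(env, policy, num_episodes, alpha=0.1, gamma=1.0):
    # Tabular TD(0): move V(s) a step toward the bootstrapped target R + gamma * V(s').
    V = defaultdict(float)
    for _ in range(num_episodes):
        state = env.reset()  # assumed environment interface
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])  # online update, no waiting for episode end
            state = next_state
    return V
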
Temporal Difference Learning Methods for Control
This week, you will learn about using temporal difference learning for control,
as a generalized policy iteration strategy.
You will see three different algorithms based on bootstrapping and Bellman equations for control: Sarsa,
Q-learning and Expected Sarsa. You will see some of the differences between
the methods for on-policy and off-policy control, and that Expected Sarsa is a unified algorithm for both.
You will implement Expected Sarsa and Q-learning on Cliff World.
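
To make the contrast between the control methods concrete, here is a hedged sketch of the Q-learning and Expected Sarsa update rules, assuming Q is a NumPy array indexed by integer state and action and the behaviour policy is epsilon-greedy. This is an illustration of the updates, not the course's assignment code.

import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    # Off-policy TD control: bootstrap from the greedy (max) action value in s_next.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def expected_sarsa_update(Q, s, a, r, s_next, alpha, gamma, epsilon):
    # Bootstrap from the expected action value under an epsilon-greedy target policy.
    n_actions = Q.shape[1]
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(Q[s_next])] += 1.0 - epsilon
    target = r + gamma * np.dot(probs, Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

With epsilon set to zero for the target policy, the Expected Sarsa update reduces to the Q-learning update, which is one way to see how a single expected update can cover both the on-policy and off-policy cases.
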
Planning, Learning & Acting
Up until now, you might think that learning with and without a model are two distinct,
and in some ways, competing strategies: planning with
Dynamic Programming versus sample-based learning via TD methods.
This week we unify these two strategies with the Dyna architecture.
You will learn how to estimate the model from data and then use this model
to generate hypothetical experience (a bit like dreaming)
to dramatically improve sample efficiency compared to sample-based methods like Q-learning.
In addition, you will learn how to design learning systems that are robust to inaccurate models.
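
The core Dyna-Q loop can be sketched roughly as below, assuming a deterministic, tabular world model stored as a dictionary from (state, action) to (reward, next_state). This is an illustration of the idea of planning with simulated experience, not the course's implementation.

import random

def greedy_value(Q, state, actions):
    # Greedy (max) action value for a state, with Q stored as a dict keyed by (state, action).
    return max(Q.get((state, a), 0.0) for a in actions)

def dyna_q_step(Q, model, actions, s, a, r, s_next, alpha, gamma, n_planning):
    # 1) Direct RL: Q-learning update from the real transition.
    td_error = r + gamma * greedy_value(Q, s_next, actions) - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    # 2) Model learning: remember the observed outcome (deterministic-world assumption).
    model[(s, a)] = (r, s_next)
    # 3) Planning: replay hypothetical transitions sampled from the learned model.
    for _ in range(n_planning):
        (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
        td_error = pr + gamma * greedy_value(Q, ps_next, actions) - Q.get((ps, pa), 0.0)
        Q[(ps, pa)] = Q.get((ps, pa), 0.0) + alpha * td_error

Each real step triggers n_planning additional simulated updates, which is where the gain in sample efficiency comes from; if the model is wrong, those planning updates can be misleading, which motivates the material on robustness to inaccurate models.
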
