
Course listing: Prediction and Control with Function Approximation
Course outline:

    Prediction and Control with Function Approximation


Welcome to the Course!

Welcome to the third course in the Reinforcement Learning Specialization:

Prediction and Control with Function Approximation, brought to you by the University of Alberta,

Onlea, and Coursera.

In this pre-course module, you'll be introduced to your instructors,

and get a flavour of what the course has in store for you.

Make sure to introduce yourself to your classmates in the "Meet and Greet" section!

On-policy Prediction with Approximation

This week you will learn how to estimate a value function for a given policy,

when the number of states is much larger than the memory available to the agent.

You will learn how to specify a parametric form of the value function,

how to specify an objective function, and how gradient descent can be used to estimate values from interaction with the world.
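To make those pieces concrete, here is a minimal sketch of semi-gradient TD(0) prediction with a linear parametric value function; the environment `env`, the `policy` function, and the feature map `phi` are hypothetical placeholders assumed to follow a Gym-style step convention, not part of any particular assignment.

```python
import numpy as np

# Minimal sketch: semi-gradient TD(0) with a linear value function
# v_hat(s, w) = w . phi(s). `env`, `policy`, and `phi` are hypothetical
# placeholders; env.step(a) is assumed to return (next_state, reward, done).
def semi_gradient_td0(env, policy, phi, num_features,
                      alpha=0.01, gamma=0.99, episodes=500):
    w = np.zeros(num_features)                 # value-function parameters
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            v = np.dot(w, phi(s))
            v_next = 0.0 if done else np.dot(w, phi(s_next))
            td_error = r + gamma * v_next - v
            # Semi-gradient step: only the gradient of the predicted value is
            # used, and for a linear function that gradient is simply phi(s).
            w += alpha * td_error * phi(s)
            s = s_next
    return w
```

Each update nudges the weights so that the estimate for the visited state moves toward the one-step TD target.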

Constructing Features for Prediction

The features used to construct the agent’s value estimates are perhaps the most crucial part of a successful learning system.

In this module we discuss two basic strategies for constructing features: (1) fixed bases that form an exhaustive partition of the input,

and (2) adapting the features while the agent interacts with the world via Neural Networks and Backpropagation.
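A small sketch of strategy (1) is a fixed basis that exhaustively partitions a one-dimensional input via state aggregation; the input range and number of bins below are illustrative assumptions.

```python
import numpy as np

# Fixed-basis sketch: state aggregation over an assumed input range [0, 1).
# Each state activates exactly one feature, so the bins form an exhaustive
# partition of the input space.
def one_hot_aggregation(state, num_bins=10, low=0.0, high=1.0):
    features = np.zeros(num_bins)
    idx = int((state - low) / (high - low) * num_bins)
    features[min(max(idx, 0), num_bins - 1)] = 1.0
    return features

# Example: nearby states share a feature, distant states do not.
print(one_hot_aggregation(0.03))   # activates the first bin
print(one_hot_aggregation(0.97))   # activates the last bin
```

Strategy (2) instead learns the feature mapping itself, typically as the hidden layers of a neural network trained with backpropagation.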

In this week’s graded assessment you will solve a simple but infinite-state prediction task with a Neural Network and

TD learning.

Control with Approximation

This week, you will see that the concepts and tools introduced in modules two and three allow straightforward extension of classic

TD control methods to the function approximation setting. In particular,

you will learn how to find the optimal policy in infinite-state MDPs by simply combining semi-gradient

TD methods with generalized policy iteration, yielding classic control methods like Q-learning and Sarsa.
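As a sketch of that combination, here is episodic semi-gradient Sarsa with a linear action-value function and an epsilon-greedy policy; `env`, the action set, and the state-action feature map `phi_sa` are hypothetical placeholders.

```python
import numpy as np

# Sketch: episodic semi-gradient Sarsa with q_hat(s, a, w) = w . phi_sa(s, a).
# `env`, `actions`, and `phi_sa` are hypothetical placeholders.
def epsilon_greedy(w, phi_sa, s, actions, epsilon):
    if np.random.rand() < epsilon:
        return np.random.choice(actions)
    q_values = [np.dot(w, phi_sa(s, a)) for a in actions]
    return actions[int(np.argmax(q_values))]

def semi_gradient_sarsa(env, phi_sa, num_features, actions,
                        alpha=0.1, gamma=1.0, epsilon=0.1, episodes=200):
    w = np.zeros(num_features)
    for _ in range(episodes):
        s = env.reset()
        a = epsilon_greedy(w, phi_sa, s, actions, epsilon)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            q = np.dot(w, phi_sa(s, a))
            if done:
                # Terminal update: no bootstrapped value from the next state.
                w += alpha * (r - q) * phi_sa(s, a)
                break
            a_next = epsilon_greedy(w, phi_sa, s_next, actions, epsilon)
            q_next = np.dot(w, phi_sa(s_next, a_next))
            w += alpha * (r + gamma * q_next - q) * phi_sa(s, a)
            s, a = s_next, a_next
    return w
```

Replacing the bootstrapped target with the value of the greedy next action gives the corresponding semi-gradient Q-learning update.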

We conclude with a discussion of a new problem formulation for RL---average reward---which will undoubtedly

be used in many applications of RL in the future.
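For reference, a standard statement of the average-reward objective and of the differential TD error used by semi-gradient methods in that setting (notation follows the common textbook convention; the bar denotes the running estimate of the average reward):

```latex
r(\pi) = \lim_{t \to \infty} \frac{1}{t} \sum_{k=1}^{t}
         \mathbb{E}\left[ R_k \mid A_{0:k-1} \sim \pi \right],
\qquad
\delta_t = R_{t+1} - \bar{R}_t
           + \hat{v}(S_{t+1}, \mathbf{w}_t) - \hat{v}(S_t, \mathbf{w}_t)
```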

Policy Gradient

Every algorithm you have learned about so far estimates

a value function as an intermediate step towards the goal of finding an optimal policy.

An alternative strategy is to directly learn the parameters of the policy.

This week you will learn about these policy gradient methods, and their advantages over value-function based methods.

You will also learn how policy gradient methods can be used

to find the optimal policy in tasks with both continuous state and action spaces.
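A minimal REINFORCE sketch with a Gaussian policy over a single continuous action illustrates the idea of learning policy parameters directly; `env` and the feature map `phi` are hypothetical placeholders, and the fixed standard deviation is an arbitrary choice.

```python
import numpy as np

# Sketch: REINFORCE with a Gaussian policy whose mean is theta . phi(s) and
# whose standard deviation is fixed. `env` and `phi` are hypothetical.
def reinforce_gaussian(env, phi, num_features, alpha=1e-3, gamma=0.99,
                       sigma=0.5, episodes=1000):
    theta = np.zeros(num_features)
    for _ in range(episodes):
        s, done, trajectory = env.reset(), False, []
        while not done:
            mu = np.dot(theta, phi(s))
            a = np.random.normal(mu, sigma)      # sample a continuous action
            s_next, r, done = env.step(a)
            trajectory.append((s, a, r))
            s = s_next
        # Compute the return G_t for every step of the episode.
        returns, g = [], 0.0
        for (_, _, r_t) in reversed(trajectory):
            g = r_t + gamma * g
            returns.append(g)
        returns.reverse()
        # Ascend the policy gradient: theta += alpha * gamma^t * G_t * grad log pi.
        for t, ((s_t, a_t, _), g_t) in enumerate(zip(trajectory, returns)):
            grad_log_pi = (a_t - np.dot(theta, phi(s_t))) / sigma**2 * phi(s_t)
            theta += alpha * (gamma ** t) * g_t * grad_log_pi
    return theta
```

Because actions are sampled from a parameterized distribution, nothing in the update requires enumerating or maximizing over a discrete action set, which is what makes the approach natural for continuous action spaces.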
