Reinforcement learning in artificial intelligence
1. What is reinforcement learning
Reinforcement learning is a branch of machine learning. It is a decision process in which an intelligent agent keeps interacting with its environment and improving its behavior so as to accumulate the greatest reward.
To complete a task, the agent first interacts with the surrounding environment by taking an action A. Under the effect of action A, the environment moves to a new state, which the agent observes, and at the same time the environment gives an immediate reward. As this cycle repeats, the agent and the environment keep interacting and generate a large amount of data. The reinforcement learning algorithm uses this data to revise the agent's action policy. The agent then interacts with the environment again, generates new data, and uses it to improve its behavior further. After a number of learning iterations, the agent learns the optimal actions for completing the task (the optimal policy).
Reinforcement learning consists mainly of four elements: the agent, the environment state, the action, and the reward. Its goal is to obtain the greatest cumulative reward.
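The interaction cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the corridor environment, its reward values (-0.1 per step, +1.0 at the goal), and the names CorridorEnv, run_episode, random_policy, and always_right are all hypothetical and do not come from any particular library.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a corridor of cells with the goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        """Put the agent back in the leftmost cell and return that state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action (-1 = left, +1 = right) and return (state, reward, done)."""
        # The walls clamp the position to the corridor.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        # Assumed reward scheme: small penalty per step, bonus on reaching the goal.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run the agent-environment cycle once and return the cumulative reward."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # the agent chooses an action
        state, reward, done = env.step(action)  # the environment responds
        total += reward                         # rewards accumulate over the episode
        if done:
            break
    return total


def random_policy(state):
    """An untrained agent: picks a direction at random."""
    return random.choice([-1, 1])


def always_right(state):
    """The best policy for this corridor: always walk toward the goal."""
    return 1


if __name__ == "__main__":
    random.seed(0)
    print("random policy:", run_episode(CorridorEnv(), random_policy))
    print("always-right policy:", run_episode(CorridorEnv(), always_right))
```

In this sketch the always-right policy collects three step penalties plus the goal reward, and no other policy can do better in this corridor. A learning algorithm's job is to move the agent from behavior like random_policy toward behavior like always_right using only the observed states and rewards.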
Input and output
The inputs to reinforcement learning are:
States = the situations the environment can be in; for example, every cell of a maze is a state
Actions = the actions that are allowed in each state
Rewards = the positive or negative value (utility) received on entering each state
And the output is:
Policy = which action to choose in each state, that is, a chain of decisions
The tuple (S, A, R, P) of these four elements constitutes a reinforcement learning system. In abstract algebra, tuples like this are often used to define systems or structures.
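To make the inputs and the output concrete, here is a minimal sketch that takes the maze example literally. The article does not name a learning algorithm, so tabular Q-learning is swapped in here purely for illustration. The maze layout, the reward values, the hyperparameters, and the names MAZE, step, q_learning, and follow are all assumptions.

```python
import random

# Hypothetical 3x4 maze: 'S' start, 'G' goal, '#' wall, '.' free cell.
MAZE = ["S..#",
        ".#..",
        "...G"]

# A: the actions allowed in every state, as (row, column) offsets.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Return (next_state, reward, done) for one move in the maze."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    # Moving off the grid or into a wall leaves the agent where it is.
    if not (0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0])) or MAZE[nr][nc] == "#":
        nr, nc = r, c
    # R: assumed rewards, +1.0 for entering the goal and -0.04 for any other cell.
    if MAZE[nr][nc] == "G":
        return (nr, nc), 1.0, True
    return (nr, nc), -0.04, False


def q_learning(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values by trial and error, then read off a greedy policy."""
    rng = random.Random(seed)
    # S: every non-wall cell is a state.
    states = [(r, c) for r in range(len(MAZE))
              for c in range(len(MAZE[0])) if MAZE[r][c] != "#"]
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(200):  # cap on episode length
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Q-learning update toward reward plus discounted best future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
            if done:
                break
    # P: the policy maps each non-goal state to its highest-valued action.
    return {s: max(ACTIONS, key=lambda a: q[(s, a)])
            for s in states if MAZE[s[0]][s[1]] != "G"}


def follow(policy, start=(0, 0), max_steps=20):
    """Follow the policy from the start and return the list of visited cells."""
    state, path = start, [start]
    for _ in range(max_steps):
        state, _, done = step(state, policy[state])
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    learned = q_learning()
    print(follow(learned))
```

The dictionary returned by q_learning is the output the text calls the policy: one chosen action per state. Following it from the start cell traces the chain of decisions that leads to the goal.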
2. Why use reinforcement learning (what problems can it solve)
First of all, reinforcement learning matters for two reasons.
1. Any problem that can be framed in terms of an environment, states, actions, and rewards can in principle be tackled with this kind of algorithm.
2. No hand-crafted rules are needed: the raw image can be used directly as the state.
DeepMind's deep reinforcement learning continuously captures screenshots of the game and feeds them to the program as its input signal, so that the program learns to play a variety of games without any manual intervention.
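As a rough illustration of using screenshots as the state, the sketch below turns raw RGB frames into a compact state by converting to grayscale, downsampling, and keeping the last few frames together. These specific steps and the names to_grayscale, downsample, and FrameStack are assumptions made for illustration; the article does not describe how the frames are processed.

```python
from collections import deque


def to_grayscale(frame):
    """Convert rows of (R, G, B) tuples (0-255) to rows of brightness values in [0, 1]."""
    return [[(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row]
            for row in frame]


def downsample(gray, factor=2):
    """Shrink the image by keeping every `factor`-th pixel in both directions."""
    return [row[::factor] for row in gray[::factor]]


class FrameStack:
    """Keeps the last k processed screenshots; together they form one state."""

    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def push(self, frame):
        """Add a new raw screenshot and return the current state (a list of k frames)."""
        processed = downsample(to_grayscale(frame))
        if not self.frames:
            # At the start of an episode, fill the stack with copies of the first frame.
            for _ in range(self.frames.maxlen):
                self.frames.append(processed)
        else:
            self.frames.append(processed)  # the oldest frame drops out automatically
        return list(self.frames)


if __name__ == "__main__":
    # A hypothetical 4x4 all-white screenshot standing in for a game screen.
    white = [[(255, 255, 255)] * 4 for _ in range(4)]
    state = FrameStack(k=4).push(white)
    print(len(state), "frames of size", len(state[0]), "x", len(state[0][0]))
```

Stacking several consecutive frames is one way to let the program see motion, which a single still image cannot show. No game-specific rule appears anywhere in this processing, which is the point made in the text.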
It follows that any goal-oriented task whose objective can be described by a reward and punishment function can be handled by deep reinforcement learning, so its range of applications is broad:
Game strategy
Robot control
Autonomous driving
Environment exploration
Learning to walk
Reinforcement learning is a very active and interesting area of machine learning. Compared with other learning methods, it is closer to the essence of biological learning, and it is therefore expected to achieve higher intelligence. This has already been demonstrated in board games. Tesauro (1995) describes the TD-Gammon program, which used reinforcement learning to become a world-class backgammon player. After training on 1.5 million self-generated games, the program came close to the level of the best human players in the world, and in play against top human players it reportedly lost only 1 game in 40.