AI robots achieve visual adaptation
People are remarkably good at manipulating objects without needing to bring their viewpoint to a fixed or specific location. This ability (known as visuomotor integration) is learned during childhood through manipulating objects in varied settings, and is governed by a self-adaptive, error-correcting mechanism that uses rich sensory cues with vision as feedback. However, endowing vision-based controllers in robotics with this ability has proven very difficult.
Until now, such controllers have typically been built on a fixed setup: visual input is read from a camera rigidly mounted in a position that cannot be moved or re-adjusted during training and testing. The ability to quickly acquire visuomotor control skills under dramatic viewpoint changes would have a major impact on autonomous robotic systems; for example, it is especially necessary for robots operating in emergency or disaster-relief scenarios.
At this week's CVPR 2018 conference, we presented a paper titled "Sim2Real Viewpoint Invariant Visual Servoing by Recurrent Control." In this paper, we study a new deep network architecture (consisting of two fully convolutional networks and a long short-term memory (LSTM) unit) that learns from past actions and observations to self-calibrate. Trained on diverse simulation data with a combination of demonstration trajectories and reinforcement-learning objectives, our visually-adaptive network can control a robotic arm to reach a variety of visually indicated goals from a range of viewpoints, independently of camera calibration.
[Figure: A physical robotic arm reaching visually indicated goals. The learned policy reaches diverse goals from sensory input captured from dramatically different camera viewpoints; the first row shows the visually indicated goal objects.]
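To make the architecture described above concrete, here is a minimal, hypothetical sketch of a recurrent visuomotor controller in this spirit: two convolutional encoders (one for the current observation, one for the query image of the goal object) feed an LSTM that also receives the previous action, letting the policy observe how its actions move the image and thereby self-calibrate. All layer sizes, input resolutions, and the feature-fusion scheme are our assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class RecurrentVisualServo(nn.Module):
    """Sketch of a viewpoint-invariant visual servoing controller:
    two convolutional encoders (observation + query/goal image) feed
    an LSTM that also sees the previous action, so the policy can
    infer how its actions move the image and self-calibrate."""

    def __init__(self, action_dim=7, feat_dim=128):
        super().__init__()
        def encoder():
            return nn.Sequential(
                nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, 3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim), nn.ReLU(),
            )
        self.obs_enc = encoder()    # current camera view
        self.goal_enc = encoder()   # visually indicated goal object
        self.lstm = nn.LSTM(2 * feat_dim + action_dim, 256, batch_first=True)
        self.policy = nn.Linear(256, action_dim)

    def forward(self, obs_seq, goal_img, prev_actions):
        # obs_seq: (B, T, 3, H, W); goal_img: (B, 3, H, W);
        # prev_actions: (B, T, action_dim)
        B, T = obs_seq.shape[:2]
        obs_f = self.obs_enc(obs_seq.flatten(0, 1)).view(B, T, -1)
        goal_f = self.goal_enc(goal_img).unsqueeze(1).expand(-1, T, -1)
        h, _ = self.lstm(torch.cat([obs_f, goal_f, prev_actions], dim=-1))
        return self.policy(h)  # per-step action for the 7-DoF arm

arm = RecurrentVisualServo()
actions = arm(torch.randn(2, 4, 3, 64, 64), torch.randn(2, 3, 64, 64),
              torch.zeros(2, 4, 7))
print(actions.shape)  # torch.Size([2, 4, 7])
```

The key design point is the recurrence: because the camera pose is unknown, a single frame cannot disambiguate how joint motions map to image motion, but a memory of past action-observation pairs can.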
Challenge
From a single image captured from an unknown viewpoint, the effect of the controllable degrees of freedom (DoF) on visual motion can be ambiguous. Determining how actions affect image-space motion, and successfully executing the task, requires a robust perception system augmented with the ability to remember past actions. To solve this challenging problem, we had to address the following fundamental questions:
• How can we provide the robot with the right kind of experience to learn self-adaptive behavior based purely on visual observation, mimicking a lifelong-learning paradigm?
• How can we design a model that combines robust perception with self-adaptive control and transfers quickly to unseen environments?
To this end, we devised a new manipulation task: a seven-degree-of-freedom (7-DoF) robotic arm is given an image of an object and instructed to reach that particular object among a set of distractors, while the camera viewpoint changes dramatically from one trial to the next. In this way, we can study both the learning of complex behaviors in simulation and their transfer to unseen environments.
[Figure: The physical robotic arm completing the task of reaching a visually indicated object under a variety of camera viewpoints.]
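To make the task setup concrete, the sketch below shows one hypothetical way such a trial could be specified, with the camera pose re-sampled for every episode. The object names, pose ranges, and the `ReachingEpisode`/`sample_episode` helpers are all illustrative assumptions, not the actual benchmark.

```python
import random
from dataclasses import dataclass, field

@dataclass
class ReachingEpisode:
    """One trial: reach the queried object among distractors, with the
    camera placed at a new, unknown pose every episode."""
    goal_object: str                      # object shown in the query image
    distractors: list = field(default_factory=list)
    camera_pose: tuple = (0.0, 0.0, 0.0)  # yaw, pitch, distance (illustrative)

def sample_episode(objects, n_distractors=1):
    goal, *rest = random.sample(objects, n_distractors + 1)
    # The viewpoint changes dramatically between trials, so the policy
    # can never rely on a fixed camera calibration.
    pose = (random.uniform(-90, 90),   # yaw, degrees
            random.uniform(10, 60),    # pitch, degrees
            random.uniform(0.5, 1.5))  # distance, metres
    return ReachingEpisode(goal, rest, pose)

episode = sample_episode(["cube", "mug", "bottle", "ball"], n_distractors=2)
print(episode)
```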
Harnessing simulation to learn complex behaviors
Collecting robot experience data is time-consuming and labor-intensive. In a previous blog post, we showed how to scale up skill learning by distributing data collection and trials across multiple robots. Although this approach accelerates learning, it is still not feasible for learning complex behaviors such as visual self-calibration, which requires exposing the robot to a huge space of varied viewpoints.
We therefore chose to learn such complex behaviors in simulation, where we can collect unlimited experience data and easily move the camera to arbitrary viewpoints. Besides enabling fast data collection, simulation also frees us from the hardware limitations of installing multiple cameras around the robot.
We use domain randomization techniques in simulation to learn generalizable policies.
To learn robust visual features that transfer to the unseen real world, we used a technique called "domain randomization" (also known as "simulation randomization"), introduced by Sadeghi & Levine in 2017, which enables robots to learn vision-based policies entirely in simulation that generalize to the real world. This technique has been shown to work well for a variety of robotic tasks, such as indoor navigation, object localization, and pick-and-place. In addition, to learn complex behaviors such as self-calibration, we harnessed the simulator to generate synthetic demonstrations and combined them with a reinforcement-learning objective to learn a robust controller for the robotic arm.
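The idea behind domain randomization can be sketched as follows: every aspect of the scene that the policy should not rely on (textures, colors, lighting, and, crucially for this task, the camera pose) is randomized per episode, so the only consistent signal across training scenes is the task itself. The `SceneConfig` fields and sampling ranges below are illustrative assumptions, not the paper's actual simulator settings.

```python
import random
from dataclasses import dataclass

@dataclass
class SceneConfig:
    """Everything the policy should NOT rely on gets randomized, so the
    only consistent signal across training scenes is the task itself."""
    table_texture: str
    object_colors: list
    light_position: list
    light_intensity: float
    camera_yaw: float
    camera_pitch: float
    camera_distance: float

TEXTURES = ["wood", "marble", "checker", "noise"]  # illustrative texture bank

def randomize_scene(n_objects):
    return SceneConfig(
        table_texture=random.choice(TEXTURES),
        object_colors=[[random.random() for _ in range(3)]
                       for _ in range(n_objects)],
        light_position=[random.uniform(-2, 2) for _ in range(3)],
        light_intensity=random.uniform(0.3, 1.5),
        # A fresh camera pose per episode is what forces viewpoint invariance.
        camera_yaw=random.uniform(-90, 90),
        camera_pitch=random.uniform(10, 60),
        camera_distance=random.uniform(0.5, 1.5),
    )

print(randomize_scene(n_objects=3))
```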
[Figure: A simulated seven-degree-of-freedom manipulator reaching visually indicated goals. The learned policy reaches diverse goals from sensory input captured from dramatically different camera viewpoints.]
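As noted above, the controller is trained by combining synthetic demonstrations with a reinforcement-learning objective. A minimal sketch of such a combined loss is shown below; the equal weighting and the REINFORCE-style estimator are our assumptions, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def servo_loss(pred_actions, demo_actions, log_probs, rewards, bc_weight=0.5):
    """Mix two training signals: a behavior-cloning term that regresses
    onto synthetic demonstration actions, and a policy-gradient term that
    rewards reaching the visually indicated object."""
    bc = F.mse_loss(pred_actions, demo_actions)  # imitate demonstrations
    rl = -(log_probs * rewards).mean()           # REINFORCE-style term
    return bc_weight * bc + (1.0 - bc_weight) * rl

pred = torch.randn(8, 7, requires_grad=True)  # dummy predicted actions
loss = servo_loss(pred, torch.randn(8, 7), torch.randn(8), torch.rand(8))
loss.backward()
print(loss.item())
```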
Separating perception and control
To enable fast transfer to unseen environments, we designed a deep neural network that separates perception and control, yet is trained end-to-end while allowing each part to be learned independently when needed. This decoupling makes it easier to transfer perception and control to unseen environments, and makes the model both flexible and efficient, since each part (i.e., "perception" or "control") can be adapted to a new environment with a small amount of data.
Moreover, while the control part of the network is trained entirely on simulated data, the perception part is complemented by fine-tuning on a small number of static real images annotated with object bounding boxes, without requiring the physical robot to collect full motion trajectories. In practice, we used only 76 object bounding boxes from 22 images to fine-tune the perception part of the network.
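A hypothetical sketch of this decoupling: because perception and control are separate modules, the control part (trained entirely in simulation) can be frozen while only the perception part is fine-tuned on a handful of real images. The module definitions below are illustrative stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Perception and control as separate modules: the perception encoder can
# be fine-tuned on a few static real images (e.g. with bounding-box
# supervision) while the simulation-trained controller stays frozen.
perception = nn.Sequential(  # image -> object-centric features
    nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 128),
)
control = nn.LSTM(128, 256, batch_first=True)  # features -> motor commands

for p in control.parameters():   # freeze control...
    p.requires_grad = False
optimizer = torch.optim.Adam(perception.parameters(), lr=1e-4)

# ...and adapt only perception, using a small batch of real images.
feats = perception(torch.randn(22, 3, 64, 64))
print(feats.shape)  # torch.Size([22, 128])
```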
[Figure: Real-world robot and moving-camera setup. The first row shows the scene layout; the second row shows the robot's visual sensory input.]
Early results
We tested the visually-adapted version of the network on physical robots with real objects whose appearance differs from the objects used in simulation. In the experiments, one or two objects were placed on the table: "seen objects" (shown below) were among the small set of static real images used for visual adaptation, while "unseen objects" were never seen during visual adaptation. During the test, the robotic arm was instructed to reach a visually indicated object from various viewpoints. In the two-object experiments, the second object serves to "confuse" the robotic arm.
Because the simulation-only network generalizes well (having been trained with domain randomization) and our network architecture is flexible, the controller's performance improved substantially even though only a very small amount of static visual data was collected for visual adaptation.
[Figure: Performance improved by more than 10% after adapting the visual features with a small number of real images. All real objects used are completely different from the objects seen in simulation.]
We believe that learning online visual self-adaptation is an important and challenging problem. Its goal is to learn generalizable policies that allow robots to operate in the diverse, unstructured real world. Our approach can be extended to any kind of automatic self-calibration.