The application of deep learning in target tracking
Deep Learning Lecture Hall is a platform for high-quality original content. We invite experts from academia and industry to write articles, and strive to bring readers the latest information on artificial intelligence and deep learning technology, products and events.
Before we begin, look at the three pictures above, which show frames 1, 40 and 80 of the same video. A bounding box around a runner is given in the first frame, and in frames 40 and 80 the bounding box still accurately encloses the same runner. This is the process of visual object tracking. Target tracking (especially single-object tracking) means that, given the initial state of the target (such as its location and size) in the first frame of a video, the state of the target in subsequent frames is estimated automatically.
Human eyes can follow a particular target over a period of time with relative ease. For a machine, however, the task is far from simple, especially because of the complex situations that arise during tracking, such as drastic deformation of the target, occlusion by other objects, or interference from similar-looking objects. Over the past decades, research on target tracking has made great progress. In particular, since various machine learning algorithms were introduced, tracking algorithms have flourished, with many different approaches appearing. Since 2013, deep learning methods have begun to emerge in target tracking, gradually surpassing traditional methods in performance and achieving major breakthroughs. This article first introduces the mainstream traditional tracking methods, then introduces tracking algorithms based on deep learning, and finally summarizes the application of deep learning to target tracking.
Classical target tracking methods
Current tracking algorithms can be divided into two categories: generative methods and discriminative methods.
Generative methods use a generative model to describe the appearance of the target, and then search among candidates for the one that minimizes the reconstruction error. Representative algorithms include sparse coding, online density estimation and principal component analysis (PCA). Generative methods focus on describing the target itself and ignore background information, so they tend to drift when the target's appearance changes drastically or when the target is occluded.
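As a rough illustration of the generative idea, the sketch below builds a PCA subspace from previously seen target patches and, among candidate patches from a new frame, picks the one with the smallest reconstruction error. It is a minimal NumPy sketch under assumed conditions, not any specific published tracker: the class `PCAAppearanceModel`, the function `track_step` and the synthetic patches are hypothetical, and a real tracker would also update the subspace online and sample candidates around the previous position.

```python
import numpy as np


class PCAAppearanceModel:
    """Generative appearance model: a low-dimensional PCA subspace of target patches."""

    def __init__(self, target_patches, n_components=5):
        X = target_patches.reshape(len(target_patches), -1)
        self.mean = X.mean(axis=0)
        # Principal directions are the top right-singular vectors of the centred data.
        _, _, vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.basis = vt[:n_components]  # shape (k, d), orthonormal rows

    def reconstruction_error(self, patch):
        x = patch.ravel() - self.mean
        coeffs = self.basis @ x  # project onto the subspace
        residual = x - self.basis.T @ coeffs  # what the subspace cannot explain
        return float(residual @ residual)


def track_step(model, candidates):
    """Return the index of the candidate patch the model reconstructs best."""
    errors = [model.reconstruction_error(c) for c in candidates]
    return int(np.argmin(errors))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d = 16 * 16
    mean_img = rng.standard_normal(d)
    modes = rng.standard_normal((3, d))

    def target_patch():
        # Synthetic target: a mean image plus a few modes of appearance change.
        coeffs = rng.standard_normal(3)
        return (mean_img + coeffs @ modes + 0.05 * rng.standard_normal(d)).reshape(16, 16)

    history = np.stack([target_patch() for _ in range(20)])
    model = PCAAppearanceModel(history, n_components=3)
    candidates = [rng.standard_normal((16, 16)) for _ in range(5)]  # background patches
    candidates.insert(2, target_patch())  # the true target sits at index 2
    print(track_step(model, candidates))  # 2
```

Because the model only describes the target, a background patch that happens to resemble the target subspace would score well too, which is the drifting weakness described above.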
In contrast, discriminative methods train a classifier to distinguish the target from the background. This approach is also often called tracking-by-detection. In recent years, various machine learning algorithms have been applied to discriminative methods; representative examples include multiple instance learning, boosting and structured SVM. Because discriminative methods use information from both the foreground and the background, they are more robust, and they have gradually become the mainstream in target tracking. It is worth noting that most current deep-learning-based trackers also belong to the discriminative framework.
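A minimal tracking-by-detection sketch, assuming each candidate box has already been turned into a feature vector: a logistic-regression classifier (swapped in here for simplicity; it is not one of the MIL, boosting or structured SVM learners named above) is trained on positive samples near the target and negative samples from the background, and the highest-scoring candidate in the next frame is taken as the new target location. The function names and the synthetic features are hypothetical.

```python
import numpy as np


def train_classifier(pos, neg, lr=0.1, epochs=200, reg=1e-3):
    """Logistic regression separating target (label 1) from background (label 0)."""
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(target)
        err = p - y  # gradient of the cross-entropy wrt the logits
        w -= lr * (X.T @ err / len(y) + reg * w)
        b -= lr * err.mean()
    return w, b


def best_candidate(w, b, candidates):
    """Tracking-by-detection: the highest-scoring candidate is the new target."""
    return int(np.argmax(candidates @ w + b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal(32)  # hypothetical feature vector of the target
    pos = target + 0.1 * rng.standard_normal((20, 32))  # jittered boxes on the target
    neg = rng.standard_normal((60, 32))  # boxes sampled from the background
    w, b = train_classifier(pos, neg)

    # Candidate boxes in the next frame; index 7 covers the (slightly changed) target.
    cands = rng.standard_normal((10, 32))
    cands[7] = target + 0.1 * rng.standard_normal(32)
    print(best_candidate(w, b, cands))
```

In a real tracker the positive and negative samples from each new frame would be fed back to update the classifier online.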
In recent years, tracking methods based on correlation filters have attracted the attention of many researchers because of their high speed and good accuracy. A correlation filter is trained by regressing the input features onto a target Gaussian distribution; during subsequent tracking, the target position is located by finding the peak of the predicted response map. Correlation filters cleverly exploit the fast Fourier transform (FFT) to achieve a large speed-up. Many extensions of correlation filtering now exist, including the kernelized correlation filter (KCF) and the scale-estimating correlation filter DSST.
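The sketch below shows the core of a correlation filter in its simplest single-channel form, in the style of MOSSE rather than KCF or DSST: the filter is solved in closed form in the Fourier domain so that correlating it with the training patch yields a Gaussian peak, and the target's displacement in a new patch is read off the peak of the response map. The function names, the regularizer `lam` and the Gaussian width `sigma` are illustrative assumptions; practical trackers also apply a cosine window, use richer features and update the filter online.

```python
import numpy as np


def gaussian_response(h, w, sigma=2.0):
    """Desired correlation output: a 2-D Gaussian peaked at the patch centre."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma**2))


def train_filter(patch, sigma=2.0, lam=1e-2):
    """Closed-form ridge solution in the Fourier domain (MOSSE-style).

    Returns conj(H) = G * conj(F) / (F * conj(F) + lam), where F and G are the
    FFTs of the training patch and of the desired Gaussian response.
    """
    F = np.fft.fft2(patch)
    G = np.fft.fft2(gaussian_response(*patch.shape, sigma=sigma))
    return G * np.conj(F) / (F * np.conj(F) + lam)


def locate(filter_conj, patch):
    """Correlate a new patch with the filter and return the peak's offset from the centre."""
    response = np.real(np.fft.ifft2(np.fft.fft2(patch) * filter_conj))
    py, px = np.unravel_index(np.argmax(response), response.shape)
    h, w = patch.shape
    return int(py - h // 2), int(px - w // 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    first = rng.standard_normal((64, 64))  # stand-in for the target patch in frame 1
    H_conj = train_filter(first)
    moved = np.roll(first, shift=(3, -5), axis=(0, 1))  # shifted 3 px down, 5 px left
    print(locate(H_conj, moved))  # (3, -5)
```

Training and detection each cost a few FFTs, O(n log n) in the number of pixels, instead of an explicit sliding-window correlation; this is where the speed advantage comes from.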
Target tracking methods based on deep learning
Unlike visual detection and recognition, where deep learning has become the dominant approach, applying deep learning to target tracking is not straightforward. The main problem is the lack of training data: the power of deep models comes from learning effectively from large amounts of annotated training data, whereas target tracking provides only the bounding box in the first frame as training data. In this situation, it is difficult to train a deep model from scratch for the current target at the start of tracking. Current deep-learning-based tracking algorithms address this problem with several different ideas. Below we introduce them according to these ideas, and finally introduce a newer approach: tackling the tracking problem with recurrent neural networks (RNNs).
Pre-training a deep model on auxiliary image data and fine-tuning it online
When training data for tracking are very limited, auxiliary non-tracking data can be used for pre-training to obtain a general representation of objects. During actual tracking, the pre-trained model is then fine-tuned with the limited sample information from the current tracking task, so that the model classifies the current target better. This transfer-learning idea greatly reduces the number of training samples required for tracking and also improves the performance of the tracking algorithm.
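The online fine-tuning step can be sketched as follows, under loudly stated assumptions: the encoder weights `W_pre`, `b_pre` are random stand-ins for a network pre-trained offline on auxiliary data, a one-layer sigmoid encoder replaces a real deep model, and the first-frame samples are synthetic. A new logistic head is trained together with the encoder, and the encoder gets a smaller learning rate so that the general representation is adjusted rather than overwritten; this learning-rate split is a common practice, not something taken from a specific paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(W, b, w_head, b_head, X):
    """P(target) for each row of X: sigmoid encoder followed by a logistic head."""
    return sigmoid(sigmoid(X @ W + b) @ w_head + b_head)


def cross_entropy(p, y, eps=1e-12):
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))


def fine_tune(W_pre, b_pre, X, y, lr_enc=0.01, lr_head=0.3, steps=200):
    """Fine-tune a pre-trained encoder plus a freshly initialised head on first-frame samples."""
    W, b = W_pre.copy(), b_pre.copy()
    w_head, b_head = np.zeros(W.shape[1]), 0.0
    n = len(y)
    for _ in range(steps):
        h = sigmoid(X @ W + b)  # hidden features, shape (n, k)
        p = sigmoid(h @ w_head + b_head)  # P(target), shape (n,)
        d_out = (p - y) / n  # d(mean cross-entropy) / d(logit)
        d_h = np.outer(d_out, w_head) * h * (1 - h)  # backprop into the encoder
        w_head -= lr_head * (h.T @ d_out)
        b_head -= lr_head * d_out.sum()
        W -= lr_enc * (X.T @ d_h)  # small steps keep the pre-trained features
        b -= lr_enc * d_h.sum(axis=0)
    return W, b, w_head, b_head


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 32, 16
    # Stand-ins for weights pre-trained offline on auxiliary data (hypothetical).
    W_pre, b_pre = 0.3 * rng.standard_normal((d, k)), np.zeros(k)
    # A handful of first-frame samples: boxes on the target (1) and on background (0).
    target = rng.standard_normal(d)
    X = np.vstack([target + 0.1 * rng.standard_normal((10, d)), rng.standard_normal((30, d))])
    y = np.concatenate([np.ones(10), np.zeros(30)])
    before = cross_entropy(predict(W_pre, b_pre, np.zeros(k), 0.0, X), y)
    W, b, w_head, b_head = fine_tune(W_pre, b_pre, X, y)
    after = cross_entropy(predict(W, b, w_head, b_head, X), y)
    print(f"loss before: {before:.3f}, after: {after:.3f}")
```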
Representative works in this area are DLT and SO-DLT, both by Dr. Wang Naiyan of the Hong Kong University of Science and Technology.
DLT (NIPS 2013): Learning a Deep Compact Image Representation for Visual Tracking
DLT is the first tracking algorithm to apply a deep model to single-object tracking. Its main idea is shown in the figure above.
(1) A stacked denoising autoencoder (SDAE) is pre-trained offline, without supervision, on a large-scale natural image dataset such as Tiny Images, to obtain a general-purpose object representation. The pre-trained network, shown in (b), stacks four denoising autoencoders; a denoising autoencoder adds noise to its input and is trained to reconstruct the original, noise-free input.
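A minimal NumPy sketch of the greedy layer-wise pre-training behind an SDAE: each layer corrupts its input with masking noise, learns (with tied weights) to reconstruct the clean input, and its hidden code becomes the input of the next layer. The layer sizes, noise level, learning rate and random binary data are illustrative assumptions, not DLT's actual configuration, and the functions `train_dae_layer`, `reconstruct` and `pretrain_stack` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_dae_layer(X, n_hidden, corruption=0.3, lr=0.1, epochs=200, seed=0):
    """Train one denoising-autoencoder layer with tied weights on data in [0, 1].

    Masking noise zeroes a random fraction `corruption` of the inputs; the layer
    is trained to reconstruct the clean X from the corrupted copy.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b_h, b_v = np.zeros(n_hidden), np.zeros(d)
    for _ in range(epochs):
        X_noisy = X * (rng.random(X.shape) > corruption)  # masking noise
        H = sigmoid(X_noisy @ W + b_h)  # encoder
        R = sigmoid(H @ W.T + b_v)  # decoder with tied weights W.T
        dA = (R - X) / n  # cross-entropy gradient wrt decoder pre-activation
        dZ = (dA @ W) * H * (1 - H)  # backprop through the encoder sigmoid
        W -= lr * (X_noisy.T @ dZ + dA.T @ H)  # encoder + decoder contributions
        b_h -= lr * dZ.sum(axis=0)
        b_v -= lr * dA.sum(axis=0)
    return W, b_h, b_v


def reconstruct(X, W, b_h, b_v):
    return sigmoid(sigmoid(X @ W + b_h) @ W.T + b_v)


def pretrain_stack(X, layer_sizes, **kwargs):
    """Greedy layer-wise pre-training: each layer's code feeds the next layer."""
    layers = []
    for i, size in enumerate(layer_sizes):
        W, b_h, b_v = train_dae_layer(X, size, seed=i, **kwargs)
        layers.append((W, b_h, b_v))
        X = sigmoid(X @ W + b_h)  # clean (uncorrupted) code for the next layer
    return layers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random binary stand-in for small natural-image patches (hypothetical data).
    X = (rng.random((500, 64)) < 0.2).astype(float)
    layers = pretrain_stack(X, [32, 16])
    print([W.shape for W, _, _ in layers])  # [(64, 32), (32, 16)]
```

Only the encoder weights of each layer are kept after pre-training; in the DLT setting they would form the feature extractor that is later fine-tuned online.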