Gaussian Mixture Model (GMM)
The Gaussian Mixture Model (GMM) is a very basic and widely used model, so a thorough understanding of it is important. Most introductions to GMM online are long stretches of formulas whose notation is not explained clearly, or are written very rigidly. This article tries to introduce GMM comprehensively in plain words; corrections of any shortcomings are welcome.
First, the definition of GMM is given
The definition of "statistical learning method" is used here, as follows:
[Figure: the GMM definition as given in "Statistical Learning Methods"]
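For readers without the figure, the definition can be written out in standard notation (the symbols here are supplied for clarity rather than taken from the original image):

$$
p(x \mid \theta) = \sum_{k=1}^{K} \alpha_k \, \phi(x \mid \theta_k),
\qquad \alpha_k \ge 0, \quad \sum_{k=1}^{K} \alpha_k = 1,
$$

where each φ(x | θ_k) is a Gaussian density with parameters θ_k = (μ_k, σ_k²):

$$
\phi(x \mid \theta_k) = \frac{1}{\sqrt{2\pi}\,\sigma_k}\exp\!\left(-\frac{(x-\mu_k)^2}{2\sigma_k^2}\right).
$$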
The definition is easy to understand: a Gaussian mixture model is a mixture model whose component distributions are Gaussian distributions.
The first detail: why must the coefficients sum to 1?
PRML gives the following graph:
[Figure: a one-dimensional mixture density formed by superimposing three weighted Gaussian components (from PRML)]
This graph shows how a one-dimensional GMM with three Gaussian components is superimposed from its weighted components. This picture once disturbed my understanding of GMM. If the curves were simply added up like this, the coefficients of the three Gaussian components would each have to be 1, making their sum 3, in order to produce such a direct superposition effect, and that clearly does not fit the definition of GMM. So this graph only illustrates the principle of how a GMM is composed; it is not exact.
So, why must the coefficients of the Gaussian components of a GMM sum to 1?
In fact, the answer is simple. What we call a GMM is essentially a probability density function, and a probability density function integrates to 1 over its domain. The density of the whole GMM is a linear superposition of the densities of its Gaussian components, and the density of each individual component necessarily integrates to 1. Therefore, for the density of the overall GMM to integrate to 1, each Gaussian component must be given a coefficient of no more than 1, and these coefficients must sum to 1.
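Written out as a one-line check, in the same notation as the definition above:

$$
\int p(x \mid \theta)\,dx
= \sum_{k=1}^{K} \alpha_k \int \phi(x \mid \theta_k)\,dx
= \sum_{k=1}^{K} \alpha_k \cdot 1
= 1
\quad\Longleftrightarrow\quad
\sum_{k=1}^{K} \alpha_k = 1 .
$$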
The second detail: why the EM algorithm is used to solve for the GMM parameters
It is well known that the EM algorithm is used to solve for the GMM parameters. But why? Is it necessary?
First, as with other models, we try maximum likelihood estimation to solve for the parameters of the GMM, as follows:
[Figures: the likelihood and log-likelihood of the GMM under maximum likelihood estimation]
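The log-likelihood in question, for N independent observations x_1, ..., x_N, has the standard form (reconstructed here in the notation used above):

$$
\log L(\theta) = \sum_{j=1}^{N} \log\!\left( \sum_{k=1}^{K} \alpha_k \, \phi(x_j \mid \theta_k) \right).
$$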
It can be seen that the objective function is a logarithm of a sum. It is hard to expand, and setting its partial derivatives to zero does not yield equations that can be solved in closed form, so direct optimization is difficult. We therefore have to find another way: the EM algorithm.
The third detail: understanding the hidden variables in the EM algorithm for GMM
To use the EM algorithm, the hidden variables must be made explicit. When solving the GMM, we imagine that an observation x is generated as follows: first, one Gaussian component is selected according to its coefficient (since the coefficients lie between 0 and 1, they can be viewed as probabilities), and then the observation is generated from the selected Gaussian component. The hidden variable is then whether a given Gaussian component was selected: 1 if it was selected, 0 otherwise.
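A minimal sketch of this generative story, with made-up parameter values (any one-dimensional GMM would do):

```python
import numpy as np

rng = np.random.default_rng(0)
alphas = np.array([0.3, 0.7])   # mixture coefficients, sum to 1
mus    = np.array([0.0, 5.0])   # component means (illustrative values)
sigmas = np.array([1.0, 1.2])   # component standard deviations

def sample_gmm(n):
    # Step 1: select a component index k with probability alpha_k.
    ks = rng.choice(len(alphas), size=n, p=alphas)
    # Step 2: generate the observation from the selected Gaussian component.
    x = rng.normal(mus[ks], sigmas[ks])
    return x, ks

x, ks = sample_gmm(1000)   # ks plays the role of the hidden variable
```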
According to this picture, the hidden variable is a vector in which exactly one element is 1 and the rest are 0; it assumes that a single Gaussian component is selected and produces the observation. But shouldn't an observation of our GMM be generated by all of the Gaussian components together, rather than by a single one? Is such an assumption about the hidden variable reasonable?
The answer is that it is reasonable; it just takes a bit more effort to understand.
The first thing to understand is what the observation data of a GMM are, and what the GMM's density function outputs. For a one-dimensional GMM, an observation is a real number, and the GMM's probability density function, given that observation as input, outputs the probability density of that real number being generated by the GMM.
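As a tiny worked example with made-up numbers: take α = (0.3, 0.7), components N(0, 1) and N(5, 1), and the observation x = 1. Then

$$
p(1) = 0.3\,\phi(1 \mid 0, 1) + 0.7\,\phi(1 \mid 5, 1)
\approx 0.3 \times 0.2420 + 0.7 \times 0.00013
\approx 0.073 .
$$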
Now, we do not know the specific parameter values of the GMM and want to estimate them from the observation data. The parameters of a GMM consist of the parameters of each Gaussian component together with the mixture coefficients. So we first pretend that each observed value is produced by exactly one of the Gaussian components, and estimate each component's parameters on that basis; that is, we assume every observation has a single "home", just as in the K-means algorithm. Then, over the subsequent iterations, a better and better assignment is found by optimizing the overall likelihood of the data. However, unlike K-means, what we end up with for each observation is not a hard category but the probability that it was generated by each Gaussian component. In fact, every Gaussian component can produce the observation; only the output differs, i.e. the probability of producing it differs. Finally, the probabilities with which the individual components generate the observation, each weighted by its coefficient, are summed to give the probability density of the whole GMM producing that observation.
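A minimal sketch of one EM pass for a one-dimensional GMM, just to make the "soft assignment" idea concrete (the function name and the use of scipy.stats.norm are my own choices, not from the original text):

```python
import numpy as np
from scipy.stats import norm

def em_step(x, alphas, mus, sigmas):
    # E-step: responsibility of component k for observation j, i.e. the
    # probability that x_j was generated by component k under the current
    # parameters (a soft assignment, unlike K-means' hard assignment).
    dens = np.stack([a * norm.pdf(x, m, s)
                     for a, m, s in zip(alphas, mus, sigmas)], axis=1)  # (N, K)
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate each component from the responsibility-weighted data.
    nk = resp.sum(axis=0)                 # effective number of points per component
    alphas = nk / len(x)
    mus = (resp * x[:, None]).sum(axis=0) / nk
    sigmas = np.sqrt((resp * (x[:, None] - mus) ** 2).sum(axis=0) / nk)
    return alphas, mus, sigmas
```

Iterating em_step from a rough initialization never decreases the data log-likelihood, which is exactly the optimization process described above.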
The ultimate understanding: using the EM algorithm to solve for the GMM parameters
1. Define the hidden variables
We introduce hidden variables, each of which can only take the value 1 or 0.
Value 1: the j-th observation comes from the k-th Gaussian component.
Value 0: the j-th observation does not come from the k-th Gaussian component.
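In the usual notation of "Statistical Learning Methods" (the symbol γ_jk is supplied here for clarity; it was presumably in the original figures), this reads:

$$
\gamma_{jk} =
\begin{cases}
1, & \text{the } j\text{-th observation comes from the } k\text{-th Gaussian component},\\
0, & \text{otherwise},
\end{cases}
\qquad j = 1,\dots,N,\ \ k = 1,\dots,K .
$$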
Thus every observation corresponds to such a vector of hidden variables, so we have:
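The formula referred to here was an image in the original; its standard form is the prior distribution of the hidden variables:

$$
P(\gamma_{jk} = 1) = \alpha_k,
\qquad
p(\gamma_j) = p(\gamma_{j1}, \dots, \gamma_{jK}) = \prod_{k=1}^{K} \alpha_k^{\gamma_{jk}} .
$$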
Here, K is the number of Gaussian components of the GMM, and α_k is the coefficient of the k-th component. Which component each observation comes from is independent across observations, and the probability that an observation comes from the k-th component can be regarded as α_k, so the prior distribution of the hidden variables is obtained directly by multiplying the corresponding coefficients.
2. The likelihood function of the complete data
For the observation data, together with the hidden variables defined above, each observation x_j and its hidden-variable vector (γ_j1, ..., γ_jK) make up the complete data.
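For completeness, the likelihood of the complete data takes the standard form (as given in "Statistical Learning Methods"):

$$
P(x, \gamma \mid \theta)
= \prod_{j=1}^{N} \prod_{k=1}^{K} \bigl[\alpha_k \, \phi(x_j \mid \theta_k)\bigr]^{\gamma_{jk}}
= \prod_{k=1}^{K} \alpha_k^{n_k} \prod_{j=1}^{N} \bigl[\phi(x_j \mid \theta_k)\bigr]^{\gamma_{jk}},
\qquad n_k = \sum_{j=1}^{N} \gamma_{jk} .
$$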