Speech recognition technology
Speech recognition technology, also known as Automatic Speech Recognition (ASR), aims to convert the words in human speech into computer-readable input such as keystrokes, binary codes, or character sequences. It differs from speaker recognition and speaker verification, which attempt to identify or confirm who is speaking rather than what was said.
Introduction:
Applications of speech recognition technology include voice dialing, voice navigation, indoor device control, voice document retrieval, and simple dictation data entry. Combined with other natural language processing techniques, such as machine translation and speech synthesis, speech recognition can be used to build more complex applications such as speech-to-speech translation.
The fields that speech recognition technology draws on include signal processing, pattern recognition, probability theory and information theory, the mechanisms of speech production and hearing, artificial intelligence, and so on.
History:
The idea of automatic speech recognition predates the computer, and the early vocoder can be regarded as a prototype of both speech recognition and speech synthesis. The "Radio Rex" toy dog produced in the 1920s was probably the earliest speech recognizer: when the dog's name was called, it sprang out of its base. The earliest computer-based speech recognition system was Audrey, developed at AT&T Bell Labs, which recognized the 10 English digits by tracking the formants in the speech and achieved 98% accuracy. By the end of the 1950s, Denes of University College London had added grammatical probabilities to speech recognition.
In the 1960s, artificial neural networks were introduced into speech recognition. Two major breakthroughs in this era were Linear Predictive Coding (LPC) and Dynamic Time Warping (DTW).
The most significant breakthrough in speech recognition technology was the application of the hidden Markov model (HMM). Baum proposed the underlying mathematics, Rabiner and others developed it further, and Kai-fu Lee of Carnegie Mellon University finally built Sphinx, the first large-vocabulary speech recognition system based on hidden Markov models. Strictly speaking, speech recognition technology has not left the HMM framework since.
Although researchers have been trying to promote "dictation machines" for many years, speech recognition technology cannot yet support dictation across unrestricted domains and unrestricted speakers.
Principle:
When voice is used to verify identity, the system prompts the user to speak a new passphrase each time, so the user does not need to remember a fixed password and the system cannot be deceived by a recording. Text-dependent voice recognition methods fall into two groups: dynamic time warping methods and hidden Markov model methods. Text-independent voice recognition has been studied for a long time, and the performance degradation caused by mismatched environments remains a major obstacle in practice.
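To make the dynamic time warping idea concrete, here is a minimal sketch in plain Python. It assumes each utterance has already been converted into a sequence of feature vectors; the function name `dtw_distance` and the toy one-dimensional sequences are illustrative and not taken from any particular system.

```python
def dtw_distance(a, b):
    """Accumulated cost of the best time alignment between two feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames being matched
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # A frame may match one or several frames of the other sequence
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical one-dimensional "features" for a stored template and two inputs
template = [[0.0], [1.0], [2.0], [1.0], [0.0]]
slow = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [0.0]]  # same shape, spoken slower
other = [[2.0], [2.0], [0.0], [0.0], [2.0]]                      # different shape

print(dtw_distance(template, slow))
print(dtw_distance(template, other))
```

The slower rendition aligns with the template at no cost because warping absorbs the difference in speaking rate, while the differently shaped sequence accumulates a positive cost. A real system would compare this cost against a threshold chosen from enrollment data.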
How it works:
The dynamic time warping method uses instantaneous, time-varying cepstra. In 1963, Bogert et al. published "The Quefrency Alanysis of Time Series for Echoes". By rearranging the letters of familiar terms (for example, "spectrum" became "cepstrum"), they coined the vocabulary for a new signal processing technique. The cepstrum is usually computed with a fast Fourier transform.
Hidden Markov models have been very popular since 1975. An HMM captures the statistical variation of the spectral features. Examples of text-independent voice recognition methods are the average spectrum method, the vector quantization method, and the multivariate autoregressive method.
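As a rough illustration of how a cepstrum can be computed with a fast Fourier transform, the sketch below takes the inverse transform of the log magnitude spectrum. It is written in plain Python with a small radix-2 FFT so that it runs without extra libraries; the helper names, the frame length of 64, and the synthetic echo signal are assumptions made for the example only.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT via conjugation: ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    y = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]


def real_cepstrum(frame, eps=1e-12):
    """Real cepstrum: inverse transform of the log magnitude spectrum."""
    spectrum = fft([complex(s) for s in frame])
    log_mag = [math.log(abs(v) + eps) for v in spectrum]  # eps guards against log(0)
    return [v.real for v in ifft([complex(v) for v in log_mag])]


# Synthetic frame: a unit impulse followed by an echo of half amplitude 8 samples later
N, delay = 64, 8
frame = [0.0] * N
frame[0] = 1.0
frame[delay] = 0.5

ceps = real_cepstrum(frame)
# Search the positive quefrencies for the strongest peak; it is expected at the echo delay
peak = max(range(1, N // 2), key=lambda q: ceps[q])
print(peak)
```

The echo shows up as a peak at the quefrency equal to its delay, which is the effect the original echo analysis exploited. In speech processing, the low-quefrency part of the cepstrum is typically kept as a compact description of the spectral envelope.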
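One way an HMM can score spectral features is the forward algorithm, which computes how likely an observation sequence is under a given model. The sketch below assumes the features have already been quantized into discrete symbols (0 and 1 here); the two toy models and all probability values are made up for illustration.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete-output HMM, computed with the forward algorithm."""
    n_states = len(start)
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state left-to-right models sharing the same topology
start = [1.0, 0.0]
trans = [[0.6, 0.4],
         [0.0, 1.0]]
emit_a = [[0.9, 0.1],   # model A: state 0 mostly emits symbol 0,
          [0.2, 0.8]]   #          state 1 mostly emits symbol 1
emit_b = [[0.1, 0.9],   # model B: the opposite pattern
          [0.8, 0.2]]

observed = [0, 0, 1, 1]
like_a = forward_likelihood(observed, start, trans, emit_a)
like_b = forward_likelihood(observed, start, trans, emit_b)
print("model A" if like_a > like_b else "model B")
```

The observed sequence is assigned to whichever model gives it the higher likelihood. Practical systems work with log probabilities and continuous emission densities, which this sketch leaves out.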
The average spectrum method uses a suitable cepstral distance, and averaging the spectrum removes the influence of individual phonemes. With vector quantization, the set of short-term training feature vectors of a speaker can be used directly to describe that speaker's essential characteristics. However, when the number of training vectors is large, this direct representation is impractical because the storage and computation required become prohibitive. Vector quantization is therefore used to compress the training data efficiently. Montacie et al. applied a multivariate autoregressive model to capture speaker characteristics in the time series of cepstral vectors, and achieved good results.
Fooling a speech recognition system requires a high-quality recorder, which is not easy to obtain. A typical recorder cannot capture the complete sound spectrum, and the quality loss of the recording chain must also be very low. For most speech recognition systems, voice imitation will not succeed. Using voice to establish identity is very complicated, so such systems usually combine it with a personal identification number or a chip card.
Speech recognition systems benefit from inexpensive hardware: most computers have sound cards and microphones that are easy to use. But speech recognition still has some shortcomings. Voices change over time, so biometric templates that adapt over time must be used. A voice can also change because of a cold, hoarseness, emotional stress, or puberty. Speech recognition systems have a higher false positive rate than fingerprint recognition systems because people's voices are not as unique as fingerprints. To compute fast Fourier transforms, the system needs a coprocessor and more processing power than a fingerprint system. Currently, speech recognition systems are not suitable for mobile applications or battery-powered systems.
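The compression step described above can be sketched as training a small codebook per speaker and scoring a new utterance by its average quantization distortion. The example below uses a basic k-means (Lloyd) procedure in plain Python; the function names, the codebook size of 2, and the two-dimensional toy vectors are assumptions for illustration, not details from the original work.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(vec, codebook):
    """Index of the codeword closest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def train_codebook(vectors, k, iterations=20):
    """Compress training vectors into k codewords (requires len(vectors) >= k)."""
    # Deterministic initialisation: k vectors spread evenly over the training set
    step = len(vectors) // k
    codebook = [list(vectors[i * step]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return codebook


def average_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    return sum(sq_dist(v, codebook[nearest(v, codebook)]) for v in vectors) / len(vectors)


# Hypothetical training vectors for two enrolled speakers
speaker_a = [[0.0, 0.1], [0.2, 0.0], [-0.1, 0.1], [4.0, 4.1], [3.9, 4.0], [4.1, 3.9]]
speaker_b = [[10.0, 0.0], [10.1, 0.2], [9.9, -0.1], [10.0, 5.0], [10.2, 5.1], [9.8, 4.9]]
cb_a = train_codebook(speaker_a, k=2)
cb_b = train_codebook(speaker_b, k=2)

utterance = [[0.1, 0.0], [4.0, 3.8]]  # unknown speaker
scores = {"A": average_distortion(utterance, cb_a), "B": average_distortion(utterance, cb_b)}
print(min(scores, key=scores.get))
```

Only the codewords need to be stored for each speaker, so the storage cost no longer grows with the amount of training speech. The speaker whose codebook yields the lowest distortion is taken as the best match.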