What Are the Most Common Speech Recognition Problems?

Eugene P.

Speech recognition software has advanced greatly since it was first invented, but it still has several significant problems that prevent it from being relied on as the sole method of transcription. Some of the speech recognition problems that are difficult to solve include variations in the pronunciation of words, individual accents, homonyms and unwanted ambient noise. Another set of speech recognition problems involves the type of hardware used to capture the sound, because the quality of the input has a large impact on how the software will interpret the speech. There also is the problem of not knowing the context of the words being spoken, which can lead to text that has no punctuation or inaccurate spellings.

A microphone that is overly sensitive may create audio information that is difficult for the speech recognition software to decipher.

One of the most basic speech recognition problems is the quality of the input devices being used. If a microphone is not sensitive enough, or is overly sensitive, it can produce audio that is difficult for the software to decipher. This is especially true when a microphone is so sensitive that the speech is distorted, making the recognition software nearly useless. A similar problem stems from background noise, which can be difficult to separate from the main speech and can cause inaccurate transcriptions when it is included in the speech processing.
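As a rough illustration of how distorted input might be spotted before it reaches the recognizer, the sketch below counts how many samples sit at or near the maximum level a recording can hold. It assumes 16-bit signed audio samples; the function name `clipping_ratio` and the 99 percent tolerance are hypothetical choices made for this example, not part of any particular speech product.

```python
def clipping_ratio(samples, full_scale=32767, tolerance=0.99):
    """Return the fraction of samples at or near full scale.

    Assumes 16-bit signed integer samples. The tolerance value is an
    illustrative assumption, not a standard threshold.
    """
    if not samples:
        return 0.0
    limit = full_scale * tolerance
    # Count samples whose magnitude reaches the clipping limit.
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)


quiet_audio = [0, 120, -340, 560, -90]
distorted_audio = [0, 100, 32767, -32768]

print(clipping_ratio(quiet_audio))
print(clipping_ratio(distorted_audio))
```

A high ratio suggests the microphone gain is too high and the speech is being flattened at the peaks, which is the kind of distortion described above.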

Differences in pronunciation, accents and speaking cadence combine to form one of the more pervasive speech recognition problems. When a single word can be pronounced in several ways, the software can become confused and misinterpret what is being said. The same can occur when a person speaks more slowly or more quickly than the program expects. There are some partial solutions, such as training the software on the speech patterns of a single user and using dynamic time-warping algorithms to match the speech to a database of samples, but they do not solve all the problems.
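To give a sense of what dynamic time warping does, here is a minimal sketch of the classic algorithm applied to short lists of numbers. Real systems compare sequences of acoustic feature vectors rather than single numbers; the sample sequences and the name `dtw_distance` are invented for this illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] holds the lowest total cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed alignment moves.
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Made-up sequences: a stored sample, the same shape spoken at half
# speed, and a different shape.
template = [1, 3, 4, 3, 1]
slow_version = [1, 1, 3, 3, 4, 4, 3, 3, 1, 1]
different_word = [4, 1, 1, 4, 1]

print(dtw_distance(template, slow_version))
print(dtw_distance(template, different_word))
```

Because the algorithm is allowed to stretch or compress either sequence in time, the slowed-down version matches the stored sample closely while the different shape does not. It compensates for speaking speed only, which is one reason it remains a partial solution.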

The most complex of the speech recognition problems is identifying the context of the words being spoken. Computer software is unable to identify the intended meaning of a collection of words, leading to a number of problems with the transcribed text. Words that sound alike, such as "their" and "there", can be spelled accurately only when the context of usage is known. For the same reason, accurate punctuation is nearly impossible for the software to place based solely on the sequence of words. There is functional transcription software that is used in fields such as medicine, but the result is often a block of words without any type of separation, meaning it still takes a human transcriptionist to edit the document and create a readable final copy.
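One partial way software can approximate context is to look at neighboring words and pick the spelling that most often appears beside them. The sketch below shows the idea with a handful of word-pair counts that are invented purely for illustration; the table `PAIR_COUNTS` and the function `pick_spelling` are hypothetical and are not taken from any real product or data set.

```python
# Toy counts of how often each spelling is followed by a given word.
# These numbers are made up for the example.
PAIR_COUNTS = {
    ("their", "house"): 9,
    ("there", "house"): 0,
    ("their", "is"): 0,
    ("there", "is"): 12,
}


def pick_spelling(candidates, next_word):
    """Choose the candidate spelling seen most often before next_word."""
    return max(candidates, key=lambda w: PAIR_COUNTS.get((w, next_word), 0))


print(pick_spelling(["their", "there"], "house"))
print(pick_spelling(["their", "there"], "is"))
```

This kind of counting only captures which words tend to sit next to each other. It does not understand meaning, so it fails on unfamiliar phrases and does nothing to solve the punctuation problem.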


Discussion Comments


My software works pretty well, but if I cough or sneeze while wearing the mic, it thinks I'm saying a word and types in whatever it interprets these sounds to be. I do have to laugh at the words it comes up with at times.


I have used a popular speech recognition program, and it does have trouble distinguishing between homonyms. It also messes up when I speak too slowly or quickly. I have to go over the document to make sure it does not contain any sentences that just don't make sense.

However, I noticed if I use it daily, it does get better at adapting to my speech patterns and how I pronounce words, so it can be helpful if you cannot type quickly. But, I do find myself taking more time to carefully proofread documents when using it.
