What is an Acoustic Model?

Updated: May 16, 2024

An acoustic model is essentially a map of the voice in relation to a series of printed words. This technology is used in speech recognition programs to help a computer learn to recognize a person's speech patterns. An acoustic model is one of the two main files necessary to run a speech recognition program; the other is the language model, which indicates likely words and speech patterns that may be used by the speaker. These models are created by comparing the sound details of a spoken audio file to the text of the spoken words.

Speech recognition software is software designed to recognize and transcribe or respond to the words a person says. Many operating systems are designed with built-in basic speech recognition capabilities that the user can turn on and off. Speech recognition capabilities on operating systems usually give the user the ability to control the computer and type words on the screen using her voice.

To access speech recognition software, a user needs a microphone to get her voice to the computer, plus a program that processes the sound. While many computers have built-in microphones, an external headset microphone allows the user the benefit of clearer voice sound and the freedom to move around the room while speaking. Standalone speech recognition software brands include LumenVox®, Loquendo®, and Dragon®.

Most speech recognition programs have acoustic model programming that allows the program to recognize variations in pronunciation. They use patterns in the sound of the speaker's voice to identify words in speech. Many are designed with setup software made to help the user create an acoustic model designed to interpret her own voice. Some advanced speech recognition programs can identify and interpret multiple languages, often with a tiny amount of sound information. The more advanced a speech recognition program, the more likely it is to accurately interpret words based on its context, including where in a sentence a word is spoken.

The field of study that develops speech recognition technology is called computational linguistics. Computational linguistics involves study and design that creates software programmed to understand human speech. This field often incorporates information from the study of psychology to create acoustic models that can more accurately interpret speech.

The word "acoustic" generally refers to anything that has to do with sound. Though acoustic models are most often used in speech recognition, they can also be used in music. An acoustic model of a music track can identify properties like beats per minute, the musical keys, or dominant pitches in the music. This information can be used by a computer program to identify a music track, or it can be used to loosely determine the genre in which the music is likely categorized. Acoustic models are also used in a field of study called psychoacoustics, in which researchers hope to learn to structure music that predictably affects the brain.

Our Promise to you

What is an Acoustic Model?

Editors' Picks

Related Articles