Voice input processing for automotive speech recognition systems

August 28, 2012 // By Sverrir Olafsson, Conexant
In a quiet, controlled environment, today’s speech recognition engines are quite effective. Whether the user is dictating through a quality headset in a quiet office or giving search phrases to a smartphone in a silent room, hit rates close to 100 percent are now common. Adding even a few disturbances, however, quickly degrades performance.

The automobile environment is one of the most challenging in this respect. A variety of noise sources, both outside the car (passing cars, honking horns) and inside it (multiple passengers talking, the air conditioning fan, the radio), along with audio reverberations off the hard surfaces, result in the lackluster performance with which many car owners are familiar.

Further, in order to avoid false triggers, the driver of the car needs to push a button to trigger the speech command system. This is not just a nuisance but also a safety hazard.

Yet few applications could benefit more from speech recognition for voice command operation than the automobile. It is therefore of great value to make speech recognition more effective in cars, so that commands are detected reliably in the presence of all these disturbances and without button presses. Although this is fundamentally a speech recognition problem, the performance improvements will come primarily from processing the voice input signal to remove noise and disturbances.

In recent years, Voice Input Processing (VIP) has been one of the key areas to which Conexant has applied its vast experience in audio technology. Through careful design that starts at the microphone interface, with clean bias signals, low-noise pre-amplification and gain control, and extends to complex digital signal processing algorithms running on its high-performance yet low-power DSPs, Conexant has been able to deliver VIP devices for a number of applications, including TVs, home appliances and automobiles. Within those applications, one of the primary advantages of the Conexant solution is improved performance of speech recognition engines: the solution has been optimized for many of the common speech recognition algorithms for use in challenging environments.

To achieve superior performance, several algorithms are employed to enhance the desired input signal and suppress noise sources in a coordinated manner. Conexant's Selective Source Pickup (SSP) algorithm is uniquely able to separate the desired signal from the noise sources by analyzing
