Today's Technologies

Fluential's Innovation in Speech Translation Technology

Prior to Fluential, traditional approaches to creating speech translation systems were lengthy and required hundreds of thousands or even millions of utterances to provide sufficient data for a functioning system. To enable rapid and cost-effective development of focused solutions for its customers, Fluential has developed a unique approach that requires much less speech data. As a result, new two-way speech translation systems can be optimized very cost-effectively in one to three months.

Speech translation systems rely on speech data for three types of models:

1. Acoustic models – which identify patterns in the way people speak words and phrases, enabling speech recognition engines to recognize what is being said.
These models do not need to be created from scratch for each new system. Fluential integrates commercially available speech recognition engines (covering 25+ languages) that already have acoustic models built in.
 
2. Language models – which identify specific words and phrases people use to convey meaning for particular concepts. For example, one can ask, "How old are you?" or "What is your age?" and both have the same meaning. Language models require the collection of speech data specific to the encounter for which they are built.

Fluential reduces the amount of field data required to build these models by:
a. Augmenting the field data with data collected over the phone.
b. Augmenting field and phone data with human knowledge and unique system development tools.
c. Using proprietary conceptual clustering technology to find additional data and resources to which Fluential has access.
 
3. Translation models – which are built by using statistical techniques to map English words and phrases to the equivalent words and phrases in the other languages to create a machine translation engine. Translation models also require the collection of task-specific data.

To develop a new system, Fluential needs very little field data because its system uses proprietary conceptual translation technology for paraphrase translation. This technology can achieve very high accuracy even in noisy and error-prone environments because it extracts the core meaning within the context of the encounter and then translates that core meaning. With this approach, the system can be designed to deliver the clearest and most culturally appropriate way to communicate a concept. In addition, multiple ways of expressing the concept in English will trigger the same culturally appropriate translation every time.
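
As a rough illustration of how several English phrasings can trigger one fixed translation, the following Python sketch maps normalized utterances to a concept and each concept to a single pre-approved rendering. The concept names, the phrase table, and the Spanish strings are hypothetical and are not taken from S-MINDS; a real system would extract meaning far more robustly than the exact phrase matching used here.

```python
import re

# Hypothetical phrase table: several English phrasings map to one concept ID.
PHRASE_TO_CONCEPT = {
    "how old are you": "ASK_AGE",
    "what is your age": "ASK_AGE",
    "where does it hurt": "ASK_PAIN_LOCATION",
    "where is the pain": "ASK_PAIN_LOCATION",
}

# Hypothetical concept table: one vetted translation per concept.
CONCEPT_TO_TRANSLATION = {
    "ASK_AGE": "¿Cuántos años tiene?",
    "ASK_PAIN_LOCATION": "¿Dónde le duele?",
}


def normalize(utterance: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", utterance.lower())
    return " ".join(cleaned.split())


def translate_by_concept(utterance: str):
    """Return the fixed translation for the utterance's concept, or None."""
    concept = PHRASE_TO_CONCEPT.get(normalize(utterance))
    if concept is None:
        return None
    return CONCEPT_TO_TRANSLATION[concept]
```

In this sketch, "How old are you?" and "What is your age?" both resolve to the same concept and therefore produce the same output string, while an utterance outside the table returns None so that another engine can handle it.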

The Components of Fluential's Speech Translation Solutions

Fluential's S-MINDS systems have a speaker-independent large-vocabulary speech recognition system, a conceptual and statistical translation engine, and a voice synthesis system augmented by a library of voice recordings to output the translation. They also include a language model editor to modify and augment the system with additional words, phrases, or sentences. S-MINDS has a hybrid architecture that combines multiple automatic speech recognition (ASR) engines and multiple translation engines. This approach further improves the accuracy and the coverage of the system by leveraging the strengths of both statistical and grammar/rules-based systems.

S-MINDS has a modular architecture with the following components:

ASR Engine

S-MINDS employs multiple ASR engines, so the best engine can be chosen for each language. Within each language, multiple language models are active at the same time, telling the ASR engines which words and phrases to recognize. Generally, a smaller, more directed language model with higher accuracy is used to capture important and frequently used concepts, while a larger language model with broader coverage but somewhat lower accuracy is used for less frequently occurring concepts. This combination provides high accuracy for common interactions within a specific domain and slightly lower accuracy but broader coverage when something completely unexpected is said. This method also allows development of new domains with very little data – for each domain, only a new domain-specific language model needs to be built.
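
One way to picture the two-tier arrangement is as an arbitration step between a focused and a broad recognition result. The sketch below is an assumption about how such arbitration could work, not a description of the S-MINDS implementation: the Hypothesis type, the pick_hypothesis function, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    confidence: float  # assumed to lie in [0.0, 1.0]
    source: str        # "focused" or "broad"


def pick_hypothesis(focused: Hypothesis, broad: Hypothesis,
                    focused_threshold: float = 0.8) -> Hypothesis:
    """Prefer the focused model when it is confident; otherwise use the broad one.

    The threshold value is an arbitrary placeholder, not a documented setting.
    """
    if focused.confidence >= focused_threshold:
        return focused
    return broad
```

Under this rule, a common in-domain question recognized confidently by the focused model is used directly, and an unexpected utterance that the focused model scores poorly falls through to the broader model.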

Interpretation Engine

Fluential has created an interpretation engine that is an alternative to a statistical machine translation (SMT) engine. The S-MINDS interpretation engine uses information extracted from the output of the ASR engine and then performs a paraphrase translation. This process is similar to what human interpreters do when they convey the essential meaning without providing a literal translation. The advantage of an interpretation engine is that new domains can be added more quickly and with less data than is possible with an SMT engine. For high-volume, routine interactions, an interpretation engine can be extremely fast and highly accurate. Again, this means that highly accurate focused applications can be built quickly with very little speech data.

Statistical Machine Translation Engine

For the S-MINDS SMT engine, Fluential has developed a novel approach that has improved the accuracy of speech translation systems. This approach capitalizes on the intuition that language is broadly divided into two levels: structure and vocabulary. Traditional statistical approaches force the system to learn both types of information simultaneously. However, when the acquisition of structural information is kept separate from the acquisition of vocabulary, the resulting system learns both levels more efficiently. Also, by modifying the existing corpus to separate structure and vocabulary, Fluential has been able to take full advantage of all the information in the bilingual corpus, producing higher-quality machine translation without requiring large bodies of training data.
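
The text does not say how the corpus is modified, so the following Python sketch shows only one plausible reading: replacing known vocabulary items with class labels so that a sentence pair yields a structural template plus separate vocabulary pairs. The lexicon, the BODY_PART label, and both function names are hypothetical, and the sketch ignores word alignment and grammatical agreement.

```python
# Hypothetical bilingual lexicon grouped by class: English word -> Spanish word.
LEXICON = {
    "BODY_PART": {"arm": "brazo", "shoulder": "hombro"},
}


def split_pair(english: str, spanish: str):
    """Split a sentence pair into a template pair and the vocabulary pairs it used.

    Any lexicon word found on both sides is replaced by its class label.
    """
    en_tokens = english.split()
    es_tokens = spanish.split()
    vocab_pairs = []
    for label, entries in LEXICON.items():
        for en_word, es_word in entries.items():
            if en_word in en_tokens and es_word in es_tokens:
                en_tokens = [f"<{label}>" if t == en_word else t for t in en_tokens]
                es_tokens = [f"<{label}>" if t == es_word else t for t in es_tokens]
                vocab_pairs.append((en_word, es_word))
    return (" ".join(en_tokens), " ".join(es_tokens)), vocab_pairs


def fill_template(template: str, label: str, word: str) -> str:
    """Re-insert a vocabulary item into a structural template."""
    return template.replace(f"<{label}>", word)
```

For instance, the pair "does your arm hurt" / "le duele el brazo" would yield a template with a BODY_PART slot on each side, and the same template could then be filled with "shoulder" / "hombro" without that sentence ever appearing in the corpus. This is the sense in which separating the two levels might get more out of a small bilingual corpus.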

VUI+GUI System

S-MINDS has a flexible user interface that can be configured to use the voice user interface (VUI) only or the VUI plus a graphical user interface (GUI) for either the English speaker or the second-language speaker. Also, the English speaker can experience a different user interface than the second-language speaker. The system has the flexibility to use multiple types of microphones, including open microphones, headsets, and telephone headsets. Speech recognition can be confirmed by VUI, GUI, or both, and it can be configured to verify all utterances, no utterances, or just utterances that fall below a certain confidence level.
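
The three verification settings described above can be sketched as a small policy function. The VerifyMode names and the 0.7 default threshold are illustrative assumptions; the source does not give the actual configuration keys or values.

```python
from enum import Enum


class VerifyMode(Enum):
    ALL = "all"
    NONE = "none"
    LOW_CONFIDENCE = "low_confidence"


def needs_confirmation(confidence: float, mode: VerifyMode,
                       threshold: float = 0.7) -> bool:
    """Decide whether a recognized utterance should be confirmed with the speaker."""
    if mode is VerifyMode.ALL:
        return True
    if mode is VerifyMode.NONE:
        return False
    # LOW_CONFIDENCE: confirm only utterances that fall below the threshold.
    return confidence < threshold
```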

Synthesis Engine

S-MINDS combines its own small-vocabulary but highly fluent text-to-speech (TTS) engine with a large-vocabulary but less fluent TTS engine. Fluential licenses its large-vocabulary TTS technology from different leading companies depending on the language and the domain.
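
A simple way to combine the two kinds of engine is to route each output string to the fluent small-vocabulary engine when it is covered and to the large-vocabulary engine otherwise. The routing rule, the phrase set, and the engine labels in this sketch are assumptions made for illustration; the source does not describe how S-MINDS chooses between engines.

```python
# Hypothetical set of phrases covered by the small-vocabulary, highly fluent engine.
FLUENT_PHRASES = frozenset({"¿Cuántos años tiene?", "¿Dónde le duele?"})


def choose_tts_engine(text: str, fluent_phrases=FLUENT_PHRASES) -> str:
    """Return which engine should speak the text: "small_vocab" or "large_vocab"."""
    if text in fluent_phrases:
        return "small_vocab"
    return "large_vocab"
```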
 
Inquiries

For more information:

(408) 747-1010