Fujitsu Laboratories Limited today announced the development of speech interface technology that enables users to retrieve a variety of information by simply speaking into a smartphone, without having to look at the smartphone's display.
After listening to a synthesized voice read the latest news and other information, users can say which item they would like to learn more about. The software will then read details about the topic and other related information. By taking advantage of this technology, users who are driving or working and need to keep their eyes and hands free can use various information services without having to look at or touch the smartphone's display.
Fujitsu Laboratories has developed a new eyes-free and hands-free speech interface in which, by simply speaking about what the user is interested in, the system pulls up relevant information and reads it out loud. For instance, when the user speaks a particular phrase from a news headline that the system has read, the system will read more detailed articles related to the topic at hand.
Language is constantly changing. To keep pace with linguistic evolution, Fujitsu has developed technology that automatically extracts the orthographic patterns of new terminology from text found on the Internet and adds them to the system's vocabulary dictionary. This makes it possible to create a speech interface that minimizes misread and falsely recognized words.
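As a rough illustration of this kind of dictionary maintenance, the Python sketch below collects words that recur in web text but are missing from the vocabulary and adds them. It is a simplified stand-in, not Fujitsu's method: the frequency threshold, the function names `find_new_terms` and `update_dictionary`, and the sample texts are all assumptions made for the example.

```python
import re
from collections import Counter


def find_new_terms(web_texts, dictionary, min_count=2):
    """Return words seen at least `min_count` times that the dictionary lacks."""
    counts = Counter()
    for text in web_texts:
        # Lowercase and keep simple word tokens (letters, apostrophes, hyphens).
        counts.update(re.findall(r"[a-z][a-z'-]*", text.lower()))
    return sorted(
        word for word, count in counts.items()
        if count >= min_count and word not in dictionary
    )


def update_dictionary(dictionary, web_texts, min_count=2):
    """Add recurring unknown words to `dictionary` and return what was added."""
    new_terms = find_new_terms(web_texts, dictionary, min_count)
    dictionary.update(new_terms)
    return new_terms


# Hypothetical vocabulary and web snippets, for illustration only.
dictionary = {"the", "new", "phone", "is", "a", "everyone", "wants"}
web_texts = [
    "The new phablet is a phone",
    "Everyone wants a phablet",
    "A gizmo",
]
added = update_dictionary(dictionary, web_texts)
```

A word that appears only once is ignored here, on the assumption that one-off strings are more likely typos than new terminology.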
Fujitsu has also developed technology that analyzes information previously presented by the system, extracts vocabulary focused on certain topics, and automatically generates a speech recognition dictionary. As a result, the system is able to correctly recognize homonyms and other ambiguous phrases, thereby helping to facilitate accurate dialogue with the user.
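The idea of using previously presented information to resolve ambiguous words can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual recognition engine: the word-count scoring, the function names `build_context_vocabulary` and `pick_candidate`, and the sample headline are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_context_vocabulary(presented_texts):
    """Count the words in everything the system has read aloud so far."""
    counts = Counter()
    for text in presented_texts:
        counts.update(tokenize(text))
    return counts


def pick_candidate(candidates, context_vocab):
    """Choose the transcription whose words occur most often in the context.

    `candidates` holds alternative transcriptions that sound alike.
    On a tie, the earliest candidate in the list is returned.
    """
    def score(candidate):
        return sum(context_vocab[word] for word in tokenize(candidate))

    return max(candidates, key=score)


# Hypothetical headline the system has just read to the user.
headlines = ["Severe weather warning issued for the coast"]
vocab = build_context_vocabulary(headlines)
choice = pick_candidate(["whether warning", "weather warning"], vocab)
```

Because "weather" appeared in the headline and "whether" did not, the second candidate scores higher and is selected.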
When performing speech recognition and speech synthesis, the handset connects to a datacenter where a huge lexicon is stored and updated. Fujitsu Laboratories has developed technology that, by dividing and anticipating speech data, is able to absorb the delays caused by processing and transmission in datacenter-based speech recognition and speech synthesis. In addition, the technology further improves perceived response time by controlling the timing of pauses between words. As a result, the user experience compares favorably with that of car navigation systems.
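One common way to hide this kind of latency is to split the text into chunks and request the next chunk while the current one is playing. The Python sketch below shows that pattern only as an assumption about how "dividing and anticipating" might look in practice; `synthesize` and `play` are hypothetical callables standing in for the datacenter request and the handset's audio output.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text):
    """Split text at sentence boundaries so each chunk can be requested separately."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def stream_speech(text, synthesize, play):
    """Play chunks in order while the next chunk is synthesized in the background.

    Returns the number of chunks played.
    """
    chunks = split_into_chunks(text)
    if not chunks:
        return 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(synthesize, chunks[0])
        for next_chunk in chunks[1:]:
            audio = pending.result()
            # Start fetching the next chunk before playback begins.
            pending = pool.submit(synthesize, next_chunk)
            play(audio)
        play(pending.result())
    return len(chunks)


# Stand-ins for the remote synthesizer and the audio player.
played = []
count = stream_speech(
    "First item. Second item! Third?",
    synthesize=lambda chunk: chunk.upper(),
    play=played.append,
)
```

With a real network call in place of the lambda, only the first chunk's delay is exposed to the listener, provided each later chunk is synthesized faster than the preceding one plays.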
This technology enables users to retrieve information through a series of intuitive speech interactions, without looking at any display. As a result, news, email and other web services frequently used in daily life become available while driving or walking, and can be provided to users who have difficulty viewing a display. In addition, the technology can provide more detailed information in audio tour systems employed in museums. For example, additional information could be offered just by saying a word that comes up in an audio tour or in a description of an exhibit.
Fujitsu plans to commercialize this technology as a mobile user interface for cloud services within fiscal 2012.