We consistently seek improvements in user interfaces so that systems can be accessed in a wide range of situations and by people with disabilities. IBM Research - Tokyo is well ahead of the industry in this area, particularly in voice technology and accessibility technology for people with visual impairments.
Robust Speech Recognition
Our target is to develop speech recognition technology that recognizes speech with accuracy beyond human abilities. One approach involves studying noise reduction, echo cancellation, and target speech enhancement and detection using multiple microphones. Another approach starts from a language processing viewpoint, from which we study disfluency modeling and the acquisition of unknown words to improve the accuracy of transcriptions of spontaneous speech. We are also developing advanced technology for speech comprehension, focusing on the robust retrieval of POIs (points of interest) from speech.
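As an illustration of the noise-reduction work mentioned above, the following is a minimal sketch of classical spectral subtraction, a textbook single-channel baseline (not necessarily the specific method used in this research):

```python
import numpy as np

def spectral_subtraction(signal, frame_len=512, noise_frames=10):
    """Reduce stationary noise by subtracting an estimated noise spectrum.

    A classical baseline: the first `noise_frames` frames are assumed
    to contain noise only, and their average magnitude spectrum is
    subtracted from every frame.
    """
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)
    noise_mag = mag[:noise_frames].mean(axis=0)   # average noise magnitude
    clean_mag = np.maximum(mag - noise_mag, 0.0)  # subtract, floor at zero
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), axis=1)
    return clean.reshape(-1)
```

Real systems refine this with over-subtraction factors and spectral floors to avoid "musical noise," and the multi-microphone approaches mentioned above add spatial filtering on top of such per-channel processing.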
Our target is to develop speech synthesis technology that generates human voices reflecting various kinds of personalities. Conventional TTS (Text-To-Speech) output is intelligible but unattractive to most users because some characteristics of the original voice are lost. Our TTS system can learn both acoustic models and stochastic language models, trained with a newly developed stochastic approach that uses speech recognition results and the prosodic features in the speech. This method produces more natural and human-like synthetic voices, though the applicable domains are still limited, and we are working toward a fully trainable TTS system. In addition, to improve naturalness, we are also studying the synthesis of emotional speech.
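Prosodic features such as the pitch (F0) contour are central to modeling how a voice sounds. As a self-contained illustration, here is F0 estimation for a single voiced frame by autocorrelation, a standard textbook method (not necessarily the one used in this system):

```python
import numpy as np

def estimate_f0(frame, sample_rate=16000, f0_min=80.0, f0_max=400.0):
    """Estimate the fundamental frequency (F0) of one voiced frame by
    autocorrelation: the lag of the strongest self-similarity within the
    plausible pitch range gives the pitch period."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f0_max)   # shortest plausible pitch period
    lag_max = int(sample_rate / f0_min)   # longest plausible pitch period
    lag = lag_min + np.argmax(ac[lag_min:lag_max])
    return sample_rate / lag
```

Tracking such estimates frame by frame yields the pitch contour that a stochastic prosody model can be trained on.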
Beyond tools for analyzing written text, technologies for analyzing spoken conversations with clients are required for CRM and for compliance checks in various business scenarios. Recognizing natural speech within a dialogue is regarded as a difficult problem, but we have made significant progress in the last few years. In addition to more accurate transcription of conversations with time indexes, we are now developing various application technologies such as audio segmentation and classification, emotion detection, and a turn-taking overview tool.
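Audio segmentation, one of the application technologies mentioned above, can be illustrated with a minimal energy-based speech/silence splitter (a simplified stand-in for the techniques used in production systems):

```python
import numpy as np

def segment_speech(signal, frame_len=400, threshold=0.01):
    """Split an audio signal into (start, end) sample ranges of speech,
    using per-frame energy against a fixed threshold."""
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    voiced = (frames ** 2).mean(axis=1) > threshold
    segments, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i * frame_len           # speech onset
        elif not v and start is not None:
            segments.append((start, i * frame_len))  # speech offset
            start = None
    if start is not None:
        segments.append((start, n * frame_len))
    return segments
```

The resulting segments are the units that downstream classifiers (speaker, emotion, topic) and the turn-taking overview operate on.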
Online Discussion with Collective Intelligence
Existing online discussion systems have several drawbacks: newcomers find it difficult to track discussions, some questions go unanswered for long periods, and questions never reach the people who could answer them. Our research focuses on a novel online discussion system that integrates a crowdsourcing framework, text analytics, and social network analysis. The system makes the content of discussions easier to understand by visualizing and structuring them, which encourages participation.
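The social network analysis component can be sketched with a simple reply graph: ranking participants by how many distinct people they have answered suggests who an unanswered question should be routed to. The function below is a hypothetical illustration, not the system's actual algorithm:

```python
from collections import defaultdict

def top_answerers(replies, k=3):
    """Rank participants by how many distinct askers they have answered.

    `replies` is a list of (answerer, asker) pairs extracted from a
    discussion thread; the result suggests candidates for routing
    unanswered questions.
    """
    answered = defaultdict(set)
    for answerer, asker in replies:
        answered[answerer].add(asker)   # count each asker only once
    ranked = sorted(answered, key=lambda u: len(answered[u]), reverse=True)
    return ranked[:k]
```

A production system would combine such graph measures with text analytics (matching question topics to each candidate's past answers) before routing.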
Social Computing for Accessibility
Billions of people have difficulty accessing webpages, including people with disabilities, elderly people, and non-literate people in developing countries. The need for accessible webpages has become too broad to be left only to Web developers. In this project, we are studying an innovative Web service based on the wisdom of crowds: it allows users with disabilities and volunteers to collaborate with developers to improve the accessibility of webpages without changing the original content.
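One way to improve pages without touching the original content is to keep accessibility metadata externally and apply it at rendering time. The sketch below is purely illustrative (the metadata format and the regex-based rewriting are assumptions, not this project's design): volunteer-supplied alt text is merged into image tags that lack it.

```python
import re

def apply_alt_metadata(html, alt_metadata):
    """Attach crowd-sourced alt text to images at rendering time.

    `alt_metadata` maps image URLs to descriptions supplied by
    volunteers; the original HTML on the server is never modified.
    """
    def add_alt(match):
        tag, src = match.group(0), match.group(1)
        if 'alt=' in tag or src not in alt_metadata:
            return tag                     # already accessible, or no metadata
        return tag[:-1] + ' alt="%s">' % alt_metadata[src]
    return re.sub(r'<img[^>]*\bsrc="([^"]+)"[^>]*>', add_alt, html)
```

A real service would use a proper HTML parser and cover far more than alt text (headings, labels, landmarks), but the separation of content from external metadata is the key idea.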
Sasayaki: Auditory Assistant for Web Navigation
Sasayaki is an intelligent system that assists users through an audio channel. The system observes users' behaviors and contexts to provide appropriate auditory feedback, such as guidance on how to use a webpage or a summary of its content. Experiments involving blind users and elderly users show that they can navigate webpages with more confidence with the audio support of Sasayaki. We are working to expand the coverage of Sasayaki to broader situations, such as using ticket machines or ATMs, or driving.
Our research goal is to provide new methods for the efficient digitization of documents printed in various paper formats. This requires more accurate automated recognition of the layouts, characters, and structural items such as headings in each document, together with easier correction work supported by the computer. We also seek to develop a system that continuously improves the efficiency of its document digitization by feeding the correction results back into the system.
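The feedback loop can be illustrated with a small sketch (hypothetical names and thresholds, not the system's actual design): once human correctors have fixed the same OCR misreading enough times, the fix is applied automatically to future documents.

```python
def apply_learned_corrections(ocr_tokens, corrections, threshold=2):
    """Apply past manual fixes to fresh OCR output.

    `corrections` maps a misread token to {correct_token: count}, i.e.
    how often correctors replaced it with each alternative. A fix is
    applied automatically only after it has been confirmed `threshold`
    times, so one-off corrections do not propagate.
    """
    out = []
    for tok in ocr_tokens:
        fixes = corrections.get(tok, {})
        best = max(fixes, key=fixes.get, default=None)
        if best is not None and fixes[best] >= threshold:
            out.append(best)        # confident learned correction
        else:
            out.append(tok)         # keep OCR output, await human review
    return out
```

Each new manual correction increments the corresponding count, so the system's accuracy on recurring errors improves as more documents are processed.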
Document Processing Assist Tools
Globally integrated companies need to exchange documents written in many languages, and high-quality documents are required for sharing with customers. For such situations, we are working on natural language processing technologies for document critiquing, translation, and sanitizing/masking.
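As a minimal illustration of the sanitizing/masking step, the sketch below replaces sensitive spans with type tags. The patterns are hypothetical; a production system would rely on trained named-entity recognizers rather than regular expressions alone:

```python
import re

# Hypothetical example patterns for two kinds of sensitive spans.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{2,4}-\d{2,4}-\d{4}\b"),
}

def mask_document(text):
    """Replace each sensitive span with its type tag, e.g. [EMAIL],
    so the document can be shared without exposing personal data."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub("[%s]" % label, text)
    return text
```

Keeping the type tag (rather than deleting the span) preserves the document's readability and lets reviewers verify that nothing essential was removed.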