RIALIST continues to research and develop spoken dialogue capabilities that are applicable across tasks. There were two major areas of research and development of this type during FY02: Language Modeling using Explanation-Based Learning (EBL) and Targeted Help.
Language Models provide a necessary constraint on speech recognition by restricting recognition to the range of language needed for a particular application. We have made significant improvements in our capabilities for creating application-specific grammars semi-automatically. We have refined our use of the machine learning technique EBL for creating these application-specific grammars from a general grammar of English using a very small amount of training data. We have also successfully tested some new techniques for compiling the specialized grammars into context-free language models. These new techniques are faster and more likely to stay within memory limitations than previous approaches. The tools developed in this research effort have been released as part of the Open Source Regulus system.
Research indicates that the best speech recognition is achieved by expert users on systems using grammar-based language models. Naïve users do not understand the capabilities of the system and therefore produce many utterances that are outside the system’s coverage. Targeted Help is an embedded training approach that gives feedback when the user produces an utterance that the system cannot understand and guides the user towards producing in-coverage language. As a result, users should become experts faster and spend less time producing unproductive utterances. Results show that users who receive Targeted Help are significantly more able to complete tasks with the dialogue system than users who do not receive Targeted Help. This work was done in collaboration with Oliver Lemon and Stanley Peters of the Computational Semantics Lab at Stanford.
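The general idea of Targeted Help can be illustrated with a minimal sketch. It is not the method used in the work described above: the exact-match coverage check stands in for a grammar-based recognizer, the word-overlap measure is a simple substitute for whatever the real system uses to interpret out-of-coverage input, and the example utterances and all function names are hypothetical.

```python
from typing import Optional

# Hypothetical in-coverage example utterances; a real system would draw
# these from its grammar or training corpus.
IN_COVERAGE = [
    "go to step three",
    "set an alarm for five minutes",
    "record a voice note",
]


def in_coverage(utterance: str) -> bool:
    """Stand-in for a grammar-based recognizer: exact match only."""
    return utterance.lower().strip() in IN_COVERAGE


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two utterances."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def targeted_help(utterance: str) -> Optional[str]:
    """Return a help message for an out-of-coverage utterance, else None."""
    if in_coverage(utterance):
        return None  # understood, so no help is needed
    # Pick the in-coverage example that shares the most words with the input.
    best = max(IN_COVERAGE, key=lambda ex: token_overlap(utterance, ex))
    if token_overlap(utterance, best) == 0.0:
        return "Sorry, I did not understand."
    return f'Sorry, I did not understand. You could try: "{best}".'
```

For example, `targeted_help("please set alarm for five minutes")` returns a message that suggests the in-coverage phrasing "set an alarm for five minutes", while an in-coverage utterance returns `None`.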
The Regulus component of the Leo Project now provides capabilities for parsing and grammar compilation. We have also developed a much larger English grammar for use with Regulus and translation components between Regulus and Gemini grammar formats. The Regulus tools have provided valuable support for the group’s Language Modeling research.
The Leo project aims to provide an architecture for automating the sharing of grammatical resources among various systems so that one system can take advantage of specialized algorithms and tools that are implemented for the representations used by another. The project furthermore seeks to learn about best practice in the design of these representations and encode their principles in a new XML-based format. This paper describes initial work toward creating the Leo architecture and tools that convert between different representations.
We built an initial demonstrator that showed the general form and potential capabilities of an intelligent procedure assistant. This was shown to a group of astronauts, and their feedback was used to refine the design. Several suggestions were received about the best procedure to use in an initial prototype system. The procedure chosen was the Potable Water Sampling Procedure, which is a long and complicated test of drinking water quality and safety.
We built a prototype system that operates using the actual water sampling procedure. This system has dialogue features such as the ability to navigate the procedure, gracefully correct misunderstandings, take and play back audio notes, set alarms using spoken commands, keep track of progress through the procedure (useful after interruptions), and give more detail about the operations being performed. These features were deemed important by the astronauts and trainers we met at JSC. The system includes a synchronized display that highlights the current step in the procedure and can display diagrams and photographs illustrating functional information about equipment used in the procedure. We are experimenting with new, more natural-sounding and understandable speech synthesis systems. We have built infrastructure to support rapid changes of language and to accommodate use of the system with a variety of procedures.
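The navigation and progress-tracking features listed above can be pictured with a small state-tracking sketch. This is an illustration only, not the prototype's implementation: the class name, method names, and the use of text strings in place of audio notes are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcedureTracker:
    """Hypothetical tracker for the current position in a procedure."""

    steps: List[str]
    current: int = 0  # zero-based index of the current step
    notes: Dict[int, List[str]] = field(default_factory=dict)

    def next(self) -> str:
        """Advance one step, staying on the last step at the end."""
        if self.current < len(self.steps) - 1:
            self.current += 1
        return self.steps[self.current]

    def previous(self) -> str:
        """Go back one step, staying on the first step at the start."""
        if self.current > 0:
            self.current -= 1
        return self.steps[self.current]

    def go_to(self, number: int) -> str:
        """Jump to a step given its one-based number."""
        if not 1 <= number <= len(self.steps):
            raise ValueError(f"no step {number}")
        self.current = number - 1
        return self.steps[self.current]

    def add_note(self, text: str) -> None:
        """Attach a note (text here, audio in a real system) to this step."""
        self.notes.setdefault(self.current, []).append(text)

    def where_am_i(self) -> str:
        """Report progress, e.g. when resuming after an interruption."""
        return f"Step {self.current + 1} of {len(self.steps)}: {self.steps[self.current]}"
```

A spoken command such as "go to step three" would map to `go_to(3)`, and the synchronized display would highlight whichever step `current` points to.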
Since the system is trained on examples of spoken interactions between astronauts, we have initiated recording of astronauts during training sessions. We have also arranged to have astronauts interact with the system in order to provide user feedback and additional data. This data, when annotated, can be used in an automatic training procedure to improve the performance of the system.
During the year, we further developed a spoken dialogue interface to EUROPA and IDEA using the RIALIST PSA demonstration system. The dialogue allows the monitoring of plan execution and re-planning when the environment changes or resources become depleted. We also added the ability to ask the PSA where it is currently located. These efforts were in collaboration with Jeremy Frank and Nicola Muscettola.
Planning systems are an important part of semi-autonomous robotic systems such as planetary rovers and robotic aircraft, and of other complex planning and scheduling tasks such as daily MER activity scheduling. RIALIST has worked this year to refine the spoken dialogue interface to EUROPA and to interface it with the Ames ARA IDEA interleaved planning and execution system.
Robustness in noisy environments is an important feature for NASA spoken dialogue systems, particularly on ISS. Our current project in this area, in collaboration with Dr. Ted Berger at USC, investigates the use of neural network technology to recognize speech in very noisy conditions.
We built an initial reference resolution component with an architecture that facilitates integration of multiple sources of information, including eye tracking, in the reference resolution process. We designed experiments for investigating the correlation of natural eye movements with spoken dialogue events. Ellen Campana was the SSRP student who worked on this project. This work was in collaboration with James Allen, Mike Tanenhaus, Lee Stone and Roger Remington.
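One simple way to picture the integration of multiple information sources in reference resolution is a weighted combination of per-source scores. The sketch below is an assumption made for illustration, not the architecture of the component described above: the function name, the source names, the scores, and the weights are all hypothetical.

```python
from typing import Dict, List


def resolve_reference(
    candidates: List[str],
    sources: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> str:
    """Pick the candidate referent with the highest weighted evidence.

    sources maps a source name (e.g. "linguistic", "gaze") to a score
    for each candidate; a missing score counts as 0.0, and a source
    without an explicit weight is given weight 1.0.
    """
    totals = {c: 0.0 for c in candidates}
    for name, scores in sources.items():
        w = weights.get(name, 1.0)
        for c in candidates:
            totals[c] += w * scores.get(c, 0.0)
    # Ties resolve to the earliest candidate in the list.
    return max(candidates, key=lambda c: totals[c])


# Usage: the phrase "the valve" matches both valves equally well, so the
# (hypothetical) gaze evidence decides between them.
referent = resolve_reference(
    candidates=["valve A", "valve B"],
    sources={
        "linguistic": {"valve A": 0.5, "valve B": 0.5},
        "gaze": {"valve A": 0.2, "valve B": 0.8},
    },
    weights={"linguistic": 1.0, "gaze": 0.5},
)
```

Keeping each source as a separate score table means a new source can be added without changing the combination step, which is the property the text attributes to the architecture.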
We developed new system requirements for a robust phoneme recognizer using the neural network technology developed at USC. We met with the USC group and offered suggestions for speech data and tests to be run on the phoneme-based system. The initial USC phoneme system is now functioning and ready for further development and integration. This work was performed in collaboration with Dr. Ted Berger.