
Visualization space: natural interface
Introduction
Humans communicate using speech, gesture, and body motion, yet today's
computers do not use this valuable information. Instead, computers
force users to sit at a typewriter keyboard, stare at a TV-like
display, and learn an endless set of arcane commands -- often leading
to frustration, inefficiencies, and disuse. We have created the
Visualization Space -- a system that allows for natural interaction.
The Visualization Space -- similar to our DreamSpace -- uses an
intuitive yet richly interactive interface that
allows the user to manipulate and navigate through all types of visual
information. It "hears" users' voice commands and "sees" their
gestures and body positions. Interactions are natural, more like
human-to-human interactions. This information system understands the
user, and -- just as important -- other users understand. Users are
free to focus on virtual objects, information, understanding, and
thinking, with minimal constraints or distractions from the computer,
which is present only as wall-sized 3D images and sounds (but no
keyboard, mouse, wires, wands, etc.). The multimodal input interface
combines
* voice: IBM ViaVoice speech recognition;
* body tracking: machine-vision image processing;
* understanding: context, and small amounts of learning.
The Visualization Space is essentially a smart room employing a deviceless
natural multimodal interface built on these emerging technologies and
combined with ever-cheaper computing power. This system was designed
and used for scientific visualization applications.
Future natural interfaces will allow information and communication
anywhere, anytime, any way the user wants it -- in the office, home,
car, kitchen, design studio, school, and amusement park.
Description
Our Visualization Space allows users to collaborate in a shared
workspace using interactive visual computing. The system "hears"
users' voice commands and "sees" their gestures and body positions.
Interactions are natural, more like human-to-human interactions. The
"computer" understands the user, and -- just as important -- other
users understand. Users are free to focus on virtual objects,
information, understanding, and thinking, with minimal constraints
or distractions from "the computer", which is present only as wall-sized
3D images and sounds (but no keyboard, mouse, wires, wands, etc.). As
shown in the schematic below, this intuitive human-like interaction is
made possible by emerging interface technologies:
- voice input: user-independent, continuous speech (IBM ViaVoice(tm));
- vision input of gesture and body: camera and machine-vision algorithm;
- wall-sized stereoscopic "3D" display;
- high-bandwidth networks.
The Visualization Space is a networked workspace where the computing
system adapts to the human to optimize ease of use, enjoyment, and the
organization and understanding of information. The Visualization
Space paradigm of computing is ideal for many applications:
scientific visualization
The interactive communication of complex visual concepts is now as
easy as pointing and speaking. A high-speed network and a
supercomputer (the IBM RS/6000 SP) provide computational power and the
ability to handle enormous databases, e.g., geoseismic data. Users
can collaborate remotely with users at other workspaces and
workstations.
education and entertainment
In location-based entertainment (e.g., a virtual theme park), where
interface hardware is often damaged, our hands-off, gadget-free
interface allows robust, unobtrusive interactivity. Virtual adventures
and walk-throughs are also more fun and memorable. A user might take
another user on a tour of a historic site, a virtual factory, or a
flying tour of the continents. (See DreamSpace for more info.)
Interaction example
The Visualization Space (early 1997) is pictured here, showing
Mark Lucente in a typical interaction.
To move one of the virtual objects -- the earth -- displayed on the
large display, Mark simply points at the object and asks the
system to "put that there":
"Put that..."
"...there."
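The page does not say how the system binds "that" and "there" to what Mark is pointing at. One plausible approach, sketched here in Python purely for illustration, pairs the time at which each word was spoken with the pointing sample closest in time. Every name below (`Word`, `PointingSample`, `resolve_put_that_there`), the normalized display coordinates, and the timestamps are assumptions made for this example, not part of the VizSpace software.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # assumed: normalized (x, y) position on the display


@dataclass
class Word:
    text: str
    time: float  # seconds at which the word was recognized (hypothetical)


@dataclass
class PointingSample:
    time: float
    target: Point  # where the tracked arm points on the display (hypothetical)


def pointing_at(samples: List[PointingSample], t: float) -> Point:
    """Return the pointing target whose timestamp is closest to t."""
    return min(samples, key=lambda s: abs(s.time - t)).target


def nearest_object(objects: Dict[str, Point], p: Point) -> str:
    """Return the name of the object closest to point p."""
    def squared_distance(name: str) -> float:
        x, y = objects[name]
        return (x - p[0]) ** 2 + (y - p[1]) ** 2
    return min(objects, key=squared_distance)


def resolve_put_that_there(words, samples, objects):
    """Move the object indicated at 'that' to the spot indicated at 'there'."""
    times = {w.text.lower(): w.time for w in words}
    if not {"put", "that", "there"} <= times.keys():
        raise ValueError("not a 'put that there' command")
    name = nearest_object(objects, pointing_at(samples, times["that"]))
    moved = dict(objects)  # leave the caller's scene untouched
    moved[name] = pointing_at(samples, times["there"])
    return name, moved


# Example scene: the user points near the earth, then at an empty spot.
objects = {"earth": (0.2, 0.5), "moon": (0.8, 0.3)}
words = [Word("put", 0.0), Word("that", 0.4), Word("there", 1.5)]
samples = [PointingSample(0.4, (0.22, 0.48)), PointingSample(1.5, (0.6, 0.7))]
name, moved = resolve_put_that_there(words, samples, objects)
```

Matching on timestamps keeps the two input streams independent: the speech side only needs to report when each word occurred, and the vision side only needs to report where the user was pointing over time.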
Frequently-asked questions
You ask... and I (Mark Lucente) answer. Send questions to Mark Lucente,
or look at the FAQs related to natural interaction.
- What is VizSpace running on?
- The VizSpace runs on an IBM PC. Both the IBM ViaVoice(tm) speech
recognition and the vision input operate on a multi-processor IBM
Netfinity 7000 (also an IBM PC 704) running Windows NT OS. Interface
integration, communication, and application software all share this
same PC. Computer graphics rendering power is provided by a standard
graphics accelerator card. An ATM network link to an SP allows for
additional application processing power.
- Why are you using such a big PC?
- Initially, we needed lots of computing power for the interface
modalities (voice and vision), which are processed in the main CPU(s)
-- not in specialized hardware.
ViaVoice(tm) represents an advance in speech recognition technology
that frees up CPU cycles, and the machine-vision system has been
optimized and runs on one processor without significantly interfering
with the rest of the system.
Simpler PCs have also been used.
- How much does it cost?
- The IBM PC, sound card, video digitizer, camera,
microphone, and graphics accelerator card currently in use cost about
$15,000. This is far more power and bandwidth than is required (or
utilized). The system runs (usually just as fast) on systems that
cost under $10,000. The cost of the display depends on size: our
rear-projection display is big (over 2 meters wide) and bright and costs
about $30,000. A new display technology from IBM will deliver better
performance for a fraction of the price. And smaller displays cost
much less.
- What else can it run on?
- The VizSpace has been implemented on a variety of platforms during
the past two years: IBM AIX (Unix), distributed across the network; a
single-processor IBM Intellistation; a single IBM Thinkpad; an IBM PC
704 server; an IBM Netfinity 7000 PC server.
- What do all these terms mean: "natural computing", "natural
interface", "Visualization Space"?
- I believe that computers can be designed from the ground up to be
as easy and natural to use as, say, talking to your best friend, your
mom, your cactus. Making computers more natural to use ("natural
computing") requires a new kind of "natural interface" -- one that
allows humans to communicate the way they naturally communicate with
each other: speaking, gesturing, moving around, etc. The
Visualization Space (also called "VizSpace" for short) uses a
particular kind of natural interface. Users interact with and control
the images displayed on the VizSpace simply by speaking, gesturing,
etc. Other examples of natural interfaces are in the works: desks,
tables, cars, kitchens, living rooms -- natural objects and
environments that also happen to be "smart" and interactive.
A simplified version of the VizSpace (developed at IBM's T.J. Watson
Research Lab in Yorktown, New York) was set up and demonstrated at the
Comdex 1997 computer exhibition in Las Vegas in November 1997. The
DreamSpace is the next generation of VizSpace, designed for a
broader range of applications.
Contact: Mark Lucente, lucente@watson.ibm.com
last update: 1998 Apr