
VideoVista Browser

Much of video production is moving to a digital format, especially systems like HDTV which require compression. With all this information online in machine-readable form, wouldn't it be nice to be able to search for video clips of interest? This might be done by a studio to create a trailer for a movie, by a network to assemble a digest of a sports event, by a news agency to rapidly "re-purpose" raw footage (e.g. find Monica), or by a home user wishing to find recent appearances of a favorite actress.

To this end we have created the VideoVista search and browsing system. It automatically breaks the video into segments based on changes of camera shot and stores any closed captioning information. It then runs speech recognition on the audio track, analyzes camera motion, and looks for faces in the video. Meta-information about the program, such as airdate, can also be recorded. All these attributes can then be used to search for a specific video clip as shown below.

The speech and closed captioning information are used in conjunction with a standard text search engine. The motion information can be useful, for instance, during a sports program to tell when play has resumed (high motion), or during a speech to detect a slow "crawl" zoom, which is often used to emphasize emotionally important points. The face detector can be used to find the beginning of a new segment by looking for an "establishing" shot (full body), or to distinguish field footage from talking heads (the anchors), since something like video of a burning building typically has zero faces. All of these sources of information can also be used simultaneously to allow highly constrained multi-modal searches.

 
VideoVista Query


VideoVista integrates the specified constraints using temporal interval intersection. The results found by the system are then displayed in a graphical digested form as shown below. Each abstract includes meta-information about the source of the material, a series of thumbnail-sized keyframes summarizing the video, a fragment of the associated text, and a graphical bar indicator showing the length and position of the clip in relation to its source video.

 
VideoVista Results
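
The temporal interval intersection mentioned above can be illustrated with a minimal sketch. This is not the VideoVista implementation; it assumes that each search constraint (a text hit, high motion, a face count) yields a sorted list of non-overlapping time intervals in seconds, and the names `intersect_two` and `intersect_all` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (start_seconds, end_seconds) within one source video.
Interval = Tuple[float, float]


def intersect_two(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted lists of non-overlapping intervals."""
    result: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # keep only spans where both constraints hold
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def intersect_all(constraints: List[List[Interval]]) -> List[Interval]:
    """Fold any number of constraint interval lists into their common spans."""
    if not constraints:
        return []
    result = constraints[0]
    for intervals in constraints[1:]:
        result = intersect_two(result, intervals)
    return result


# Illustrative data only: spans where each constraint is satisfied.
text_hits = [(10.0, 40.0), (90.0, 120.0)]
high_motion = [(30.0, 100.0)]
no_faces = [(0.0, 35.0), (95.0, 200.0)]

matches = intersect_all([text_hits, high_motion, no_faces])
```

With the illustrative data above, only the spans where all three constraints overlap survive, which is how adding more modalities narrows a search.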


One particular clip can then be selected for viewing or as a starting point for free-form browsing. As the clip plays, the relevant portion of text is highlighted to keep pace with the audio. The detected features (motion, faces, and titling) are also displayed and updated. To help find other related portions of the video, these features can also be graphically depicted (yellow) on an overall timeline for the full source. The browser also incorporates a semantic fast forward capability that allows the user to advance the video not only by fixed time units, but also on the basis of camera shots and story shifts. This facility is particularly useful for jumping around within news programs.

 
VideoVista Player
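
The semantic fast forward described above can be sketched as a lookup into precomputed boundary lists. This is a rough illustration under stated assumptions, not the browser's actual code: it assumes shot and story start times are available as sorted lists of seconds, and the function `advance`, its `mode` values, and the 10-second default step are all hypothetical.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def advance(
    current: float,
    mode: str,
    shot_starts: Sequence[float],
    story_starts: Sequence[float],
    step: float = 10.0,
) -> Optional[float]:
    """Return the next playback position, or None if no later boundary exists.

    mode "time" skips a fixed number of seconds; "shot" and "story" jump to
    the first boundary strictly after the current position.
    """
    if mode == "time":
        return current + step
    if mode == "shot":
        boundaries = shot_starts
    elif mode == "story":
        boundaries = story_starts
    else:
        raise ValueError(f"unknown mode: {mode}")
    idx = bisect_right(boundaries, current)  # first boundary > current
    return boundaries[idx] if idx < len(boundaries) else None


# Illustrative boundaries only (seconds from the start of the source video).
shots = [0.0, 4.5, 12.0, 30.0, 31.5]
stories = [0.0, 30.0]
```

Keeping the boundaries sorted lets each jump be found by binary search, so the same function serves both shot-level and story-level skipping.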

 
This project was developed as part of a NIST-funded ATP program: The HDTV Studio of the Future.



Selected publications:

Multi-Search of Video Segments Indexed by Time-Aligned Annotations of Video Content
Anni Coden, Norman Haas, Robert Mack
internal working paper, 1999.

 
Contact: Norman Haas Last updated: 6/10/02
 