TUTORIALS


Tutorials are scheduled for the slots 8:30am-12:00 noon, and 1:30pm-5:00pm during the conference.  Please contact the Tutorial Co-Chairs Alexandros Eleftheriadis or Rama Chellappa for further information.


  1. T1 - Principles of Multimedia Database Systems
    V. Subrahmanian, University of Maryland

  2. T2A - Internet Multimedia Protocols
    H. Schulzrinne, Columbia University

  3. T2B - Multimedia Processors: VLSI Architectures and Programming
    P. Pirsch, University of Hannover

  4. T3A - The MPEG-4 Multimedia Coding Standard
    A. Puri, AT&T Laboratories, A. M. Tekalp, University of Rochester

  5. T3B - The MPEG-7 Multimedia Content Description Standard
    J. R. Smith, IBM T. J. Watson Research Center, A. Puri, AT&T Laboratories, A. M. Tekalp, University of Rochester

  6. T4A - Emotion and Paralinguistic Communication
    J. Cohn, University of Pittsburgh, F. Quek, Wright State University, S. Fels, University of British Columbia, R. Nakatsu, ATR Media Integration & Communications Research Laboratories

  7. T4B - Multi-Modal Interfaces for the Physically Able and Disabled
    J. Ohya, ATR Media Integration & Communications Research Laboratories, S. Morishima, Seikei University, R. Reilly, University College Dublin, and S. K. Semwal, University of Colorado

T1 - Principles of Multimedia Database Systems

Speaker: V. Subrahmanian, University of Maryland
Time: Full Day

This one-day tutorial will present the basic techniques needed to index large amounts of disparate media data, to query collections of such data, and to deliver presentations of such data across a distributed network. We will start with an overview of data structures used to store higher-dimensional data, followed by a discussion of how such data structures may be used to store and retrieve image data and free-text data. Techniques to organize video will then be discussed, followed by a discussion of disk servers that deliver video data and support video operations such as fast forward and rewind, in addition to playback. We will then describe methods to create and deliver multimedia presentations across a network.

Biography:
V. S. Subrahmanian received his PhD in Computer Science from Syracuse University in 1989. Since then, he has been on the faculty of the Computer Science Department at the University of Maryland, College Park, where he currently holds the rank of Associate Professor. He received the NSF Young Investigator Award in 1993 and the Distinguished Young Scientist Award from the Maryland Academy of Science in 1997. He has worked extensively in knowledge bases, bringing together techniques in artificial intelligence and databases. Prof. Subrahmanian has over 100 published/accepted papers and has edited two books, one on nonmonotonic reasoning (MIT Press) and one on multimedia databases (Springer). He has co-authored an advanced database textbook (Morgan Kaufmann, 1997), and has written a textbook on multimedia databases (Morgan Kaufmann, Jan. 1998). His monograph on software agents will appear in Spring 2000 (MIT Press). He has given invited talks and served on invited panels at several national and international conferences. In addition, he has served on the program committees of various conferences. He is on the editorial board of IEEE Transactions on Knowledge and Data Engineering, AI Communications, Multimedia Tools and Applications, Journal of Logic Programming, Annals of Mathematics and Artificial Intelligence, and Distributed and Parallel Database Journal. He serves on DARPA's Executive Advisory Council for the Advanced Logistics Program.


T2-A - Internet Multimedia Protocols

Speaker: H. Schulzrinne, Columbia University
Time: Morning

A range of protocols is necessary to provide streaming multimedia across the Internet. These protocols include mechanisms for data transfer (RTP), resource reservation (RSVP), session setup for specific applications such as Internet TV (SAP), media-on-demand (RTSP), and Internet telephony (SIP), and quality-of-service monitoring (RTCP). The tutorial also reviews techniques that allow applications to deal with network loss and delay, including adaptive applications, playout delay compensation and forward error correction.

Biography:
Henning Schulzrinne received his undergraduate degree in economics and electrical engineering from the Technische Hochschule in Darmstadt, Germany, in 1984, his MSEE degree as a Fulbright scholar from the University of Cincinnati, Ohio, and his Ph.D. degree from the University of Massachusetts in Amherst, Massachusetts, in 1987 and 1992, respectively. From 1992 to 1994, he was a member of technical staff at AT&T Bell Laboratories, Murray Hill. From 1994 to 1996, he was associate department head at GMD-Fokus (Berlin), before joining the Computer Science and Electrical Engineering departments at Columbia University, New York. His research interests encompass real-time, multimedia network services in the Internet and modeling and performance evaluation.

He is an editor of the Journal of Communications and Networks, the IEEE Transactions on Image Processing and IEEE Communications Society editor of the IEEE Internet Computing Magazine. He co-chairs the IEEE Communications Society Internet Technical Committee and is chair of the IEEE Communications Society Technical Committee on Computer Communications. He is also technical co-chair of Infocom 2000.

Protocols co-developed by him are now Internet standards, used by almost all Internet telephony and multimedia applications. He is co-author of the Real-time Transport Protocol (RTP) for real-time Internet services, the signaling protocol for Internet multimedia conferences and telephony (SIP) and the stream control protocol for Internet media-on-demand (RTSP).


T2-B - Multimedia Processors: VLSI Architectures and Programming

Speaker: P. Pirsch, University of Hannover
Time: Afternoon

Continuing advances in signal processing algorithm research and ongoing progress in VLSI technology are driving the fusion of audio, video, speech, image, 2D/3D graphics, and text processing, commonly referred to as multimedia. Computational requirements for multimedia signal processing, however, are still challenging even for the most powerful available processors and DSPs. Video processing in particular, which requires complex operations to be performed on a large set of data at high sample rates, places high demands on the computational hardware. Current implementations of multimedia systems rely strongly on adaptation to algorithm-specific processing schemes to meet the high computational requirements. This course will show how algorithmic adaptation leads to current VLSI architectures for multimedia systems. The course covers topics ranging from dedicated VLSI accelerators to programmable multimedia processors and software implementation aspects.

Topics covered include, among others:

Biography:
Peter Pirsch received the Ing. grad. degree from the engineering college in Hannover, Germany, in 1966, and the Dipl.-Ing. and Dr.-Ing. degrees from the University of Hannover, in 1973 and 1979, respectively, all in electrical engineering.

From 1966 to 1973 he was employed by Telefunken, Hannover, working in the Television Department. He became a research assistant at the Department of Electrical Engineering, University of Hannover, in 1973, and a Senior Engineer in 1978. From 1979 to 1980 and in Summer 1981 he was on leave, working in the Visual Communications Research Department, Bell Laboratories, Holmdel, NJ. From 1983 to 1986 he was department head for Digital Signal Processing at the SEL research center, Stuttgart. Since 1987 he has been Professor in the Department of Electrical Engineering at the University of Hannover. In 1998 he became vice president of the University of Hannover.

His present research includes architectures and VLSI implementations for image processing applications, rapid prototyping and design automation for DSP applications. He is the author or coauthor of more than 140 technical papers. He has edited a book on VLSI Implementations for Image Communications (Elsevier 1993) and is author of the book Architectures for Digital Signal Processing (John Wiley 1998).

Dr. Pirsch is a member of the IEEE and of the German Institute of Information Technology Engineers (ITG). He was on the editorial team for three special issues on VLSI Implementations for Video Application of IEEE Transactions journals and served as an Associate Editor of the IEEE Transactions on Circuits and Systems for Video Technology from 1991 to 1995. Since 1997 he has been Associate Editor of the IEEE Transactions on VLSI Systems and of the Journal of VLSI Signal Processing. He was recipient of the NTG paper prize award in 1982 and was elected an IEEE Fellow in 1997. Since 1997 he has been on the board of governors of the IEEE Circuits and Systems Society. He is active on several technical committees of the IEEE Circuits and Systems and Signal Processing societies. He has been a member of several technical program committees of international conferences and an organizer of special sessions and preconference courses. He was chair of the visual signal processing track of ISCAS'96 and technical program cochair of SIPS'97.


T3-A - The MPEG-4 Multimedia Coding Standard

Speakers: A. Puri, AT&T Laboratories, A. M. Tekalp, University of Rochester
Time: Morning

Early MPEG standards (MPEG-1 and MPEG-2) have substantially impacted the consumer electronics and broadcast industries, e.g., Video CD, DVD, digital TV and HDTV. MPEG-4 is a new MPEG standard that provides core technologies for efficient object-based compression of video/multimedia for transmission, storage, and manipulation. MPEG-4 addresses a mix of applications, some new ones that require interactivity and others traditional, typically enabling them at much more aggressive bitrates or with more functionality than ever before. In particular, it addresses applications such as internet multimedia, wireless video, videoconferencing, video-on-demand and video games.

This course starts with a very brief overview of the MPEG-1 and MPEG-2 standards, and moves quickly to discuss the various parts of the MPEG-4 standard in detail. The course first covers MPEG-4 visual, including object-based natural video coding, shape coding, error-resilience, scalability, face animation coding, 2D/3D geometry coding and still texture coding. Next, after a brief update on MPEG-4 audio, the course goes into details of various aspects of MPEG-4 systems including scene description, elementary stream management, and interfaces to transport/network. Finally, the current status of various revisions/additions in progress to the MPEG-4 standard is summarized, and an update is provided on the MPEG-4 industry forum, an information resource on MPEG-4 applications under development.

Course Materials:
Lecture notes to be distributed. Additionally recommended: a new book, “Multimedia Systems, Standards and Networks,” Atul Puri and Tsuhan Chen (Eds), Marcel Dekker Inc. (www.dekker.com), ISBN 0-8247-9303-X.

Biographies:
Atul Puri, Ph.D., is a Principal Technical Staff Member, Image Processing Research Department, AT&T Laboratories, Red Bank, New Jersey. In the past, he has been involved in research in video coding algorithms for a number of diverse applications such as videoconferencing, video on Digital Storage Media, HDTV and 3D-TV. Dr. Puri's current research interests are in the areas of video streaming and flexible multimedia systems for services over the web/internet, and multimedia search and retrieval (MPEG-7).

He joined the Visual Communication Research Department of Bell Labs as a member of technical staff and has represented AT&T at the Moving Picture Experts Group for the past ten years. He has actively contributed to the development of the MPEG-1, MPEG-2 and MPEG-4 audio-visual coding standards and is one of the technical editors of the MPEG-4 standard. Dr. Puri holds 16 patents, has applied for another 8, and has published over 32 technical papers. He is the author of the book on MPEG-2 entitled Digital Video: an Introduction to MPEG-2, and is a coeditor of a newly released book on MPEG-4 called Multimedia Systems, Standards and Networks.

He serves as associate editor of the IEEE Circuits and Systems for Video Technology (CSVT) journal, and was recently a guest editor of the special issue of the same journal. In late 1998, Dr. Puri received the AT&T Standards Recognition award and the International Standards Organization prize. He is a member of the IEEE and its Communications and Signal Processing societies.

A. Murat Tekalp, Ph.D., is a Professor in the Electrical and Computer Engineering Department, University of Rochester, Rochester, New York.

Dr. Tekalp's current research interests are in the area of digital image and video processing, including object-based video representations, motion tracking, image/video segmentation, video filtering and restoration, multimedia compression and multimedia content description. Prior to his current position, he was a senior research scientist at Eastman Kodak Company.

Dr. Tekalp is the author of the book Digital Video Processing (Prentice Hall 1995) and the Editor-in-Chief of the EURASIP journal Image Communication, published by Elsevier. He has also served as an associate editor for the IEEE Transactions on Signal Processing, the IEEE Transactions on Image Processing, and the Kluwer journal Multidimensional Systems and Signal Processing. He has also been on the editorial boards of the Academic Press journals Graphical Models and Image Processing, and Visual Communication and Image Representation.

Dr. Tekalp is the technical program co-chair for the IEEE ICASSP 2000 to be held in Istanbul, Turkey, has chaired the IEEE Signal Processing Society Technical Committee on Image and Multidimensional Signal Processing, and is the founder and first chairman of the Rochester Chapter of the IEEE Signal Processing Society. He received the NSF Research Initiation Award in 1988, and was named a Distinguished Lecturer by the IEEE Signal Processing Society in 1998. Dr. Tekalp is a Senior Member of IEEE and a member of Sigma Xi.


T3-B - The MPEG-7 Multimedia Content Description Standard

Speakers: J. R. Smith, IBM T. J. Watson Research Center, A. Puri, AT&T Laboratories, A. M. Tekalp, University of Rochester
Time: Afternoon

The Moving Picture Experts Group (MPEG) is currently working on a new standard called the "Multimedia Content Description Interface," also known as MPEG-7. The effort is being driven by the anticipated internet-centric multimedia applications in the areas of image, video and audio databases, interactive media services, universal multimedia access, scientific image libraries, etc. MPEG-7 mainly intends to enable fast and efficient searching, browsing and filtering of audio-visual material.

The course starts with an overview of the goals and objectives as well as anticipated application areas of MPEG-7 and moves quickly to discuss the details of MPEG-7. First, we clarify how MPEG-7 addresses the description of content at a number of levels, including low-level (e.g., color, texture, shape, motion), structural (e.g., scene layout, composition), and high-level (e.g., people, objects, places). Next, we discuss the three main parts of MPEG-7: descriptors, description schemes and description definition language (DDL). The descriptors represent content at a low level, the description schemes allow representation at a higher level and the DDL allows definition and extension of standardized description schemes. Finally, using a number of applications in the area of search, retrieval and filtering of audiovisual content we envisage the potential impact that MPEG-7 is likely to have, as well as project what is next for MPEG-7.

Course Materials:

Lecture notes to be distributed. Additionally recommended: Chapters 4, 20, 21 and 22 of the book “Multimedia Systems, Standards and Networks,” Atul Puri and Tsuhan Chen (Eds), Marcel Dekker Inc. (www.dekker.com), ISBN 0-8247-9303-X, March 2000.

Biography:
John R. Smith, Ph.D., is currently Manager of the Pervasive Media Management Group at IBM T. J. Watson Research Center. His research interests include multimedia and multi-dimensional data management, compression, access and retrieval and content-based query systems. Dr. Smith is an active participant in the MPEG-7 Multimedia Description Schemes Group and is chairing the development of the MPEG-7 Conceptual Model. He received his M.Phil. and Ph.D. degrees in Electrical Engineering from Columbia University in 1994 and 1997, respectively. At Columbia, he developed several image and video search and retrieval systems, including the WebSEEk image and video search engine and the VisualSEEk content-based image retrieval system. At IBM, he has developed a progressive video retrieval system called VideoZoom, and a new framework for adaptive compression, access and retrieval of large images, high-resolution documents and maps. Dr. Smith received the Eliahu I. Jury award from Columbia University for outstanding achievement as a graduate student in the areas of systems, communication, or signal processing. Dr. Smith is an Adjunct Professor at Columbia University and a member of IEEE.

For biographies of Atul Puri and A. Murat Tekalp, see Tutorial T3A above.


T4-A - Emotion and Paralinguistic Communication

Speakers: J. Cohn, University of Pittsburgh, F. Quek, Wright State University, S. Fels, University of British Columbia, R. Nakatsu, ATR Media Integration & Communications Research Laboratories
Time: Morning

People communicate not only by speech and written language but also by their facial expression, tone of voice, hand gestures, the way they stand or move, and their patterns of gaze. These modes of nonverbal behavior communicate emotion and are often referred to as paralinguistic because they modify, substitute for, and improve the understanding of spoken communication. Current human-computer interfaces lack access to these important channels of information. This tutorial will present a high-level introduction to emotion and paralinguistic communication in order to inform the development of multi-modal user interfaces. Jeffrey Cohn from the University of Pittsburgh will review the psychology of emotion and describe recent work in automatic analysis of emotion expression. Francis Quek from Wright State University will focus on the integration of speech, gaze, and gesture in human communication and multi-modal interfaces. Sidney Fels from the University of British Columbia will review somato-sensory feedback in the acquisition of skilled motor routines and cross-modal communication involving speech and gesture. Ryohei Nakatsu from ATR Media Integration and Communications Research Laboratories will present recent developments in multi-modal interfaces that allow people in remote locations to communicate with each other through virtual scenes. Participants in the tutorial will learn about the psychology of emotion and paralinguistic communication and their application to multi-modal interfaces and virtual environments.

Biographies:
Dr. Jeffrey Cohn is Associate Professor of Psychology and Psychiatry at the University of Pittsburgh and Adjunct Faculty at the Robotics Institute, Carnegie Mellon University. He earned his PhD in clinical psychology in 1983 from the University of Massachusetts, Amherst. His research focuses on emotion processes and their relation to the development of affective disorders in children and adults. To make feasible more rigorous, quantitative measurement of emotion expression, he formed two interdisciplinary research groups with expertise in computer vision, speech science, and human emotion and nonverbal communication. They developed methods for recognition of communicative intent and emotion from vocal fundamental frequency and the Face Analysis System, which tracks gaze and facial features in digitized image sequences and recognizes fine-grained changes in facial expression. His other projects include Psychophysiology of Risk for Depression, and Parental Depression and Infant Development. His research is supported by grants from the National Institute of Mental Health.

Francis Quek is an Associate Professor in the Department of Computer Science and Engineering at Wright State University. He earned his B.S.E. summa cum laude (1984) and M.S.E. (1984) in electrical engineering from the University of Michigan in two years. He completed his Ph.D. C.S.E. at the same university in 1990. Francis is a member of the IEEE and ACM. He is director of the Vision Interfaces and Systems Laboratory (VISLab), which he established for computer vision, medical imaging, vision-based interaction, and human-computer interaction research. He performs research in multi-modal verbal/non-verbal interaction, vision-based interaction, facial modeling, multimedia databases, medical imaging, collaboration technology, and computer vision. Francis is the Principal Investigator of several National Science Foundation grants in gesture, speech, and gaze research and of a Whitaker Foundation grant in neurovascular feature extraction in medical brain images.

Sidney Fels received his Ph.D. and M.Sc. in Computer Science at the University of Toronto in 1994 and 1990, respectively. He received his B.A.Sc. in Electrical Engineering at the University of Waterloo in 1988. He was a visiting researcher at ATR Media Integration & Communications Research Laboratories in Kyoto, Japan, from 1996 to 1997. He also worked at Virtual Technologies Inc. in Palo Alto, CA, developing the GesturePlus system and the CyberServer in 1995. His research interests are in human-computer interaction, neural networks, intelligent agents, and interactive arts. Some of his research projects include Glove-TalkII, Glove-Talk, Iamascope, InvenTcl, Sound Sculpting, and the context-aware mobile assistant project (CMAP).

Ryohei Nakatsu received his B.S., M.S. and Ph.D. degrees in electronic engineering from Kyoto University in 1969, 1971 and 1982, respectively. After joining NTT in 1971, he mainly worked on speech recognition technology. Since 1994, he has been with ATR (Advanced Telecommunications Research Institute) and currently is the president of the ATR Media Integration & Communications Research Laboratories. His research interests include emotion extraction from speech and facial images, emotion recognition, nonverbal communications, and integration of multi-modalities in communications. He is a member of the IEEE, the Institute of Electronics, Information and Communication Engineers Japan (IEICE-J), as well as the Acoustical Society of Japan.


T4-B - Multi-Modal Interfaces for the Physically Able and Disabled

Speakers: J. Ohya, ATR Media Integration & Communications Research Laboratories, S. Morishima, Seikei University, R. Reilly, University College Dublin, and S. K. Semwal, University of Colorado
Time: Afternoon

This tutorial presents approaches to multi-modal user interfaces for people with and without disabilities. Jun Ohya and Shigeo Morishima present new approaches to virtual communication and the synthesis of facial and vocal expression from camera, microphone, and physiologic signals. Richard Reilly and Sudhanshu Semwal discuss the needs of people with cognitive, visual, and other physical disabilities and how to design and implement multi-modal interfaces that can address their needs.

Biographies:
Jun Ohya received the B.S., M.S. and Ph.D. degrees in precision machinery engineering from the University of Tokyo, Japan, in 1977, 1979 and 1988, respectively. He joined NTT Research Laboratories in 1979 and was engaged in image processing, computer vision and full color printing technologies. From 1988 to 1989, he was a visiting research associate at the Computer Vision Laboratory, Center for Automation Research, University of Maryland, College Park, MD. In 1992, he transferred to ATR (Advanced Telecommunications Research Institute) and is now a department head of ATR Media Integration & Communications Research Laboratories, Kyoto, Japan. His research interests include virtual communication environments based on computer vision, computer graphics and virtual reality technologies. Dr. Ohya is a member of IEEE, the Institute of Electronics, Information and Communication Engineers Japan, the Information Processing Society of Japan, and the Virtual Reality Society of Japan.

Shigeo Morishima received the B.S., M.S. and Ph.D. degrees, all in Electrical Engineering, from the University of Tokyo, Tokyo, Japan, in 1982, 1984, and 1987, respectively. Currently, he is an associate professor at Seikei University, Tokyo, Japan. His research interests include Model Based Image Coding, Physics Based Modeling of Face and Body, Facial Expression Recognition and Synthesis, Hair Designing by Computer Graphics, Human Computer Interaction, Life-Like Believable Agent, Image Based Motion Capture System and all aspects of Future Interactive Entertainment. Dr. Morishima is a member of the IEEE, ACM, SIGGRAPH and the Institute of Electronics, Information and Communication Engineers Japan (IEICE-J). He is an Associate Editor of the Journal of the Pattern Recognition Society. He is an editor of Transaction of IEICE-J and a committee member of HCG and PRMU of IEICE-J and of Human Interface of IPSJ. He received the ICOGRAPH paper awards in 1988, 1990, 1993 and 1996. He also received the IEICE-J achievement award in May 1992. Dr. Morishima spent a sabbatical as a visiting professor with the Visual Modeling Group in the AI Lab., Department of Computer Science, University of Toronto, from July 1994 to August 1995. Dr. Morishima has been a temporary lecturer at Tokyo Woman's Christian University since 1998 and a visiting researcher at ATR Media Integration & Communications Laboratories since 1999.

Dr Richard Reilly received his B.E., M.Eng.Sc. and Ph.D. degrees in 1987, 1989 and 1992, respectively, all in Electronic Engineering, from the National University of Ireland. In 1988 he joined Space Technology Ireland and the Dept. de Recherche Spatiale, part of the CNRS group in Paris, developing a DSP-based Spectrum Analyser as part of the NASA Satellite, WIND. In 1990, he joined the National Rehabilitation Hospital as a research engineer. In 1992 he became a Post-Doctoral Research Fellow at University College Dublin in signal processing, focused on speech enhancement and gesture recognition. In 1994 he joined the academic staff, as a College Lecturer, in the Department of Electronic and Electrical Engineering at University College Dublin. His research is in video and biomedical signal processing. Dr Reilly is an assistant editor of IEEE Transactions on Rehabilitation Engineering.

Sudhanshu Kumar Semwal is an Associate Professor at the University of Colorado, Colorado Springs. His primary area of research is graphics (realistic avatars, human animation, volume visualization), human-machine interaction and virtual reality. He received his Ph.D. in Computer Science from the University of Central Florida, Orlando (1987), and his MS from the University of Alberta, Edmonton (1984). His B.E. is from the University of Roorkee, India (1980). He has also held research positions at CRL, Matsushita Electric, Osaka (1991-92), Sandia National Laboratory, Albuquerque (1995), and at ATR, Kyoto (Summer 97, 98, 99). His current interests are realistic avatars, 3D unencumbered tracking and study of emotions in virtual environments.