
Visual Transfer Functions

One desirable form of interaction is to allow the user to identify a location on the screen by pointing to it with their nose. Essentially, the user envisions a line extending from their nose, perpendicular to their face, that intersects the computer screen.

To explain the process, consider a point T (for Target) on the screen, which is our best estimate of where the user is aiming their face in the current frame. Now consider a second point on the screen, S (for Selection point). S is the point the system will use for any event occurring during this frame. A good example is the use of Facial Pointing as a simple cursor replacement: S is the current cursor location, where any mouse click will occur. In other types of user interaction, S could be used for other purposes. Unfortunately, using T directly as S makes it very difficult to identify small regions (objects) on the screen, both because the estimate of T is noisy and because of the dynamic characteristics of human movement. We therefore typically install a task-specific "transfer function", F, as an intermediary between T and S.

Transfer Function

A transfer function that seems to work well for this scenario takes the offset between the current Selection point (S) and the current Target (T) and maps it through a sigmoid function to determine what fraction of the total offset to move the Selection point. With this transfer function, the selection point is tightly coupled to the target position on long movements. This allows the user to move the selection point quickly across the screen, with essentially no lag, though positioning will be less precise. On short movements, however, the selection point is loosely coupled to the target position: it moves only a fraction of the distance to the target each cycle. This means that the user can use larger face movements for fine positioning than they otherwise would, and still gently drag the selection point to the desired location. The tradeoff is that the selection point seems to lag behind where the user is pointing, as though on a stretchy rubber band. By setting the parameters of the sigmoid (knee and slope), the behavior can be tuned for good usability.
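The per-frame update described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function names, the use of Euclidean distance in pixels as the offset, the logistic form of the sigmoid, and the default knee and slope values are all assumptions chosen for the example.

```python
import math

def sigmoid_fraction(distance, knee, slope):
    """Fraction of the S-to-T offset to cover this frame, in (0, 1).

    `knee` is the offset (in pixels) at which the fraction is 0.5;
    `slope` controls how sharply the fraction rises around the knee.
    Both names are hypothetical stand-ins for the sigmoid parameters.
    """
    return 1.0 / (1.0 + math.exp(-slope * (distance - knee)))

def update_selection(s, t, knee=30.0, slope=0.1):
    """Move selection point s toward target t; both are (x, y) in pixels.

    The default knee and slope are illustrative values, not tuned ones.
    """
    dx, dy = t[0] - s[0], t[1] - s[1]
    distance = math.hypot(dx, dy)
    fraction = sigmoid_fraction(distance, knee, slope)
    return (s[0] + fraction * dx, s[1] + fraction * dy)

# Long movement: offset far beyond the knee, so S lands almost on T.
far = update_selection((0.0, 0.0), (400.0, 0.0))

# Short movement: offset below the knee, so S covers only a small
# fraction of the distance and drifts gently toward T.
near = update_selection((0.0, 0.0), (5.0, 0.0))
```

Calling update_selection once per frame with the latest estimate of T gives the rubber-band behavior: a large offset is closed almost immediately, while a small offset is closed gradually over several frames.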

If F is instead a traditional filter, such as a Kalman filter, the noise is significantly reduced, but small objects are still difficult to identify because of the dynamic considerations. A major consideration is the gain between the amount of face movement and the amount of movement of S. When the user is moving the cursor across the screen (coarse positioning), they need S to track T closely to avoid an annoying lag in response. When the user is trying to home in on small screen objects (fine positioning), however, having S track T closely means that the small facial movements required are difficult to perform accurately. In this case we want to turn down the gain so that larger facial movements can be used. As can be seen, there is an element of skill involved in choosing a suitable transfer function for an application.
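The gain tradeoff can be made concrete with a deliberately simple stand-in for a traditional filter: a smoother that moves S a constant fraction of the way toward T each frame. This is not a Kalman filter and is not taken from the project; the function names, the gain values, and the one-pixel tolerance are assumptions made for the sketch.

```python
def smooth_fixed_gain(s, t, gain):
    """Move s a constant fraction `gain` of the way toward t (hypothetical helper)."""
    return (s[0] + gain * (t[0] - s[0]), s[1] + gain * (t[1] - s[1]))

def frames_to_reach(start, target, gain, tolerance=1.0):
    """Count frames until s is within `tolerance` pixels of a fixed target."""
    if not 0.0 < gain <= 1.0:
        raise ValueError("gain must be in (0, 1] for s to converge on the target")
    s, frames = start, 0
    while abs(target[0] - s[0]) > tolerance or abs(target[1] - s[1]) > tolerance:
        s = smooth_fixed_gain(s, target, gain)
        frames += 1
    return frames

# Coarse positioning: a 400-pixel move. A low gain lags badly.
slow = frames_to_reach((0.0, 0.0), (400.0, 0.0), gain=0.1)
fast = frames_to_reach((0.0, 0.0), (400.0, 0.0), gain=0.9)

# Fine positioning: a 4-pixel twitch in T. A high gain passes most of it to S.
steady = smooth_fixed_gain((0.0, 0.0), (4.0, 0.0), gain=0.1)
jumpy = smooth_fixed_gain((0.0, 0.0), (4.0, 0.0), gain=0.9)
```

With any single gain, one of the two cases suffers: the low gain needs many more frames to cross the screen, while the high gain lets a small unintended movement displace S almost in full. This is the motivation for making the gain depend on the offset between S and T.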

 
Contact: Rick Kjeldsen Last updated: 6/12/02
 