2- What Do We Need to Know?

There are several important aspects of the computer virus problem that need to be understood. Hundreds of different PC viruses have been written, and their number is increasing rapidly. However, only a small minority are actively spreading; the majority are rarely seen outside of virus collections. It is important to know which ones are most prevalent, so as to focus our anti-virus effort properly. We would also like to monitor the prevalence of the most common viruses as a function of time and geographic location. This would allow us to fit their growth or decline to theoretical or phenomenological models, which could be used to project their future course. As an additional benefit, we might use such information to estimate how much time we typically have to derive detectors and cures for new viruses, which affects the required frequency of updates to virus scanners and other anti-virus software.

The above considerations are reasonably adequate if we are satisfied with a global perspective on the computer virus problem. However, in order to assess a particular organization's risk of infection, we must gain some insight into what causes some environments to be more conducive to viral spread than others. For example, a good deal of anecdotal evidence suggests that educational institutions are more vulnerable than others. In order to quantify this and other such correlations between organizational type and viral prevalence, we must measure the prevalence of computer viruses in large and small businesses, homes, educational institutions, and government agencies.

From an organization's perspective (illustrated in Fig. 1), the world is full of computer viruses that are continually knocking on the door, trying to get in.

Figure 1: Computer virus spread from an organization's perspective. White circles represent uninfected machines, black circles represent infected machines, and gray circles represent machines in the process of being infected. Throughout the world, computer viruses spread among PCs, many of them being detected and eradicated eventually. Left: Occasionally, a virus penetrates the boundary separating the organization from the rest of the world, initiating a virus incident. The frequency with which this occurs depends upon the fraction of infected machines in the world, the number of machines in the organization, and the success of the organization in filtering out infectious contacts with the outside world. Right: The infection has spread to other PCs within the organization. The number of PCs that will be infected by the time the incident is discovered and cleaned up (the size of the incident) depends upon inherent characteristics of the virus and the effectiveness of the organization's anti-virus policies, particularly the extent to which anti-virus software is being used.

An organization should have two complementary goals regarding computer viruses: to reduce their influx from external sources, and to reduce their internal spread if they do get in. Each time a virus penetrates an organization's defenses from some external source, it instigates what we shall term an incident -- a cascade of infections that reaches some number of PCs (the size of the incident) and diskettes before being discovered and eradicated. In our definition, the incident size would be zero if a virus on a foreign diskette were detected by the organization before the virus had a chance to infect any machines. Note also that a recurrence of an incident (e.g. due to imperfect cleanup) is to be counted as part of the original incident, not a new one.
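
The bookkeeping implied by this definition can be illustrated with a minimal Python sketch. The report format, the function name `summarize_incidents`, and the labels used below are hypothetical and are not taken from any actual data collection. The sketch assumes that each report has already been traced back to the external penetration that caused it, which is not always possible in practice, and that the size of an incident counts distinct PCs, so that a machine reinfected during a recurrence is counted once.

```python
from collections import defaultdict

def summarize_incidents(reports):
    """Tally incidents and infected PCs from (penetration_id, pc_id) reports.

    Hypothetical format: penetration_id labels the external penetration a
    report traces back to; pc_id is None when the virus was caught (e.g. on
    a foreign diskette) before it infected any machine.
    """
    pcs_by_incident = defaultdict(set)
    for penetration_id, pc_id in reports:
        # Looking up the key registers the incident even if its size stays zero.
        infected = pcs_by_incident[penetration_id]
        if pc_id is not None:
            infected.add(pc_id)  # a set, so a reinfected PC is counted once
    sizes = {pid: len(pcs) for pid, pcs in pcs_by_incident.items()}
    return len(sizes), sum(sizes.values()), sizes

# Made-up reports, for illustration only.
reports = [
    ("A", "pc1"), ("A", "pc2"), ("A", "pc3"),
    ("B", None),   # detected on a foreign diskette: incident of size zero
    ("A", "pc2"),  # recurrence after imperfect cleanup: still incident A
    ("A", "pc4"),  # the recurrence reaches one more PC
    ("C", "pc7"),
]
num_incidents, total_infected_pcs, sizes = summarize_incidents(reports)
```

The two totals returned by the function are deliberately separate: the first counts penetrations, while the second sums incident sizes. Note that a PC infected in two different incidents contributes to both sizes under this assumption.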

It is essential to distinguish between the number of incidents and the number of infected PCs and diskettes that an organization has experienced. These statistics reflect two different aspects of an organization's ability to manage the problem of computer viruses, and must not be confused with one another; unfortunately, they often are. By our definition, the number of incidents is equal to the number of times a virus has penetrated an organization, which in turn depends upon the frequency of that virus in the external world and the effectiveness of the organization in limiting its initial penetration. Some policies that have been advocated for slowing the influx of viruses include forbidding the use of bulletin boards, shareware, and diskettes from home and insisting that all software be centrally acquired and approved. Integrity shells and resident processes which scan memory for known viruses before executing any program are two popular software techniques for hindering the initial penetration of a virus.

In practice, the original source of infection cannot always be determined. For example, an incomplete cleanup from a virus incident may miss an infected diskette, which instigates a second round of infectious spread some time later. It may be difficult to tell that this second "incident" is actually a recurrence of the first incident, rather than the result of another penetration from an external source.

At an even finer level of detail, it would be very useful to understand several aspects of user behavior, such as the pattern and frequency of software and diskette exchange between users, the pattern and frequency of software use, and the extent to which anti-virus software is installed and used. Knowledge of these and perhaps other facts about user behavior combined with the other measurements that we have proposed should prove to be of great utility in supporting and calibrating theoretical models. This would provide us with a better understanding of what governs the rate of spread of various types of viruses, which should in turn guide our anti-virus strategy -- allowing us to find a reasonable balance between cost and safety.

We must be content with sampling only a small subset of the world's PCs and diskettes. This in turn requires that we report results using categories that are sufficiently coarse-grained to yield acceptable statistics. Unless our resources are fairly substantial, we might wish to limit our attention to a particular type of user environment or geographic region so as not to thin out our statistics too much. Likewise, resource limitations are likely to prevent us from obtaining more than a hazy picture of user behavior. The least expensive option is probably through surveys, which rely on the questionable ability of people to assess their own behavior accurately.
Another option is to study user behavior using a combination of observation by sociologists and monitoring by special-purpose software, both of which are expensive even when the studied population is fairly small.

Some of the information that we wish to collect is for the purpose of determining the current prevalence of computer viruses, and some is useful for providing insight into what factors primarily influence virus prevalence -- allowing us to build and calibrate models which can predict the future situation and guide our anti-virus strategy.

To summarize, some of the major categories of questions that we are particularly interested in answering include:
|