Next 3- Measuring Prevalence in a Given Environment
Previous 1- Introduction
Up How Prevalent are Computer Viruses?

2- What Do We Need to Know?

There are several important aspects of the computer virus problem that need to be understood. Hundreds of different PC viruses have been written, and their number is increasing rapidly. However, only a small minority are actively spreading; the majority are rarely seen outside of virus collections. It is important to know which ones are most prevalent, so as to focus our anti-virus effort properly. We would also like to monitor the prevalence of the most common viruses as a function of time and geographic location. This would allow us to fit their growth or decline to theoretical or phenomenological models, which could be used to project their future course. As an additional benefit, we might use such information to estimate how much time we typically have to derive detectors and cures for new viruses, which affects the required frequency of updates to virus scanners and other anti-virus software.

The above considerations are reasonably adequate if we are satisfied with a global perspective on the computer virus problem. However, in order to assess a particular organization's risk of infection, we must gain some insight into what causes some environments to be more conducive to viral spread than others. For example, a good deal of anecdotal evidence suggests that educational institutions are more vulnerable than others. In order to quantify this and other such correlations between organizational type and viral prevalence, we must measure the prevalence of computer viruses in large and small businesses, homes, educational institutions, and government agencies.

From an organization's perspective (illustrated in Fig. 1), the world is full of computer viruses that are continually knocking on the door, trying to get in.

  

Figure 1: Computer virus spread from an organization's perspective. White circles represent uninfected machines, black circles represent infected machines, and gray circles represent machines in the process of being infected. Throughout the world, computer viruses spread among PCs, many of them being detected and eradicated eventually. Left: Occasionally, a virus penetrates the boundary separating the organization from the rest of the world, initiating a virus incident. The frequency with which this occurs depends upon the fraction of infected machines in the world, the number of machines in the organization, and the success of the organization in filtering out infectious contacts with the outside world. Right: The infection has spread to other PCs within the organization. The number of PCs that will be infected by the time the incident is discovered and cleaned up (the size of the incident) depends upon inherent characteristics of the virus and the effectiveness of the organization's anti-virus policies, particularly the extent to which anti-virus software is being used.

An organization should have two complementary goals regarding computer viruses: to reduce their influx from external sources, and to reduce their internal spread if they do get in. Each time a virus penetrates an organization's defenses from some external source, it instigates what we shall term an incident -- a cascade of infections that reaches some number of PCs (the size of the incident) and diskettes before being discovered and eradicated. In our definition, the incident size would be zero if a virus on a foreign diskette were detected by the organization before the virus had a chance to infect any machines. Note also that a recurrence of an incident (e.g. due to imperfect cleanup) is to be counted as part of the original incident, not a new one.
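The counting rule in this definition can be sketched in a few lines. This is only an illustration under stated assumptions: the `Report` record, its field names, and the sample data are hypothetical, recurrence links are assumed to form no cycles, and a recurrence is assumed to infect PCs not already counted, so that sizes can simply be summed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    # Hypothetical record of one discovered outbreak; not a format from the paper.
    report_id: str
    pcs_infected: int                    # PCs found infected in this report
    recurrence_of: Optional[str] = None  # report_id of the earlier report it recurs from


def incident_sizes(reports):
    """Map each original incident's report_id to its total size in PCs."""
    by_id = {r.report_id: r for r in reports}

    def root(report_id):
        # Follow recurrence links back to the original penetration.
        while by_id[report_id].recurrence_of is not None:
            report_id = by_id[report_id].recurrence_of
        return report_id

    sizes = {}
    for r in reports:
        key = root(r.report_id)
        sizes[key] = sizes.get(key, 0) + r.pcs_infected
    return sizes


reports = [
    Report("A", pcs_infected=3),
    Report("B", pcs_infected=0),                     # caught on a foreign diskette
    Report("C", pcs_infected=2, recurrence_of="A"),  # imperfect cleanup of A
]
sizes = incident_sizes(reports)
print(len(sizes), "incidents with sizes", sizes)
```

Here three reports collapse into two incidents: the recurrence is folded into incident A, and incident B is counted even though its size is zero.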

It is essential to distinguish between the number of incidents and the number of infected PCs and diskettes that an organization has experienced. These statistics reflect two different aspects of an organization's ability to manage the problem of computer viruses, and must not be confused with one another; unfortunately, they often are. By our definition, the number of incidents is equal to the number of times a virus has penetrated an organization, which in turn depends upon the frequency of that virus in the external world and the effectiveness of the organization in limiting its initial penetration. Some policies that have been advocated for slowing the influx of viruses include forbidding the use of bulletin boards, shareware, and diskettes from home, and insisting that all software be centrally acquired and approved. Integrity shells and resident processes that scan memory for known viruses before executing any program are two popular software techniques for hindering the initial penetration of a virus. We suspect that some of these strategies provide a more reasonable balance between convenience and safety than others. However, in order to make well-founded recommendations, we must be able to correlate their use with the observed number of incidents in particular organizations.

Of course, even if such policies help, they cannot completely stem the flow of viruses. It is necessary to employ another class of anti-virus measures designed to limit the spread of a virus once it has penetrated an organization. The average size of an incident is governed by the virulence of the virus and the effectiveness of the anti-virus measures in place within the organization. In particular, the extent to which anti-virus software is installed and used should be measured, as should the degree to which it is responsible for the initial discovery of a virus. We would like to know what avenues exist for informing other employees or central CERTs (Computer Emergency Response Teams) about virus incidents. Finally, we would like to measure the degree to which the organization's PCs use LAN servers, which can enhance the natural virulence of a virus.

In practice, the original source of infection cannot always be determined. For example, an incomplete cleanup from a virus incident may miss an infected diskette, which instigates a second round of infectious spread some time later. It may be difficult to tell that this second "incident" is actually a recurrence of the first incident, rather than the result of another penetration from an external source.

At an even finer level of detail, it would be very useful to understand several aspects of user behavior, such as the pattern and frequency of software and diskette exchange between users, the pattern and frequency of software use, and the extent to which anti-virus software is installed and used. Knowledge of these and perhaps other facts about user behavior combined with the other measurements that we have proposed should prove to be of great utility in supporting and calibrating theoretical models. This would provide us with a better understanding of what governs the rate of spread of various types of viruses, which should in turn guide our anti-virus strategy -- allowing us to find a reasonable balance between cost and safety.

We must be content with sampling only a small subset of the world's PCs and diskettes. This in turn requires that we report results using categories that are sufficiently coarse-grained to yield acceptable statistics. Unless our resources are fairly substantial, we might wish to limit our attention to a particular type of user environment or geographic region so as not to thin out our statistics too much. Likewise, resource limitations are likely to prevent us from obtaining more than a hazy picture of user behavior. The least expensive option is probably through surveys, which rely on the questionable ability of people to assess their own behavior accurately. Another option is to study user behavior using a combination of observation by sociologists and monitoring by special-purpose software, both of which are expensive even when the studied population is fairly small.
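The cost of fine-grained categories can be made concrete with the standard error of an estimated proportion. The sketch below assumes PCs are sampled independently (a simple binomial model, not something the text specifies); the prevalence and sample sizes are hypothetical numbers chosen only for illustration.

```python
import math


def prevalence_standard_error(p, n):
    """Standard error of an estimated proportion p from n independently sampled PCs."""
    return math.sqrt(p * (1.0 - p) / n)


p = 0.01  # hypothetical true fraction of infected PCs

# One pooled sample of 1000 PCs versus the same PCs split evenly
# across 10 categories (100 PCs per category).
pooled = prevalence_standard_error(p, 1000)
split = prevalence_standard_error(p, 100)

print(f"pooled sample: relative error {pooled / p:.2f}")
print(f"per category:  relative error {split / p:.2f}")
print(f"error grows by a factor of {split / pooled:.2f}")
```

Under this model, dividing a fixed sample into k equal categories inflates the uncertainty of each category's estimate by a factor of the square root of k, which is why coarse categories are needed when the sample is small.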

Some of the information that we wish to collect is for the purpose of determining the current prevalence of computer viruses, and some is useful for providing insight into what factors primarily influence virus prevalence -- allowing us to build and calibrate models which can predict the future situation and guide our anti-virus strategy. To summarize, some of the major categories of questions that we are particularly interested in answering include:

  1. Global computer virus trends.

    • How many computer viruses have been in existence as a function of time?
    • For each computer virus, how many copies of it have existed in the world as a function of time?

  2. Local characteristics. For each virus incident:

    • When did it occur?
    • How many PCs and diskettes were infected?
    • Where did it occur?
    • What are the characteristics of the organization in which it occurred?

      • Type (e.g. educational, governmental, business, etc.)
      • Number of PCs
      • Percentage of PCs that use LAN servers
      • Anti-virus policies, such as installation and use of anti-virus software, and restrictions on transporting diskettes from home or downloading software from bulletin boards

  3. User behavior.

    • How frequently does a typical user share software and diskettes?
    • With how many other users does a typical user share software and diskettes?
    • What are their major sources of software?
    • How many different applications does a typical user run during a day?
    • What types of anti-virus protection do they use, how frequently do they use it, and under what circumstances?
    • To what extent do they report discoveries of a virus to other users or central agencies?
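As an illustration, the per-incident questions in category 2 could be captured in a record like the one sketched here. The class names, field names, and sample values are all hypothetical; this is one possible layout, not a reporting format defined in this paper.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Organization:
    org_type: str          # e.g. "educational", "governmental", "business"
    num_pcs: int
    pct_pcs_on_lan: float  # percentage of PCs that use LAN servers
    antivirus_policies: List[str] = field(default_factory=list)


@dataclass
class IncidentRecord:
    virus_name: str
    occurred_on: date         # when the incident occurred
    location: str             # where it occurred
    pcs_infected: int         # incident size in PCs
    diskettes_infected: int
    organization: Organization


def incidents_by_org_type(records):
    """Count incidents per organizational type."""
    return Counter(r.organization.org_type for r in records)


# Hypothetical sample data, for illustration only.
school = Organization("educational", num_pcs=400, pct_pcs_on_lan=25.0)
firm = Organization("business", num_pcs=2000, pct_pcs_on_lan=60.0,
                    antivirus_policies=["scanner installed on all PCs"])
records = [
    IncidentRecord("virus-x", date(1992, 3, 1), "site-1", 4, 10, school),
    IncidentRecord("virus-y", date(1992, 5, 9), "site-1", 1, 2, school),
    IncidentRecord("virus-x", date(1992, 6, 2), "site-2", 0, 1, firm),
]
counts = incidents_by_org_type(records)
print(counts)
```

Keeping the organization's characteristics alongside each incident is what allows incident counts and sizes to be correlated later with organizational type and anti-virus policy.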

