

next previous up

Next 4- Our Study of Virus Prevalence
Previous 2- What Do We Need to Know?
Up How Prevalent are Computer Viruses?

3- Measuring Prevalence in a Given Environment

How do we go about answering the three major categories of questions posed in the previous section (which we shall refer to as Q1 through Q3)? As noted there, we are limited to sampling some chosen subset of computer users, typically within a particular type of environment to which we have access.

One approach, taken by Certus in 1990 and 1991 and by Dataquest in 1991, is to survey a large number of organizations by contacting, within each one, the person most responsible for troubleshooting virus problems. From their surveys, Certus and Dataquest drew conclusions about trends in particular viruses over time (Q1.B) and about several details of virus incidents (Q2). Although these surveys do shed some light on the virus problem, many of the results are suspect because they depend on the accuracy of respondents' recollections. In some cases, respondents were asked to recall events that happened up to two years earlier, and it is not clear how many of them kept accurate records of virus incidents. Under such circumstances, underreporting of older virus incidents would not be surprising.

We feel that a much more reliable way to answer these questions is to collect statistics on virus incidents directly from a large, chosen population as the incidents occur. For each incident we must record, at a minimum, where and when it occurred, which virus was involved, and how many machines were affected. Other details of virus incidents (e.g., other subcategories of Q2) would be useful to record as well. This method requires a population with three important characteristics:

  1. Regular use of anti-virus software. Users must have the means to determine whether they are infected and, if they are, a reliable way of identifying the virus.
  2. Educated users. Users must know what viruses are, how to use anti-virus software, and to whom they should report an infection if they discover one.
  3. Central reporting. There must be a central reporting facility that collects information about virus incidents.
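
The minimum incident record and the central collection point described above might be sketched as follows. This is a minimal Python illustration under stated assumptions, not the reporting system used in this study: the class names `IncidentReport` and `CentralRegistry`, their field names, and the validation rules are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IncidentReport:
    """Hypothetical minimum record for one virus incident."""
    site: str               # where the incident occurred
    occurred_on: date       # when the incident occurred
    virus: str              # virus identified by the anti-virus software
    machines_affected: int  # how many machines were infected in this incident

    def __post_init__(self) -> None:
        # Reject records that lack the minimum required information.
        if not self.site or not self.virus:
            raise ValueError("site and virus must be non-empty")
        if self.machines_affected < 1:
            raise ValueError("an incident involves at least one machine")


class CentralRegistry:
    """Hypothetical central reporting facility that collects incident reports."""

    def __init__(self) -> None:
        self._reports: list[IncidentReport] = []

    def submit(self, report: IncidentReport) -> None:
        self._reports.append(report)

    def reports(self) -> list[IncidentReport]:
        # Return a copy so callers cannot alter the stored records.
        return list(self._reports)


registry = CentralRegistry()
registry.submit(IncidentReport("Site A", date(1992, 3, 2), "Virus A", 3))
```

Making the record immutable and validating it on creation reflects the point that a central facility is only useful if every report carries the minimum fields; the site and virus names above are placeholders, not data.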

We are still left with the other questions posed in the previous section. The question of how many different viruses exist in the world (Q1.A) has been hotly debated by several computer virus collectors and pundits. The debate is sustained by several factors:

  • Rapid growth in the number of viruses. This makes it virtually impossible for any one collection to contain all known viruses.
  • Incomplete standardization of virus names, so that it is difficult to pool different collections.
  • Lack of agreement about what constitutes a different computer virus. Some consider a one-byte difference between two viruses to be sufficient to count them as two separate viruses; others lump related viruses together into families. Family boundaries are ill-defined.
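
The effect of the last factor can be seen in miniature: the same collection yields different counts depending on the equivalence rule chosen. The following is a minimal Python sketch, assuming each sample is stored as raw bytes and already carries a family label; the labels and byte strings are invented, and the genuinely hard step, deciding which family a sample belongs to, is simply assumed away.

```python
from __future__ import annotations

# Hypothetical collection: each sample has a family label and its raw bytes.
# Labels and byte strings are invented purely for illustration.
collection = [
    ("FamilyA", b"\x01\x02\x03\x04"),
    ("FamilyA", b"\x01\x02\x03\x05"),  # differs from the first by one byte
    ("FamilyA", b"\x01\x02\x03\x04"),  # exact duplicate of the first
    ("FamilyB", b"\x10\x20\x30"),
]


def count_by_content(samples: list[tuple[str, bytes]]) -> int:
    """Count any byte-level difference as a separate virus."""
    return len({data for _, data in samples})


def count_by_family(samples: list[tuple[str, bytes]]) -> int:
    """Lump variants together by their (assumed) family label."""
    return len({family for family, _ in samples})


print(count_by_content(collection))  # 3
print(count_by_family(collection))   # 2
```

Neither count is wrong; they answer different questions, which is why collections that use different rules cannot simply be compared or pooled.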

We are content to let people enjoy their debate. Eventually, some good may come of it. However, we feel that it is much more important to know which few of the untold hordes of viruses are worth worrying about (Q1.B). The questions regarding user behavior (Q3) are important, but cannot be answered by monitoring virus incidents. A user survey could be useful, although its reliance on people's ability to quantify their own behavior accurately could introduce substantial error [1]. Software tools that monitor user behavior in certain limited environments might be very useful supplements to such a survey.

Some additional remarks apply to all of the questions posed in the previous section. Whether the information is obtained through surveys or by closely monitoring a large population of users, care must be taken in gathering and interpreting the data. We must carefully define the quantities we are trying to measure and make sure that the data we gather actually measure those quantities, and measure them accurately. Even accurate data present many pitfalls of interpretation. For example, both the Certus and Dataquest studies blurred the distinction between the number of incidents and the number of infected machines, and in some cases failed to distinguish between the number of infections by one particular virus and the number of infections due to all known viruses. As we will see, extrapolating from noisy or incomplete data is another such pitfall.
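
The difference between counting incidents and counting infected machines, and between one virus and all viruses, can be made concrete with a small tally. This is a minimal Python sketch, assuming each incident is reduced to a virus name and a machine count; the `Incident` type, the function names, and the sample data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    virus: str              # virus involved (names below are placeholders)
    machines_affected: int  # machines infected in this one incident


def incidents_by_virus(incidents):
    """Number of incidents per virus: each incident counts once."""
    return Counter(i.virus for i in incidents)


def machines_by_virus(incidents):
    """Number of infected machines per virus: each incident counts by its size."""
    totals = Counter()
    for i in incidents:
        totals[i.virus] += i.machines_affected
    return totals


# Invented data: one large incident for Virus A, several small ones for Virus B.
sample = [
    Incident("Virus A", 1),
    Incident("Virus A", 12),
    Incident("Virus B", 1),
    Incident("Virus B", 1),
    Incident("Virus B", 1),
]

incidents = incidents_by_virus(sample)
machines = machines_by_virus(sample)
```

With these made-up numbers, Virus B accounts for more incidents (3 of 5) while Virus A accounts for more infected machines (13 of 16). A report that mixes the two measures, or that quotes one virus's count where the total over all viruses is meant, can therefore reach opposite conclusions from the same data.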

