4.2 Worldwide Virus Prevalence

In the remainder of this section, we shall look at statistics which typify not just our sample population, but which also reveal much about virus prevalence in the world as a whole. We have maintained a current collection of known viruses by working cooperatively with other virus collectors. At any given moment in time, the number of viruses for which we have signatures in our virus scanner is a conservative estimate of the number of different viral strains in the world. The number of viruses which are actually spreading is taken to be the number of viruses that we have seen in at least one actual incident.
Figure 7: Number of viruses known to us (those we have collected and analyzed) and number of viruses ``in the wild'' (observed by us in actual incidents) as a function of time.
Figure 7 shows that the number of different viruses that have been written has grown dramatically during the last four years. So far, it has been growing at a roughly exponential rate, doubling approximately every 7 months. During the last two years, the number of viruses that we have seen in real incidents has consistently been approximately 15% to 20% of the total number in our collection, and a majority of these have only been seen once or twice. We suspect that a significant portion of the viruses which have been seen rarely or never are below the epidemic threshold. Others have just been unfortunate so far, and might conceivably get a lucky break someday that will enable them to spread appreciably.

Figure 8 emphasizes the point that a few viruses account for many, but certainly not all, of the observed incidents. The ten most common viruses accounted for 67% of the incidents, with the remaining 33% being distributed among 91 different viruses, over half of which were seen only once. A number of other viruses that we have seen in previous years were not observed at all during 1992. This leaves well over 1000 viruses in our collection that we have never observed at any time.

It is interesting to note that the relative market share of the top two viruses has been declining steadily. In 1990, the Stoned and 1813 (Jerusalem) viruses together accounted for 51% of all incidents. In 1991, this dropped to 34%. In 1992, Form supplanted 1813 as the second-most common virus; the Stoned and Form viruses together accounted for just 28% of the observed incidents. This decrease is not due to decreased prevalence of these viruses; it is due to the fact that new viruses are continually entering the field. Some of these newcomers, notably Joshi and Form, are proving to be rather successful. Other new viruses are seen only rarely, but there are so many of them that the fraction of incidents in the ``Other'' category is growing rapidly.
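Two of the figures quoted above lend themselves to a quick arithmetic check: the 7-month doubling time of the collection (Figure 7) and the split of 1992 incidents between the ten most common viruses and the other 91 (Figure 8). The following is a minimal sketch; the function names are ours and purely illustrative, and the inputs are the rounded values stated in the text.

```python
def growth_factor(months: float, doubling_months: float = 7.0) -> float:
    """Multiplicative growth over `months`, assuming a fixed doubling time."""
    return 2.0 ** (months / doubling_months)


def mean_share(total_share_percent: float, n_viruses: int) -> float:
    """Average percentage of incidents per virus within a group of viruses."""
    return total_share_percent / n_viruses


if __name__ == "__main__":
    # Doubling every 7 months, expressed as growth per year and over four years.
    print(f"growth per year:        x{growth_factor(12):.2f}")
    print(f"growth over four years: x{growth_factor(48):.0f}")
    # 67% of incidents spread over 10 viruses versus 33% spread over 91.
    print(f"mean share, top ten:    {mean_share(67, 10):.2f}%")
    print(f"mean share, other 91:   {mean_share(33, 91):.2f}%")
```

A 7-month doubling time corresponds to somewhat more than a threefold increase per year, and the average virus outside the top ten accounts for well under 1% of incidents.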
Figure 8: Relative frequency of incidents involving the most common viruses during 1992.
Figure 9 shows the observed incident rate of the five most common viruses of 1992 as a function of time. Except for the first half of 1992, it can also be taken to be the relative frequency of these viruses in the world as a whole. (During the first half of 1992, the Michelangelo scare caused an anomaly which disturbed the proportionality between observed and actual incidents; this will be discussed in Section 4.3.) If we restrict our attention to the fourth quarter of 1991 and earlier (in order to avoid the confusion of the Michelangelo effect), a common pattern is evident in Fig. 9. Viruses appear to increase in prevalence at an approximately linear rate for a period of six months to two years, and then plateau at a very low level. Some, such as 1813, appear to decline after a relatively stable period. Bouncing Ball (not shown) had been in apparent equilibrium for several years, perennially appearing in the list of the 5 most common viruses; during 1992, its prevalence declined precipitously, to about one fourth of its former equilibrium level. This may indicate that viruses like 1813 and Bouncing Ball have fallen below the epidemic threshold, possibly because anti-virus software is being used more widely than it had been. The Brain is a prime example of a virus which is nearly extinct.
Figure 9: Number of incidents involving five of the most common viruses as a function of time. The units (incidents per 1000 PCs per quarter) pertain to our sample population only, but the curves should also be reasonable estimates of the relative worldwide prevalence of each virus. The data points are bracketed by bars indicating the statistical sampling error that one would expect given the number of observed incidents. (The bars do not represent errors in the measured data.)
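The caption above does not say how the sampling-error bars are computed. A common choice for count data, which we assume here purely for illustration, is to treat incidents as independent Poisson events, so that the expected sampling error on a count of N incidents is sqrt(N). The population size and incident count in the example are hypothetical, not values from our data.

```python
import math


def incident_rate_with_error(incidents: int, pcs: int) -> tuple[float, float]:
    """Return (rate, error) in incidents per 1000 PCs for one quarter.

    Assumption (not stated in the text): incidents are independent Poisson
    events, so the sampling error on a count N is sqrt(N).
    """
    scale = 1000.0 / pcs
    return incidents * scale, math.sqrt(incidents) * scale


if __name__ == "__main__":
    # Hypothetical example: 20 incidents in a quarter among 100,000 PCs.
    rate, err = incident_rate_with_error(20, 100_000)
    print(f"{rate:.3f} +/- {err:.3f} incidents per 1000 PCs per quarter")
```

Under this assumption the relative error shrinks as 1/sqrt(N), which is why curves built from only a handful of incidents per quarter carry wide bars.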
The viruses which are increasing in prevalence are clearly above the epidemic threshold, but their strongly sub-exponential spread rate points to highly localized software sharing. The ones which are approximately stable in prevalence are apparently in equilibrium. It is somewhat surprising that the equilibrium is at such a low level: approximately 0.2 incidents per 1000 PCs per quarter for Stoned, the most prevalent virus, at the end of 1991. To estimate the number of infected machines that this represents, we must multiply this figure by the average incident size for the world. Not knowing the extent to which most organizations are protected against viruses, this is difficult to estimate, but in any case it is clear that the fraction of the world's machines which are infected with any particular PC-DOS virus is exceedingly small. If we accept the simple theory of Section 2, this can only be explained if the birth rate is infinitesimally larger than the death rate. This seems very unlikely, especially since several viruses are apparently above threshold, but none have become very prevalent.
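To see why a tiny equilibrium forces the birth rate to be only marginally above the death rate, consider the following sketch. It assumes that the simple theory of Section 2 gives an equilibrium infected fraction of 1 - (death rate / birth rate) above threshold, as in a standard homogeneous susceptible-infected-susceptible model; that formula is our assumption here and is not restated in this section. The average incident size of 3 PCs is hypothetical, and treating the per-quarter figure as a standing infected fraction is a crude order-of-magnitude shortcut.

```python
def equilibrium_fraction(birth_rate: float, death_rate: float) -> float:
    """Equilibrium infected fraction, assuming the form 1 - death/birth.

    Below the epidemic threshold (birth <= death) the infection dies out.
    """
    if birth_rate <= death_rate:
        return 0.0
    return 1.0 - death_rate / birth_rate


def required_birth_to_death_ratio(fraction: float) -> float:
    """Invert the formula above: birth/death needed for a given equilibrium."""
    return 1.0 / (1.0 - fraction)


def infected_per_1000(incident_rate: float, mean_incident_size: float) -> float:
    """Infected PCs per 1000 = incidents per 1000 PCs x average incident size."""
    return incident_rate * mean_incident_size


if __name__ == "__main__":
    # 0.2 incidents per 1000 PCs (Stoned, end of 1991) x hypothetical size of 3.
    per_1000 = infected_per_1000(0.2, 3.0)
    fraction = per_1000 / 1000.0
    ratio = required_birth_to_death_ratio(fraction)
    print(f"infected fraction ~ {fraction:.1e}, birth/death ratio ~ {ratio:.5f}")
```

For any plausible incident size, the required ratio differs from 1 by well under one percent, which is the fine tuning that the text above regards as very unlikely.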
We suspect that a combination of two factors is
decreasing the equilibrium. First, kill signals are probably
operating informally, i.e. some people tell their friends
when they discover that they are infected. Depending on the
assumptions that one puts into the theoretical kill signal
models of Section 3.1, this can decrease the equilibrium infection rate by
a very substantial factor.
Second, it is conceivable that, when
someone experiences a computer virus, or hears that someone
they know became infected, they become more vigilant. Unlike
biological diseases, exposure to one computer virus can actually confer
immunity against nearly all computer viruses. As was mentioned
in Section 3.1, such an immunization effect can also be accommodated
within the kill signal model by setting
It should be noted that our observations completely contradict the predictions made by Tippett [2]. In March 1990, he predicted that by March 1992 there would be approximately 8 million infected PCs in the world -- an 8% infection rate. He claimed that the 1813 (Jerusalem) virus would continue to double in prevalence every 1.5 to 2.6 months. In fact, according to Fig. 9, its prevalence remained remarkably stable over a long period of time following that prediction, and today it appears to be declining. He also predicted that, for any virus, exponential growth would continue until approximately 20% of the computer population was infected, after which its prevalence would continue to increase at a slower rate. Note that, even for those viruses in Fig. 9 which have increased in prevalence, it would be difficult to claim that the growth has been exponential! We attribute this to highly localized software sharing, as was described in Section 2.

Figure 10 shows the incident rate from all viruses as a function of time in our sample population. It can also be interpreted as the relative frequency of all viruses in the world as a whole. (Again, this is not the case during the first half of 1992 for reasons that will be explained in Section 4.3.) During the last quarter of 1991, about 0.1% of the PCs in our sample population became infected by some external source. This small but rising incident rate is due to two separate factors. First, some individual viruses are becoming more prevalent. Second, there has been an increase in the number of different varieties of successfully-spreading viruses (e.g. Joshi, which, as shown in Fig. 9a, first appeared in our sample population in late 1990). It should be recognized that the statistic shown here is distinct from those presented in Figs. 7 and 9, and can be thought of as a somewhat complicated combination of the two.
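Returning briefly to the predictions of Tippett [2] discussed above, a few lines of arithmetic show how strong the claimed doubling times are when sustained over the two years from March 1990 to March 1992. This is only a sketch of what the quoted numbers imply; the function names are ours, and the starting counts are back-calculated from the quoted figures rather than taken from [2].

```python
def projected_factor(months: float, doubling_months: float) -> float:
    """Growth factor over `months` for sustained doubling every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


def implied_pc_population(infected: float, infection_rate: float) -> float:
    """World PC population implied by an infected count and an infection rate."""
    return infected / infection_rate


if __name__ == "__main__":
    horizon = 24  # months between March 1990 and March 1992
    predicted_infected = 8_000_000
    for doubling in (1.5, 2.6):
        factor = projected_factor(horizon, doubling)
        # Infected count in March 1990 that such growth would require.
        start = predicted_infected / factor
        print(f"doubling every {doubling} months: x{factor:,.0f} "
              f"(implied starting count ~{start:,.0f})")
    print(f"implied PC population: {implied_pc_population(8e6, 0.08):,.0f}")
```

Doubling every 1.5 months for 24 months is sixteen doublings, a factor of 65,536; even the slower 2.6-month figure implies growth by a factor of several hundred. Nothing resembling this appears in the incident rates of Fig. 9.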
Figure 10: Total number of virus incidents in sample population as a function of time. The units (incidents per 1000 PCs) pertain to our sample population only, but the curve should also be proportional to the worldwide prevalence of all computer viruses as a function of time. The data points are bracketed by bars indicating the statistical sampling error that one would expect given the number of observed incidents. Estimates derived from the raw survey data collected by Dataquest are displayed as well.
It is interesting to calibrate our measurements of the total virus incident rate against those of Dataquest [3], which sampled a much greater diversity of organizations. Unfortunately, a direct comparison with their results is not possible because they reported the percentage of organizations which experienced at least one incident during given time intervals, and the organizations ranged over more than two orders of magnitude in size. However, by re-examining the Dataquest raw data, and taking into account the distribution of organization sizes, we have been able to de-convolve their results so as to provide an estimate of the number of incidents per 1000 PCs [5]. The Dataquest rate of approximately 0.81 incidents per 1000 PCs for the third quarter of 1991 is in the same range as our own observation of 0.90 incidents per 1000 PCs for that quarter. This suggests that the incident rate within our sample population is roughly the same as that in the rest of the world (or at least in the portion of the world sampled by Dataquest: North American businesses and educational and governmental institutions). This is consistent with the fact that our sample population is not taking any unusual precautions to prevent viruses from penetrating the organizational boundary; the special anti-virus policies that were instituted a few years ago are designed to limit the size of incidents, not their frequency. The fact that the Dataquest estimate for 1990 is considerably lower than ours may indicate fading memory on the part of the survey respondents (who took the survey in October 1991), or a lesser awareness of the virus problem during 1990 than during 1991.
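The details of the de-convolution are given in [5] and are not reproduced here. To illustrate why organization size matters, the sketch below shows one simple way such a conversion could be done. It is not the procedure of [5]. It assumes that incidents arrive as a Poisson process with a rate proportional to the number of PCs, so that an organization with n PCs experiences at least one incident with probability 1 - exp(-r n / 1000), where r is the incident rate per 1000 PCs. The size bins in the example are hypothetical and are not Dataquest data.

```python
import math


def rate_per_1000_pcs(fraction_hit: float, pcs_per_org: float) -> float:
    """Invert P(at least one incident) = 1 - exp(-rate * pcs / 1000).

    Assumption: Poisson incidents with a rate proportional to organization size.
    """
    if not 0.0 <= fraction_hit < 1.0:
        raise ValueError("fraction_hit must be in [0, 1)")
    return -math.log(1.0 - fraction_hit) * 1000.0 / pcs_per_org


def pooled_rate(bins) -> float:
    """PC-weighted average rate over (n_orgs, pcs_per_org, fraction_hit) bins."""
    bins = list(bins)
    total_pcs = sum(n * size for n, size, _ in bins)
    weighted = sum(n * size * rate_per_1000_pcs(f, size) for n, size, f in bins)
    return weighted / total_pcs


if __name__ == "__main__":
    # Hypothetical size bins: (number of organizations, PCs each, fraction hit).
    example_bins = [(200, 50, 0.04), (100, 500, 0.35), (20, 5000, 0.98)]
    print(f"pooled rate: {pooled_rate(example_bins):.2f} incidents per 1000 PCs")
```

Under this assumption, the same underlying rate produces very different "at least one incident" percentages in small and large organizations, which is why the reported percentages cannot be compared with a per-PC rate without accounting for the size distribution.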