4- Our Study of Virus Prevalence
In this section we provide answers to Q1 and Q2, based upon the growth of our virus collection and our study of virus incidents in a large selected population of computer users over a period of several years. The sample population is international, but biased towards the U.S. It is stable, both in makeup and in size. We believe it to be typical of Fortune 500 companies possessing the three important characteristics cited in the previous section -- regular use of anti-virus software, user education, and central reporting of incidents -- plus active central response to incidents. These characteristics give us confidence that the data we collect from the sample population are accurate. Of course, these same characteristics are not typical of many other environments, so some of our results may not be representative of universities, home users, and other businesses which lack the cited characteristics. Ironically, it is precisely these special properties of our sample population which enable us to draw some important general conclusions about the computer virus problem in the world as a whole. To begin the story of how this is possible, we present in Fig. 2a the distribution of incident sizes during a six-month period when the above-mentioned anti-virus strategies were first being deployed in the various components of our sample population.
Figure 2: a) Fraction of incidents of given size during six-month periods when strategies were first being deployed. b) Fraction of infected PCs involved in incidents of given size during the same time period.
During this period, the average incident size was 3.4 PCs. Most (63%) of the incidents involved just zero or one PCs. (Recall from section 2 that an incident size is defined as zero if a foreign diskette is caught before it can infect any of an organization's PCs.) Only 12% of the incidents involved more than 5 PCs. However, Fig. 2b presents a different view of the same data. Even though incidents larger than 5 PCs were fairly rare, they accounted for 59% of the total number of infected PCs. Thus the larger incidents actually accounted for most of the problem!
Fig. 3 shows the corresponding distributions for 1991, after the anti-virus strategies had been in place for some time. The average incident size was cut by more than a factor of two to just 1.6 PCs. In the vast majority of cases (80%), the infection was caught before it could infect more than one PC. Only 2.5% of the incidents involved more than 5 PCs, and these large incidents accounted for only 19% of the total number of infected PCs. We believe that the anti-virus policies that were implemented helped to create a more hostile environment for computer viruses and thus are largely responsible for this marked improvement. We can expect the average incident size to be larger than that of Fig. 3 (and more like that of Fig. 2) in organizations which have not yet implemented active response policies.
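The difference between the two views in Figs. 2a and 2b (fraction of incidents versus fraction of infected PCs) can be illustrated with a minimal sketch. The function name `incident_statistics` and the list `sizes` are hypothetical, and the incident sizes below are invented for illustration; they are not the data behind Figs. 2 or 3.

```python
def incident_statistics(sizes, large_threshold=5):
    """Summarize a list of incident sizes (infected PCs per incident).

    A size of 0 means the infected diskette was caught before it
    infected any PC, following the definition used in the text.
    """
    n_incidents = len(sizes)
    total_pcs = sum(sizes)
    if n_incidents == 0 or total_pcs == 0:
        raise ValueError("need at least one incident and one infected PC")
    large = [s for s in sizes if s > large_threshold]
    return {
        # Average number of infected PCs per incident.
        "mean_size": total_pcs / n_incidents,
        # View (a): fraction of incidents involving zero or one PC.
        "frac_incidents_small": sum(1 for s in sizes if s <= 1) / n_incidents,
        # View (a): fraction of incidents larger than the threshold.
        "frac_incidents_large": len(large) / n_incidents,
        # View (b): fraction of all infected PCs found in large incidents.
        "frac_pcs_in_large": sum(large) / total_pcs,
    }


# Hypothetical sample (NOT the study's data): 20 incidents, two of them large.
sizes = [0] * 3 + [1] * 10 + [2] * 3 + [3] * 2 + [20, 30]
stats = incident_statistics(sizes)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

In this invented sample only 10% of the incidents exceed 5 PCs, yet they contain roughly 69% of the infected PCs, which is the same qualitative effect described for Fig. 2b.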
Figure 3: a) Fraction of incidents of given size during 1991. b) Fraction of infected PCs involved in incidents of given size during 1991.
Now let us see how we can exploit the small incident size within our population to learn something about the virus problem in the world as a whole. As was noted in the previous section, each incident (according to our definition) stems from infected software that originated outside of the sample population. Of course, in practice it is not always possible to tell whether two ``different incidents'' are really related, and hence should be counted as a single, larger incident. However, since very large virus incidents are and always have been relatively rare in our sample population, we conclude that it is below the epidemic threshold for computer viruses [2]. In other words, our sample population is unable to sustain an ongoing computer virus infection. The lack of much internal spread of viruses makes it easier to believe that the incidents that we record are not merely repercussions of previous internal incidents. The belief that most of our ``incidents'' really arise from an external source is corroborated by another observation: incidents involving uncommon viruses are rarely clustered in time or space, as they would be if the virus were to spread between different parts of the organization or recur at the same location due to an incomplete cleanup.
Assume that the success with which viruses enter an organization remains constant in time, and that the organizations in our sample population were exposed to a fairly representative sample of the world's actively-spreading viruses. To the extent to which this is true, the remaining statistics that we present in this section reflect not just characteristics of virus incidents in our sample but also the relative populations of various viruses in the world as a function of time. It is somewhat remarkable that, by studying a single sample population, we are able to distinguish its characteristics from those of the world at large. It is our clear distinction between the number of incidents and the number of infected PCs that enables us to accomplish this feat.
Figure 4 presents one aspect of global virus trends (Q1) -- the number of different viruses in our High Integrity Computing Laboratory collection and the number of different viruses observed in incidents in our sample population as a function of time. We have been able to maintain a current collection of known viruses by working cooperatively with other virus collectors.
Figure 4: Number of viruses known to us (those we have collected and analyzed) and number of viruses ``in the wild'' (observed by us in actual incidents) as a function of time.
As was noted earlier, the number of different viruses is a somewhat fuzzy and debatable notion. We do not adhere rigidly to one specific criterion for determining whether two viruses in our collection are the same, but generally we count two viruses as different only if there are at least several bytes of code that do not match. In the case of degarbling viruses, which use one or more heads to encrypt and decrypt themselves in an attempt to confuse virus scanners, we count all possible realizations of a virus as a single virus.
Since our collection can never be completely up-to-date, the number of different viruses in our collection can be taken as an approximate lower bound on the number of viruses in existence in the world. The number of different viruses that have been written has grown dramatically during the last four years, and the rate at which they are being written is accelerating. During the last two years, the number of viruses that we have seen in real incidents has consistently been in the range of 15% to 20% of the total number in our collection, and a majority of these have only been seen once or twice. Thus only a very small minority of computer viruses are very successful.
Figure 5 emphasizes the point that a few viruses account for many, but certainly not all, of the observed incidents. During 1991, the Stoned and 1813 (Jerusalem) viruses together accounted for 34% of the observed incidents. The ten most common viruses accounted for 69% of the incidents, with the remaining 31% being distributed among 73 different viruses, half of which were only seen once. A number of other viruses that we have seen in previous years were not observed at all during 1991. This leaves over 600 viruses in our collection that we have never observed at any time. It is interesting to note that the ``market share'' of the Stoned and 1813 viruses has declined from the previous year, when together they accounted for 51% of all incidents. This has occurred despite an increase in the prevalence of both, and can be traced to several new viruses having entered the field in 1991. Some of these newcomers -- notably Joshi, Form, and Tequila -- are proving to be rather successful. Other new viruses are seen only rarely, but there are so many of them that the fraction of incidents in the ``Other'' category is growing rapidly.
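A small worked example may help show how the ``market share'' of a virus can fall while its prevalence rises. The sketch below is illustrative only: the function `market_share` and both dictionaries of incident counts are hypothetical. The counts were invented so that the combined Stoned and 1813 shares match the 51% and 34% quoted above; they are not the observed incident counts.

```python
def market_share(counts):
    """Return each virus's share of all incidents, given incident counts."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Hypothetical incident counts (NOT the study's data), for a population of
# fixed size, so that a larger count means a higher prevalence.
previous_year = {"Stoned": 30, "1813": 21, "Other": 49}
current_year = {
    "Stoned": 40, "1813": 28,                # both counts have gone up ...
    "Joshi": 20, "Form": 15, "Tequila": 10,  # ... but newcomers have appeared
    "Other": 87,
}

for label, counts in [("previous", previous_year), ("current", current_year)]:
    shares = market_share(counts)
    combined = shares["Stoned"] + shares["1813"]
    print(f"{label} year: Stoned + 1813 share = {combined:.0%}")
```

Both viruses are seen more often in the second year of this invented example, yet their combined share drops because the total number of incidents grows faster than their own counts.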
Figure 5: Relative frequency of incidents involving the most common viruses during 1991.
Figure 6 shows the relative frequency of the four most common viruses as a function of time. There has been a reasonably consistent upward trend in the number of Stoned and Joshi incidents. The prevalence of the Bouncing Ball virus has been fairly stable over the last two years. The rising or stable trends of these and many other viruses indicate that they are above the epidemic threshold in the world at large (but not in our sample population). The 1813 (Jerusalem) virus increased in prevalence until early 1991, but appears to be leveling off or even declining now. Other viruses that are not shown in this figure are almost certainly in decline, indicating that they are below the epidemic threshold. The Brain virus is a prime example of a virus which is nearly extinct.
Thus some viruses appear to be increasing in prevalence, some are stable, and some are decreasing. All of these behaviors are consistent with our simple theories of computer virus replication based upon an application of standard mathematical epidemiology to the problem [2]. They directly contradict predictions made by Tippett [3], who believes that all computer viruses will continue to replicate at an exponential rate until approximately 20% of the computer population is infected, after which they will continue to increase at a slower rate. Note that, even for those viruses which have increased in prevalence, it would be difficult to argue that the growth has been exponential, as Tippett predicts. A linear fit to the growth curves would appear to do at least as well, if not better in most cases.
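The notion of an epidemic threshold can be illustrated with a minimal sketch of a textbook homogeneous SIS (susceptible-infected-susceptible) model. This is an assumption made for illustration: it is a standard simplification from mathematical epidemiology and not necessarily the model developed in [2]. The function `simulate_sis` and the parameters `beta` (infection rate), `delta` (cure rate), `i0`, `dt`, and `steps` are hypothetical names, and the rate values are arbitrary.

```python
def simulate_sis(beta, delta, i0=0.01, dt=0.01, steps=10_000):
    """Euler-integrate di/dt = beta*i*(1-i) - delta*i.

    i is the infected fraction of the population. Assumed textbook model:
    every machine is equally likely to infect every other machine.
    Returns the infected fraction after steps*dt time units.
    """
    i = i0
    for _ in range(steps):
        i += dt * (beta * i * (1.0 - i) - delta * i)
    return i


# Below the threshold (beta < delta): the infection dies out.
print(f"beta=0.5, delta=1.0 -> {simulate_sis(0.5, 1.0):.4f}")

# Above the threshold (beta > delta): the infection settles at 1 - delta/beta.
print(f"beta=2.0, delta=1.0 -> {simulate_sis(2.0, 1.0):.4f}")
```

In this simplified model a virus persists only if its infection rate exceeds its cure rate. It also shows that a virus above the threshold need not grow exponentially forever: growth saturates at an equilibrium level, which here is 1 - delta/beta.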
Figure 6: Number of incidents involving the most common viruses as a function of time. The units (incidents per 1000 PCs) pertain to our sample population only, but the curves should also be reasonable estimates of the relative worldwide prevalence of each virus. The data points are bracketed by bars indicating the statistical sampling error that one would expect given the number of observed incidents.
Figure 7 shows the relative frequency of incidents from all viruses as a function of time in our sample population. During the last quarter of 1991, about 0.1% of the PCs in our sample population became infected by some external source. This fraction is quite small, but it is rising. Part of this rise is due to the increase in the prevalence of individual viruses (e.g. the Stoned virus in Fig. 6). The other contributing factor is the increase in the number of different varieties of successfully-spreading viruses (e.g. the Joshi virus, which, as shown in Fig. 6a, first appeared in our sample population in late 1990). It should be recognized that the statistic shown here is distinct from those presented in Figs. 5 and 6, and can be thought of as a somewhat complicated combination of the two.
Figure 7: Total number of virus incidents in sample population as a function of time. The units (incidents per 1000 PCs) pertain to our sample population only, but the curve should also be proportional to the worldwide prevalence of all computer viruses as a function of time. The data points are bracketed by bars indicating the statistical sampling error that one would expect given the number of observed incidents.
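The error bars mentioned in the captions of Figs. 6 and 7 can be approximated with a minimal sketch. It assumes Poisson counting statistics, in which the standard error on a count of N incidents is the square root of N. The text does not state how the bars were actually computed, so this is only one plausible reading. The function `incident_rate_with_error` and the example numbers are hypothetical.

```python
import math


def incident_rate_with_error(n_incidents, n_pcs, per=1000):
    """Return (rate, error) in incidents per `per` PCs.

    Assumes Poisson counting statistics: the sampling error on a count
    of n_incidents is sqrt(n_incidents). This is an assumption, not a
    method taken from the text.
    """
    rate = per * n_incidents / n_pcs
    error = per * math.sqrt(n_incidents) / n_pcs
    return rate, error


# Hypothetical quarter (NOT the study's data): 100 incidents among 100,000 PCs.
rate, error = incident_rate_with_error(100, 100_000)
print(f"{rate:.2f} +/- {error:.2f} incidents per 1000 PCs")
```

Under this assumption the relative error shrinks as the count grows, so the curves for rarely seen viruses carry proportionally wider bars than the curve for all viruses combined.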