

next previous up

Next References
Previous Acknowledgments
Up How Prevalent are Computer Viruses?

Appendix

In this appendix we describe the transformation that allows us to extract an estimate of the number of incidents per 1000 PCs from Dataquest's statistic and their reported distribution of organization sizes.

Within a specified time period (e.g. the year 1990), each PC has a small probability of being infected by some external source and thus serving as the initial seed for an incident within its organization. Let us assume that this probability is equal to p for each of the 618,000 PCs included in Dataquest's survey. We also assume that whether or not one particular PC serves as an initial seed for an incident has no effect on whether any other PC does so.

Consider an organization with n PCs. The probability that there will be no incidents in the specified time period is simply $(1-p)^n \approx e^{-np}$ for small p. Thus the probability for there to be one or more incidents (the Dataquest statistic) in such an organization is approximately $1 - e^{-np}$.
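As a quick numerical check of the small-p approximation, the sketch below compares the exact no-incident probability for n independent PCs, (1-p)^n, with exp(-n*p). The function names are illustrative, and the exponential form is an assumption about the expressions elided in the text above; the values n = 168 and p = 0.00042 are borrowed from elsewhere in this appendix.

```python
import math

def prob_no_incident_exact(n, p):
    """Probability that none of n independent PCs seeds an incident."""
    return (1.0 - p) ** n

def prob_no_incident_approx(n, p):
    """Small-p approximation (assumed form): exp(-n * p)."""
    return math.exp(-n * p)

def prob_incident(n, p):
    """Approximate probability of one or more incidents in an n-PC organization."""
    return 1.0 - prob_no_incident_approx(n, p)

if __name__ == "__main__":
    n, p = 168, 0.00042
    print(prob_no_incident_exact(n, p), prob_no_incident_approx(n, p))
    print(prob_incident(n, p))
```

For values of p this small the two expressions agree to within a few parts in 100,000, which is well below the uncertainty in the survey data.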

Now we must account for the distribution of organization sizes. Suppose that the fraction of organizations in the sample population of size n is given by f(n). Then a weighted average of the probabilities yields an overall Dataquest statistic of:

 

$$DQ = \sum_n f(n)\left(1 - e^{-np}\right) \qquad (1)$$

Only a coarse-grained version of f(n) was available to us: Dataquest reported the number of respondents with responsibility for between 100 and 250 PCs, 250 and 500 PCs, 500 and 1000 PCs, etc. For each of these bins, we divided the number of PCs by the number of respondents to obtain a representative size for that bin. For example, in the 100 to 250 PC category, 116 respondents accounted for 19,513 PCs. This gives a representative size of 19,513/116 = 168.2 PCs per organization in this category. Since there were 602 respondents in all, we set f(n=168.2) = 116/602 = 0.193. We followed the same procedure for each of the other bins to obtain a reasonable approximation to the remaining values of f(n).
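The per-bin arithmetic can be sketched as follows. The helper name `representative_bin` is hypothetical; the inputs are the figures quoted above for the 100 to 250 PC category.

```python
def representative_bin(respondents, pcs, total_respondents):
    """Return (representative size, fraction of organizations) for one bin."""
    size = pcs / respondents                     # PCs per organization in this bin
    fraction = respondents / total_respondents   # f(n) for this bin
    return size, fraction

if __name__ == "__main__":
    size, fraction = representative_bin(116, 19513, 602)
    print(round(size, 1), round(fraction, 3))    # 168.2 0.193
```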

All that remains is to invert Eq. 1 so that we can use the reported value of DQ (e.g. 0.26 for 1990) to determine p. No closed-form solution exists for a general f(n), but a numerical solution is trivial because the right-hand side increases monotonically with p: we just need to experiment with various values of p on the right-hand side until we obtain the correct value of DQ on the left-hand side. For our example of DQ=0.26, this yields p=0.00042, i.e. about 0.42 incidents per 1000 PCs.
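One way to automate this trial-and-error search is bisection. The sketch below assumes Eq. 1 has the form DQ = sum over bins of f(n) * (1 - exp(-n*p)). The function names are illustrative, and the bin list is a HYPOTHETICAL placeholder: only its first entry comes from this appendix, since the full Dataquest distribution is not reproduced here. For that reason the example only checks that the solver recovers a known p; it does not attempt to reproduce the 1990 result.

```python
import math

def dataquest_statistic(p, bins):
    """Assumed form of Eq. 1. bins is a list of (representative size n, fraction f(n))."""
    return sum(f * (1.0 - math.exp(-n * p)) for n, f in bins)

def solve_for_p(dq, bins, tol=1e-12):
    """Invert the statistic for p by bisection on the interval [0, 1]."""
    lo, hi = 0.0, 1.0
    if not 0.0 <= dq < dataquest_statistic(hi, bins):
        raise ValueError("DQ is outside the range attainable for p in [0, 1]")
    # The statistic increases monotonically with p, so bisection converges.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dataquest_statistic(mid, bins) < dq:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# HYPOTHETICAL bins for illustration only; NOT Dataquest's reported distribution.
# Only the first entry (168.2, 0.193) is taken from the text.
example_bins = [(168.2, 0.193), (400.0, 0.300), (1500.0, 0.507)]

if __name__ == "__main__":
    p_true = 0.00042
    dq = dataquest_statistic(p_true, example_bins)
    print(dq, solve_for_p(dq, example_bins))
```

Bisection is used here because it needs nothing beyond monotonicity; any standard root finder would serve equally well.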

We can expect the coarse-graining of the distribution to introduce a small amount of error, which will be easy to eliminate once the exact distribution becomes available.

