Skip to main content


next previous up

Next Acknowledgments
Previous 4.2- Simulations on Two-Dimensional Lattices
Up Directed-Graph Epidemiological Models of Computer Viruses

5- Conclusion

Cohen showed that a perfect defense against computer viruses is impossible; we have shown that it may be unnecessary. Defense mechanisms are adequate for preventing widespread propagation of viruses if the rate at which they detect and remove viruses is sufficiently high relative to the rate at which viruses spread between users. The fact that an epidemic can only occur if the infection rate exceeds a finite critical threshold has been known in the biological realm for over half a century; we have shown that this result holds in the computational realm as well. For conditions which seem likely to hold in the computational domain, we have discovered that the epidemic threshold is actually higher than its classical value. Another encouraging finding is that, even if the infection rate is above the epidemic threshold, the number of infected individuals grows much more slowly than predicted by the standard homogeneous interaction model if the interactions are local.

In section 2, we formulated the directed random graph model and studied its behavior using three different techniques: deterministic approximation, stochastic approximation, and simulation. We obtained theoretical results which were essentially identical to those of the classical homogeneous interaction theory by ignoring the details of the connectivity of the graphs. In particular, we found that epidemics can not occur unless the ratio of the total rate at which an infected individual attempts to infect other individuals exceeds the rate at which individuals become cured. If the infection rate exceeds this threshold, an epidemic is still not certain, but it becomes increasingly probable as the infection rate is increased further above the threshold. The average number of infected individuals in equilibrium increases with the infection rate. It can take on any value, from near zero when the infection rate is just above threshold to the size of the population when the cure rate (and hence the threshold) is zero.

Simulations showed that these theoretical results hold when the connectivity of the graphs is sufficiently large, but fail miserably when the connectivity is small. In particular, if the total infection rate is held fixed while the connectivity is decreased, there is a dramatic decrease in the probability of an epidemic and in the average number of infected individuals. In other words, when the connectivity is small, the epidemic threshold is greatly increased over the value predicted by our extension of the classical homogeneous interaction theory. To our knowledge, this is the first observation of an apparent interaction between two well-known threshold phenomena: the epidemic threshold discovered by Kermack and McKendrick in the 1930's [27] and the percolation threshold for random graphs discovered by Erdös and Rényi in 1960 [36].

In section 2.4, we added weak links to the random graph model to simulate the effect of infrequent program sharing with many other individuals. In this case, the epidemic threshold is intermediate between the random-graph and the homogeneous values. The hierarchical model of section 3 was introduced to account for a distribution of rates of program sharing in a more realistic manner than the weak-link model. In addition, it allowed us to capture the phenomenon of user cliques -- groups of users which share programs with one another more frequently than with other users. We observed an increase of the epidemic threshold over its classical value which was very similar in character to that of the random graph model. Individual simulations of epidemics on hierarchical graphs revealed a number of interesting and surprising features. In some cases, the number of infected individuals fluctuated wildly; in others, the number of infections formed a series of plateaus separated by rapid growth spurts.

Taken together, the results of the random graph, weak-link, and hierarchical models demonstrate that, when most of the total infection rate is concentrated into just a few nodes, epidemics have a much harder time establishing themselves than predicted by the classical homogeneous theory. We are currently trying to develop theories which can describe this very interesting effect quantitatively.

By studying a spatial model in section 4, we viewed the effect of locality and user cliques from another perspective and obtained some analytic results based upon a diffusion-like equation for viral spreading in space and time. We found that the number of infected individuals grows polynomially in time, as opposed to the exponential growth rate in random graphs. The growth rate in the hierarchical model also appears to be polynomial under some conditions, but we have not yet obtained analytic results in this case. We believe that actual systems are intermediate between the extremes of random connectivity and local connectivity, so we expect the growth rate of infection to be intermediate between that of the random graph model and that of the hierarchical and spatial models.

The epidemiological approach to the study of computer virus propagation is quite general because it makes no assumptions about how viruses are detected and removed. Any mechanism that diminishes the infection rate or increases the detection rate will help to prevent widespread epidemics. The existence of a sharp threshold for epidemics means that it is worth doing everything possible to bring the infection rate below this threshold, but that further effort is not warranted. Our discovery that the topology of program sharing can have a profound effect upon the ability of viruses to spread may eventually lead to alternative methods for suppressing epidemics which could supplement the above-mentioned efforts to affect the infection and cure rates. While the models that we have studied are still somewhat simplistic, we expect that future work on more complex and realistic models will retain many of the features that we have observed here.

The work that we have presented here immediately suggests a number of areas for further research. We are currently trying to gain a better understanding of how the epidemic threshold depends upon the connectivity of a random graph. Our present understanding of the hierarchical model is solely based upon simulations, and we would like to develop theoretical expressions for the epidemic threshold as a function of the localization parameter and for the functional form of the growth rate of an epidemic. The peculiar phenomena that we have observed in individual simulations on hierarchical graphs, such as wild fluctuations and plateaus in the number of infected individuals as a function of time, deserve further attention. We would also like to experiment with disordered hierarchical graphs, in which the hierarchy of rates is retained but the locality of interactions is strongly disturbed or destroyed. This might allow us to isolate the effects of locality more cleanly.

We have not touched upon several areas that merit future investigation. A number of other models in the epidemiological literature have important analogs in the computational realm. In particular, the SIR (susceptible tex2html_wrap_inline1888 infected tex2html_wrap_inline1890 removed) model, in which individuals become permanently immune once they have been infected and cured, would be appropriate in the limit where users become extremely vigilant after having experienced a viral infection. The actual situation is probably somewhere between the SIS and SIR extremes. After discovering a viral infection, users may initially become much more conscientious about using anti-virus software, but if a long time passes without incident they may relax their vigilance to some degree. Models analogous to this scenario have been studied within a biological context, for there are some cases in which the body gradually loses its immunity to a particular disease [27]. Another interesting notion is the ``kill signal'', a message sent by a node upon discovering that it is infected, warning all nodes to which it is connected that they may also be infected. Our preliminary investigations suggest that this may be one of the most powerful means for thwarting epidemics. Certainly, it makes a good deal of intuitive sense and has long been used in the medical profession for the purpose of stamping out sexually transmitted diseases. The kill signal is just one of many examples of adaptive responses to viral infection. A close study of the immune system might prove to be a rich source of ideas for other adaptive methods for control and suppression of computer virus infections.

Finally, we feel that it is of the utmost importance to collect data on program-sharing habits and viral spread rates and incorporate them into our models. User surveys and centralized reporting of virus incidents would be invaluable. As epidemiologists have discovered, the task of collecting such information and incorporating it into models is fraught with difficulties, but we hope to benefit from the decades of experience that they have accumulated in dealing with such problems.

We are not the first to apply the mathematical techniques of epidemiology outside of the biological realm. Mathematical epidemiology has been used to gain insights into how ideas propagate [37] and, more practically, to develop novel algorithms for maintaining replicated databases [38]. As often occurs when mathematical techniques are adapted to new applications, we have been forced to extend those techniques in somewhat unfamiliar and unanticipated directions. We hope that in so doing we have enriched mathematical epidemiology and all fields to which it can be applied as much as we have benefitted from using it.


next previous up

Next Acknowledgments
Previous 4.2- Simulations on Two-Dimensional Lattices
Up Directed-Graph Epidemiological Models of Computer Viruses


Back To Index