


Next 3- Two New Models
Previous 1- Introduction
Up Measuring and Modeling Computer Virus Prevalence

2- Epidemiological Models

In our modeling of computer virus spread [6, 7], we have borrowed some important concepts and simplifications from the well-established field of mathematical epidemiology [8].

In particular, we ignore the details of infection within an individual (in our case, a computer system, along with all associated storage media), considering it to be in one of a small number of discrete states, such as infected or susceptible. Furthermore, we ignore the details of how disease is transmitted among individuals. We assume that, from time to time, individuals have "adequate contacts" with one another, resulting in transmission of the disease if one individual is infected and the other is susceptible. The details of what constitutes adequate contact vary from one disease (or computer virus) to another, but we simply assume that the total rate of adequate contacts between one individual and the rest of society is $\beta$. We also assume that there is some death rate $\delta$ at which the individual is cured of the infection.

For computer viruses, the rate of adequate contact $\beta$ is influenced by anything that promotes or hinders viral replication, including mechanisms by which the virus infects programs, the rate of software transfer among computers, and precautions taken by users such as the use of a write-protect tab or integrity maintenance systems. The death rate $\delta$ is influenced by intrinsic characteristics of the virus that might disguise or reveal its presence, user awareness and vigilance, and detection (and subsequent removal) of the virus by anti-virus software.

In addition to borrowing ideas from mathematical epidemiology, we have extended it by incorporating topological effects that turn out to be quite important [6, 7]. Under the homogeneous mixing assumption, every individual in the population is assumed to be equally likely to infect or to be infected by every other individual. Our work has shown that this approximation works well when each individual has many randomized contacts with others. However, if the number of contacts that a typical individual has with others is fairly small and/or the pattern of contacts is more or less localized, the approximation fails badly. We suspect that the majority of today's computer populations are characterized by a degree of sparsity and locality that invalidates the homogeneous mixing approximation.

Figure 1 exemplifies a situation in which individuals (represented by nodes in the graph) are connected in both a sparse and a local manner. It can be thought of as representing a likely scenario in which workers within one group exchange software frequently among themselves, somewhat less frequently with other members of their department, and even less frequently with users in other companies, universities, or countries. The resulting topology contains random hierarchically-nested clusters with occasional cross-links. It is said to be sparse because each individual has adequate contacts (represented by edges of the graph) with just a few others. In other words, the average degree of the nodes in the graph is some small constant independent of the size of the graph. It is said to be local because, if nodes B and C are neighbors of (i.e. connected to) A, the probability for B and C to be neighbors is significantly enhanced over what it would be in a random graph.
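The two properties just defined can be measured directly on a graph. The following is a minimal sketch, not taken from the original work: the function names and the toy graph are hypothetical, and the graph is assumed to be undirected and stored as a dictionary of neighbor sets. Sparsity is read off the average degree; locality is estimated as the fraction of neighbor pairs (B, C) of a common node A that are themselves connected, which can be compared with the overall edge density (roughly the chance that two arbitrary nodes are linked in a random graph with the same number of edges).

```python
from itertools import combinations


def average_degree(adj):
    """Mean number of neighbors per node in an undirected graph."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def edge_density(adj):
    """Fraction of all possible node pairs that are connected by an edge."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    return edges / (n * (n - 1) / 2)


def clustering_coefficient(adj):
    """Fraction of pairs (B, C) sharing a neighbor A that are themselves linked."""
    linked = total = 0
    for nbrs in adj.values():
        for b, c in combinations(sorted(nbrs), 2):
            total += 1
            if c in adj[b]:
                linked += 1
    return linked / total if total else 0.0


# Hypothetical toy graph: two tight triangles joined by a single cross-link.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

print(average_degree(adj))          # small average degree: sparse
print(clustering_coefficient(adj))  # compare with edge_density(adj): local
print(edge_density(adj))
```

On a graph this small the contrast between the clustering fraction and the edge density is modest; on large sparse graphs the edge density shrinks with the number of nodes while the clustering fraction of a localized topology does not.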

  


Figure 1: Snapshot of viral-spread simulation running on sparsely-connected, hierarchically-clustered topology. Each individual, represented by a node, has adequate contact with an average of three others. White and black nodes represent uninfected and infected individuals, respectively. The pattern of exchange is fairly localized, and therefore so is the pattern of infection.
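A simulation of this general kind can be sketched in a few lines. This is an illustrative approximation rather than the simulator used to produce the figure. Several choices here are assumptions: time advances in small fixed steps `dt`, each infected node makes an adequate contact with probability `beta * dt` per step, the contacted neighbor is chosen uniformly at random, and each infected node is cured with probability `delta * dt` per step. The names `simulate_sis`, `beta` (rate of adequate contact) and `delta` (death rate) are likewise chosen for illustration.

```python
import random


def simulate_sis(adj, beta, delta, infected, steps, dt=0.01, seed=0):
    """Discrete-time sketch of infected/susceptible dynamics on a graph.

    adj      -- dict mapping each node to the set of its neighbors
    beta     -- total adequate-contact rate per individual (assumed name)
    delta    -- death (cure) rate per infected individual (assumed name)
    infected -- iterable of initially infected nodes
    Returns a list with the number of infected nodes after each step.
    """
    rng = random.Random(seed)
    infected = set(infected)
    history = [len(infected)]
    for _ in range(steps):
        newly_infected, cured = set(), set()
        for node in sorted(infected):
            # Adequate contact with one uniformly chosen neighbor (assumption).
            if adj[node] and rng.random() < beta * dt:
                target = rng.choice(sorted(adj[node]))
                if target not in infected:
                    newly_infected.add(target)
            # Cure of the infected node.
            if rng.random() < delta * dt:
                cured.add(node)
        infected = (infected | newly_infected) - cured
        history.append(len(infected))
    return history


# Hypothetical sparse, local topology: a ring in which each node has 2 neighbors.
n = 20
ring = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
history = simulate_sis(ring, beta=2.0, delta=0.5, infected=[0], steps=1000)
print(history[-1], "of", n, "nodes infected at the end of the run")
```

Replacing `ring` with a different adjacency dictionary (for example a fully-connected graph) allows the same routine to be run on other topologies.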

By analyzing and simulating viral spread on a variety of topological structures, we have reached the following conclusions:

  1. In homogeneous systems (fully-connected graphs), an epidemic threshold occurs when the rate of adequate contact equals the death rate, $\beta = \delta$. When $\beta > \delta$ ($\rho \equiv \delta/\beta < 1$), the system is above the "epidemic threshold", and an epidemic occurs with probability $1 - \rho$. If it does occur, the number of infections increases exponentially ($\propto e^{(\beta - \delta)t}$), eventually saturating at an equilibrium of $N(1 - \rho)$, where $N$ is the number of nodes. Below the epidemic threshold ($\beta < \delta$; $\rho > 1$), small outbreaks may occur whenever the disease is introduced into the population, but they cannot be sustained for long.
  2. In sparse systems, the epidemic threshold still exists, but the critical ratio $\rho_{\mathrm{crit}}$ is diminished to some value less than 1. As the average degree of nodes in the graph diminishes, so does $\rho_{\mathrm{crit}}$, and the probability of an epidemic diminishes (dropping to zero if $\rho_{\mathrm{crit}}$ slips below $\rho$). Even when an epidemic does occur, the growth rate is slowed, and the equilibrium level of infection depressed below what it would be in the corresponding homogeneous system.
  3. In localized systems, the epidemic threshold and the equilibrium level of infection may or may not be affected. What is certain is that the growth in the number of infections with time is slowed qualitatively, becoming strongly sub-exponential.
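
The threshold and equilibrium behavior in the homogeneous case can be checked numerically. The sketch below assumes the standard deterministic homogeneous-mixing equation dI/dt = beta * I * (1 - I/N) - delta * I, which is not written out in this section. Here `beta` stands for the rate of adequate contact, `delta` for the death rate, and `rho = delta / beta`. The function name and parameter values are hypothetical, and plain forward-Euler integration is used for simplicity.

```python
def integrate_sis(beta, delta, n, i0=1.0, dt=0.01, t_end=200.0):
    """Forward-Euler integration of dI/dt = beta*I*(1 - I/n) - delta*I.

    Returns the number of infected individuals at time t_end.
    """
    i = i0
    for _ in range(int(t_end / dt)):
        i += dt * (beta * i * (1.0 - i / n) - delta * i)
    return i


n = 1000

# Above threshold: beta > delta, so rho = 0.2 < 1.
above = integrate_sis(beta=1.0, delta=0.2, n=n)
predicted = n * (1.0 - 0.2 / 1.0)  # N * (1 - rho)

# Below threshold: beta < delta, so rho = 5 > 1.
below = integrate_sis(beta=0.2, delta=1.0, n=n)

print(above, predicted)  # the two values should agree closely
print(below)             # the infection dies out
```

A deterministic equation of this kind reproduces the threshold, the early exponential growth and the equilibrium level, but not the probability that an epidemic occurs at all, which is an inherently stochastic quantity.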

