3- Virus Verifiers

A virus verifier is a program that, given a file or disk that is probably infected with a given virus, determines with a high degree of certainty whether the virus is a known strain or a new variant. This is, of course, important to know: if the virus is different from any known strain, it will have to be analyzed for new effects before we can be confident that we know just what to do to clean up after it. On the other hand, if the virus is identical to a known strain, we already know what to do. It is particularly important to perform verification in a program that attempts to automatically remove the virus infection from an object, restoring it to its original uninfected form.

Abstractly, a verifier is a program that, given another program as input, determines whether or not the given program is part of the set of possible ``offspring'' of a particular virus. For many classes of viruses, including all the viruses actually widespread at the moment, this is easy to do. Almost all known viruses consist almost entirely of code that does not change from infection to infection, except perhaps for a simple XOR-type garbling, and data areas that are either constant, or change in simple ways (or that can be ignored entirely for the purposes of verification). Given a suspect file F and a known virus V, it is therefore always relatively simple to answer the question ``is F a file that could have been produced by infection with virus V?''.

It is an open question of some theoretical interest whether or not some future virus might make this harder to do! Reliably determining whether a file is infected with any virus at all is of course known to be impossible, but we have no similar result about determining the presence of a specific virus.
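The check described above can be illustrated with a minimal sketch. It assumes a hypothetical strain whose body is constant apart from a single-byte XOR garbling and a few variable data bytes, and it assumes the suspect region has already been located and extracted from the file. The function name `matches_known_strain` and its parameters are illustrative only; they are not taken from any actual verifier.

```python
from typing import Optional, Sequence, Tuple


def matches_known_strain(
    suspect: bytes,
    known_body: bytes,
    ignore: Sequence[Tuple[int, int]] = (),
) -> Optional[int]:
    """Return the XOR key under which `suspect` equals `known_body`, else None.

    `ignore` lists (start, end) byte ranges, end exclusive, that hold
    variable data and are skipped during the comparison.
    """
    if len(suspect) != len(known_body):
        return None

    skipped = set()
    for start, end in ignore:
        skipped.update(range(start, end))
    checked = [i for i in range(len(known_body)) if i not in skipped]
    if not checked:
        return None  # nothing left to compare, so nothing can be verified

    # The first compared byte fixes the key; every other byte must agree with it.
    key = suspect[checked[0]] ^ known_body[checked[0]]
    if all(suspect[i] ^ key == known_body[i] for i in checked):
        return key
    return None
```

A key of 0 means the region matches the known strain without any garbling. Any single changed byte outside the ignored ranges makes the function return `None`, which is the behaviour a verifier needs: every variation is reported as a variation.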
There are various concrete decisions and tradeoffs involved in writing a virus verifier; this section will list a few of them, and the next sections will describe the verifier/remover currently being developed and used at the High Integrity Computing Lab at IBM's Watson Research Center.

A verifier may be an independent tool, or it may be integrated into a virus detector. An integrated detector/verifier can be quicker and more convenient, since there's no need for a user to find and run a verifier once the detector goes off. On the other hand, since most copies of any virus detector will never in fact detect a virus (most of the world's computers are not infected, after all), integrating a verifier along with the detector is in some sense inefficient, in that it adds significant code to the detector that may never be used. Given how much more expensive human time is than CPU time and disk space these days, integrated tools are likely to be more cost-effective in the long run.

On the other hand, detection and verification will always be two different activities, because it is very desirable for a detector to detect small variants of known viruses as viruses, whereas a verifier must be able to identify any variation as a variation. Detection algorithms are typically run very often, and must be fast. Verification algorithms, on the other hand, are run rarely (only when a virus is detected), and speed is typically not a major issue.

To determine whether or not a given object is infected with a known strain of a virus, a verifier must know what the known strain looks like. This may be done either with an actual copy of the code of the known strain of the virus, or by using a CRC or similar modification-detection algorithm. It's not generally desirable to include the entire code of a virus with widely-distributed tools, for obvious reasons!
On the other hand, even a good difficult-to-invert digital signature algorithm is not as reliable as a byte-for-byte comparison, and it is vulnerable to a virus author intentionally creating a variant that looks to the verifier like a known strain. (This can be made arbitrarily hard through the use of cryptographic checksums and related technologies, at some increase in runtime and complexity.)

Lastly, a verifier may use either special-purpose code, with one or more routines being written in some compiled language for each new strain discovered, or it may be written as an interpreter for a high-level virus-description language. A high-level language is generally simpler to program in reliably; on the other hand, this is only true because it is less expressive, which implies that there will be cases (viruses that are exotically self-garbling, for instance) in which it will be necessary to drop into the lower-level programming language again.
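To make the last two tradeoffs concrete, here is a small sketch of a data-driven verifier. It is not the description language of any real tool: the record layout (`offset`, `length`, `sha256`) and the function names `describe` and `verify` are assumptions made up for illustration. Each record names one constant region of the virus body and stores only a cryptographic digest of it, so the description can be distributed without containing the virus code itself. SHA-256 from Python's standard `hashlib` stands in here for whatever cryptographic checksum a real verifier would use.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple

# Hypothetical description format: one record per constant region.
Description = List[Dict[str, object]]


def describe(sample: bytes, regions: Sequence[Tuple[int, int]]) -> Description:
    """Build a description from a known sample; only digests are kept.

    `regions` lists (offset, length) pairs covering the constant parts.
    """
    return [
        {
            "offset": offset,
            "length": length,
            "sha256": hashlib.sha256(sample[offset:offset + length]).hexdigest(),
        }
        for offset, length in regions
    ]


def verify(candidate: bytes, description: Description) -> bool:
    """Return True only if every described region of `candidate` matches."""
    for region in description:
        start = int(region["offset"])
        end = start + int(region["length"])
        if end > len(candidate):
            return False  # candidate is too short to be this strain
        digest = hashlib.sha256(candidate[start:end]).hexdigest()
        if digest != region["sha256"]:
            return False  # some constant byte differs: treat as a variant
    return True
```

Bytes that fall outside every described region are ignored, which corresponds to the data areas that can be skipped for verification. A strain that garbles itself in some exotic way could not be expressed in a flat table like this one, which is the point at which a verifier would have to fall back on special-purpose code.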