Volume 40, Number 2, 2001
Deep computing for the life sciences
Fastfinger: A study into the use of compressed residue pair separation matrices for protein sequence comparison - References
by B. Robson

Cited references and notes
-
M. S. Waterman, Introduction to Computational Biology, Chapman & Hall, London (1988).
-
S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, Basic Local Alignment Search Tool, Journal of Molecular Biology 215, No. 3, 403–410 (1990).
-
W. R. Pearson and D. J. Lipman, Improved Tools for Biological Sequence Comparison, Proceedings of the National Academy of Sciences (USA) 85, No. 8, 2444–2448 (1988).
-
I. Rigoutsos, A. Floratos, C. Ouzounis, V. Gao, and L. Parida, Dictionary Building via Unsupervised Hierarchical Motif Discovery in the Sequence Space of Natural Proteins, Proteins: Structure, Function, and Genetics 37, No. 2, 264–267 (1999).
-
A. Califano and I. Rigoutsos, FLASH: A Fast Look-up Algorithm for String Homology, Proceedings, First International Conference on Intelligent Systems for Molecular Biology, Bethesda, MD (July 7–9, 1993).
-
G. Stolovitsky and A. Califano, Discrete Applied Mathematics Series, P. Penver, Editor, submitted.
-
I. Rigoutsos and A. Floratos, Motif Discovery Without Alignment or Enumeration, Proceedings, Second Annual ACM International Conference on Computational Molecular Biology, New York (March 22–25, 1998).
-
Several referees and colleagues objected to my particular uses of the words "homology" and "homologous" in the original manuscript, as differing from the usage adopted in molecular biology for the past ten years or so. I have adjusted the present text accordingly, but I have never been comfortable with the more recent usage, which relates to the idea of common biological origin and departs from the original definition. (For discussion, see B. Robson and J. Garnier, Introduction to Proteins and Protein Engineering, First Edition, Elsevier Press, Amsterdam (1984), pp. 235–238, and also, in relation to the use of the term "conservative substitution," the discussion by French and Robson, reference 12 in this list.) First, the modern definition used within molecular biology does not correspond to the mathematical definition. The latter relates more to the idea of correspondence, and makes no stipulation of a (hypothetical) common origin back in time. It seems quite reasonable to speak of sequences of symbols as being homologous, irrespective of the hypothesis that there is a common origin of the genes for the proteins that they represent. The use of "homology" as revealed through finger matrices makes particular sense, because mathematical uses of the word, having to do with correspondence between vertices of an object, can be shown to be much closer to the matter of the correspondences between matrices of pair-wise residue symbol separation distances. Second, at least one referee held to the definition that homology is a binary state: the hypothesis is either true or false, so homology, if one uses percentage notation, would be either 0 or 100 percent. Another expressed the view that since homology expresses common ancestry, 100 percent homology is simply not meaningful.
However, even leaving aside the argument that percentage homology is a reasonable shorthand for estimates of the probability (on a percentage basis) of a hypothesis about homology being true, it is arguable that the hypothesis of common ancestry is not usefully treated as a binary state. Rather, there are multiple subhypotheses about states representing gradations, relating to the tree structure of evolution (gradations over which we can also distribute our degrees of belief). The foreleg of a horse is homologous to the wing of a bird in the sense that they are believed to be of common origin, but it does not seem unreasonable to state that it is more homologous to the foreleg of a cow, both in terms of contemporary points of correspondence and in terms of the hypothesis of a more recent common origin.
-
T. Nagell, Introduction to Number Theory, John Wiley & Sons, Inc., New York (1951).
-
E. Nagel and J. R. Newman, Goedel's Proof, New York University Press, New York (1958).
-
R. Perline, Zipf's Law, the Central Limit Theorem, and the Random Division of the Unit Interval, Physical Review E 54, No. 1, 220–223 (1996).
-
S. French and B. Robson, What Is a Conservative Substitution? Journal of Molecular Evolution 19, 171–175 (1983).
-
S. Henikoff and J. Henikoff, Amino Acid Substitution Matrices from Protein Blocks, Proceedings of the National Academy of Sciences (USA) 89, No. 22, 10915–10919 (1992).
-
J. Garnier and B. Robson, The GOR Method for Predicting Secondary Structures in Proteins, Prediction of Protein Conformation and the Principles of Protein Conformation, G. D. Fasman, Editor, Plenum Publishers, New York (1989), Chapter 10, pp. 417–465.
-
Robson, unpublished manuscript.
-
S. D. Silvey, Statistical Inference, Penguin Books, London (1970).
-
I(X: ~X; A, B) may be read as the information provided by A and B as to whether X will or will not occur, or the information that the joint event A and B carries about X as opposed to any other possibility than X.
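This note gives the reading of the measure but not its formula. One way to write it explicitly, assuming the log-odds form of the information measure used in the GOR literature cited above (an assumption here, not something stated in this list), is as the change in the log odds of X against not-X once A and B are known:

```latex
% Assumed log-odds form; P(.) denotes probability, \sim X denotes "not X".
I(X : {\sim}X;\, A, B)
  = \log \frac{P(X \mid A, B)}{P({\sim}X \mid A, B)}
  - \log \frac{P(X)}{P({\sim}X)}
```

Under this assumed form, a positive value means that the joint event A and B favors X over every alternative to X, a negative value means that it disfavors X, and zero means that A and B carry no information about X.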