IBM Systems Journal  
Volume 40, Number 2, 2001
Deep computing for the life sciences

DiscoveryLink: A system for integrated access to life sciences data sources - References

by L. M. Haas, P. M. Schwarz, P. Kodali, E. Kotlar, J. E. Rice, and W. C. Swope

Cited references and notes

  1. K. Howard, “The Bioinformatics Gold Rush,” Scientific American, July 2000.
  2. See http://www.ncbi.nlm.nih.gov/BLAST/fasta.html.
  3. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic Local Alignment Search Tool,” Journal of Molecular Biology 215, No. 3, 403–410 (1990).
  4. See http://www.nlm.nih.gov/medlineplus/.
  5. See http://www.idbs.co.uk/.
  6. M.-C. Shan, R. Ahmed, J. Davis, W. Du, and W. Kent, “Pegasus: A Heterogeneous Information Management System,” W. Kim, Editor, Modern Database Systems, Chapter 32, ACM Press (Addison-Wesley Publishing Co.), Reading, MA (1994).
  7. L. Liu and C. Pu, “The Distributed Interoperable Object Model and Its Application to Large-Scale Interoperable Database Systems,” Proceedings of the Fourth International Conference on Information and Knowledge Management, ACM, New York (1995).
  8. Y. Papakonstantinou, H. Garcia-Molina, and J. Widom, “Object Exchange Across Heterogeneous Information Sources,” Proceedings of the IEEE Conference on Data Engineering, Taipei, Taiwan, IEEE, New York (1995), pp. 251–260.
  9. M. Carey et al., “Towards Heterogeneous Multimedia Information Systems,” Proceedings of the Fifth International Workshop on Research Issues in Data Engineering, Taipei, Taiwan, March 1995, IEEE, New York (1995).
  10. L. M. Haas, P. Kodali, J. E. Rice, P. M. Schwarz, and W. C. Swope, “Integrating Life Sciences Data—with a Little Garlic,” Proceedings of the IEEE International Symposium on Bio-Informatics and Biomedical Engineering, IEEE, New York (2000).
  11. T. Studt, “Next Generation Database Management Tools,” R&D Magazine, Drug Discovery & Development, January 2000, http://www.dddmag.com/feats/0001net.htm.
  12. See chapter 2 of Reference 39.
  13. A. Bairoch and R. Apweiler, “The SWISS-PROT Protein Sequence Database and Its Supplement TrEMBL in 2000,” Nucleic Acids Research 28, No. 1, 45–48 (2000).
  14. A. Dalby, J. Nourse, W. D. Hounshell, A. Gushurst, D. Grier, B. Leland, and J. Laufer, “Description of Several Chemical Structure File Formats Used by Computer Programs Developed at Molecular Design Limited,” Journal of Chemical Information and Computer Sciences 32, No. 3, 244–255 (1992).
  15. D. Weininger, “SMILES,” Journal of Chemical Information and Computer Sciences 28, No. 1, 31–36 (1988).
  16. See http://www.genome.ad.jp/kegg/.
  17. D. Chamberlin, A Complete Guide to DB2 Universal Database, Morgan Kaufmann Publishers, San Francisco, CA (1998).
  18. A. Tomasic, L. Raschid, and P. Valduriez, “Scaling Heterogeneous Databases and the Design of DISCO,” Proceedings of the 16th International Conference on Distributed Computing Systems, Hong Kong, 1996, IEEE, New York (1996).
  19. S. Adali, K. Candan, Y. Papakonstantinou, and V. S. Subrahmanian, “Query Caching and Optimization in Distributed Mediator Systems,” Proceedings of the ACM SIGMOD International Conference on Management of Data, Montreal, Canada, June 1996, ACM, New York (1996), pp. 137–148.
  20. K. Kulkarni, “Object-Oriented Extensions in SQL3: A Status Report,” Proceedings of the ACM SIGMOD Conference on Management of Data, Minneapolis, May 1994, ACM, New York (1994).
  21. M. Tork Roth and P. Schwarz, “Don't Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources,” Proceedings of the Conference on Very Large Data Bases (VLDB), Athens, Greece, August 1997, ACM, New York (1997).
  22. P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price, “Access Path Selection in a Relational Database Management System,” Proceedings of the ACM SIGMOD Conference on Management of Data, Boston, MA, May 1979, ACM, New York (1979), pp. 23–34.
  23. J. Hellerstein and M. Stonebraker, “Predicate Migration: Optimizing Queries with Expensive Predicates,” Proceedings of the ACM SIGMOD Conference on Management of Data, Washington, DC, May 1993, ACM, New York (1993), pp. 267–276.
  24. S. Chaudhuri and L. Gravano, “Optimizing Queries over Multimedia Repositories,” Proceedings of the ACM SIGMOD Conference on Management of Data, Montreal, Canada, June 1996, ACM, New York (1996), pp. 91–102.
  25. Actually, the optimizer normally ignores pairs when there is no predicate connecting them (e.g., Compounds and Proteins in this query), because typically these “cross-products” do not make good plans.
  26. L. Shapiro, “Join Processing in Database Systems with Large Main Memories,” ACM Transactions on Database Systems 11, No. 3, 239–264 (1986).
  27. The host variable :KETANSERIN_MOL is presumed to contain an appropriate representation of the ketanserin structure, perhaps as generated by a sketching tool.
  28. In this paper, we represent query fragments in SQL; the actual wrapper interface will use an equivalent data structure that does not require parsing by the wrapper.
  29. M. Tork Roth, F. Ozcan, and L. Haas, “Cost Models DO Matter: Providing Cost Information for Diverse Data Sources in a Federated System,” Proceedings of the Conference on Very Large Data Bases (VLDB), Edinburgh, Scotland, September 1999, ACM, New York (1999).
  30. We did not analyze why this occurred, because it was not unexpected: our experience with DataJoiner over the years is that it usually introduces little overhead, and occasionally can run a transaction faster than the native data source. There are several possible reasons why this script might run faster through DiscoveryLink, among them, DiscoveryLink's superior optimizer and the fact that it ran on a separate machine, hence could apply more hardware to the problem. In this instance, the result is probably due to the DiscoveryLink engine exploiting the resources of its separate machine, because the four queries in script four are fairly simple, and with one exception leave little room for optimization.
  31. F. Rezende and K. Hergula, “The Heterogeneity Problem and Middleware Technology: Experiences with and Performance of Database Gateways,” Proceedings of the Conference on Very Large Data Bases (VLDB), New York, August 1998, ACM, New York (1998).
  32. L. Haas, D. Kossmann, E. Wimmers, and J. Yang, “Optimizing Queries Across Diverse Data Sources,” Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB), Athens, Greece, August 1997, Morgan Kaufmann Publishers, San Francisco, CA (1997).
  33. See http://www.mdli.com.
  34. See http://www.daylight.com.
  35. Y. Martin, “Comparison of Programs That Calculate Octanol-Water logP Using Starlist,” Proceedings of the 12th Annual Daylight User Group Meeting, Daylight Chemical Information Systems (1997).
  36. G. Klopman and H. S. Rosenkranz, “Toxicity Estimation by Chemical Substructure Analysis: The TOX II Program,” Toxicology Letters 79, 145–155 (1995).
  37. R. C. Glen and A. W. R. Payne, “A Genetic Algorithm for the Automated Generation of Molecules Within Constraints,” Journal of Computer-Aided Molecular Design 9, No. 2, 181–202 (1995).
  38. I. D. Kuntz, “Structure-Based Strategies for Drug Design and Discovery,” Science 257, 1078–1082 (1992).
  39. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, A. D. Baxevanis and B. F. F. Ouellette, Editors, Wiley-Liss, New York (1998).
  40. See http://www.msi.com.
  41. See http://www.oxfordmolecular.com.
  42. See http://www.genomica.com.
  43. S. Davidson, C. Overton, V. Tannen, and L. Wong, “BioKleisli: A Digital Library for Biomedical Researchers,” International Journal of Digital Libraries 1, No. 1, 36–53 (1997).
  44. I-M. A. Chen, A. S. Kosky, V. M. Markowitz, and E. Szeto, “Constructing and Maintaining Scientific Database Views in the Framework of the Object-Protocol Model,” Proceedings of the Ninth International Conference on Scientific and Statistical Database Management, IEEE, New York (1997), pp. 237–248.
  45. N. W. Paton, R. Stevens, P. Baker, C. A. Goble, S. Bechhofer, and A. Brass, “Query Processing in the TAMBIS Bioinformatics Source Integration System,” Proceedings of the 11th International Conference on Scientific and Statistical Database Management, IEEE, New York (1999), pp. 138–147.
  46. P. Carter, T. Coupaye, D. Kreil, and T. Etzold, “SRS: Analyzing and Using Data from Heterogeneous Textual Databanks,” S. Letovsky, Editor, Bioinformatics: Databases and Systems, Chapter 18, Kluwer Academic Press (1998).
  47. T. Etzold and P. Argos, “SRS: An Indexing and Retrieval Tool for Flat File Data Libraries,” Computer Applications in the Biosciences 9, 49–57 (1993).
  48. See http://www.netgenics.com/.
  49. See http://www.tripos.com.
  50. See http://www.gcg.com/.
  51. See http://www.incyte.com/.
  52. See http://www.nimble.com/.
  53. See http://www.w3.org/TR/NOTE-xml-ql.
  54. See http://www.software.ibm.com/data/datajoiner/.
  55. See http://www.oracle.com/.
  56. See http://www.sybase.com/.
  57. See http://www.microsoft.com.
  58. L. M. Haas, R. J. Miller, B. Niswonger, M. Tork Roth, P. M. Schwarz, and E. L. Wimmers, “Transforming Heterogeneous Data with Database Middleware: Beyond Integration,” IEEE Data Engineering Bulletin 22, No. 1, 31–36 (1999).
  59. R. J. Miller, L. M. Haas, and M. A. Hernandez, “Schema Mapping as Query Discovery,” Proceedings of the Conference on Very Large Data Bases (VLDB), Cairo, Egypt, September 2000, ACM, New York (2000).