IBM Technical Journals

Special report: Celebrating 50 years of the IBM Journals
DiscoveryLink: A system for integrated access to life sciences data sources

by L. M. Haas, P. M. Schwarz, P. Kodali, E. Kotlar, J. E. Rice, and W. C. Swope

Vast amounts of life sciences data reside today in specialized data sources, with specialized query processing capabilities. Data from one source often must be combined with data from other sources to give users the information they desire. There are database middleware systems that extract data from multiple sources in response to a single query. IBM's DiscoveryLink is one such system, targeted to applications from the life sciences industry. DiscoveryLink provides users with a virtual database to which they can pose arbitrarily complex queries, even though the actual data needed to answer the query may originate from several different sources, and none of those sources, by itself, is capable of answering the query. We describe the DiscoveryLink offering, focusing on two key elements, the wrapper architecture and the query optimizer, and illustrate how it can be used to integrate the access to life sciences data from heterogeneous data sources.

Originally published: IBM Systems Journal, Volume 40, Issue 2, pp. 489-511 (2001).


DiscoveryLink™ is a middleware software product from IBM. Middleware is software that is invoked by application software to simplify access to data in one or more data sources. In the life sciences there are many different sources of data, and the need to integrate and use the various sources is great. DiscoveryLink can be used to build a federated database system, simplifying access to a range of local and remote data sources. It enables scientific programmers to more easily build the next generation of complex applications. This software has been applied in life sciences settings where data from biological and chemical experiments were combined and queried to support experiments with new approaches to drug design.
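The federated idea can be illustrated with a minimal Python sketch. It is not DiscoveryLink's interface: the `ListWrapper` class, the `federated_join` function, and the sample compound and assay records are all hypothetical, chosen only to show a mediator answering one query that neither source could answer alone. Each wrapper hides how its source stores data and applies the filter it is handed; the mediator joins what the wrappers return.

```python
from typing import Callable, Dict, Iterator, List, Optional

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


class ListWrapper:
    """Hypothetical wrapper: exposes one data source as a stream of rows.

    A real wrapper would translate requests into the source's own query
    language; here the "source" is just an in-memory list.
    """

    def __init__(self, name: str, rows: List[Row]) -> None:
        self.name = name
        self._rows = rows

    def scan(self, predicate: Optional[Predicate] = None) -> Iterator[Row]:
        # Apply the filter at the source so fewer rows reach the mediator.
        for row in self._rows:
            if predicate is None or predicate(row):
                yield dict(row)


def federated_join(
    left: ListWrapper,
    right: ListWrapper,
    key: str,
    left_pred: Optional[Predicate] = None,
    right_pred: Optional[Predicate] = None,
) -> List[Row]:
    """Hash-join rows from two wrapped sources on a shared key."""
    index: Dict[object, List[Row]] = {}
    for r in right.scan(right_pred):
        index.setdefault(r[key], []).append(r)

    joined: List[Row] = []
    for l in left.scan(left_pred):
        for r in index.get(l[key], []):
            joined.append({**l, **r})
    return joined


# Illustrative data only: one "chemical" source and one "biological" source.
compounds = ListWrapper("compounds", [
    {"compound_id": "C1", "mol_weight": 320.4},
    {"compound_id": "C2", "mol_weight": 612.9},
    {"compound_id": "C3", "mol_weight": 410.0},
])
assays = ListWrapper("assays", [
    {"compound_id": "C1", "ic50_nm": 45.0},
    {"compound_id": "C2", "ic50_nm": 12.0},
    {"compound_id": "C3", "ic50_nm": 900.0},
])

# One query spanning both sources: light compounds with potent assay results.
results = federated_join(
    compounds,
    assays,
    key="compound_id",
    left_pred=lambda row: row["mol_weight"] < 500,
    right_pred=lambda row: row["ic50_nm"] < 100,
)
print(results)
```

In this sketch the filters are always applied at the sources and the join always runs in the mediator. A query optimizer of the kind the paper describes would instead choose among such placements based on what each source is able to do.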

This highly cited paper shows how new software tools can enable businesses to integrate and make effective use of vast, rapidly growing volumes of data.


