HealthMiner
HealthMiner
mines clinical data to find new relationships that could have direct
applications for conducting biomedical research, making prognoses
and developing diagnostic tests.
HealthMiner, an application that analyzes patient
data, was developed as part of a joint project between IBM Research
and the IBM Healthcare & Life Sciences group. This innovative
solution fills a void in digital patient record analysis by discovering
rules and relationships that can lead to new knowledge, new hypotheses
and more focused medical research. As a middleware package, it
allows IBM to offer a way for our Independent Software Vendor
(ISV) partners to automate their rule-authoring process.
Although HealthMiner itself is a narrowly focused
application, its underlying methods are general enough
to make the middleware solution applicable to a wide range of
problems, including risk management, economic and market analysis,
quality control, epidemiology, security risk identification, and
more. The strong point of this hybrid solution is that it draws
upon the complementary strengths of three quite different and
very powerful analytic methods: Thoth, CliniMiner® and Predictive
Analysis.
Thoth, developed by the Bioinformatics
and Pattern Discovery group, continues the longstanding effort
to identify recurring patterns of nearby elements within sequential
inputs of arbitrary length, and to use them to build applications
such as multiple sequence alignment, gene finding, protein annotation
and others. Thoth discovers patterns, and then uses their relationships
to deduce increasingly complicated association rules. This, in
turn, permits the construction of extensive diagnostic canons.
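The step from co-occurring patterns to association rules can be illustrated with a minimal sketch. This is not Thoth's algorithm, which is not described here. It is a generic support/confidence calculation over item pairs, and the function name `pair_rules`, the thresholds, and the toy records are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(records, min_support=0.3, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    Generic association-rule sketch; the thresholds are illustrative only.
    """
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for record in records:
        items = sorted(set(record))
        item_counts.update(items)
        # Count every unordered pair of items seen in the same record.
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Try the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical toy records, not real patient data.
records = [
    {"fever", "cough", "flu"},
    {"fever", "cough", "flu"},
    {"fever", "rash"},
    {"cough", "flu"},
]
rules = pair_rules(records)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

On this toy input only the pair cough/flu clears both thresholds, so the sketch reports the rule in both directions. Longer rules would be built the same way from larger item sets.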

CliniMiner®
(a.k.a. "FANO") is based on the Bayesian expectation of information
and its intrinsic moments, and enables the rapid computation of
degrees of mutual dependency between variables by using relationships
with the Riemann Zeta function, and other number-theoretic methods.
Significant relationships between variables (including preferred,
rare, negative or non-observed associations) can be identified
by selecting those conditions that satisfy Zeta-function-based
criteria.
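One way to picture a Zeta-function-based association measure is the sketch below. It assumes the measure is the difference between two partial (incomplete) zeta sums at s = 1, one taken up to the observed co-occurrence count and one up to the count expected under independence. That form, the function names, and the truncation of non-integer expected counts are assumptions for illustration, not CliniMiner's actual implementation.

```python
import math


def partial_zeta(n, s=1.0):
    """Partial zeta sum 1/1**s + 1/2**s + ... + 1/n**s (0.0 when n < 1).

    Non-integer n is truncated, which is a simplification of this sketch.
    """
    return sum(1.0 / k**s for k in range(1, int(n) + 1))


def expected_count(count_a, count_b, total):
    """Co-occurrence count expected if A and B were independent."""
    return count_a * count_b / total


def association_information(observed, expected, s=1.0):
    """Positive for preferred associations, negative for avoided ones."""
    return partial_zeta(observed, s) - partial_zeta(expected, s)


# Hypothetical counts: 1000 records, A in 100, B in 200, both in 60.
expected = expected_count(100, 200, 1000)  # 20.0 under independence
info = association_information(60, expected)
print(f"information = {info:.3f}, log ratio = {math.log(60 / expected):.3f}")
```

For large counts the s = 1 sums behave like natural logarithms, so the value approaches log(observed / expected). For small counts the sums stay finite and damp the estimate, which is the usual motivation given for this family of measures.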
Predictive Analysis was developed by the Data
Analytics Research group, drawing on
its extensive expertise in machine learning. This analytic method
learns decision rules using a procedure known as Lightweight Rule
Induction (LRI), which works by searching through many possibilities
to find the best ones in terms of predictive value, sensitivity
and specificity. The method generates classification rules in
disjunctive normal form, i.e. rules that are a sequence of OR-ed
expressions with each expression being a conjunction of one or
more variables.
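The shape of such a rule, and the three measures used to score it, can be sketched as follows. This only evaluates a hand-written rule in disjunctive normal form. It does not implement the LRI search, and the variable names, thresholds, and records are hypothetical.

```python
def matches(rule, record):
    """A DNF rule fires if ANY conjunction has ALL of its conditions true."""
    return any(all(cond(record) for cond in conjunction) for conjunction in rule)


def evaluate(rule, records, labels):
    """Score a rule by sensitivity, specificity and positive predictive value."""
    tp = fp = tn = fn = 0
    for record, label in zip(records, labels):
        predicted = matches(rule, record)
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "predictive_value": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical rule: (lab_a > 126 AND lab_b > 30) OR (lab_c > 6.5)
rule = [
    [lambda r: r["lab_a"] > 126, lambda r: r["lab_b"] > 30],
    [lambda r: r["lab_c"] > 6.5],
]

# Made-up records with a true/false class label for each.
records = [
    {"lab_a": 140, "lab_b": 32, "lab_c": 6.0},
    {"lab_a": 100, "lab_b": 25, "lab_c": 7.0},
    {"lab_a": 130, "lab_b": 24, "lab_c": 6.0},
    {"lab_a": 90, "lab_b": 22, "lab_c": 5.2},
    {"lab_a": 150, "lab_b": 35, "lab_c": 5.9},
    {"lab_a": 95, "lab_b": 31, "lab_c": 5.5},
]
labels = [True, True, True, False, False, False]

scores = evaluate(rule, records, labels)
print(scores)
```

A rule-induction procedure would generate many candidate rules of this form and keep those that score best on these measures.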
In developing our prototype we processed 700,000
patient records made available to us by the University of Virginia.
Preliminary analyses by our colleagues have shown that virtually
100 percent of the outputs generated by HealthMiner are biologically
appropriate and well established. Some outputs, however,
are novel, which gives them considerable potential value
for medical research.
Following the success of our first pilot project,
we are in the process of extending our work to the analysis of
approximately 400,000 digital records from patients with clinically diagnosed
schizophrenia.
B. Robson and R. Mushlin (2004), "Clinical
and pharmacogenomic data mining. 2. A Simple Method for the Combination
of Information from Associations and Multivariances to Facilitate
Analysis, Decision and Design in Clinical Research and Practice."
J. Proteome Res. In press.
B. Robson and R. Mushlin (2004), "Genomic
Messaging System for Information-Based Personalized Medicine with
Clinical and Proteome Research Applications", J. Proteome
Res. In Press.
B. Robson and R. Mushlin (2004), "The
Dragon on the Gold: Myths and Realities for Data Mining in Biotechnology
using Digital and Molecular Libraries", J. Proteome Res.
In Press.
B. Robson (2003), "Clinical
and pharmacogenomic data mining. 1. The generalized theory of expected
information and application to the development of tools."
J. Proteome Res. 2, 283-301.
I. Rigoutsos, T. Huynh, A. Floratos, L. Parida and D. Platt, (2002)
"Dictionary-driven Protein
Annotation." Nucleic Acids Research. 30(17).
B. Robson and J. Garnier (2002), "The
Future of Highly Personalized Health Care." Stud Health
Technol Inform. 80:163-74.
S. Weiss and N. Indurkhya (2000), "Lightweight
rule induction", Proceedings of the Seventeenth International
Conference on Machine Learning, pp. 1135-1142.
L. Parida, A. Floratos and I. Rigoutsos (1999), "An Approximation
Algorithm for Alignment of Multiple Sequences Using Motif Discovery."
J. Comb. Optim.,
3(2/3):247-275, July 1999.
I. Rigoutsos and A. Floratos (1998), "Combinatorial
Pattern Discovery In Biological Sequences: The TEIRESIAS Algorithm."
Bioinformatics, vol.14, num. 1.
S. Weiss, R. Galen and P. Tadepalli (1990), "Maximizing the
predictive value of production rules." J. Artif. Intell., 45(1-2),
pp. 47 - 71.
What is the most exciting potential
future use for the work you're doing?
The methods that we have
combined make it possible to discover new relationships
between diseases, diagnostic test results, treatments, prescriptions,
and other patient characteristics. Some of those relationships
can reveal negative or positive side effects of medications
or treatments, new correlations between diagnostic test
result complexes and disease, and other information that
could pave the way toward a deeper understanding of disease
physiology. HealthMiner can also aid in creating new diagnostic
software tools to guide physicians and facilitate their
diagnosis.
What is the most interesting part
of your research?
The aspect of this research that excites me most
is devising ways to use the tools we have been developing
to create new and unexpected applications, for example
going from pattern discovery to the discovery of actionable
association rules. Then there is the immediate connection
with basic physiological processes, with biochemistry and
medicine; collaborating with physicians to understand
the meaning and importance of the rules we've discovered;
and working with people who have very diverse backgrounds and a
common objective. All of this is fascinating.
What inspired you to go into this
field?
The multidisciplinary character of it: we have
an unprecedented opportunity to use pattern discovery to
connect physics, biochemistry, medicine, computer science
and other domains. We also have the opportunity to connect
to IBM's bottom line through contributions we can make to
HealthCare and Life Sciences.
What is your favorite invention
of all time?
The mechanical clock escapement: it translates oscillatory rotational
movement into rotation in one direction. The idea is echoed
in a number of different technologies, including the ratchet
and pawl and piston engines. Together, these simple inventions
were important to technological development for millennia,
and provided the foundations for the world's industrial
revolution.