IBM Research

Teiresias



Exploring the genetic makeup of living organisms is complex because of the immense number of variables involved in isolating and tracking genes. IBM researchers have developed a fast and efficient algorithm that helps find patterns present in biological sequences. This work is advancing genetic studies of living organisms and is also finding applications in bioinformatics problems beyond pattern discovery.

Detecting Patterns

Biological sequence analysis has received a lot of attention since the early 1990s.  As database sizes grew, so did the interest in mining the accumulated entries for existing patterns. One problem that attracted researchers in genetics was the discovery of patterns that are common to a family of proteins assumed to be related.

When carrying out "pattern discovery," one looks through a database in order to identify anything that appears frequently. What is sought is not known in advance but is determined in the process. Pattern discovery differs from pattern matching: in pattern matching, the item to search for is known in advance. In both tasks, speed is essential.
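The difference between the two tasks can be illustrated with a minimal Python sketch. The function names, the toy DNA strings, and the fixed pattern length are assumptions made for this illustration only; they are not taken from IBM's work, and the discovery routine here is a simple substring count rather than the Teiresias algorithm.

```python
from collections import Counter


def match_pattern(sequences, pattern):
    """Pattern matching: the pattern is known in advance.

    Returns (sequence index, start offset) for every occurrence.
    """
    hits = []
    for seq_index, seq in enumerate(sequences):
        start = seq.find(pattern)
        while start != -1:
            hits.append((seq_index, start))
            # Step forward by one so overlapping occurrences are found too.
            start = seq.find(pattern, start + 1)
    return hits


def discover_patterns(sequences, length, min_count=2):
    """Pattern discovery: no pattern is supplied.

    Counts every substring of the given length and keeps those that
    occur at least min_count times across all sequences.
    """
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - length + 1):
            counts[seq[i:i + length]] += 1
    return {p: c for p, c in counts.items() if c >= min_count}


# Hypothetical toy data, not real biological sequences.
sequences = ["ACGTACGT", "TTACGA"]

print(match_pattern(sequences, "ACG"))   # caller already knows what to look for
print(discover_patterns(sequences, 3))   # frequent substrings emerge from the data
```

In the matching call the caller supplies "ACG"; in the discovery call the frequent substrings are an output of the search rather than an input to it.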

Special Algorithm

IBM researchers developed a new combinatorial algorithm called Teiresias, which carries out pattern discovery in one or more "event streams." Examples of such streams include DNA and protein sequences. The algorithm can discover all patterns occurring two or more times in any such set of data, and it imposes no restrictions on the composition of the patterns, their location, their minimum or maximum length, or their relative arrangement. The algorithm is very fast and excels with very weak signals; it gains its speed by avoiding an exhaustive exploration of the space of potential solutions while still ensuring that the reported results are complete.
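To make the goal concrete, the following Python sketch enumerates every contiguous substring that occurs two or more times in a set of sequences. It is deliberately the kind of exhaustive exploration that Teiresias is described as avoiding, and it is not the Teiresias algorithm, whose internals this page does not describe. The function names, the restriction to contiguous substrings, the `min_len` parameter, and the simplified notion of "maximal" are all assumptions made for illustration.

```python
from collections import defaultdict


def repeated_substrings(sequences, min_len=2, min_count=2):
    """Brute force: record every substring of length >= min_len, then keep
    those seen at least min_count times (overlapping occurrences count)."""
    occurrences = defaultdict(list)
    for seq_index, seq in enumerate(sequences):
        for start in range(len(seq)):
            for end in range(start + min_len, len(seq) + 1):
                occurrences[seq[start:end]].append((seq_index, start))
    return {p: locs for p, locs in occurrences.items() if len(locs) >= min_count}


def keep_maximal(patterns):
    """Assumed, simplified maximality filter: drop a pattern when a longer
    repeated pattern contains it and occurs exactly as many times."""
    maximal = {}
    for p, locs in patterns.items():
        subsumed = any(
            p != q and p in q and len(q_locs) == len(locs)
            for q, q_locs in patterns.items()
        )
        if not subsumed:
            maximal[p] = locs
    return maximal


# Hypothetical toy input.
sequences = ["GATTACA", "TTAC"]
repeats = repeated_substrings(sequences)
print(sorted(repeats))        # every repeated substring of length >= 2
print(keep_maximal(repeats))  # only the longest non-redundant ones
```

The enumeration visits a number of substrings that grows quadratically with each sequence's length, and allowing patterns with gaps would enlarge the search space much further. That cost is what motivates an approach that reports complete results without visiting every candidate.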

IBM scientists are currently using Teiresias to tackle several important problems in computational biology outside the immediate context of pattern discovery. These include the unsupervised discovery of novel protein families, multiple sequence alignment, and the discovery of previously unobserved singularities in the space of natural proteins. A dictionary of protein patterns is also being compiled and will subsequently be associated with functional and structural information.

IBM scientists are also collaborating with researchers from Monsanto to apply Teiresias to the discovery of specific patterns in biological databases. Together they are working on identifying and mapping the structural patterns of proteins involved in plant development and human diseases. The collaboration is expected to produce new scientific insights that could lead to improved plants, for example plants that resist certain diseases and produce higher yields.



  

Related Links
   Monsanto

  
