The aim of this project is to develop new, efficient algorithms for analysing very large molecule databases, containing up to one billion molecules, with respect to biological activity. We concentrate on molecular similarity search and molecular clustering, which are important tasks for substructure and virtual screening as well as for similarity, diversity, and quantitative structure-activity relationship (QSAR) analysis in rational drug design.
To this end, we propose the new Maximum Similar Subgraph (MSS) paradigm, which extends the well-known Maximum Common Subgraph problem by allowing deviations with respect to similar bioactivity. We will use our newly developed MSS search algorithms to compare and cluster very large sets of molecules. We address the use of parallelism (massive and distributed) on mainstream architectures such as clusters of multicore processors, as well as space-efficient data structures and algorithms. The developed methods will be tested and applied directly in computer-aided drug discovery and design projects. One research topic, for example, is the development of new compounds for the treatment of tuberculosis and of trypanosomatid diseases such as sleeping sickness.