In Silico Target Prediction by Training Naive Bayesian Models on Chemogenomics Databases

Nidhi

Files

Nidhi.doc (1.3 MB)

Date

2006-06-29T19:50:21Z

Authors

Nidhi

Language

American English

Committee Chair

Merchant, Mahesh

Degree

M.S.

Degree Year

2005-12

Department

School of Informatics

Grantor

Indiana University

Abstract

The completion of the Human Genome Project is seen as a gateway to the discovery of novel drug targets (Jacoby, Schuffenhauer, & Floersheim, 2003). How much of this information is actually translated into knowledge, e.g., the discovery of novel drug targets, is yet to be seen. The traditional route of drug discovery has been from target to compound. Conventional research techniques focus on studying animal and cellular models, followed by the development of a chemical concept. Modern approaches that have evolved as a result of progress in molecular biology and genomics start out with molecular targets, which usually originate from the discovery of a new gene. Subsequent target validation to establish suitability as a drug target is followed by high-throughput screening assays in order to identify new active chemical entities (Hofbauer, 1997). In contrast, chemogenomics takes the opposite approach to drug discovery (Jacoby, Schuffenhauer, & Floersheim, 2003). It puts chemical entities to the forefront as probes to study their effects on biological targets and then links these effects to the genetic pathways of these targets (Figure 1a). The goal of chemogenomics is to rapidly identify new drug molecules and drug targets by establishing chemical and biological connections. Just as classical genetic experiments are classified into forward and reverse, experimental chemogenomics methods can be distinguished as forward and reverse depending on the direction of the investigative process, i.e., from phenotype to target or from target to phenotype, respectively (Jacoby, Schuffenhauer, & Floersheim, 2003). The identification and characterization of protein targets are critical bottlenecks in forward chemogenomics experiments. Currently, methods such as affinity matrix purification (Taunton, Hassig, & Schreiber, 1996) and phage display (Sche, McKenzie, White, & Austin, 1999) are used to determine targets for compounds.
None of the current techniques used for target identification after the initial screening is efficient. In silico methods can provide complementary and efficient ways to predict targets by using chemogenomics databases to obtain information about chemical structures and target activities of compounds. Annotated chemogenomics databases integrate chemical and biological domains and can provide a powerful tool to predict and validate new targets for compounds with unknown effects (Figure 1b). A chemogenomics database contains both chemical properties and biological activities associated with a compound. The MDL Drug Data Report (MDDR) (Molecular Design Ltd., San Leandro, California) is one of the well-known and widely used databases that contain chemical structures and corresponding biological activities of drug-like compounds. The relevance and quality of information that can be derived from these databases depend on their annotation schemes as well as the methods that are used for mining this data. In recent years, chemists and biologists have used such databases to carry out similarity searches and look up biological activities for compounds that are similar to the probe molecules for a given assay. With the emergence of new chemogenomics databases that follow a well-structured and consistent annotation scheme, new automated target prediction methods are possible that can give insights into the biological world based on structural similarity between compounds. The usefulness of such databases lies not only in predicting targets, but also in establishing, as a consequence of the prediction, the genetic connections of the targets discovered. The ability to perform automated target prediction relies heavily on a synergy of very recent technologies, which include:
i) Highly structured and consistently annotated chemogenomics databases. Many such databases have surfaced very recently: WOMBAT (Sunset Molecular Discovery LLC, Santa Fe, New Mexico), KinaseChemBioBase (Jubilant Biosys Ltd., Bangalore, India), and StARLITe (Inpharmatica Ltd., London, UK), to name a few.
ii) Chemical descriptors (Xue & Bajorath, 2000) that capture the structure-activity relationship of the molecules, as well as computational techniques (Kitchen, Stahura, & Bajorath, 2004) that are specifically tailored to extract information from these descriptors.
iii) Data pipelining environments that are fast, integrate multiple computational steps, and support large datasets.
A combination of all these technologies may be employed to bridge the gap between chemical and biological domains, which remains a challenge in the pharmaceutical industry.
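The idea named in the title, training a naive Bayesian model on annotated compound-target pairs and using it to rank likely targets for a new compound, can be illustrated with a minimal sketch. This is not the thesis's implementation: the feature names, target classes, and the tiny training set are hypothetical, each compound is assumed to be reduced to a set of structural feature identifiers, and a plain Bernoulli naive Bayes with add-one smoothing stands in for whatever model variant and descriptors the thesis actually uses.

```python
import math
from collections import defaultdict


def train(compounds):
    """Count feature occurrences per target.

    compounds: list of (set_of_feature_ids, target_label) pairs.
    """
    target_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    vocabulary = set()
    for features, target in compounds:
        target_counts[target] += 1
        for feature in features:
            feature_counts[target][feature] += 1
            vocabulary.add(feature)
    return target_counts, feature_counts, vocabulary


def score(model, features):
    """Return log P(target) + sum of log P(feature state | target)."""
    target_counts, feature_counts, vocabulary = model
    total = sum(target_counts.values())
    scores = {}
    for target, n in target_counts.items():
        log_p = math.log(n / total)  # class prior
        for feature in vocabulary:
            # Add-one smoothing keeps unseen features from zeroing the score.
            p = (feature_counts[target].get(feature, 0) + 1) / (n + 2)
            log_p += math.log(p if feature in features else 1.0 - p)
        scores[target] = log_p
    return scores


def predict(model, features):
    """Rank targets from most to least probable for the query compound."""
    scores = score(model, features)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical annotated database: feature sets and target classes are
# illustrative only, not taken from WOMBAT, MDDR, or any real source.
training = [
    ({"aryl_amine", "hinge_binder"}, "kinase"),
    ({"hinge_binder", "sulfonamide"}, "kinase"),
    ({"basic_amine", "indole"}, "GPCR"),
    ({"basic_amine", "aryl_ether"}, "GPCR"),
]
model = train(training)
ranking = predict(model, {"hinge_binder", "aryl_amine"})
print(ranking)
```

In a realistic setting the feature sets would come from a chemical descriptor such as a structural fingerprint, and the training pairs from a consistently annotated chemogenomics database; the ranking step would stay the same.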

Description

Submitted to the faculty of the Chemical Informatics Graduate Program in partial fulfillment of the requirements for the degree Master of Science in the School of Informatics, Indiana University, December 2005

Keywords

Chemogenomics, Databases, Informatics

Extent

1359360 bytes

Type

Thesis

Permanent Link

https://hdl.handle.net/1805/611
http://dx.doi.org/10.7912/C2/833

Collections

Informatics Graduate Theses and PhD Dissertations
Informatics School Theses and Dissertations
