University of Southern California
Ray R. Irani Hall
Molecular and Computational Biology
Computational Biology Colloquium
Katerina Kechris
Dept. of Biostatistics & Informatics
University of Colorado
"Comparative Genomics for Motif Prediction:
Integrating Multiple Species Sequence and Expression Data"
Abstract:
De novo identification of transcription factor binding sites (TFBSs) is a challenging computational problem because TFBSs are relatively short sequences buried in long genomic regions. Earlier methods for identifying TFBS motifs combined genome-wide expression data and promoter sequences in a linear model framework, regressing gene expression values onto counts of putative TFBSs in promoters for a single species. More recently, the growing availability of genomic sequences and expression data from multiple species has made it possible to explore multivariate regression models for TFBS motif prediction. We have developed methods that expand the search space to include both sequence and expression information from all available genes across multiple species. Using data from yeast, we show that the multiple-species methods improve TFBS prediction over the single-species method according to several evaluation criteria. More generally, we show how comparative genomics techniques can improve the performance of regression methods for genome-wide prediction of TFBSs.
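As an illustration of the single-species linear model mentioned in the abstract, the sketch below counts exact occurrences of one candidate motif in each promoter and fits an ordinary least-squares line of expression on those counts. It is a minimal sketch under stated assumptions, not the speaker's method: the motif "ACGTG", the promoter sequences, and the expression values are hypothetical toy data; only one strand is scanned; and the multiple-species extension described in the talk is not shown.

```python
def count_motif(seq: str, motif: str) -> int:
    """Count exact (possibly overlapping) occurrences of motif in seq.
    Simplification (assumption): only the given strand is scanned,
    the reverse complement is ignored."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def fit_simple_regression(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("motif counts are constant; slope is undefined")
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Hypothetical toy data for a single species: promoter sequences and
# expression values (e.g. log ratios), keyed by gene name.
promoters = {
    "geneA": "TTACGTGAAACGTGTT",
    "geneB": "GGGGCCCCAAAATTTT",
    "geneC": "CACGTGTTTTTTTTTT",
    "geneD": "ACGTGACGTGACGTGA",
}
expression = {"geneA": 1.0, "geneB": 0.0, "geneC": 0.5, "geneD": 1.5}

motif = "ACGTG"  # hypothetical candidate motif
genes = sorted(promoters)
counts = [count_motif(promoters[g], motif) for g in genes]
values = [expression[g] for g in genes]

intercept, slope = fit_simple_regression(counts, values)
print(f"motif {motif}: counts={counts}, intercept={intercept:.3f}, slope={slope:.3f}")
```

In this framework a motif whose counts explain a large share of the expression variation (a large, well-supported slope) is a candidate TFBS; a real analysis would score many candidate motifs and test the fits, which this toy example omits.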
Thursday, April 1, 2010
2:00 pm
RRI 101
Host: Jasmine Zhou