Identifying Sequence Features that Contribute to the Risk of Aberrant DNA Methylation in Cancer

Paula Vertino
Winship Cancer Institute; Emory University School of Medicine

Gene silencing associated with aberrant methylation of promoter-region CpG islands is an acquired epigenetic mechanism that serves as an alternative to genetic alterations in the inactivation of tumor suppressors and other genes in human cancers. Relatively little is known about how or why particular genes succumb to this aberrant event. We recently showed that, on a genome-wide level, CpG island loci differ intrinsically in their susceptibility to aberrant methylation. Using DNA pattern recognition coupled with a supervised learning approach, we showed that this susceptibility could be predicted from the underlying sequence context. These data suggest that local (i.e., cis-acting) factors contribute to the risk of aberrant methylation. Here we used motif elicitation coupled with classification techniques to identify DNA sequence motifs that selectively define methylation-prone or methylation-resistant CpG islands. Motifs common to twenty-eight methylation-prone or forty-seven methylation-resistant CpG island-containing genomic fragments were determined using the MEME and MAST algorithms (http://meme.sdsc.edu). The five most discriminatory motifs derived from methylation-prone sequences were found to be associated with CpG islands in general and were non-randomly distributed throughout the genome. In contrast, the eight most discriminatory motifs derived from the methylation-resistant CpG islands were randomly distributed throughout the genome. Interestingly, this latter group tended to associate with Alu and other repetitive sequences. The frequency of occurrence of these motifs discriminated between the methylation-prone and methylation-resistant CpG island groups with an accuracy of 87% after ten-fold cross-validation and 80% in blind predictions.
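The idea of classifying CpG islands by motif frequency and scoring the classifier by k-fold cross-validation can be sketched as follows. This is a minimal illustration only: the abstract does not name the classifier, so a nearest-centroid rule is assumed here, and the function names, the motif patterns, and the toy feature vectors are all hypothetical rather than taken from the study.

```python
import re
from typing import Dict, List, Sequence


def motif_counts(seq: str, motifs: Sequence[str]) -> List[int]:
    """Count overlapping occurrences of each motif (a regex) in seq."""
    # A zero-width lookahead lets matches overlap, e.g. "GCG" in "GCGCG".
    return [len(re.findall(f"(?=({m}))", seq)) for m in motifs]


def centroid(rows: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def predict(x: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is nearest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def k_fold_accuracy(X: Sequence[Sequence[float]], y: Sequence[str], k: int) -> float:
    """Fraction of samples classified correctly when each is held out once."""
    correct = 0
    for fold in range(k):
        # Assumption: interleaved folds (every k-th sample goes to this fold).
        test_idx = set(range(fold, len(X), k))
        train = [(x, lab) for i, (x, lab) in enumerate(zip(X, y)) if i not in test_idx]
        labels = {lab for _, lab in train}
        centroids = {
            lab: centroid([x for x, l in train if l == lab]) for lab in labels
        }
        correct += sum(predict(X[i], centroids) == y[i] for i in test_idx)
    return correct / len(X)


# Toy, made-up motif-count vectors (NOT data from the study):
# column 0 = a hypothetical "prone" motif, column 1 = a hypothetical "resistant" motif.
X_toy = [[5, 0], [4, 1], [6, 0], [0, 5], [1, 4], [0, 6]]
y_toy = ["prone", "prone", "prone", "resistant", "resistant", "resistant"]
toy_accuracy = k_fold_accuracy(X_toy, y_toy, k=3)
```

In practice the feature vectors would come from motif scans of each genomic fragment (for example, MAST hits per sequence), and the blind-prediction figure corresponds to scoring sequences that were never used in any training fold.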
Further analysis of the distribution of interspersed and simple tandem DNA repeats in CpG islands of different methylation potentials showed that most classes of repeat elements did not differ significantly in frequency between methylation-prone and methylation-resistant sequences. The exception was Alu sequences, which showed a statistically significant enrichment in the methylation-resistant group. This preferential association could be attributed primarily to the evolutionarily older AluS and AluJ families, whereas the younger and more mobile AluY family showed little difference in frequency between the two groups. The association of Alu sequences with methylation-resistant sequences may reflect selection against retrotransposition into methylation-prone regions during the Alu expansion, or differential rates of Alu loss through inter-Alu recombination events. Further refinement of DNA feature-based models may ultimately allow the identification of novel targets of aberrant methylation. For example, application of one such classifier to the genomic sequences of chromosomes 21 and 22 predicts a subset of putative methylation-prone CpG islands, some of which are known targets of aberrant methylation in human cancers.
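The comparison of repeat-element frequency between the two groups, such as the Alu enrichment described above, amounts to testing a 2x2 contingency table. The abstract does not state which statistical test was used; the sketch below assumes a one-sided Fisher's exact test, and the function name and the table layout are hypothetical choices made for illustration.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical):
        a = resistant fragments with the repeat, b = resistant without,
        c = prone fragments with the repeat,     d = prone without.
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    row1 = a + b          # total resistant fragments
    col1 = a + c          # total fragments carrying the repeat
    n = a + b + c + d     # all fragments
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return tail / comb(n, col1)
```

A small p-value for the Alu table, alongside large p-values for other repeat classes, would correspond to the pattern reported here. Applying the same test separately to AluS, AluJ, and AluY counts would address the family-level comparison.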