0.01 39414 1832 SCCM/E, P-value 0.001 17031 479 SCCM/E, P-value 0.05, fraction 0.309 0.024 SCCM/E, P-value 0.01, fraction 0.166 0.008 SCCM/E, P-value 0.001, fraction 0.072 0.The total number of CpGs in the study is 237,244.Medvedeva et al. BMC Genomics 2013, 15:119 http://www.biomedcentral.com/1471-2164/15/Page 5 ofTable 2 Fraction of cytosines demonstrating rstb.2013.0181 different SCCM/E within genome regionsCGI CpG “traffic lights” SCCM/E > 0 SCCM/E MedChemExpress GSK-J4 insignificant 0.801 0.674 0.794 Gene promoters 0.793 0.556 0.733 Gene bodies 0.507 0.606 0.477 Repetitive elements 0.095 0.095 0.128 Conserved regions 0.203 0.210 0.198 SNP 0.008 0.009 0.010 DNase sensitivity regions 0.926 0.829 0.a significant overrepresentation of CpG “traffic lights” within the GSK429286A biological activity predicted TFBSs. Similar results were obtained using only the 36 normal cell lines: 35 TFs had a significant underrepresentation of CpG “traffic lights” within their predicted TFBSs (P-value < 0.05, Chi-square test, Bonferoni correction) and no TFs had a significant overrepresentation of such positions within TFBSs (Additional file 3). Figure 2 shows the distribution of the observed-to-expected ratio of TFBS overlapping with CpG "traffic lights". It is worth noting that the distribution is clearly bimodal with one mode around 0.45 (corresponding to TFs with more than double underrepresentation of CpG "traffic lights" in their binding sites) and another mode around 0.7 (corresponding to TFs with only 30 underrepresentation of CpG "traffic lights" in their binding sites). We speculate that for the first group of TFBSs, overlapping with CpG "traffic lights" is much more disruptive than for the second one, although the mechanism behind this division is not clear. To ensure that the results were not caused by a novel method of TFBS prediction (i.e., due to the use of RDM),we performed the same analysis using the standard PWM approach. The results presented in Figure 2 and in Additional file 4 show that although the PWM-based method generated many more TFBS predictions as compared to RDM, the CpG "traffic lights" were significantly underrepresented in the TFBSs in 270 out of 279 TFs studied here (having at least one CpG "traffic light" within TFBSs as predicted by PWM), supporting our major finding. We also analyzed if cytosines with significant positive SCCM/E demonstrated similar underrepresentation within TFBS. Indeed, among the tested TFs, almost all were depleted of such cytosines (Additional file 2), but only 17 of them were significantly over-represented due to the overall low number of cytosines with significant positive SCCM/E. Results obtained using only the 36 normal cell lines were similar: 11 TFs were significantly depleted of such cytosines (Additional file 3), while most of the others were also depleted, yet insignificantly due to the low rstb.2013.0181 number of total predictions. Analysis based on PWM models (Additional file 4) showed significant underrepresentation of suchFigure 2 Distribution of the observed number of CpG “traffic lights” to their expected number overlapping with TFBSs of various TFs. The expected number was calculated based on the overall fraction of significant (P-value < 0.01) CpG "traffic lights" among all cytosines analyzed in the experiment.Medvedeva et al. BMC Genomics 2013, 15:119 http://www.biomedcentral.com/1471-2164/15/Page 6 ofcytosines for 229 TFs and overrepresentation for 7 (DLX3, GATA6, NR1I2, OTX2, SOX2, SOX5, SOX17). Interestingly, these 7 TFs all have highly AT-rich bindi.0.01 39414 1832 SCCM/E, P-value 0.001 17031 479 SCCM/E, P-value 0.05, fraction 0.309 0.024 SCCM/E, P-value 0.01, fraction 0.166 0.008 SCCM/E, P-value 0.001, fraction 0.072 0.The total number of CpGs in the study is 237,244.Medvedeva et al. BMC Genomics 2013, 15:119 http://www.biomedcentral.com/1471-2164/15/Page 5 ofTable 2 Fraction of cytosines demonstrating rstb.2013.0181 different SCCM/E within genome regionsCGI CpG “traffic lights” SCCM/E > 0 SCCM/E insignificant 0.801 0.674 0.794 Gene promoters 0.793 0.556 0.733 Gene bodies 0.507 0.606 0.477 Repetitive elements 0.095 0.095 0.128 Conserved regions 0.203 0.210 0.198 SNP 0.008 0.009 0.010 DNase sensitivity regions 0.926 0.829 0.a significant overrepresentation of CpG “traffic lights” within the predicted TFBSs. Similar results were obtained using only the 36 normal cell lines: 35 TFs had a significant underrepresentation of CpG “traffic lights” within their predicted TFBSs (P-value < 0.05, Chi-square test, Bonferoni correction) and no TFs had a significant overrepresentation of such positions within TFBSs (Additional file 3). Figure 2 shows the distribution of the observed-to-expected ratio of TFBS overlapping with CpG "traffic lights". It is worth noting that the distribution is clearly bimodal with one mode around 0.45 (corresponding to TFs with more than double underrepresentation of CpG "traffic lights" in their binding sites) and another mode around 0.7 (corresponding to TFs with only 30 underrepresentation of CpG "traffic lights" in their binding sites). We speculate that for the first group of TFBSs, overlapping with CpG "traffic lights" is much more disruptive than for the second one, although the mechanism behind this division is not clear. To ensure that the results were not caused by a novel method of TFBS prediction (i.e., due to the use of RDM),we performed the same analysis using the standard PWM approach. The results presented in Figure 2 and in Additional file 4 show that although the PWM-based method generated many more TFBS predictions as compared to RDM, the CpG "traffic lights" were significantly underrepresented in the TFBSs in 270 out of 279 TFs studied here (having at least one CpG "traffic light" within TFBSs as predicted by PWM), supporting our major finding. We also analyzed if cytosines with significant positive SCCM/E demonstrated similar underrepresentation within TFBS. Indeed, among the tested TFs, almost all were depleted of such cytosines (Additional file 2), but only 17 of them were significantly over-represented due to the overall low number of cytosines with significant positive SCCM/E. Results obtained using only the 36 normal cell lines were similar: 11 TFs were significantly depleted of such cytosines (Additional file 3), while most of the others were also depleted, yet insignificantly due to the low rstb.2013.0181 number of total predictions. Analysis based on PWM models (Additional file 4) showed significant underrepresentation of suchFigure 2 Distribution of the observed number of CpG “traffic lights” to their expected number overlapping with TFBSs of various TFs. The expected number was calculated based on the overall fraction of significant (P-value < 0.01) CpG "traffic lights" among all cytosines analyzed in the experiment.Medvedeva et al. BMC Genomics 2013, 15:119 http://www.biomedcentral.com/1471-2164/15/Page 6 ofcytosines for 229 TFs and overrepresentation for 7 (DLX3, GATA6, NR1I2, OTX2, SOX2, SOX5, SOX17). Interestingly, these 7 TFs all have highly AT-rich bindi.