CpG islands
The number of CpG islands in a vector sequence is determined using EMBOSS. The ratio of the Observed to the Expected number of CpG dinucleotides is calculated over a 100 bp window that is moved along the sequence. The Observed number of CpG dinucleotides in a window is the number of times a C is immediately followed by a G.
The Expected number of CpG dinucleotides is calculated separately for each window from the counts of C and G in that window: the number of Cs in the window multiplied by the number of Gs in the window, divided by the window length.
Expected = (number of C's * number of G's) / window length
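The calculation described above can be sketched in a few lines of Python. This is only an illustration of the Observed/Expected ratio, not the EMBOSS implementation. The function names, the step of 1 bp between windows, and the choice to report a ratio of 0 when Expected is 0 are all assumptions made for the example.

```python
def cpg_obs_exp(window):
    """Return (observed, expected, ratio) for a single window of sequence."""
    window = window.upper()
    if not window:
        return 0, 0.0, 0.0
    # Observed: number of times a C is immediately followed by a G.
    observed = window.count("CG")
    # Expected = (number of C's * number of G's) / window length
    expected = window.count("C") * window.count("G") / len(window)
    # Assumption: report 0.0 when the window has no C or no G.
    ratio = observed / expected if expected > 0 else 0.0
    return observed, expected, ratio


def sliding_cpg_ratios(seq, window_len=100, step=1):
    """Observed/Expected ratio for each window moved along the sequence.

    window_len defaults to the 100 bp window from the text; step is a
    hypothetical parameter (the text does not state the step size).
    """
    ratios = []
    for start in range(0, len(seq) - window_len + 1, step):
        _, _, ratio = cpg_obs_exp(seq[start:start + window_len])
        ratios.append(ratio)
    return ratios
```

For example, in the 8 bp window ACGTCGCG there are 3 Cs, 3 Gs, and 3 CpG dinucleotides, so Expected = (3 * 3) / 8 = 1.125 and the Observed/Expected ratio is 3 / 1.125, roughly 2.67.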