MEME version 3.0 (Release date: 2004/07/26 08:17:15)
For further information on how to interpret these results or to get a copy of the MEME software please access http://meme.sdsc.edu.
This file may be used as input to the MAST algorithm for searching sequence databases for matches to groups of motifs. MAST is available for interactive use and downloading at http://meme.sdsc.edu.
If you use this program in your research, please cite:
Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994.
DATAFILE= GCN4_YPD.fsa
ALPHABET= ACGT
Sequence name Weight Length  Sequence name Weight Length
------------- ------ ------  ------------- ------ ------
iYJR109C      1.0000    688  iYDR126W      1.0000    246
iYHR018C      1.0000    500  iYER052C      1.0000    787
iYOL141W      1.0000    224  iYJL089W      1.0000    102
iYOL059W      1.0000    784  iYER055C      1.0000    753
iYMR062C      1.0000    711  iYLR355C      1.0000   1056
iYKL016C      1.0000    576  iYEL063C      1.0000    940
iYOR336W      1.0000    482  iYER068W      1.0000    597
iYOR107W      1.0000    757  iYCL030C      1.0000    290
iYHR070W      1.0000    544  iYDL198C      1.0000    354
iYDL171C      1.0000    700  iYNL005C      1.0000    924
iYDR481C      1.0000    184  iYGL184C      1.0000    527
iYOL064C      1.0000    218  iYOR301W      1.0000    467
iYOR130C      1.0000   1161  iYBR115C      1.0000    238
iYJL200C      1.0000    302  iYLL005C      1.0000    543
itT(AGU)J     1.0000    335  iYBR043C      1.0000    398
iYDR084C      1.0000    265  iYOL154W      1.0000   1421
iYDR235W      1.0000    109  iYBR249C      1.0000   1066
iYPR110C      1.0000    481  iYER072W      1.0000    843
iYHR161C      1.0000    775  iYBR113W      1.0000   1212
iYJL072C      1.0000    258  iYBL076C      1.0000    211
iYOL119C      1.0000    260  iYJR111C      1.0000    340
iYGR267C      1.0000    327  iYGR271W      1.0000    360
iYJLWdelta9   1.0000    525  iYER089C      1.0000    619
iYNL104C      1.0000    999  iYJR016C      1.0000    311
iYOR221C      1.0000    506  iYPL274W      1.0000    390
iYEL062W      1.0000    288  iYBR248C      1.0000    363
iYDR341C      1.0000    592  iYNR050C      1.0000   1361
iYDR125C      1.0000    478  iYBR144C      1.0000    227
itM(CAU)P     1.0000   1037  iYDR034C      1.0000    888
iYGL126W      1.0000    399
This information can also be useful in the event you wish to report a problem with the MEME software.

command: meme GCN4_YPD.fsa -dna -nmotifs 5 -minw 7 -maxw 11 -revcomp

model:  mod=         zoops    nmotifs=         5    evt=           inf
object function=  E-value of product of p-values
width:  minw=            7    maxw=           11    minic=        0.00
width:  wg=             11    ws=              1    endgaps=       yes
nsites: minsites=        2    maxsites=       59    wnsites=       0.8
theta:  prob=            1    spmap=         uni    spfuzz=        0.5
em:     prior=   dirichlet    b=            0.01    maxiter=        50
        distance=    1e-05
data:   n=           33299    N=              59
strands: + -
sample: seed=            0    seqfrac=         1

Letter frequencies in dataset:
A 0.322 C 0.178 G 0.178 T 0.322

Background letter frequencies (from dataset with add-one prior applied):
A 0.322 C 0.178 G 0.178 T 0.322
BL   MOTIF 1 width=11 seqs=59
iYNR050C    ( 1295) TTTTTTTTTTC  1
iYOR221C    (  275) TTTTTTTTTTC  1
iYJR016C    (  229) TTTTTCTTTTC  1
iYER089C    (  119) TTTTTTTTTTC  1
iYGR267C    (  258) TTTTTCTTTTC  1
iYJL072C    (  126) TTTTTTTTCTC  1
iYBR113W    ( 1030) TTTTTTTTTTC  1
iYER072W    (  545) TTTTTTTTTTC  1
iYJL200C    (  241) TTTTTTTTTTC  1
iYOR301W    (  353) TTTTTTTTTTC  1
iYOR107W    (  533) TTTTTTTTTTC  1
iYER068W    (  434) TTTTTTTTTTC  1
iYOR336W    (  284) TTTTTTTTTTC  1
iYER055C    (  478) TTTTTTTTTTC  1
iYER052C    (  234) TTTTTTTTTTC  1
iYHR018C    (  141) TTTTTTTTTTC  1
iYGL126W    (  230) CTTTTTTTTTC  1
iYDR034C    (  865) TTTTTCTTCTT  1
iYDR341C    (  153) TTTTTCTTCTT  1
iYEL062W    (   32) TTTTTTTTTTT  1
iYNL104C    (   64) TTTTTTTTCTT  1
iYJR111C    (  299) TTTTTTTTTTT  1
iYBL076C    (  130) TTTTTTTTTTT  1
iYHR161C    (  514) TTTTTCTTCTT  1
iYBR249C    (  834) TTTTTCTTTTT  1
iYDL198C    (  206) CTTTTTTTTTC  1
iYEL063C    (   25) TTTTTTTTTTT  1
iYJR109C    (  188) CTTTTTTTTTC  1
iYOL154W    (  237) TTCTTCTTCTC  1
iYPL274W    (  176) TTTTTTTTCAC  1
iYOL141W    (  122) TTTTTTTTCAC  1
iYDR126W    (   29) TTTTTTTTCAC  1
iYPR110C    (   62) TTTTTTCTTTC  1
iYLL005C    (  460) TTCTTTTTCTT  1
iYDL171C    (  414) TTCTTTTTTTT  1
iYHR070W    (  140) TTCTTCTTCTT  1
iYDR125C    (  190) TTTTATTTTTC  1
iYBR248C    (  209) TTTTTTTTTGC  1
iYLR355C    (  204) TTTTTTTTTGC  1
iYBR115C    (   57) TTTTTTTTTAT  1
iYNL005C    (  549) CTTTTCTTCAC  1
iYCL030C    (   15) TTTTTTTTCTG  1
iYGR271W    (  102) TTTTTTTACTC  1
iYOL059W    (  135) CTCTTTTTCTT  1
iYOR130C    ( 1092) TTTTATTTTTT  1
iYDR481C    (  125) CTTTACTTTTC  1
iYKL016C    (  193) TTTTACTTTTT  1
iYOL064C    (  101) TTCTTCTTTGC  1
iYOL119C    (  108) TTTTCCTTCTC  1
itM(CAU)P   (  935) TTTCTCTTCTC  1
iYMR062C    (  496) CTTTTCTTTAT  1
iYGL184C    (   41) TTTTTGTTTTT  1
iYDR235W    (   70) TTTTTCTTAAC  1
iYBR043C    (  298) TTTTACCTCTT  1
itT(AGU)J   (    1) TTCTATTTTAT  1
iYDR084C    (   67) CTTTTCTACAT  1
iYJLWdelta9 (  466) ATTTTCCTCTT  1
iYJL089W    (   45) CTCTTCCTCTG  1
iYBR144C    (  101) CTCTTTTATAT  1
//

log-odds matrix: alength= 4 w= 11 n= 32709 bayes= 10.3757 E= 1.7e-020
  -424     -7  -1253    134
 -1253  -1253  -1253    164
 -1253    -22  -1253    140
 -1253   -339  -1253    161
  -166   -339  -1253    145
 -1253     93   -339    100
 -1253   -139  -1253    153
  -266  -1253  -1253    156
  -424    100  -1253     96
   -92  -1253   -181    128
 -1253    169   -239     28

letter-probability matrix: alength= 4 w= 11 nsites= 59 E= 1.7e-020
 0.016949  0.169492  0.000000  0.813559
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.152542  0.000000  0.847458
 0.000000  0.016949  0.000000  0.983051
 0.101695  0.016949  0.000000  0.881356
 0.000000  0.338983  0.016949  0.644068
 0.000000  0.067797  0.000000  0.932203
 0.050847  0.000000  0.000000  0.949153
 0.016949  0.355932  0.000000  0.627119
 0.169492  0.000000  0.050847  0.779661
 0.000000  0.576271  0.033898  0.389831
Time 159.41 secs.
BL   MOTIF 2 width=11 seqs=58
iYEL063C    (  363) GGGTGAGTCAT  1
iYER068W    (  375) GACTGAGTCAT  1
iYBR113W    (  487) GGCTGAGTCAC  1
iYHR161C    (  407) GGCTGAGTCAC  1
iYGL184C    (  386) GGCTGACTCAT  1
iYOL064C    (  148) GTCTGAGTCAT  1
iYDL171C    (  571) GTCTGAGTCAT  1
iYOR221C    (  341) CACTGAGTCAT  1
iYDR126W    (    7) GGATGAGTCAT  1
iYJR109C    (  345) GGATGAGTCAT  1
iYJR016C    (  130) TACTGAGTCAT  1
iYGR267C    (   47) GCGTGACTCAT  1
iYMR062C    (  148) CAGTGAGTCAT  1
iYER089C    (  451) TGCTGACTCAT  1
iYBR043C    (  346) GAATGAGTCAT  1
iYER052C    (  289) AGGTGAGTCAT  1
iYNL005C    (  612) TGGTGAGTCAC  1
iYLR355C    (  298) GGATGAGTCAC  1
iYHR018C    (  183) AAGTGAGTCAT  1
iYOR130C    (  227) AGGTGAGTCAC  1
iYOR107W    (  489) GAATGACTCAT  1
iYOL059W    (  422) TAGTGACTCAT  1
iYDR341C    (  195) AACTGACTCAT  1
iYNL104C    (   95) AACTGAGTCAC  1
iYKL016C    (  331) GATTGAGTCAT  1
iYGL126W    (    1) TCCTGAGTCAT  1
iYDL198C    (  137) CAGTGACTCAC  1
iYCL030C    (  202) CAGTGACTCAC  1
iYER055C    (  192) AAGTGAGTCAC  1
iYOL141W    (   43) AAGTGAGTCAC  1
itM(CAU)P   (  580) CCCTGACTCAT  1
iYJL089W    (   82) TCGTGACTCAT  1
iYER072W    (  599) TTCTGACTCAT  1
iYOR336W    (  206) TAATGAGTCAT  1
iYOL119C    (  198) GGCTGACTAAT  1
iYBR249C    (  318) CGTTGAGTCAT  1
iYOR301W    (  131) GTTTGACTCAT  1
iYBR115C    (  165) CATTGAGTCAC  1
iYDR034C    (  247) AATTGAGTCAC  1
iYJL200C    (  226) TCTTGAGTCAT  1
iYJR111C    (  160) GTGTGACTAAT  1
iYBR248C    (  112) GTCTGACTCTT  1
iYNR050C    (  211) GTATGAGTAAT  1
iYBR144C    (  147) TACTGTCTCAC  1
iYPR110C    (  245) TTCTGACTAAT  1
iYOL154W    (  897) CACTGACTGAC  1
iYJL072C    (  138) TGATGACTAAC  1
iYPL274W    (  237) AAATGACTAAT  1
iYHR070W    (   38) AGATGACTAAC  1
iYGR271W    (    1) TTGTGTCTCAC  1
iYBL076C    (   87) TGATGACTCTT  1
iYLL005C    (  108) ATTTGTGTCAT  1
iYEL062W    (  201) AACTGAGTATT  1
itT(AGU)J   (   84) GTGAGAGTAAT  1
iYDR235W    (   92) GCGGGAGTGAT  1
iYJLWdelta9 (  224) CCGTAACTCAT  1
iYDR125C    (  372) GGTAGTGTCAT  1
iYDR084C    (  229) GAAGGCGTCAT  1
//

log-odds matrix: alength= 4 w= 11 n= 32709 bayes= 9.77853 E= 1.1e-014
   -64    -20    115    -41
    30    -56     72    -76
   -64     95     80   -122
  -322  -1250   -237    153
  -422  -1250    246  -1250
   151   -337  -1250   -222
 -1250    121    172  -1250
 -1250  -1250  -1250    164
  -105    218   -237  -1250
   156  -1250  -1250   -264
 -1250     72  -1250    114

letter-probability matrix: alength= 4 w= 11 nsites= 58 E= 1.1e-014
 0.206897  0.155172  0.396552  0.241379
 0.396552  0.120690  0.293103  0.189655
 0.206897  0.344828  0.310345  0.137931
 0.034483  0.000000  0.034483  0.931034
 0.017241  0.000000  0.982759  0.000000
 0.913793  0.017241  0.000000  0.068966
 0.000000  0.413793  0.586207  0.000000
 0.000000  0.000000  0.000000  1.000000
 0.155172  0.810345  0.034483  0.000000
 0.948276  0.000000  0.000000  0.051724
 0.000000  0.293103  0.000000  0.706897
Time 315.57 secs.
BL   MOTIF 3 width=11 seqs=8
iYHR161C    (  280) GCGAGCGGCGG  1
iYER052C    (  491) GCGAGCGGCTG  1
iYOR301W    (   69) GCGAGGGGCCG  1
iYHR070W    (   67) GCGCGGGGCTG  1
itM(CAU)P   (  835) GCGAGCGGCGA  1
iYPL274W    (   88) GCGAGCGCCAG  1
iYOR107W    (  452) GCGCGCGGCAC  1
iYBL076C    (  115) GCGTGCTGCGG  1
//

log-odds matrix: alength= 4 w= 11 n= 32709 bayes= 11.997 E= 8.8e+004
  -965   -965    249   -965
  -965    249   -965   -965
  -965   -965    249   -965
    96     49   -965   -136
  -965   -965    249   -965
  -965    207     49   -965
  -965   -965    229   -136
  -965    -51    229   -965
  -965    249   -965   -965
   -36    -51    107    -36
  -136    -51    207   -965

letter-probability matrix: alength= 4 w= 11 nsites= 8 E= 8.8e+004
 0.000000  0.000000  1.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.625000  0.250000  0.000000  0.125000
 0.000000  0.000000  1.000000  0.000000
 0.000000  0.750000  0.250000  0.000000
 0.000000  0.000000  0.875000  0.125000
 0.000000  0.125000  0.875000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.250000  0.125000  0.375000  0.250000
 0.125000  0.125000  0.750000  0.000000
Time 465.48 secs.
BL   MOTIF 4 width=11 seqs=11
iYHR161C    (   75) CCTTCCTCCCC  1
iYER089C    (  209) CCCTCCTCCCC  1
iYGL126W    (  296) CCTTGCTCCCC  1
iYOL059W    (   18) CCCCCCTCCCC  1
iYOR336W    (  195) CCGTGCTCCCC  1
iYNR050C    ( 1188) CCTTCCGCGCC  1
iYHR018C    (  428) CGTTCCTCCCT  1
iYJR016C    (  248) CCGTCCTCGCT  1
iYBR113W    (  831) CCCTCTTCCCT  1
iYJLWdelta9 (  318) CTTTCCGCCCT  1
iYDL171C    (  172) CTTTCTTCCCC  1
//

log-odds matrix: alength= 4 w= 11 n= 32709 bayes= 13.0712 E= 3.0e+006
 -1010    249  -1010  -1010
 -1010    203    -97    -82
 -1010     61      3     76
 -1010    -97  -1010    150
 -1010    220      3  -1010
 -1010    220  -1010    -82
 -1010  -1010      3    135
 -1010    249  -1010  -1010
 -1010    220      3  -1010
 -1010    249  -1010  -1010
 -1010    183  -1010     18

letter-probability matrix: alength= 4 w= 11 nsites= 11 E= 3.0e+006
 0.000000  1.000000  0.000000  0.000000
 0.000000  0.727273  0.090909  0.181818
 0.000000  0.272727  0.181818  0.545455
 0.000000  0.090909  0.000000  0.909091
 0.000000  0.818182  0.181818  0.000000
 0.000000  0.818182  0.000000  0.181818
 0.000000  0.000000  0.181818  0.818182
 0.000000  1.000000  0.000000  0.000000
 0.000000  0.818182  0.181818  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.000000  0.636364  0.000000  0.363636
Time 619.53 secs.
BL   MOTIF 5 width=11 seqs=6
itM(CAU)P   (  667) GACGGTGCGGC  1
iYPL274W    (  113) GGCGGTGCGGC  1
iYOL154W    (  661) GCCAGTGCGGC  1
iYOR130C    (  169) GACGCTGCGGC  1
iYBR249C    (  361) GCGGGTGGGGC  1
iYGR267C    (  296) CCCGGTACGGC  1
//

log-odds matrix: alength= 4 w= 11 n= 32709 bayes= 13.5118 E= 5.6e+005
  -923    -10    222   -923
     5    149    -10   -923
  -923    222    -10   -923
   -95   -923    222   -923
  -923    -10    222   -923
  -923   -923   -923    163
   -95   -923    222   -923
  -923    222    -10   -923
  -923   -923    249   -923
  -923   -923    249   -923
  -923    249   -923   -923

letter-probability matrix: alength= 4 w= 11 nsites= 6 E= 5.6e+005
 0.000000  0.166667  0.833333  0.000000
 0.333333  0.500000  0.166667  0.000000
 0.000000  0.833333  0.166667  0.000000
 0.166667  0.000000  0.833333  0.000000
 0.000000  0.166667  0.833333  0.000000
 0.000000  0.000000  0.000000  1.000000
 0.166667  0.000000  0.833333  0.000000
 0.000000  0.833333  0.166667  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
Time 770.53 secs.
CPU: ncc007
MOTIFS
For each motif that it discovers in the training set, MEME prints the following information:
Summing the information content for each position in the motif gives the total information content of the motif (shown in parentheses to the left of the diagram). The total information content is approximately equal to the log likelihood ratio divided by the product of the number of occurrences and ln(2). It measures how useful the motif is for database searches: as a rule, a useful motif must contain at least log_2(N) bits of information, where N is the number of sequences in the database being searched. For example, to search a database of 100,000 sequences effectively for occurrences of a single motif, the motif should have an information content (IC) of at least 16.6 bits. Motifs with lower information content are still useful when a family of sequences shares more than one motif, since they can be combined in multiple-motif searches (using MAST).
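The rule of thumb above can be checked numerically. The following Python sketch is illustrative only and is not part of MEME. It assumes that the information content of a position is its relative entropy, in bits, against the background letter frequencies reported for this dataset; MEME's own calculation may differ in detail (for example in its handling of priors). The function names are hypothetical, and the matrix is the letter-probability matrix of motif 5 from this file.

```python
import math

LETTERS = "ACGT"
# Background letter frequencies as reported for this dataset.
BACKGROUND = {"A": 0.322, "C": 0.178, "G": 0.178, "T": 0.322}


def column_information(probs, background=BACKGROUND):
    """Relative entropy (bits) of one motif position against the background.

    Assumption: this is one common definition of per-position information
    content, not necessarily the exact formula MEME uses.
    """
    return sum(
        p * math.log2(p / background[letter])
        for letter, p in zip(LETTERS, probs)
        if p > 0.0  # zero-probability letters contribute nothing
    )


def total_information(matrix, background=BACKGROUND):
    """Sum the per-position information over all positions of the motif."""
    return sum(column_information(row, background) for row in matrix)


def min_bits_for_database(num_sequences):
    """Rule-of-thumb lower bound: log_2(N) bits for N database sequences."""
    return math.log2(num_sequences)


# Letter-probability matrix of motif 5 (columns A, C, G, T).
MOTIF5 = [
    [0.000000, 0.166667, 0.833333, 0.000000],
    [0.333333, 0.500000, 0.166667, 0.000000],
    [0.000000, 0.833333, 0.166667, 0.000000],
    [0.166667, 0.000000, 0.833333, 0.000000],
    [0.000000, 0.166667, 0.833333, 0.000000],
    [0.000000, 0.000000, 0.000000, 1.000000],
    [0.166667, 0.000000, 0.833333, 0.000000],
    [0.000000, 0.833333, 0.166667, 0.000000],
    [0.000000, 0.000000, 1.000000, 0.000000],
    [0.000000, 0.000000, 1.000000, 0.000000],
    [0.000000, 1.000000, 0.000000, 0.000000],
]

if __name__ == "__main__":
    ic = total_information(MOTIF5)
    needed = min_bits_for_database(100_000)
    print(f"motif 5: {ic:.1f} bits; 100,000 sequences need {needed:.1f} bits")
```

A total that falls below the bound suggests combining the motif with others in a MAST search rather than searching with it alone.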
Multilevel TTATGTGAACGACGTCACACT consensus AA T A G A GA AA sequence T C TT T
You can convert these blocks to PSSMs (position-specific scoring matrices), LOGOS (color representations of the motifs), phylogeny trees and search them against a database of other blocks by pasting everything from the "BL" line to the "//" line (inclusive) into the Multiple Alignment Processor. If you include the -print_fasta switch on the command line, MEME prints the motif sites in FASTA format instead of BLOCKS format.
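As a rough illustration of what a block contains, the Python sketch below reads the sites of motif 5 from its BLOCKS listing and tabulates the observed letter frequencies at each position. It is only a sketch: it is not the Multiple Alignment Processor's method, it applies no pseudocounts or sequence weights, and the function names are hypothetical. It assumes each site line ends with the site string followed by a weight.

```python
from collections import Counter

LETTERS = "ACGT"

# Motif 5 from this file, from the "BL" line to the "//" line inclusive.
BLOCK = """\
BL   MOTIF 5 width=11 seqs=6
itM(CAU)P   (  667) GACGGTGCGGC  1
iYPL274W    (  113) GGCGGTGCGGC  1
iYOL154W    (  661) GCCAGTGCGGC  1
iYOR130C    (  169) GACGCTGCGGC  1
iYBR249C    (  361) GCGGGTGGGGC  1
iYGR267C    (  296) CCCGGTACGGC  1
//
"""


def block_sites(block_text):
    """Return the site strings listed in a BLOCKS-format motif."""
    sites = []
    for line in block_text.splitlines():
        line = line.strip()
        if not line or line.startswith("BL") or line == "//":
            continue  # skip the header and terminator lines
        # Assumed layout: name ( start) SITE weight
        sites.append(line.split()[-2])
    return sites


def letter_probabilities(sites):
    """Observed frequency of each letter at each position (A, C, G, T)."""
    matrix = []
    for position in range(len(sites[0])):
        counts = Counter(site[position] for site in sites)
        matrix.append([counts[letter] / len(sites) for letter in LETTERS])
    return matrix


if __name__ == "__main__":
    for row in letter_probabilities(block_sites(BLOCK)):
        print(" ".join(f"{p:.6f}" for p in row))
```

For motif 5 the rows produced this way agree with the letter-probability matrix printed for that motif, which reports observed frequencies.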
Note: Earlier versions of MEME gave the posterior probabilities--the probability after applying a prior on letter frequencies--rather than the observed frequencies. These versions of MEME also gave the number of possible positions for the motif rather than the actual number of occurrences. The output from these earlier versions of MEME can be distinguished by "n=" rather than "nsites=" in the line preceding the matrix.
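A script that reads these matrices can apply the distinction described in the note by inspecting the header line. The Python sketch below is a minimal illustration under the assumption that the header consists of `key= value` pairs as in this file; the function name and the returned dictionary layout are hypothetical.

```python
import re

PREFIX = "letter-probability matrix:"


def parse_probability_header(line):
    """Parse a letter-probability matrix header into its fields.

    Returns a dict with the parsed fields plus "posterior", which is True
    when the header carries "n=" instead of "nsites=" (the earlier format).
    """
    line = line.strip()
    if not line.startswith(PREFIX):
        raise ValueError("not a letter-probability matrix header")
    fields = dict(re.findall(r"(\w+)=\s*(\S+)", line[len(PREFIX):]))
    return {
        "alength": int(fields["alength"]),
        "w": int(fields["w"]),
        "nsites": int(fields["nsites"]) if "nsites" in fields else None,
        "n": int(fields["n"]) if "n" in fields else None,
        "E": float(fields["E"]),
        "posterior": "nsites" not in fields and "n" in fields,
    }


if __name__ == "__main__":
    current = "letter-probability matrix: alength= 4 w= 11 nsites= 59 E= 1.7e-020"
    print(parse_probability_header(current))
```

Only the letter-probability header is examined; the log-odds matrix header in this file also carries "n=", so the prefix check keeps the two from being confused.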