MEME version 3.0 (Release date: 2004/07/26 08:17:15)
For further information on how to interpret these results or to get a copy of the MEME software please access http://meme.sdsc.edu.
This file may be used as input to the MAST algorithm for searching sequence databases for matches to groups of motifs. MAST is available for interactive use and downloading at http://meme.sdsc.edu.
If you use this program in your research, please cite:
Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994.
DATAFILE= YDR026c_YPD.fsa
ALPHABET= ACGT
Sequence name            Weight Length  Sequence name            Weight Length
-------------            ------ ------  -------------            ------ ------
iYNR011C                 1.0000    477  iYEL055C                 1.0000   1072
iYBR035C                 1.0000    639  iYGR093W                 1.0000    180
iYDL086W                 1.0000    988  iYBR229C                 1.0000    303
iYBR179C                 1.0000    632  iYLR458W                 1.0000    497
iYFL006W                 1.0000    378  iYIL003W                 1.0000    514
itF(GAA)N                1.0000    353  iYDR498C                 1.0000    808
iYGL152C                 1.0000    167  iYDR219C                 1.0000    168
iYDR047W                 1.0000    183
This information can also be useful in the event you wish to report a problem with the MEME software.

command: meme YDR026c_YPD.fsa -dna -nmotifs 5 -minw 7 -maxw 11 -revcomp -minsites 20

model:  mod=         zoops    nmotifs=         5    evt=           inf
object function=  E-value of product of p-values
width:  minw=            7    maxw=           11    minic=        0.00
width:  wg=             11    ws=              1    endgaps=       yes
nsites: minsites=        2    maxsites=       15    wnsites=       0.8
theta:  prob=            1    spmap=         uni    spfuzz=        0.5
em:     prior=   dirichlet    b=            0.01    maxiter=        50
        distance=    1e-05
data:   n=            7359    N=              15
strands: + -
sample: seed=            0    seqfrac=         1

Letter frequencies in dataset:
A 0.322 C 0.178 G 0.178 T 0.322
Background letter frequencies (from dataset with add-one prior applied):
A 0.322 C 0.178 G 0.178 T 0.322
BL   MOTIF 1 width=11 seqs=13
iYIL003W                 (  153) TTTACCCGGCC  1
iYDL086W                 (  599) TTTACCCGGCC  1
iYEL055C                 (  125) TTTACCCGGCC  1
iYDR498C                 (  122) TTTACCCGGAC  1
iYBR179C                 (  182) TTTACCCGGAC  1
iYBR229C                 (  104) TTTACCCGGAC  1
iYNR011C                 (   98) GTTACCCGGAC  1
itF(GAA)N                (  145) TTTACCCGGAA  1
iYLR458W                 (  379) TTTACCCGGAA  1
iYBR035C                 (  127) TTTACCCGGCG  1
iYGL152C                 (    9) GTTACCCGGAA  1
iYFL006W                 (  117) ATTACCCGGCA  1
iYGR093W                 (   66) TTTACCCGGTT  1
//

log-odds matrix: alength= 4 w= 11 n= 7209 bayes= 10.2784 E= 1.8e-015
  -206  -1035    -21    126
 -1035  -1035  -1035    163
 -1035  -1035  -1035    163
   163  -1035  -1035  -1035
 -1035    249  -1035  -1035
 -1035    249  -1035  -1035
 -1035    249  -1035  -1035
 -1035  -1035    249  -1035
 -1035  -1035    249  -1035
    74    111  -1035   -206
    -7    160   -121   -206

letter-probability matrix: alength= 4 w= 11 nsites= 13 E= 1.8e-015
 0.076923  0.000000  0.153846  0.769231
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.000000  0.000000  1.000000
 1.000000  0.000000  0.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.538462  0.384615  0.000000  0.076923
 0.307692  0.538462  0.076923  0.076923
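The log-odds and letter-probability matrices for a motif appear to be related by a simple transformation. The sketch below recomputes the first row of the motif 1 log-odds matrix from the first row of its letter-probability matrix. It assumes each log-odds entry is 100 times log2 of a smoothed probability divided by the background frequency, with a background-weighted pseudocount of 0.01 (taken from the `b= 0.01` setting printed in this output). That reading is inferred from the numbers in this file, not from the MEME source, and the function name `log_odds_row` is hypothetical.

```python
import math

# Background frequencies (A, C, G, T) as printed in this output, rounded to 3 places.
BACKGROUND = [0.322, 0.178, 0.178, 0.322]
B = 0.01  # assumed pseudocount weight, from the "b= 0.01" setting


def log_odds_row(probs, nsites, background=BACKGROUND, b=B):
    """Approximate one log-odds row from one letter-probability row."""
    row = []
    for p, bg in zip(probs, background):
        # Smooth the observed frequency so zero counts stay finite (assumption).
        smoothed = (p * nsites + b * bg) / (nsites + b)
        # Scale bits by 100 and round, as the integer matrix entries suggest.
        row.append(round(100 * math.log2(smoothed / bg)))
    return row


# First row of the motif 1 letter-probability matrix (nsites= 13).
print(log_odds_row([0.076923, 0.000000, 0.153846, 0.769231], nsites=13))
```

Because the background frequencies are only printed to three decimal places, the result should be read as agreeing with the printed row (-206 -1035 -21 126) to within about one unit, not necessarily digit for digit.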
Time 7.49 secs.
BL   MOTIF 2 width=11 seqs=14
itF(GAA)N                (  175) CATTTTTTTTT  1
iYBR179C                 (  432) CATTTTTTTTT  1
iYDL086W                 (  778) CATTTTTTTTT  1
iYDR498C                 (  514) CTTTTTTTTTT  1
iYFL006W                 (  308) CTTTTTTTTTT  1
iYBR035C                 (  494) CTTTTTTTTTT  1
iYEL055C                 (  854) CTTTTTTTTTT  1
iYIL003W                 (  389) CATTTTCTTTT  1
iYBR229C                 (  138) CATTCTTTTTT  1
iYGL152C                 (   20) CATTTGCTTTT  1
iYDR047W                 (  146) TTTTTTTTTTT  1
iYNR011C                 (  364) TTTTTTTTTTT  1
iYLR458W                 (  210) CATTTGTTTTC  1
iYDR219C                 (   80) CAATTTGTTTT  1
//

log-odds matrix: alength= 4 w= 11 n= 7209 bayes= 9.61211 E= 2.9e-001
 -1045    227  -1045   -117
    83  -1045  -1045     41
  -217  -1045  -1045    153
 -1045  -1045  -1045    163
 -1045   -132  -1045    153
 -1045  -1045    -32    141
 -1045    -32   -132    129
 -1045  -1045  -1045    163
 -1045  -1045  -1045    163
 -1045  -1045  -1045    163
 -1045   -132  -1045    153

letter-probability matrix: alength= 4 w= 11 nsites= 14 E= 2.9e-001
 0.000000  0.857143  0.000000  0.142857
 0.571429  0.000000  0.000000  0.428571
 0.071429  0.000000  0.000000  0.928571
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.071429  0.000000  0.928571
 0.000000  0.000000  0.142857  0.857143
 0.000000  0.142857  0.071429  0.785714
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.071429  0.000000  0.928571
Time 17.68 secs.
BL   MOTIF 3 width=7 seqs=2
iYGR093W                 (   91) CAGCGCG  1
iYBR035C                 (  524) CAGCGCG  1
//

log-odds matrix: alength= 4 w= 7 n= 7269 bayes= 10.0607 E= 5.0e+005
  -765    248   -765   -765
   163   -765   -765   -765
  -765   -765    248   -765
  -765    248   -765   -765
  -765   -765    248   -765
  -765    248   -765   -765
  -765   -765    248   -765

letter-probability matrix: alength= 4 w= 7 nsites= 2 E= 5.0e+005
 0.000000  1.000000  0.000000  0.000000
 1.000000  0.000000  0.000000  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.000000  0.000000  1.000000  0.000000
Time 27.77 secs.
BL   MOTIF 4 width=10 seqs=2
iYBR179C                 (  401) GCCAACAGGC  1
iYDL086W                 (  815) GCCTACAGGC  1
//

log-odds matrix: alength= 4 w= 10 n= 7224 bayes= 10.9699 E= 6.9e+005
  -765   -765    248   -765
  -765    248   -765   -765
  -765    248   -765   -765
    63   -765   -765     63
   163   -765   -765   -765
  -765    248   -765   -765
   163   -765   -765   -765
  -765   -765    248   -765
  -765   -765    248   -765
  -765    248   -765   -765

letter-probability matrix: alength= 4 w= 10 nsites= 2 E= 6.9e+005
 0.000000  0.000000  1.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.500000  0.000000  0.000000  0.500000
 1.000000  0.000000  0.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 1.000000  0.000000  0.000000  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
Time 38.06 secs.
BL   MOTIF 5 width=11 seqs=8
iYIL003W                 (    4) TGTGTGTGTGT  1
iYBR229C                 (  269) TGTGAGTGTGT  1
iYDR498C                 (  337) TGTGTGTGTAT  1
iYBR035C                 (  233) TGTGTATGTGT  1
itF(GAA)N                (   20) TGTGGATGTGT  1
iYGR093W                 (  143) TCTGTGTGTGC  1
iYNR011C                 (  269) TGTGAGCGTAT  1
iYEL055C                 (  280) TGCGTATGTAT  1
//

log-odds matrix: alength= 4 w= 11 n= 7209 bayes= 9.81398 E= 9.6e+005
  -965   -965   -965    163
  -965    -51    230   -965
  -965    -51   -965    144
  -965   -965    249   -965
   -36   -965    -51     96
    22   -965    181   -965
  -965    -51   -965    144
  -965   -965    249   -965
  -965   -965   -965    163
    22   -965    181   -965
  -965    -51   -965    144

letter-probability matrix: alength= 4 w= 11 nsites= 8 E= 9.6e+005
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.125000  0.875000  0.000000
 0.000000  0.125000  0.000000  0.875000
 0.000000  0.000000  1.000000  0.000000
 0.250000  0.000000  0.125000  0.625000
 0.375000  0.000000  0.625000  0.000000
 0.000000  0.125000  0.000000  0.875000
 0.000000  0.000000  1.000000  0.000000
 0.000000  0.000000  0.000000  1.000000
 0.375000  0.000000  0.625000  0.000000
 0.000000  0.125000  0.000000  0.875000
Time 48.52 secs.
CPU: ncc007
MOTIFS
For each motif that it discovers in the training set, MEME prints the following information:
Summing the information content for each position in the motif gives the total information content of the motif (shown in parentheses to the left of the diagram). The total information content is approximately equal to the log likelihood ratio divided by the product of the number of occurrences and ln(2). The total information content measures how useful the motif is for database searches. As a rule, a motif must contain at least log_2(N) bits of information to be useful, where N is the number of sequences in the database being searched. For example, to effectively search a database containing 100,000 sequences for occurrences of a single motif, the motif should have an information content (IC) of at least log_2(100,000), or about 16.6 bits. Motifs with lower information content are still useful when a family of sequences shares more than one motif, since they can be combined in multiple-motif searches (using MAST).
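As a rough illustration of the rule above, the following sketch sums a per-position information content over the motif 1 letter-probability matrix and compares it with the log_2(N) threshold. It assumes information content is measured as relative entropy against the background letter frequencies printed in this output; that definition and the function names are assumptions made for the example, so the total need not match the figure MEME itself reports.

```python
import math

# Background frequencies (A, C, G, T) as printed in this output.
BACKGROUND = [0.322, 0.178, 0.178, 0.322]

# Letter-probability matrix of motif 1 (rows = positions, columns = A C G T).
MOTIF_1 = [
    [0.076923, 0.000000, 0.153846, 0.769231],
    [0.000000, 0.000000, 0.000000, 1.000000],
    [0.000000, 0.000000, 0.000000, 1.000000],
    [1.000000, 0.000000, 0.000000, 0.000000],
    [0.000000, 1.000000, 0.000000, 0.000000],
    [0.000000, 1.000000, 0.000000, 0.000000],
    [0.000000, 1.000000, 0.000000, 0.000000],
    [0.000000, 0.000000, 1.000000, 0.000000],
    [0.000000, 0.000000, 1.000000, 0.000000],
    [0.538462, 0.384615, 0.000000, 0.076923],
    [0.307692, 0.538462, 0.076923, 0.076923],
]


def total_information_content(matrix, background=BACKGROUND):
    """Sum of per-position relative entropies, in bits (assumed definition)."""
    total = 0.0
    for row in matrix:
        for p, bg in zip(row, background):
            if p > 0:  # terms with p == 0 contribute nothing
                total += p * math.log2(p / bg)
    return total


def min_bits_for_database(n_sequences):
    """Rule-of-thumb threshold from the text: log2(N) bits."""
    return math.log2(n_sequences)


ic = total_information_content(MOTIF_1)
threshold = min_bits_for_database(100_000)
print(f"motif 1 IC = {ic:.1f} bits, threshold = {threshold:.1f} bits")
print("useful on its own" if ic >= threshold else "combine with other motifs")
```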
Multilevel           TTATGTGAACGACGTCACACT
consensus            AA T A G A GA AA
sequence             T C TT T
You can convert these blocks to PSSMs (position-specific scoring matrices), LOGOS (color representations of the motifs), and phylogeny trees, and search them against a database of other blocks, by pasting everything from the "BL" line to the "//" line (inclusive) into the Multiple Alignment Processor. If you include the -print_fasta switch on the command line, MEME prints the motif sites in FASTA format instead of BLOCKS format.
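For readers who already have BLOCKS output and want the sites as FASTA-style records, a minimal sketch follows. It assumes the block has one site per line in the form `name ( start) SITE weight`, as in the motif listings in this file. The header layout `>name_site_start` is a hypothetical choice for illustration and is not claimed to match what the -print_fasta switch produces.

```python
import re

# One site line: sequence name, "( start)", the site itself, then a weight.
SITE_RE = re.compile(r"^(\S+)\s+\(\s*(\d+)\)\s+([A-Za-z]+)\s+\S+$")


def blocks_to_fasta(block_text):
    """Convert the site lines of one BLOCKS-format motif to FASTA-style text."""
    records = []
    for raw in block_text.splitlines():
        line = raw.strip()
        # Skip blank lines, the "BL" header line, and the "//" terminator.
        if not line or line == "//" or line.split()[0] == "BL":
            continue
        match = SITE_RE.match(line)
        if match:
            name, start, site = match.groups()
            records.append(f">{name}_site_{start}\n{site}")
    return "\n".join(records)


example = """BL   MOTIF 3 width=7 seqs=2
iYGR093W                 (   91) CAGCGCG  1
iYBR035C                 (  524) CAGCGCG  1
//"""

print(blocks_to_fasta(example))
```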
Note: Earlier versions of MEME gave the posterior probabilities--the probability after applying a prior on letter frequencies--rather than the observed frequencies. These versions of MEME also gave the number of possible positions for the motif rather than the actual number of occurrences. The output from these earlier versions of MEME can be distinguished by "n=" rather than "nsites=" in the line preceding the matrix.
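The difference between observed frequencies and posterior probabilities can be seen on a small case. The sketch below counts letters in the two aligned sites of motif 4 to get observed frequencies, then applies a uniform pseudocount as a stand-in for a prior. The uniform pseudocount and both function names are assumptions for illustration only; they are not the Dirichlet prior that MEME itself applies.

```python
from collections import Counter

ALPHABET = "ACGT"

# Aligned sites of motif 4 as listed in this output.
MOTIF_4_SITES = ["GCCAACAGGC", "GCCTACAGGC"]


def observed_frequencies(sites):
    """Per-position letter frequencies: count / number of sites."""
    n = len(sites)
    matrix = []
    for j in range(len(sites[0])):
        counts = Counter(site[j] for site in sites)
        matrix.append([counts[a] / n for a in ALPHABET])
    return matrix


def posterior_frequencies(sites, pseudocount=1.0):
    """Frequencies after adding a uniform pseudocount (illustrative prior)."""
    n = len(sites)
    denom = n + pseudocount * len(ALPHABET)
    matrix = []
    for j in range(len(sites[0])):
        counts = Counter(site[j] for site in sites)
        matrix.append([(counts[a] + pseudocount) / denom for a in ALPHABET])
    return matrix


observed = observed_frequencies(MOTIF_4_SITES)
posterior = posterior_frequencies(MOTIF_4_SITES)
# Position 4 is the only column where the two sites differ (A versus T).
print("observed :", observed[3])
print("posterior:", [round(p, 3) for p in posterior[3]])
```

The observed rows contain exact zeros wherever a letter never occurs, which is the behaviour of the "nsites=" matrices in this file; any prior spreads some probability onto those letters.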