Description and script

Count the number of FASTA sequences (header lines starting with '>') in a multiple-sequence FASTA file:
    grep -c '^>' mySeqs.fa
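This is a quick sanity check on made-up data. The temporary directory and the three sample records are illustrative and not part of the original recipe:

```shell
# Hypothetical sample file in a throwaway directory.
tmp=$(mktemp -d)
printf '>seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\n>seq3 third\nTTAA\n' > "$tmp/mySeqs.fa"

# -c prints the number of matching lines; '^>' only matches header lines.
count=$(grep -c '^>' "$tmp/mySeqs.fa")
echo "$count"   # prints 3
rm -r "$tmp"
```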

Extract one sequence (whose header's first word is exactly '>myAcc') from a multiple-sequence FASTA file ('multSeqFile'):
    awk -v id='myAcc' '/^>/ { p = ($1 == (">" id)) } p' multSeqFile > oneSeqFile
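The sketch below runs an awk-based extraction on a made-up file. The file contains two cases that a sed range pipeline such as `sed -n '/myAcc/, />/p' multSeqFile | sed '$d'` gets wrong. The first is a different ID that merely starts with 'myAcc' ('myAcc2'). The second is 'myAcc' being the last record. On this input, that sed pipeline would also print the myAcc2 record and the header after it, and `sed '$d'` would drop the final TTTT line. The awk version prints only the target record:

```shell
tmp=$(mktemp -d)
# Hypothetical input: 'myAcc2' shares a prefix with the target, and the
# target 'myAcc' is the last record in the file.
printf '>myAcc2 decoy\nAAAA\n>other\nCCCC\n>myAcc target\nGGGG\nTTTT\n' > "$tmp/multSeqFile"

# On each header line, p becomes 1 only if the first word is exactly ">myAcc"
# and 0 otherwise; the bare pattern 'p' prints every line while p is 1.
awk -v id='myAcc' '/^>/ { p = ($1 == (">" id)) } p' \
  "$tmp/multSeqFile" > "$tmp/oneSeqFile"

result=$(cat "$tmp/oneSeqFile")
printf '%s\n' "$result"   # the '>myAcc target' header and its two sequence lines
rm -r "$tmp"
```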

Sort the lines of a comma-delimited file by the 6th field in text order, then by the 1st field in reverse numerical order:
    sort -t, -k 6,6 -k 1,1nr fileToSort
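The following run on a few made-up rows shows how the two keys interact. Only fields 1 and 6 matter, and the 'x' columns are filler. Rows with the same 6th field end up ordered by the 1st field, largest number first:

```shell
tmp=$(mktemp -d)
# Hypothetical CSV rows: field 1 is numeric, field 6 is text.
printf '%s\n' \
  '5,x,x,x,x,beta' \
  '10,x,x,x,x,alpha' \
  '2,x,x,x,x,alpha' \
  '7,x,x,x,x,beta' > "$tmp/fileToSort"

# -t, sets the delimiter; -k 6,6 compares field 6 only (text order);
# ties are broken by -k 1,1nr: field 1 only, numeric, reversed.
sorted=$(sort -t, -k 6,6 -k 1,1nr "$tmp/fileToSort")
printf '%s\n' "$sorted"
rm -r "$tmp"
```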

Print lines that match a pattern ('myPattern'):
    grep 'myPattern' myFile

Print lines that don't match a pattern ('myPattern'):
    grep -v 'myPattern' myFile

Print lines of a tab-delimited file whose 8th field is 10090:
    awk -F '\t' '$8 == 10090' myFile

Print fields 1, 2 and 3 of a tab-delimited file when the 4th field contains '99':
    awk -F '\t' '$4 ~ /99/ { print $1 "\t" $2 "\t" $3 }' myFile
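Here is a small check on made-up data. It uses awk's output field separator, `OFS`, so that each comma in `print` emits a tab. The output is the same as concatenating "\t" by hand. Note that `$4 ~ /99/` is a regular-expression (substring) match, so a value such as 'x99y' qualifies:

```shell
tmp=$(mktemp -d)
# Hypothetical tab-delimited rows; only the first has '99' inside field 4.
printf 'a\tb\tc\tx99y\te\nd\te\tf\t100\tg\n' > "$tmp/myFile"

# OFS='\t' makes each comma in print emit a tab between fields.
out=$(awk -F '\t' -v OFS='\t' '$4 ~ /99/ { print $1, $2, $3 }' "$tmp/myFile")
printf '%s\n' "$out"
rm -r "$tmp"
```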

Add the text 'lcl|' after the '>' of each header line to format a FASTA file for BLAST indexing:
    sed 's/^>/>lcl|/' mySeqs.fa
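The next example applies the substitution to a made-up two-record file. The '^>' anchor limits it to the '>' that opens each header line. sed writes to standard output, so redirect it to keep the result:

```shell
tmp=$(mktemp -d)
# Hypothetical two-record FASTA file.
printf '>seq1 first\nACGT\n>seq2\nGGCC\n' > "$tmp/mySeqs.fa"

# '^>' restricts the substitution to header lines;
# the '|' in the replacement is a literal character.
out=$(sed 's/^>/>lcl|/' "$tmp/mySeqs.fa")
printf '%s\n' "$out"
rm -r "$tmp"
```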

Find all files ending in .pl and copy them to the 'Perl_archive' directory (skipping files already in that directory):
    find . -type f -name '*.pl' ! -path './Perl_archive/*' -exec cp {} Perl_archive/ \;
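The following sketch runs in a throwaway directory tree; all names except Perl_archive are made up. Excluding ./Perl_archive/* keeps find from trying to copy already-archived files onto themselves when the archive sits inside the searched tree. Note that cp flattens the tree, so two scripts with the same file name in different directories would overwrite each other in Perl_archive:

```shell
tmp=$(mktemp -d)
# Hypothetical layout: two Perl scripts at different depths plus a non-match.
mkdir -p "$tmp/Perl_archive" "$tmp/src/util"
touch "$tmp/src/a.pl" "$tmp/src/util/b.pl" "$tmp/notes.txt"

# Run from inside $tmp (in a subshell) so the './Perl_archive/*' exclusion matches.
archived=$(
  cd "$tmp" &&
  find . -type f -name '*.pl' ! -path './Perl_archive/*' \
    -exec cp {} Perl_archive/ \; &&
  ls Perl_archive
)
printf '%s\n' "$archived"   # a.pl and b.pl
rm -r "$tmp"
```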

Remove HTML tags, including tags that span several lines:
    sed -e :a -e 's/<[^>]*>//g;/</N;//ba' myFile.html
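Below is a small demonstration on made-up HTML. The `:a` label, the `N` command (which appends the next line while an unclosed '<' remains) and the branch back to `:a` are what allow a tag split across two lines to be removed. This is a text filter, not an HTML parser. For example, a literal '<' in running text makes it join the following lines and delete everything up to the next '>':

```shell
# Hypothetical input: the <a ...> tag is split over two lines.
out=$(printf '<p class="x">Hello <b>world</b></p>\n<a\nhref="y">link</a>\n' |
  sed -e :a -e 's/<[^>]*>//g;/</N;//ba')
printf '%s\n' "$out"   # "Hello world" then "link"
```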

Print each line that contains a string ('ABC99'), together with the 2 lines before it and the 3 lines after it:
    grep -B2 -A3 'ABC99' myFile

Convert lowercase bases (a, c, g, t) into 'n' using the 'tr' command (note that this also rewrites lowercase letters in '>' header lines):
    tr acgt nnnn < softmasked_sequence.fa > hardmasked_sequence.fa
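If header lines may contain lowercase a, c, g or t, a sed alternative leaves them untouched. The address `/^>/!` selects every line that does not start with '>', and the `y` command transliterates characters the same way tr does. The sample file is made up:

```shell
tmp=$(mktemp -d)
# Hypothetical soft-masked file whose header contains lowercase letters.
printf '>chr1 gene cluster\nACGTacgtNN\nttttAAAA\n' > "$tmp/softmasked_sequence.fa"

# Apply y/acgt/nnnn/ (a->n, c->n, g->n, t->n) only to non-header lines.
sed '/^>/!y/acgt/nnnn/' "$tmp/softmasked_sequence.fa" > "$tmp/hardmasked_sequence.fa"
out=$(cat "$tmp/hardmasked_sequence.fa")
printf '%s\n' "$out"
rm -r "$tmp"
```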

Remove the version number (e.g. '.1') from the end of each sequence accession in a list:
    sed 's/\.[0-9][0-9]*$//' accsWithVersion > accsOnly
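Here is a quick check on made-up accession strings. The '$' anchor means only a final '.digits' suffix is removed, and entries without a version pass through unchanged. `[0-9][0-9]*` is the portable way to write "one or more digits" in a basic regular expression:

```shell
tmp=$(mktemp -d)
# Hypothetical accessions, one per line; the last one has no version.
printf '%s\n' 'ABC123.1' 'XYZ_0042.12' 'NOVERSION' > "$tmp/accsWithVersion"

# Strip a trailing dot followed by one or more digits.
sed 's/\.[0-9][0-9]*$//' "$tmp/accsWithVersion" > "$tmp/accsOnly"
result=$(cat "$tmp/accsOnly")
printf '%s\n' "$result"
rm -r "$tmp"
```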