WIBR BaRC Script library

Script name	Description	Sample input	Sample output	Download
hey.pl	Test Perl on your system			download
rev_comp.pl	Reverse and complement a fasta sequence using EMBOSS's 'revseq' command			download
oligos.pl	Extract oligos from a sequence and analyze them			download
patscan_batch.pl	Run patscan (to search for a pattern) on every sequence in a directory			download
puzzle_helper.html	Web-based interface for the puzzle.cgi script			NA
parse_genbank.pl	Simple GenBank nucleotide report parser using regular expressions	input	output	download
get_web_data.pl	Use LWP to automate web file access	input	output	download
draw_figure.pl	Draw a PNG figure using the GD module	input	output	download
fastaToGenbank_2.pl	Sequence conversion with BioPerl	input	output	download
iterate_seqs.pl	Split a file of multiple sequences into separate files and modify the format			download
genbank_parse.pl	Parse GenBank sequence features with BioPerl	input	output	download
manipulate_seq.pl	Manipulate a sequence with BioPerl	input	output	download
blast_parse_0.pl	Parse BLAST output files with BioPerl's SearchIO	input	output	download
blat_sort_output.pl	Sort BLAT output to select only the best hit(s) for each query sequence	input	output	download
merge_blat_output.pl	Merge lines of BLAT output to one line for each query sequence	input	output	download
alignPairs.pl	Align a list of pairs of sequences using different algorithms	input	outputs 1 2 3	download
get_Excel_file_info_by_dir.pl	Extract data from a set of Excel files in a directory	input	output	download

Script and description

Count the number of fasta sequences in a multiple-sequence fasta file:

grep ">" mySeqs.fa | wc -l
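As a small worked check, the sketch below builds a three-record fasta file and counts its header lines with `grep -c`, which reports the number of matching lines directly and so avoids the pipe to `wc`. The file name `demo_seqs.fa` and its records are made up for illustration.

```shell
# Build a small hypothetical fasta file (name and records are invented for this example).
tmpdir=$(mktemp -d)
printf '>seq1 first\nACGT\n>seq2 second\nGGCC\nTTAA\n>seq3 third\nAC\n' > "$tmpdir/demo_seqs.fa"

# Count lines that start with ">"; -c prints the number of matching lines.
count=$(grep -c '^>' "$tmpdir/demo_seqs.fa")
echo "$count"

rm -rf "$tmpdir"
```

Anchoring the pattern with `^` restricts the count to lines that begin with ">", which is where fasta headers start.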

Extract one sequence (with ID 'myAcc') from a multiple-sequence fasta file ('multSeqFile'):

sed -n '/myAcc/, />/p' multSeqFile | sed '$d' > oneSeqFile
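One caveat with the sed approach: when the wanted sequence is the last record in the file, the range runs to the end of the file and the trailing `sed '$d'` then removes the final sequence line. A possible alternative, sketched here with awk and an invented two-record test file, keeps a flag that is reset at every header line. Like the sed version, it matches the ID anywhere in the header, so 'myAcc' would also match 'myAcc2'.

```shell
# Hypothetical test file in which the target record is the last one.
tmpdir=$(mktemp -d)
printf '>acc1 first\nAAAA\n>myAcc target\nCCCC\nGGGG\n' > "$tmpdir/multSeqFile"

# Print the record whose header contains the ID, up to (not including) the next header.
awk -v id="myAcc" '
    /^>/ { keep = (index($0, id) > 0) }   # at each header, decide whether to keep this record
    keep                                  # print every line while the flag is set
' "$tmpdir/multSeqFile" > "$tmpdir/oneSeqFile"

extracted=$(cat "$tmpdir/oneSeqFile")
echo "$extracted"

rm -rf "$tmpdir"
```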

Sort fields in a comma-delimited file (6th field by text order then 1st field in reverse by numerical order):

sort -t, -k 6,6 -k 1,1nr fileToSort
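To see the two sort keys at work, the following sketch runs the same command on a four-line comma-delimited file invented for this example. `LC_ALL=C` is added only to make the text ordering reproducible across systems; it is not part of the original command.

```shell
# Hypothetical input: 6 comma-separated fields, species in field 6, a number in field 1.
tmpdir=$(mktemp -d)
printf '3,a,b,c,d,mouse\n10,a,b,c,d,human\n2,a,b,c,d,mouse\n7,a,b,c,d,human\n' > "$tmpdir/fileToSort"

# Primary key: field 6 as text. Secondary key: field 1, numeric, descending.
LC_ALL=C sort -t, -k 6,6 -k 1,1nr "$tmpdir/fileToSort" > "$tmpdir/sorted"

# Collect the first field of each sorted line on one line to show the resulting order.
order=$(cut -d, -f1 "$tmpdir/sorted" | paste -sd' ' -)
echo "$order"

rm -rf "$tmpdir"
```

The 'human' lines come first (10, then 7), followed by the 'mouse' lines (3, then 2).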

Print lines that match a pattern ('myPattern'):

grep myPattern myFile

Print lines that don't match a pattern ('myPattern'):

grep -v myPattern myFile

Print lines of a tab-delimited file where the 8th field is 10090:

awk -F "\t" '$8 == 10090 { print $0 }' myFile

Print fields 1, 2, and 3 of a tab-delimited file when the 4th field contains '99':

awk -F "\t" '$4 ~ /99/ {print $1"\t"$2"\t"$3}' myFile
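A variant of this command sets awk's output field separator (`OFS`) so the printed fields can be listed with commas instead of concatenating "\t" by hand. The sketch below runs it on a two-line tab-delimited file whose name and contents are invented for illustration.

```shell
# Hypothetical tab-delimited input; only the first line has '99' in field 4.
tmpdir=$(mktemp -d)
printf 'geneA\t10\t20\tid99x\textra\ngeneB\t30\t40\tid12\textra\n' > "$tmpdir/myFile"

# With OFS set to a tab, "print $1, $2, $3" joins the fields with tabs.
picked=$(awk -F "\t" -v OFS="\t" '$4 ~ /99/ { print $1, $2, $3 }' "$tmpdir/myFile")
echo "$picked"

rm -rf "$tmpdir"
```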

Add text ('lcl|') after the ">" to format a fasta file for BLAST indexing:

sed 's/>/>lcl|/' mySeqs.fa

Find all files ending in .pl and copy them to the 'Perl_archive' directory:

find . -name \*.pl -exec cp {} Perl_archive/ \;

Remove HTML tags:

sed -e :a -e 's/<[^>]*>//g;/</N;//ba' myFile.html

Print lines, from 2 lines before to 3 lines after, when a word ("ABC99") is matched:

grep -B2 -A3 "ABC99" myFile

Convert lowercase letters (a, c, t, g) into 'n' using the 'tr' command:

tr acgt nnnn < softmasked_sequence.fa > hardmasked_sequence.fa
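Because `tr` works on every character of its input, it also rewrites any lowercase a, c, g, or t in the fasta header lines (for example, '>chr1' becomes '>nhr1'). If the headers need to stay intact, one possible alternative, which swaps `tr` for `sed`, is to skip lines that start with ">". The input below is invented for illustration.

```shell
# Hypothetical soft-masked input with lowercase letters in both header and sequence.
tmpdir=$(mktemp -d)
printf '>chr1 test\nACGTacgtNN\nttAAcc\n' > "$tmpdir/softmasked_sequence.fa"

# On every line that does NOT start with ">", replace lowercase a, c, g, t with n.
sed '/^>/!s/[acgt]/n/g' "$tmpdir/softmasked_sequence.fa" > "$tmpdir/hardmasked_sequence.fa"

masked=$(cat "$tmpdir/hardmasked_sequence.fa")
echo "$masked"

rm -rf "$tmpdir"
```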

Remove version numbers (e.g., '.1') from the end of each accession in a list of sequence accessions:

sed 's/\.[0-9][0-9]*$//' accsWithVersion > accsOnly
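As a quick check, the sketch below strips versions from a short accession list. The pattern used here is anchored to the end of the line with `$` and written without the `\+` operator, which not every sed supports, so only a trailing '.digits' suffix is removed. The accessions and file name are placeholders invented for the example.

```shell
# Hypothetical accession list; the last entry has no version suffix.
tmpdir=$(mktemp -d)
printf 'NM_000546.5\nNP_001119584.1\nXM_12345\n' > "$tmpdir/accsWithVersion"

# Remove a dot followed by one or more digits, only at the end of each line.
sed 's/\.[0-9][0-9]*$//' "$tmpdir/accsWithVersion" > "$tmpdir/accsOnly"

accs=$(cat "$tmpdir/accsOnly")
echo "$accs"

rm -rf "$tmpdir"
```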