HOMEWORK 5
- Modify the script patscan_batch.pl to run the following emboss programs on every sequence in a folder.
- Download the liverseq.tar file, transfer it to fladda, and untar it in your fladda home directory. If you use X-windows on fladda, you can download liverseq.tar directly to fladda with Netscape. See the demo from the unix class to find out how to decompress a file. You should get 6 sequence files in a new directory called liverseq.
- Create the reverse complement of each sequence in the liverseq directory, and save each one in a file whose name ends with .revcomp (for example, if the original file is T50888.tfa, the new one is T50888.revcomp) in a new directory called RevComp. You need to make the RevComp directory with a unix command before running the modified script. You can use the revseq program in emboss.
- Find the inverted repeats in each sequence in the liverseq directory, and save them in files ending with .palindrome in a new directory called Palindrome. You need to make the Palindrome directory with a unix command before running the modified script. You can use the palindrome program in emboss with default parameters, except that the number of mismatches allowed is 2. The -overlap option takes no argument.
- Translate each nucleic acid sequence in the liverseq directory in all 3 forward frames, and save the result in a file ending with .peptide in a new directory called Peptide. You need to make the Peptide directory with a unix command before running the modified script. You can use the transeq program in emboss.
- Figure out a way to make your script dynamic, so that it takes the input and output directories from the command prompt instead of having the directory names specified inside the patscan_batch.pl script.
- Modify the script oligo.pl to find only the oligos whose GC% is between 40 and 60. Print the result in a table in which the first column is the starting position of the oligo, the second column is the oligo sequence, and the third column is its GC%. Each row in the table is tab delimited. This is a sample of the table. You can use your revised script to find the 20-mer oligos between positions 10 and 100 in the sequence.
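
The GC% filter in the last item could be sketched roughly as follows. This is only an illustration under stated assumptions, not the contents of oligo.pl: the sub names gc_percent and gc_filtered_oligos and the short demonstration sequence are made up here, positions are assumed to be 1-based, the 40 and 60 limits are assumed to be inclusive, and each oligo is assumed to lie entirely inside the requested position range.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percentage of G and C bases in a sequence (case-insensitive).
sub gc_percent {
    my ($seq) = @_;
    return 0 unless length $seq;
    # tr with an empty replacement list only counts; it does not modify $seq.
    my $gc = ($seq =~ tr/GCgc//);
    return 100 * $gc / length $seq;
}

# Return [start, oligo, GC%] rows for every oligo of $len bases that lies
# entirely within 1-based positions $from..$to and whose GC% is in
# the inclusive range $min..$max. (Hypothetical helper, for illustration.)
sub gc_filtered_oligos {
    my ($seq, $len, $from, $to, $min, $max) = @_;
    $to = length $seq if $to > length $seq;
    my @rows;
    for (my $start = $from; $start + $len - 1 <= $to; $start++) {
        my $oligo = substr($seq, $start - 1, $len);
        my $gc    = gc_percent($oligo);
        push @rows, [$start, $oligo, $gc] if $gc >= $min && $gc <= $max;
    }
    return @rows;
}

# Made-up demonstration sequence, not one of the liverseq files.
my $seq = 'ATGCGCATTA';

# One tab-delimited row per oligo: start position, sequence, GC%.
for my $row (gc_filtered_oligos($seq, 4, 1, 10, 40, 60)) {
    printf "%d\t%s\t%.1f\n", @$row;
}
```

For the homework itself, the call would presumably look like gc_filtered_oligos($seq, 20, 10, 100, 40, 60), with $seq read from the input file the way oligo.pl already does it.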