See /education/bioinfo2006/db4bio/ for the course page.
See the solutions to this exercise to check your work.
To connect to MySQL on hebrides, you have two choices:
a. You can connect directly from your laptop or desktop computer to MySQL on hebrides without logging on to hebrides. To do this, you need MySQL on your computer, which is the case for the class laptops. If you need to do this on your own computer, go to the most recent MySQL download page, select your operating system, download, and install. This will get you the MySQL database management system (DBMS), which has the 'mysql' command to connect you to another computer (or you can create a database on your own computer).
b. You can first log on to hebrides and then connect to MySQL. To do this, you need to have or install a ssh (secure shell) application. See Connecting to tak or hebrides or (if using Windows) install Cygwin as described on Using Cygwin/X for X Windows on Tak or Hebrides.
Note that you can install and use MySQL on your own computer. We have chosen to use MySQL on a Linux computer so everyone in the class can access the same database. If you ever use MySQL on your own computer, the MySQL queries would work just the same (but be perhaps somewhat slower).
1a. Get to the Unix prompt (Mac OS X) or Unix-like prompt (Windows) on your laptop. If you're using Mac OS X, open the X11 terminal (Applications > Utilities > X11), which is also on the dock. If you're using Windows, open Cygwin, which is a "Unix emulator", meaning that it makes your Windows computer act as if it's a Unix computer.
In either case, you now have a terminal, essentially a place to send commands to your computer, including the possibility of connnecting to a remote computer.
Change directories ("cd", a Unix command) to go to the desktop:
cd "$USERPROFILE/Desktop" (for Windows)
cd Desktop (for Macintosh)
1b. From the terminal on your laptop, connect to Hebrides using the secure shell (ssh) command:
ssh -l username hebrides.wi.mit.edu
and you'll be prompted for your password.
3. To access MySQL on Hebrides, use this command to connect to the db4bio database:
mysql -u username -h hebrides.wi.mit.edu -D database -pwith the following options:
-u username (replace with your MySQL username [not necessarily your hebrides username])
-h host (where the "host" is hebrides.wi.mit.edu; not necessary if you're already connected to hebrides)
-D database (which takes you directly to the database we'll be using ("db4bio"); optional)
-p password (for which you will be prompted)
ex: mysql -u guest99 -h hebrides.wi.mit.edu -D db4bio -p
Try some queries in increasing level of complexity, starting with some easy examples to show correct syntax.
Note: MySQL keywords (SELECT, FROM, etc.), data fields, and search strings are case-insensitive but other words (table names) ARE case sensitive.
SQL querying can be done any one of several ways:
mysql> source query_1.sql;
mysql -u username -h hebrides.wi.mit.edu -D db4bio -p < query_1.sql
Convert each of the following SQL queries into a question, and convert each of the following questions into SQL.
They are listed in order of increasing difficulty, and some are quite difficult.
You may wish to consult the answers.
SELECT * from Sources;
2. What are some example GenBank IDs (accessions) and corresponding sequence descriptions?
SELECT count(*) FROM LocusLinks;(to get number of tuples in table)
4. How many different Affy IDs are in the expression data?
SELECT description, species FROM LocusDescr WHERE linkId = 655;Note that 655 is of type integer (int), so it doesn't need quotes.
6. What's the expression level of Affy ID 33659_at in experiment hs-liv-1?
SELECT * FROM LocusDescr WHERE description LIKE "phosphatase%";Note that the "%" (wildcard) matches any character(s).
8. What are all the gene descriptions, along with their GenBank IDs, containing the phrase "growth factor"?
SELECT * from Data WHERE exptId = "mm-hrt-1" and level > 6000 ORDER BY level DESC;
10. What Gene Ontology terms (and corresponding accessions) contain the phrase "transcription factor"? Put your answer in alphabetical order of terms.
SELECT Data.affyId, Targets.gbId, Data.level FROM Data, Targets WHERE Data.affyId = Targets.affyId ORDER BY Data.level DESC LIMIT 10;
12. What Affy probes correspond to target sequences with the phrase "interleukin" in their description?
SELECT Data.affyId, Targets.gbId, UniSeqs.uId, Data.level FROM Data, Targets, UniSeqs WHERE Data.affyId = Targets.affyId AND Targets.gbId = UniSeqs.gbId ORDER BY level DESC LIMIT 10;
14. Make a table of ten affyId, UnigeneID, and Unigene descriptions in reverse alphabetical order of Unigene descriptions.
SELECT affyId, AVG(level) AS average FROM Data GROUP BY affyId ORDER BY level DESC LIMIT 10;Note that AVG() is an aggregate [like COUNT(), MIN(), MAX(), and SUM()] and "AS" creates a field with the requested name. Since you're combining rows, you have to use GROUP to show how to do it.
16. What is the average expression level of each of the six experiments?
SELECT Data.affyId, Data.level, Data.exptId, DataCopy.affyId, DataCopy.level, DataCopy.exptId FROM Data, Data DataCopy WHERE Data.level > 10 * DataCopy.level AND Data.affyId=DataCopy.affyId AND Data.affyId NOT LIKE "AFFX%" LIMIT 10;(To compare rows of the same table, we copy the table and compare the table to its copy.)
18. List ten non-control human Affy IDs (IDs without "AFFX") with higher expression in liver than in heart.
SELECT Data.affyId, Data.level AS Heart_level, DataCopy.level AS Brain_level, Data.level - DataCopy.level AS Difference FROM Data, Data DataCopy WHERE Data.affyId= DataCopy.affyId AND Data.exptId = "hs-hrt-1" AND DataCopy.exptId = "hs-cer-1" ORDER BY Difference DESC LIMIT 10;
SELECT Data.affyId, Targets.gbId, LocusLinks. linkId, LocusDescr.description, Data.level as Heart_level, DataCopy.level as Brain_level, Data.level - DataCopy.level AS Difference FROM Data, Data DataCopy, LocusLinks, LocusDescr, Targets WHERE Data.affyId= DataCopy.affyId AND Data.exptId = "hs-hrt-1" AND DataCopy.exptId = "hs-cer-1" AND Data.affyId = Targets.affyId AND Targets.gbId = LocusLinks.gbId AND LocusLinks.linkId = LocusDescr.linkId ORDER BY Difference DESC LIMIT 10;