SQL, SQL, SQL

Bioinformatics and genomics sometimes (always?) bring together two very different groups: biologists and computer scientists. Often they are biologists who know something about computers, or computer scientists who know something about biology, and sometimes they are computational biologists who do both. We (OpenHelix scientists) train biologists who want to use genomics tools that computational biologists (or teams of computer scientists and biologists) have developed. Sometimes those biologists want to go further, and sometimes computer scientists need to learn a bit of biology. So, in that vein…

A lot of databases give you direct query access, and most of these are SQL databases. The UCSC Genome Browser is one example. Its Table Browser can come in handy here, letting you do some amazing stuff (or get burned). You can query either through a form or with a freeform SQL query. The form lets you run a SQL query without actually knowing a tittle of SQL, but if you really want to get fancy, you can write your own query.

If you know SQL, that is. Well, I was about to go find you some sites to help you learn a bit of SQL, so you can search those databases and get that high-impact research article, but Sandra at “Discovering Biology in a Digital World” beat me to it. She suggests a few sites, two of which I really like: W3Schools, which gives you a basic intro to SQL, and sqlzoo, a great place to practice what you’ve learned. Learn some, practice some, and then do a Table Browser search (or, well… just do a Table Browser search using the form, and then you’d never have to learn SQL :). Btw, the comments at the DBDW link above have an interesting discussion about what is and isn’t a programming language :D.
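To show roughly what a freeform query looks like, here is a minimal sketch that uses Python's built-in sqlite3 module on a throwaway in-memory table. The table name refGene and its columns are loosely borrowed from UCSC's gene tables, but this schema is a simplified assumption and the rows are invented. Against the real Table Browser or a real database, you would use that database's actual tables and columns.

```python
import sqlite3

# Throwaway in-memory database. The schema loosely mimics a few refGene
# columns (assumption: simplified, NOT the real UCSC schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE refGene (name TEXT, chrom TEXT, strand TEXT, "
    "txStart INTEGER, txEnd INTEGER, name2 TEXT)"
)

# Invented rows, for illustration only.
rows = [
    ("NM_TOY1", "chr1", "+", 1000, 5000, "geneA"),
    ("NM_TOY2", "chr1", "-", 20000, 26000, "geneB"),
    ("NM_TOY3", "chr2", "+", 3000, 9000, "geneC"),
]
conn.executemany("INSERT INTO refGene VALUES (?, ?, ?, ?, ?, ?)", rows)

# A freeform query: transcripts on one chromosome, longest first.
# The "?" placeholder keeps the chromosome value out of the SQL string itself.
query = """
    SELECT name2, chrom, txStart, txEnd, txEnd - txStart AS length
    FROM refGene
    WHERE chrom = ?
    ORDER BY length DESC
"""
results = conn.execute(query, ("chr1",)).fetchall()
for row in results:
    print(row)
```

The WHERE and ORDER BY clauses do the kind of filtering and sorting you would otherwise set up through the form.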

But what if you are a computer scientist who needs some biology quickies? Well, there is the ‘basic biology concepts’ post from a while back. Want to learn something about genetic conditions? How about OMIM or GeneTests? You’ll need some understanding of genetics to really delve into those, and they aren’t really tutorials; they are databases. Of course, you could watch one of the Top Ten Videos about Genetic Conditions on YouTube, or learn a bit more about specific genes at a more basic level at the Gene Wiki.

6 thoughts on “SQL, SQL, SQL”

  1. Chris Lasher

    Knowing the basics of SQL is a “Good Thing”. I’d like to also point out that if you already know a high-level language such as Perl, Python, or Ruby, you could alternatively use a database interface for that language (e.g. Perl DBI for Perl, SQLAlchemy for Python, or Ruby DBI for Ruby) which abstracts the database to a data structure (typically an object).

    Database interfaces let you interact with databases using a language you’re already comfortable with, and they remove the need to write raw SQL statements, which can grow tedious. They also help manage complexity by being full-fledged data structures. Additionally, you will often need to analyze the data you retrieve from your queries. Using a database interface means your data is already loaded into a data structure of your preferred programming language, saving you time and effort.

  2. Jennifer

    A post about the relationship between biologists and computer scientists is very timely, given an issue currently being discussed on the biocurator mailing list. For a few months there has been discussion in the biocurator group about forming a professional society for biocurators. This week Pascale Gaudet asked the biocurator mailing list whether joining the International Society for Computational Biology (ISCB) as an interest group would be a viable way to create a professional society for biocurators.

  3. Chris

    It’s a few years old but I still highly recommend Aaron Mackey’s excellent introduction to SQL specifically tuned for biologists.

    http://www.people.virginia.edu/~wrp/papers/ismb02_sql.pdf

    For folks wishing to get tricky with their own SQL queries, various databases have open SQL ports you can use to issue freeform queries — no need to download the database (which can sometimes be rather large). Ensembl and FlyBase both provide open ports.

    You can query GO annotations in SQL via the GOOSE HTML interface, see
    http://www.geneontology.org/GO.database.shtml
    for details. There are some example queries pre-programmed to help you get round the kind of head-twisting required for some of the more complex ontology-based queries.

  4. boomydebby

    Thanks for the comments made on this site.

    I am a computer scientist with very little knowledge of biology, but I’m reading more daily. Can someone please tell me how to embed graphical images in a webpage, as seen in the UCSC browser? I am currently on a project where I need to represent the database graphically on a webpage.

    Any comment on this will be greatly appreciated. Someone told me about Crystal Reports, but I’m not comfortable with that suggestion.

  5. Mary

    Hi debby–

    I’m not sure I understand what you need. Do you just need a screenshot of a region? (We love TechSmith’s Snagit for screenshots.)

    Or do you need an image of a UCSC region? You can get one very quickly with the PDF/PS link at the top of any UCSC page, which gives you a PostScript file you can use any way you need.

    Or are you saying you have data that you need to display graphically on the UCSC browser? If that is the question, I would encourage you to watch our advanced tutorial, paying attention to the custom tracks section. Custom tracks are a way to display your own data on the browser.

    Tutorial is here: http://www.openhelix.com/ucsc
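Chris Lasher’s first comment describes database interfaces that turn query results into native data structures. The sketch below illustrates that idea with Python’s built-in sqlite3 module and a dataclass. It is not SQLAlchemy or Perl DBI. The Transcript class, the load_transcripts helper, the transcripts table, and its rows are all hypothetical and exist only for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Transcript:
    """Plain Python object standing in for one database row (hypothetical class)."""
    name: str
    chrom: str
    tx_start: int
    tx_end: int

    @property
    def length(self) -> int:
        return self.tx_end - self.tx_start


def load_transcripts(conn: sqlite3.Connection, chrom: str) -> list[Transcript]:
    """Run the SQL once and return Python objects instead of raw tuples (hypothetical helper)."""
    cursor = conn.execute(
        "SELECT name, chrom, txStart, txEnd FROM transcripts "
        "WHERE chrom = ? ORDER BY txStart",
        (chrom,),
    )
    return [Transcript(*row) for row in cursor]


# Toy in-memory table with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcripts (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER)")
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?, ?, ?)",
    [("toyA", "chr1", 100, 900), ("toyB", "chr1", 2000, 2500), ("toyC", "chr2", 50, 60)],
)

transcripts = load_transcripts(conn, "chr1")
# Downstream analysis is now ordinary Python, with no further SQL.
total_length = sum(t.length for t in transcripts)
print([t.name for t in transcripts], total_length)
```

A full ORM such as SQLAlchemy automates the mapping between rows and objects that is written by hand here.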
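The second comment from Chris mentions the “head-twisting” needed for ontology-based queries. One generic way to walk a parent/child hierarchy in SQL is a recursive common table expression, sketched here in SQLite. The two-table schema below is a heavily simplified assumption, and the term names are invented. The real GO database schema differs and may precompute these paths for you, so check its documentation before adapting this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: terms plus parent -> child edges.
conn.executescript("""
    CREATE TABLE term (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE term2term (parent_id INTEGER, child_id INTEGER);
    INSERT INTO term VALUES (1, 'root term'), (2, 'child A'), (3, 'child B'),
                            (4, 'grandchild A1'), (5, 'unrelated term');
    INSERT INTO term2term VALUES (1, 2), (1, 3), (2, 4);
""")

# Recursive CTE: seed with the direct children of one term, then keep following
# parent -> child edges. UNION (not UNION ALL) drops duplicate ids, which also
# stops the recursion if the graph ever contains a cycle.
descendants_sql = """
    WITH RECURSIVE descendants(id) AS (
        SELECT child_id FROM term2term WHERE parent_id = ?
        UNION
        SELECT t2t.child_id
        FROM term2term AS t2t
        JOIN descendants AS d ON t2t.parent_id = d.id
    )
    SELECT term.name
    FROM term JOIN descendants ON term.id = descendants.id
    ORDER BY term.name
"""
names = [row[0] for row in conn.execute(descendants_sql, (1,))]
print(names)
```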
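Mary’s reply points to custom tracks for showing your own data in the UCSC browser. A common custom track format is BED: tab-separated chrom, chromStart, chromEnd, and name, with 0-based, half-open coordinates, optionally preceded by a `track` line. This sketch formats a few features that way. The function name and the example features are hypothetical, and the tutorial linked above covers the full set of options and how to upload the result.

```python
def to_bed_custom_track(features, track_name, description):
    """Format (chrom, start, end, name) tuples as BED custom track text (hypothetical helper).

    Assumes start/end are already 0-based, half-open BED coordinates.
    """
    # visibility=2 requests "full" display mode.
    lines = [f'track name="{track_name}" description="{description}" visibility=2']
    for chrom, start, end, name in features:
        if end < start:
            raise ValueError(f"feature {name!r} ends before it starts")
        lines.append(f"{chrom}\t{start}\t{end}\t{name}")
    return "\n".join(lines) + "\n"


# Invented example features.
features = [("chr1", 1000, 5000, "regionA"), ("chr1", 20000, 26000, "regionB")]
track_text = to_bed_custom_track(features, "myRegions", "Example regions")
print(track_text, end="")
```

The resulting text could then be pasted or uploaded through the browser’s custom tracks page.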
