Author Archives: Trey

Video Tip of the Week: LineUp, data ranking visualization tool


Caleydo, from the Institute of Computer Graphics and Vision, is a suite of genomics and biomolecular visualization tools. As the project developers state, its strength is “the visualization of interdependencies between multiple datasets.” This week’s tip is a video introducing one of their newest tools: LineUp.

LineUp is an open source, scalable visualization technique for ranking systems that combine several disparate ranks. LineUp was developed to

address [the] need to understand the ranking of genes by mutation frequency and other clinical parameters in a group of patients… It is an ideal tool to create and visualize complex combined scores of bioinformatics algorithms.

Yet it can be used for many different ranking systems, whether to view rankings of universities or restaurants, or ranked datasets from various sources. In the video above, the developers explain how to use LineUp to visualize the ranking of universities based on several different criteria, such as reputation, student-to-faculty ratio and many others. The tool allows users to assign weights to different parameters to create a custom ranking.
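The weighted-combination idea at the heart of LineUp is simple enough to sketch in a few lines. The following is a minimal toy illustration, not LineUp’s actual code; the universities and attribute values are invented for the example. Each attribute is min-max normalized, then combined with user-chosen weights, and changing the weights re-ranks the items:

```python
def normalize(values):
    """Min-max normalize a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def rank(items, attributes, weights):
    """Return item names sorted by weighted combined score (best first).

    items      -- list of dicts, one per item, with a value per attribute
    attributes -- attribute names to combine
    weights    -- one weight per attribute (same order)
    """
    normalized = {a: normalize([it[a] for it in items]) for a in attributes}
    scores = []
    for i, it in enumerate(items):
        score = sum(w * normalized[a][i] for a, w in zip(attributes, weights))
        scores.append((it["name"], score))
    return [name for name, _ in sorted(scores, key=lambda s: -s[1])]

universities = [
    {"name": "A", "reputation": 90, "staff_ratio": 60},
    {"name": "B", "reputation": 70, "staff_ratio": 95},
    {"name": "C", "reputation": 80, "staff_ratio": 80},
]

# Weighting reputation heavily ranks A first ...
print(rank(universities, ["reputation", "staff_ratio"], [0.8, 0.2]))  # ['A', 'C', 'B']
# ... while weighting staff ratio heavily ranks B first.
print(rank(universities, ["reputation", "staff_ratio"], [0.2, 0.8]))  # ['B', 'C', 'A']
```

LineUp does this interactively, of course, with stacked bars showing how much each attribute contributes to each item’s score.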

You really need to watch the video to understand the power of this visualization tool and its broad applicability. I immediately saw several uses in research, and even in choosing schools for my children. In San Francisco, school assignment is by “lottery,” and you rank the schools by preference. There are so many factors that affect that ranking for parents: distance, academic ranking, teacher-to-student ratio, diversity and several more. I could see this tool as a great way to determine the ranking of our choices. The uses are endless.

Quick Links:




Gratzl, S., Lex, A., Gehlenborg, N., Pfister, H., & Streit, M. (2013). LineUp: visual analysis of multi-attribute rankings. IEEE Transactions on Visualization and Computer Graphics, 19 (12), 2277-86. PMID: 24051794

Video Tip of the Week: BioRxiv, A preprint server for biology

Open access to scientific research has been advocated for a long time, even before the advent of the internet. With the internet, the movement grew: many open access journals have launched, the NIH now requires NIH-funded research to be openly accessible within a year of publication, and the NSF and other agencies are working on similar plans.

As part of that movement toward open access, the “preprint” has also grown. Preprints allow for the fast dissemination of research and the open discussion of results before they’ve gone through peer review. Peer-reviewed research can take weeks, months and sometimes years to be publicly disseminated through publication. In a modern world of fast-developing and changing science, preprint distribution allows much faster access to research.

The most well-known preprint server is arXiv. Started in 1991 at the Los Alamos National Laboratory and later moved to the Cornell University Library, arXiv allows open access preprint dissemination of physics, mathematics and computer science research. By many standards, it has been a success in getting research quickly and openly discussed.

There have been other attempts at preprint servers for biological research, past and present: Nature Precedings (which ceased accepting new manuscripts in 2012), PeerJ Preprints and others.

That a traditional publisher such as Nature (Nature Precedings) made a foray into an open access preprint repository speaks to the need and demand for such a service. The number of biological research preprints in arXiv has also grown rapidly. Just last year, a case was made in PLOS Biology for just such a server for life sciences research by Desjardins-Proulx et al.:

The first and most often discussed advantage of open preprints is speed. The time between submission and the official publication of a manuscript can be measured in months, sometimes in years. For all this time, the research is known only to a select few: colleagues, editors, and reviewers. Thus, the science cannot be used, discussed, or reviewed by the wider scientific community. In a recent blog post, C. Titus Brown noted how posting a paper on arXiv quickly led to a citation (arXiv papers can be cited), and his research was used by another researcher. The current system of hiding manuscripts before acceptance poses problems for both scientists and publishers. Manuscripts that are unknown cannot be used and thus take more time to be cited. It has been shown that high-energy physics, with its high arXiv submission rate, has the highest immediacy among physics and mathematics.

And now we have it. Above you will find the promo video for a new life sciences open access preprint server: bioRxiv. Science has an introductory post about it from November 2013 (two days after it was announced).

bioRxiv is housed and run by Cold Spring Harbor Laboratory. Like arXiv, it is an open access preprint server with similar rules; you can learn more about the specifics (such as journal preprint policies) on the about page. Articles can cover almost any life sciences topic, from biochemistry to zoology, as well as other fields, such as physics, if the research has direct relevance to life science. It will not, however, publish medical research such as clinical trials.

Articles are placed into three categories:

Articles in bioRxiv are categorized as New Results, Confirmatory Results, or Contradictory Results. New Results describe an advance in a field. Confirmatory Results largely replicate and confirm previously published work, whereas Contradictory Results largely replicate experimental approaches used in previously published work but the results contradict and/or do not support it.

The biological research community has asked for it, and here it is. Currently there are only 200 or so manuscripts submitted; a quick search of ‘retrovirus‘ brings up only 3 results. But bioRxiv is only 6 months old. Keep an eye on it; better yet, test it out and submit.

Quick Links:



Desjardins-Proulx, P., White, E., Adamson, J., Ram, K., Poisot, T., & Gravel, D. (2013). The case for open preprints in biology. PLoS Biology, 11 (5). DOI: 10.1371/journal.pbio.1001563

Video Tip of the Week: list of genes associated with a disease

I am currently in Puerto Varas, Chile at an EMBO genomics workshop. The workshop is mainly for grad students and the instructors are, for the most part, alumni of the Bork group. I gave a tutorial on genomics databases.

Anyway, the last two days of the workshop are a challenge: in teams of 3-4, advised by an instructor, students are to develop a list of genes associated with epilepsy. Obviously, this could be a trivial task: just go to OMIM or GeneCards and grab a list. But this challenge requires them to go beyond that, use the available data and make predictions. My team attempted, on my suggestion, some brainstorming techniques to ensure a more creative solution than they could come up with individually or by jumping straight into normal group dynamics. It seemed to work; their solution was quite creative, and we will find out today how well it worked.

That was my long way of saying that, in the process, we came across many databases of gene-disease information. Above you will find a video on rat gene-disease associations from RGD, often used, of course, to investigate human gene-disease associations.

Below you will find a list of some excellent databases and resources to find similar lists:

Gene Association Database






Several NCBI resources

UCSC Genome Browser’s tracks for disease and phenotype

There are several others, I’m sure; if you have a favorite not on this list, please comment.
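Once you have gene lists from several of these databases, one simple way to combine them is to rank genes by how many independent sources report them. This is a minimal sketch of that idea; the database names and gene lists below are invented for illustration, not pulled from the real resources:

```python
from collections import Counter

# Hypothetical gene lists for a disease, as might be exported from
# several gene-disease databases (names and contents are made up).
sources = {
    "db_one":   {"SCN1A", "KCNQ2", "GABRG2"},
    "db_two":   {"SCN1A", "CDKL5"},
    "db_three": {"SCN1A", "KCNQ2"},
}

def rank_by_support(sources):
    """Rank genes by how many independent sources report them."""
    counts = Counter(g for genes in sources.values() for g in genes)
    return counts.most_common()

print(rank_by_support(sources))
# SCN1A is reported by all three sources, so it tops the list.
```

Cross-source agreement is a crude but useful first filter before digging into the evidence behind each association.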

Reference for RGD:
Laulederkind, S.J.F., Hayman, G.T., Wang, S.J., Smith, J.R., Lowry, T.F., Nigam, R., Petri, V., de Pons, J., Dwinell, M.R., & Shimoyama, M. (2013). The Rat Genome Database 2013–data, tools and users. Briefings in Bioinformatics, 14 (4), 520-526. DOI:

Video Tip of the Week: JANE, comparing phylogenies

When I was doing my Ph.D. in the ancient days of Sanger Method sequencing, reading in the results with one hand on the keyboard while reading the GATCs off the gel (and going to the lab in the snow, uphill both ways), my purpose for slogging through all that was to eventually get a phylogeny of the sequences of the retrotransposable elements I was studying. Why did I want that phylogeny? Because I was comparing the phylogeny of the retroelements to that of the species in which they reside. We were attempting to determine whether these retroelements were stable within the taxa lineage (they are) or whether promiscuous horizontal transfer was occurring. We did those comparisons, but it would have been nice to have a ‘cophylogeny reconstruction’ program :D.

There are many situations where similar comparisons of phylogenies are necessary: host-parasite studies, coevolution, etc. Jane is a software package (free with registration) that uses a heuristic approach, “running a genetic algorithm with an internal fitness function that is evaluated using a dynamic programming algorithm.” It can often give an optimal solution for the cophylogeny you are studying. Jane was developed in the research group of Ran Libeskind-Hadas at Harvey Mudd College, and you can read more about the algorithm and approach here. They also have an extensive written tutorial.

In these tips we usually focus on web interfaces to tools, but I liked this package (and it’s free) and wanted to play around with it, so today I’ll walk you through a very quick intro to downloading and getting started with the tool.

Quick Links:

Jane
Jane Tutorial
CoPhylogeny Reconstruction
TreeMap (another cophylogeny reconstruction tool)
CopyCat (yet another)
Book Chapter on Cophylogeny and reconstruction

Conow, C., Fielder, D., Ovadia, Y., & Libeskind-Hadas, R. (2010). Jane: a new tool for the cophylogeny reconstruction problem. Algorithms for Molecular Biology, 5 (1). DOI: 10.1186/1748-7188-5-16
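For readers unfamiliar with the genetic-algorithm approach Jane is built on, here is a toy sketch of the select/crossover/mutate loop. This is emphatically not Jane’s code: Jane’s real fitness function scores candidate host-parasite mappings via dynamic programming over event costs, whereas the stand-in fitness here just counts 1-bits in a bitstring, purely to show the shape of the loop.

```python
import random

def fitness(genome):
    # Stand-in fitness: number of 1-bits. Jane instead evaluates
    # cophylogeny event costs with a dynamic programming algorithm.
    return sum(genome)

def evolve(pop_size=20, genome_len=16, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]           # selection (keep top half)
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, genome_len)     # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(genome_len)          # point mutation
            child[i] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))  # close to the optimum of 16 after 50 generations
```

The heuristic nature of this loop is why Jane “can often” (rather than always) find an optimal solution: the genetic algorithm explores the search space well but carries no guarantee of reaching the global optimum.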

Video Tip of the Week: MetaPhlAn and Galaxy

CPB Using Galaxy 2 from Galaxy Project on Vimeo.

The videos above cover loading data and using datatypes in Galaxy; see also the OpenHelix Galaxy tutorial for getting familiar with the Galaxy interface and usage.

Metagenomics analysis can be a bit daunting at times, but there are a good number of tools out there to assist a researcher. Integrated Microbial Genomes at JGI has some excellent ones, such as IMG/M and IMG HMP M (OpenHelix tutorial). There are other tools I suggest you check out; QIIME is an excellent one also.

But the above is not per se a metagenomics tutorial; rather, it’s a short screencast on how to use the Galaxy interface for loading data and datatypes. Why? Because another excellent set of tools for metagenomic analysis is MetaPhlAn, from the Huttenhower lab at Harvard.

The MetaPhlAn tools can be downloaded and used ‘offline’, but they also have an excellent Galaxy interface. If you walk through the MetaPhlAn tutorials on their site, including the Galaxy module, after familiarizing yourself with Galaxy above, you should be well on your way to some excellent metagenomics analysis.
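The core idea behind MetaPhlAn, as described in the paper cited below, is to quantify clades by counting reads that hit clade-specific marker genes. This toy sketch illustrates that idea only; the marker names, lengths, and hit counts are invented, and real MetaPhlAn does the read mapping and normalization with its own curated marker catalog:

```python
def relative_abundance(marker_hits, marker_info):
    """Estimate clade relative abundances from marker-gene hit counts.

    marker_hits -- {marker_id: number of reads hitting that marker}
    marker_info -- {marker_id: (clade, marker_length_in_bp)}
    """
    hit_total, len_total = {}, {}
    for marker, (clade, length) in marker_info.items():
        hit_total[clade] = hit_total.get(clade, 0) + marker_hits.get(marker, 0)
        len_total[clade] = len_total.get(clade, 0) + length
    # Normalize hit counts by total marker length to get a coverage-like
    # depth per clade, then rescale depths to sum to 1.
    depth = {c: hit_total[c] / len_total[c] for c in hit_total}
    total = sum(depth.values())
    return {c: d / total for c, d in depth.items()}

markers = {"m1": ("E. coli", 1000), "m2": ("E. coli", 1000),
           "m3": ("B. fragilis", 2000)}
hits = {"m1": 30, "m2": 50, "m3": 40}
print(relative_abundance(hits, markers))
# E. coli: 80 hits over 2000 bp of markers vs. B. fragilis: 40 over 2000 bp,
# so E. coli gets twice the relative abundance (2/3 vs. 1/3).
```

Because each marker is specific to a single clade, this avoids the ambiguity of assigning reads that map equally well to many genomes.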

To get a feel of these and other tools and workflows, you might want to browse through this excellent slide set from Surya Saha, Research Associate at Cornell University, from last year.

Quick Links:


Segata, N., Waldron, L., Ballarini, A., Narasimhan, V., Jousson, O., & Huttenhower, C. (2012). Metagenomic microbial community profiling using unique clade-specific marker genes. Nature Methods, 9, 811-814. doi:10.1038/nmeth.2066

Participate in an NSF “IDEAS LAB” (generate research agendas and proposals)

The short link: IUSE IDEAS LAB.

NSF’s education directorate has a funding opportunity called “Improving Undergraduate STEM Education” (IUSE).

The IUSE program description [PD 14-7513] outlines a broad funding opportunity to support projects that address immediate challenges and opportunities facing undergraduate science, technology, engineering, and math (STEM) education.
To generate research agendas and proposals for this, NSF is holding an… 

Ideas Lab:
Ideas Labs are meetings that bring together researchers, educators and others in an “intensive, interactive and free-thinking environment, where participants immerse themselves in a collaborative dialog in order to construct bold and innovative approaches and develop research projects.” More often than not, these Ideas Labs produce new collaborations and research project proposals that often go on to be funded. The Ideas Lab is patterned after the Ideas Factory process, whose goal is
“to make new connections, which are frequently cross disciplinary, and also generate novel research projects coupled with real-time peer review.”
This NSF Ideas Lab has several purposes, but the one most pertinent to this community is finding new ways, and developing research proposals, to infuse computational thinking, literacy and competency into the core curriculum for undergraduate education.
Individuals apply to the Ideas Lab with a 2-page proposal, DUE FEBRUARY 4 (next Tuesday). Funding is provided for the trip. These Ideas Labs are excellent ways to meet and discuss genomics, biology and education, build new collaborations and develop new research proposals.
The letter and more information (read the link):
A Dear Colleague Letter on the topic of “Preparing Applications to Participate in Phase I Ideas Labs on Undergraduate STEM Education” [NSF 14-033] has been posted on the NSF web site.
If you have any questions, you can ask here or by email (wlathe AT ). I am _not_ a project officer at NSF and don’t have all the answers, but I can direct you to the places you might find answers.
PLEASE feel free to disseminate!

Tip of the Week: Creating an Electronic Informed Consent

Informed consent has been a foundation of research, and especially genetics research, in the last few decades, though it’s taken quite some time to right past wrongs. And with genomics research and personal genomics generating huge amounts of data, informed consent becomes both more important and more complex. The National Human Genome Research Institute has a pretty good selection of information surrounding informed consent, including the regulations, guidelines, specific NHGRI guidelines and applicable federal legislation. If you are doing human genetics and genomics research, it would behoove you to make sure you understand the guidelines and issues. A good paper to read is “Tailoring the process of informed consent in genetic and genomic research” in Genome Medicine, cited below.

Depending on your institution and support, you might never have to write up or administer informed consent documentation, but often smaller institutions or projects must. So, let’s get to the nitty-gritty logistics of actually creating an informed consent survey and document.

Today’s Tip of the Week is the four-minute first part of a five-part series from iDASH at UCSD on creating an electronic informed consent using LimeSurvey and then iCONS, iDASH’s Informed Consent Management Tool. iDASH:

“is one of the National Centers for Biomedical Computing (NCBC) under the NIH Roadmap for Bioinformatics and Computational Biology. Founded in 2010, the iDASH center is hosted on the campus of the University of California, San Diego and addresses fundamental challenges to research progress and enables global collaborations anywhere and anytime. Driving biological projects motivate, inform, and support tool development in iDASH. iDASH collaborates with other NCBCs and disseminates tools via annual workshops, presentations at major conferences, and scientific publications.”

There are many ways to create an electronic survey, but this set of tutorials makes it straightforward using LimeSurvey and iCONS. The other short sections are here: part two, three, four and five.

… and while I have you at iDASH, I suggest you watch this hour-long talk by Philip Bourne (at UCSD) titled “In the future, will a biological database really be different than a biological journal?” The talk was given just over a month ago and is an update to something Dr. Bourne said in 2005. It’s a good corollary to Mary’s (apt and true) oft-repeated mantra: “The data is not in the papers any more.”

(We’ve mentioned iDASH before, including a lecture about the cost of curation that was interesting.)

Quick links:

iDASH Tutorials:


Rotimi, C.N., & Marshall, P.A. (2010). Tailoring the process of informed consent in genetic and genomic research. Genome Medicine, 2 (3). doi:10.1186/gm141

Video Tip of the Week: ENCODE @ Ensembl

We have a lot of tutorials (2 in fact: ENCODE Foundations & ENCODE @ UCSC), tips and information about ENCODE. We also have a lot of tutorials (again 2: Ensembl, and Ensembl Legacy on the older versions), tips and information about Ensembl, the database and browser at EBI.

Now here is a tip of the week on both Ensembl AND ENCODE. This is one of the more recent additions to Ensembl’s video tutorials. The video looks at how to identify sequences that may be involved in gene regulation; most of this data at Ensembl is based on ENCODE data. It demonstrates the “Matrix,” a way to select the regulation data you need based on cell types and transcription factors (TFs). At the end of the 8-minute video they discuss a bit more about how to get all the ENCODE data.

So, now you have a wealth of information here at OpenHelix through our tutorials and our blog about ENCODE and Ensembl.

Quick links:

ENCODE Tutorials:
Ensembl Tutorials:

[update] Tip of the Week: A Lapse in Appropriations, Some Effects on Research

[To keep abreast of the effects on research, this is a good article from Science Careers; see below for more links and information.] Forgive me as I create a lapse in our weekly tips and instead give one ‘insider’s’ view of the shutdown (or lapse in appropriations, in official parlance). This is most definitely not a full account of how the government shutdown will affect you as a researcher, educator or student, but hopefully it will give you an idea.

As many readers know, I am currently on a temporary leave from OpenHelix to work at the National Science Foundation. Though on leave, I still write tips (among other activities) for the blog, and my turn was today. Unfortunately, the government shut down yesterday, and preparing for an ‘orderly shutdown’ took a large portion of the day. It was an intense time of emails, calls and preparations, and we had to leave the office by 1:30pm. (Unfortunately for me, my home computer seized up in its own version of a shutdown the day before.) So that you, our readers, may get an idea of the effects of the shutdown on NSF and research in general, allow me to relate my experience and some of the effects the shutdown will have on research, and possibly on you.

My work at NSF (and AAAS), for the most part, entails working on programs to broaden participation in science education. Among other activities, I organize working meetings to explore the research in the topic and develop and run a science engagement project for refugee youth.

As part of the shutdown, NSF shut its doors. This is pretty much a complete shutdown. All NSF employees were told they had until 1:30pm to complete an “orderly shutdown” of all activities. For me this included emailing all the PIs and educators attending a meeting next week here at NSF about computer science education for those with disabilities, and others I am working with on other projects. By federal rules, no NSF employee, fellow or other staff is allowed to access email, respond to communications, work on any NSF-related activities, travel to conferences or meetings or otherwise conduct any business. Violating this carries a large fine.

What does that mean for those who have funding or dealings with NSF? If you have no immediate work or need to contact an NSF employee, then the effect will be minimal, provided the shutdown is short. If you need to submit a grant proposal, talk to a grant officer, attend an NSF meeting or conduct any other business with NSF within the next few days (or as long as the shutdown lasts), you will be unable to do so. All activities have ceased. If an NSF meeting falls during the shutdown, it is canceled.

For example, I have been planning and organizing a meeting here at NSF for educators in computer science for persons with disabilities. It has entailed everything from getting rooms settled to setting agendas, choosing reviewers, procuring ASL interpreters and translating documents into Braille. If the shutdown continues into next week, that meeting will be canceled: the researchers and educators attending (about 20) will have to cancel their flights and plans, and all arrangements will be for naught. Obviously this has direct costs from the lost flights and plans, but it also means that hundreds of hours of past work will be wasted.

If you have a current award, there should be little impact in the short term, but no new solicitations will be issued, grants cannot be submitted, etc. As the homepage now says:

“No new funding opportunities (program descriptions, announcements or solicitations) will be issued. FastLane proposal preparation and submission will be unavailable. Grants.gov may be up and running; however, since FastLane will not be operating, proposal downloads from Grants.gov will not take place. Therefore proposals will not be checked for compliance with NSF proposal preparation requirements or processed until normal operations are allowed to resume.”

The National Institutes of Health will also shut down, though the impact will be somewhat different. Since NIH is a research institution in addition to being a granting one (NSF is only a granting institution), all research at NIH will cease. Universities will feel little short-term effect, as one of the three granting cycles has just completed. That said, October 5th is the next round of deadlines, and if the shutdown continues past that date, no grant applications will be accepted. A short shutdown will only delay the processing of those grants by up to a month; the longer the shutdown lasts, the more the process will be set back by backlogs.

There are other effects of the shutdown on researchers as well. Visa sponsorships or requests for visiting scientists will not be issued during the shutdown, and some activities will be postponed or cease. Another example from my own work is the science engagement program I run for refugee youth. Though not funded through NSF, it is something I do under my fellowship, and thus I am not allowed to work on it. This means the program will be halted until further notice, and the students will not be able to participate in the activities they’ve been very excited to do.

So, let’s reiterate the top 10 effects the shutdown will have on research:

10. NSF, NIH and other agency websites and information will be unavailable or affected. Many databases such as PubMed will not be updated or maintained.
9. Visas for visiting scientists to the US or scientists collaborating overseas will not be processed.
8. Payments of student loans and stipends might be slowed or otherwise affected in the long term.
7. NIH, NSF and other agency-hosted meetings and discussions will cease.
6. New grants proposals will not be accepted, processed or reviewed.
5. Contractors and businesses that rely on NSF, NIH and other research and granting institutions will receive no business and payments will be delayed.
4. Government employed researchers, grant officers and staff (at NIH and NSF among other agencies) will not be paid.
3. Government employed researchers, grant officers and staff will not be allowed to work or contribute to any and all meetings, work-related activities or communications.
2. Research at NIH (and some other agencies such as Energy’s ARPA-E) will cease.

and the number one effect (at least in the sadness factor):
1. Children with cancer will be turned away at NIH.

Do you know of an effect missed here? Comment!

An article from ThinkProgress about some other effects on science and research including other agencies such as NOAA, etc.

A long Reddit discussion from scientists  (with some good pointers and not-so-good stories) about the effects.

Don’t worry, the lapse in video tips will not be as long as the shutdown. I have a lot of time today :D, so I will be doing one for later.

Tip of the Week: 3D Protein Structure Modeling

Today’s Tip of the Week is going to be slightly different from what we usually do. We usually highlight a genomics or biological database, but today I’d like to point you to some tutorials on the basics of bioinformatics from “bioinformatics-made-simple.” The series on 3D protein structure prediction is pretty decent and is a mix of videos and text tutorials. It will give you a basic understanding of 3D protein structure prediction. Once you have that, you’ll be set to start using the myriad structure prediction tools and 3D protein structure databases.

The tutorial above is a video from part I that is quite basic: “what is a protein structure.” Most of our readers probably already know that ;-), but the later parts of the series get into more detail about 3D protein structure prediction. Part II is a brief discussion of the purposes and methods of prediction (with a video of common terms used); parts III, IV and V (to come) then get into more detail about software and algorithms. Many have additional reading. It is pretty basic, but will at least get you started.
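Once you start comparing predicted structures to experimental ones, the standard measure you will meet everywhere is RMSD (root-mean-square deviation). Here is a minimal sketch of the calculation, assuming the two structures are already superposed and their coordinate lists pair up atom-for-atom (real tools also perform the optimal superposition first, e.g. via the Kabsch algorithm); the coordinates below are made up for illustration:

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

# Identical structures have an RMSD of 0; shifting every atom by 1 Å
# along x gives an RMSD of exactly 1.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in model]
print(rmsd(model, model))    # 0.0
print(rmsd(model, shifted))  # 1.0
```

Knowing what RMSD actually computes makes it much easier to interpret the scores the prediction tools and structure databases report.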

Related Links:

OpenHelix PDB Tutorial
OpenHelix SBKB Tutorial