Video Tip of the Week: SoyBase CMap

Over the years I’ve started to follow a lot of farmers on Twitter. It might sound odd to folks who are immersed in human genomics and disease. But I actually find the plant and animal genomics communities to be pushing tech faster and further to the hands of end-users than a lot of the clinical applications are at this point in time. And as #Plant16 rolls out to feed us, there has been a lot of soybean chatter in my twittersphere.

So when SoyBase tweeted a reminder about some of their videos, I thought the timing was great. They have a YouTube channel for some videos to help users access the SoyBase data. And one of the tools they illustrate is CMap. Although we’ve touched on CMap a couple of times on the blog and in our training videos, we never featured it. It’s one of the GMOD family members that can offer you comparisons of different map coordinate data sets. But conceptually I think it’s a good idea for people to think about physical map vs sequence mapping data. And this video shows how you can examine these different representations at SoyBase.

Besides their software videos, though, SoyBase also links to a lot of other videos that help people to understand more about many aspects of soybean cultivation. Check out their wide range of topics on their Video Tutorials page. You never see how to use a two row harvester at human genomics databases, do you?

SoyBase: http://www.soybase.org/

Video tip of the week: Yak Genome Database

For this week’s video tip of the week we’ll explore the Yak Genome Database. Honestly I wouldn’t have predicted a week where I talk about the Sasquatch genome, the abominable snowman (really, it was a Nature paper), and yaks. But genomics is pretty wild these days.

Some folks are getting jaded about the new genome every day we seem to be getting–Carl Zimmer called it YAGS: “yet another genome syndrome” a couple of years ago already. But I’m delighted every time I see a new genome. Certainly the press releases are overselling the results in many cases. However, as Carl also points out:

What remains truly exciting is the kind of research starts after the genomes are sequenced: discovering what genes do, mapping out the networks in which genes cooperate, and reconstructing the deep history of life.

And I completely agree with this. However, I think the research teams deserve a bit of horn-tooting when they roll out their sequencing paper. The foundation for the future work needs to be done, and it needs to become available with some initial analysis. Then it becomes available for others to take that work further and for that team to continue to learn more.

The other great thing about the price reduction in sequencing and the access of new research groups is the range of species we now see coming along. Mushrooms. Birch trees. Puerto Rican parrots. Watermelon. Bananas (with the best Venn diagram in genome papers so far). Some of these are species that only small research groups have focused on before. But the sequence data leads to so many potential novelties in our understanding of their biological niche. Such is the case with yak.

Yaks are probably not on the radar of a lot of American researchers. But this is an important agricultural species for Tibet. It also has climate adaptations that are useful to understand. If we continue to face potentially rapid climate alterations, there are a lot of things we are going to want to know about how species adapt to different scenarios. We may need to help protect them from emerging pathogens. We may need to help coax some to different breeding cycles. And the more data we have about those species, the better.

However, the “big” genome data centers are not always able to absorb new and less supported species quickly. They have funding focus issues and limited resources too. So these species groups are often the ones who have to deliver genome access themselves. Most often I’m seeing these groups set up an installation of GBrowse. So understanding how to interact with that software can be really helpful as you look for new insights from new genomes.

So I offer you the Yak Genome Database:

We used the Generic Genome Browser (GBrowse) developed as part of the Generic Model Organism Database project (GMOD; http://gmod.org/wiki/GMOD) to visualize the genome of the yak [7]. In addition, predicted genes, single nucleotide variants (SNVs), multiple types of RNA sets and repeats contained within the YGD can be visualized using Gbrowse.

Have a browse around the yak genome. In the browser paper they highlight the ARG2 gene page in Figure 1, and the region of the GPR125 G-protein coupled receptor in Figure 2. I’ll show that in the video as well.


Yak Genome Database: http://me.lzu.edu.cn/yak/


Video Tip of the Week: JBrowse for genome browsing

There are a number of genome browsers out there. Some of them are big and institutional installations and are crucial to research today for a wide range of users. But as we increasingly see that more and more sequence data becomes available on species that have smaller communities, or maybe patient or family sets, or data types that aren’t supported by the big browsers, people may need to seek out other types of tools to support their projects.

Sometimes the choice of browser will be based on local knowledge. If you already have someone who sets up GBrowse, that’s great–that would be a good choice. And lots of people have chosen that for their data subsets, like this mitochondrial transcriptome project. But some people might prefer some of the features of Gaggle browser. Or maybe you want to consider JBrowse.

We talked about JBrowse a while ago. But recently I saw this tweet about it, so I took another look.

RT @mtwolfinger: #JBrowse 1.5.0 is out and comes with a direct-access storage driver for BigWig data files – #Bioinformatics #NGS http://t.co/K8Aw1L8d

So here I offer the video done by the JBrowse team to have a look at the current state of their software and the nice features it has. If you need to build a local browser for your project, it’s definitely worth a look.

Quick link: JBrowse http://jbrowse.org/

Video Tip of the Week: BioMart’s new central portal

BioMart is widely-used data management open-source software, with an interface that enables end-users to generate complex and customized queries across many types and sources of biological data. It’s part of the GMOD tool kit, and many project teams that have big data have chosen the BioMart software to organize and make their data available to you.

We’ve been fans of BioMart for years. It was one of the earliest software tools we described, as it was integrated into many of the sites that we covered–such as Ensembl. Eventually we broke it out into its own tutorial suite, though, as there are now dozens of groups that have built Marts of their own. Although the skin may change and the data sets that are available will vary at different sites, the underlying software features are the same. Learning to use the main BioMart portal will help you to use all of them. Until recently the list of data providers that used BioMart was on the homepage, but here’s a taste of that list from my slides:

In this video tip I’ll introduce the newly re-designed BioMart main site, and touch on some of the other versions of BioMart that you should get to know. We’ll be updating our tutorial suite with the new look soon, but most of the software functionality is the same as we’ve covered otherwise (available by subscription).

There are two main versions of BioMart circulating right now. The v 0.7 is the one that will probably be most familiar to people who have encountered BioMart at any of the genomics sites that have installations right now. But there’s a new and re-designed v 0.8 that is under development. It’s the one that’s used at the International Cancer Genome Consortium (ICGC.org) and there’s also a 0.8 central BioMart portal available to try out. Eventually this may replace many of the 0.7 setups, but this depends on the site. Some may persist with 0.7 for a while rather than updating. So it’s probably wise to have an idea of how to use both of them at this time.

One of the features of the new BioMart interface that’s already got bioinformatics folks talking is the ID converter. This is a common problem in the field, and Steven Turner thought this was a nice aspect of the facelift: BioMart Gene ID converter.
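For anyone who would rather script an ID conversion than click through it, here is a minimal sketch of how a query could be assembled for the older 0.7-style "martservice" interface. It only builds the XML and the request URL; it does not contact any server. The endpoint URL, the dataset name (hsapiens_gene_ensembl), the filter and attribute names (ensembl_gene_id, hgnc_symbol), and the helper function names are all assumptions for illustration. Check the mart you actually use for the names it exposes, and note that a 0.8 portal may expect something different.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint for a BioMart 0.7-style install; substitute your own mart.
MARTSERVICE_URL = "http://www.biomart.org/biomart/martservice"


def build_id_conversion_query(dataset, filter_name, ids, attributes):
    """Build a 0.7-style XML query string (hypothetical helper).

    The filter restricts results to the given IDs; each attribute becomes
    one column in the tab-separated output.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Multiple IDs are passed as one comma-separated filter value.
    ET.SubElement(ds, "Filter", {"name": filter_name, "value": ",".join(ids)})
    for attr in attributes:
        ET.SubElement(ds, "Attribute", {"name": attr})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def build_request_url(xml_query, base_url=MARTSERVICE_URL):
    """URL-encode the XML into the 'query' parameter of a GET request."""
    return base_url + "?" + urlencode({"query": xml_query})


# Example: ask for gene symbols matching two Ensembl gene IDs
# (dataset, filter, and attribute names are assumed, not confirmed here).
xml_query = build_id_conversion_query(
    dataset="hsapiens_gene_ensembl",
    filter_name="ensembl_gene_id",
    ids=["ENSG00000139618", "ENSG00000141510"],
    attributes=["ensembl_gene_id", "hgnc_symbol"],
)
print(build_request_url(xml_query))
```

Keeping the query construction separate from the network call makes it easy to inspect the XML first, or to point the same query at a provider's local installation by changing only the base URL.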

I also wanted to note that BioMart is one of the tools that you can use at Galaxy to access large swaths of data for further analysis. At Galaxy, open the “Get Data” menu to see that BioMart is one of your options.

There was also a lot of buzz about BioMart last week when a “Virtual Issue” of the journal Database was released that had not only an overview article about BioMart as a whole, but also articles on several of the resources that use BioMart for their management and query interfaces. So you can see how widely useful this software is, among many different types of data providers. You can use the local installations of BioMart at a provider’s site, or you can use the main site to query from any of these sources as well–and more powerfully you can cross-database query too.

BioMart main site: http://www.biomart.org/

BioMart new style Bio Central portal: http://central.biomart.org/

BioMart pages at GMOD: http://gmod.org/wiki/BioMart

Virtual Issue of Database on BioMart: http://www.oxfordjournals.org/our_journals/databa/biomart_virtual_issue.html


GMOD community meeting announced

I am pleased to formally announce pre-registration for the upcoming GMOD community meeting which will take place October 12-13 in Toronto, Ontario, Canada, hosted by the Ontario Institute for Cancer Research.

The meeting web page is here: http://gmod.org/wiki/October_2011_GMOD_Meeting

and while it is still under construction, the registration page should be available by the end of next week, along with information about the keynote speaker(s) and logistics like hotels. In the meantime, I urge you to go to the meeting page and add suggested topics and talks to the appropriate section.

Finally, the meeting itself will be limited in size, so when registration is open, I urge you to register as soon as possible, since I may need to close registration when we are full.

Thanks and I look forward to seeing you in Toronto, Scott

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

  • RT @32nm: ISCB Public Policy Statement on Open Access to Scientific and Technical Research Literature — Bioinformatics http://ff.im/x9hQ2 Hat tip to Mitsuteru Nakao [Mary]
  • The beginning of population level structural variation information on the human genome, from the 1000 Genomes Project. Hat tip to GenomeWeb [Jennifer]
  • Registration for the March 2011 GMOD Meeting is now open. See the wiki page at March 2011 Community Meeting. [Mary]
  • Need to assemble and analyze large datasets for multigene phylogenetic analysis? Might want to try out the new iPhy. Paper here. [Trey]
  • Ok, I had to go look at this–not what I expected…. : Top 200 Genomic Females list for Feb. 2011 is NOW ONLINE at http://bit.ly/i0Nj3O via @holstein_world. Yes, the name did clue me in, but I still had to look. [Mary]
  • Scientific Reports: interesting new venture by Nature Publishing Group. [Jennifer]
  • I love when rare diseases can shine lights into the darkness–very neat story on arterial calcification. Fascinating: RT @NatureNews: Solution to medical mystery offers treatment hope http://ff.im/-xiZYE [Mary]

GMOD 2011 Spring Trainings announced; applications available

From the ethers comes word of the GMOD spring trainings. I hear how valuable these are for folks who are working with the Generic Model Organism tools like GBrowse, Apollo, Chado, and more from the installation/configuration perspective. (Our trainings focus on the end users.) And this time they are also listing Galaxy as one of the tools they’ll be training on! Here’s the full text, with the links to find out more:

Applications are now being accepted for the 2011 GMOD Spring Training
course, a five-day hands-on school aimed at teaching new GMOD
administrators how to install, configure and integrate popular GMOD
components. The course will be held March 8-12 at the US National
Evolutionary Synthesis Center (NESCent) in Durham, North Carolina, as
part of GMOD Americas 2011.

* http://gmod.org/wiki/GMOD_Americas_2011
* http://www.nescent.org/

These components will be covered:
* Apollo – genome annotation editor
* Chado – biological database schema
* Galaxy – workflow system
* GBrowse – genome viewer
* GBrowse_syn – synteny viewer
* GFF3 – genome annotation file format and tools
* InterMine – biological data mining system
* JBrowse – next generation genome browser
* MAKER – genome annotation pipeline
* Tripal – web front end to Chado databases

The deadline for applying is the end of Friday, January 7, 2011.
Admission is competitive and is based on the strength of the
application, especially the statement of interest. The 2010 school had
over 60 applicants for the 25 slots. Any application received after
deadline will be automatically placed on the waiting list.

The course requires some knowledge of Linux as a prerequisite. The
registration fee will be $265 (only $53 per day!). There will be a
limited number of scholarships available.

This may be the only GMOD School offered in 2011. If you are
interested, you are strongly encouraged to apply by January 7.


Dave Clements

Community Annotation; Beyond Reference Genomes

I’m catching up with some mailing lists and news and I came across this interesting tidbit from our friends in the GMOD community. We are huge supporters of curation by humans for a couple of reasons: 1) we know the quality of that information can be the best and it captures so much of the information biologists need beyond sequence info; and 2) some of us have done curation and we know that it’s underappreciated but far from trivial :) .

We’ve also followed attempts by various groups to get the wider community involved to do a couple of things–to get authoritative and active researchers to put in stuff they know, and to reduce the burden on the overwhelmed professionals.  There have been a variety of ambitious attempts to get people involved in curation. UCSC has a wiki they rolled out. Some journals required Wiki updates. Seeding Wikipedia with some information and encouraging community input has been attempted. Separate new wikis on some topics have been initiated like WikiPathways.  We have been “skeptical optimists” about how some of these would go. We understand the need–but we know that end users of data are busy, they don’t get any work-related credit or time to do this sort of thing, and sometimes they don’t understand the finer points of curation.  But we like to see how the efforts work out and we’d like to see success.

Well–I’ve seen some results on various efforts that you ought to see. The GMOD community recently had a Community Annotation meeting that brought several groups together to discuss their experiences and outcomes. I’m not going to give it away–you need to go read it. One group had a 90% success rate with a strategy they attempted!! Some groups are using curation as a student project. Others report on things they’ve tried that haven’t had as much success. Anyway: it’s all very interesting to know about–what works and what doesn’t. And what about communities that don’t have a MOD (model organism database)? They touched on issues around that too.

There was another meeting too that took on a separate topic: Post Reference Genome Tools.  The premise is this:

How are we going to visualize and exploit (or even cope with) the world three years from now, when small labs may be able to fully sequence 500 individuals or species (or more) in a month? How can we visualize and link together 500, 1000, or 10,000 genomes? Many existing tools assume a reference genome. Will a reference make sense in the future, or will it hold us back?

We know a lot about the volume of data that we’ve already got that so many people aren’t aware of.  As I was just saying the other day: the data’s not in the papers anymore. It’s in these databases and it’s up to you and me to find and deal with it.  But how will the data providers offer it to you? These folks are thinking about this–and it may alter the way you interact with the data.  Again, I’m not giving it away: go read the report.

Thanks to the GMOD community for doing these reports. They are nice to have, and for those of us who can’t be there they offer a really helpful look inside.

Community Annotation Satellite Meeting Report

Post Reference Genome Tools Satellite Meeting Report

GBrowse: http://gmod.org/wiki/Gbrowse

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

  • RoBuST “has been developed as root and bulb plant community research platform for integrated analysis of root and bulb genomics data.” Cool. I’m a big fan of roots and bulbs–oh, crap,  just realized I forgot to buy carrots for the Pav Bhaji.  Will try to get them tomorrow at the farmer’s market or Faneuil.  [Mary]
  • FEAST is a sensitive local alignment program with multiple rates of evolution. An interesting project as part of a Ph.D. thesis :). I haven’t tried it yet, but from the commentary, it looks good. [Trey]
  • Because Trey often talks about the CLOCK gene, I found this set of Nature papers interesting: Editor’s Summary – Clocking on to diabetes [Jennifer]
  • From BioMed Central: CIG-DB: the database for human or mouse immunoglobulin and T cell receptor genes available for cancer studies plus a link to the actual site (free, no registration required):  CIG-DB [Jennifer]
  • announcement: GMOD Europe 2010, 13-16 September 2010, Cambridge UK [Jennifer]
  • As most parents and anyone who has watched a child over time knows, a large portion of our personalities is genetic. But like height and sexuality, they aren’t easily reduced to single (or even multiple) gene causes as this recent GWAS research is showing. [Trey]
  • There’s a site that is fielding questions predominantly about Next-Gen sequencing issues: http://i.seqanswers.com/ [Mary]