Today’s tip is on TARGeT. TARGeT is, as the title of the paper in this year’s NAR issue states, “a web-based pipeline for retrieving and characterizing gene and transposable element families from genomic sequences.” There are several things you can do at TARGeT. Its main function, built on BLAST, PHI-BLAST, MUSCLE and TreeBest, is to quickly obtain gene and transposon families from a query sequence. The tip today is a quick intro to the tool and a search on an R1 non-LTR transposon.
For this tip of the week we look at a text-mining tool for the Arabidopsis literature, Plan2L, or PLant ANnotation to Literature. It has a very straightforward interface that permits searching of the paper space, and you can do that with a variety of focal points: the bibliome as a whole, or with emphasis on interactions, regulation, cell cycle, and more. The results offer links to the PubMed abstracts, and tabular results of the statistics of the term occurrence in that area of focus. Green results indicate positive scores and likely relevance, while red results are likely to be non-relevant, giving you a graphical guide to quickly finding the data of interest. Links to other resources including the BioCreative server, WikiGenes, iHOP and TAIR are provided as well.
The current emphasis for this resource is Arabidopsis, but it would be quite useful for other species too. If you are interested in text mining Arabidopsis I would also encourage you to compare the results with the Textpresso installation at TAIR to see what you discover in a different text miner interface as well.
Plan2L site: http://zope.bioinfo.cnio.es/plan2l/plan2l.html
For their recent paper on Plan2L see: http://www.ncbi.nlm.nih.gov/pubmed/19520768 or the full article freely available in PubMedCentral: http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=19520768
Barcoding Plant DNA (I hope the embed of the audio file works, first time I’m trying that…)
It is a discussion with Dr. Damon Little, a curator of bioinformatics from the New York Botanical Garden. The focus of the discussion is the recent publication of the CBOL Plant Working Group which has settled on the regions that will be used for barcoding plants.
If you aren’t familiar with barcoding efforts yet, you can check out Jennifer’s prior post with some background and great links. Essentially a small snippet of DNA sequence is used to (hopefully) uniquely identify a given species. This can be stored in a database–Dr. Little of the NY Botanical Garden refers to GenBank at NCBI, but there are other sites as well. I was just reading about the web interface for barcoding called iBarcode.org for analyzing and managing this sort of data.
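To make the idea concrete, here is a minimal sketch of barcode-style identification in Python. It is only an illustration, not how GenBank, iBarcode.org, or the CBOL group actually do it: the species names and the 12-base "barcodes" are made up, the sequences are assumed to be already aligned and the same length, and the match is a plain percent-identity comparison.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def identify(query: str, reference: dict[str, str]) -> tuple[str, float]:
    """Return the reference species whose barcode best matches the query."""
    best = max(reference, key=lambda sp: percent_identity(query, reference[sp]))
    return best, percent_identity(query, reference[best])


# Hypothetical reference set; real barcodes are hundreds of bases long.
REFERENCE = {
    "Species A": "ATGCCGTTAGCA",
    "Species B": "ATGACGTTCGCA",
    "Species C": "TTGCCGATAGGA",
}

if __name__ == "__main__":
    # The query differs from the Species B barcode at one position.
    species, score = identify("ATGACGTTCGCT", REFERENCE)
    print(f"{species}: {score:.0%} identity")
```

The "(hopefully)" above is the catch: if two species share the same sequence in the chosen region, a lookup like this cannot tell them apart, which is why the choice of region matters so much.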
The Consortium for the Barcode Of Life Plant Working Group summary press release of this work can be found here. The paper that describes the work is Open Access in PNAS here. The paper describes the genes that had been candidates for the barcode, and the ones that were selected (rbcL + matK). They describe primer selection and sequencing results for the series they examined. They evaluate which ones meet the barcoding standard criteria and provide the selections. They use MUSCLE to examine the sequence alignments.
This is an excellent effort on many fronts. Just assessing and cataloging biodiversity is useful itself, but this can also help to identify plants that are claimed to be used in food or medicine products to see if that is what’s really in there. It can help combat poaching of protected species–for example, it can identify wood harvested that shouldn’t have been taken for lumber.
Glad to see this work moving forward and getting out in front of the public!
Podcast direct page: http://www.wnyc.org/shows/lopate/episodes/2009/07/29/segments/137623
Barcode blog: http://phe.rockefeller.edu/barcode/blog/
Scientific American article on the topic: http://www.scientificamerican.com/blog/60-second-science/post.cfm?id=botanists-agree-on-dna-barcode-for-2009-07-29
Consortium for the Barcode of Life (CBOL): http://www.barcoding.si.edu/
CBOL Plant Working Group (2009). A DNA barcode for land plants PNAS, 106 (31), 12794-12797 DOI: 10.1073/pnas.0905845106
Singer, G., & Hajibabaei, M. (2009). iBarcode.org: web-based molecular biodiversity analysis BMC Bioinformatics, 10 (Suppl 6) DOI: 10.1186/1471-2105-10-S6-S14
Edgar, R. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput Nucleic Acids Research, 32 (5), 1792-1797 DOI: 10.1093/nar/gkh340
I keep an eye on a lot of mailing lists. Usually they are the ones for database or software resources in our field. But I also keep an eye on some funding ones. We aren’t always eligible, but it also helps us to get a sense of the directions that projects are going.
Yesterday I saw one that surprised me on several levels. It is called BREAD funding. BREAD stands for Basic Research to Enable Agricultural Development. I found this one interesting because:
1. It is a joint project between NSF and The Gates Foundation. Maybe there are other federal funding projects that involve private foundations like this. But I haven’t seen them.
The National Science Foundation (NSF) and the Bill & Melinda Gates Foundation (BMGF) are partnering to support a new research program to be administered by NSF. The objective of the BREAD Program is to support innovative scientific research designed to address key constraints to smallholder agriculture in the developing world
2. It is giving money for plant genomics in agriculture. Cool! Among the possible directions for the research:
- New strategies for creating resistance to major diseases and pests that affect plants, animals or insects of agricultural importance, and that have major impact in broad regions of the developing world.
3. It actually uses the phrase “climate change” and calls it a threat. And acknowledges several things that I don’t think the last administration was serious about at all:
- Novel approaches to using the genetic diversity of plants, microbes, or animals to enhance the ability of small-scale farmers to adapt to emerging threats of global climate change, emerging diseases, and the rising costs of energy.
Anyway, I’m delighted to see basic research on plants in agriculture in Africa and Asia getting some attention. I was pleased when I heard Hillary Clinton refer to this recently, but I was waiting for someone to show me the money. And what do you know–they did.
This specific grant: http://www.nsf.gov/pubs/2009/nsf09566/nsf09566.htm?govDel=USNSF_25
More on the BREAD program: http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503285
Press release on the NSF + BMGF partnership: http://www.eurekalert.org/pub_releases/2009-03/nsf-nsf033009.php
I know sometimes I joke about “another day, another genome” as it seems like we can check off another genome daily. And as the next-gen technology spreads further that’s going to be even more common. It’s gotten me thinking a lot about which species ought to be done. And how will sequencing research teams choose?
The folks at the Agricultural Biodiversity Weblog have me intrigued on a bunch of resources that are not the ones that most bioinformatics folks in my sphere have focused on. I mean, I know why we focus so much effort on model organisms and the big food species like at Gramene and PlantGDB, and I support that. But when you start thinking about the other organisms that we rely on so much–in the big agriculture way and the small agriculture way–I think we need to bring those animals and plants into the herd :) And we can soon.
Their recent post Linking up livestock databases was the one that prompted this post. But they write a lot of things I like (especially about plant genetic resources) and really have me wondering and reading, and thinking about how to raise awareness on the other valuable species.
The livestock post pointed to several nice resources that I was unaware of before. In an article by Eildert Groeneveld in the Globaldiv Newsletter the focus is on animals, and he offers several nice links. Check out the diverse sheep at the Heritage Sheep Breeds web site. Check out the species in the Central Documentation for Animal Biological Diversity in Germany here. Or the breed data collection at Oklahoma State–have you ever seen goats like those Alti Mountain goats? Wowsa. How about the Domestic Animal Diversity Information System or DAD-IS? There are other great links as well in the newsletter–check ‘em out. Another thrust of the article is linking up individual data with breed data via the EFABIS project to enhance the knowledge, and you can learn more about that in the newsletter.
Anyway, there are some really fascinating variations here. Understanding them would be a great project for folks with a next-gen sequencer waiting for input. Have a look. And celebrate rare breeds. We are going to need them in times of climate change.
I’m fascinated by all the genomes I see–and I’m delighted to see plant scientists raising awareness for that research. So this weekend it was nice to see some commentary about support for plant scientists on the political blogs. From an article by Hillary Clinton on Huffpo:
4. We will expand knowledge and training by supporting R&D and cultivating the next generation of plant scientists.
I hope that’s true–and that the funding comes through for that. But how nice to see an administration say out loud that they support science–and plant science specifically.
The article was also related to the work of Dr. Gebisa Ejeta who sleuthed out a way to create a plant resistant to Striga, a very tricksy parasitic weed that was seriously impacting sorghum farmers in Africa. Congrats to Dr. Ejeta who won the World Food Prize for this work. More like Ejeta!
Hat tip to the Biodiversity Weblog post that started me reading on this compelling work.
A newly enhanced database and resource called Phytozome is available to researchers. Phytozome is targeted as a hub of genomic data for plants of interest in biofuel research and is a joint project of the DOE JGI and UC Berkeley’s Center for Integrative Genomics. As a recent press release states,
The gene families available in Phytozome, defined at several evolutionarily significant epochs, provide a framework for the transfer of functional information to important biofuel and agricultural crops from model plant systems, as well as allowing users to explore land plant evolution.
This release is v. 4 and includes the genomes of 14 plants, from green algae to Arabidopsis and corn. The resource uses GBrowse (free tutorial and training materials) as its genome browser, BioMart for advanced searching and has BLAST capability. I find Gramene a bit more extensive than Phytozome, but the two have different focuses (biofuel plants for Phytozome and agricultural grains for Gramene) and Phytozome is becoming quite extensive.
I remember going to a DOE/JGI users conference last year and being quite impressed with the research going on in biofuel, and also more sobered by the obstacles both technological and practical (use of food-producing land, etc) that we face. With rising gas prices and temperatures, can’t ask for too much information!
Comprehensive tutorials on the model organism databases ZFIN, SGD and PlantGDB and GBrowse, a model organism genome browser, enable researchers to quickly and effectively use these invaluable resources.
Seattle, WA September 15, 2008 — OpenHelix today announced the availability of new tutorial suites on several model organism resources including Zebrafish Information Network (ZFIN), Saccharomyces Genome Database (SGD) and the Plant Genome Database (PlantGDB) and also a tutorial using genome browsers with GBrowse. These four tutorials expand OpenHelix’s model organism database training which now also includes tutorials on MGI (mouse), FlyBase (drosophila), Gramene (grasses), RGD (rat), WormBase and more to come soon. Model organisms are integral to our understanding of basic biology and modern biomedical research. ZFIN is a collection of data, tools, and resources on the zebrafish (Danio rerio), a popular model organism for developmental biology and genetics research and SGD is a collection of data, tools and analyses centered around Saccharomyces cerevisiae, commonly known as bakers’ or budding yeast. PlantGDB is the primary resource for plant comparative genomics.
Additionally, OpenHelix has added a tutorial on GBrowse, a web application that allows you to explore genomic sequences together with annotated data. GBrowse is rapidly becoming the genomic browser of choice amongst model organism databases, because the browser is both universal and yet customizable.
The tutorial suites, available for single purchase or through a low-priced yearly subscription to all OpenHelix tutorials, contain a narrated, self-run, online tutorial, slides, handouts and exercises. With the tutorials, researchers can quickly learn to effectively and efficiently use these resources. These tutorials will teach users:
- to perform effective searches and understand the displays
- to access advanced searches enabling multifaceted queries
- to use the various databases of genes and markers, expression data, mutant genotype/phenotype details, ontologies, and more
- to investigate many related resources associated with ZFIN
- to navigate the SGD site, locate Basic and Advanced Search options, and use the site map to access additional search tools
- to perform the two Basic SGD Quick and Text Search types and understand the displays
- to navigate the SGD Locus Page and access data from a variety of tools, tabs, and links
- to investigate many related resources associated with SGD
- to perform quick searches and navigate sequence pages
- to conduct BLAST searches across several plant species of your choice
- to create exon/intron gene predictions and sequence alignments
- to construct tables displaying highly varied information from many datasets
- the basic layout and search methods at GBrowse
- how to access detailed annotation data tied to genomic sequences
- how to select and customize annotations using Tracks
- how to upload and incorporate your own data or other external data sources
- to take a tour of different GBrowse installations at model organism databases
We do mainly genomics and molecular biology resources around here, but I thought I’d mention this database I learned about at the Special Libraries Association conference this week. The database is “VegBank” and is a repository of plot data of species numbers and types from across the US and Canada. A great resource, I am told, if you do ecological studies.
So, it’s not specifically genomics nor molecular, but it is biological and a database so it might be of use to one of our readers!
This is one of the nice benefits of attending these conferences: we get to learn a lot about resources and databases we might not have heard about otherwise. There are so many out there, no directory or Google search is going to find them all.