Category Archives: Tip of the Week

Video Tip of the Week: Phenoscape, captures phenotype data across taxa

Development of the skeleton is a good example of a process that is highly regulated, requires a lot of precision, is conserved with important relationships across species, and is fairly easy to detect when it’s gone awry. I mean–it’s hard to know at a glance if all the neurons in an organism got to the right place at the right time or if all the liver cells are still in the right place. But skeletal morphology–length, shape, location, abnormalities–can be apparent and is amenable to straightforward observations and measurements. Some of these have been collected for decades by fish researchers. This makes fish a good model for creating a searchable, stored phenotype collection.

The team at Phenoscape is trying to wrangle this sort of phenotype information. I completely agree with this statement of the need:

Although the emphasis has been on genomic data (Pennisi, 2011), there is growing recognition that a corresponding sea of phenomic data must also be organized and made computable in relation to genomic data.

They have over half a million phenotype observations cataloged. These include observations in thousands of fish taxa. They created and used an annotation suite of tools called Phenex to facilitate this. They describe Phenex as:

Annotation of phenotypic data using ontologies and globally unique taxonomic identifiers will allow biologists to integrate phenotypic data from different organisms and studies, leveraging decades of work in systematics and comparative morphology.

That’s great data to capture to provide important context for all the sequencing data we are now able to obtain. I think this is a nice example of combining important physical observations, mutant studies, and more, with genomics to begin to get at questions about evolutionary relationships among genes and regulatory regions that aren’t obvious only from the sequence data. You may not be personally interested in fish skeletons–but as an informative way to think about structuring these data types across species to make them useful for hypothesis generation–this is a useful example.

Here’s an intro video provided by the Phenoscape team that walks you through a search starting with a gene of interest, and takes you through the kinds of things you can find.

So have a look around Phenoscape to see a way to go from the physical observations of phenotype to gene details, or vice versa.

Quick links:



Mabee B.P., Balhoff J.P., Dahdul W.M., Lapp H., Midford P.E., Vision T.J. & Westerfield M. (2012). 500,000 fish phenotypes: The new informatics landscape for evolutionary and developmental biology of the vertebrate skeleton., Zeitschrift fur angewandte Ichthyologie = Journal of applied ichthyology, PMID:

Balhoff J.P., Cartik R. Kothari, Hilmar Lapp, John G. Lundberg, Paula Mabee, Peter E. Midford, Monte Westerfield & Todd J. Vision (2010). Phenex: Ontological Annotation of Phenotypic Diversity, PLoS ONE, 5 (5) e10500. DOI:

Video Tip of the Week: Immune Epitope DB (IEDB)

This week’s tip was inspired by the recent NHGRI workshop on the future directions for funding and resourcing of genomics-related projects. Titled “Future Opportunities for Genome Sequencing and Beyond: A Planning Workshop for the National Human Genome Research Institute,” it brought together a lot of influential folks on this topic, and had them noodle on the priorities and major gaps in this arena that should get more attention going forward.

Much of the meeting was live-streamed, which was really great. You can see the video segments, and sometimes the slides, on the workshop page. One of the great things about this meeting was that there’s so much excitement about what scientists want to do, and all the terrific ideas that are out there. One of my personal favorites was the Human Cell Atlas presented by Aviv Regev. I’d love to work on that. I loved working on the Adult Mouse Anatomical Dictionary and Gene Expression Database at Jax.

But for today’s focus, I’ll turn to a totally different aspect of genomics research that intrigues me–the immune system. As an undergraduate in microbiology and immunology, the fact that microbes and their teeny genomes could wreak havoc on large mammals fascinated me (Ebola–I mean, seriously, it’s not that big). And that the hosts have developed the mix-and-match adaptable response and antibody system to do battle–clever stuff, as long as it doesn’t turn into an autoimmune situation…. But this could also be turned to good use if you want to battle cancer cells with immunotherapies. So when David Haussler’s talk brought that back around–the idea of the complexity of the immune response genomics which is not well characterized yet–I connected with that idea as well. And it struck me that I had not ever featured the Immune Epitope Database before, which Haussler had mentioned in his talk. It was also noted that this is an interesting system because it is also a hybrid of proteomics and genomics information that’s required to be wrangled. And if this is a direction that NHGRI will emphasize, it’s important to know what’s out there, and think about the ways to go forward.

So here’s Haussler’s talk to set the foundation, but there’s another video about the database I’ll point to below.

In this talk he mentioned NetMHC for peptide binding prediction as well, and ImmPort at NIAID. There was a quick mention of an unfunded prototype UCSC immunobrowser to keep an eye out for. And for the most part these resources aren’t new–you can find a number of publications that go back and describe the foundations and development over the years. And it seems to be a good solid foundation, and with appropriate support can continue to keep this important information coming.

To learn more about IEDB, you can access their documentation, which includes a whole list of video tutorials. Here I’ll highlight the intro/overview one–but there are others that offer specific guidance on other tasks. I can’t embed this one, so the link will take you over to the video at their site.

Click the image to visit the video page.

So have a look at the IEDB resources, and think about the future directions of this important aspect of genomics.

Quick links:

NHGRI workshop:


Intro IEDB video:




Vita R., J. A. Greenbaum, H. Emami, I. Hoof, N. Salimi, R. Damle, A. Sette & B. Peters (2010). The Immune Epitope Database 2.0, Nucleic Acids Research, 38 (Database) D854-D862. DOI:

Kim Y., Z. Zhu, D. Tamang, P. Wang, J. Greenbaum, C. Lundegaard, A. Sette, O. Lund, P. E. Bourne, M. Nielsen et al. (2012). Immune epitope database analysis resource, Nucleic Acids Research, 40 (W1) W525-W530. DOI:

Lundegaard C. & M. Nielsen (2008). Accurate approximation method for prediction of class I MHC affinities for peptides of length 8, 10 and 11 using prediction tools trained on 9mers, Bioinformatics, 24 (11) 1397-1398. DOI:

Bhattacharya S., Linda Gomes, Patrick Dunn, Henry Schaefer, Joan Pontius, Patty Berger, Vince Desborough, Tom Smith, John Campbell, Elizabeth Thomson et al. (2014). ImmPort: disseminating data to the public for the future of immunology, Immunologic Research, 58 (2-3) 234-239. DOI:

Video Tip of the Week: EpiViz Genome Browsing (and more)

This is the browser I’ve been waiting for. Stop what you are doing right now and look at EpiViz. I’ll wait.

I spend a lot of time looking at visualizations of various types of -omics data, from a number of different sources. I’ve never believed in the “one browser to rule them all” sort of thing–I think it’s important for groups to focus on special areas of data collection, curation, and visualization. Although some parts can be reused and shared, of course, some stuff just should be viewed within certain species or strategies that don’t always end up nicely in a “track” of data that you can slap on some browser.

My dreams of this began in earnest with the Caleydo tools I’ve been talking about for a long time. Years ago I began imagining genome browser data in one panel, pathway maps in the nearby one, TF motifs, an OMIM page loaded up, and other stuff that was all part of my train-of-thought on some issue. The Caleydo team has continued on this path, and their EnRoute and Entourage tools get part of that way too. You can do some of that with the nifty BioGPS layouts. I also love the idea of looking at multiple genomic regions at the same time, in the manner that the Multi-Image Genome viewer (MIG) enables.

So we are getting closer and closer. And this EpiViz tool is an excellent demonstration of how to combine necessary genome track data visualizations and other analysis strategies into one viewer. It also allows other types of data to come in, with the Data-Driven Documents tools. You should read the paper, you should try out their software, and have a look at this overview video the EpiViz team has provided to get started.

Off we go. More like this please.

Quick links:

EpiViz Browser example:

EpiViz main site:


Chelaru F., Smith L., Goldstein N. & Bravo H.C. (2014). Epiviz: interactive visual analytics for functional genomics data., Nature methods, PMID:

Video Tip of the Week: Biodalliance browser with HiSeq X-Ten data

Drama surrounding the $1000 genome erupts every so often, and earlier this year when the HiSeq X Ten setup was unveiled there was a lot of chatter–and questions: Is the $1,000 genome for real? And some push-back on the cost analysis: That “$1000 genome” is going to cost you $72M. A piece that offers a nice framework for the field of play is here: Welcome to the $1,000 genome: Mick Watson on Illumina and next-gen sequencing. Aside from the media flurry, though, what matters is the data. And not many people have had access to the data yet.

Via Gholson Lyon, I heard about access to some:

A set of collaborators (The Garvan Institute of Medical Research, DNAnexus and AllSeq) have provided a test data set from the X Ten. I’ll let them describe this effort:

Take advantage of this unique opportunity to explore X Ten data.

The Garvan Institute of Medical Research, DNAnexus and AllSeq have teamed up to offer the genomics community open access to the first publicly available test data sets generated using Illumina’s HiSeq X Ten, an extremely powerful sequencing platform.  Our goal is to provide sample data that will allow you to gain a deeper understanding of what this technological advancement means for your work today and in the future.

My focus won’t be this data itself–but if you are interested in many of the technical aspects of this system and their process, have a listen to this informative presentation by Warren Kaplan from Garvan:

The sample data is derived from a cell line, the GM12878 cells. These cells are from the Coriell Repository here: Catalog ID: GM12878. Conveniently, this is one of the Tier 1 cell lines from the ENCODE project too, so there is other public data out there on this cell line–which I have explored in the past and knew some things about.

There are 2 different data sets of the sequence in the download files, and one of them is available in the browser to view. I’m sure the Genoscenti will be all over the downloadable files. But because I’m always interested in new visualizations, I wanted to explore the genome browser they made available. Although I had heard of Biodalliance before, we hadn’t highlighted it as a tip, so I thought that would be interesting to explore. Biodalliance is a flexible, embeddable, extensible system that’s worth a look on its own, besides delivering this test data. And if you come by at a later date and the X Ten data is no longer available, go over to their site for nice sample data sets. Their “getting started” page has a nice intro to the features.

In the video, I’ll just take a quick test drive around some of the visualization features with the X-Ten GM12878 data. I’ll look at a couple of sample regions, one with the SOD1 gene just to illustrate the search and the tracks. And I’ll look at a region that I knew from the previous ENCODE CNV data had a homozygous deletion to see how that looked in this data set. (If you want to look for deletions later, search for the genes OR2T10 or UGT2B17).
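If you want a feel for why a homozygous deletion stands out in read data: with both copies of a region missing, read depth across it drops to roughly zero, while a heterozygous deletion would only dip to about half. Here is a minimal Python sketch of that idea. The function name, the thresholds, and the depth values are all made up for illustration; none of this is real X Ten data, and it is not how Biodalliance or the ENCODE CNV calls work.

```python
from typing import List, Tuple


def low_coverage_runs(
    depths: List[int], max_depth: int = 0, min_length: int = 3
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for stretches where
    depth stays at or below max_depth for at least min_length positions."""
    runs = []
    start = None  # index where the current low-coverage stretch began
    for i, depth in enumerate(depths):
        if depth <= max_depth:
            if start is None:
                start = i
        else:
            # The stretch (if any) ended just before position i
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    # Handle a stretch that runs to the end of the list
    if start is not None and len(depths) - start >= min_length:
        runs.append((start, len(depths)))
    return runs


# Hypothetical per-position read depths (illustrative only)
toy_depths = [31, 29, 30, 0, 0, 0, 0, 0, 28, 33, 0, 30]
print(low_coverage_runs(toy_depths))
```

The single zero near the end is ignored because it is shorter than `min_length`; in real data you would want much longer windows and some tolerance for stray mismapped reads.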

Note: the data is time-sensitive–apparently it’s only available until September 30 2014. So get it while it’s hot, or browse around now.

Quick Links:

Test data site:

Biodalliance browser software details:


Down T.A. & T. J. P. Hubbard (2011). Dalliance: interactive genome viewing on the web, Bioinformatics, 27 (6) 889-890. DOI:

Check Hayden E. (2014). Is the $1,000 genome for real?, Nature, DOI:

Dunham I., Shelley F. Aldred, Patrick J. Collins, Carrie A. Davis, Francis Doyle, Charles B. Epstein, Seth Frietze, Jennifer Harrow, Rajinder Kaul, Jainab Khatun et al. (2012). An integrated encyclopedia of DNA elements in the human genome, Nature, 489 (7414) 57-74. DOI:

Garvan NA12878 HiSeqX datasets by The Garvan Institute of Medical Research, DNAnexus and AllSeq is licensed under a Creative Commons Attribution 4.0 International License

Video Tip of the Week: PhenDisco, “phenotype discoverer” for dbGaP data

The dbGaP, database of Genotypes and Phenotypes, repository at NCBI collects information from research projects that link genotype and phenotype information and human variation, across many different types of studies, providing leads on variation that may be important to understand clinical issues. Some of the data is publicly available de-identified patient information, and some of the data requires authorization to access. This is valuable information, certainly, but I know I’ve heard folks grouse about how challenging it can be to locate specific things you might be interested in, because of a lack of standardization of some of the aspects of the project details.

The developers of PhenDisco were aware of the challenges of extracting the information out of dbGaP, and they chose to investigate ways to make searches for key data more effective. They looked at requests that had come in to dbGaP. They surveyed researchers who would represent typical users, and found that the way to make the mining of dbGaP easier would be to standardize a lot of the aspects of the project descriptions and data. They thought through use-case scenarios. And once the standardization was completed, a new query interface relying on these new descriptors was made available as well.

For the foundations of the project and how they went about it, you should read their paper (linked below). But for this week’s video tip, I’ll include a couple of things that this group has delivered to help people understand their project and use their site. If you want the short version about how to approach the site, this YouTube video will cover that (erm, and I’m sorry about the actual disco music….):

But if you have time for the longer form, there’s a webinar they delivered that I’ll include here as well. Part of this webinar is the video from YouTube, but the details are easier to see in the YouTube version so I’d encourage watching that and skipping that piece of the webinar.

So have a look at PhenDisco if you’ve been finding searches of dbGaP less satisfying than you’d hoped. I think one of the best ways to grasp the standardization is to have a look at their advanced search page to see what types of things are selectable there. Try some searches and see if it’s helpful for your research.

Just wanted to add a link to a slide set from a journal club presentation on PhenDisco as well, in case the videos aren’t ideal for your situation. There is also a separate video of that journal club.


If this is a type of resource you find useful, you might also want to explore the PheGenI (Phenotype-Genotype Integrator) that I covered in a previous Tip of the Week too.

Quick links:

Project overview page:

Search engine main page:

Advanced search page to understand the structure:


Doan S., Lin K.W., Conway M., Ohno-Machado L., Hsieh A., Feupe S.F., Garland A., Ross M.K., Jiang X., Farzaneh S. et al. (2013). PhenDisco: phenotype discovery system for the database of genotypes and phenotypes., Journal of the American Medical Informatics Association : JAMIA, PMID:

Tryka K.A., A. Sturcke, Y. Jin, Z. Y. Wang, L. Ziyabari, M. Lee, N. Popova, N. Sharopova, M. Kimura, M. Feolo et al. (2013). NCBI’s Database of Genotypes and Phenotypes: dbGaP, Nucleic Acids Research, 42 (D1) D975-D979. DOI:

Video Tip of the Week: VectorBase, for invertebrate vectors of human pathogens

I wish I had been clever enough to coordinate this week’s Video Tip of the Week with “Mosquito Week” a couple of months back. There was a bunch of chatter at that time about this infographic that was released by Bill Gates, which illustrated the contribution of various human-killing species. The mosquito was deemed: The Deadliest Animal in the World. Jonathan Eisen took issue with the numbers, however, noting that if you are consistent about the way you count disease vectors, humans come out on top (or, bottom, I guess, in this category). Still, Eisen noted, mosquitoes are important and demand attention. But there are lots of other vectors to keep in mind as well.

Luckily, the team at VectorBase is on it. VectorBase has been providing information on invertebrate vectors of human pathogens for a long time. They collect a variety of species data, including mosquitoes, but also a lot more–ticks, lice, flies, etc. Check out their list of organisms here: . They have information not only on basic biology, but also on the key problem of resistance to insecticides.

We’ve been fans of VectorBase for years, and have highlighted them in the past, after a site redesign a couple of years ago, and a few other times with various other news tidbits. But I was delighted to discover recently that they have a new overview video which is my favorite kind to highlight in these tips. If you are new to a resource, a brief overview is the most helpful way to understand the kinds of data and tools you’ll see at their site. They have a lot of other slide/PDF tutorials as well, which focus on specific tools and features that will supplement an overview. But in our experience, a video overview is a bit more tempting when you are first becoming acquainted with a resource.

So here I’ve embedded the VectorBase overview, which you can also find here: The slides to accompany it are also available there.

So have a look at VectorBase’s important collection of species data and tools. You can also read more about their foundations and directions in their publications, including the one below. I keep up with news about their new features from their newsletter, but you can also see other types of community outreach strategies over at their site.

Quick link:



Megy K., D. Lawson, D. Campbell, E. Dialynas, D. S. T. Hughes, G. Koscielny, C. Louis, R. M. MacCallum, S. N. Redmond, A. Sheehan et al. (2012). VectorBase: improvements to a bioinformatics resource for invertebrate vector genomics, Nucleic Acids Research, 40 (D1) D729-D734. DOI:

Bonus video: The Gates blog hosted this highly-produced video about mosquito bites and their impact.

Video Tip of the Week: Google Genomics, API and GAbrowse

This week’s video tip comes to us from Google–it’s about their participation in the “Global Alliance for Genomics and Health” coalition. Global Alliance is aimed at developing genomic data standards for interoperability, and they’ve been working on creating the framework (some background links below in the references will provide further details). It has over 170 members, and one of these members is Google. Although Google talked about this earlier this year when they joined this group, more recently pieces have begun to emerge about the directions and specific tools. Google’s efforts made the mainstream news recently in their announcement about working on a project to examine genomic data associated with autism.

Although this video doesn’t talk about a single specific tool like we usually cover, it provides more detail about this framework for building tools which is important to understand. And in this video I learned about a new browser developed under this project that I did have a quick look at, and I’ll add below.

The browser that they reference is called GAbrowse–I assume that means Global Alliance browse–but there’s not a lot of detail. Their “about” dialog box says this:

GABrowse is a sample application designed to demonstrate the capabilities of the GA4GH API v0.1.

Currently, you can view data from Google, NCBI and EBI.

  • Use the button on the left to select a Readset or Callset.
  • Once loaded, choose a chromosome and zoom or drag the main graph to explore Read data.
  • Individual bases will appear once you zoom in far enough.

The code for this application is in GitHub and is a work in progress. Patches welcome!

I kicked the tires a bit, but it’s clearly not fully fleshed out at this point. When I tried to zoom up from the nucleotide level it went up a bit, but eventually you hit a point that says “This zoom level is coming soon!” So certainly there’s more to come, and a lot more functionality that would be necessary. But it’s early. And it’s just a demo. I have no idea if it’s intended to become a stand-alone public browser.

So if you are interested in the issue of cross-compatibility of human genomic data (and as far as I can tell this is all human-centric; I’d like to see a wider conversation on this), it’s probably worth knowing what Google is offering here. You should also be aware of what the Global Alliance is working on. Below I’ve added some of the publications and media I’ve seen about their efforts.

Hat tip to Can Holyavkin on Google+ for the link to the video.

Quick links:

Global Alliance for Genomics and Health:

Google genomics:


(2013). Global Alliance to Create Standards For Sharing Genomic Data, American Journal of Medical Genetics Part A, 161 (9) xi-xi. DOI:

Callaway E. (2014). Global genomic data-sharing effort kicks off, Nature, DOI:

White paper 2013:

Framework for Responsible Sharing of Genomic and Health-Related Data – DRAFT # 7

Terry S.F. (2014). The Global Alliance for Genomics & Health, Genetic Testing and Molecular Biomarkers, 18 (6) 375-376. DOI: [available here from GA:]

Video Tip of the Week: NCBI Variation Viewer

The folks at NCBI recently hosted a webinar that covered a number of resources: GTR, ClinVar, and MedGen. It was a nice introduction to these resources using a case study of exploring information about a 9-year-old child who needed to get clearance for participation in sports. So they follow the course of some details about this kid across the different resources at NCBI to show what you could learn at the different sites.

I was hoping that recording would become available so that could be a triple-tip of the week, but I haven’t seen any announcements of it; I’ll keep an eye out and highlight it in the future if it does. Below I have also referenced a paper that covers some of the same ground as that webinar. But in the meantime they also recently added a new short video about the Variation Viewer that I found handy as well. So that will be this week’s video tip.

I particularly liked the way you can easily select an exon to focus on, with the little bubbles near the top. That wasn’t obvious to me at first.  People are often asking me for handy ways to focus in on the specifics of a single exon.

In addition to this video, I will also offer a screen-cap of one of the slides from the longer webinar that linked to related resources around NCBI. If you haven’t checked out these associated tools you will want to look at them as well. There are a lot of terrific tools available and they are always adding new useful features. Follow them on Twitter for announcements about their tools and trainings–that’s how I stay on top of the new items.

NCBI webinar sites

Quick links:

Variation Viewer:





Landrum M.J., G. R. Riley, W. Jang, W. S. Rubinstein, D. M. Church & D. R. Maglott (2014). ClinVar: public archive of relationships among sequence variation and human phenotype, Nucleic Acids Research, 42 (D1) D980-D985. DOI:

Video Tip of the Week: Leukemia outcome predictions challenge

Although I had other tips in the pipeline, I’m bumping this one up because it is time sensitive. It’s about a competition (or challenge, as they describe it) to use data from cases of leukemia to model and predict outcomes, which could help drive treatment decisions someday. It is called the Acute Myeloid Leukemia Outcome Prediction Challenge.

I found out about it on Google+ via Amina Qutub. In case you can’t see G+, here’s the detail from that post:

Join the crowd to make an impact on cancer research!

The crowd-sourced DREAM 9 Challenge Wiki site opened to all interested scientists, mathematicians, computer scientists, engineers and clinicians around the world. This DREAM Challenge’s goal is to use the wisdom of the crowds to develop new algorithms to understand and treat leukemia, using data provided by M.D. Anderson Cancer Center.

To join, learn about the Challenge, & interact with the data using new visualization tools, visit the DREAM Wiki:!Synapse:syn2455683/wiki/

You can sign up to access the data to begin to work with it. But even before that you can check out the visualization options they provide. This video illustrates a tool they have, which lets you examine specific proteins and specific clinical features, as well as the survival data. (As they note about this tool, though: “The DiBS Data Visualization modules are proprietary, patent-pending tools.”)

From their Data Visualization page, you can click below the video to start looking at that heat map and survival curve data–without having to sign up to access the underlying data. Under the video, click the “BioWheel Interactive Visualization” button to kick the tires a bit.

I started to click around with the visualization tools, but I can’t quite figure out what the YWHAE and STK11 heat map patterns mean; they looked very different to me in the visualization. I have signed up to look at the data itself but I haven’t had a chance to dig any more yet. But it’s available to anyone who agrees to the terms of use, and maybe you can suss out some of the signals that would meet the challenge’s goals:

Subchallenge 1: Determine the best model to predict which AML patients will have Complete Remission or will be Primary Resistant.

Subchallenge 2: For patients who have Complete Remission, predict remission duration.

Subchallenge 3: Predict the overall survival time for each patient
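If survival data is new to you, it may help to see how a survival curve is usually computed before trying to predict one. Below is a minimal sketch of the standard Kaplan-Meier estimator in plain Python. I’m assuming the challenge’s survival curves are of this general kind; I haven’t confirmed what the DREAM visualization tools actually use. The patient times and event flags are invented, not challenge data.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return [(time, survival_probability)] at each distinct event time.

    events[i] is 1 if death was observed at times[i], and 0 if the patient
    was censored (still alive when follow-up ended) at times[i].
    """
    data = sorted(zip(times, events))
    at_risk = len(data)   # patients still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0       # everyone who exits the risk set at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the fraction of at-risk patients surviving time t
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (weeks) and event flags, for illustration only
toy_times = [5, 8, 8, 12, 20]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"week {t}: estimated survival {s:.2f}")
```

The point of the censoring flag is that a patient who was still alive at last follow-up counts toward the at-risk group up to that time without being treated as a death, which is what makes overall survival trickier to model than a simple yes/no outcome.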

Researchers around the world are collecting lots of data on many disease scenarios. It needs to get closer to patients. Projects like this–with many eyes on them–are a nice way to help us get there. Here’s a recent piece about other similar efforts: New platforms aim to obliterate silos of participatory science. There are other challenges from the Sage Bionetworks folks as well. They describe their mission this way:

As a 501(c)(3) nonprofit organization, Sage Bionetworks’ mission is to catalyze a cultural transition from the traditional single-PI, single-lab, and single-company research paradigm to a model founded on broad precompetitive collaboration. This structure would benefit patients by accelerating development of disease treatments, and society as a whole by reducing the cost of health care and biological research. Sage Bionetworks is actively engaged with academic, industrial, governmental, and philanthropic collaborators in developing this distributed research model.

And there will be more challenges in the future–a reference below explains more of the foundation for these types of efforts. Keep an eye out for them, and hack away.


Boutros P.C., Kyle Ellrott, Thea C Norman, Kristen K Dang, Yin Hu, Michael R Kellen, Christine Suver, J Christopher Bare, Lincoln D Stein, Paul T Spellman et al. (2014). Global optimization of somatic variant identification in cancer genomes with a global community challenge, Nature Genetics, 46 (4) 318-319. DOI:

Dolgin E. (2014). New platforms aim to obliterate silos of participatory science, Nature Medicine, 20 (6) 565-566. DOI:

Video Tip of the Week: e-PathGen, Using Genomics to Support Public Health

Recently I saw the Director of Public Health Genomics for the CDC tweet about a resource that was new to me, ePathGen: Pathogen Genomics for Epidemiology. This is an area that I’m glad to see getting attention. My undergrad degree was in microbiology, and certainly the most memorable class I had in college was about pathogenic bacteria and viruses and their consequences throughout history and to the present. One thing that was stressed to us, though, was that we could only study the things we could culture. Some things were really challenging to grow or couldn’t be grown at all with current methods. I was struck by this again in a seminar I heard where a physician described the assessment of the organisms in a brain abscess sample: they were able to culture 22 organisms. With PCR, the same sample showed 72. Eek.

But our new ability to look at unculturable organisms by sequencing them rapidly, and then to more quickly and appropriately target infections, is even getting NYT press coverage at this point: In a First, Test of DNA Finds Root of Illness. And that’s just one kid’s illness–this can also be used to more quickly put the brakes on community-wide issues too. So here’s the tweet that caught my eye:

And I went to see what e-PathGen was about. What they provide are a couple of video tutorials–but I can’t embed them here, they are part of a learning module that also has additional details and two case studies to work through.  The tutorials offer some guidance for folks who might be new to genomics and the sequencing technology from a public health perspective. Then the two case studies show how this type of information might be used on a specific outbreak of illness.

So here’s a look at their landing page, and you can click to go over there and watch their videos:

ePathGen Tutorials and Case Studies -- click to visit them.

Or go to the site with this link:

Usually we like to highlight specific database resources or other bioinformatics tools in our tips. And in the first case study I came across a database collection that was new to me–the BIGSdb system, Bacterial Isolate Genome Sequence Database. This is a framework that offers researchers and clinicians a place to store details of specific isolates of patient or environmental samples. It doesn’t require whole genome data–but it is flexible enough to support that as well, since we will increasingly see more of that kind of data coming along.

This framework has now been used by a number of different groups to create databases with their organisms of interest. Check out this list of organisms that you can find individual samples from: I hope to take a look at this in a future tip–I’ve already gone longer than I like to in our weekly introduction to a new resource. So check back with us for more on this later.

Quick links:

ePathGen videos and case studies:

Bacterial Isolate Genome Sequence Database (BIGSdb):



Jefferies J. & McCulloch J. (2014). ePathGen – a new e-learning package in pathogen genomics., Euro surveillance : bulletin Européen sur les maladies transmissibles = European communicable disease bulletin, PMID:

Jolley K.A. & Maiden M.C.J. (2010). BIGSdb: Scalable analysis of bacterial genome variation at the population level., BMC bioinformatics, DOI: