Friday SNPpets

This week’s tips contain quite a range of things, from patent battles to drying tardigrades (probably somebody patented this?). I put in the goat genome again because I like goats. We have precision medicine, and mutants asking to not be discriminated against. Some interesting tools this week too.

Friday SNPpets

This week’s SNPpets include a variety of things–some the usual hard science, some clever strategies to engage in softer ways. We also had genomics diagnostics and a roadmap to the clinic. Genes and developmental delays. Animal models for mechanisms, not for target discovery. There was a fun #DNADay16 talk about the genetics of Wizarding, using Harry Potter characters as the pedigrees. Metagenomics pizza. Also this week, the first time I saw news first “reported on github”.

Video Tip of the Week: Introduction to Biocuration and the career path


The ISB is a professional organization for biocurators

At OpenHelix, we’ve long sung the praises of curators. Some of us have been curators and worked with curation and database development teams. All of us have relied on quality information in the databases for research and teaching. But I think there are a lot of people who don’t understand the value of quality curation, how it’s done, and who curators are. They are widely taken for granted.

A recent talk by Claire O’Donovan of EMBL-EBI helps to explain the roles and the importance of biocurators. So although this talk isn’t a typical software talk, I think understanding this is crucial to everyone’s appreciation of how the information you rely on gets into the databases you use. And if you find yourself in situations where you are guiding students, knowing about this career is also worthwhile.

Claire O’Donovan has had a front row seat to the development of this field, and has great enthusiasm for the future. And going forward, in your doctor’s office as precision medicine and treatments become a thing–how much do you want correct information in the databases? Mining data, standardizing language for descriptions of features, and sharing this information is crucial for all of us.

Here’s what’s covered in this video, from the agenda slide:

  • Introduction to the concept of biocuration.
  • The different kinds of biocurators, and the skill set needed.
  • Our community: Biocuration Society and conference.
  • The future of biocuration and career paths.

Specific examples of what curators do are illustrated (~6:30min). A sample UniProt entry illustrates what kind of information is captured and where it appears. She also touches on their work with Gene Ontology. And a bit about the ecosystem of curation, how teams at different resources help each other but don’t wish to duplicate work, using HGNC nomenclature as an example.

At about 8 min, the skill sets for biocuration are covered: data basics, curation skills, programming and database concepts, ontologies, and usability of the data collected. The list also includes data access and management, as well as dissemination and outreach, which covers user training (yay!) and the concepts of data analysis for users.

There’s no formal degree path for curation practitioners at this point, and different groups will have different needs. But the community is beginning to think about this, and about professional qualifications. She also mentioned a recent report from the National Academies Press on the topic of future workforce skills and needs (linked below). This is an alternative career route for people with science training, and it’s important to understand not only the science but also the computational pieces. And it should be taken seriously as a discipline. There is now a journal that reflects this (also linked below).

Claire also takes a look at the future of biocuration, using the Centre for Therapeutic Target Validation (CTTV) as an example. And she talks about the importance of quality information in medical records as we increasingly have genomic details in diagnosis and treatment situations. If we want precision medicine to work, we have to have precise and correct information in the databases. So respect and value the curators. They are worth it. And if you know anyone who deserves special recognition–nominate!

Quick links:

International Society for Biocuration:  http://biocuration.org/

Preparing the Workforce for Digital Curation: http://www.nap.edu/catalog/18590/preparing-the-workforce-for-digital-curation 



Holliday, G., Bairoch, A., Bagos, P., Chatonnet, A., Craik, D., Finn, R., Henrissat, B., Landsman, D., Manning, G., Nagano, N., O’Donovan, C., Pruitt, K., Rawlings, N., Saier, M., Sowdhamini, R., Spedding, M., Srinivasan, N., Vriend, G., Babbitt, P., & Bateman, A. (2015). Key challenges for the creation and maintenance of specialist protein resources Proteins: Structure, Function, and Bioinformatics, 83 (6), 1005-1013 DOI: 10.1002/prot.24803

Gaudet, P., Munoz-Torres, M., Robinson-Rechavi, M., Attwood, T., Bateman, A., Cherry, J., Kania, R., O’Donovan, C., & Yamasaki, C. (2013). DATABASE, The Journal of Biological Databases and Curation, is now the official journal of the International Society for Biocuration Database, 2013 DOI: 10.1093/database/bat077


Video Tip of the Week: PhenomeCentral

Silos. This is a big problem for us with human genome data from individuals. We’re getting sequences, but they are locked up in various ways. David Haussler’s talk at the recent Global Alliance for Genomics and Health meeting (GA4GH) emphasized this barrier, and also talked about ways they are looking to work around the legal, social, and institutional barriers that we’ve created. He talked about Beacon, which I highlighted recently as a Tip of the Week. But there are other strategies needed to connect physicians and patients with other folks who might help them get to answers. Heidi Rehm’s talk provided information about a possible tool for this: PhenomeCentral.

Unfortunately, the videos aren’t uploaded to YouTube, so you have to go to the June 10 Meeting page and obtain them from there. The one that contains the information on PhenomeCentral is called “Matchmaker Exchange”.

The mission of PhenomeCentral, according to their site, is:

PhenomeCentral is a repository for secure data sharing targeted to clinicians and scientists working in the rare disorder community. PhenomeCentral encourages global scientific collaboration while respecting the privacy of patients profiled in this centralized database.

Certainly people in bioinformatics are familiar with the really crucial information from OMIM and Orphanet. But these are aggregators of information, not patient-specific. There may be lists of features of a condition, but how they appear in a given patient’s situation might vary.

What this new strategy will do is let doctors and researchers take the phenotype and genotype data (you can upload VCF files), and make predictions about the genes involved. They also have ways to “matchmake” possibly similar disease manifestations. This project is part of the larger “MatchMaker Exchange” collection (Note: MME is not a dating site…it’s also still under development). But the idea is that with patient details one could search for matches with other similar patients (depending on the privacy level of the records, of course). It sounded to me like a kind of BLAST for medical conditions (they didn’t call it that). But it also has ways to semantically link phenotype concepts, because they might be entered differently by different evaluating physicians, yet be the same type of issue underneath. The Human Phenotype Ontology (HPO), which I’ve covered a couple of times lately, enables this.

They have 3 levels of privacy settings included: private, matchable (where you can find it in a search, but it’s not wide open to everyone), and public.
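To make the matching idea a bit more concrete, here is a minimal Python sketch of how ontology-aware matching with privacy levels could work. Everything in it is an assumption for illustration: the tiny ontology uses made-up labels rather than real HPO identifiers, the scoring is a plain Jaccard overlap of each patient's terms plus their ancestor terms (a stand-in technique, not PhenomeCentral's actual algorithm), and the handling of the three privacy levels is only my reading of the description above.

```python
from enum import Enum


class Privacy(Enum):
    PRIVATE = "private"
    MATCHABLE = "matchable"
    PUBLIC = "public"


# Toy ontology mapping each term to its parents.
# Hypothetical labels for illustration only, not real HPO identifiers.
TOY_PARENTS = {
    "abnormality": [],
    "nervous system abnormality": ["abnormality"],
    "seizure": ["nervous system abnormality"],
    "focal seizure": ["seizure"],
    "generalized seizure": ["seizure"],
    "skeletal abnormality": ["abnormality"],
    "short stature": ["skeletal abnormality"],
}


def with_ancestors(terms, parents=TOY_PARENTS):
    """Return the given terms together with all of their ancestor terms."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, []))
    return seen


def similarity(terms_a, terms_b):
    """Jaccard overlap of the two ancestor-expanded term sets (0.0 to 1.0)."""
    expanded_a = with_ancestors(terms_a)
    expanded_b = with_ancestors(terms_b)
    union = expanded_a | expanded_b
    return len(expanded_a & expanded_b) / len(union) if union else 0.0


def find_matches(query_terms, records, min_score=0.5):
    """Score every non-private record against the query, best match first."""
    hits = []
    for record in records:
        # Assumption: private records never appear in match results.
        if record["privacy"] is Privacy.PRIVATE:
            continue
        score = similarity(query_terms, record["terms"])
        if score >= min_score:
            hits.append((record["id"], score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up patient records, one per privacy level.
RECORDS = [
    {"id": "P1", "terms": ["generalized seizure"], "privacy": Privacy.MATCHABLE},
    {"id": "P2", "terms": ["focal seizure"], "privacy": Privacy.PRIVATE},
    {"id": "P3", "terms": ["short stature"], "privacy": Privacy.PUBLIC},
]

matches = find_matches(["focal seizure"], RECORDS)
```

In this toy data, the query "focal seizure" and the record "generalized seizure" are different terms, but they share the parent "seizure", so they still score as similar. That is the point of leaning on an ontology: two physicians can describe the same underlying issue differently and the records can still find each other. The private record is skipped even though it would be an exact match.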

So although I used the GA4GH talk as a launching point to learn more about the features and conceptual parts of the PhenomeCentral software, I also came across this other webinar that was more specific about the software features (which is what I typically prefer for our tips, the specific software tools). The Genetic Alliance is a patient-centric group interested in answers for genetic and genome-variant medical situations, actively working with advocacy groups and researchers to bridge the needs of both. In their webinar series last year they included PhenomeCentral.

What I didn’t realize from the GA4GH overview was that there are additional tools, including a pedigree tool in the PhenoTips part. A lot of people find our blog while searching for pedigree tools, so I wanted to be sure to mention that specifically. You can try it out by entering fake data in the playground over there, and accessing the Pedigree Tool from that record. This was also handy for me because I didn’t create a login for the main PhenomeCentral site due to the privacy issues.

So have a look at PhenomeCentral. And from the GA4GH video I learned that there is a special journal issue coming up in the fall that will have papers related to these projects. So I’ll link to the PhenoTips publication below now, but when more references become available for this tool or project I’ll add them in. I expect there will be metrics about algorithms in use and other technical details that are important for fully evaluating the tool.

Quick links:

PhenomeCentral: https://phenomecentral.org/

PhenoTips: https://phenotips.org/ (has the playground + pedigree tool)

GA4GH videos: http://genomicsandhealth.org/news-events/events/june-10th-meeting-presentations

Girdea, M., Dumitriu, S., Fiume, M., Bowdin, S., Boycott, K., Chénier, S., Chitayat, D., Faghfoury, H., Meyn, M., Ray, P., So, J., Stavropoulos, D., & Brudno, M. (2013). PhenoTips: Patient Phenotyping Software for Clinical and Research Use Human Mutation, 34 (8), 1057-1065 DOI: 10.1002/humu.22347


Friday SNPpets

This week’s SNPpets include real-time visualization of Ebola spread, precision medicine informatics, big capacity for whole genomes, “genetobollocks” for a new description of media coverage of genomics papers, Neanderthal pathogenic variants, and re-examining old problems on a couple of matters.

