Tag: NCBI

Tip of the Week: Introduction to Changes to NCBI’s Protein Database

14 July, 2010 (10:05) | General Science, Tip of the Week | By: Jennifer


In today’s tip I will introduce you briefly to the changes at NCBI’s Protein database. I highlighted that changes had been made in a Friday SNPpets, and someone asked for more details. Our full updated tutorial will be much more complete than this short tip, so watch for that in the near future – but for now, enjoy this tip & head over to NCBI to do some exploring of your own!

Updated Online Tutorials for NCBI resources including an NCBI Overview and PubMed and the Gene Expression Omnibus tutorials

8 June, 2010 (00:40) | OpenHelix News | By: Trey

Comprehensive tutorials on the publicly available NCBI resources enable researchers to quickly and effectively use these invaluable resources.

Seattle, WA (PRWEB) June 8, 2010 – OpenHelix today announced the availability of three updated tutorials on NCBI resources.

The National Center for Biotechnology Information, NCBI, is home to many of the most commonly used publicly available databases and tools in molecular biology today. They house such popular and widely used databases as GenBank, PubMed, GEO, Entrez Gene, Entrez Protein, and more. NCBI also produces, maintains and updates a variety of tools, like the large family of BLAST sequence similarity searching tools and the Entrez search and retrieval tools. In addition, they provide an extensive variety of services for education, news dissemination and different types of data submission. This tutorial presents a broad overview of NCBI’s databases, tools, educational resources and data submission protocols. In addition to an update on this overview, OpenHelix has updated both its PubMed and GEO tutorials. PubMed is the premier search engine for biomedical literature. More than 18 million citations from life science journals can be searched through this free service. The Gene Expression Omnibus, or GEO, is a valuable resource designed to store high-throughput gene expression and molecular abundance data. These three tutorials, in conjunction with the many other up-to-date OpenHelix tutorials on NCBI resources such as BLAST, Entrez, dbSNP, MMDB, Viral resources, MapViewer and others, will give you a set of training resources to help you be efficient and effective at accessing and analyzing genome data.

The tutorial suites, available through an annual OpenHelix subscription, contain an online, narrated, multimedia tutorial, which runs in just about any browser connected to the web, along with slides with full script, handouts and exercises. With the tutorials, researchers can quickly learn to effectively and efficiently use these resources. The scripts, handouts and other materials can also be used as a reference or for training others.

These tutorials will teach users:

NCBI Overview

*to understand the basic structure of NCBI and its different types of resources
*to navigate NCBI to find the databases and analysis tools you need
*what types of educational resources are available at NCBI
*basic data submission procedures and background information
*how to search the entire NCBI site, as well as just the subset of Entrez databases

PubMed

*basic, advanced, and Boolean search methods
*additional searching methods like the Entrez Global query and the MeSH query
*tips to understand the visual cues and displays
*to use My NCBI to customize your results and save searches which can be run and emailed regularly

Gene Expression Omnibus (GEO)

*efficient ways to query GEO for specific genes or experimental designs
*how to navigate through GEO output displays to find the specific information you want
*how to navigate GEO’s complex data architecture to search GEO by specific record types

To find out more about these and over 85 other tutorial suites visit the OpenHelix Catalog and OpenHelix. Or visit the OpenHelix Blog for up-to-date information on genomics and genomics resources.

About OpenHelix
OpenHelix, LLC, (www.openhelix.com) provides a bioinformatics and genomics search and training portal, giving researchers one place to find and learn how to use resources and databases on the web. The OpenHelix Search portal searches hundreds of resources, tutorial suites and other material to direct researchers to the most relevant resources and OpenHelix training materials for their needs. Researchers and institutions can save time, budget and staff resources by leveraging a subscription to nearly 100 online tutorial suites available through the portal. More efficient use of the most relevant resources means quicker and more effective research.

Friday SNPpets

21 May, 2010 (00:59) | SNPpets | By: Trey

Welcome to our Friday feature link dump: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Education at NCBI

14 May, 2010 (09:59) | Genomics Research | By: Trey

I’d like to point out the new NCBI Education page. There is a lot there that you might want to check out. NCBI will be, starting this fall, offering a series of two-day training courses they are calling Discovery Workshops. Two years ago they ended the NCBI Field Guide workshops, so this seems to be a welcome change.

There are also webinars. Our research suggests that webinars are not particularly popular, so I’m curious how these turn out. There are also ‘how-to’ guides, documentation, community and teacher resources. It’s quite a nice site with lots of things to check out.

I’d also like to point out the “recommended links” section. There are lots of links to additional educational resources like Cold Spring Harbor’s Dolan DNA Learning Center and much more. And, incidentally :) , a link to our own free tutorials, which was very nice to see. You might want to check those out; we have over 10 including PDB, SGKB, UCSC Genome Browser, Galaxy, several model organism databases, and more.

Tip of the Week: WAVe, Web Analysis of the Variome

5 May, 2010 (00:14) | Tip of the Week | By: Trey

Today’s Tip of the Week is a short introduction to WAVe, or the Web Analysis of the Variome. The tool was recently introduced to us, and I’ve found it a welcome addition to the tools available to the researcher to analyze human variation. This is apropos considering the recent paper we’ve been discussing on the clinical assessment of a personal genome (here, here and here) and that paper’s implications for personalized medicine and the use of online variation resources. WAVe has also introduced me to some additional tools I’ve either not been aware of, or haven’t used, which might be of use, such as LOVD (Leiden Open Variation Database), QuExT (Query Expansion Tool, from the same developers as WAVe), and others. Of course there is also database information pulled in from Ensembl, Reactome, KEGG, InterPro, PDB, UniProt, NCBI and many others. Take some time to check it out.

New Online Tutorials for NCBI resources BioSystems and DCODE

22 February, 2010 (12:14) | OpenHelix News | By: Trey

Comprehensive tutorials on the publicly available NCBI resources BioSystems and DCODE enable researchers to quickly and effectively use these invaluable resources.

Seattle, WA (PRWEB) February 23, 2010 — OpenHelix today announced the availability of new tutorials on BioSystems and DCODE.

The NCBI BioSystems database provides easy access to information about all the molecules that interact in a biological system, including genes and proteins, small molecules in a pathway, as well as the genes, biomarkers and drugs associated with diseases. DCODE is a public NCBI suite of tools for genomic alignments, multiple and pairwise sequence alignments, identifying gene regulatory elements and conserved transcription factor binding sites, and more. These two tutorials, in conjunction with OpenHelix tutorials on OMIM, PubMed, Entrez Gene, BLAST, GEO and several others, give you a set of training resources that help you become efficient and effective using NCBI resources.

The tutorial suites, available through an annual OpenHelix subscription, contain an online, narrated, multimedia tutorial, which runs in just about any browser connected to the web, along with slides with full script, handouts and exercises. With the tutorials, researchers can quickly learn to effectively and efficiently use these resources. The scripts, handouts and other materials can also be used as a reference or for training others.

These tutorials will teach users:

BioSystems

*to perform basic searches and to understand your results
*to link out to a variety of related information in many different fields
*about Entrez tools that you can use to customize your searches and results
*how to access and customize full-size pathway images
*some advanced searching techniques, including Boolean queries and batch searching

DCODE

*how to conduct whole genome alignments and identify ECRs
*how to create interactive conservation profiles and phylogenetic trees
*how to identify proximal and distant regulatory elements in desired sequences
*how to identify conserved transcription factor binding sites

To find out more about these and over 85 other tutorial suites visit the OpenHelix Catalog and OpenHelix. Or visit the OpenHelix Blog for up-to-date information on genomics and genomics resources.

When databases crack you up…

18 January, 2010 (09:42) | General Science | By: Mary

If you are someone who’s spent a lot of time deep in the recesses of databases, deeper than the average end user, sometimes you find some really interesting things. Sometimes they are instructive, such as: hmm…I didn’t realize mice had a bone there until I was working with the anatomical hierarchy at Jax… Sometimes they are creepy. Buried in the MeSH hierarchy was about the most repulsive term I’d ever seen in a controlled vocabulary. I complained about this to them probably 10 years ago, and just realized it doesn’t appear in MeSH 2010, finally.

But then there are other times when a database search leaves one ROFL.  That happened some time ago when I came across this odd tidbit in a search for gray hair genes. It generated some discussion among my sphere of colleagues about other funny things we’ve come across in the databases.

Well, there’s one whole blog dedicated to the pursuit of humor in NCBI’s PubMed.  I just found out from the #scio10 tweets from the ScienceOnline2010 meeting that they have found a new home on the Discover blogs collection!

NCBI ROFL: Hello, world! (again)

Congrats to them.  If you find you need to chuckle at the literature sometimes–or need a funny sample for a presentation perhaps, check them out at their new home. They also take suggestions. So if you find something in PubMed that cracks you up, send it along.

(re)Funding Databases II

16 November, 2009 (17:52) | Genomics Research, Genomics Resource News, New Resource | By: Trey

So, I wrote about defunding resources and briefly mentioned a paper in Database about funding (or ‘re’funding) databases and resources. I’d like to discuss this a bit further. The paper, by Chandras et al., discusses how databases and, to use their term, Biological Resource Centers (BRCs) are to maintain financial viability.

Let me state first, I completely agree with their premise, that databases and resources have become imperative. The earlier model of “publication of experimental results and sharing of the related research materials” needs to be extended. As they state:

It is however no longer adequate to share data through traditional modes of publication, and, particularly with high throughput (‘-omics) technologies, sharing of datasets requires submission to public databases as has long been the case with nucleic acid and protein sequence data.

The authors state, factually, that the financial model for most biological databases (we are talking about the thousands that exist) has often been 3-5 years of development funding; once that runs out, the infrastructure needs to be supported by another source. In fact, this has led to the defunding of databases such as TAIR and VBRC (and many others), excellent resources with irreplaceable data and tools, which then must struggle to find funding to cover the considerable costs of infrastructure and continued development.

The demands of scientific research, open, shared data, require a funding model that maintains the publicly available nature of these databases. And thus the problem as they state:

If, for financial reasons, BRCs are unable to perform their tasks under conditions that meet the requirements of scientific research and the demands of industry, scientists will either see valuable information lost or being transferred into strictly commercial environment with at least two consequences: (i) blockade of access to this information and/or high costs and (ii) loss of data and potential for technology transfer for the foreseeable future. In either case the effect on both the scientific and broader community will be detrimental.

Again, I agree.

They discuss several possible solutions to maintaining the viability of publicly available databases, including a private-public dual-tier system in which for-profits pay an annual fee and academic researchers have free access. They mention Uniprot, which underwent a funding crisis over a decade ago, as an example. Uniprot (then Swissprot) went back to complete public funding in 2002. There are still several other databases that are attempting to fund themselves by such a model. BioBase is one into which several databases have been folded; TransFac is one of them. There is a free, reduced-functionality version available to academics through gene-regulation.com, and the fuller version is available by subscription at BioBase. The free version allows some data to be shared, as one can see at VISTA or UCSC. I am not privy to the financials of BioBase and other similar models, and I assume that will work for some, but I agree with the authors that many useful databases and resources would be hard-pressed to be maintained this way.

Other possibilities include bringing databases fully under a single public institution’s funding mechanism. The many databases of NCBI and EBI fit this model. In fact, there is even a recent case of a resource being folded into this model at NCBI. Again, this works for some, but not all useful resources.

Most will have to find variable methods for funding their databases. Considering the importance of doing so, it is imperative that viable models are found. The authors reject, out of hand, advertising. As they mention, most advertisers will not be drawn to website advertising without a visibility of at least 10,000 visitors per month. There might be some truth to this (and I need to read the reference they cite to back that up).

But the next model they suggest seems to me to have the same drawback. In this model, the database or resource would have a ‘partnership of core competencies.’ An example they cite is MMdb (not to be confused with MMDB). This virtual mutant mouse repository provides direct trial links to Invitrogen from its gene information to the product page. They mention that though 6 companies were approached, only one responded. It would seem that this model has the same issues as directly selling advertising.

They also mention that, at least for their research community of mouse functional genomics, “Institutional Funding” seems the best solution for long-term viability and open access. Unfortunately, until institutions like NIH and EMBL are willing or able to fund these databases, I’m not sure that’s a solution.

As they mention in the paper, the rate of growth of the amounts and types of data that is being generated is exponential. I am not sure that government or institutional funding can financially keep up with housing the infrastructure needed to maintain and further develop these databases so that all the data generated can remain publicly and freely accessible.

Information should be free, but unfortunately it is not without cost. It will be interesting to see how funding of databases and resources evolves in this fast-growing genomics world (and it is imperative that we figure out solutions).

PS: On a personal note, the authors use their resource, EMMA (European Mouse Mutant Archive), as an example in the paper. I like the name since it’s the name of my daughter, but it just goes to prove that names come in waves. We named our daughter thinking few would give their daughter the same name. When even databases share the name, you know that’s not the case.

Chandras, C., Weaver, T., Zouberakis, M., Smedley, D., Schughart, K., Rosenthal, N., Hancock, J., Kollias, G., Schofield, P., & Aidinis, V. (2009). Models for financial sustainability of biological databases and resources. Database, 2009. DOI: 10.1093/database/bap017

BioGene: iPhone app for NCBI searches from MSKCC team

27 October, 2009 (16:39) | Genomics Resource News, New Resource | By: Mary

I can’t remember how I got on this email list, but I like it :) Today I was notified that there was a handy iPhone app to quickly get gene info out of the NCBI resources. I wish I had this last week at the ASHG meeting. You know what happens: you catch a gene name or see a symbol in a talk, it’s just one of several on a slide…but you must know what that is right now!! This handy-dandy quick interface will let you search for the symbol and links you to Entrez Gene info, which also links to references in PubMed.

I like it.  I expect to use it.  The first reviewer over in the iTunes store says it has already expanded their conversation.  I wish it also covered OMIM, but I haven’t used it too hard yet, maybe I’ll get there.  That also would have been a help last week.  I was hearing about a disease and I wanted some information.

Check out the MSKCC team page here for more details, and download it from the iTunes store (for free) if you like the sound of it.

BioGene: http://cbio.mskcc.org/tools/iphone_ipodtouch.html
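
If you don’t have an iPhone handy, a similar symbol lookup can be sketched against NCBI’s public E-utilities ESearch endpoint. This is only a rough illustration and says nothing about how BioGene itself works: the function name build_gene_search_url is made up for this example, and the [sym] and [orgn] field tags are my assumption about a reasonable Entrez Gene query, so check them against the Entrez help before relying on them. The sketch only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Public E-utilities ESearch endpoint at NCBI.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_search_url(symbol, organism="human"):
    """Build an ESearch URL that looks up a gene symbol in Entrez Gene.

    Hypothetical helper for illustration; the field tags in the query
    ([sym] for symbol, [orgn] for organism) are assumptions.
    """
    term = f"{symbol}[sym] AND {organism}[orgn]"
    params = {"db": "gene", "term": term, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URL for a symbol you might catch on a slide.
    print(build_gene_search_url("BRCA1"))
```

Pasting the printed URL into a browser should return the matching Gene IDs, which can then be followed to the gene records and their linked PubMed references.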

For other iPhone apps we’ve come across, check out our earlier post on the iPhone and research.

(de)Funding Databases

20 October, 2009 (14:40) | Genomics Resource News | By: Trey


From Deepak Singh:

Scientists spend years collecting and generating increasing amounts of data. The data ranges from raw instrument data, “finished” data (e.g. a genome sequence which is constructed after aligning all the short reads from a next-gen sequencer), and annotated data, which has been marked up to add additional information. We have repositories where a lot of this data goes, RCSB, NCBI, etc. In many cases there is clarity in these destinations and for the better part, resources like RCSB and NCBI are well funded and long lived (although I am always nervous about RCSB). However, many data repositories are dependent on funding, with no guarantees that the funding will be renewed. Given the size of some of these data resources, shouldn’t we be thinking of a more sustainable model for funding? This is a general problem for infrastructure resources, given the cost and the fact that you shouldn’t be looking at these from a 3-5 year perspective. This especially baffles me when libraries come into play. Shouldn’t the timescale there be in the 10’s of years?

mndoci.com, The disconnect in funding data resources, Oct 2009

You should read the whole article.

A recent example of this is the Arabidopsis resource, TAIR.
