Today’s tip is on Genomicus. Genomicus is a great tool to visualize gene duplication, synteny and genome evolution. The search and display interfaces are quite straightforward, and the tool has lots of great features (viewing ancestral gene information, links out to resources, different views of phylogenies, etc.). This video is only a short introduction. You can delve deeper into the tool with the help and documentation, including an 11-minute video.
Muffato, M., Louis, A., Poisnel, C., & Roest Crollius, H. (2010). Genomicus: a database and a browser to study gene synteny in modern and ancestral genomes. Bioinformatics. DOI: 10.1093/bioinformatics/btq079
You will also notice that today’s video is a SciVee embed. We are trying out a new way to post and share our tips. SciVee lets us post on our blog, lets you share the tip with others, and lets scientists in the SciVee community view the tips. This is only a test. We will be working with this for the next couple of weeks to find the best way to post and share. We hope to share these on Facebook and YouTube soon as well. If the video is not high enough quality for you (SciVee and other video sharing sites by necessity reduce size), you can try out the entire mpeg4 version at this link.
There are a lot of research papers out there, more than ever. Along with the good news (increasing knowledge), comes some bad news: increasing duplication and plagiarism, more often than not going undetected. The developers of eTBLAST, which is a great tool we’ve had a tip on before, have created another tool using an eTBLAST search of Medline and other databases to find highly similar citations: Deja Vu.
These similar citations could be legitimate: a review of a previous article, an author reusing wording from a previous paper’s abstract for new research (the eTBLAST search can only search titles and abstracts), sanctioned duplications, and so on, as the author of the post “Deja Boo” points out. But there are also real instances of duplication (authors attempting to pad their CVs) and plagiarism (stealing words and research). An earlier example (before Deja Vu) found at Panda’s Thumb is of a creationist attempting to pad a CV and look more legitimate. Errami and Garner (two of the developers of the tool) published a paper in Nature earlier this year with many such instances of duplication and plagiarism (and another in Science, reported on here with some interesting discussion).
Still, the database needs to be viewed with caution. Of the 74,792 ‘highly similar and duplicate citations’ found, 92% have not been verified. Of the 8% that have been verified (this has to be done by manual curation), 65% have been found to be probably legitimate (as stated above) and 35% to be duplicates. But even the duplicates aren’t necessarily nefarious. Since full texts are not available, it is often the case that the duplication might be perfectly understandable (reusing an abstract with some minor changes for new research, etc.). Still, it is a tool that, with some work, can help tremendously in the search for true duplicates and plagiarism, and perhaps even just the threat of it might lower the instances? :D
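To put those percentages in perspective, here is a rough back-of-the-envelope sketch in Python that turns them into approximate citation counts. The function name and its parameters are made up for illustration. Because the percentages quoted above are rounded, the resulting counts are only estimates, not figures reported by the Deja Vu database.

```python
def approximate_counts(total, pct_verified=8, pct_legitimate=65):
    """Estimate citation counts from the rounded percentages quoted in the post.

    All names here are hypothetical; the results are approximations only.
    """
    verified = total * pct_verified / 100
    unverified = total - verified
    legitimate = verified * pct_legitimate / 100
    duplicates = verified - legitimate
    # Round to whole citations for display.
    return {
        "unverified": round(unverified),
        "verified": round(verified),
        "probably_legitimate": round(legitimate),
        "duplicates": round(duplicates),
    }


if __name__ == "__main__":
    for label, count in approximate_counts(74792).items():
        print(f"{label}: about {count:,}")
```

In other words, under these assumptions only a couple of thousand of the nearly 75,000 flagged pairs have actually been confirmed as duplicates so far, which is why the caution above matters.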
So, with that in mind, this week’s tip of the week is a quick view of “Deja Vu.”
Looking at the NHGRI News feed recently, I noticed this story (below) about a new genomic data collection that intrigued me. I found out about a new resource that I wanted to share as this week’s Tip of the Week. So this ~4 minute movie discusses my path to the Human Genome Structural Variation resource and a quick look at some of the data. But the paper was so influential on my thinking about the genome that I wanted to cover that in more detail in text form as well. So for a quick hit, watch the movie. For more detail, check out the text and links below. Quick trip to the database: http://hgsv.washington.edu
….Other recently created maps, such as the HapMap, have catalogued the patterns of small-scale variations in the genome that involve single DNA letters, or bases. However, the scientific community has been eagerly awaiting the creation of additional types of maps in light of findings that larger scale differences account for a great deal of the common genetic variation among individuals and between populations, and may account for a significant fraction of disease. While previous work has identified structural variation in the human genome, a sequence-based map provides much finer resolution and location information….
I spend a lot of time thinking about the official or “reference” human genome sequence. This sequence, the one that was released to all that fanfare a few years back, is a composite of several people. It is rather like a “generic” genome.
….The discovery, reported on-line in the New England Journal of Medicine this afternoon, stems from the most extensive genome scanning for autism done so far. The scans found that in just over 1 percent of people with autism, a chunk of about 25 genes had been either duplicated or deleted, mainly in spontaneous mutations not carried by their parents….
The team was led by Mark Daly. I saw him recently at MGH giving a talk to students on genome-wide association studies. He is also responsible for developing great software, including HaploView and other tools. We have used HaploView to examine the HapMap data: you can pull data from the HapMap browser and load it right into HaploView for a more detailed look at it.