What’s the Answer? (unpublished stuff goes public)

BioStar is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s question was unusual: it wasn’t directly about bioinformatics, but it was about data security. Although the questioner framed it as a fear of getting scooped, I think the same concern applies to patient data that isn’t supposed to be public.

Question: How easy would it be for Google to find interesting unpublished research results?

I’m a bit worried about the metadata we generate daily using services such as Google. Since we researchers put a lot of effort into getting interesting results, it is totally unfair for services like Google to have this fresh, clean data ready to use. At the same time, it could be very difficult to achieve some of these results without Google’s aid.

What do you think about this new phenomenon in biological research? Do you use Google? Do you use alternative tools to hide your findings from metadata mining?

Imagine you are working on a vaccine for an important bacterial pathogen, and Google knows there is a need for this vaccine, so it could patent it and make money. With the appropriate mining tools, this would be extremely easy to do. Imagine you have a vaccine candidate and want to patent it. Your email will contain keywords like:

-the name of the pathogen, for example AIDS, Salmonella, Haemophilus…

From all the email accounts Google has, it could filter out yours and dig further. Now it knows you have a ‘vaccine’ for the pathogen ‘Salmonella’ and want to ‘patent’ it.

What remains for Google to patent:

-the gene locus tag?

My point is that, by this simple reasoning, if Google discovers a link between ‘locus tag’, ‘vaccine’, ‘Salmonella’ and ‘patent’, it would only have to test the new vaccine candidate to see whether it protects, and then patent it. It would have saved years of research investment.

Too paranoid? Right?

Do you think this is unlikely? Would you try it if you worked for Google and had the required knowledge and access to all email accounts?

Thanks for your help. Please let me know if you find the question inappropriate for this forum.


The story of the CAPRI competition in the answers is very interesting….

As I also noted in the answers, I can remember a couple of discussions about things that weren’t supposed to be in Google Scholar turning up there. But I could swear someone in my Twitter circle posted an even more egregious example in the last couple of months. It was either a paper draft with heavy internal discussion notes or a patent filing in preparation, but I can’t remember exactly which now. Any ideas? I think it was chatter among the genomics/plant folks.