BioStar is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the community and find it very useful. Questions and answers often arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we highlight one of those items or discussions in this thread. You can ask questions here, or you can always join in at BioStar.
This week’s highlighted question tackles an issue that we’ll see more and more as environmental sampling and sequencing continue to yield new and interesting collections of sequences.
When trying to investigate a 16S rRNA dataset, I often identify several dozen/hundred species/families which are found in higher/lower abundances. I then start doing literature searches to see what they could be doing, where they have been observed before etc.
To me this sounds:
- Really selective, only sampling a few papers for each species.
- Limiting, as there is no way to do this fully for tens of species.
- Incredibly time consuming.
What I’m really looking for is a system which I can put a taxa list into and it’ll say “Those ones are all anoxic” or “Those 5 have shown denitrifying ability”. I don’t know if this could be done with literature mining or where to start this, or if there is a database around in the world which curates data like this…
Any suggestions are appreciated.
I liked that one good answer is IMG (Integrated Microbial Genomes). We’ve loved the IMG resources for years, and it always surprises me when I talk to researchers who could benefit from them but aren’t using them. Note that there’s also a separate IMG/M for microbial communities. But if you have other help for this problem, bring the answer over to BioStar.
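To make the request in the question concrete, here is a minimal sketch in Python of the kind of lookup being asked for: give it a taxa list and get back which taxa share a trait. It is only an illustration. The `TRAITS` table is a tiny hand-made stand-in for a curated resource, `group_by_trait` is a hypothetical helper name, and nothing here reflects how IMG or any other database actually stores or serves its annotations.

```python
from collections import defaultdict

# Hypothetical, hand-made trait table for illustration only.
# A real analysis would load curated annotations exported from a
# database instead of hard-coding a few well-known organisms.
TRAITS = {
    "Paracoccus denitrificans": {"denitrifying"},
    "Pseudomonas stutzeri": {"denitrifying"},
    "Methanobrevibacter smithii": {"anaerobic"},
}


def group_by_trait(taxa, trait_table):
    """Group taxa by annotated trait; also return taxa with no annotation."""
    groups = defaultdict(list)
    unknown = []
    for taxon in taxa:
        traits = trait_table.get(taxon)
        if not traits:
            # No curated information for this taxon in the table.
            unknown.append(taxon)
            continue
        for trait in sorted(traits):
            groups[trait].append(taxon)
    return dict(groups), unknown


if __name__ == "__main__":
    # "Unknownus examplei" is a made-up name standing in for an unannotated taxon.
    my_taxa = ["Paracoccus denitrificans", "Pseudomonas stutzeri", "Unknownus examplei"]
    groups, unknown = group_by_trait(my_taxa, TRAITS)
    for trait, members in groups.items():
        print(f"{trait}: {', '.join(members)}")
    print(f"no annotation found: {', '.join(unknown)}")
```

The list of taxa with no annotation matters as much as the groups themselves, since it shows where the literature searching described in the question would still be needed.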