When you pick up 1 gram of garden soil or even 1 ml of sea water, there are millions of tiny microbes in that small sample. Each of those microorganisms has its own DNA, its own tiny cellular components, and its own machinery for protection, growth, movement, multiplication, etc. It is estimated that out of those millions we know about only a handful of organisms, because to study an organism we traditionally need to grow it in the lab, and for most of them we do not know how. Yes, that’s right: it is estimated that there are over a trillion different microbial species on Earth, and we know how to cultivate only about 1% of them (Cardenas and Agugliaro 2017).
Studying the microbial properties of any sample thus became skewed and biased towards the organisms that we know how to culture. That was until the rise of metagenomics, which not only allowed us to study the known but also opened the door to the vastly unknown side of the microbial universe. The idea of directly using environmental DNA first came about with Pace et al. in 1985, but it was not until 1998, when the word ‘metagenomics’ was coined by Handelsman et al., that metagenomics was actually born.
What is Metagenomics?
Metagenomics is the idea that we can learn about the diversity and function of all the microbes in a sample without needing to culture each individual microorganism. Instead, the whole sample is subjected to a DNA extraction protocol without any culturing step, and the resultant DNA is sequenced using high-throughput sequencing platforms like 454, Illumina, or Ion Torrent.
The idea of metagenomics was helped along by two technological advancements in molecular biology: PCR and NGS (next-generation sequencing). As these technologies were refined, metagenomics attracted more and more attention, and with the introduction of third-generation (long-read) sequencing along with the decreasing cost of sequencing, metagenomics came within reach of every researcher.
Metagenomics can be broadly divided into two types:
Amplicon based metagenomics
Here we amplify marker DNA like the 16S, 18S, or ITS regions using PCR and then sequence the PCR product on any high-throughput sequencing platform. This enables us to study the diversity and presence of microorganisms that carry these marker genes, i.e. Bacteria, Archaea, and Fungi. These marker genes contain stretches that are conserved enough across organisms for PCR primers to bind, along with variable stretches whose sequence differs from taxon to taxon. Multiple databases store the marker sequences of each taxonomic unit, e.g. SILVA, Greengenes, NCBI-16S, etc. These databases are easily available for analysis using BLAST or QIIME.
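To make the "conserved primer sites around a variable region" idea concrete, here is a minimal toy sketch in Python. Everything in it is made up for illustration: the primers, the template, the two-entry reference "database" and the function names are hypothetical, and the position-by-position identity score is a crude stand-in for the alignment-based searches that tools like BLAST or QIIME actually perform.

```python
# Toy in-silico version of amplicon-based identification.
# All sequences, primers, and taxon names below are invented for illustration.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extract_amplicon(template, fwd_primer, rev_primer):
    """Return the region between the two primer sites, or None if either is missing.

    The reverse primer binds the opposite strand, so on the template strand
    we look for its reverse complement.
    """
    start = template.find(fwd_primer)
    if start == -1:
        return None
    inner_start = start + len(fwd_primer)
    end = template.find(reverse_complement(rev_primer), inner_start)
    if end == -1:
        return None
    return template[inner_start:end]


def identity(a, b):
    """Fraction of matching positions; 0.0 if the lengths differ (no alignment here)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def classify(amplicon, reference):
    """Return the reference taxon whose variable region is most similar."""
    return max(reference, key=lambda name: identity(amplicon, reference[name]))


# Hypothetical primers (conserved sites) and reference variable regions.
FWD_PRIMER = "ACGTACGT"
REV_PRIMER = "GGATCCAA"
REFERENCE = {
    "taxon_A": "AAAAACCCCC",
    "taxon_B": "GGGGGTTTTT",
}

# A made-up template: flank + forward site + variable region + reverse site + flank.
template = "CC" + "ACGTACGT" + "AAAAACCCCG" + "TTGGATCC" + "AT"

amplicon = extract_amplicon(template, FWD_PRIMER, REV_PRIMER)
best_match = classify(amplicon, REFERENCE)
print(amplicon, best_match, identity(amplicon, REFERENCE[best_match]))
```

The point of the sketch is only the shape of the workflow: the primers find the conserved sites, the stretch between them is what gets sequenced, and that stretch is compared against reference sequences to pick the closest taxon.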
Whole Metagenome Shotgun
Here we skip the marker-gene PCR step, along with the amplification bias that comes with it, and sequence all the DNA in the sample as it is. This technique is useful for studying not only taxonomy but also functional diversity, which is not directly possible using amplicon-based techniques. The added advantage of this technique is that we can study the diversity of microorganisms that do not have such marker genes (such as viruses) and of organisms whose marker gene sequences are not available in databases. We usually use the NCBI-nr or NCBI-nt databases for these studies.
We will look a bit deeper into both of these types of metagenomics, how they are carried out, what computational tools are required, and some tips on individual steps in future posts.
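As a rough illustration of what "sequencing the whole sample as it is" means, the toy Python simulation below draws random reads from a made-up community. The genomes are random strings, and the organism names, copy numbers, and function names are all hypothetical. It assumes each read comes from a random position in a genome picked in proportion to how much DNA that organism contributes, which ignores real-world effects such as extraction and library-preparation bias.

```python
import random

# Toy shotgun sequencing simulation; all genomes and abundances are invented.


def random_genome(length, rng):
    """Build a random DNA string standing in for a genome."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def shotgun_reads(community, n_reads, read_len, rng):
    """Sample reads from a community given as {name: (genome, copies)}.

    Each organism is picked in proportion to its total DNA
    (genome length x number of copies), then a random window is read.
    """
    names = list(community)
    weights = [len(community[n][0]) * community[n][1] for n in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights, k=1)[0]
        genome = community[name][0]
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append((name, genome[start:start + read_len]))
    return reads


rng = random.Random(42)  # fixed seed so the run is repeatable
community = {
    "bacterium": (random_genome(2000, rng), 5),
    "fungus": (random_genome(5000, rng), 1),
    "virus": (random_genome(300, rng), 20),  # no 16S/18S/ITS marker, still sampled
}

reads = shotgun_reads(community, n_reads=1000, read_len=50, rng=rng)
counts = {name: sum(1 for src, _ in reads if src == name) for name in community}
print(counts)
```

Note that the hypothetical virus shows up in the reads even though it has no marker gene, and that the reads come from anywhere in a genome rather than from one amplified region, which is why shotgun data can also say something about function.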