BIOM files are the simplest way to see what is present in a sample, but researchers rarely want to stop there. You may want to see which features are significant, how they are relevant to your context, how they correlate with environmental factors, how they change over time and across locations, and so on. So let’s look at a few ways you can proceed after creating BIOM files.
What are BIOM files?
BIOM files are standardized profile files that can hold either taxonomic or functional information. They come in JSON (BIOM v1) or HDF5 (BIOM v2) format, both designed to store large quantities of diverse data in a structured and easily accessible way. They also come with their own toolkit for tasks like converting them, summarizing them, and adding sample metadata. This metadata can include things like physical properties, location, time, and anything else relevant to your study. BIOM files can be used in many programs like QIIME, MEGAN, MG-RAST, STAMP, PAST, PICRUSt, MetaPhlAn, etc., either directly or with minimal processing.
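To make the JSON flavour less abstract, here is a minimal sketch of what a BIOM v1 table looks like and how it can be read with nothing but Python’s standard library. The table is hand-written for illustration: the feature IDs, sample IDs and counts are all invented, and `biom_v1_to_dense` is a hypothetical helper, not part of the official BIOM toolkit. For real work, and for HDF5-based BIOM v2 files, the official toolkit is the safer choice.

```python
import json

# A tiny hand-written table in the BIOM v1 (JSON) layout.
# All IDs and counts below are invented for illustration.
biom_text = """
{
  "id": null,
  "format": "Biological Observation Matrix 1.0.0",
  "format_url": "http://biom-format.org",
  "type": "OTU table",
  "generated_by": "hand-written example",
  "date": "2020-01-01T00:00:00",
  "rows": [
    {"id": "OTU_1", "metadata": null},
    {"id": "OTU_2", "metadata": null},
    {"id": "OTU_3", "metadata": null}
  ],
  "columns": [
    {"id": "SampleA", "metadata": null},
    {"id": "SampleB", "metadata": null}
  ],
  "matrix_type": "sparse",
  "matrix_element_type": "int",
  "shape": [3, 2],
  "data": [[0, 0, 12], [0, 1, 3], [1, 0, 5], [2, 1, 20]]
}
"""


def biom_v1_to_dense(table):
    """Return the count matrix as a list of rows (features x samples)."""
    n_rows, n_cols = table["shape"]
    if table["matrix_type"] == "sparse":
        # Sparse tables store only non-zero cells as [row, column, value].
        dense = [[0] * n_cols for _ in range(n_rows)]
        for r, c, value in table["data"]:
            dense[r][c] = value
        return dense
    # Dense tables already store one list per row.
    return [list(row) for row in table["data"]]


table = json.loads(biom_text)
feature_ids = [row["id"] for row in table["rows"]]
sample_ids = [col["id"] for col in table["columns"]]
counts = biom_v1_to_dense(table)

# Print one line per feature with its count in every sample.
for feature_id, row in zip(feature_ids, counts):
    print(feature_id, dict(zip(sample_ids, row)))
```

The rows are the features, the columns are the samples, and the `metadata` slots (left empty here) are where taxonomy or sample information would go.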
What can you do after you acquire a profile/BIOM?
I will use the word features instead of taxon/OTU/function because the right term depends on whether you have a taxonomic profile, a functional profile, or both.
- Filtering, Summarizing, Normalizing: Let’s start with the basics. You may want to identify the most common features in a biome, or the rarest ones, or you may just need to filter out statistically insignificant features. You can carry out any of these filtering steps on your profile using biom-format-tools. You can also use this toolkit to normalize your data and to create summary tables for presentations and for tools that need tabular input.
- Advanced Statistical Analysis: Statistical analysis is one of the most important parts of any scientific research. There are a number of packages that can help you dive into it. The most popular is R+Bioconductor, but it requires some knowledge of R scripting. Alternatively, packages like STAMP and PAST make some types of analysis easier.
- QIIME: This is a no-brainer, with QIIME providing built-in packages to carry out basic alpha and beta diversity analysis with a single command, `core_diversity_analyses.py`. Though the plots generated by QIIME1 were often not good enough for publication, the following tools create plots of publication quality (even rarefaction and PCoA plots).
- STAMP: http://kiwi.cs.dal.ca/Software/STAMP : Allows you to run tests like ANOVA, Kruskal-Wallis, Games-Howell, Tukey-Kramer, Storey’s FDR, t-test, Welch’s test, and Fisher’s test, among many others, depending on how many samples or groups you are testing. It has an intuitive graphical user interface, and the only step where you might face trouble is loading profiles into the software, for which they also provide a script that makes the label hierarchy compatible with their software.
- PAST: https://folk.uio.no/ohammer/past/ : PAST (PAleontological STatistics) was initially designed, as the name suggests, for palaeontological studies, but it works great with metagenomic data and has tools like CCA to help correlate environmental factors with observed population diversity. It gives very good publication-quality graphs and plots, though it may require some experimenting due to its peculiar interface.
- R+Bioconductor: This is a mostly CLI-based collection of tools with sub-packages like microbiome, which has functions specific to microbiome analysis and visualization. It has tools to carry out regression, ordination, association studies, bimodality studies, and community comparisons, using methods like limma, PERMANOVA, negative binomial models, etc.
- MicrobiomeAnalyst: https://www.microbiomeanalyst.ca/ : This is a web-based one-stop shop for carrying out analysis on microbiomes, and is definitely worth checking out.
- Functional Prediction: If you have carried out amplicon-based metagenomics but want to have a peek into functional diversity as well, there are tools that can predict it. They use the KEGG database to retrieve an estimated functional profile for each taxonomic unit. Currently there are two tools that I know of:
- PICRUSt: http://picrust.github.io/picrust/ : It uses BIOM files generated by QIIME1 with the Greengenes database to predict the functional profile of the biome as a whole.
- Tax4Fun: http://tax4fun.gobics.de/ : This one also uses QIIME1 taxonomic output, but supports the SILVA database as well, to predict the functional profile at gene level. Further, I have created tools that can:
- KO2Path: Create a pathway profile from Tax4Fun’s functional profile, and
- Path2Class: Create subsystem-level class profiles from the pathway profiles made by KO2Path
- These can be found at: https://www.bioinformatics-india.dev/scripts-and-codes/
- Visualization: Apart from using the tools listed above to create plots and graphs, you may also need to visualize just plain diversity. This can be done using:
- QIIME: QIIME2 has its own plotting plugins, like barplot, to create beautiful interactive bar plots.
- Krona charts: These are beautiful charts like the one here: https://www.bioinformatics-india.dev/mananbshah/G2DL-LA.out.krona.html?dataset=0&node=1&collapse=true&color=false&depth=7&font=11&key=true : They only represent diversity within one sample, and cannot be used to compare samples.
- CIRCOS: Circos is a complicated tool to learn, but it enables the most beautiful diagrams, and is definitely worth looking into.
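
To make the filtering, summarizing and normalizing step from the list above a bit more concrete, here is a minimal sketch in plain Python. It is not how biom-format-tools works internally; it only illustrates the idea on a small count table. The feature names, sample names, counts and thresholds are all invented, and the three helper functions are hypothetical names chosen for this example.

```python
# Hypothetical count table: feature -> counts per sample (all values invented).
samples = ["SampleA", "SampleB", "SampleC"]
counts = {
    "Feature_1": [120, 80, 0],
    "Feature_2": [1, 0, 0],
    "Feature_3": [30, 20, 100],
    "Feature_4": [50, 0, 100],
}


def filter_features(table, min_total=10, min_samples=2):
    """Keep features with enough total counts that occur in enough samples."""
    kept = {}
    for feature, row in table.items():
        present_in = sum(1 for value in row if value > 0)
        if sum(row) >= min_total and present_in >= min_samples:
            kept[feature] = row
    return kept


def relative_abundance(table, n_samples):
    """Divide every count by its sample total so each sample sums to 1."""
    totals = [sum(row[i] for row in table.values()) for i in range(n_samples)]
    return {
        feature: [value / totals[i] if totals[i] else 0.0
                  for i, value in enumerate(row)]
        for feature, row in table.items()
    }


def most_abundant(table, sample_index):
    """Return the feature with the highest value in one sample."""
    return max(table, key=lambda feature: table[feature][sample_index])


filtered = filter_features(counts)
normalized = relative_abundance(filtered, len(samples))

# Summary: print the dominant feature and its share for every sample.
for i, sample in enumerate(samples):
    top = most_abundant(normalized, i)
    print(sample, top, round(normalized[top][i], 3))
```

Note that the filter here is a simple count and prevalence cutoff, not a statistical test. Normalizing after filtering means the fractions are relative to the retained features only, which may or may not be what you want for your study.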