Bioinformatics India

Biology Meets Technology

Scripts and Codes

Let us look at a few scripts I have written and how they work. (The links take you directly to GitHub.)

  • Script to create BIOM-formatted files from the output of almost any taxonomy assigner.
    • The script was originally designed to convert the WIMP output provided by Oxford Nanopore's EPI2ME tool.
    • With minor modifications, it can use the output of any taxonomy assigner to create BIOM files.
    • These BIOM files can be converted to HDF5 format and used in any tool that accepts this format (e.g. QIIME, LEfSe, etc.).
    • It creates BIOM files for each taxonomic level, for each domain individually as well as combined.
    • The individual files can be used to assess diversity within individual domains:
      • Bacteria
      • Archaea
      • Viruses
      • Eukaryota (Plants, Animals, Fungi, etc.)
    • One can also supply a list of keywords, which are used to extract the reads relating to each keyword as FASTA files. For example, to extract all reads assigned to Pseudomonas into a file for further analysis, add it as a keyword.
    • Script: https://github.com/mbshah/ncim-bioinfo/blob/master/virCodes/nanopore_epi2me_csv_parsev3.pl
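To make the idea of one BIOM file per taxonomic level and per domain concrete, here is a minimal Python sketch. It is not the Perl script linked above. It assumes the taxonomy assignments have already been parsed into (read ID, lineage) pairs, and the function name `build_biom` and its parameters are made up for this illustration. The output follows the BIOM 1.0 JSON layout.

```python
import json
from collections import Counter
from datetime import datetime


def build_biom(assignments, depth, domain=None, sample_id="sample1"):
    """Build a BIOM 1.0 (JSON) table as a dict from (read_id, lineage) pairs.

    lineage is a tuple of names from domain downwards, e.g.
    ("Bacteria", "Proteobacteria", "Pseudomonas").
    depth  : number of ranks to keep (one table per taxonomic level).
    domain : keep only this domain, or None for the combined table.
    """
    counts = Counter()
    for _read_id, lineage in assignments:
        if domain is not None and lineage[0] != domain:
            continue
        # Collapse the lineage to the requested taxonomic level.
        counts[tuple(lineage[:depth])] += 1

    lineages = sorted(counts)
    return {
        "id": None,
        "format": "Biological Observation Matrix 1.0.0",
        "format_url": "http://biom-format.org",
        "type": "OTU table",
        "generated_by": "illustrative sketch",
        "date": datetime.now().isoformat(),
        "matrix_type": "sparse",
        "matrix_element_type": "int",
        "shape": [len(lineages), 1],
        "rows": [
            {"id": ";".join(lin), "metadata": {"taxonomy": list(lin)}}
            for lin in lineages
        ],
        "columns": [{"id": sample_id, "metadata": None}],
        # Sparse entries are [row index, column index, read count].
        "data": [[i, 0, counts[lin]] for i, lin in enumerate(lineages)],
    }


if __name__ == "__main__":
    reads = [
        ("read1", ("Bacteria", "Proteobacteria", "Pseudomonas")),
        ("read2", ("Bacteria", "Proteobacteria", "Escherichia")),
        ("read3", ("Viruses", "Caudovirales", "Siphoviridae")),
    ]
    # Phylum-level table restricted to Bacteria.
    print(json.dumps(build_biom(reads, depth=2, domain="Bacteria"), indent=2))
```

Calling the function once per combination of `depth` and `domain` gives the per-level, per-domain set of tables described above. A single sample column is used here only to keep the sketch short.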
  • QIIME wrapper scripts
    • Scripts that can be used to carry out a basic QIIME analysis up to alpha and beta diversity.
    • The inputs and outputs of every step are managed for you.
    • Takes paired-end FASTQ files directly.
    • Each step can be modified to your liking by opening the file in Notepad or any other text editor.
    • end-to-end-qiime.pl: https://github.com/mbshah/ncim-bioinfo/blob/master/end-to-end-qiime.pl
    • end-to-end-qiime_swarm.pl uses the Swarm OTU clustering algorithm, which we found to work better: https://github.com/mbshah/ncim-bioinfo/blob/master/end-to-end-qiime_swarm.pl
    • Requires merge.pl (listed under Initial QC of raw reads).
  • Script to simplify download of sequences from NCBI:
    • The API for downloading sequences from NCBI can be unreliable at times, and a download can end unexpectedly.
    • So I wrote this script to make downloading easier.
    • It takes into account:
      • unstable internet connection: it downloads in batches of 1000 and retries each batch until it succeeds
      • downloading from behind a proxy: make the appropriate changes to line 7 in the script
      • continuing from a failed download: if the script ends due to a power failure or any other reason, it can continue from where it left off
      • skipping results that are already present in the folder: the same mechanism also lets you re-run the script later to update the results with newer annotations.
    • The script was originally designed to download only a single whole-genome sequence per taxonomy ID, in order to build BLAST databases of viral whole genomes only; this filtering can be turned off by changing line 13.
    • Script: https://github.com/mbshah/ncim-bioinfo/blob/master/virCodes/retrieve2.pl
    • It additionally requires:
      • gb_utils_xml.pl to manipulate XML files: https://github.com/mbshah/ncim-bioinfo/blob/master/virCodes/gb_utils_xml.pl
      • dbmaker.pl to create a BLAST database from the FASTA files; commenting out line number 160 will disable this database-building step: https://github.com/mbshah/ncim-bioinfo/blob/master/virCodes/dbmaker.pl
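The batch, retry and resume behaviour described above can be sketched in Python as follows. This is an illustration of the general pattern and not a port of retrieve2.pl. The function names, the one-file-per-batch layout and the cap on retries are assumptions of this sketch; the script itself is described as retrying until it succeeds. The default fetcher calls the NCBI E-utilities `efetch` endpoint over POST.

```python
import os
import time
import urllib.parse
import urllib.request

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta(ids, db="nuccore"):
    """Fetch FASTA text for a list of IDs through NCBI E-utilities efetch."""
    params = urllib.parse.urlencode(
        {"db": db, "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    ).encode("ascii")
    # Passing data makes this a POST, which suits long ID lists.
    with urllib.request.urlopen(EFETCH_URL, data=params, timeout=60) as resp:
        return resp.read().decode("utf-8")


def download_in_batches(ids, out_dir, fetch=efetch_fasta, batch_size=1000,
                        max_retries=5, wait_seconds=5.0):
    """Download ids in batches, retrying failures and skipping finished batches.

    Returns the list of files written during this call.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        path = os.path.join(out_dir, f"batch_{start // batch_size:05d}.fasta")
        if os.path.exists(path):
            continue  # already downloaded in an earlier run: resume past it
        for attempt in range(1, max_retries + 1):
            try:
                text = fetch(batch)
                break
            except OSError:  # urllib's URLError is a subclass of OSError
                if attempt == max_retries:
                    raise
                time.sleep(wait_seconds)
        # Write to a temporary name first so an interrupted run never
        # leaves a half-written file that would be mistaken for a finished one.
        tmp_path = path + ".part"
        with open(tmp_path, "w") as handle:
            handle.write(text)
        os.replace(tmp_path, path)
        written.append(path)
    return written
```

Resuming only works if the ID list is passed in the same order on every run, because batches are identified by their position. Python's `urllib` reads proxy settings from the `http_proxy` and `https_proxy` environment variables, so no code change is needed behind a proxy in this sketch.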
  • KO2Path and Path2class
    • Scripts to better understand the output of Tax4Fun.
    • More details can be found here: https://github.com/mbshah/metgenomics
    • Also creates profiles.
  • Initial QC of raw reads
    • Merges paired-end reads.
    • Trims them using trim_galore.
    • Requires Trim Galore to be installed beforehand.
    • merge.pl: https://github.com/mbshah/ncim-bioinfo/blob/master/merge.pl
  • More scripts, including the ones above, can be found here: https://github.com/mbshah/ncim-bioinfo

Copyright © 2018 - 2019, bioinformatics-india.dev. All rights reserved