iVar is a computational package that contains functions broadly useful for viral amplicon-based sequencing. Additional tools for metagenomic sequencing are actively being incorporated into iVar. While each of these functions can be accomplished using existing tools, iVar combines in one package the functionality from multiple tools that is required to call intrahost single-nucleotide variants (iSNVs) and consensus sequences from viral sequencing data across multiple replicates. We implemented the following functions in iVar: (1) trimming of primers and low-quality bases, (2) consensus calling, (3) variant calling – both iSNVs and insertions/deletions, and (4) identifying mismatches to primer sequences and excluding the corresponding reads from alignment files.
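To make the consensus-calling step concrete, here is a minimal Python sketch of threshold-based consensus calling. It is not iVar's implementation: the function name `call_consensus`, the parameters `min_depth` and `min_freq`, the plain-string pileup representation, and the choice to emit `N` whenever the majority base falls below the threshold are all assumptions made for illustration.

```python
from collections import Counter


def call_consensus(pileup, min_depth=10, min_freq=0.5):
    """Call one consensus base per position (illustrative sketch only).

    pileup: list of strings, one per reference position; each string holds
    the bases observed at that position across all reads.
    """
    consensus = []
    for bases in pileup:
        depth = len(bases)
        if depth < min_depth:
            # Too little coverage to call a base at this position.
            consensus.append("N")
            continue
        base, count = Counter(bases).most_common(1)[0]
        # Keep the majority base only if it reaches the frequency threshold.
        consensus.append(base if count / depth >= min_freq else "N")
    return "".join(consensus)


if __name__ == "__main__":
    toy_pileup = ["AAAA", "ACCC", "GG", "TTCC"]
    print(call_consensus(toy_pileup, min_depth=4, min_freq=0.6))
```

In this toy input the third position is masked for low depth and the fourth because no base reaches the frequency threshold.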
Freyja is a tool to estimate relative abundance of SARS-CoV-2 lineages from sequencing of mixed-lineage virus samples, like wastewater. Freyja builds on iVar and is composed of two main steps: (1) SNV frequency estimation and (2) depth-weighted demixing using constrained least absolute deviation regression. Additional post-processing methods are available for output aggregation and visualization.
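The demixing step can be illustrated with a toy example. The sketch below evaluates a depth-weighted least absolute deviation objective and minimizes it for exactly two lineages by brute-force grid search, with abundances constrained to be non-negative and to sum to one. It is a simplified stand-in, not Freyja's solver: the function names, the `log(depth + 1)` weighting, the 0/1 barcode matrix, and the grid search itself are assumptions chosen to keep the example dependency-free.

```python
import math


def weighted_lad_loss(barcodes, freqs, depths, abundances):
    """Depth-weighted sum of absolute deviations between predicted and observed SNV frequencies.

    barcodes[l][s] is 1 if lineage l carries the mutation at site s, else 0.
    """
    loss = 0.0
    for site, (freq, depth) in enumerate(zip(freqs, depths)):
        predicted = sum(a * barcodes[lin][site] for lin, a in enumerate(abundances))
        # Assumed weighting: sites with deeper coverage count more.
        loss += math.log(depth + 1) * abs(predicted - freq)
    return loss


def demix_two_lineages(barcodes, freqs, depths, steps=100):
    """Grid-search the abundance pair (x, 1 - x) that minimizes the weighted loss."""
    best_loss, best_abundances = None, None
    for k in range(steps + 1):
        x = k / steps
        abundances = (x, 1.0 - x)  # non-negative and summing to one by construction
        loss = weighted_lad_loss(barcodes, freqs, depths, abundances)
        if best_loss is None or loss < best_loss:
            best_loss, best_abundances = loss, abundances
    return best_abundances


if __name__ == "__main__":
    # Hypothetical data: lineage 0 defines sites 0-1, lineage 1 defines sites 2-3.
    barcodes = [[1, 1, 0, 0], [0, 0, 1, 1]]
    freqs = [0.7, 0.7, 0.3, 0.3]
    depths = [100, 200, 150, 50]
    print(demix_two_lineages(barcodes, freqs, depths))
```

A grid search only scales to a handful of lineages; with many candidate lineages the same objective would normally be handed to a convex optimization solver.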
Bjorn is a pipeline to count mutations from a given set of genomes in a parallelized manner. The pipeline is currently used by outbreak.info to count substitutions and deletions in all the SARS-CoV-2 genomes (over 4 million as of October 2021) available on GISAID. The pipeline consists of the following steps: (1) download SARS-CoV-2 genomes via the GISAID API, (2) divide the sequences into chunks of 10,000 and run the downstream steps in parallel, (3) align these sequences using minimap2 (Li, 2018), (4) convert the alignment into a FASTA file using datafunk, (5) count substitutions and deletions from this alignment, (6) standardize and filter the metadata: country, division, location (using shapefiles from GADM), Pango lineage, date of collection, and date of submission, and (7) combine results from all chunks and convert them to JSONL. The final JSONL output can be loaded into a database such as Elasticsearch.
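The chunking, counting, and JSONL steps can be sketched in a few lines of Python. This is an illustration under simplifying assumptions rather than Bjorn's code: the helper names (`chunked`, `count_mutations`, `to_jsonl`), the record fields, and the mutation labels are hypothetical; each aligned sequence is assumed to have the same length as the reference (no insertion columns); and the chunks are processed sequentially here instead of in parallel.

```python
import json


def chunked(items, size=10_000):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def count_mutations(reference, aligned):
    """List substitutions and deleted positions (1-based) relative to the reference."""
    substitutions, deletions = [], []
    for pos, (ref, alt) in enumerate(zip(reference, aligned), start=1):
        if alt == "-":
            deletions.append(pos)
        elif alt != ref and alt != "N":
            # Ambiguous bases (N) are skipped rather than counted as mutations.
            substitutions.append(f"{ref}{pos}{alt}")
    return {"substitutions": substitutions, "deletions": deletions}


def to_jsonl(reference, records, chunk_size=10_000):
    """Process (name, aligned_sequence) records chunk by chunk into JSONL text."""
    lines = []
    for chunk in chunked(records, chunk_size):
        for name, seq in chunk:
            row = {"strain": name, **count_mutations(reference, seq)}
            lines.append(json.dumps(row))
    return "\n".join(lines)


if __name__ == "__main__":
    reference = "ACGTAC"
    records = [("s1", "ACGTAC"), ("s2", "AT-TAC"), ("s3", "ACGNAA")]
    print(to_jsonl(reference, records, chunk_size=2))
```

Because each chunk is independent, the loop over chunks is the natural place to introduce parallel workers, with the per-chunk results concatenated at the end.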
Floki is a tool (under active development) that can be used to visualize and interpret metagenomic data using the hierarchical taxonomy structure. It can be used to identify and filter out contaminants. It can also be used to visualize batch effects by grouping multiple samples together. Annotations from other sources at every taxon level can be combined with sequencing results to narrow down the search space for taxa of interest.
This R package provides access to the outbreak.info API. The package includes functions that allow users to easily retrieve data from the API for downstream analysis and visualization. Users can retrieve data by specifying an administrative level (World Bank region, country, state/province, metropolitan area, or county), one or more location names, or by constructing a custom query with additional parameters. The package also allows users to directly plot metrics of interest for the specified locations. The API also returns the geometric features of each queried location, allowing users to quickly create maps to visualize epidemiological data.