iVar is a computational package that contains functions broadly useful for viral amplicon-based sequencing. Additional tools for metagenomic sequencing are actively being incorporated into iVar. While each of these functions can be accomplished using existing tools, iVar brings together the functionality from multiple tools that is required to call iSNVs and consensus sequences from viral sequencing data across multiple replicates. We implemented the following functions in iVar: (1) trimming of primers and low-quality bases, (2) consensus calling, (3) variant calling, covering both iSNVs and insertions/deletions, and (4) identifying mismatches to primer sequences and excluding the corresponding reads from alignment files.
Freyja is a tool to estimate relative abundance of SARS-CoV-2 and other viral lineages from sequencing of mixed-lineage virus samples, like wastewater. Freyja builds on iVar and is composed of two main steps: (1) SNV frequency estimation and (2) depth-weighted demixing using constrained least absolute deviation regression. Additional post-processing methods are available for output aggregation and visualization.
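To make the demixing step concrete, here is a minimal sketch of depth-weighted least absolute deviation fitting, restricted to a mixture of exactly two lineages so that it can be solved exactly without an optimization library. It is not Freyja's implementation: the function name `demix_two_lineages`, the use of raw depth as the weight, and the 0/1 mutation signatures are all assumptions made for illustration. With two lineages the abundances are (p, 1 - p), the objective is convex and piecewise linear in p, and its minimum on [0, 1] lies at an endpoint or at a point where one site's residual is zero.

```python
from typing import Sequence, Tuple


def demix_two_lineages(
    freqs: Sequence[float],
    depths: Sequence[float],
    sig_a: Sequence[float],
    sig_b: Sequence[float],
) -> Tuple[float, float]:
    """Estimate abundances (p, 1 - p) of two lineages by weighted LAD.

    freqs  : observed SNV frequency at each site
    depths : sequencing depth at each site (used directly as the weight;
             this weighting scheme is an assumption of the sketch)
    sig_a  : 1 if lineage A carries the mutation at the site, else 0
    sig_b  : same for lineage B
    """
    sites = list(zip(freqs, depths, sig_a, sig_b))

    def cost(p: float) -> float:
        # Weighted sum of absolute residuals for the mixture (p, 1 - p).
        return sum(w * abs(f - (p * a + (1.0 - p) * b)) for f, w, a, b in sites)

    # The constraints p >= 0, 1 - p >= 0, p + (1 - p) = 1 reduce to p in [0, 1].
    candidates = [0.0, 1.0]
    for f, _, a, b in sites:
        if a != b:
            p = (f - b) / (a - b)  # value of p where this site's residual is zero
            if 0.0 <= p <= 1.0:
                candidates.append(p)

    best = min(candidates, key=cost)
    return best, 1.0 - best


if __name__ == "__main__":
    # Four sites: two specific to lineage A, two specific to lineage B.
    p_a, p_b = demix_two_lineages(
        freqs=[0.7, 0.7, 0.3, 0.9],  # last site is a noisy, low-depth outlier
        depths=[100, 100, 100, 1],
        sig_a=[1, 1, 0, 0],
        sig_b=[0, 0, 1, 1],
    )
    print(round(p_a, 3), round(p_b, 3))
```

Because absolute deviations are used instead of squared ones, the single low-depth outlier site has little influence on the estimate. With more than two lineages the same objective becomes a linear program and would need a general solver.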
Bygul is a flexible and powerful amplicon read simulation tool designed to generate synthetic sequencing data from a wide range of pathogens, including but not limited to viral and bacterial genomes. It allows users to simulate complex mixtures in silico by providing a custom primer file and one or more reference FASTA files, with optional user-defined proportions for each sample. Bygul supports two read simulators: Wgsim for fast, simple simulations and Mason for more advanced, realistic sequencing scenarios. Users can easily configure parameters for each simulator to match their experimental needs. A key feature of Bygul is its generation of an amplicon statistics file, which reports the binding coordinates of each primer pair, whether amplification was successful, and details of the resulting amplicons.
Bjorn is a pipeline to count mutations from a given set of genomes in a parallelized manner. The pipeline is currently used by outbreak.info to count substitutions and deletions in all the SARS-CoV-2 genomes (over 4 million as of October 2021) available on GISAID. The pipeline consists of the following steps: (1) download SARS-CoV-2 genomes via the GISAID API, (2) divide sequences into chunks of 10,000 and run downstream steps in parallel, (3) align these sequences using minimap2 (Li, 2018), (4) convert the alignment into a FASTA file using datafunk, (5) count substitutions and deletions from this alignment, (6) standardize and filter the metadata: country, division, location (using shapefiles from GADM), PANGO lineage, date of collection, and date of submission, and (7) combine results from all chunks and convert them to a JSONL file. The final JSONL file can be loaded into a database such as Elasticsearch.
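The chunk, count, and combine steps can be illustrated with a small sketch. This is not Bjorn's code: the function names, the thread pool, and the mutation labels (`C2A`, `del:2`) are hypothetical, and the input is assumed to be sequences already aligned to the reference with `-` marking deleted positions. Deletions are counted per position here, whereas a real pipeline would likely merge adjacent deleted positions into one event.

```python
import json
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def count_mutations(reference: str, aligned: List[str]) -> Counter:
    """Count substitutions and per-position deletions in one chunk."""
    counts: Counter = Counter()
    for seq in aligned:
        for pos, (ref_base, base) in enumerate(zip(reference, seq), start=1):
            if base == "-":
                counts[f"del:{pos}"] += 1
            elif base != ref_base and base != "N":  # skip ambiguous calls
                counts[f"{ref_base}{pos}{base}"] += 1
    return counts


def count_all(reference: str, aligned: Iterable[str],
              chunk_size: int = 10_000, workers: int = 4) -> Counter:
    """Process chunks concurrently and combine the per-chunk counts."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: count_mutations(reference, c),
                            chunked(aligned, chunk_size))
        for partial in partials:
            total.update(partial)
    return total


def to_jsonl(counts: Counter) -> str:
    """Serialize the combined counts as one JSON object per line."""
    return "\n".join(json.dumps({"mutation": m, "count": n})
                     for m, n in sorted(counts.items()))


if __name__ == "__main__":
    ref = "ACGT"
    seqs = ["ACGT", "AAGT", "A-GT", "AAGT", "ACGN"]
    print(to_jsonl(count_all(ref, seqs, chunk_size=2)))
```

Each chunk is counted independently, so the per-chunk results can be merged by simple addition. That independence is what makes the chunked design straightforward to parallelize.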
Floki is a tool (development is currently stalled) that can be used to visualize and interpret metagenomic data using the hierarchical taxonomy structure. It can be used to identify and filter out contaminants. It can also be used to visualize batch effects by grouping multiple samples together. Annotations from other sources at every taxon level can be combined with sequencing results to narrow down the search space for taxa of interest.
Outbreak.info is a platform to discover and explore COVID-19 data and SARS-CoV-2 variants. Our Variant Reports allow researchers to track and compare emerging or known variants using customizable visualizations, enabling near real-time genomic surveillance. Our Epidemiology Tools allow users to explore how COVID-19 cases and deaths changed across locations during the early phases of the pandemic. We also host a Research Library, which indexes publications, preprints, clinical trials, and more.
The SEARCH Alliance is providing real-time analyses on the COVID-19 situation in San Diego, using online analytics and visualizations from wastewater and clinical genomic data streams.