Overview

Summary

We are sequencing West Nile virus from California, with an emphasis on San Diego, Kern, and Sacramento/Yolo counties, to understand 1) how the virus spreads between regions, 2) how it is maintained locally between seasons, and 3) which factors promote local outbreaks. Our goal is to generate thousands of new West Nile virus genomes from infected birds and mosquitoes. This research is part of the WestNile 4K Project.

Collaborations and data sources

The samples from San Diego County were provided by Nikos Garfield and Saran Grewal from the San Diego County Vector Control Program. The samples from all other California counties, including Sacramento-Yolo and Kern, were provided by Ying Fang and Chris Barker from the Barker Lab, University of California, Davis, and by Sarah Wheeler from the Sacramento-Yolo Mosquito and Vector Control Program.

Raw Data

Sequencing is performed with PrimalSeq, an amplicon-based sequencing scheme. Our full protocol is available online here. Sequencing reads are aligned using bwa and processed using iVar (Grubaugh et al. Genome Biology 2019).
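As a rough illustration of the align-then-process workflow described above, the following minimal sketch builds the shell commands for one sample without running them. Only the use of bwa and iVar comes from this page. The samtools sorting and indexing steps, the primer-trimming step, and all file names are assumptions, and the `-t 0 -q 20` settings are inferred from the "threshold_0_quality_20" suffix in the consensus names. The full protocol remains the authoritative description.

```python
import shlex


def build_pipeline(reference, reads_1, reads_2, primer_bed, sample):
    """Return the shell commands for one sample, in order, without running them.

    All paths and the sample name are hypothetical placeholders.
    """
    q = shlex.quote
    aligned = f"{sample}.sorted.bam"
    trimmed_prefix = f"{sample}.trimmed"
    trimmed_bam = f"{trimmed_prefix}.bam"
    trimmed_sorted = f"{sample}.trimmed.sorted.bam"
    return [
        # Index the reference once so bwa can align against it.
        f"bwa index {q(reference)}",
        # Align paired-end reads and coordinate-sort the output.
        f"bwa mem {q(reference)} {q(reads_1)} {q(reads_2)}"
        f" | samtools sort -o {q(aligned)} -",
        f"samtools index {q(aligned)}",
        # Soft-clip amplicon primer sequences (assumed step).
        f"ivar trim -i {q(aligned)} -b {q(primer_bed)} -p {q(trimmed_prefix)}",
        f"samtools sort -o {q(trimmed_sorted)} {q(trimmed_bam)}",
        # Call a consensus with frequency threshold 0 and minimum quality 20.
        f"samtools mpileup -aa -A -d 0 -Q 0 {q(trimmed_sorted)}"
        f" | ivar consensus -p {q(sample)} -t 0 -q 20",
    ]


if __name__ == "__main__":
    for command in build_pipeline("ref.fa", "r1.fq", "r2.fq", "primers.bed", "W052"):
        print(command)
```

Returning the commands as strings keeps the sketch inspectable: they can be reviewed or written to a script before anything is executed.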

Below is a table showing the count of sequenced genomes by county.

County          Sequence count
sandiego        206
kern            204
sacramento      153
yolo            42
losangeles      28
fresno          17
stanislaus      13
butte           10
kings           9
tulare          8
sanbernardino   6
riverside       6
contracosta     5
yuba            4
sutter          4
merced          4
shasta          3
placer          3
lake            3
ventura         2
solano          2
madera          2
alameda         2
sanjoaquin      1
lassen          1
glenn           1
calaveras       1
Total           740

Alignment statistics

The average depth and percent genome coverage for each sample are available in a TSV file.

[Figure: Alignment statistics]

The following sequences, which cover less than 50% of the coding region, were excluded from downstream analysis. They are highlighted in red in the figure above.

Name                                              Length
Consensus_W052_L1_L2_L3_threshold_0_quality_20    4244
Consensus_W118_L1_threshold_0_quality_20          0
Consensus_W170_L1_threshold_0_quality_20          3526
Consensus_W251_L1_threshold_0_quality_20          1862
Consensus_W327_L1_threshold_0_quality_20          5042
Consensus_W329_L1_threshold_0_quality_20          4088
Consensus_W330_L1_threshold_0_quality_20          3255
Consensus_W336_L1_threshold_0_quality_20          1832
Consensus_W338_L1_threshold_0_quality_20          2305
Consensus_W341_L1_threshold_0_quality_20          482
Consensus_W501_L1_L2_threshold_0_quality_20       4421
Consensus_W662_L1_L2_L3_threshold_0_quality_20    4665
Consensus_W804_L1_L2_threshold_0_quality_20       2260
Consensus_W805_L1_L2_threshold_0_quality_20       2937
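
A coverage filter of this kind could be sketched as follows. This is a minimal illustration, not the script used for this dataset. It assumes coverage is approximated by the number of unambiguous bases (A, C, G, T) in each consensus sequence divided by a caller-supplied region length. The function names, the toy sequences, and the `region_length` value are hypothetical, and the sketch does not restrict counting to the coordinates of the coding region.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def called_fraction(sequence, region_length):
    """Fraction of region_length covered by unambiguous bases."""
    called = sum(1 for base in sequence.upper() if base in "ACGT")
    return called / region_length


def low_coverage(records, region_length, min_fraction=0.5):
    """Names of records whose called fraction is below min_fraction."""
    return sorted(
        name
        for name, sequence in records.items()
        if called_fraction(sequence, region_length) < min_fraction
    )


if __name__ == "__main__":
    # Toy records; real input would be the consensus FASTA files.
    toy = ">sample_a\nACGTNNNNNN\n>sample_b\nACGTACGTNN\n>sample_c\n"
    print(low_coverage(read_fasta(toy), region_length=10))
```

A record with a header but no sequence, such as the zero-length entry in the table above, has a called fraction of 0 and is therefore always flagged.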

Multiple sequence alignment

Alignments were performed using MAFFT. The PHI test was used to test for recombination, and RDP4 was used to narrow down sequences with potential contamination. These sequences are in consensus_sequences/contaminated_sequences.

Name    MAXCHI    CHIMAERA    SISCAN
W162    +         -           +
W301    +         +           +

Disclaimer. Please note that this data is still based on work in progress and should be considered preliminary. If you intend to include any of these data in publications, please let us know – otherwise please feel free to download and use without restrictions. We have shared this data with the hope that people will download and use it, as well as scrutinize it so we can improve our methods and analyses. Please contact us if you have any questions or comments – we’ll buy beers for #ResearchParasites that spot flaws and faults in the data and come up with improvements!


Andersen Lab
The Scripps Research Institute
La Jolla, CA, USA
data@andersen-lab.com

Funding

NIH_CTSA
PEW