Spread of Asian/American lineage of Zika virus

Zika virus genomic data from Florida

Summary: Zika virus sequence data from the 2016 outbreak in Florida. Data here.


Sequences | Alignments | Trees | BAMs


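Scripps ID | Patient ID | Sample type | Quality score | % Coverage | Location of infection
-----------|------------|-------------|---------------|------------|----------------------
FL008U     | 008        | Urine       | ***           | 90.07      | Puerto Rico
FL010U     | 010        | Urine       | ****          | 97.69      | USA: Florida
FL016U     | 016        | Urine       | ***           | 99.88      | Puerto Rico
FL021U     | 021        | Urine       | ***           | 97.45      | USA: Florida
FL022U     | 022        | Urine       | ***           | 97.77      | USA: Florida
FL030U     | 030        | Urine       | ***           | 99.75      | USA: Florida
FL032U     | 032        | Urine       | ***           | 99.75      | USA: Florida
FL036Se    | 036        | Serum       | ***           | 99.73      | USA: Florida
FL038U     | 038        | Urine       | ***           | 99.76      | USA: Florida
FL039U     | 039        | Urine       | ****          | 99.75      | USA: Florida
Hu0015Sa   | Hu0015     | Saliva      | ***           | 99.75      | USA: Florida
FL01M      | 7501       | Mosquito    | ****          | 99.75      | USA: Florida
FL02M      | 7719       | Mosquito    | ****          | 99.74      | USA: Florida
FL03M      | 7727       | Mosquito    | ****          | 99.83      | USA: Florida
FL04M      | 16-10416   | Mosquito    | ****          | 99.72      | USA: Florida
FL05M      | 16-3125    | Mosquito    | ****          | 99.74      | USA: Florida
FL06M      |            | Mosquito    | ****          | 99.71      | USA: Florida
FL08M      | pool#8     | Mosquito    | ****          | 97.70      | USA: Florida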


**** Complete coding sequence, no apparent contamination in aligned bam file
*** Complete coding sequence, some contamination in aligned bam file (confident in consensus)
**  Partial coding sequence, no apparent contamination in aligned bam file
*   Partial coding sequence, some contamination in aligned bam file
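The star ratings combine two yes/no properties: whether the coding sequence is complete, and whether contamination was seen in the aligned BAM file. The sketch below shows one way to decode them when filtering samples. The names `QUALITY_LEGEND` and `describe_quality` are illustrative and are not part of this repository.

```python
# Illustrative only: decode the star ratings used in the sample table.
# Each rating maps to (complete coding sequence?, contamination seen in BAM?).
QUALITY_LEGEND = {
    "****": (True, False),
    "***": (True, True),
    "**": (False, False),
    "*": (False, True),
}


def describe_quality(stars: str) -> str:
    """Return a human-readable description of a star rating."""
    complete, contaminated = QUALITY_LEGEND[stars]
    cds = "complete" if complete else "partial"
    contamination = "some" if contaminated else "no apparent"
    return f"{cds} coding sequence, {contamination} contamination"


if __name__ == "__main__":
    # Two rows read from the table above: (Scripps ID, rating, % coverage).
    rows = [("FL010U", "****", 97.69), ("FL008U", "***", 90.07)]
    for sample_id, stars, coverage in rows:
        print(sample_id, coverage, describe_quality(stars))
```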


  • Library construction was performed using an amplicon-based approach.
  • Sequencing was performed on the MiSeq.

Coding-complete genomes

  • Downloaded from NCBI, ViPR, and NextStrain. Only new genomes are added to later folders.
  • Consensus sequences sequenced in our lab are added to the root of the 'consensus_sequences' folder.


Alignments

  • Created using MAFFT.
  • Trimmed to just contain the ORFs.
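Trimming an alignment to the ORF means keeping only the alignment columns that fall between the first and last ORF positions of a reference sequence. The following is a minimal sketch of that step in plain Python, not the lab's actual pipeline. The function names, the toy sequences, and the ORF coordinates are hypothetical; real coordinates must come from the annotation of whichever reference you use.

```python
# Sketch: trim an aligned FASTA to the ORF of a reference sequence.
# ORF coordinates are supplied by the caller (1-based, inclusive, counted on
# the ungapped reference). All names and values here are hypothetical.

def read_fasta(text):
    """Parse FASTA text into a dict of name -> sequence (insertion ordered)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def orf_columns(aligned_ref, orf_start, orf_end):
    """Map ORF coordinates on the ungapped reference to a half-open
    column range [first, end) on the gapped alignment."""
    position = 0
    first = last = None
    for column, base in enumerate(aligned_ref):
        if base == "-":
            continue  # gaps do not advance the reference coordinate
        position += 1
        if position == orf_start:
            first = column
        if position == orf_end:
            last = column
            break
    if first is None or last is None:
        raise ValueError("ORF coordinates fall outside the reference")
    return first, last + 1


def trim_to_orf(records, ref_name, orf_start, orf_end):
    """Slice every aligned sequence to the reference ORF columns."""
    first, end = orf_columns(records[ref_name], orf_start, orf_end)
    return {name: seq[first:end] for name, seq in records.items()}


if __name__ == "__main__":
    toy = ">ref\nGG-ATGAAATAGCC\n>s1\nGGCATG-AATAGC-\n"
    trimmed = trim_to_orf(read_fasta(toy), "ref", 3, 11)
    for name, seq in trimmed.items():
        print(f">{name}\n{seq}")
```

Working in alignment columns rather than raw sequence positions keeps every trimmed sequence the same length, even when individual genomes carry insertions or deletions.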

Maximum likelihood trees

  • Created using the fast algorithm in RAxML with 200 bootstrap replicates.
  • Orange = travel-associated cases; Red = local transmissions.
  • We root trees on KX447517 from French Polynesia.
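For readers who want to run a comparable analysis, the sketch below assembles a RAxML command line in Python. Several choices are assumptions and do not come from this page: it reads "fast algorithm" as RAxML's rapid bootstrapping mode (`-f a`), picks `GTRGAMMA` as the substitution model, and uses placeholder file names, run name, and random seed. The `-o` option names the outgroup and must match the taxon label in your alignment exactly, which may be longer than the bare accession.

```python
import shlex


def raxml_command(alignment, run_name, outgroup, bootstraps=200,
                  model="GTRGAMMA", seed=12345, binary="raxmlHPC"):
    """Build an argument list for a rapid-bootstrap RAxML run.

    The model, seed, and binary name are assumptions for illustration.
    """
    return [
        binary,
        "-f", "a",              # rapid bootstrap plus best-scoring ML tree search
        "-m", model,            # substitution model (assumed)
        "-p", str(seed),        # seed for parsimony starting trees
        "-x", str(seed),        # seed for rapid bootstrapping
        "-N", str(bootstraps),  # number of bootstrap replicates
        "-s", alignment,        # input alignment
        "-n", run_name,         # suffix for output files
        "-o", outgroup,         # outgroup taxon used to root the tree
    ]


if __name__ == "__main__":
    # File and run names are placeholders.
    cmd = raxml_command("zika_orf.fasta", "zika_ml", "KX447517")
    print(shlex.join(cmd))
    # To execute, pass the list to subprocess.run(cmd, check=True).
```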

Time-scaled trees

  • Made with BEAST using the following parameters:
    • Uncorrelated relaxed clock with lognormal distribution, CTMC rate prior.
    • HKY+Γ substitution model with codon partitioning.
    • SkyGrid, 50 parameters, time at last point = 4.
    • 30,000,000 states.
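The SkyGrid settings above can be read as a grid of evenly spaced time points at which the effective population size is allowed to change. The sketch below computes that grid under our understanding of BEAST's convention, which is an assumption and not something stated on this page: 50 population-size parameters correspond to 49 grid points, spaced evenly so that the last one falls at the cutoff of 4 time units before the most recent sample. The function name is illustrative.

```python
def skygrid_grid_points(n_parameters=50, cutoff=4.0):
    """Return evenly spaced SkyGrid change points, in time units before
    the most recent sample.

    Assumes n_parameters - 1 grid points, with the last at `cutoff`.
    """
    n_grid = n_parameters - 1
    return [(i + 1) * cutoff / n_grid for i in range(n_grid)]


if __name__ == "__main__":
    points = skygrid_grid_points()
    print(len(points), "grid points")
    print("first:", round(points[0], 4), "last:", points[-1])
```

Under this reading, population size is piecewise constant between consecutive grid points, and the final parameter covers everything older than the cutoff.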

Disclaimer. Please note that these data are based on work in progress and should be considered preliminary. If you intend to include any of these data in publications, please let us know; otherwise, feel free to download and use them without restrictions. We have shared these data in the hope that people will download, use, and scrutinize them so that we can improve our methods and analyses. Please contact us if you have any questions or comments. We'll buy beers for #ResearchParasites who spot flaws and faults in the data and come up with improvements!

Andersen Lab
The Scripps Research Institute
La Jolla, CA, USA
[email protected]

GitHub Commits




USAMRIID Zika virus sequences on GitHub
By: Jason Ladner, Mike Wiley, Gus Palacios

By: Richard Neher, Trevor Bedford

By: Andrew Rambaut

ZiBRA Project
By: Nick Loman and crew

ZEST – Zika Real-Time Experiments
By: Dave O’Connor



Main Collaborators

Florida Gulf Coast / Tulane
Lauren Paul
Amanda Tan
Scott Michael
Sharon Isern
Robert Garry
Florida Department of Health
Leah Gillis
Stephen White
Marshall Cone
Edgar Kopp
Kelly Hogan
Andrew Cannons
Mario Porcelli
Chalmers Vasquez
University of Miami
Diogo Magnani
David Watkins
Paola Lichtenberger
Mike Ricciardi
Varian Bailey
USAMRIID
Jason Ladner
Mike Wiley
Gus Palacios
Broad Institute
Sabeti Lab
University of Birmingham
Josh Quick
Nick Loman
Oxford University
Julien Theze
Moritz Kraemer
Nuno Faria
Oliver Pybus
The Hutch / Edinburgh
Gytis Dudas
Trevor Bedford
Andrew Rambaut