We recently received plasma samples from Zika virus (ZIKV) patients in Colombia. We performed QC on the samples and unfortunately very few had detectable levels of ZIKV by qPCR. We extracted RNA from two of these patient samples (Z184 and Z186), as well as a positive control (seed stock of the Malaysian strain P6-740 passaged once on BHK-21 cells) kindly provided by Nathan Grubaugh and Greg Ebel at Colorado State.
- Z184, 42-year-old female with fever, rash, joint pain, myalgia, eye pain, and cephalgia. Symptom onset December 2015 [2 ZIKV reads]
- Z186, 33-year-old male with fever, rash, joint pain, myalgia, eye pain, and cephalgia. Symptom onset December 2015 [33 ZIKV reads]
- P6-740, positive control of Malaysian strain P6-740 passaged once on BHK-21 cells [20,729 ZIKV reads]
Total RNA was depleted of rRNA and sequenced on the Illumina MiSeq using previously published protocols, without specific amplification. A water-extraction control was run alongside the samples. Reads were aligned to the P6-740 reference using Novoalign, duplicates were removed using Picard, and the alignments were realigned using GATK. Filters based on the water control were used to remove contaminants from the samples, and Kraken was used for metagenomic analysis of both the raw and filtered reads. Human reads have been removed from all raw files using bmtagger and SNAP.
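The exact form of the water-control filter is not described here. As a rough illustration only, one plausible version drops every sample read whose taxonomic assignment also appears in the water control. The function name `filter_by_water_control`, the `min_water_reads` threshold, and the read-ID-to-taxon dictionaries are all assumptions made for this sketch, not part of our pipeline.

```python
from collections import Counter


def filter_by_water_control(sample_calls, water_calls, min_water_reads=1):
    """Drop sample reads assigned to taxa that also show up in the water control.

    sample_calls / water_calls: dict mapping read ID -> taxon name
    (hypothetical input format, e.g. parsed from a per-read classifier output).
    min_water_reads: a taxon counts as a contaminant once it has at least
    this many reads in the water control (assumed threshold).
    """
    water_counts = Counter(water_calls.values())
    contaminants = {
        taxon for taxon, n in water_counts.items() if n >= min_water_reads
    }
    return {
        read_id: taxon
        for read_id, taxon in sample_calls.items()
        if taxon not in contaminants
    }


# Toy data for illustration only.
sample = {"r1": "Zika virus", "r2": "Escherichia coli", "r3": "Zika virus"}
water = {"w1": "Escherichia coli"}
kept = filter_by_water_control(sample, water)
```

With this toy input, the E. coli read is discarded because that taxon is present in the water control, while both ZIKV reads are kept.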
Since the patient samples contain so little ZIKV material, this data is probably of limited use. However, we wanted to make it available to the research community, so please feel free to download and use it as needed. Note that the positive control has many ZIKV reads (20,729) and can be useful for tuning computational pipelines.
Basic insights from the data
Interestingly, the reads from Z184 and Z186 most closely match the Malaysian strain. Since we use this very strain as our positive control, one might suspect contamination. However, we have two reasons to believe that this might not be the case:
- We do not have any ZIKV reads in our water-only control.
- The % identity between the reads from the patient samples and P6-740 is ~97%. If these reads came from the seed stock, then even allowing for Illumina errors one would expect the % identity to be closer to 100% (we typically observe >99.5% identity in our Lassa and Ebola studies when seed stock contamination occurs). Still, with so few reads we cannot draw any firm conclusions and cannot rule out contamination at this point. More sequencing, as well as higher-quality incoming samples, will help us resolve these issues.
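For readers who want to repeat the identity comparison on their own alignments, here is a minimal sketch of how a pooled % identity across reads could be computed. It assumes each read has already been aligned to the reference and is supplied as a pair of equal-length strings with `-` for gaps. The function names and the choice to skip gap and `N` columns are assumptions made for this sketch, not a description of our pipeline.

```python
def count_matches(read, ref):
    """Return (matches, compared) for one aligned read/reference pair.

    Columns containing a gap ('-') or an ambiguous base ('N') on either
    side are skipped, so they count as neither match nor mismatch.
    """
    if len(read) != len(ref):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for r, g in zip(read.upper(), ref.upper()):
        if r in "-N" or g in "-N":
            continue
        compared += 1
        if r == g:
            matches += 1
    return matches, compared


def pooled_percent_identity(pairs):
    """Percent identity over all comparable columns of all aligned pairs."""
    total_matches = total_compared = 0
    for read, ref in pairs:
        m, c = count_matches(read, ref)
        total_matches += m
        total_compared += c
    if total_compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * total_matches / total_compared


# Toy alignments for illustration only (not real ZIKV reads).
pairs = [
    ("ACGTACGTAC", "ACGTACGTAC"),  # 10 of 10 columns match
    ("ACGTACGTAC", "ACGTACGTAA"),  # 9 of 10 columns match
]
identity = pooled_percent_identity(pairs)  # 19 of 20 columns -> 95.0
```

Pooling over columns rather than averaging per-read percentages keeps short reads from being over-weighted, which matters when only a few dozen reads are available.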
As always, please feel free to contact us directly if you have any questions or comments. We will continue to make data immediately available as it is generated.