Sequence accuracy in primary databases: A case study on HIV-1B

Balaji Seetharaman, Akash Ramachandran, Krittika Nandy, Shapshak Paul

Research output: Chapter in Book/Report/Conference proceeding › Chapter

2 Citations (Scopus)


This chapter revisits the history of sequencing methods and their advancements, focusing mainly on the accuracy of sequences deposited in primary public databases. The sources and frequency of errors, errors arising from sequencing and sequence assembly, and sequence quality are discussed. The quality of sequencing pipelines and the error rates of next-generation sequencing (NGS) data are reviewed, along with some tools and techniques for overcoming errors. Sequence uncertainties in primary public databases are addressed with reference to HIV-1B sequences. The sequence ambiguities are highlighted along with annotations based on the reference genome (HXB2). Sequences produced by different sequencing technologies contain ambiguities, and it is very difficult to distinguish true variants from errors. This raises concerns about data collection efforts and inferences derived from error-prone DNA-sequencing technologies. Future studies should handle such sequences with caution, especially when analyzing mutations to understand pathogenesis, drug resistance, and geographical variation.

Original language: English
Title of host publication: Global Virology II - HIV and NeuroAIDS
Publisher: Springer New York
Number of pages: 44
ISBN (Electronic): 9781493972906
ISBN (Print): 9781493972883
Publication status: Published - 01-01-2017

All Science Journal Classification (ASJC) codes

  • Medicine(all)
  • Immunology and Microbiology(all)
  • Neuroscience(all)
