Hamlet-Pattern-Based Automated COVID-19 and Influenza Detection Model Using Protein Sequences

Mehmet Erten, Madhav R. Acharya, Aditya P. Kamath, Niranjana Sampathila, G. Muralidhar Bairy, Emrah Aydemir, Prabal Datta Barua, Mehmet Baygin, Ilknur Tuncer, Sengul Dogan, Turker Tuncer

Research output: Contribution to journal, Article, peer-review

5 Citations (Scopus)


SARS-CoV-2 and Influenza-A infections can present with similar symptoms. Computer-aided diagnosis can facilitate screening for the two conditions and may be especially useful during the current COVID-19 pandemic, because seasonal Influenza-A infections still occur. We have developed a novel text-based classification model that discriminates between the two conditions using protein sequences of varying lengths. We downloaded SARS-CoV-2 and Influenza-A viral protein sequences of varying lengths (all at least 100 residues long) from the NCBI database and randomly selected 16,901 SARS-CoV-2 and 19,523 Influenza-A sequences to form a two-class study dataset. We used a new feature extraction function based on a unique pattern, HamletPat, generated from the text of Shakespeare’s Hamlet, together with a signum function, to extract local binary pattern-like bits from overlapping fixed-length blocks (27 residues) of the protein sequences. The bits were converted to decimal map signals, from which histograms were extracted and concatenated to form a final feature vector of length 1280. An iterative Chi-square feature selector chose the 340 most discriminative features, which were fed to a support vector machine (SVM) with a Gaussian kernel for classification. The model attained classification accuracies of 99.92% and 99.87% using hold-out (75:25 split) and five-fold cross-validation, respectively. The excellent performance of this lightweight, handcrafted HamletPat-based model suggests that it can be a valuable tool for screening protein sequences to discriminate between SARS-CoV-2 and Influenza-A infections.
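
The abstract describes the pipeline only at a high level, so the Python sketch below merely illustrates its general shape under stated assumptions; it is not the authors' implementation. The 40 index pairs in `PAIRS` are a hypothetical placeholder for HamletPat (whose construction from Hamlet's text is not given here), residues are encoded by their ASCII codes, blocks overlap with stride 1, and each of five maps uses 8 bits, chosen only because 5 × 256 bins matches the stated feature length of 1280. The ranking step is a plain single-pass chi-square score (computed the way scikit-learn's `chi2` does it), swapped in for the iterative Chi-square selector the authors used.

```python
import random

BLOCK_LEN = 27      # block length given in the abstract
BITS_PER_MAP = 8    # assumption: 8 bits per decimal map value (0..255)
N_MAPS = 5          # assumption: 5 maps x 256 bins = 1280 features

# Hypothetical stand-in for HamletPat: 40 fixed (i, j) index pairs inside a
# 27-residue block. The published pattern is NOT reproduced here.
_rng = random.Random(0)
PAIRS = [tuple(_rng.sample(range(BLOCK_LEN), 2))
         for _ in range(N_MAPS * BITS_PER_MAP)]


def extract_features(seq):
    """Concatenated histograms of decimal map values from overlapping blocks."""
    x = [ord(c) for c in seq]  # assumption: residue letter -> ASCII code
    hists = [[0] * (1 << BITS_PER_MAP) for _ in range(N_MAPS)]
    for start in range(len(x) - BLOCK_LEN + 1):  # stride-1 overlapping blocks
        block = x[start:start + BLOCK_LEN]
        # Signum-style binary comparison, as in local binary patterns.
        bits = [1 if block[i] - block[j] >= 0 else 0 for i, j in PAIRS]
        for m in range(N_MAPS):
            value = 0
            for b in bits[m * BITS_PER_MAP:(m + 1) * BITS_PER_MAP]:
                value = (value << 1) | b  # pack 8 bits into one decimal value
            hists[m][value] += 1
    return [count for h in hists for count in h]


def chi2_scores(X, y):
    """Chi-square score per non-negative feature: observed vs expected class totals."""
    n_feat = len(X[0])
    totals = [sum(row[f] for row in X) for f in range(n_feat)]
    scores = [0.0] * n_feat
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        for f in range(n_feat):
            observed = sum(row[f] for row in rows)
            expected = prior * totals[f]
            if expected > 0:  # all-zero features keep a score of 0
                scores[f] += (observed - expected) ** 2 / expected
    return scores


def top_k_features(X, y, k=340):
    """Indices of the k highest-scoring features (single pass, not iterative)."""
    scores = chi2_scores(X, y)
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)[:k]
```

The selected columns would then be passed to an SVM with a Gaussian (RBF) kernel, for example scikit-learn's `SVC(kernel="rbf")`; that step is left out to keep the sketch dependency-free. Because every feature is a histogram count, the non-negativity required by the chi-square score holds by construction.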

Original language: English
Article number: 3181
Issue number: 12
Publication status: Published, 12-2022

All Science Journal Classification (ASJC) codes

  • Clinical Biochemistry
