3D visualization and cluster analysis of unstructured protein sequences using ARCSA with a file conversion approach

U. Vignesh, R. Parvathi

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

This work describes the synthesis of protein structures using clustering, an unsupervised learning method. Protein structure prediction was performed for several crab and egg datasets, with inputs collected from the Protein Data Bank (PDB IDs: 3LIG, 2W3Z, 3ZVQ, 2KLR and 2YIZ). The three-dimensional protein structures were merged using the built-in instance-filtering technique known as MergeSets. The problem addressed by the proposed methodology, referred to as attribute-related cluster sequence analysis (ARCSA), is to identify a well-performing algorithm for clustering protein structures by comparing four existing algorithms: k-means, expectation maximization, farthest first and COBWEB. Experiments were conducted with the BioWeka data mining tool, Modeler 9.15 and PyMOL, with scripts written in Python. The paper reports that expectation maximization performs best of the four for structured protein clustering, and suggests that this will also pave the way for identifying better algorithms for supervised learning methods.
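Of the four algorithms named in the abstract, farthest first is the simplest to illustrate. The following is a minimal pure-Python sketch of farthest-first traversal. It is not the authors' BioWeka workflow, and the function names (`farthest_first_centers`, `assign`) and the toy 2-D feature vectors are hypothetical, chosen only for illustration.

```python
import math
import random


def farthest_first_centers(points, k, seed=0):
    """Pick k centers by farthest-first traversal (illustrative sketch)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    # Start from one randomly chosen point.
    centers = [points[rng.randrange(len(points))]]
    # Distance from each point to its nearest chosen center so far.
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx]))
                   for p, d in zip(points, nearest)]
    return centers


def assign(points, centers):
    """Label each point with the index of its closest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]


# Hypothetical 2-D feature vectors (assumption: not data from the paper).
toy_points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0),
              (5.2, 4.9), (10.0, 0.0), (9.8, 0.3)]
centers = farthest_first_centers(toy_points, k=3)
labels = assign(toy_points, centers)
```

In practice the inputs would be attribute vectors derived from the protein structure files rather than hand-written coordinates. How those attributes are extracted is not specified in the abstract.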

Original language: English
Pages (from-to): 4287-4301
Number of pages: 15
Journal: Journal of Supercomputing
Volume: 76
Issue number: 6
Publication status: Published - 01-06-2020

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Information Systems
  • Hardware and Architecture
