3D visualization and cluster analysis of unstructured protein sequences using ARCSA with a file conversion approach

U. Vignesh*, R. Parvathi

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    7 Citations (Scopus)

    Abstract

    This work explains synthesis of protein structures based on the unsupervised learning method known as clustering. Protein structure prediction was performed for different crab and egg datasets with inputs collected from the Protein Data Bank (PDB ID: 3LIG, 2W3Z, 3ZVQ, 2KLR and 2YIZ). The three-dimensional protein structure was merged together with the filtering instances inbuilt in data mining techniques known as MergeSets. The problem description in this proposed methodology, referred to as attribute-related cluster sequence analysis, is to identify a good working algorithm for clustering of protein structures by comparing four existing algorithms: k-means, expectation maximization, farthest first and COBWEB. Experiments are conducted with the BioWeka data mining tool, Modeler 9.15 and the PyMOL tool with scripts using the Python programming language. This paper shows that the expectation maximization algorithm is the best for structured protein clustering, and this will also pave the way for identifying better algorithms for supervised learning methods.

    Original language: English
    Pages (from-to): 4287-4301
    Number of pages: 15
    Journal: Journal of Supercomputing
    Volume: 76
    Issue number: 6
    Publication status: Published - 01-06-2020

    All Science Journal Classification (ASJC) codes

    • Software
    • Theoretical Computer Science
    • Information Systems
    • Hardware and Architecture
