TY - JOUR
T1 - 3D visualization and cluster analysis of unstructured protein sequences using ARCSA with a file conversion approach
AU - Vignesh, U.
AU - Parvathi, R.
N1 - Publisher Copyright: © 2018, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2020/6/1
Y1 - 2020/6/1
AB - This work explains the synthesis of protein structures based on clustering, an unsupervised learning method. Protein structure prediction was performed for different crab and egg datasets, with inputs collected from the Protein Data Bank (PDB IDs: 3LIG, 2W3Z, 3ZVQ, 2KLR and 2YIZ). The three-dimensional protein structures were merged using MergeSets, an instance filter built into the data mining tool. The goal of the proposed methodology, referred to as attribute-related cluster sequence analysis (ARCSA), is to identify an effective algorithm for clustering protein structures by comparing four existing algorithms: k-means, expectation maximization, farthest first and COBWEB. Experiments were conducted with the BioWeka data mining tool, MODELLER 9.15 and PyMOL, using scripts written in Python. This paper shows that the expectation maximization algorithm performs best for structured protein clustering, and this finding will also pave the way for identifying better algorithms for supervised learning methods.
UR - http://www.scopus.com/inward/record.url?scp=85045049550&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045049550&partnerID=8YFLogxK
U2 - 10.1007/s11227-018-2319-4
DO - 10.1007/s11227-018-2319-4
M3 - Article
AN - SCOPUS:85045049550
SN - 0920-8542
VL - 76
SP - 4287
EP - 4301
JO - Journal of Supercomputing
JF - Journal of Supercomputing
IS - 6
ER -