TY - JOUR
T1 - Machine learning study of classifiers trained with biophysiochemical properties of amino acids to predict fibril forming peptide motifs
AU - Kumaran Nair, Smitha Sunil
AU - Subba Reddy, N. V.
AU - Hareesha, K. S.
PY - 2012/9
Y1 - 2012/9
N2 - It is important to understand the cause of amyloid illnesses by predicting the short protein fragments capable of forming amyloid-like fibril motifs aiding in the discovery of sequence-targeted anti-aggregation drugs. It is extremely desirable to design computational tools to provide affordable in silico predictions owing to the limitations of molecular techniques for their identification. In this research article, we tried to study, from a machine learning perspective, the performance of several machine learning classifiers that use heterogenous features based on biochemical and biophysical properties of amino acids to discriminate between amyloidogenic and non-amyloidogenic regions in peptides. Four conventional machine learning classifiers namely Support Vector Machine, Neural network, Decision tree and Random forest were trained and tested to find the best classifier that fits the problem domain well. Prior to classification, novel implementations of two biologically- inspired feature optimization techniques based on evolutionary algorithms and methodologies that mimic social life and a multivariate method based on projection are utilized in order to remove the unimportant and uninformative features. Among the dimenionality reduction algorithms considered under the study, prediction results show that algorithms based on evolutionary computation is the most effective. SVM best suits the problem domain in its fitment among the classifiers considered. The best classifier is also compared with an online predictor to evidence the equilibrium maintained between true positive rates and false positive rates in the proposed classifier. This exploratory study suggests that these methods are promising in providing amyloidogenity prediction and may be further extended for large-scale proteomic studies.
AB - It is important to understand the cause of amyloid illnesses by predicting the short protein fragments capable of forming amyloid-like fibril motifs aiding in the discovery of sequence-targeted anti-aggregation drugs. It is extremely desirable to design computational tools to provide affordable in silico predictions owing to the limitations of molecular techniques for their identification. In this research article, we tried to study, from a machine learning perspective, the performance of several machine learning classifiers that use heterogenous features based on biochemical and biophysical properties of amino acids to discriminate between amyloidogenic and non-amyloidogenic regions in peptides. Four conventional machine learning classifiers namely Support Vector Machine, Neural network, Decision tree and Random forest were trained and tested to find the best classifier that fits the problem domain well. Prior to classification, novel implementations of two biologically- inspired feature optimization techniques based on evolutionary algorithms and methodologies that mimic social life and a multivariate method based on projection are utilized in order to remove the unimportant and uninformative features. Among the dimenionality reduction algorithms considered under the study, prediction results show that algorithms based on evolutionary computation is the most effective. SVM best suits the problem domain in its fitment among the classifiers considered. The best classifier is also compared with an online predictor to evidence the equilibrium maintained between true positive rates and false positive rates in the proposed classifier. This exploratory study suggests that these methods are promising in providing amyloidogenity prediction and may be further extended for large-scale proteomic studies.
UR - http://www.scopus.com/inward/record.url?scp=84865800237&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84865800237&partnerID=8YFLogxK
U2 - 10.2174/092986612802084429
DO - 10.2174/092986612802084429
M3 - Article
C2 - 22486618
AN - SCOPUS:84865800237
SN - 0929-8665
VL - 19
SP - 917
EP - 923
JO - Protein and Peptide Letters
JF - Protein and Peptide Letters
IS - 9
ER -