TY - GEN
T1 - Bench marking of classification algorithms
T2 - 2015 International Conference on Trends in Automation, Communication and Computing Technologies, I-TACT 2015
AU - Datla, Manish Varma
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2016/6/15
Y1 - 2016/6/15
N2 - Decision Trees and Random Forests are leading machine learning algorithms used for classification. Through the course of this paper, a comparison is made of the classification results of these two algorithms on data sets obtained from Kaggle's Bike Sharing System and Titanic problems. The solution methodology deployed is primarily broken into two segments. The first is Feature Engineering, where the given instance variables are cleaned of noise and two or more variables are combined to derive a valuable third. Second, the classification parameters are worked out, consisting of correctly classified instances, incorrectly classified instances, Precision and Accuracy. This process ensured that the instance variables and classification parameters were best treated before they were deployed with the two algorithms, i.e. Decision Trees and Random Forests. The developed model has been validated using the systems' data and the classification results. From the model it can safely be concluded that, for classification problems, Decision Trees are handy with small data sets, i.e. fewer instances, while Random Forests give better results for the same number of attributes on large data sets, i.e. a greater number of instances. The R language has been used to solve the problem and to present the results.
AB - Decision Trees and Random Forests are leading machine learning algorithms used for classification. Through the course of this paper, a comparison is made of the classification results of these two algorithms on data sets obtained from Kaggle's Bike Sharing System and Titanic problems. The solution methodology deployed is primarily broken into two segments. The first is Feature Engineering, where the given instance variables are cleaned of noise and two or more variables are combined to derive a valuable third. Second, the classification parameters are worked out, consisting of correctly classified instances, incorrectly classified instances, Precision and Accuracy. This process ensured that the instance variables and classification parameters were best treated before they were deployed with the two algorithms, i.e. Decision Trees and Random Forests. The developed model has been validated using the systems' data and the classification results. From the model it can safely be concluded that, for classification problems, Decision Trees are handy with small data sets, i.e. fewer instances, while Random Forests give better results for the same number of attributes on large data sets, i.e. a greater number of instances. The R language has been used to solve the problem and to present the results.
UR - https://www.scopus.com/pages/publications/84979282682
UR - https://www.scopus.com/inward/citedby.url?scp=84979282682&partnerID=8YFLogxK
U2 - 10.1109/ITACT.2015.7492647
DO - 10.1109/ITACT.2015.7492647
M3 - Conference contribution
AN - SCOPUS:84979282682
T3 - International Conference on Trends in Automation, Communication and Computing Technologies, I-TACT 2015
BT - International Conference on Trends in Automation, Communication and Computing Technologies, I-TACT 2015
A2 - Viswasanathan, C.
A2 - Deepti, A.R.
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 21 December 2015 through 22 December 2015
ER -