TY - GEN
T1 - Query Quality Prediction on Source Code Base Dataset
T2 - 7th International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018
AU - Swathi, B. P.
AU - Muniyal, Balachandra
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/11/30
Y1 - 2018/11/30
N2 - Source code retrieval is a text retrieval task that software developers perform regularly. Existing source code retrieval approaches are based on regular expressions and assume that the developer querying the code base is thoroughly familiar with the source code. Because keyword and regular-expression queries are difficult to remember, software developers should instead be able to query the code base in sentential form. The performance of text search depends heavily on query quality: a search succeeds when the quality of the textual query is high. Predicting query quality before the query is executed on a source code retrieval system saves developers time and effort by notifying them when a query is unlikely to perform well. This paper assesses the performance of four prominent classification algorithms, namely Support Vector Machine (SVM), Logistic Regression (LR), Gradient Boosted Tree (GBT), and Decision Tree (DT), in predicting query quality on a data set created from the documentation of source code files. Experimental results on a data set of benchmark open source projects demonstrate that Gradient Boosted Tree performs better than the other three classifiers.
AB - Source code retrieval is a text retrieval task that software developers perform regularly. Existing source code retrieval approaches are based on regular expressions and assume that the developer querying the code base is thoroughly familiar with the source code. Because keyword and regular-expression queries are difficult to remember, software developers should instead be able to query the code base in sentential form. The performance of text search depends heavily on query quality: a search succeeds when the quality of the textual query is high. Predicting query quality before the query is executed on a source code retrieval system saves developers time and effort by notifying them when a query is unlikely to perform well. This paper assesses the performance of four prominent classification algorithms, namely Support Vector Machine (SVM), Logistic Regression (LR), Gradient Boosted Tree (GBT), and Decision Tree (DT), in predicting query quality on a data set created from the documentation of source code files. Experimental results on a data set of benchmark open source projects demonstrate that Gradient Boosted Tree performs better than the other three classifiers.
UR - https://www.scopus.com/pages/publications/85060062917
UR - https://www.scopus.com/pages/publications/85060062917#tab=citedBy
U2 - 10.1109/ICACCI.2018.8554602
DO - 10.1109/ICACCI.2018.8554602
M3 - Conference contribution
AN - SCOPUS:85060062917
T3 - 2018 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018
SP - 1115
EP - 1119
BT - 2018 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 19 September 2018 through 22 September 2018
ER -