Query Quality Prediction on Source Code Base Dataset: A Comparative Study

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Source code retrieval is a text retrieval task that software developers perform regularly. Existing source code retrieval approaches are based on regular expressions and assume that the developer querying the code base is thoroughly familiar with the source code. Because keywords and regular expressions are difficult to remember, developers should instead be able to query the code base in sentential form. However, the performance of text search depends heavily on query quality: a search succeeds when the quality of the textual query is high. Predicting query quality before a query is executed on a source code retrieval system saves developers time and effort by notifying them when a query is unlikely to perform well. This paper assesses the performance of prominent classification algorithms, namely Support Vector Machine (SVM), Logistic Regression (LR), Gradient Boosted Tree (GBT) and Decision Tree (DT), in predicting query quality on a data set created from the documentation of source code files. Experimental results on a data set of benchmark open source projects demonstrate that Gradient Boosted Tree performs better than the other classifiers.

Original language: English
Title of host publication: 2018 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1115-1119
Number of pages: 5
ISBN (Electronic): 9781538653142
Publication status: Published - 30-11-2018
Event: 7th International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018 - Bangalore, India
Duration: 19-09-2018 to 22-09-2018

Conference

Conference: 7th International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018
Country/Territory: India
City: Bangalore
Period: 19-09-18 to 22-09-18

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
