Exploring the Effectiveness of Feature Reduction and Kernel-Based Matching for Query-by-Example Spoken Term Detection using CNN

  • Manisha Naik Gaonkar
  • Veena Thenkanidiyoor
  • Dileep Aroor Dinesh
  • H. Muralikrishna*

  *Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Query-by-example spoken term detection (QbE-STD) refers to the search for an audio query in a repository of audio utterances. A common approach for QbE-STD involves computing a matching matrix between the feature representations of the query and the reference utterance and deciding the relevance of the reference utterance to the query based on the computed matching matrix. The time required to compute the matching matrix is crucial since a matching matrix must be computed between a query and every reference utterance. This time depends on the number of feature representations in the query and reference utterance. Feature reduction is a technique that reduces the number of feature representations to reduce the time required to compute a matching matrix. In this study, we propose to explore feature reduction in combination with kernel-based matching of reduced feature representation for query and reference utterances. We propose to decide the relevance of a reference utterance using a convolutional neural network (CNN) based classifier on the matching matrix. We demonstrate that the proposed approach not only results in a reduction in search time but also increases the accuracy of QbE-STD.
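    The cost argument in the abstract can be illustrated with a minimal sketch. The abstract does not say which reduction scheme or kernel the authors use, so the choices here are assumptions made only for illustration: the reduction averages consecutive frames, the kernel is a Gaussian, and the names `reduce_features`, `gaussian_matching_matrix`, `factor`, and `gamma` are hypothetical. The random arrays stand in for real acoustic features, and the CNN classifier is omitted.

```python
import numpy as np

def reduce_features(frames, factor):
    """Average each run of `factor` consecutive frames.

    Assumed reduction scheme for illustration; not taken from the paper.
    """
    n = (len(frames) // factor) * factor  # drop any ragged tail
    return frames[:n].reshape(-1, factor, frames.shape[1]).mean(axis=1)

def gaussian_matching_matrix(query, reference, gamma=0.1):
    """Return M with M[i, j] = exp(-gamma * ||q_i - r_j||^2).

    The Gaussian kernel is an assumed choice of kernel.
    """
    diffs = query[:, None, :] - reference[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

# Stand-in features: 40 query frames and 300 reference frames, 13 dimensions each.
rng = np.random.default_rng(0)
query = rng.normal(size=(40, 13))
reference = rng.normal(size=(300, 13))

full = gaussian_matching_matrix(query, reference)
reduced = gaussian_matching_matrix(
    reduce_features(query, 2), reduce_features(reference, 2)
)
print(full.shape, reduced.shape)
```

    Under these assumptions, halving the number of frames on both sides cuts the number of matching-matrix entries to a quarter. In the approach the abstract describes, a matrix of this kind is what the CNN-based classifier would take as input.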

    Original language: English
    Pages (from-to): 194462-194474
    Number of pages: 13
    Journal: IEEE Access
    Volume: 12
    Publication status: Published - 2024

    All Science Journal Classification (ASJC) codes

    • General Computer Science
    • General Materials Science
    • General Engineering
