Transparent, Low Resource and Context-Aware Information Retrieval from a Closed Domain Knowledge Base

Shubham Rateria, Sanjay Singh

Research output: Contribution to journal, Article, peer-reviewed

Abstract

In large-scale enterprises, vast amounts of textual information are shared across corporate repositories and intranet websites. Traditional search techniques that lack context sensitivity often fail to retrieve pertinent data efficiently. Modern techniques that use a distributed representation of words require a considerable training dataset and substantial computation, which imposes financial and operational burdens. Generative models for information search suffer from problems of transparency and hallucination, which can be detrimental, especially for organizations and their stakeholders who rely on these results for critical business operations. This paper presents a non-goal-oriented conversational agent, based on a collection of finite state machines and an information search model, for text search over an extensive collection of stored corporate documents and intranet websites. We used a distributed representation of words derived from the BERT model, which allows for contextual searching. We minimally fine-tuned a BERT model on a multi-label text classification task specific to a closed-domain knowledge base. Based on DCG metrics, our information retrieval model, which uses distributed embeddings from the minimally trained BERT model and Word Mover's Distance to calculate topic similarity, returns results that are more relevant to user queries than those obtained with BERT embeddings and cosine similarity or with BM25. Our architecture promises to significantly expedite information retrieval in closed-domain systems and improve its accuracy, without the need for a massive training dataset or expensive computing, while maintaining transparency.
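The abstract compares retrieval models using DCG metrics but does not say which DCG variant was used. As an illustration only, the sketch below assumes the common linear-gain form, DCG = sum over ranks i of rel_i / log2(i + 1), and its normalized version. The function names `dcg` and `ndcg` and the sample relevance grades are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain of a ranked list of relevance grades.

    Assumes the linear-gain form rel_i / log2(i + 1), with ranks starting at 1.
    """
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances: Sequence[float]) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical graded judgments (0-3) for the top six results of one query.
    ranked_grades = [3, 2, 3, 0, 1, 2]
    print(f"DCG  = {dcg(ranked_grades):.3f}")
    print(f"nDCG = {ndcg(ranked_grades):.3f}")
```

Under this assumption, two rankers can be compared by scoring their result lists for the same queries against the same relevance judgments; the ranker with the higher average score places relevant documents nearer the top.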

Original language: English
Pages (from-to): 44233-44243
Number of pages: 11
Journal: IEEE Access
Volume: 12
Publication status: Published - 2024

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • General Materials Science
  • General Engineering
