Mining high utility itemsets with time-aware scheduling using Apache Spark

Anup Brahmavar, Harish Venkatarama, Geetha Maiya

Research output: Contribution to journal › Article › peer-review

Abstract

Over the last decade, market basket analysis has been extended with revenue information. Termed high utility itemset mining (HUIM), this task considers the purchase quantity and unit profit of the items in the transaction database. Although several sequential algorithms for mining high utility itemsets (HUIs) exist, their performance degrades as the database grows. Distributed computing solutions such as Apache Hadoop and Apache Spark have proven effective in alleviating this bottleneck. In this regard, the current study develops a parallel workflow that adapts a single-phase tree-based algorithm, the single phase utility computation (SPUC) algorithm, to a Spark cluster. Based on the time taken to mine individual conditional pattern bases in SPUC, the parallel SPUC (PSPUC) algorithm proposes an assignment strategy that partitions the search space across the cluster. Experimental evaluation on real and synthetic datasets demonstrates that PSPUC outperforms the PHUI-Growth algorithm. In addition, PSPUC with the time-aware assignment strategy completes mining faster than with a random assignment of items. A linear speedup of PSPUC is also demonstrated.
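To make the notion of utility concrete, the following is a minimal sketch of the standard itemset utility definition (purchase quantity times unit profit, summed over the transactions that contain the whole itemset). It is not the SPUC or PSPUC algorithm: it enumerates candidates by brute force, and the toy transactions, profit values, function names, and threshold are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchase quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
]
# Hypothetical unit profit per item.
unit_profit = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over transactions containing every item."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Brute-force enumeration; exponential in the number of items."""
    items = sorted(unit_profit)
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            u = itemset_utility(candidate, transactions, unit_profit)
            if u >= min_utility:
                result[candidate] = u
    return result


huis = high_utility_itemsets(transactions, unit_profit, min_utility=15)
print(huis)
```

Because utility is not anti-monotone (a superset can have higher or lower utility than its subsets), exhaustive enumeration like this does not scale, which is the motivation for tree-based and distributed approaches.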

Original language: English
Article number: e7192
Journal: Concurrency and Computation: Practice and Experience
Volume: 34
Issue number: 23
Publication status: Published - 25-10-2022

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Computational Theory and Mathematics
