TY - GEN
T1 - Association rule mining with modified apriori algorithm using top down approach
AU - Shah, Ashish
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2017/4/25
Y1 - 2017/4/25
AB - Data mining is a field of computer science concerned with extracting useful information from varied sources. In an era where information has become an inherent human necessity, its relevance and usefulness have become a pressing concern. The most important part of association rule mining is the mining of frequent itemsets. Companies perform market basket analysis to retrieve itemsets that are frequent and often used together by customers. The Apriori algorithm is a widely used technique for finding such itemsets. However, as the frequent itemsets grow in length, the algorithm needs many iterations and its performance decreases drastically. In this paper, we propose a modification to the Apriori algorithm that uses a hash function to divide the frequent itemsets into buckets. Further, we propose a novel technique, used in conjunction with the Apriori algorithm, that eliminates infrequent itemsets from the candidate set. This top-down approach finds the frequent itemsets without going through several iterations, thus saving time and space. When a large maximal frequent itemset is discovered very early in the algorithm, all its subsets are known to be frequent, so we no longer need to scan them. The proposed technique therefore has an advantage over the existing Apriori algorithm when the maximal frequent itemset is long.
UR - https://www.scopus.com/pages/publications/85020196801
U2 - 10.1109/ICATCCT.2016.7912099
DO - 10.1109/ICATCCT.2016.7912099
M3 - Conference contribution
AN - SCOPUS:85020196801
T3 - Proceedings of the 2016 2nd International Conference on Applied and Theoretical Computing and Communication Technology, iCATccT 2016
SP - 747
EP - 752
BT - Proceedings of the 2016 2nd International Conference on Applied and Theoretical Computing and Communication Technology, iCATccT 2016
A2 - Niranjan, S.K.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd International Conference on Applied and Theoretical Computing and Communication Technology, iCATccT 2016
Y2 - 21 July 2016 through 23 July 2016
ER -