TY - GEN
T1 - A novel data structure for efficient representation of large data sets in data mining
AU - Pai, Radhika M.
AU - Ananthanarayana, V. S.
PY - 2006
Y1 - 2006
N2 - An important goal in data mining is to generate an abstraction of the data. Such an abstraction helps reduce the time and space requirements of the overall decision-making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper, we propose a novel data structure called the Prefix-Postfix structure (PP-structure), an abstraction of the data that can be built by scanning the database only once. We prove that this structure is compact, complete, and incremental, and is therefore suitable for representing dynamic databases. Further, we propose a clustering algorithm that uses this structure. The proposed algorithm is tested on different real-world datasets and is shown to be both space-efficient and time-efficient for large datasets without sacrificing accuracy. We compare our algorithm with other algorithms and show its effectiveness.
AB - An important goal in data mining is to generate an abstraction of the data. Such an abstraction helps reduce the time and space requirements of the overall decision-making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper, we propose a novel data structure called the Prefix-Postfix structure (PP-structure), an abstraction of the data that can be built by scanning the database only once. We prove that this structure is compact, complete, and incremental, and is therefore suitable for representing dynamic databases. Further, we propose a clustering algorithm that uses this structure. The proposed algorithm is tested on different real-world datasets and is shown to be both space-efficient and time-efficient for large datasets without sacrificing accuracy. We compare our algorithm with other algorithms and show its effectiveness.
UR - https://www.scopus.com/pages/publications/38149107420
UR - https://www.scopus.com/pages/publications/38149107420#tab=citedBy
U2 - 10.1109/ADCOM.2006.4289952
DO - 10.1109/ADCOM.2006.4289952
M3 - Conference contribution
AN - SCOPUS:38149107420
SN - 142440715X
SN - 9781424407156
T3 - Proceedings - 2006 14th International Conference on Advanced Computing and Communications, ADCOM 2006
SP - 547
EP - 552
BT - Proceedings - 2006 14th International Conference on Advanced Computing and Communications, ADCOM 2006
T2 - 14th International Conference on Advanced Computing and Communications, ADCOM 2006
Y2 - 20 December 2006 through 23 December 2006
ER -