Regression and Correlation Analysis of Different Interestingness Measures for Mining Association Rules
Keywords:
Association rules, correlation, interestingness measures, regression analysis
Abstract
Association rule mining is an important way to extract knowledge from data sets. The association among the instances of a data set can be measured with interestingness measures (IM), which quantify how interesting the extracted knowledge is. Researchers have shown that the classical support-confidence framework cannot extract the real knowledge, and they have proposed many alternative IM. From a user's perspective, it is difficult to select a minimal set of the best measures from among them. In our experiments, the correlations among various IM, such as Support, Confidence, Lift, Cosine, Jaccard, and Leverage, are evaluated on several popular data sets. The contribution of this paper is to find the correlations among IM with different ranges across different types of data sets, which past research did not address. The study also finds that the correlations vary from data set to data set, and it proposes a solution based on multiple criteria that helps users select a minimal set of the best measures from a large number of IM.
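As a rough illustration of the kind of comparison the abstract describes, the sketch below computes the six named measures for a rule A -> B using their commonly cited textbook definitions, then takes the Pearson correlation between every pair of measures over a small set of rules. The rule counts are hypothetical, the function names `measures` and `pearson` are introduced here for illustration, and Pearson correlation is an assumption: the abstract does not state which coefficient or procedure the authors used.

```python
from math import sqrt


def measures(n, n_a, n_b, n_ab):
    """Interestingness measures for a rule A -> B from raw counts.

    n    : total number of transactions
    n_a  : transactions containing A
    n_b  : transactions containing B
    n_ab : transactions containing both A and B
    """
    p_a, p_b, p_ab = n_a / n, n_b / n, n_ab / n
    return {
        "support": p_ab,
        "confidence": p_ab / p_a,
        "lift": p_ab / (p_a * p_b),
        "cosine": p_ab / sqrt(p_a * p_b),
        "jaccard": p_ab / (p_a + p_b - p_ab),
        "leverage": p_ab - p_a * p_b,
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical rule counts (n, n_a, n_b, n_ab); not taken from the paper.
rules = [
    (100, 40, 50, 30),
    (100, 20, 60, 10),
    (100, 50, 50, 25),
    (100, 30, 20, 15),
]

# One row of measure values per rule.
table = [measures(*rule) for rule in rules]
names = list(table[0])

# Pairwise correlation between measures, taken across the rules.
corr = {
    (a, b): pearson([row[a] for row in table], [row[b] for row in table])
    for a in names
    for b in names
}

for a in names:
    print(a.ljust(10), " ".join(f"{corr[(a, b)]:6.2f}" for b in names))
```

Pairs of measures whose correlation stays close to 1 across rules rank those rules in much the same way, so one of the pair could plausibly stand in for the other. Since the abstract reports that correlations vary from data set to data set, any such reduction would need to be rechecked on each data set.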