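Thesis title (Chinese): 基於MapReduce平行運算之多重項目支持度挖掘頻繁集模式
Thesis title (English): MapReduce-Based Frequent Pattern Mining Framework with Multiple Item Support
University: National Taipei University of Technology (臺北科技大學)
College: College of Management
Department: Master's Program, Department of Information and Finance Management
Graduation academic year: 105 (ROC calendar)
Graduation semester: Second semester
Publication year: 106 (ROC calendar)
Name (Chinese): 張瑞岩
Name (English): Jui-Yen Chang
Student ID: 104AB8013
Degree: Master's
Language: Chinese
Oral defense date: 2017/06/15
Number of pages: 82
Advisor (Chinese): 王貞淑
Advisor (English): Chen-Shu Wang
Committee members (Chinese): 蕭文龍; 丁一賢
Committee members (English): Wen-Lung Shiau; I-Hsien Ting
Keywords (Chinese): 關聯規則; 多重項目支持度
Keywords (English): Association Rules; Hadoop MapReduce; Multiple Item Support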
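Abstract (translated from Chinese): In today's networked, information-saturated era, we face the challenges and impact of large-scale data every day. The central task of data mining is to extract meaningful frequent-item patterns effectively from massive, disordered, unstructured data. However, a single item support threshold cannot faithfully reflect the actual items in each database. If the threshold is set too low, many meaningless association rules may be generated; if it is set too high, some infrequently occurring items go undiscovered, even though they are the patterns decision-makers really care about. How to set multiple support criteria according to the characteristics of different items, so that rare itemsets can be discovered, is therefore one of the key problems in data mining. In addition, for high-performance discovery of representative patterns in large volumes of information, the Hadoop MapReduce framework has been shown to reduce execution time to a reasonable degree. Accordingly, this study proposes a two-stage conceptual model in which every item is given its own support value. The model needs no pruning and rebuilding phases. Instead, it partitions the data into blocks of items that share the same MIS threshold and uses the MapReduce framework to find, accurately and efficiently through parallel computation, the correlated patterns involving both frequent and rare items, producing more interesting and useful association rules. The study also compares execution time against existing single-machine and parallel algorithms that mine frequent items with multiple item support thresholds, to demonstrate that the framework performs well in discovering association patterns and rules. The results are intended to serve as a basis for evaluating association-rule mining activities and as a reference for subsequent research.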
Abstract (English): Mining frequent patterns from big data is becoming increasingly difficult. It has many applications and aims to make people's health and daily lives better and easier. Association mining is the process of discovering interesting and useful association rules hidden in large, complex data across different databases. However, using a single minimum support value for all items is insufficient, since it cannot reflect the characteristics of each item. When the minimum support value (MIS) is set too low, rare items are found, but a large number of meaningless patterns may also be generated. Conversely, if the minimum support value is set too high, useful rare patterns are lost. Thus, setting a minimum support threshold for each item so that correlated patterns can be found efficiently and accurately is essential. In addition, efficient computing has been an active research topic in data mining in recent years. MapReduce, described in Dean and Ghemawat's 2008 Communications of the ACM article, makes it easier to implement parallel algorithms that compute various kinds of derived data and reduces run time. Accordingly, this thesis proposes a conceptual model that sets multiple support values, one per item, and uses the MapReduce framework to find correlated patterns involving both frequent and rare items. It requires no post-pruning or rebuilding phases, because every item retained has support greater than or equal to MIN-MIS, thereby improving the overall performance of mining frequent patterns and rare items accurately and efficiently.
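As a rough illustration of the multiple-support idea in the abstract, the sketch below counts itemsets in a map step and a reduce step, then keeps an itemset when its support reaches the lowest MIS among its items. That frequency rule follows the usual multiple-minimum-supports formulation (Liu, Hsu and Ma, 1999); whether the thesis uses exactly this rule is an assumption. The function names, the toy transactions, and the MIS values are all hypothetical, plain Python generators stand in for Hadoop, and the brute-force enumeration is not the thesis's two-stage model.

```python
from collections import Counter
from itertools import combinations

def map_phase(transactions, max_size):
    """Map step: emit (itemset, 1) for every itemset of up to max_size items."""
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                yield itemset, 1

def reduce_phase(pairs):
    """Reduce step: sum the emitted counts per itemset."""
    counts = Counter()
    for itemset, value in pairs:
        counts[itemset] += value
    return counts

def frequent_itemsets(transactions, mis, max_size=2):
    """Keep itemsets whose support is at least the lowest MIS of their items."""
    n = len(transactions)
    counts = reduce_phase(map_phase(transactions, max_size))
    result = {}
    for itemset, count in counts.items():
        support = count / n
        threshold = min(mis[item] for item in itemset)
        if support >= threshold:
            result[itemset] = support
    return result

# Hypothetical toy data: "caviar" is rare, so it gets a lower MIS.
TRANSACTIONS = [
    ["bread", "milk"], ["bread", "milk", "caviar"], ["bread"], ["milk"],
    ["bread", "milk"], ["bread", "caviar"], ["milk"], ["bread", "milk"],
    ["bread"], ["milk"],
]
MIS = {"bread": 0.5, "milk": 0.5, "caviar": 0.15}

if __name__ == "__main__":
    for itemset, support in sorted(frequent_itemsets(TRANSACTIONS, MIS).items()):
        print(itemset, support)
```

With these toy values, a single threshold of 0.5 would drop every itemset containing "caviar", while a single threshold of 0.15 would also admit ("bread", "milk"). The per-item thresholds keep the rare pair ("bread", "caviar") and still reject ("bread", "milk").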
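Table of contents: Chinese Abstract i
English Abstract ii
Acknowledgements iii
Table of Contents iv
List of Tables v
List of Figures vi
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives and Limitations 6
1.3 Research Process 7
Chapter 2 Literature Review 9
2.1 Data Mining and Association Rules 9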
2.2 Apriori algorithm 12
2.3 FP-growth algorithm 14
2.4 MSapriori algorithm 17
2.5 CFP-growth algorithm 19
2.6 MISFP-growth algorithm 21
2.7 Hadoop MapReduce 24
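Chapter 3 Research Method and Model 27
3.1 Worked Example of the Research Method 31
3.2 Criteria for Setting Minimum Item Support 36
3.3 Association Rule Evaluation Measures 38
Chapter 4 Experimental Analysis and Results 39
4.1 Experimental Design 39
4.2 Experimental Data 40
4.2.1 Dataset 1 40
4.2.2 Dataset 2 43
4.2.3 Dataset 3 44
4.3 Experimental Results 46
4.3.1 Experiment 1 46
4.3.2 Experiment 2 49
4.3.3 Experiment 3 52
Chapter 5 Conclusion 53
5.1 Research Contributions 53
5.2 Future Research Directions 54
References 55
Appendix 58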
Full-text usage rights: authorization not granted