Thesis title (Chinese): 雙聚類問題的研究及在工業上的應用
Thesis title (English): A Study of Biclustering Problems and Their Applications in Industry
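Institution: National Taipei University of Technology (臺北科技大學)
College: College of Management
Department: Graduate Institute of Industrial Engineering and Management
Academic year of graduation: 97 (ROC calendar; 2008-2009)
Publication year: 98 (ROC calendar; 2009)
Author (Chinese name): 李恕毅
Author (English name): Shu-Yi Li
Student ID: 96378002
Degree: Master's
Language: Chinese
Oral defense date: 2009-06-29
Number of pages: 50
Advisor (Chinese name): 吳建文
Advisor (English name): Chienwen Wu
Committee members (Chinese names): 黃祥熙; 陳凱瀛
Committee members (English names): Hsiang-Hsi Huang; Kai-Ying Chen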
Keywords (Chinese): 雙聚類; 高頻項目集; 資料探勘
Keywords (English): Biclustering; Frequent itemset; Data mining
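Abstract (translated from Chinese): This study investigates the biclustering problem. A bicluster is a pattern in a data matrix in which the values across a set of columns and a set of rows share some trend. In recent years researchers have proposed a succession of biclustering algorithms, and biclustering has been widely applied in many research areas, such as biochips, industry, information retrieval and text mining, recommendation systems, target marketing, database research, and financial forecasting.
This study uses the concept of frequent itemsets to find the biclusters in a data matrix. Biclustering problems fall into four major classes: “biclusters with constant values”, “biclusters with constant values on columns or rows”, “biclusters with coherent values”, and “biclusters with coherent evolutions”. After an appropriate format transformation of the data, the study finds biclusters with the frequent itemset feature of Microsoft SQL Server 2005. Four kinds of synthetic bicluster data, together with two real data sets (wine samples and breast cancer tumor samples), are used to demonstrate the feasibility and correctness of the approach. On the wine and breast cancer tumor data, the proposed method performs better than the methods of Abdullah et al. and Rice et al., respectively, which indicates that the results mined with frequent itemsets are superior to those of the earlier methods. The study also applies biclustering to the industrial domain, in the hope of offering a new research direction and a methodological reference for later researchers.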
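For orientation, the four classes named in the abstract can be written out as follows. These are the formulations commonly used in the biclustering survey literature (for example reference [33]); the record does not state the thesis's own notation, so the symbols here are assumptions for illustration only.

```latex
% Assumed notation: A = (a_{ij}) is the data matrix and (I, J) is a bicluster,
% with I a subset of rows and J a subset of columns.
% Each condition holds for every i \in I and j \in J.
\begin{align*}
\text{constant values:} \quad & a_{ij} = \mu \\
\text{constant values on rows:} \quad & a_{ij} = \mu + \alpha_i \\
\text{constant values on columns:} \quad & a_{ij} = \mu + \beta_j \\
\text{coherent values (additive):} \quad & a_{ij} = \mu + \alpha_i + \beta_j \\
\text{coherent values (multiplicative):} \quad & a_{ij} = \mu' \, \alpha'_i \, \beta'_j \\
\text{coherent evolutions (order-preserving case):} \quad &
  a_{i\pi(1)} \le a_{i\pi(2)} \le \dots \le a_{i\pi(|J|)}
\end{align*}
% In the last line, \pi is a single ordering of the columns in J shared by all rows in I.
% Order preservation is only one common instance of a coherent evolution.
```

The additive and multiplicative cases appear to correspond to the arithmetic and geometric variants listed under Section 3.3 of the table of contents.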
Abstract (English): This study addresses biclustering problems. Given a data matrix, a bicluster is a subset of rows that exhibit similar behavior across a subset of columns, and vice versa. A large number of approaches to biclustering have been proposed in the literature. Biclustering has applications in many fields, including bioinformatics, industry, information retrieval and text mining, collaborative filtering, recommendation systems, market research, target marketing, database research, data mining, and financial forecasting. We discover biclusters based on the concept of frequent itemset mining. We address four major classes of biclusters: “biclusters with constant values”, “biclusters with constant values on rows or columns”, “biclusters with coherent values”, and “biclusters with coherent evolutions”.
We propose a novel technique that first transforms the data matrix and then discovers the biclusters by frequent itemset mining using Microsoft SQL Server 2005. The proposed technique is compared with Abdullah’s and Rice’s approaches in experiments on wine and breast cancer data sets. The results show that our technique demonstrates very good performance.
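The two-step idea in the abstract (transform the matrix, then mine frequent itemsets) can be illustrated for the constant-value class with a minimal Python sketch. This is not the thesis's implementation, which relies on Microsoft SQL Server 2005; the (column, value) item encoding, the small level-wise miner, and the names `to_transactions`, `frequent_itemsets`, and `constant_biclusters` are all assumptions made for illustration.

```python
from itertools import combinations


def to_transactions(matrix):
    """Assumed encoding: each row becomes a set of (column index, value) items."""
    return [frozenset((j, v) for j, v in enumerate(row)) for row in matrix]


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search.

    Returns a dict mapping each frequent itemset to the set of row ids
    (transactions) that contain it.
    """
    # Level 1: single items and the rows that contain them.
    current = {}
    for rid, items in enumerate(transactions):
        for item in items:
            current.setdefault(frozenset([item]), set()).add(rid)
    current = {s: r for s, r in current.items() if len(r) >= min_support}

    result = dict(current)
    while current:
        nxt = {}
        # Join pairs of k-itemsets that differ in exactly one item.
        for (a, rows_a), (b, rows_b) in combinations(current.items(), 2):
            cand = a | b
            if len(cand) != len(a) + 1 or cand in nxt:
                continue
            rows = rows_a & rows_b  # rows containing every item of cand
            if len(rows) >= min_support:
                nxt[cand] = rows
        result.update(nxt)
        current = nxt
    return result


def constant_biclusters(matrix, min_rows, min_cols):
    """Keep frequent itemsets whose items all carry the same value.

    Each hit is reported as (row ids, column ids, shared value).
    Non-maximal biclusters are reported too; no pruning is attempted.
    """
    found = []
    mined = frequent_itemsets(to_transactions(matrix), min_rows)
    for itemset, rows in mined.items():
        values = {v for _, v in itemset}
        if len(itemset) >= min_cols and len(values) == 1:
            cols = sorted(j for j, _ in itemset)
            found.append((sorted(rows), cols, values.pop()))
    return found


if __name__ == "__main__":
    example = [
        [7, 7, 1, 2],
        [7, 7, 3, 4],
        [5, 7, 7, 6],
        [7, 7, 8, 9],
    ]
    print(constant_biclusters(example, min_rows=3, min_cols=2))
```

With `min_rows=3` and `min_cols=2`, the example matrix yields one bicluster: rows 0, 1, and 3 share the value 7 in columns 0 and 1. Real-valued data would need to be discretized before this encoding is useful, a step the sketch omits.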
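Table of contents: Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Scope
1.4 Research Process
Chapter 2 Literature Review
2.1 The Frequent Itemset Mining Problem
2.2 The Biclustering Problem
2.2.1 Biclusters with Constant Values
2.2.2 Biclusters with Constant Values on Columns or Rows
2.2.3 Biclusters with Coherent Values
2.2.4 Biclusters with Coherent Evolutions
Chapter 3 Research Method
3.1 Transformation Method for Biclusters with Constant Values
3.2 Transformation Method for Biclusters with Constant Values on Columns or Rows
3.2.1 Transformation Method for Biclusters with Constant Values on Columns
3.2.2 Transformation Method for Biclusters with Constant Values on Rows
3.3 Transformation Method for Biclusters with Coherent Values
3.3.1 Transformation Method for Additive (Arithmetic) Coherent Values
3.3.2 Transformation Method for Multiplicative (Geometric) Coherent Values
3.4 Transformation Method for Biclusters with Coherent Evolutions
Chapter 4 Experimental Results
4.1 Synthetic Data
4.1.1 Biclusters with Constant Values
4.1.2 Biclusters with Constant Values on Rows
4.1.3 Biclusters with Coherent Values
4.1.4 Biclusters with Coherent Evolutions
4.2 Real Data
4.2.1 Wine Sample Data
4.2.2 Breast Cancer Tumor Sample Data
4.2.3 Machine-Part Relationship Data
Chapter 5 Conclusions and Recommendations
5.1 Conclusions
5.2 Research Contributions
5.3 Suggestions for Future Research
References
Appendix: The Apriori Algorithm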
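References: [1] 曾憲雄, 蔡秀滿, 蘇東興, 曾秋蓉, 王慶堯, 資料探勘 (Data Mining), Taipei: 旗標出版社, 2005, pp. vii2-vii8.
[2] 劉尚志, 倪貴榮, 陳秀雯, “生物晶片之專利保護、授權、侵權及上市前程序之研究(I)” (A study of patent protection, licensing, infringement, and pre-market procedures for biochips (I)), research project report, National Science Council, Executive Yuan, 2003.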
[3] A. Abdullah and A. Hussain, “A new biclustering technique based on crossing minimization,” Neurocomputing, vol. 69, no. 16-18, 2006, pp. 1882-1896.
[4] A. Abdullah and A. Hussain, “Using biclustering for automatic attribute selection to enhance global visualization,” Springer Verlag Lecture Notes in Computer Science, 4370, 2007, pp. 35-47.
[5] A. Ben-Dor, B. Chor, R. Karp and Z. Yakhini, “Discovering local structure in gene expression data: the order-preserving submatrix problem,” Proceedings Sixth International Conference on Computational Molecular Biology, Washington, DC, USA, 2002, pp. 49-57.
[6] A. Califano, G. Stolovitzky and Y. Tu, “Analysis of gene expression microarrays for phenotype classification,” Proceedings International Conference Computational Molecular Biology, San Diego, 2000, pp. 75-85.
[7] A. Tanay, R. Sharan and R. Shamir, “Discovering statistically significant biclusters in gene expression data,” Bioinformatics, vol. 18, 2002, pp. S136-S144.
[8] C. Jermaine, “Finding the most interesting correlations in a database: how hard can it be?” Information Systems, vol. 30, no. 1, 2005, pp. 21-46.
[9] C. Tang, L. Zhang, A. Zhang and M. Ramanathan, “Interrelated two-way clustering: an unsupervised approach for gene expression data analysis,” Proceedings Second IEEE International Symposium on Bioinformatics and Bioengineering, 2001, pp. 41-48.
[10] D. Jiang, J. Pei, M. Ramanathan, C. Tang and A. Zhang, “Mining coherent gene clusters from gene-sample-time microarray data,” Proceedings of tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, 2004, pp. 430-439.
[11] E. Segal, B. Taskar, A. Gasch, N. Friedman and D. Koller, “Rich probabilistic models for gene expression,” Bioinformatics, vol. 17, 2001, pp. S243-S252.
[12] G. Getz, E. Levine and E. Domany, “Coupled two-way clustering analysis of gene microarray data,” Proceedings of the National Academy of Sciences, USA, vol. 97, no. 22, 2000, pp. 12079-12084.
[13] G. Park and W. Szpankowski, “Analysis of biclusters with applications to gene expression data,” Proceedings of Conference on Analysis of Algorithms, Barcelona, 2005, pp. 267-274.
[14] H. Cho, I. S. Dhillon, Y. Guan and S. Sra, “Minimum sum-squared residue co-clustering of gene expression data,” Proceedings of the fourth SIAM International Conference on Data Mining, Lake Buena Vista, Fla, USA, 2004, pp. 114-125.
[15] H. Wang, W. Wang, J. Yang and P. S. Yu, “Clustering by pattern similarity in large data sets,” Proceedings 2002 ACM SIGMOD International Conference Management of Data, Madison, USA, 2002, pp. 394-405.
[16] I. S. Dhillon, “Co-clustering documents and words using bipartite spectral graph partitioning,” Proceedings of the seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, 2001, pp. 269-274.
[17] J. A. Hartigan, “Direct clustering of a data matrix,” Journal of the American Statistical Association, vol. 67, no. 337, 1972, pp. 123-129.
[18] J. Abello, P. M. Pardalos and M. G. Resende, Handbook of massive data sets, Dordrecht: Kluwer Academic Publishers, 2002.
[19] J. Han and M. Kamber, Data mining: concepts and techniques, USA: Morgan Kaufmann Publishers, 2000, pp. 230-239.
[20] J. Liu and W. Wang, “OP-Cluster: clustering by tendency in high dimensional space,” Proceedings Third IEEE International Conference Data Mining, Melbourne, Florida, 2003, pp. 187-194.
[21] J. Yang, W. Wang, H. Wang and P. Yu, “δ-Clusters: capturing subspace correlation in a large data set,” Proceedings of the 18th IEEE International Conference on Data Engineering, San Jose, USA, 2002, pp. 517-528.
[22] J. Yang, W. Wang, H. Wang and P. Yu, “Enhanced biclustering on expression data,” Proceedings of the third IEEE Conference on Bioinformatics and Bioengineering, 2003, pp. 321-327.
[23] L. Haizhou and Y. Hong, “Bicluster analysis of currency exchange rates,” Soft Computing Applications in Business, 2008, pp. 19-34.
[24] L. Lazzeroni and A. Owen, “Plaid models for gene expression data,” Statistica Sinica, vol. 12, no. 1, 2002, pp. 61-86.
[25] M. D. Rice and M. Siff, “Clusters, concepts, and pseudometrics,” Electronic Notes in Theoretical Computer Science, vol. 40, 2000.
[26] M. P. Chandrasekharan and R. Rajagopalan, “An ideal seed non-hierarchical clustering algorithm for cellular manufacturing,” International Journal of Production Research, vol. 24, 1986, pp. 451-464.
[27] P. Baldi and G. W. Hatfield, DNA microarrays and gene expression: from experiments to data analysis and modeling, New York: Cambridge University Press, 2002.
[28] Q. Sheng, Y. Moreau and B. De Moor, “Biclustering microarray data by Gibbs sampling,” Bioinformatics, vol. 19, 2003, pp. ii196-ii205.
[29] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” Proceedings of the 20th International Conference on Very Large Data Bases, Santiago, Chile, 1994, pp. 487-499.
[30] R. Tibshirani, T. Hastie, M. Eisen, D. Ross, D. Botstein and P. Brown, “Clustering methods for the analysis of DNA microarray data,” Technical report, Department of Health Research and Policy, Stanford University, Stanford, Calif, USA, 1999.
[31] S. Busygin, G. Jacobsen and E. Kramer, “Double conjugated clustering applied to leukemia microarray data,” Proceedings of the 2nd SIAM International Conference Data Mining, Workshop on Clustering High Dimensional Data, 2002.
[32] S. Busygin, O. Prokopyev and P. M. Pardalos, “Biclustering in data mining,” Computers and Operations Research, vol. 35, no. 9, 2008, pp. 2964-2987.
[33] S. C. Madeira and A. L. Oliveira, “Biclustering algorithms for biological data analysis: a survey,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 1, no. 1, 2004, pp. 24-45.
[34] S. Lonardi, W. Szpankowski and Q. Yang, “Finding biclusters by random projections,” Theoretical Computer Science, vol. 368, no. 3, 2006, pp. 217-230.
[35] S. Mitra, R. Das, H. Banka and S. Mukhopadhyay, “Gene interaction-an evolutionary biclustering approach,” Information Fusion, vol. 10, no. 3, 2009, pp. 242-249.
[36] T. M. Murali and S. Kasif, “Extracting conserved gene expression motifs from gene expression data,” Proceedings Pacific Symposium on Biocomputing, vol. 8, 2003, pp. 77-88.
[37] Y. Cheng and G. M. Church, “Biclustering of expression data,” Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, 2000, pp. 93-103.
[38] Y. Kluger, R. Basri, J. T. Chang and M. Gerstein, “Spectral biclustering of microarray data: coclustering genes and conditions,” Genome Research, vol. 13, no. 4, 2003, pp. 703-716.
[39] Y. Okada, K. Okubo, P. Horton and W. Fujibuchi, “Exhaustive search method of gene expression modules and its application to human tissue data,” IAENG International Journal of Computer Science, vol. 34, no. 1, 2007, pp. 119-126.
Full-text access rights: authorized for public release from 2011-08-03