Thesis title (Chinese): 應用維度縮減於新聞標題關鍵字組合推薦
Thesis title (English): Applying Dimension Reduction to Keyword Combination of News Title Recommendation
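Institution: National Taipei University of Technology
College: College of Management
Department: Department of Information and Finance Management, Master's Program
Academic year of graduation: 104 (ROC calendar; 2015-2016)
Semester of graduation: Second semester
Name (Chinese): 吳靜怡
Name (English): Jing-Yi Wu
Student ID: 103AB8009
Degree: Master's
Language: Chinese
Oral defense date: 2016/07/05
Advisor (Chinese): 翁頌舜
Advisor (English): Sung-Shun Weng
Committee members (Chinese): 陳育威; 楊欣哲; 吳瑞堯
Committee members (English): Yu-Wei Chen; Shin-Jer Yang; Rei-Yao Wu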
Keywords (Chinese): 網路新聞; 文字探勘; 字串比對; 維度縮減; 推薦系統
Keywords (English): Web News; Text Mining; String Matching; Dimension Reduction; Recommendation Systems
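Abstract (translated from Chinese): With the rise of social networks, many news organizations have gradually shifted their news distribution channels from traditional news media to social networks. The biggest differences between the two are the speed of publication and cross-platform reach. Online news is characterized by the pursuit of fast publication and rich content, so it is in fact very difficult for an online news editor to write, in a short time, a headline that attracts readers to click "like". This study takes past online news headlines with high like rates as the main object of analysis. After the headlines are decomposed, the similarity between popular headlines and keywords is computed and combined with dimension reduction to form the core recommendation model, so that online news editors can quickly learn which keyword combinations are more likely to raise a story's like rate and draw readers to read the full report.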
Abstract (English): Social networking sites have grown massively, prompting news agencies to release news on social media platforms rather than traditional media platforms. The two main differences between social and traditional media platforms are real-time creation and cross-platform extension. Content posted on social media platforms must be both timely and rich; hence, web news editors find it relatively difficult to finish an article in a short time while still writing a headline that attracts audiences to click the like button and read. In this paper, we analyze web news headlines with high like rates to identify catchy headlines and the similarities among their keywords. Moreover, with dimension reduction as the major technique, news editors can identify the best keyword combinations for use on social media platforms, which help to increase page views and attract more audiences to click the like button and read articles thoroughly.
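The pipeline outlined in the abstract (decompose headlines into keywords, measure similarity to popular headlines, apply dimension reduction, recommend keyword combinations) might be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the choice of the Jaccard coefficient as the similarity measure, truncated SVD as the dimension-reduction step, the like-rate weighting, the sample headlines, and the names `jaccard`, `reduce_rank`, and `recommend` are all assumptions introduced here.

```python
import numpy as np


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A & B| / |A | B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical, already-segmented headlines: (keyword set, like rate).
headlines = [
    ({"typhoon", "holiday", "taipei"}, 0.9),
    ({"typhoon", "flood", "taipei"}, 0.8),
    ({"election", "poll", "taipei"}, 0.3),
    ({"election", "poll", "debate"}, 0.2),
]

vocab = sorted(set().union(*(kws for kws, _ in headlines)))
index = {word: j for j, word in enumerate(vocab)}

# Headline-by-keyword matrix; a present keyword carries the headline's like rate
# (an assumed weighting, chosen only for illustration).
A = np.zeros((len(headlines), len(vocab)))
for i, (kws, rate) in enumerate(headlines):
    for word in kws:
        A[i, index[word]] = rate


def reduce_rank(matrix: np.ndarray, k: int) -> np.ndarray:
    """Rank-k approximation of `matrix` via truncated SVD."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


A_k = reduce_rank(A, k=2)


def recommend(draft: set, top_n: int = 2) -> list:
    """Suggest keywords to add to a draft headline's keyword set."""
    # Find the popular headline most similar to the draft by Jaccard coefficient.
    best = max(range(len(headlines)), key=lambda i: jaccard(draft, headlines[i][0]))
    # Rank the remaining keywords by their weight in that headline's reduced row.
    scores = A_k[best]
    candidates = [w for w in vocab if w not in draft]
    return sorted(candidates, key=lambda w: scores[index[w]], reverse=True)[:top_n]


print(recommend({"typhoon", "holiday"}, top_n=1))
```

Under these assumptions, lowering `k` discards the weaker components of the headline-keyword matrix, so keywords that appear only in low-like-rate headlines contribute less to the ranking.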
Table of contents: Chinese Abstract
English Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Process
Chapter 2 Literature Review
2.1 Web News
2.2 Text Mining
2.2.1 Chinese Word Segmentation
2.2.2 Weight Computing
2.3 String Matching
2.4 Dimension Reduction
2.5 Recommender System
2.5.1 Demographic Filtering
2.5.2 Content-based Recommendations
2.5.3 Collaborative Filtering Recommendations
2.5.4 Hybrid Recommendations
Chapter 3 Research Method
3.1 System Architecture
3.2 Web News Data Collection
3.3 Module Descriptions
3.3.1 Word Processing Module (WPM)
3.3.2 Keyword Similarity Module (KSM)
3.3.3 Recommendation Module (RM)
Chapter 4 Experimental Results and Evaluation
4.1 Experimental Environment
4.2 Experimental Data
4.3 Experimental Design and Results
4.3.1 Module Experiments
4.3.2 System Demonstration
4.4 Evaluation Methods
4.4.1 Mean Absolute Error (MAE)
4.4.2 Precision, Recall, and F1-Measure
4.5 Evaluation Results
4.5.1 Analysis of Experimental Results
4.5.2 Evaluation Using MAE
4.5.3 Evaluation Using Precision-Recall and F1-Measure
Chapter 5 Conclusion
5.1 Research Conclusions and Contributions
5.2 Research Limitations and Suggestions
References
Full-text access permission: Not authorized