Thesis title (Chinese): 以主客觀分析與相互資訊檢索探討情感分析之準確度-以電影評論為例
Thesis title (English): Using the Subjective Analysis and PMI-IR in the Accuracy of Sentiment Analysis - A Case Study of Movie Comments
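University: National Taipei University of Technology (臺北科技大學)
College: College of Management
Department: Graduate Institute of Information and Logistics Management
Academic year of graduation: 99 (ROC calendar; academic year 2010-2011)
Year of publication: 100 (ROC calendar; 2011)
Author (Chinese name): 楊惠淳
Author (English name): Hui-Tsun Yang
Student ID: 98938005
Degree: Master's
Language: Chinese
Oral defense date: 2011-06-22
Number of pages: 46
Advisor (Chinese name): 翁頌舜
Committee members (Chinese names): 吳瑞堯; 楊欣哲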
Keywords (Chinese): 情感分析; 文字探勘; 語意分析
Keywords (English): Sentiment analysis; Text mining; Semantic analysis
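Abstract (translated from the Chinese): With the rapid growth of the Internet, users face an enormous volume of information, and conventional topic-based classification can no longer filter it effectively. Search engines return more results than anyone can read, and searchers may end up seeing only a partial, biased slice of opinion. To help users organize large numbers of reviews and obtain better information, researchers began studying methods that classify reviews automatically, namely sentiment analysis. Sentiment analysis is an application of text mining that classifies a document according to its positive or negative sentiment. A review is often interspersed with objective statements of fact, which leads to misclassification. This is especially common in movie reviews, which are therefore regarded as among the hardest reviews to classify (Turney, 2002). Determining whether each sentence is subjective or objective thus becomes essential. The focus of this study is how to set aside the objective part of a review (plot description) and analyze only the subjective part (the author's personal impressions) in order to improve the accuracy of sentiment analysis.
This study works with Chinese-language movie reviews. The framework has two stages. The subjectivity analysis stage first removes the objective part (plot description) and extracts the subjective sentences as the subjective representative sentences of each review. The sentiment analysis stage then performs sentiment analysis on those representative sentences. The experimental results show that this framework does improve the accuracy of sentiment analysis, and that classification performs best when the top 2,000 sentiment feature words are used. The study further examines the effect of the opposing word pairs used by PMI-IR.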
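The two-stage framework described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the cue-word sets, the polarity scores, the simple counting rule for subjectivity, and the function names `subjective_sentences` and `classify_review` are all hypothetical, and English tokens stand in for segmented Chinese words. This record does not describe the classifiers actually used.

```python
from typing import Dict, List, Set

Sentence = List[str]  # one sentence as a list of already-segmented tokens


def subjective_sentences(
    sentences: List[Sentence],
    subjective_words: Set[str],
    objective_words: Set[str],
) -> List[Sentence]:
    """Stage 1 (assumed rule): keep a sentence only if it contains more
    subjective cue words than objective (plot-like) cue words."""
    kept = []
    for tokens in sentences:
        subj = sum(1 for t in tokens if t in subjective_words)
        obj = sum(1 for t in tokens if t in objective_words)
        if subj > obj:
            kept.append(tokens)
    return kept


def classify_review(sentences: List[Sentence], polarity: Dict[str, float]) -> str:
    """Stage 2 (assumed rule): sum per-word polarity scores and
    label the review by the sign of the total."""
    score = sum(polarity.get(t, 0.0) for tokens in sentences for t in tokens)
    return "positive" if score >= 0 else "negative"


# Hypothetical toy data: one plot sentence and one opinion sentence.
review = [
    ["the", "killer", "commits", "a", "terrible", "crime"],  # plot description
    ["i", "think", "the", "acting", "is", "wonderful"],      # author's opinion
]
subjective_cues = {"i", "think", "feel"}
objective_cues = {"killer", "crime", "police"}
word_polarity = {"terrible": -1.0, "crime": -1.0, "wonderful": 1.0}

# Without stage 1, negative plot words dominate the score.
label_unfiltered = classify_review(review, word_polarity)

# With stage 1, only the opinion sentence is scored.
kept = subjective_sentences(review, subjective_cues, objective_cues)
label_filtered = classify_review(kept, word_polarity)
```

In this toy case the unfiltered review is labeled negative because of the plot vocabulary, while the filtered review is labeled positive, which is the kind of misclassification the subjectivity stage is meant to avoid.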
Abstract (English): As the Internet grows, users face a huge amount of information, and traditional topic-based classification can no longer filter it effectively. Search engines return so many results that searchers cannot browse them all and may see only a partial set of views. To help users organize large numbers of comments and obtain better information, scholars began to study a method for classifying comments automatically: sentiment analysis. Sentiment analysis is a text mining technique that classifies articles according to their positive or negative sentiment. A commentary is often mixed with descriptions of objective facts, resulting in erroneous classification. This is particularly common in movie reviews, which are regarded as one of the most difficult categories of comments to classify (Turney, 2002). Determining whether a sentence is subjective or objective therefore becomes very important. The focus of this study is how to avoid the objective (plot narrative) part and analyze only the subjective part (the author's personal view) in order to improve the accuracy of sentiment analysis.
This study analyzes Chinese movie reviews. The research framework is divided into two phases: a subjectivity analysis phase and a sentiment analysis phase. First, subjectivity analysis excludes the objective (plot narrative) part, and the subjective sentences are extracted as the representative of each comment. Sentiment analysis is then applied to these representative sentences. Experimental results show that this architecture can improve the classification results, that classification is best when the top 2,000 features are used, and how the opposing word pairs used by PMI-IR affect the classification results.
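The abstract names PMI-IR without giving its formula. As a point of reference, the semantic orientation measure from Turney (2002), on which PMI-IR is based, compares how often a phrase co-occurs with a positive reference word versus a negative one: SO(phrase) = log2[ hits(phrase NEAR pos) × hits(neg) / ( hits(phrase NEAR neg) × hits(pos) ) ]. The sketch below assumes a single opposing word pair and hypothetical search hit counts; the function name and the example numbers are illustrative only, and the word pairs actually used in the thesis are not given in this record.

```python
import math


def semantic_orientation(
    hits_near_pos: float,
    hits_near_neg: float,
    hits_pos: float,
    hits_neg: float,
    smoothing: float = 0.01,
) -> float:
    """Semantic orientation of a phrase from search hit counts.

    hits_near_pos / hits_near_neg: hits for the phrase near the positive /
    negative reference word. hits_pos / hits_neg: hits for each reference
    word alone (assumed to be greater than zero). A small constant is added
    to the co-occurrence counts to avoid division by zero, as in Turney (2002).
    """
    numerator = (hits_near_pos + smoothing) * hits_neg
    denominator = (hits_near_neg + smoothing) * hits_pos
    return math.log2(numerator / denominator)


# Hypothetical counts: the phrase appears far more often near the positive word.
so_positive = semantic_orientation(80, 10, hits_pos=1000, hits_neg=1000)

# Swapping the co-occurrence counts flips the sign.
so_negative = semantic_orientation(10, 80, hits_pos=1000, hits_neg=1000)
```

A positive score suggests the phrase leans toward the positive reference word and a negative score toward the negative one. Because the score depends on which two reference words are chosen, changing the opposing word pair can change the classification, which is the effect the thesis reports examining.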
Table of contents: Abstract (Chinese)
Abstract (English)
Acknowledgements
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Process and Thesis Structure
Chapter 2 Literature Review
2.1 Sentiment Analysis
2.2 Text Mining
2.2.1 Word Segmentation Systems
2.2.2 Feature Word Selection
2.2.3 Document Classification
Chapter 3 Research Method
3.1 Research Framework
3.2 POS Word Segmentation System
3.3 Feature Word Selection
3.3.1 Subjective/Objective Feature Words
3.3.2 Sentiment Analysis Feature Words
3.4 Semantic Classification of Feature Words
3.5 Subjectivity Analysis of Movie Reviews
3.6 Sentiment Analyzer
3.7 Evaluation Method
Chapter 4 Experimental Design and Analysis
4.1 Data Sources and Processing
4.1.1 Plot Database
4.1.2 Review Database
4.1.3 Movie Review Database (for Prediction)
4.2 Experimental Results
4.2.1 Research Questions and Hypotheses
4.2.2 Subjectivity Analysis: Subjective Sentence Extraction
4.2.3 Sentiment Classification
4.2.4 Effect of the Opposing Word Pairs Used by PMI-IR
Chapter 5 Conclusion
5.1 Research Conclusions
5.2 Research Contributions
5.3 Research Limitations and Suggestions for Future Work
References
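References: [1] 李孟潔, 2009. 利用機器學習作法之中文意見分析 [Chinese Opinion Analysis Using a Machine Learning Approach], Master's thesis, Institute of Computer Science and Information Engineering, National Tsing Hua University.
[2] 晏文珍, 2005. 利用資料探勘技術於文件分類之研究 [A Study of Document Classification Using Data Mining Techniques], Master's thesis, Graduate Institute of Information Management, Southern Taiwan University of Science and Technology.
[3] 陳立, 2010. 中文情感語意自動分類之研究 [A Study of Automatic Classification of Chinese Sentiment Semantics], Master's thesis, Institute of Computer Science and Information Engineering, National Taiwan Normal University.
[4] 喻欣凱, 2008. 運用支援向量機與文字探勘於股價漲跌趨勢之預測 [Predicting Stock Price Trends Using Support Vector Machines and Text Mining], Master's thesis, Department of Information Management, Fu Jen Catholic University.
[5] 潘佳琪, 2008. 運用機會發現與網路探勘建構個人化廣告推薦系統之研究 [A Study on Building a Personalized Advertisement Recommendation System Using Chance Discovery and Web Mining], Master's thesis, Department of Information Management, Fu Jen Catholic University.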
[6] Cestnik, B., 1990. Estimating probabilities: A crucial task in machine learning, In ECAI-90.
[7] Church, K.W., & Hanks, P., 1989. Word association norms, mutual information and lexicography, Proceedings of the 27th Annual Conference of the ACL, pp. 76-83.
[8] Clark, P., & Niblett, T., 1989. The CN2 Induction Algorithm, Machine Learning, pp. 261-284.
[9] Domingos, P., & Pazzani, M., 1997. On the Optimality of the Simple Bayesian Classifier under Zero One Loss, Machine Learning, Vol. 29, pp. 103-130.
[10] Galavotti, L., Sebastiani, F., & Simi, M., 2000. Feature selection and negative evidence in automated text categorization, In Proceedings of KDD.
[11] Langley, P., Iba, W., & Thompson, K., 1992. An Analysis of Bayesian Classifiers. Proceedings of the Tenth National Conference on Artificial Intelligence, San Jose, CA, AAAI Press, pp. 223-228.
[12] Li, N., & Wu, D.D., 2010. Using text mining and sentiment analysis for online forums hotspot detection and forecast, Decision Support Systems 48, pp. 354–368.
[13] Pang, B., & Lee, L., 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts, In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, pp. 271-278.
[14] Pang, B., Lee, L., & Vaithyanathan, S., 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques, In Proceedings of EMNLP.
[15] Tan, S., & Zhang, J., 2008. An empirical study of sentiment analysis for Chinese documents, Expert Systems with Applications 34, pp. 2622–2629.
[16] Tan, S., & Cheng, X., 2009. Improving SCL Model for Sentiment-Transfer Learning, Proceedings of NAACL HLT 2009: Short Papers, pp. 181-184.
[17] Tan, S., Cheng, X., Wang, Y., & Xu, H., 2009. Adapting Naive Bayes to Domain Adaptation for Sentiment Analysis, pp. 337-349.
[18] Turney, P.D., 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews, In Proceedings of the Association for Computational Linguistics 40th Anniversary Meeting, New Brunswick, NJ.
[19] Turney, P.D., & Littman, M.L., 2002. Unsupervised learning of semantic orientation from a hundred-billion-word corpus, Technical Report ERB-1094, National Research Council Canada.
[20] Turney, P.D., & Littman, M.L., 2003. Measuring praise and criticism: Inference of semantic orientation from association, ACM Transactions on Information Systems 21, pp. 315–346.
[21] Vapnik, V.N., 1995. The Nature of Statistical Learning Theory, New York: Springer.
[22] Wu, Q., Tan, S., & Cheng, X., 2009. Graph Ranking for Sentiment Transfer, Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pp. 317-320.
[23] Yang, Y., & Pedersen, J.O., 1997. A comparative study on feature selection in text categorization, ICML, pp. 412–420.
[24] Wikipedia, 2006. Sentiment analysis, Retrieved January 09, 2011, available http://en.wikipedia.org/wiki/Sentiment_analysis
Full-text access rights: Authorized for public release from 2013-08-01