Thesis title (Chinese): 辨識網路虛假評論之研究
Thesis title (English): A Study on Identifying Online Fake Reviews
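Institution: National Taipei University of Technology (臺北科技大學)
College: College of Management
Department: Master's Program, Department of Information and Finance Management
Academic year of graduation: 104 (ROC calendar)
Semester of graduation: Second semester
Author (Chinese name): 吳敬翔
Author (English name): Chin-Shiang Wu
Student ID: 103AB8013
Degree: Master's
Language: Chinese
Oral defense date: 2016/07/05
Advisor (Chinese name): 翁頌舜
Oral defense committee (Chinese names): 翁頌舜; 陳育威; 楊欣哲; 吳瑞堯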
Keywords (Chinese): 網路口碑; 虛假評論; 文件分類; 相似度運算
Keywords (English): Electronic Words-of-Mouth; Fake Reviews; Document classification; Similarity Computing
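Abstract (translated from Chinese): With the rapid growth of the Internet, people can share their opinions of products and services with the general public through forums and online social platforms. However, some vendors, in order to deliberately disparage or promote certain products, post untruthful reviews that mislead consumers into making poor purchasing decisions. Faced with the large volume of information online, people find it difficult to tell which reviews are untruthful.
This study proposes a spam-review classification method suited to Chinese-language text. Reviews collected from a forum are used for the experiments. A feature set is built from the characteristics of the reviews, and supervised machine learning is used to construct a classification model that estimates the likelihood that a review is a useless review. Document similarity is then computed to find fake reviews that exhibit near-duplicate characteristics. The experimental results show that accuracy, precision, recall, and F-measure all exceed 75%, indicating that the proposed method can effectively help identify fake reviews.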
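The four indicators named in the abstract have standard definitions in terms of confusion-matrix counts. The following is a minimal sketch of those definitions, not the thesis's own evaluation code; the function name `classification_metrics` and the example counts are hypothetical and are not results from the thesis.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall, and F-measure from confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives, and
    false negatives, with "positive" meaning a review labeled as spam.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F-measure here is the harmonic mean of precision and recall (F1).
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Illustrative counts only (assumption); they do not come from the thesis.
example = classification_metrics(tp=40, fp=10, tn=35, fn=15)
```

Reporting all four values together, as the abstract does, guards against a classifier that scores well on accuracy simply because one class dominates the data.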
Abstract (English): With the rapid development of the Internet, people can share their personal opinions about products and services with the public through Internet forums and social networking platforms. However, some manufacturers deliberately disparage or promote certain products by leaving fake reviews that mislead consumers into making poor purchasing decisions. Faced with the large volume of messages on the Internet, people find it difficult to identify fake reviews on their own.
This study proposes a spam review classification method for Chinese-language text. We collected real reviews from a web forum and built a feature set based on the characteristics of the reviews. We use supervised machine learning to construct a classification model that assesses the likelihood that a review is spam. Finally, we compute the similarity between documents to identify fake reviews that exhibit near-duplicate characteristics. The experimental results show that accuracy, precision, recall, and F-measure are all higher than 75%, indicating that the proposed method is effective in helping identify fake reviews.
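The near-duplicate step described in the abstract can be illustrated with a small sketch. The abstract does not state which weighting or similarity measure the thesis uses, so TF-IDF weighting and cosine similarity are assumptions here, as are the function names, the 0.7 threshold, and the sample reviews. The input is assumed to be already segmented into tokens.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for tokenized documents."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF keeps weights positive for terms found in every document.
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def near_duplicates(docs, threshold=0.7):
    """Return index pairs whose similarity meets the (hypothetical) threshold."""
    vectors = tfidf_vectors(docs)
    return [
        (i, j)
        for i, j in combinations(range(len(docs)), 2)
        if cosine(vectors[i], vectors[j]) >= threshold
    ]


# Hypothetical, pre-tokenized reviews used only for illustration.
reviews = [
    ["this", "phone", "battery", "lasts", "long", "great", "screen"],
    ["this", "phone", "battery", "lasts", "long", "great", "camera"],
    ["delivery", "was", "slow", "and", "box", "damaged"],
]
pairs = near_duplicates(reviews, threshold=0.7)
```

With these sample reviews, only the first two are flagged as a pair, since they differ by a single word while the third shares no terms with either. The pairwise comparison is quadratic in the number of reviews, which is one reason a dataset-reduction step before similarity computation is useful.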
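Table of contents: Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Electronic Words-of-Mouth
2.2 Fake Reviews
2.2.1 Features of Reviews
2.2.2 Detecting Fake Reviews
2.3 Document Classification
2.3.1 Support Vector Machines
2.3.2 Naïve Bayes Classifier
2.3.3 Logistic Regression Model
2.3.4 Decision Tree
2.4 Similarity Computing
Chapter 3 Research Design and Methods
3.1 Research Design
3.2 Data Collection
3.3 Data Preprocessing
3.3.1 Word Segmentation
3.3.2 Stop-Word Filtering
3.3.3 Term Weight Calculation
3.4 Useless-Review Classification Module
3.4.1 Feature Set
3.4.2 Classifiers
3.5 Fake-Review Identification Module
3.5.1 Reducing the Dataset
3.5.2 Building the Keyword-Review Matrix
3.5.3 Review Similarity Calculation
Chapter 4 Experimental Design and Analysis
4.1 Experimental Environment
4.2 Experimental Data Sources and Preprocessing
4.3 Evaluation Metrics
4.4 Experimental Results
4.4.1 Experimental Test Data
4.4.2 Data Preprocessing
4.4.3 Building the Feature Set
4.4.4 Classifier Results
4.4.5 Fake-Review Identification and Verification
Chapter 5 Conclusion
5.1 Research Conclusions and Contributions
5.2 Research Limitations and Future Work
References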
Full-text access: Authorization not granted