Thesis title (Chinese): 基於深度學習之社群主題趨勢預測
Thesis title (English): Trend Prediction for Social Topics Based on Deep Learning
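Institution: National Taipei University of Technology (臺北科技大學)
College: College of Management
Department: Master's Program, Department of Information and Finance Management (資訊與財金管理系碩士班)
Graduation academic year: 105 (ROC calendar, i.e. 2016-2017)
Graduation semester: Second semester
Publication year: 106 (ROC calendar, i.e. 2017)
Name (Chinese): 丁皓東
Name (English): Hao-Tung Ting
Student ID: 104AB8007
Degree: Master's
Language: Chinese
Number of pages: 56
Advisor (Chinese name): 翁頌舜
Advisor (English name): Sung-Shun Weng
Oral defense committee (Chinese names): 楊欣哲; 吳瑞堯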
Keywords (Chinese): 社群媒體; 主題偵測; K-Means分群; 趨勢預測; RNN
Keywords (English): Social Media; Topic Detection; K-Means Clustering; Trend Prediction; RNN
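Abstract (translated from the Chinese): The rise of social media has gradually replaced traditional person-to-person communication, and the public now follows and learns about events large and small through various social platforms. Identifying the important issues the public cares about within the large volume of information on social media platforms, and predicting how discussion of those issues will trend, so as to understand market trends and plan strategy early, has become a major task for enterprises and governments.
Previous topic detection research has mostly focused on news articles. This study instead uses posts from the PTT community as its experimental subject. Article keywords are converted into text vectors, and K-Means clustering is combined with cosine similarity to find the important topics in the articles. The comment-time trend of each topic is then used to train an LSTM model, a type of RNN, and to predict the trend. The results show that the study can effectively detect the important topics on the PTT Gossiping board. The trend prediction process also shows that comment discussion trends on the PTT Gossiping board change quickly but are not very complex, and that comment trends vary across topics. In addition, the model's predictions performed well.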
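The topic-detection step described in the abstract (keywords turned into vectors, then grouped with K-Means and cosine similarity) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the record does not say how the vectors were weighted or how K-Means and cosine similarity were combined, so the sketch assumes TF-IDF weights and a K-Means variant that assigns each document to the centroid with the highest cosine similarity. The function names and the tiny English token lists are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF dicts (term -> weight)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zero."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans_cosine(vectors, k, iterations=10):
    """K-Means variant (assumption): assign by highest cosine similarity."""
    # Naive deterministic initialization: the first k documents.
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_vector(members)
    return labels, centroids


# Hypothetical, already-segmented posts (real PTT posts would need Chinese
# word segmentation and stop word filtering first).
docs = [
    ["election", "vote", "candidate"],
    ["typhoon", "rain", "flood"],
    ["election", "candidate", "debate"],
    ["typhoon", "flood", "warning"],
]
labels, centroids = kmeans_cosine(tfidf_vectors(docs), k=2)
print(labels)  # [0, 1, 0, 1]
```

Using the first k documents as initial centroids keeps the example deterministic; a real run would more likely use random or k-means++ style initialization and choose K empirically.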
Abstract (English): The rise of social media and the development of social platforms have changed how people communicate and keep up with current events. Governments and enterprises are keen to identify critical issues in the mass of information on social platforms, to predict how those issues will develop, and to prepare accordingly.
Instead of online news, this study uses articles from PTT. It identifies the critical topics in those articles by converting keywords into text vectors and applying the k-means algorithm with cosine similarity. It then predicts the trend of comments on each topic using an LSTM model, a type of RNN. The results show that the approach can detect critical issues on the PTT Gossiping board and follow how discussion of them develops. That discussion changes rapidly but is not complex, and comment trends vary widely across topics. The model's prediction performance is reasonably satisfactory.
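Before an LSTM can be trained on a topic's comment trend, the comment times have to become a numeric sequence. The sketch below shows one plausible way to do that: count comments per fixed time interval, then cut the counts into (input window, next value) pairs. It is an assumption-laden illustration only. The interval length, window size, timestamps, and function names are all hypothetical, and the record does not state which framework or LSTM settings the thesis used, so the sketch stops at data preparation.

```python
from datetime import datetime, timedelta


def bin_comment_times(timestamps, start, interval, n_bins):
    """Count comments in each of n_bins consecutive intervals from start."""
    counts = [0] * n_bins
    for ts in timestamps:
        # Floor division of two timedeltas gives the integer bin index.
        idx = (ts - start) // interval
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def sliding_windows(series, window):
    """Split a series into (input window, next value) training pairs."""
    return [
        (series[i:i + window], series[i + window])
        for i in range(len(series) - window)
    ]


# Made-up comment times for one topic, in minutes after an arbitrary start.
start = datetime(2017, 1, 1, 0, 0)
timestamps = [
    start + timedelta(minutes=m) for m in (1, 5, 12, 14, 15, 31, 44, 59)
]

counts = bin_comment_times(timestamps, start, timedelta(minutes=15), n_bins=4)
print(counts)  # [4, 1, 2, 1]

pairs = sliding_windows(counts, window=2)
print(pairs)  # [([4, 1], 2), ([1, 2], 1)]
```

Changing the `interval` argument is one way to compare predictions at different time granularities, which is the kind of comparison listed in the table of contents for Chapter 4.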
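Table of contents: Abstract (Chinese)
ABSTRACT
Acknowledgements
Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Process
Chapter 2 Literature Review
2.1 Social Media
2.1.1 PTT Bulletin Board System (批踢踢實業坊)
2.1.2 Social Media Marketing
2.2 Topic Detection
2.2.1 Clustering-based Approaches
2.2.2 Keyword-based Approaches
2.2.3 Topic Model-based Approaches
2.3 Term Frequency - Inverse Document Frequency (TF-IDF)
2.4 Trend Prediction of Topic Popularity
2.4.1 Regression-based Approaches
2.4.2 Machine Learning-based Approaches
2.5 Deep Learning
Chapter 3 Research Method
3.1 Research Framework
3.2 Data Collection
3.3 Data Preprocessing
3.3.1 Word Segmentation
3.3.2 Stop Word Filtering
3.4 Topic Detection and Discovery
3.4.1 Keyword Extraction
3.4.2 K-Means Cluster Analysis
3.5 Trend Prediction for Popular Topics
3.5.1 Recurrent Neural Network (RNN)
3.5.2 Long Short-Term Memory (LSTM)
Chapter 4 Experimental Results and Discussion
4.1 Experimental Environment
4.2 Experimental Design
4.3 Experimental Data Collection and Preprocessing
4.4 Evaluation Metrics
4.5 Experimental Results
4.5.1 Determining the Number of Topic Clusters K for K-Means
4.5.2 Comparison of Comment-Time Trend Prediction across Time Intervals
4.5.3 Comparison of Prediction for Per-Topic Comment Trends and the Whole-Dataset Comment Trend
4.5.4 Comparison of Comment-Time Trend Prediction under Different Parameters and Model Architectures
Chapter 5 Conclusion
5.1 Research Conclusions and Contributions
5.2 Research Limitations and Future Work
References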
Full-text access permission: Authorization not granted