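Thesis title (Chinese): 應用改良式邏輯斯迴歸於信用評分之研究
Thesis title (English): A Study of Applying a Modified Logistic Regression on Credit Scoring
University: National Taipei University of Technology (臺北科技大學)
College: College of Management
Department: Department of Information and Finance Management, Master's Program
Academic year of graduation: 104 (ROC calendar)
Semester of graduation: Second semester
Name (Chinese): 吳毓芳
Name (English): Yu-Fang Wu
Student ID: 103AB8017
Degree: Master's
Language: Chinese
Oral defense date: 2016/07/05
Advisor (Chinese name): 翁頌舜
Advisor (English name): Sung-Shun Weng
Oral defense committee (Chinese names): 陳仲儼;楊亨利;王貞淑
Keywords (Chinese): 隨機梯度下降、邏輯斯迴歸、信用評分、大數據分析、金融科技
Keywords (English): Stochastic Gradient Descent, Logistic Regression, Credit Scoring, Big Data Analytics, FinTech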
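Abstract (translated from the Chinese): For financial institutions, credit scoring has become one of the key tools banks use to assess whether a borrower will default or repay late. With the opening of financial markets and the rapid growth of Internet services, we are now in the era of financial big data, and how to apply data science methods to process and analyze large volumes of data is a new issue that banks face. This study therefore proposes applying a modified logistic regression classifier to credit scoring: the stochastic gradient descent (SGD) algorithm is combined with logistic regression to optimize the objective function, so that banks can effectively assess customers' credit risk across large volumes of credit data and build an objective credit scoring prediction model. The study also uses logistic regression for comparison and analysis, to examine which classification method produces the better credit scoring model. The results show that, in a Hadoop cloud computing environment, the modified logistic regression algorithm effectively improves classification accuracy; accuracy reaches 86% both with the original attribute variables and with the data after attribute selection. It also reduces the Type I and Type II errors produced during classification and has the advantage of a lower cost in model-building time.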
Abstract (English): For financial institutions, credit scoring has become an important tool that banks use to assess whether customers are likely to default or fall behind on repayment. With the liberalization of financial markets and the rapid growth of Internet services, we are now in a financial big data environment. How to use data science methods to handle and analyze large amounts of data has become a new issue for banks. This study explores how to apply a modified logistic regression to credit scoring problems. Logistic regression is combined with the stochastic gradient descent (SGD) algorithm to optimize the objective function. The combined method can help banks assess customers' credit risk effectively across a huge amount of credit data and construct an objective credit scoring model. In addition, the study uses conventional logistic regression as a baseline to determine which classification method yields the better credit scoring model. In the Hadoop cloud computing environment, we show that the modified logistic regression algorithm effectively improves classification accuracy. With either the original attributes or the filtered attributes, the proposed algorithm outperforms logistic regression, and its credit scoring prediction models reach an accuracy of 86% in both cases. The modified logistic regression models are also effective in reducing Type I and Type II errors, and they have a lower cost in modeling time.
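The abstract describes combining logistic regression with stochastic gradient descent and comparing models by accuracy and by Type I and Type II error. The idea can be illustrated with a minimal sketch in pure Python. This is not the thesis's Hadoop implementation: the function names (`train_sgd`, `evaluate`, `make_toy_data`), the synthetic data, the learning rate, and the epoch count are all illustrative assumptions. The sketch also assumes that label 1 means "default" and that a Type I error is a good customer flagged as a defaulter; the record does not state the convention the thesis uses.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_sgd(rows, labels, lr=0.1, epochs=50, seed=0):
    """Fit logistic regression weights with one SGD update per sample."""
    rng = random.Random(seed)
    w = [0.0] * len(rows[0])
    b = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            x, y = rows[i], labels[i]
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
            err = p - y  # gradient of the log loss w.r.t. the linear score
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (assumed: default) if the predicted probability meets the threshold."""
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= threshold else 0


def evaluate(w, b, rows, labels):
    """Return (accuracy, Type I error rate, Type II error rate).

    Assumed convention: Type I = good customer (0) predicted as default (1),
    Type II = defaulting customer (1) predicted as good (0).
    """
    tp = tn = fp = fn = 0
    for x, y in zip(rows, labels):
        yhat = predict(w, b, x)
        if y == 1 and yhat == 1:
            tp += 1
        elif y == 0 and yhat == 0:
            tn += 1
        elif y == 0 and yhat == 1:
            fp += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(labels)
    type1 = fp / (fp + tn) if (fp + tn) else 0.0
    type2 = fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy, type1, type2


def make_toy_data(n_per_class=50, seed=42):
    """Two well-separated synthetic clusters (hypothetical data, not the thesis's)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, center in ((0, 0.2), (1, 0.8)):
        for _ in range(n_per_class):
            rows.append([center + rng.uniform(-0.15, 0.15) for _ in range(2)])
            labels.append(label)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_toy_data()
    w, b = train_sgd(rows, labels, lr=0.5, epochs=100)
    print(evaluate(w, b, rows, labels))
```

Each update touches a single sample, so the cost per step does not depend on the size of the data set. This is the usual reason for choosing SGD on large data; whether it accounts for the lower modeling time reported in the abstract is not stated in this record.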
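Table of contents: Chinese Abstract
English Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Process
Chapter 2 Literature Review
2.1 Credit Scoring
2.2 Principal Component Analysis (PCA)
2.3 Stochastic Gradient Descent (SGD)
2.4 Logistic Regression
2.4.1 Multinomial Logistic Regression
Chapter 3 Research Method
3.1 Analysis Model and Research Framework
3.2 Data Preprocessing
3.2.1 Data Balancing
3.3 Principal Component Extraction
3.4 Logistic Regression Classification
3.4.1 Maximum Likelihood Estimate
3.4.2 Prior Estimate
3.4.3 Error Function
3.5 Logistic Regression with the SGD Algorithm
3.6 Operation of Conventional Logistic Regression
3.7 Accuracy Measurement of the Credit Scoring Model
3.7.1 Prediction Model Evaluation
3.7.2 Classification Accuracy Measurement
3.7.3 AUC Evaluation Criterion
Chapter 4 Experimental Results and Data Analysis
4.1 Experimental Environment
4.2 Data Source
4.2.1 Description of Attribute Variables
4.3 Process of Generating Experimental Results
4.4 Empirical Analysis
4.4.1 Experimental Results with Original Attributes
4.4.2 Experimental Results after Attribute Selection
4.4.3 Comparison of Analysis Results
4.4.4 Credit Scoring Prediction Model
Chapter 5 Conclusion
5.1 Research Conclusions
5.2 Managerial Implications
5.3 Research Limitations and Future Suggestions
References
Appendix
Glossary of Symbols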
Full-text access permission: Authorization not granted