Thesis title (Chinese): 基於中文詞網之領域詞義區分試驗
Thesis title (English): Some Experiments on Domain WSD based on Chinese Wordnet
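University: Taipei University of Technology (臺北科技大學)
College: College of Humanities and Social Sciences
Department: Master's Program, Department of Applied English
Academic year of graduation: 99 (ROC calendar; 2010-2011)
Year of publication: 100 (ROC calendar; 2011)
Author (Chinese name): 施孟賢
Author (English name): Meng-Hsien Shih
Student ID: 97548003
Degree: Master's
Language: English
Oral defense date: 2011-03-30
Number of pages: 77
Advisor (Chinese name): 洪媽益
Committee members (Chinese names): 謝舒凱; 龔書萍; 黃希敏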
Keywords (Chinese): 詞義區分; 詞義消歧; 中文領域詞義; 啟發式詞義區分
Keywords (English): WSD; Chinese domain sense; heuristic rule based WSD
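Abstract (translated from Chinese): Word sense disambiguation is an essential task for machine translation, information retrieval, and summarization systems. With the rapid development of computational linguistics in recent years, it has come to be seen as the next problem to be solved. Among the various types of word sense disambiguation, disambiguation restricted to specific target words reaches an accuracy above 70% in both Chinese and English, and English systems that disambiguate every word in a document also reach about 70% accuracy. However, no comparable Chinese word sense disambiguation system currently exists.

This thesis proposes a Chinese word sense disambiguation system that uses the properties of Chinese Wordnet, together with a set of purpose-built heuristic rules, to disambiguate domain documents. The system first looks for the domain sense of a word. If the word is not a domain word, the system tries to determine the likely sense from the overlap between all senses of the word and the senses of its context words. Finally, it considers the prototype sense of the word. The system can currently disambiguate verbs and nouns in domain documents (for example, environmental documents), and in preliminary tests it reaches an accuracy of 56%.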
Abstract (English): Word Sense Disambiguation (WSD) is essential for language understanding systems such as machine translation, information retrieval, and summarization systems, and it has been considered the next crucial task to be tackled. Among the various WSD tasks, the lexical sample task in Chinese achieves a precision rate of more than 70%, as does the all-words task in English, but currently no Chinese all-words WSD system is available.

This thesis presents a WSD system for domain texts. It assumes that POS tagging and heuristic rules can help eliminate sense ambiguity. Three heuristic rules are applied in order: first, consider domain senses for words in the text; second, if no domain sense is available, identify the intended non-domain sense from the overlap between its sense definitions and those of the context words (Lesk algorithm); third, if neither of the first two rules applies, assume that the prototype sense is the most likely one for a non-domain word in a domain text. The system achieved a 56% precision rate on nouns and verbs in a domain (e.g., environment-related) text.
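The three-rule cascade described in the abstract can be sketched roughly as follows. This is only an illustration, not the thesis's own program (Appendix F is not reproduced in this record). The `Sense` class, the `disambiguate` function, the toy English lexicon, the whitespace tokenization of definitions, and the choice to treat the first listed sense as the prototype sense are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """Hypothetical sense record; the real Chinese Wordnet layout may differ."""
    sense_id: str
    gloss: str               # sense definition, tokenized by whitespace here
    is_domain: bool = False  # assumed flag set by a domain sense detection step


def overlap(gloss_a: str, gloss_b: str) -> int:
    """Simplified Lesk score: number of distinct tokens shared by two glosses."""
    return len(set(gloss_a.split()) & set(gloss_b.split()))


def disambiguate(word, context, lexicon):
    """Return (sense, rule_name) for `word`, trying the three rules in order."""
    senses = lexicon.get(word, [])
    if not senses:
        return None, "unknown"

    # Rule 1: prefer a domain sense if the word has one.
    for sense in senses:
        if sense.is_domain:
            return sense, "domain"

    # Rule 2: simplified Lesk, scoring each sense by the overlap between its
    # gloss and the glosses of all senses of the other context words.
    context_glosses = [
        other.gloss
        for ctx_word in context
        if ctx_word != word
        for other in lexicon.get(ctx_word, [])
    ]
    scores = [sum(overlap(s.gloss, g) for g in context_glosses) for s in senses]
    best = max(scores)
    if best > 0:
        return senses[scores.index(best)], "lesk"

    # Rule 3: fall back to the prototype sense (assumed: the first one listed).
    return senses[0], "prototype"


# Toy lexicon, invented for illustration only.
SAMPLE_LEXICON = {
    "bank": [
        Sense("bank.1", "institution that accepts money deposits"),
        Sense("bank.2", "sloping land beside a river"),
    ],
    "river": [Sense("river.1", "large natural stream of water flowing beside land")],
    "emission": [
        Sense("emission.1", "act of sending something out"),
        Sense("emission.2", "release of pollutants into the air", is_domain=True),
    ],
}

if __name__ == "__main__":
    for target, ctx in [("emission", ["emission"]),
                        ("bank", ["river", "bank"]),
                        ("bank", ["bank"])]:
        sense, rule = disambiguate(target, ctx, SAMPLE_LEXICON)
        print(target, sense.sense_id, rule)
```

Returning the name of the rule that fired alongside the sense makes it easy to count how often each rule decides the outcome, which is one way to examine where a precision figure such as the reported 56% comes from.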
Table of contents: ABSTRACT (Chinese)
ABSTRACT (English)
ACKNOWLEDGEMENTS
Table of Contents
List of Tables
List of Figures
Chapter 1 INTRODUCTION
Chapter 2 RELATED WORKS
2.1 Word Sense Disambiguation on Generic Domain
2.1.1 Supervised Approach
2.1.2 Knowledge-based Approach
2.1.2.1 Thesaurus-based Disambiguation
2.1.2.2 Disambiguation based on Translation
2.1.2.3 Disambiguation based on Sense Definition
2.1.2.4 WordNet-based Disambiguation
2.1.2.5 The Problem of Knowledge-based Approach
2.2 Word Sense Disambiguation on Specific Domains
2.3 Semantic Evaluation (SemEval)
2.4 Princeton English WordNet
Chapter 3 METHODOLOGY
3.1 Proposed System
3.1.1 Domain Sense Detection
3.1.2 Preprocessing of Domain Texts
3.1.3 Heuristic Rules
3.2 Experiment
3.2.1 Settings
3.2.1.1 Document Sources
3.2.1.2 Chinese Wordnet
3.2.1.3 Yahoo! Segmentation/Tagging Service
3.2.2 Procedures
3.2.2.1 Domain Sense Detection
3.2.2.2 Preprocessing
3.2.2.3 Sense Tagging
3.2.2.4 System Evaluation
Chapter 4 RESULTS AND DISCUSSIONS
4.1 Results
4.1.1 Domain Sense Detection
4.1.2 Preprocessing
4.1.3 System Output
4.2 Discussion
4.2.1 Domain Sense Detection Issue
4.2.2 Parts of Speech Issue
4.2.3 Lesk Implementation
4.2.4 Adjective and Other POS Issues
4.2.5 The Issue of Metaphorical Usage
4.2.6 The Ceiling of System Performance
Chapter 5 CONCLUSION
5.1 Contribution
5.2 Future Development
REFERENCES
APPENDIX
A. Original Document for Disambiguation Test (Excerpt)
B. Test Document (Excerpt)
C. Metadata of Merging Words into Sentences for POS Tagging
D. Metadata with Preprocessing of Segmentation/POS Tagging
E. Disambiguated Text (Excerpt)
F. Main Program Code
List of Abbreviations
References: Agirre, E., Lacalle, O. Lopez de, Fellbaum, C., Hsieh, S.-K., Tesconi, M., Monachini, M., et al. (2010). Semeval-2010 task 17: All-words word sense disambiguation on a specific domain. In Proceedings of the 5th International Workshop on Semantic Evaluations (SemEval-2010), Association for Computational Linguistics.
Bar-Hillel, Y. (1960). Automatic translation of languages. In D. Booth & R. E. Meagher (Eds.), Advances in computers. New York: Academic.
bass. (2010). In Merriam-Webster online dictionary.
Bentivogli, L., Forner, P., Magnini, B., & Pianta, E. (2004). Revising WordNet Domains Hierarchy: Semantics, Coverage, and Balancing. In COLING 2004 Workshop on Multilingual Linguistic Resources, Geneva, Switzerland, August 28, 2004, pp. 101-108.
Chan, Y. S., & Ng, H. T. (2007). Domain adaptation with active learning for word sense disambiguation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics.
Chen, A., Zhou, Y., Zhang, Y., & Sun, G. (2009). Unigram language model for Chinese word segmentation. In Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing (Second International Chinese Segmentation Bakeoff).
Chu, C. C., & Chi, T.-J. (1999). A cognitive-functional grammar of Mandarin Chinese. Taipei: Crane.
Cole, D. (2004). The Chinese Room Argument. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy. Stanford, CA: Stanford University.
Dagan, I., & Itai, A. (1994). Word sense disambiguation using a second language monolingual corpus. Computational Linguistics, 20, 563-596.
Dagan, I., Itai, A., & Schwall, U. (1991). Two languages are more informative than one. In ACL 29, pp. 130-137.
Dennett, D. (1980). The milk of human intentionality. Behavioral and Brain Sciences, 3, 425-430.
Escudero, G., Marquez, L., & Rigau, G. (2000). An empirical study of the domain dependence of supervised word sense disambiguation systems. In Proceedings of EMNLP/VLC00.
Fisher, J. A. (1997). The wrong stuff: Chinese rooms and the nature of understanding. Philosophical Investigations, 11, 279-299.
Gale, W. A., Church, K. W., & Yarowsky, D. (1992). A method for disambiguating word senses in a large corpus. Computers and the Humanities, 26, 415-439.
Harnad, S. (1991). Other bodies, other minds: A machine incarnation of an old philosophical problem. Minds and Machines, 1, 5-25.
Hauser, L. (1997). Searle’s Chinese box: Debunking the Chinese room argument. Minds and Machines, 7, 199-226.
Hayes, P. J. (1982). Introduction. In P. J. Hayes & M. M. Lucas (Eds.), Proceedings of the cognitive curricula conference.
Hofstadter, D. (1980). Reductionism and religion. Behavioral and Brain Sciences, 3, 433-434.
Huang, C.-R., Chen, K.-J., & Chang, L.-L. (1997). Segmentation standard for Chinese language processing. Computational Linguistics and Chinese Language Processing, 2(2), 47-62.
Huang, C.-R., Tseng, E. I. J., Tsai, D. B. S., & Murphy, B. (2003). Cross-lingual portability of semantic relations: Bootstrapping Chinese Wordnet with English WordNet relations. Language and Linguistics, 4, 509-532.
Jurafsky, D., & Martin, J. H. (2009). Speech and language processing. New Jersey: Pearson.
Kilgarriff, A., & Rosenzweig, J. (2000). Framework and results for English Senseval. Computers and the Humanities, 34, 15-48.
Lesk, M. E. (1986). Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of the Fifth International Conference on Systems Documentation, Toronto, Canada, pp. 24-26. ACM.
Li, C. N., & Thompson, S. A. (1989). Mandarin Chinese: A functional reference grammar. Los Angeles, CA: University of California Press.
Li, X., Szpakowicz, S., & Matwin, S. (1995). A Wordnet-based algorithm for word sense disambiguation. In the 14th International Joint Conference on Artificial Intelligence IJCAI-95, Montreal, Canada.
Manning, C. D., & Schutze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.
Masterman, M. (1967). Mechanical pidgin translation. In D. Booth (Ed.), Machine translation (pp. 195-227). Amsterdam: North-Holland Publishing Company.
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to WordNet: An on-line lexical database. International Journal of Lexicography, 3, 235-244.
Navigli, R. (2009). Word sense disambiguation: A survey. ACM Computing Surveys, 41(2), 10:1-10:69.
Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3, 417-424.
Walker, D. E. (1987). Knowledge resources tools for accessing large text files. In S. Nirenburg (Ed.), Machine translation: Theoretical and methodological issues (pp. 247-261). Cambridge, MA: Cambridge University Press.
Wang, H. (2002). A study on noun sense disambiguation based on syntagmatic features. International Journal of Computational Linguistics and Chinese Language Processing, 7, 77-88.
Weaver, W. (1955). Translation. In W. N. Locke & A. D. Booth (Eds.), Machine translation of languages. Cambridge, MA: MIT Press.
Full-text access rights: authorized for public release from 2011-09-01