Title

中文机构名称的识别与分析

Alternative Title

Identification and analysis of Chinese organization and institution names

Document Type

Journal article

Source Publication

中文信息学报 = Journal of Chinese Information Processing

Publication Date

1-1-1997

Volume

11

Issue

4

First Page

21

Last Page

32

Publisher

中国科学院软件研究所 (Institute of Software, Chinese Academy of Sciences)

Keywords

机构名称, 专有名词, 短语分析, 自然语言处理, Organization and institution names, Proper nouns, Phrase analysis, Natural language processing

Abstract

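Chinese organization names are enormous in number and are constantly being coined; the vast majority are not recorded in dictionaries, which causes difficulties for natural language processing. From a linguistic point of view, however, an organization name is a modifier-head compound proper noun and, at the same time, a relatively simple kind of modifier-head noun phrase, with its own structural regularities and morphological markers. Focusing on the names of colleges and universities and drawing on real corpus data from the Chinese mainland, Hong Kong and Taiwan, this paper discusses the identification and analysis of organization names from both the linguistic and the computational perspective, and summarizes the corresponding rules. Applying these rules to identify college and university names in the three-region corpus of more than six million characters yields a precision of 97.3% (a name counts as correct only when both its left and right boundaries are located correctly) and a recall of 96.9%. The rules can also be applied in other areas, such as intelligent pinyin-to-Chinese-character conversion and machine translation.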

As important proper nouns, Chinese names of organizations and institutions play an indispensable role in language communication. Unfortunately, because they are practically unlimited in number, are constantly being created and abandoned, and are relatively long and complex, most of these names have not found their way into the Chinese dictionaries of computer systems. Linguistically, however, these proper nouns can be viewed as a special group of compound nouns and as a simple category of noun phrase, with their own formation rules and morphological markers. This paper presents a pioneering discussion of the analysis of Chinese names of organizations and institutions from the computational point of view. Useful linguistic rules have been drawn from the discussion and applied to the identification of names of organizations and institutions in the 6,000,000-character Mainland, Hong Kong and Taiwan corpus of modern Chinese developed by Hong Kong Polytechnic University. Preliminary experiments show that both the precision and the recall for identifying names of colleges and universities are over 96%.
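
The abstract counts an identified name as correct only when both its left and right boundaries are located correctly. As a rough illustration of that scoring convention (not the authors' code), the sketch below computes exact-boundary precision and recall. The function name `exact_match_scores`, the representation of a name as a `(start, end)` character-offset pair, and the sample spans are all hypothetical.

```python
def exact_match_scores(predicted, gold):
    """Return (precision, recall) over (start, end) spans.

    A predicted span is counted as correct only if the identical
    (start, end) pair appears in the gold annotation, i.e. both
    boundaries match. Spans are assumed to be character offsets.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Made-up spans for illustration: the second prediction has the right
# start but the wrong end, so it counts as an error for both measures.
gold = [(0, 4), (10, 16), (20, 26)]
predicted = [(0, 4), (10, 15), (20, 26)]
precision, recall = exact_match_scores(predicted, gold)
```

Under this convention a name found with one boundary off lowers precision and recall together, which is why it is stricter than scoring partial overlaps.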

Language

Chinese (Simplified)

Print ISSN

1003-0077

Recommended Citation

张小衡、王玲玲 (1997)。中文机构名称的识别与分析。《中文信息学报》,11(4),21-32。
