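In linguistics, a corpus is a large collection of texts, usually curated and given a defined format and annotation; indeed, the English term "text corpus" literally means "body of text".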
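As a rough illustration of what "a defined format and annotation" can mean in practice, the sketch below stores a tiny corpus as tokenized sentences whose tokens carry part-of-speech tags. The two sentences, the tag names, and the helper functions `word_frequencies` and `words_with_tag` are all hypothetical and are not drawn from any corpus listed in this article.

```python
from collections import Counter

# Hypothetical two-sentence corpus; each token is a (word, part-of-speech tag) pair.
corpus = [
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
]

def word_frequencies(sentences):
    """Count how often each word form occurs across all sentences."""
    return Counter(word for sentence in sentences for word, _tag in sentence)

def words_with_tag(sentences, tag):
    """Return the word forms carrying the given tag, in corpus order."""
    return [word for sentence in sentences for word, t in sentence if t == tag]

freqs = word_frequencies(corpus)
nouns = words_with_tag(corpus, "NOUN")
```

Real corpora are far larger and typically use richer annotation schemes, but the same idea applies: a consistent format is what makes counting and filtering by annotation possible.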
List of corpora
Multilingual
- 点通 multilingual speech corpus
- University of Pennsylvania corpus
- XML corpus
English
- Collins COBUILD project. Outputs: the Collins dictionary of contemporary English and a grammar of contemporary English.
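Chinese
- Academia Sinica Balanced Corpus
- LIVAC synchronous corpus of Chinese
- Peking University corpus
- Lancaster University balanced corpus of Chinese
- Lancaster-Los Angeles spoken Chinese corpus
- Corpus Linguistics Online
- Beijing Forest Studio corpus of Chinese annotated for sentence semantic structure
External links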
- Free, web-based corpora (45-425 million words each): American (COCA, COHA, TIME), British (BNC), Spanish, Portuguese
- Content related to Computational Linguistics at the Open Directory Project
- ACL SIGLEX Resource Links: Text Corpora
- The Leipzig Glossing Rules: Conventions for interlinear morpheme-by-morpheme glosses
- Developing Linguistic Corpora: a Guide to Good Practice
- An interface for querying automatically-constructed virtual corpora.
- TEP: Tehran English-Persian Parallel Corpus.
- Building synchronous parallel corpora of the languages taught at the Faculty of Arts of Charles University.
- TS Corpus - A Turkish Corpus freely available for academic research.
- Turkish National Corpus - A general-purpose corpus for contemporary Turkish
- Free web-based English corpus to download (3 billion words)