The Corpus of Contemporary American English (COCA) is a more than 560-million-word corpus of American English. It was created by Mark Davies, Professor of Corpus Linguistics at Brigham Young University.
The corpus contains more than 560 million words in 220,225 texts, with 20 million words from each year from 1990 through 2017. It was most recently updated in December 2017. Tens of thousands of people use it each month, which may make it the most widely used "structured" corpus currently available.
For each year, the corpus is evenly divided between the following five genres: spoken, fiction, popular magazines, newspapers, and academic journals. The texts come from a variety of sources:
Spoken: (85 million words) Transcripts of unscripted conversation from nearly 150 different TV and radio programs.
Fiction: (81 million words) Short stories and plays, first chapters of books 1990–present, and movie scripts.
Popular magazines: (86 million words) Nearly 100 different magazines, from a range of domains such as news, health, home and gardening, women's, financial, religion, and sports.
Newspapers: (81 million words) Ten newspapers from across the US, with text from different sections of the newspapers, such as local news, opinion, sports, and the financial section.
Academic Journals: (81 million words) Nearly 100 different peer-reviewed journals. These were selected to cover the entire range of the Library of Congress Classification system.
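Because the five sections differ in size (81 to 86 million words), comparing a search term across genres is only fair once raw hit counts are normalized to a frequency per million words. The minimal sketch below shows that normalization using the section sizes listed above. The hit counts are invented for illustration, and the name `per_million` is hypothetical, not part of the COCA interface.

```python
# Section sizes in millions of words, taken from the genre list above.
SECTION_SIZE_MILLIONS = {
    "spoken": 85,
    "fiction": 81,
    "magazines": 86,
    "newspapers": 81,
    "academic": 81,
}


def per_million(raw_counts: dict[str, int], sizes: dict[str, float]) -> dict[str, float]:
    """Convert raw hit counts per section into frequency per million words."""
    return {genre: raw_counts[genre] / sizes[genre] for genre in raw_counts}


# Hypothetical raw counts for some search term (illustrative only, not COCA data).
hits = {"spoken": 1700, "fiction": 810, "magazines": 1290, "newspapers": 1215, "academic": 405}

for genre, freq in per_million(hits, SECTION_SIZE_MILLIONS).items():
    print(f"{genre:<11} {freq:6.2f} per million")
```

Without this step the 86-million-word magazine section would be slightly favoured over the 81-million-word sections; dividing by section size removes that bias.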
The corpus is free to search through its web interface, with a daily limit on the number of queries; less-restricted access is available for a fee.
The full corpus texts are available for a further fee.
The interface is the same as the BYU-BNC interface for the 100-million-word British National Corpus, the 100-million-word TIME Magazine corpus, and the 400-million-word Corpus of Historical American English (COHA), 1810s–2000s.
Queries by word, phrase, alternates, substring, part of speech, lemma, synonyms (see below), and customized lists (see below)
The corpus is tagged by CLAWS, the same part-of-speech tagger that was used for the BNC and the TIME corpus
Chart listings (totals for all matching forms in each genre or year, 1990–present, as well as for subgenres) and table listings (frequency for each matching form in each genre or year)
Full collocates searching (up to ten words left and right of node word)
Re-sortable concordances, showing the most common words/strings to the left and right of the searched word
Comparisons between genres or time periods (e.g. collocates of 'chair' in fiction or academic, nouns with 'break the [N]' in newspapers or academic, adjectives that occur primarily in sports magazines, or verbs that are more common in 2005–2010 than previously)
One-step comparisons of collocates of related words, to study semantic or cultural differences between words (e.g. comparison of collocates of 'small' and 'little', 'Democrats' and 'Republicans', 'men' and 'women', or 'rob' and 'steal')
Users can include semantic information from a 60,000-entry thesaurus directly as part of the query syntax (e.g. frequency and distribution of synonyms of 'beautiful', synonyms of 'strong' occurring in fiction but not academic, synonyms of 'clean' + noun ('clean the floor', 'washed the dishes'))
Users can also create their own 'customized' word lists, and then re-use these as part of subsequent queries (e.g. lists related to a particular semantic category (clothes, foods, emotions), or a user-defined part of speech)
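Collocate searching and the one-step comparisons listed above both rest on counting the words that occur within a fixed window around a node word. The sketch below assumes a plain lowercase token list rather than COCA's tagged database; it counts collocates within up to ten words on each side and then ranks collocates by how much more often they occur with one node word than with another. The helper names and the toy sentence are hypothetical, and the add-one-smoothed ratio is a simple stand-in, not necessarily the measure the COCA interface uses.

```python
from collections import Counter


def collocates(tokens: list[str], node: str, span: int = 4) -> Counter:
    """Count words within `span` tokens to the left and right of each `node` hit."""
    if not 1 <= span <= 10:
        raise ValueError("span must be between 1 and 10, mirroring the interface limit")
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]      # up to `span` words before the node
        right = tokens[i + 1:i + 1 + span]     # up to `span` words after the node
        counts.update(left + right)
    return counts


def compare(a: Counter, b: Counter, top: int = 5) -> list[tuple[str, float]]:
    """Rank collocates by (count with A + 1) / (count with B + 1), add-one smoothed."""
    words = set(a) | set(b)
    scores = {w: (a[w] + 1) / (b[w] + 1) for w in words}
    # Highest ratio first; ties broken alphabetically for a stable listing.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


text = ("a small boy saw a little dog the small print was small "
        "a little bit of luck and a little dog").split()
small = collocates(text, "small", span=2)
little = collocates(text, "little", span=2)
print(compare(small, little, top=3))  # collocates favouring 'small' over 'little'
```

Swapping the arguments to `compare` gives the collocates that favour the second word, which mirrors the two-sided listing a 'small' versus 'little' comparison produces.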
Apart from the full-text data sold separately, the corpus is available only through the web interface, due to copyright restrictions.
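The re-sortable concordances listed among the interface features are a keyword-in-context (KWIC) display: each hit is shown with its left and right context, and the lines can be reordered by the word immediately to the left or right of the node. The sketch below imitates that on a plain token list. The `kwic` and `sort_lines` functions and the sample text are hypothetical and say nothing about how the COCA interface is actually implemented.

```python
from typing import NamedTuple


class Line(NamedTuple):
    left: list[str]   # context before the node word
    node: str         # the matched word
    right: list[str]  # context after the node word


def kwic(tokens: list[str], node: str, width: int = 3) -> list[Line]:
    """Collect every hit of `node` with up to `width` tokens of context on each side."""
    return [
        Line(tokens[max(0, i - width):i], tok, tokens[i + 1:i + 1 + width])
        for i, tok in enumerate(tokens)
        if tok == node
    ]


def sort_lines(lines: list[Line], side: str = "right") -> list[Line]:
    """Re-sort concordance lines by the first word to the right or the nearest word to the left."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")

    def key(ln: Line) -> str:
        if side == "right":
            return ln.right[0] if ln.right else ""
        return ln.left[-1] if ln.left else ""

    return sorted(lines, key=key)  # stable sort keeps text order among ties


text = "they break the rules then break a record and break the news".split()
for ln in sort_lines(kwic(text, "break", width=2), side="right"):
    print(f"{' '.join(ln.left):>15}  [{ln.node}]  {' '.join(ln.right)}")
```

Sorting on the right-hand side groups the 'break the ...' lines together, which is the kind of pattern the 'break the [N]' comparison relies on.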
American National Corpus
British National Corpus
Bank of English
Kauhanen, Henri (2011-03-21). "The Corpus of Contemporary American English: Background and history". VARIENG. Retrieved 13 October 2011.
"Corpus of Contemporary American English". Corpus of Contemporary American English. Retrieved 20 July 2017.
"BYU corpora: Premium". BYU corpora. Retrieved 20 July 2017.
"Corpus data: Purchase". Retrieved 20 July 2017.
Davies, Mark (2010). "The Corpus of Contemporary American English as the First Reliable Monitor Corpus of English". Literary and Linguistic Computing. 25 (4): 447–65. doi:10.1093/llc/fqq018.
Bennett, Gena R. (2010). Using Corpora in the Language Learning Classroom: Corpus Linguistics for Teachers. Ann Arbor, Michigan: University of Michigan. p. 144. ISBN 978-0-472-03385-0.
Davies, Mark (2010). "More than a peephole: Using large and diverse online corpora". International Journal of Corpus Linguistics. 15 (3): 405–11. doi:10.1075/ijcl.15.3.13dav.
Anderson, Wendy; Corbett, John (2009). Exploring English with Online Corpora. Palgrave Macmillan. p. 205. ISBN 978-0-230-55140-4.
Davies, Mark (2009). "The 385+ Million Word Corpus of Contemporary American English (1990–present)". International Journal of Corpus Linguistics. John Benjamins Publishing Company. 14 (2): 159–190. doi:10.1075/ijcl.14.2.02dav.
Lindquist, Hans (2009). Corpus Linguistics and the Description of English. Edinburgh University Press. ISBN 978-0-7486-2615-1.
Davies, Mark (2005). "The advantage of using relational databases for large corpora: Speed, advanced queries, and unlimited annotation". International Journal of Corpus Linguistics. John Benjamins Publishing Company. 10 (3): 307–334. doi:10.1075/ijcl.10.3.02dav.
The Corpus of Global Web-based English (GloWbE; pronounced "globe") contains about 1.9 billion words of text from twenty different countries. This makes it about 100 times as large as other corpora such as the International Corpus of English, and it allows many types of searches that would not otherwise be possible. In addition to the online interface, full-text data from the corpus can be downloaded.
It is unique in allowing comparisons between different varieties of English. GloWbE is related to many other corpora of English.
"Corpus of Web-Based Global English". www.english-corpora.org. Retrieved 18 December 2019.