In linguistics and natural language processing, a corpus (pl.: corpora) or text corpus is a dataset consisting of natively digital and older, digitized...
8 KB (857 words) - 01:47, 23 August 2024
begin being deciphered. Large collections of parallel texts are called parallel corpora (see text corpus). Alignments of parallel corpora at sentence level...
12 KB (1,182 words) - 13:40, 27 July 2024
Corpus linguistics is an empirical method for the study of language by way of a text corpus (plural corpora). Corpora are balanced, often stratified collections...
20 KB (2,338 words) - 05:52, 9 September 2024
Corpus is Latin for "body". It may refer to: Text corpus, in linguistics, a large...
2 KB (315 words) - 06:44, 26 April 2024
Text corpora (singular: text corpus) are large and structured sets of texts, which have been systematically collected. Text corpora are used by corpus...
23 KB (2,444 words) - 07:31, 2 May 2024
University Standard Corpus of Present-Day American English, better known as simply the Brown Corpus, is an electronic collection of text samples of American...
9 KB (1,056 words) - 13:22, 29 February 2024
The Oxford English Corpus (OEC) is a text corpus of 21st-century English, used by the makers of the Oxford English Dictionary and by Oxford University...
4 KB (345 words) - 10:40, 19 November 2022
British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British...
31 KB (3,894 words) - 01:18, 14 June 2024
The AsoSoft text corpus is the first large-scale Kurdish text corpus, collected and processed by the AsoSoft research and development group. It contains...
1 KB (132 words) - 18:09, 24 November 2023
Spanish. Each estimate comes from an analysis of a different text corpus. A text corpus is a large collection of samples of written and/or spoken language...
27 KB (750 words) - 20:36, 21 August 2024
2019, the corpus had grown to 560 million words. As of November 2021, the Corpus of Contemporary American English is composed of 485,202 texts. According...
9 KB (1,135 words) - 01:14, 12 September 2024
The Neo-Assyrian Text Corpus Project is an international scholarly project aimed at collecting and publishing ancient Assyrian texts of the Neo-Assyrian...
9 KB (117 words) - 18:39, 22 August 2024
The Electronic Text Corpus of Sumerian Literature (ETCSL) is an online digital library of texts and translations of Sumerian literature that was created...
4 KB (368 words) - 11:40, 17 March 2024
Habeas corpus (/ˈheɪbiəs ˈkɔːrpəs/; from Medieval Latin, lit. 'that you have the body') is a recourse in law by which a report can be made to a court...
74 KB (9,435 words) - 03:36, 7 September 2024
The Enron Corpus is a database of over 600,000 emails generated by 158 employees of the Enron Corporation in the years leading up to the company's collapse...
7 KB (712 words) - 16:21, 20 May 2024
The Corpus of Electronic Texts, or CELT, is an online database of contemporary and historical documents relating to Irish history and culture. As of 8...
3 KB (196 words) - 22:31, 24 February 2024
Corpus (OEC), a massive text corpus that is written in the English language. In total, the texts in the Oxford English Corpus contain more than 2 billion...
16 KB (858 words) - 17:56, 20 March 2024
Word list (category Articles lacking in-text citations from December 2023)
of occurrence either by levels or as a ranked list) within some given text corpus, serving the purpose of vocabulary acquisition. A lexicon sorted by frequency...
26 KB (2,741 words) - 21:10, 28 August 2024
Sumerian literature (redirect from Sumerian texts)
Sumerian literature constitutes the earliest known corpus of recorded literature, including the religious writings and other traditional stories maintained...
9 KB (1,020 words) - 19:42, 19 August 2024
"The Electronic Text Corpus of Sumerian Literature". Etcsl.orinst.ox.ac.uk. Retrieved 30 December 2018. "The Electronic Text Corpus of Sumerian Literature"...
20 KB (2,185 words) - 14:12, 16 September 2024
Corpus of Sumerian Literature. Archived from the original on 2012-05-15. Retrieved 2010-02-20. "A balbale to Nanna (Nanna B)". Electronic Text Corpus...
41 KB (4,130 words) - 02:49, 4 July 2024
is also called the corpus cavernosum urethrae in older texts. The proximal part of the corpus spongiosum is expanded to form the urethral bulb, and lies...
4 KB (405 words) - 18:07, 15 July 2024
The corpus callosum (Latin for "tough body"), also callosal commissure, is a wide, thick nerve tract, consisting of a flat bundle of commissural fibers...
31 KB (3,601 words) - 14:01, 10 September 2024
digitization, ancient text corpora are more accessible than ever before. Tools such as the Perseus Digital Library and the Digital Corpus of Sanskrit have...
47 KB (5,403 words) - 08:13, 24 August 2024
The Scottish Corpus of Texts & Speech (SCOTS) is an ongoing project to build a corpus of modern-day (post-1940) written and spoken texts in Scottish English...
3 KB (349 words) - 20:04, 14 April 2023
The American National Corpus (ANC) is a text corpus of American English containing 22 million words of written and spoken data produced since 1990. Currently...
5 KB (605 words) - 01:16, 14 June 2024
Avestan (category Articles containing Persian-language text)
The Avestan text corpus was composed in the ancient Iranian satrapies of Arachosia, Aria, Bactria...
34 KB (3,261 words) - 06:38, 11 September 2024
measure of a language model's performance is its perplexity on a given text corpus. Perplexity is a measure of how well a model is able to predict the contents...
156 KB (13,448 words) - 12:30, 15 September 2024
Lydian language (category Articles containing Ancient Greek (to 1453)-language text)
Dictionary of the Ancient Anatolian Corpus Languages (eDiAna)". Ludwig-Maximilians-Universität München. Lydian Corpus Palaeolexicon - Word study tool of...
43 KB (3,541 words) - 11:42, 19 July 2024
The Corpus Hermeticum is a collection of 17 Greek writings whose authorship is traditionally attributed to the legendary Hellenistic figure Hermes Trismegistus...
10 KB (1,200 words) - 13:12, 1 June 2024