  • In linguistics and natural language processing, a corpus (pl.: corpora) or text corpus is a dataset, consisting of natively digital and older, digitalized...
    8 KB (857 words) - 01:47, 23 August 2024
  • Parallel text
    begin being deciphered. Large collections of parallel texts are called parallel corpora (see text corpus). Alignments of parallel corpora at sentence level...
    12 KB (1,182 words) - 13:40, 27 July 2024
  • Corpus linguistics is an empirical method for the study of language by way of a text corpus (plural corpora). Corpora are balanced, often stratified collections...
    20 KB (2,338 words) - 05:52, 9 September 2024
  • Corpus is Latin for "body". It may refer to: Text corpus, in linguistics, a large...
    2 KB (315 words) - 06:44, 26 April 2024
  • Text corpora (singular: text corpus) are large and structured sets of texts, which have been systematically collected. Text corpora are used by corpus...
    23 KB (2,444 words) - 07:31, 2 May 2024
  • Brown Corpus
    University Standard Corpus of Present-Day American English, better known as simply the Brown Corpus, is an electronic collection of text samples of American...
    9 KB (1,056 words) - 13:22, 29 February 2024
  • The Oxford English Corpus (OEC) is a text corpus of 21st-century English, used by the makers of the Oxford English Dictionary and by Oxford University...
    4 KB (345 words) - 10:40, 19 November 2022
  • British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British...
    31 KB (3,894 words) - 01:18, 14 June 2024
  • The AsoSoft text corpus is the first large-scale Kurdish text corpus, collected and processed by the AsoSoft research and development group. It contains...
    1 KB (132 words) - 18:09, 24 November 2023
  • Spanish. Each estimate comes from an analysis of a different text corpus. A text corpus is a large collection of samples of written and/or spoken language...
    27 KB (750 words) - 20:36, 21 August 2024
  • 2019, the corpus had grown to 560 million words. As of November 2021, the Corpus of Contemporary American English is composed of 485,202 texts. According...
    9 KB (1,135 words) - 01:14, 12 September 2024
  • The Neo-Assyrian Text Corpus Project is an international scholarly project aimed at collecting and publishing ancient Assyrian texts of the Neo-Assyrian...
    9 KB (117 words) - 18:39, 22 August 2024
  • Electronic Text Corpus of Sumerian Literature
    The Electronic Text Corpus of Sumerian Literature (ETCSL) is an online digital library of texts and translations of Sumerian literature that was created...
    4 KB (368 words) - 11:40, 17 March 2024
  • Habeas corpus (/ˈheɪbiəs ˈkɔːrpəs/; from Medieval Latin, lit. 'that you have the body') is a recourse in law by which a report can be made to a court...
    74 KB (9,435 words) - 03:36, 7 September 2024
  • The Enron Corpus is a database of over 600,000 emails generated by 158 employees of the Enron Corporation in the years leading up to the company's collapse...
    7 KB (712 words) - 16:21, 20 May 2024
  • The Corpus of Electronic Texts, or CELT, is an online database of contemporary and historical documents relating to Irish history and culture. As of 8...
    3 KB (196 words) - 22:31, 24 February 2024
  • Corpus (OEC), a massive text corpus that is written in the English language. In total, the texts in the Oxford English Corpus contain more than 2 billion...
    16 KB (858 words) - 17:56, 20 March 2024
  • Word list
    of occurrence either by levels or as a ranked list) within some given text corpus, serving the purpose of vocabulary acquisition. A lexicon sorted by frequency...
    26 KB (2,741 words) - 21:10, 28 August 2024
  • Sumerian literature
    Sumerian literature constitutes the earliest known corpus of recorded literature, including the religious writings and other traditional stories maintained...
    9 KB (1,020 words) - 19:42, 19 August 2024
  • Aratta
    "The Electronic Text Corpus of Sumerian Literature". Etcsl.orinst.ox.ac.uk. Retrieved 30 December 2018. "The Electronic Text Corpus of Sumerian Literature"...
    20 KB (2,185 words) - 14:12, 16 September 2024
  • Sumerian religion
    Corpus of Sumerian Literature. Archived from the original on 2012-05-15. Retrieved 2010-02-20. "A balbale to Nanna (Nanna B)". Electronic Text Corpus...
    41 KB (4,130 words) - 02:49, 4 July 2024
  • Corpus spongiosum (penis)
    is also called the corpus cavernosum urethrae in older texts. The proximal part of the corpus spongiosum is expanded to form the urethral bulb, and lies...
    4 KB (405 words) - 18:07, 15 July 2024
  • Corpus callosum
    The corpus callosum (Latin for "tough body"), also callosal commissure, is a wide, thick nerve tract, consisting of a flat bundle of commissural fibers...
    31 KB (3,601 words) - 14:01, 10 September 2024
  • Ancient text corpora
    digitization, ancient text corpora are more accessible than ever before. Tools such as the Perseus Digital Library and the Digital Corpus of Sanskrit have...
    47 KB (5,403 words) - 08:13, 24 August 2024
  • The Scottish Corpus of Texts & Speech (SCOTS) is an ongoing project to build a corpus of modern-day (post-1940) written and spoken texts in Scottish English...
    3 KB (349 words) - 20:04, 14 April 2023
  • The American National Corpus (ANC) is a text corpus of American English containing 22 million words of written and spoken data produced since 1990. Currently...
    5 KB (605 words) - 01:16, 14 June 2024
  • Avestan
    The Avestan text corpus was composed in the ancient Iranian satrapies of Arachosia, Aria, Bactria...
    34 KB (3,261 words) - 06:38, 11 September 2024
  • measure of a language model's performance is its perplexity on a given text corpus. Perplexity is a measure of how well a model is able to predict the contents...
    156 KB (13,448 words) - 12:30, 15 September 2024
  • Lydian language
    Dictionary of the Ancient Anatolian Corpus Languages (eDiAna)". Ludwig-Maximilians-Universität München. Lydian Corpus Palaeolexicon - Word study tool of...
    43 KB (3,541 words) - 11:42, 19 July 2024
  • Corpus Hermeticum
    The Corpus Hermeticum is a collection of 17 Greek writings whose authorship is traditionally attributed to the legendary Hellenistic figure Hermes Trismegistus...
    10 KB (1,200 words) - 13:12, 1 June 2024
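The Word list entry describes ranking words by frequency of occurrence within a given text corpus. A minimal Python sketch of that counting step follows. The tokenizer (lowercasing, then matching runs of letters and apostrophes), the helper name `frequency_list`, and the two-sentence sample corpus are illustrative assumptions, not drawn from any of the articles listed.

```python
import re
from collections import Counter


def frequency_list(corpus_texts):
    """Rank the words of a corpus by descending frequency (hypothetical helper)."""
    counts = Counter()
    for text in corpus_texts:
        # Assumed tokenizer: lowercase, then keep runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    # most_common() sorts by count; words with equal counts keep first-seen order.
    return counts.most_common()


corpus = [
    "The corpus is a body of text.",
    "A text corpus is a large body of text.",
]
ranked = frequency_list(corpus)
print(ranked[:3])  # [('a', 3), ('text', 3), ('corpus', 2)]
```

A real word list would usually apply a more careful tokenizer and may group inflected forms; the ranking step itself stays the same.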
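One snippet above notes that a language model is commonly measured by its perplexity on a given text corpus, that is, how well it predicts the corpus contents. Perplexity is conventionally the exponential of the average negative log-probability per token, PP = exp(-(1/N) Σ log p(w_i)). The sketch below assumes an add-one smoothed unigram model purely so there is something to score; the helper names `unigram_model` and `perplexity` are hypothetical, and real language models condition each probability on preceding context.

```python
import math
from collections import Counter


def unigram_model(train_tokens):
    """Return p(token) under an add-one smoothed unigram model (illustrative)."""
    counts = Counter(train_tokens)
    vocab_size = len(counts) + 1  # +1 for a single shared <unk> type
    total = len(train_tokens)

    def prob(token):
        # Counter returns 0 for unseen tokens, so they all get the <unk> mass.
        return (counts[token] + 1) / (total + vocab_size)

    return prob


def perplexity(prob, test_tokens):
    """exp of the average negative log-probability per test token."""
    neg_log_likelihood = -sum(math.log(prob(t)) for t in test_tokens)
    return math.exp(neg_log_likelihood / len(test_tokens))


model = unigram_model("the cat sat on the mat".split())
pp = perplexity(model, "the cat".split())
print(round(pp, 3))  # 4.899
```

A quick sanity check: a model that assigns the same probability 1/k to every token has perplexity exactly k, so lower values mean the model finds the corpus less surprising.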