• Data set (redirect from DataSeT)
    A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column...
    9 KB (885 words) - 16:55, 4 May 2024
  • These datasets are used in machine learning (ML) research and have been cited in peer-reviewed academic journals. Datasets are an integral part of the...
    257 KB (14,298 words) - 20:40, 25 September 2024
  • Democracy-Dictatorship Index
    index of democracy and dictatorship or simply the DD index or the DD datasets was the binary measure of democracy and dictatorship first proposed by...
    32 KB (1,708 words) - 21:58, 25 September 2024
  • The Pile is an 886.03 GB diverse, open-source dataset of English text created as a training dataset for large language models (LLMs). It was constructed...
    13 KB (1,268 words) - 09:47, 17 July 2024
  • MNIST database (redirect from MNIST dataset)
    original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken...
    22 KB (2,049 words) - 11:38, 29 September 2024
  • This is a list of datasets for machine learning research. It is part of the list of datasets for machine-learning research. These datasets consist primarily...
    102 KB (6,344 words) - 22:03, 27 September 2024
  • A national lidar dataset refers to a high-resolution lidar dataset comprising most—and ideally all—of a nation's terrain. Datasets of this type typically...
    4 KB (97 words) - 08:12, 31 May 2024
  • EPSG Geodetic Parameter Dataset
    EPSG Geodetic Parameter Dataset (also EPSG registry) is a public registry of geodetic datums, spatial reference systems, Earth ellipsoids, coordinate...
    5 KB (450 words) - 10:37, 21 July 2024
  • Apache Spark
    followed by the Dataset API. In Spark 1.x, the RDD was the primary application programming interface (API), but as of Spark 2.x use of the Dataset API is encouraged...
    30 KB (2,735 words) - 05:25, 30 September 2024
  • Google Dataset Search is a search engine from Google that helps researchers locate online data that is freely available for use. The company launched...
    4 KB (383 words) - 21:45, 14 August 2023
  • CORA dataset
    database ReAnalysis) is a global oceanographic temperature and salinity dataset produced and maintained by the French institute IFREMER. Most of those...
    7 KB (571 words) - 21:48, 25 September 2023
  • datasets that describe qualities of different governments, annually published and publicly available for free. These datasets are a popular dataset among...
    12 KB (764 words) - 22:42, 19 September 2024
  • 80 Million Tiny Images is a dataset intended for training machine learning systems. It contains 79,302,017 32×32 pixel color images, scaled down from...
    3 KB (366 words) - 08:48, 23 May 2024
  • coordinating efforts across multiple agencies towards a National LIDAR Dataset. The first meeting, a National LIDAR Initiative Strategy Meeting, was held...
    18 KB (395 words) - 00:58, 20 June 2024
  • Neural scaling law
    training dataset size, and training cost. In general, a neural model can be characterized by 4 parameters: size of the model, size of the training dataset, cost...
    37 KB (4,946 words) - 03:17, 1 October 2024
  • Cross-validation (statistics)
    problem, a model is usually given a dataset of known data on which training is run (training dataset), and a dataset of unknown data (or first seen data)...
    42 KB (5,623 words) - 18:40, 25 June 2024
  • Iris flower data set
    The iris data set is widely used as a beginner's dataset for machine learning purposes. The dataset is included in R base and Python in the machine learning...
    18 KB (930 words) - 11:28, 29 September 2024
  • COVID-19 datasets are public databases for sharing case data and medical information related to the COVID-19 pandemic. Johns Hopkins Coronavirus Resource...
    13 KB (880 words) - 04:28, 26 July 2024
  • became prevalent, some researchers constructed Internet-scale language datasets ("web as corpus"), upon which they trained statistical language models...
    156 KB (13,394 words) - 11:01, 4 October 2024
  • ImageNet (category Datasets in computer vision)
    these whitens the input data. There are various subsets of the ImageNet dataset used in various contexts, sometimes referred to as "versions". One of the...
    17 KB (1,841 words) - 04:07, 19 September 2024
  • machine learning. Klimt, Bryan; Yiming Yang (2004). "The Enron Corpus: A New Dataset for Email Classification Research". pp. 217–226. CiteSeerX 10.1.1.61.1645...
    7 KB (715 words) - 20:35, 28 September 2024
  • method of measuring how many different types (e.g. species) there are in a dataset (e.g. a community). Some more sophisticated indices also account for the...
    24 KB (3,321 words) - 15:34, 18 August 2024
  • IBM mainframe computers in the S/360 line, a data set (IBM preferred) or dataset is a computer file having a record organization. Use of this term began...
    14 KB (1,576 words) - 20:08, 17 May 2024
  • Anscombe's quartet
    Anscombe's quartet comprises four datasets that have nearly identical simple descriptive statistics, yet have very different distributions and appear very...
    10 KB (759 words) - 16:26, 3 October 2024
  • LAION
    open-sourced artificial intelligence models and datasets. It is best known for releasing a number of large datasets of images and captions scraped from the web...
    11 KB (993 words) - 06:10, 31 August 2024
  • a sheep if located on a grassland. Statistical classification List of datasets for machine learning research Hierarchical classification Ron Kohavi; Foster...
    20 KB (2,209 words) - 21:56, 5 September 2024
  • BioGRID
    The Biological General Repository for Interaction Datasets (BioGRID) is a curated biological database of protein-protein interactions, genetic interactions...
    19 KB (1,911 words) - 21:45, 26 August 2024
  • In health informatics, a national minimum dataset is a database of health encounters held by a central repository. "Minimum" implies that the data fields...
    1 KB (92 words) - 17:23, 20 August 2023
  • articles and then request a dataset containing word and n-gram frequencies and basic metadata. They are notified when the dataset is ready and may download...
    33 KB (2,992 words) - 03:41, 9 September 2024
  • Chinchilla-optimal dataset for Llama 3 8B is 200 billion tokens, but performance continued to scale log-linearly to the 75-times larger dataset of 15 trillion...
    35 KB (3,612 words) - 04:02, 29 September 2024