  • Apache Hadoop (/həˈduːp/) is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving...
    49 KB (5,093 words) - 03:30, 18 May 2024
  • MapReduce (redirect from Hadoop map)
    implementation that has support for distributed shuffles is part of Apache Hadoop. The name MapReduce originally referred to the proprietary Google technology...
    46 KB (5,491 words) - 21:02, 10 May 2024
  • Apache Parquet (category Hadoop)
    storage format in the Apache Hadoop ecosystem. It is similar to RCFile and ORC, the other columnar-storage file formats in Hadoop, and is compatible with most...
    10 KB (851 words) - 09:27, 22 June 2024
  • Apache ZooKeeper (category Hadoop)
    large distributed systems (see Use cases). ZooKeeper was a sub-project of Hadoop but is now a top-level Apache project in its own right. ZooKeeper's architecture...
    8 KB (714 words) - 15:45, 24 October 2023
  • the benefits of dimensional models on Hadoop and similar big data frameworks. However, some features of Hadoop require us to slightly adapt the standard...
    13 KB (1,656 words) - 19:36, 17 January 2024
  • Data lake
    enterprises were "starting to extract and place data for analytics into a single, Hadoop-based repository." Many companies use cloud storage services such as Google...
    9 KB (1,047 words) - 20:26, 23 July 2024
  • Hue (software) (redirect from Hue (Hadoop))
    Hue (Hadoop User Experience) is an open-source SQL Cloud Editor, licensed under the Apache License 2.0. Hue is an open-source SQL Assistant for querying...
    2 KB (119 words) - 17:42, 17 May 2023
  • Apache Hive (category Hadoop)
    Hive is a data warehouse software project. It is built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface...
    21 KB (2,300 words) - 14:27, 2 July 2024
  • Apache Avro
    procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes...
    13 KB (1,326 words) - 18:53, 24 April 2024
  • Apache Spark (category Hadoop)
    testing), Hadoop YARN, Apache Mesos or Kubernetes. For distributed storage, Spark can interface with a wide variety, including Alluxio, Hadoop Distributed...
    30 KB (2,732 words) - 13:14, 8 July 2024
  • Sqoop (category Hadoop)
    interface application for transferring data between relational databases and Hadoop. The Apache Sqoop project was retired in June 2021 and moved to the Apache...
    6 KB (439 words) - 19:04, 17 July 2024
  • learning algorithms implemented on Hadoop; Apache Cassandra - a column-oriented database that supports access from Hadoop; HPCC - LexisNexis Risk Solutions...
    30 KB (1,051 words) - 10:14, 17 August 2024
  • Apache Hadoop is an open source software project sponsored by The Apache Software Foundation which implements the MapReduce architecture. Hadoop now encompasses...
    25 KB (3,169 words) - 10:23, 12 December 2023
  • Actian Vector
    processing version of Vector, in Hadoop with storage in HDFS. Actian Vortex was later renamed to Actian Vector in Hadoop. The basic architecture and design...
    22 KB (1,962 words) - 14:26, 11 August 2024
  • Open source
    which supports community projects such as the open-source framework Apache Hadoop and the open-source HTTP server Apache HTTP. The sharing of technical information...
    105 KB (11,853 words) - 22:58, 12 August 2024
  • Cloudera (category Hadoop)
    in 2009 by Doug Cutting, a co-founder of Hadoop. Cloudera originally offered a free product based on Hadoop, earning revenue by selling support and consulting...
    14 KB (1,070 words) - 07:11, 7 August 2024
  • Apache Nutch
    project. Nutch originated with Doug Cutting, creator of both Lucene and Hadoop, and Mike Cafarella. In June 2003, a successful 100-million-page demonstration...
    13 KB (625 words) - 00:06, 12 June 2024
  • Doug Cutting
    manages both projects. Cutting and Cafarella were also co-founders of Apache Hadoop. Cutting graduated from Stanford University in 1985 with a bachelor's degree...
    8 KB (686 words) - 15:33, 27 July 2024
  • big data: Alpine Data Labs, an analytics interface working with Apache Hadoop and big data; Azure Data Lake is a highly scalable data storage and analytics...
    3 KB (329 words) - 13:38, 3 June 2024
  • Ali Ghodsi
    resource management and scheduling design in distributed systems such as Hadoop. In 2013, he co-founded Databricks, a company that commercializes Spark...
    5 KB (353 words) - 19:57, 25 July 2024
  • Python-based open source implementation of a software forge; Ambari: makes Hadoop cluster provisioning, managing, and monitoring dead simple; Ant: Java-based...
    41 KB (4,631 words) - 23:08, 17 August 2024
  • Lambda architecture
    updates completely replacing existing precomputed views. By 2014, Apache Hadoop was estimated to be a leading batch-processing system. Later, other, relational...
    11 KB (1,171 words) - 08:34, 1 August 2024
  • Apache Solr
    as content management systems and enterprise content management systems. Hadoop distributions from Cloudera, Hortonworks and MapR all bundle Solr as the...
    15 KB (1,448 words) - 21:14, 17 June 2024
  • Cascading (software) (category Hadoop)
    abstraction layer for Apache Hadoop and Apache Flink. Cascading is used to create and execute complex data processing workflows on a Hadoop cluster using any JVM-based...
    10 KB (776 words) - 19:08, 23 June 2023
  • heterogeneous cluster, disaster recovery, security, DMAPI, HSM and ILM. Hadoop's HDFS filesystem is designed to store similar or greater quantities of...
    15 KB (1,677 words) - 01:50, 8 June 2024
  • Apache HBase (category Hadoop)
    Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System) or Alluxio, providing Bigtable-like capabilities for Hadoop. That is...
    10 KB (818 words) - 02:06, 12 April 2024
  • works with Apache Hadoop and other distributed file systems and Revolution Analytics has partnered with IBM to further integrate Hadoop into Revolution...
    18 KB (1,625 words) - 08:18, 5 August 2024
  • implementation called Hadoop used by Yahoo, Facebook, and others and the HPCC system architecture offered by LexisNexis Risk Solutions. Hadoop is an open source...
    12 KB (1,453 words) - 03:24, 31 July 2024
  • JNBridge
    System for Hadoop; Build an Excel add-in for HBase MapReduce; Build a LINQ provider for HBase MapReduce; Create .NET-based MapReducers for Hadoop; Using a Java...
    10 KB (789 words) - 10:03, 23 August 2022
  • Apache Impala (category Hadoop)
    (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. Impala has been described as the open-source equivalent of Google F1, which...
    7 KB (577 words) - 15:30, 5 July 2024