• Apache Hadoop (/həˈduːp/) is a collection of open-source software utilities for reliable, scalable, distributed computing. It provides a software framework...
    49 KB (5,050 words) - 11:55, 23 January 2025
  • Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem. It is similar to RCFile and ORC, the other...
    13 KB (1,068 words) - 16:52, 3 April 2025
  • Apache Hive
    Apache Hive is a data warehouse software project. It is built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface...
    21 KB (2,300 words) - 01:15, 14 March 2025
  • Apache Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. Impala...
    6 KB (555 words) - 13:30, 13 April 2025
  • Apache Avro
    remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes...
    13 KB (1,326 words) - 05:49, 25 February 2025
  • MapReduce (redirect from Hadoop map)
    implementation that has support for distributed shuffles is part of Apache Hadoop. The name MapReduce originally referred to the proprietary Google technology...
    46 KB (5,480 words) - 18:47, 12 December 2024
  • Apache ZooKeeper
    Apache Hadoop, Apache Accumulo, Apache HBase, Apache Hive, Apache Kafka, Apache Drill, Apache Solr, Apache Spark, Apache NiFi, Apache Druid, Apache Helix, Apache Pinot...
    8 KB (718 words) - 22:38, 17 November 2024
  • Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System) or Alluxio...
    10 KB (833 words) - 12:42, 11 December 2024
  • past, many of the implementations use the Apache Hadoop platform, however today it is primarily focused on Apache Spark. Mahout also provides Java/Scala...
    8 KB (648 words) - 21:43, 7 July 2024
  • Cascading (software) (category Hadoop)
    abstraction layer for Apache Hadoop and Apache Flink. Cascading is used to create and execute complex data processing workflows on a Hadoop cluster using any...
    10 KB (776 words) - 19:08, 23 June 2023
  • Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute...
    11 KB (979 words) - 18:51, 15 July 2022
  • Apache Spark
    applications may be reduced by several orders of magnitude compared to Apache Hadoop MapReduce implementation. Among the class of iterative algorithms are...
    30 KB (2,752 words) - 16:06, 2 March 2025
  • platforms such as Apache Spark; Beam, an uber-API for big data; Bigtop: a project for the development of packaging and tests of the Apache Hadoop ecosystem; Bloodhound:...
    41 KB (4,631 words) - 16:59, 13 March 2025
  • Apache ORC
    Hadoop ecosystem such as RCFile and Parquet. It is used by most of the data processing frameworks Apache Spark, Apache Hive, Apache Flink, and Apache...
    4 KB (246 words) - 11:36, 21 August 2024
  • Apache Drill
    include: All Hadoop distributions (HDFS API 2.3+), including Apache Hadoop, MapR, CDH and Amazon EMR; NoSQL: MongoDB, Apache HBase, Apache Cassandra; Online...
    7 KB (700 words) - 15:30, 5 July 2024
  • Hortonworks (category Hadoop)
    Platform (HDP): based on Apache Hadoop, Apache Hive, Apache Spark; Hortonworks DataFlow (HDF): based on Apache NiFi, Apache Storm, Apache Kafka; Hortonworks DataPlane...
    7 KB (512 words) - 21:42, 17 January 2025
  • Doug Cutting
    Mike Cafarella. The Apache Software Foundation now manages both projects. Cutting and Cafarella were also co-founders of Apache Hadoop. Cutting graduated...
    8 KB (686 words) - 15:33, 27 July 2024
  • Apache Nutch
    have been spun out into their own subproject, called Hadoop. In January, 2005, Nutch joined the Apache Incubator, from which it graduated to become a subproject...
    13 KB (625 words) - 20:19, 5 January 2025
  • XGBoost (category Software using the Apache license)
    machine, as well as the distributed processing frameworks Apache Hadoop, Apache Spark, Apache Flink, and Dask. XGBoost gained much popularity and attention...
    14 KB (1,319 words) - 06:46, 25 March 2025
  • Apache Accumulo is a highly scalable sorted, distributed key-value store based on Google's Bigtable. It is a system built on top of Apache Hadoop, Apache...
    6 KB (575 words) - 06:58, 18 November 2024
  • Apache Phoenix is an open source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store. Phoenix...
    5 KB (306 words) - 13:47, 12 November 2024
  • Apache Oozie is a server-based workflow scheduling system to manage Hadoop jobs. Workflows in Oozie are defined as a collection of control flow and action...
    3 KB (204 words) - 20:30, 27 March 2023
  • Sqoop (redirect from Apache Sqoop)
    between relational databases and Hadoop. The Apache Sqoop project was retired in June 2021 and moved to the Apache Attic. Sqoop supports incremental...
    6 KB (439 words) - 19:04, 17 July 2024
  • Lambda architecture
    data warehouse, Yahoo has taken a similar approach, also using Apache Storm, Apache Hadoop, and Druid. The Netflix Suro project has separate processing...
    10 KB (1,145 words) - 02:33, 11 February 2025
  • sequence. Apache Hadoop is an open source software project sponsored by The Apache Software Foundation which implements the MapReduce architecture. Hadoop now...
    25 KB (3,139 words) - 02:30, 22 December 2024
  • term big data: Alpine Data Labs, an analytics interface working with Apache Hadoop and big data; AvocaData, a two-sided marketplace allowing consumers to...
    3 KB (334 words) - 16:29, 7 February 2025
  • sub-project of Hadoop, it became an Apache Software Foundation top level project in 2012. It was created by Edward J. Yoon, who named it (short for "Hadoop Matrix...
    6 KB (589 words) - 04:39, 6 January 2024
  • Oracle NoSQL Database
    from OND natively into Hadoop MapReduce jobs. One use for this class is to read NoSQL database records into Oracle Loader for Hadoop. Oracle Big Data SQL...
    19 KB (2,000 words) - 04:31, 5 April 2025
  • integration: HBase and Rcfile__HadoopSummit2010". 2010-06-30. "Facebook has the world's largest Hadoop cluster!". 2010-05-09. "Apache Hadoop India Summit 2011 talk...
    12 KB (1,445 words) - 17:50, 2 August 2024
  • Cloudera (category Hadoop)
    Hadoop Development". The New York Times. VentureBeat. October 27, 2010. Rao, Leena (7 November 2011). "Ignition, Accel, Greylock Put $40M In Apache Hadoop...
    15 KB (1,093 words) - 23:18, 17 March 2025