Apache Hadoop Search Results

Apache Hadoop

Apache Hadoop (/həˈduːp/) is a collection of open-source software utilities for reliable, scalable, distributed computing. It provides a software framework...

49 KB (5,050 words) - 11:55, 23 January 2025

Apache Parquet

Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem. It is similar to RCFile and ORC, the other...

13 KB (1,068 words) - 16:52, 3 April 2025

Apache Hive

Apache Hive is a data warehouse software project. It is built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface...

21 KB (2,300 words) - 01:15, 14 March 2025

Apache Impala

Apache Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. Impala...

6 KB (555 words) - 13:30, 13 April 2025

Apache Avro

remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes...

13 KB (1,326 words) - 05:49, 25 February 2025

MapReduce (redirect from Hadoop map)

implementation that has support for distributed shuffles is part of Apache Hadoop. The name MapReduce originally referred to the proprietary Google technology...

46 KB (5,480 words) - 18:47, 12 December 2024

Apache ZooKeeper

Apache Hadoop, Apache Accumulo, Apache HBase, Apache Hive, Apache Kafka, Apache Drill, Apache Solr, Apache Spark, Apache NiFi, Apache Druid, Apache Helix, Apache Pinot...

8 KB (718 words) - 22:38, 17 November 2024

Apache HBase

Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System) or Alluxio...

10 KB (833 words) - 12:42, 11 December 2024

Apache Mahout

past, many of the implementations use the Apache Hadoop platform; however, today it is primarily focused on Apache Spark. Mahout also provides Java/Scala...

8 KB (648 words) - 21:43, 7 July 2024

Cascading (software) (category Hadoop)

abstraction layer for Apache Hadoop and Apache Flink. Cascading is used to create and execute complex data processing workflows on a Hadoop cluster using any...

10 KB (776 words) - 19:08, 23 June 2023

Apache Pig

Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute...

11 KB (979 words) - 18:51, 15 July 2022

Apache Spark

applications may be reduced by several orders of magnitude compared to Apache Hadoop MapReduce implementation. Among the class of iterative algorithms are...

30 KB (2,752 words) - 16:06, 2 March 2025

List of Apache Software Foundation projects

platforms such as Apache Spark; Beam, an uber-API for big data; Bigtop: a project for the development of packaging and tests of the Apache Hadoop ecosystem; Bloodhound:...

41 KB (4,631 words) - 16:59, 13 March 2025

Apache ORC

Hadoop ecosystem such as RCFile and Parquet. It is used by most of the data processing frameworks Apache Spark, Apache Hive, Apache Flink, and Apache...

4 KB (246 words) - 11:36, 21 August 2024

Apache Drill

include: All Hadoop distributions (HDFS API 2.3+), including Apache Hadoop, MapR, CDH and Amazon EMR; NoSQL: MongoDB, Apache HBase, Apache Cassandra; Online...

7 KB (700 words) - 15:30, 5 July 2024

Hortonworks (category Hadoop)

Platform (HDP): based on Apache Hadoop, Apache Hive, Apache Spark; Hortonworks DataFlow (HDF): based on Apache NiFi, Apache Storm, Apache Kafka; Hortonworks DataPlane...

7 KB (512 words) - 21:42, 17 January 2025

Doug Cutting

Mike Cafarella. The Apache Software Foundation now manages both projects. Cutting and Cafarella were also co-founders of Apache Hadoop. Cutting graduated...

8 KB (686 words) - 15:33, 27 July 2024

Apache Nutch

have been spun out into their own subproject, called Hadoop. In January 2005, Nutch joined the Apache Incubator, from which it graduated to become a subproject...

13 KB (625 words) - 20:19, 5 January 2025

XGBoost (category Software using the Apache license)

machine, as well as the distributed processing frameworks Apache Hadoop, Apache Spark, Apache Flink, and Dask. XGBoost gained much popularity and attention...

14 KB (1,319 words) - 06:46, 25 March 2025

Apache Accumulo

Apache Accumulo is a highly scalable sorted, distributed key-value store based on Google's Bigtable. It is a system built on top of Apache Hadoop, Apache...

6 KB (575 words) - 06:58, 18 November 2024

Apache Phoenix

Apache Phoenix is an open source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store. Phoenix...

5 KB (306 words) - 13:47, 12 November 2024

Apache Oozie

Apache Oozie is a server-based workflow scheduling system to manage Hadoop jobs. Workflows in Oozie are defined as a collection of control flow and action...

3 KB (204 words) - 20:30, 27 March 2023

Sqoop (redirect from Apache Sqoop)

between relational databases and Hadoop. The Apache Sqoop project was retired in June 2021 and moved to the Apache Attic. Sqoop supports incremental...

6 KB (439 words) - 19:04, 17 July 2024

Lambda architecture

data warehouse, Yahoo has taken a similar approach, also using Apache Storm, Apache Hadoop, and Druid. The Netflix Suro project has separate processing...

10 KB (1,145 words) - 02:33, 11 February 2025

Data-intensive computing (section Hadoop)

sequence. Apache Hadoop is an open source software project sponsored by The Apache Software Foundation which implements the MapReduce architecture. Hadoop now...

25 KB (3,139 words) - 02:30, 22 December 2024

List of big data companies

term big data: Alpine Data Labs, an analytics interface working with Apache Hadoop and big data; AvocaData, a two-sided marketplace allowing consumers to...

3 KB (334 words) - 16:29, 7 February 2025

Apache Hama

sub-project of Hadoop, it became an Apache Software Foundation top level project in 2012. It was created by Edward J. Yoon, who named it (short for "Hadoop Matrix...

6 KB (589 words) - 04:39, 6 January 2024

Oracle NoSQL Database (section Apache Hadoop)

from OND natively into Hadoop MapReduce jobs. One use for this class is to read NoSQL database records into Oracle Loader for Hadoop. Oracle Big Data SQL...

19 KB (2,000 words) - 04:31, 5 April 2025

RCFile

integration: HBase and Rcfile__HadoopSummit2010". 2010-06-30. "Facebook has the world's largest Hadoop cluster!". 2010-05-09. "Apache Hadoop India Summit 2011 talk...

12 KB (1,445 words) - 17:50, 2 August 2024

Cloudera (category Hadoop)

Hadoop Development". The New York Times. VentureBeat. October 27, 2010. Rao, Leena (7 November 2011). "Ignition, Accel, Greylock Put $40M In Apache Hadoop...

15 KB (1,093 words) - 23:18, 17 March 2025