Hadoop Components

File System
  • HDFS - Hadoop Distributed File System (primary data storage system)
  • Amazon S3
  • Azure storage (for example, Azure Blob Storage)
Execution Engine
  • MapReduce - available since Hadoop version 1
  • Apache Spark - runs on Hadoop since version 2, via YARN (available as part of both the Hortonworks and Cloudera distributions)
  • Apache Tez - runs on Hadoop since version 2, via YARN (available as part of the Hortonworks distribution but not Cloudera)
Ecosystem Tools
  • Hive - Data warehousing tool on top of HDFS (used primarily for structured data and, to some extent, for semi-structured data)
  • Pig - Data-flow scripting tool used for ETL, where the source is another system and the target is HDFS (used for structured, semi-structured and unstructured data)
  • Sqoop - Tool for importing/exporting data between Hadoop and relational databases such as Oracle (used for structured data)
  • Oozie - Tool for defining workflows and for scheduling and monitoring jobs
  • ZooKeeper - Coordination service for distributed applications (configuration management, naming, synchronization)
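
To make the execution engine layer a little more concrete, here is a minimal sketch of the MapReduce programming model as a word count in plain Python. It does not use Hadoop or any Hadoop API. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample lines are made up for illustration, and everything runs in one process, whereas a real cluster would spread the map and reduce tasks across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key (the framework does this
    between the map and reduce steps on a real cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum all the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines of a file on HDFS.
    sample = ["Hive runs on HDFS", "Pig loads data into HDFS"]
    print(word_count(sample))
```

The three functions mirror the three stages of a MapReduce job: the mapper sees one record at a time, the shuffle groups values by key, and the reducer sees one key with all of its values.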
