Hadoop Components
Components of Hadoop
File System
- HDFS (Primary Data storage system)
- Amazon S3
- Azure Blob Storage
Processing Engines
- MapReduce - available since Hadoop version 1
- Apache Spark - available since Hadoop version 2 (included in both the Hortonworks and Cloudera distributions)
- Apache Tez - available since Hadoop version 2 (included in the Hortonworks distribution but not in Cloudera's)
Tools
- Hive - Data warehousing tool on top of HDFS (used primarily for structured data and, to some extent, semi-structured data)
- Pig - ETL tool that loads data from other systems, transforms it, and stores it on HDFS (used for structured, semi-structured, and unstructured data)
- Sqoop - Tool for importing and exporting data between Hadoop and relational databases such as Oracle (used for structured data)
- Oozie - Tool for defining workflows and for scheduling and monitoring jobs
- ZooKeeper - Coordination service for distributed configuration management
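To make the MapReduce entry in the list above a little more concrete, here is a minimal sketch of the map, shuffle, and reduce phases as a word count in plain Python. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample lines are made up for illustration, and everything runs in a single process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence (hypothetical helper, in-memory only)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; a real job would read its input from HDFS.
    sample = ["Hive runs on HDFS", "Pig runs on HDFS"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel on different nodes and the framework handles the shuffle; the sketch only shows how the three phases fit together.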