Our Big Data and Hadoop Developer certification training program is designed to give you expertise in building powerful data processing applications with Hadoop. Through lectures and hands-on exercises, you will learn to install, configure, maintain, and scale a Hadoop 2.0 environment.

The final project assignment, towards the end of the course, will give you a solid understanding of Big Data structuring as well as the Hadoop deployment lifecycle for multi-node clusters.

Course Highlights

  • Blended learning with instructor-led online classroom sessions and online self-learning
  • Hands-on lab exercises
  • Industry-specific projects
  • Chapter quizzes
  • Big Data & Hadoop simulation exams
  • Downloadable e-book included
  • Java essentials for Hadoop included
  • Hadoop installation procedure included
  • Hands-on Hadoop training certification
  • Hadoop deployment and maintenance tips
  • Packed with the latest and advanced modules: YARN, Flume, Oozie, Mahout & Chukwa

The ezPrep Approach

By the end of this program, participants will have learned to:

  • Master the concepts of the Hadoop Distributed File System (HDFS) and the MapReduce framework
  • Set up a Hadoop cluster
  • Understand data loading techniques using Sqoop and Flume
  • Program in MapReduce (both MRv1 and MRv2)
  • Write complex MapReduce programs
  • Program in YARN (MRv2)
  • Perform data analytics using Pig and Hive
  • Implement HBase, MapReduce integration, advanced usage, and advanced indexing
  • Gain a good understanding of the ZooKeeper service
  • Understand the new features in Hadoop 2.0: YARN, HDFS Federation, and NameNode High Availability
  • Implement best practices for Hadoop development and debugging

Online Self Learning


Introduction to Big Data & Hadoop

  • What is Big Data?
  • The history and rise of Big Data
  • Why did Big Data suddenly become so prominent?
  • Limitations of traditional large-scale systems
  • The main vendors in the space: Cloudera, Hortonworks
  • Introduction to Hadoop
  • History of Hadoop
  • Companies using Hadoop

Hadoop Architecture / Introduction to HDFS

  • Understanding Hadoop Master-Slave Architecture
  • Understanding HDFS and MapReduce framework
  • Regular file system vs HDFS
  • Learn about the NameNode, DataNode, and Secondary NameNode
  • Learn about the JobTracker and TaskTracker
  • Understand how data is written and read from HDFS
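
The write and read paths can be exercised directly from the command line. A few basic HDFS shell commands, assuming a running Hadoop installation (paths and file names are illustrative):

```shell
hdfs dfs -mkdir -p /user/student/input           # create a directory in HDFS
hdfs dfs -put access.log /user/student/input     # write: client streams blocks to DataNodes
hdfs dfs -ls /user/student/input                 # list files; NameNode serves the metadata
hdfs dfs -cat /user/student/input/access.log     # read: blocks fetched from DataNodes
```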

Installing and setting up a Hadoop Cluster

  • Understand the important configuration files in a Hadoop Cluster
  • Deploy the Cloudera Hadoop distribution in a VM player
  • Run HDFS and Linux commands
  • Execute some examples to get a high level understanding
  • Hadoop deployment modes: single-node and multi-node
  • Learn how to setup and deploy a multinode Hadoop Cluster on AWS
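
Two of the configuration files covered here are core-site.xml and hdfs-site.xml. A minimal single-node sketch (the hostname, port, and replication factor are illustrative; a multi-node cluster would point fs.defaultFS at the NameNode host and raise the replication factor):

```xml
<!-- core-site.xml: where clients find the filesystem -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: replication of 1 is typical for a single-node lab VM -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```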

Understanding Hadoop MapReduce Framework

  • Overview of the MapReduce Framework
  • Understand the concept of Mappers, Reducers, Partitioners, Combiners
  • Understand different Input Formats
  • Understand different Output Formats
  • Custom Data Types
  • Writing MapReduce Mappers and Reducers in Java using Eclipse
  • Using the Writable interface
  • The JUnit and MRUnit testing frameworks
  • Writing and running unit tests
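
The mapper/reducer flow above can be sketched in plain Java as a word count. This is a conceptual simulation only: the class and method names are illustrative, and a real Hadoop job would instead extend the framework's Mapper and Reducer classes.

```java
import java.util.*;

// Plain-Java sketch of the MapReduce word-count flow (conceptual only;
// a real job would use the org.apache.hadoop.mapreduce API).
public class WordCountSketch {

    // "Map" phase: emit a (word, 1) pair for each word in an input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // "Shuffle" groups pairs by key; "reduce" sums each group's values.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();   // shuffle/sort
        for (String line : lines) {
            for (Map.Entry<String, Integer> p : map(line)) {
                grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();          // reduce
        grouped.forEach((word, ones) ->
            counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("Hadoop stores data", "Hadoop processes data")));
        // {data=2, hadoop=2, processes=1, stores=1}
    }
}
```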


Apache Pig

  • Introduction to Pig
  • Setting up and running Pig
  • Grunt, the Pig interactive shell
  • Pig Latin
  • Writing Pig Latin scripts
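
To give a flavor of Pig Latin, here is a hypothetical word-count script (the input path and field names are illustrative):

```pig
-- load lines, split into words, group, and count
lines   = LOAD 'input/notes.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
DUMP counts;
```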

Cloudera Impala

  • Introduction to Impala
  • Installing and using Impala
  • Create table using Impala
  • Query the Impala table
  • Impala SQL language reference
  • Impala shell commands
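
For example, a table could be created and queried from impala-shell like this (table and column names are illustrative):

```sql
-- illustrative schema; Parquet is a common storage format for Impala
CREATE TABLE page_views (url STRING, user_id BIGINT, ts TIMESTAMP)
  STORED AS PARQUET;

SELECT url, COUNT(*) AS hits
FROM page_views
GROUP BY url
ORDER BY hits DESC
LIMIT 10;
```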

Hive and HiveQL

  • Understand the Hive architecture
  • Why another data warehousing system is needed
  • Installing, configuring, and running Hive
  • HiveQL - Importing data, sorting and aggregating, joins, map joins
  • Writing join queries and inserting data back into Hive
  • Understand how queries are converted into MapReduce jobs
  • Hive Tables and storage formats
  • UDF and UDAF
  • Choosing between Pig, Hive, and Impala
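
A taste of HiveQL covering table creation, loading, and a join (table names and paths are illustrative, and a customers table is assumed to already exist):

```sql
-- illustrative schema for delimited text data
CREATE TABLE orders (order_id INT, customer_id INT, amount DOUBLE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA INPATH '/user/student/orders.csv' INTO TABLE orders;

-- a join like this is compiled into one or more MapReduce jobs
SELECT c.name, SUM(o.amount) AS total
FROM orders o JOIN customers c ON (o.customer_id = c.customer_id)
GROUP BY c.name;
```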


Apache ZooKeeper

  • Overview of ZooKeeper
  • Uses of ZooKeeper
  • The ZooKeeper service
  • The ZooKeeper data model
  • Building applications with ZooKeeper
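
The ZooKeeper data model is a tree of znodes, which is easiest to explore from the zkCli.sh shell (paths and data values are illustrative):

```shell
create /app1 "config"        # create a znode holding some data
get /app1                    # read the data back
create -e /app1/worker1 ""   # ephemeral znode: removed when the session ends
ls /app1                     # list child znodes
```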


Apache Sqoop

  • Overview of Sqoop
  • Where Sqoop is used: importing and exporting structured data
  • Using Sqoop to import data from RDBMS into HDFS
  • Using Sqoop to import data from RDBMS into Hive
  • Using Sqoop to import data from RDBMS into HBase
  • Using Sqoop to export data from HDFS into an RDBMS
  • Sqoop connectors
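
A typical Sqoop import from an RDBMS into HDFS looks like this (the JDBC URL, credentials, table name, and paths are placeholders):

```shell
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username student --password-file /user/student/.pw \
  --table orders \
  --target-dir /user/student/orders \
  --num-mappers 4
```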


Apache Flume

  • Overview of Flume
  • Where Flume is used: collecting and loading unstructured data
  • Using Flume to load data into HDFS
  • Using Flume to load data into HBase
  • Using Flume to load data into Hive
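
Flume agents are wired together in a properties file. A sketch of an agent that tails a log file into HDFS (the agent, source, channel, sink names, and paths are illustrative):

```properties
# one source, one in-memory channel, one HDFS sink
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app.log
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = memory

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:9000/flume/events
agent1.sinks.sink1.channel = ch1
```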


Apache HBase

  • Introduction to HBase
  • Why use HBase?
  • HBase Architecture - read and write paths
  • HBase vs RDBMS
  • Installation and configuration
  • Schema design in HBase: column families, hotspotting
  • Accessing data with the HBase API: reading, adding, and updating data from the shell and the Java API
  • Scans and the advanced API
  • Using ZooKeeper with HBase
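
Basic reads and writes from the HBase shell look like this (the table, column family, and row key names are illustrative):

```shell
create 'users', 'profile'                    # table with one column family
put 'users', 'row1', 'profile:name', 'Ada'   # write a cell
get 'users', 'row1'                          # read a row
scan 'users', {LIMIT => 10}                  # scan the first 10 rows
```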

Cassandra and MongoDB

  • Introduction to NoSQL databases
  • Advantages of NoSQL over traditional RDBMS
  • Introduction to Apache Cassandra
  • Overview of Cassandra - data model, reading/writing data, CQL
  • Introduction to MongoDB
  • MongoDB vs Cassandra
  • Introduction to Mahout

Apache Oozie

  • Introduction to Oozie
  • Oozie workflow jobs
  • Oozie coordinator jobs
  • Creating Oozie Workflows
  • Using HUE UI for Oozie
  • Using CLI to run and track workflows
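
Oozie workflows are defined in XML. A minimal sketch chaining a single Sqoop action (the application name, node names, and parameters are illustrative):

```xml
<workflow-app name="etl-demo" xmlns="uri:oozie:workflow:0.5">
  <start to="import-step"/>
  <action name="import-step">
    <sqoop xmlns="uri:oozie:sqoop-action:0.4">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --connect ${dbUrl} --table orders --target-dir ${outDir}</command>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Sqoop import failed</message></kill>
  <end name="end"/>
</workflow-app>
```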

Hadoop 2.0, YARN, MRv2

  • Understand new features in Hadoop 2.0
  • Learn advanced Hadoop concepts
  • Introduction to YARN
  • YARN architecture
  • Upgrading MRv1 to MRv2
  • Developing applications using MapReduce version 2
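
The MRv1-to-MRv2 switch is, at its core, a configuration change: MapReduce jobs are submitted to YARN instead of the MRv1 JobTracker. The key property lives in mapred-site.xml:

```xml
<!-- mapred-site.xml: run MapReduce jobs on YARN instead of the MRv1 JobTracker -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```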

ezCourse Kit

  • Blended learning with instructor-led online classroom sessions and online self-learning
  • Course completion certificate to all the participants
  • Project on Big Data and Hadoop development
  • Downloadable e-book for future references
  • Big Data and Hadoop simulation papers
  • Java essentials for Hadoop included

Who is this Course for?

This course is designed for:

  • Software professionals
  • Analytics professionals
  • ETL developers
  • Project managers
  • Testing professionals
  • Other professionals looking to acquire a solid foundation in Hadoop architecture

1. Who are the Instructors?

All our instructors are working professionals and top industry experts in Big Data and Hadoop development, with real-world Big Data and Hadoop experience.

2. How are the practical exercises done?

Practical work is done through hands-on lab exercises. You deploy the Cloudera Hadoop distribution in a VM player, run HDFS and Linux commands, and set up and deploy a multi-node Hadoop cluster on AWS.