Hadoop Administrator Course Outline
Audience - Oracle Database Administrators, SQL DBAs, Windows Administrators, Unix Administrators, and Network Administrators
Module 1
Getting Started with Big Data
|
What is Big Data?
What is Apache Hadoop?
History of Hadoop
Understanding distributed file systems and Hadoop
Hadoop ecosystem components
Hadoop use cases
Ubuntu Installation
JDK Installation
|
Module 2
Hadoop Distributed File System
|
Eclipse Installation
Overview of HDFS
Communication Protocols
Hadoop cluster Topology Overview
Setting up SSH for Hadoop Cluster
Running Hadoop –
1. Pseudo-distributed mode
Linux basic commands
HDFS file commands
Reading and writing to HDFS programmatically
Hands-on Lab Exercises
|
Module 3
MapReduce Framework
|
Java Basics
Anatomy of a MapReduce Program
Writables
InputFormat
OutputFormat
Streaming API
Inherent failure handling
Reading and writing
Hands-on Lab Exercises
|
Module 4
Advanced MapReduce Programming
|
Input splits, Record Reader, Mapper, Partition & Shuffle, Reduce, OutputFormat
Writing a MapReduce program
Streaming in Hadoop
Counters
Performance Tuning
Joins
Sorting
Determining the optimal number of reducers and partitions
Hadoop cluster – Performance tuning
Hands-on Lab Exercises
|
Module 5 - Apache Hadoop Administration
|
Level 1
|
Operating System Preparation
Deployment Setup
Software
Hostname, DNS, and Identification
Users, Groups and Privileges
Kernel Tuning
vm.overcommit_memory
vm.swappiness
Best Practices for Hadoop setup and infrastructure
Hadoop cluster Installation preparation & Configuration
- Cluster network design
- Installation of Linux operating system
- Configuring SSH
- Walkthrough on rack topology and setup
Managing Hadoop cluster
- HDFS cluster management
- Secondary NameNode configuration
- TaskTracker management
- Configuring the HDFS quota
- Configuring Fair Scheduler
- Upgrading Hadoop
- Deploying and managing Hadoop clusters with Ambari
Monitoring Hadoop cluster
- Monitoring Hadoop cluster with Ganglia
- Monitoring Hadoop cluster with Ambari
- Monitoring Hadoop cluster with Nagios
Hadoop Cluster Performance Tuning
- Benchmarking and profiling
- Using compression for input and output
- Configuring optimal map and reduce slots for the TaskTracker
- Fine-tuning JobTracker configuration
- Fine-tuning TaskTracker configuration
- Tuning shuffle, merge and sort parameters
Security Implementation
Kerberos security implementation
Workflow Scheduler
FIFO Scheduler Configuration
Capacity Scheduler Configuration
Fair Scheduler Configuration
dfsadmin & mradmin commands
Administration of HCatalog and Hive
Backup and Recovery
|
Level 2 Cluster maintenance
|
Starting and stopping processes with init scripts
Starting and stopping processes manually
HDFS maintenance tasks
- DataNode failure and recovery
- NameNode failure and recovery
- JobTracker and TaskTracker failure and recovery
- Removing DataNodes
- Adding DataNodes
- Commissioning and decommissioning of nodes
MapReduce maintenance tasks
- Shared upon request
|
Level 3 Monitoring
|
Hadoop Metrics
Health-check
Hadoop Processes
Remaining topics shared upon request
|
Level 4 Backup and Recovery
|
Data Backup
NameNode backup
|
Module 6
Pig and Pig Latin
|
Installation and configuration
Running Pig Latin through Grunt
Working with Scripts
Lab Exercises
|
Module 7
HBase and ZooKeeper
|
NoSQL vs. SQL
CAP theorem
Architecture
Installation
Configuration
Java API
Performance Tuning
Lab Exercises
|
Module 8
Hive
|
Features of Hive
Architecture
Installation and configuration
HiveQL
Lab Exercises
|
Module 9
Other Hadoop ecosystem components
|
Overview of Ambari, Oozie, and Mahout
Installing & configuring Sqoop, mysql-server
Installing & configuring Flume
Lab Exercises
|
Module 10
Hadoop on Cloud
|
Hosting Hadoop on Amazon EC2
EMR Hands-on
|