Big Data Courses Syllabus

Posted by Manoj Singh Rathore

Hadoop is an open-source software framework that companies use to store and process big data. It is built on the Java platform.

Big Data Course Content

Introduction to Big Data & Hadoop
  • Big Data & Hadoop Introduction
  • What is Hadoop?
  • Who uses Hadoop, and why?
  • What is the history of Hadoop?
  • What are the different components of Hadoop?
  • Detailed information on HDFS, MapReduce, PIG, Hive, SQOOP, HBASE, OOZIE, Flume, Zookeeper and so on…
  • What is the scope of Hadoop in industry?
Deep Dive into HDFS (for Storing the Data)
  • HDFS Introduction
  • Design of HDFS
  • Role of HDFS in Hadoop
  • HDFS Features
  • Introduction to Hadoop Daemons and their functionality
    • Name Node
    • Secondary Name Node
    • Job Tracker
    • Data Node
    • Task Tracker
  • Anatomy of File Write
  • Anatomy of File Read
  • Network Topology
    • Nodes
    • Racks
    • Data Center
  • Parallel Copying using DistCp
  • Basic Configuration for HDFS
  • Data Organization
    • Blocks
    • Replication
  • Heartbeat Signal
  • How to Store the Data into HDFS
  • How to Read the Data from HDFS
  • Accessing HDFS (Introduction to Basic UNIX commands)
  • CLI commands
MapReduce using Java (Processing the Data)
  • Introduction to MapReduce
  • MapReduce Architecture
  • Data flow in MapReduce
  • Splits
  • Mapper
  • Partitioning
  • Sort and shuffle
  • Combiner
  • Reducer
  • Understanding the Difference Between Block and InputSplit
  • Role of RecordReader
  • Basic Configuration of MapReduce
  • MapReduce life cycle
  • Driver Code
  • Mapper and Reducer
  • How MapReduce Works
  • Writing and Executing the Basic MapReduce Program using Java
  • Submission & Initialization of MapReduce Job
  • File Input/Output Formats in MapReduce Jobs
    • Text Input Format
    • Key Value Input Format
    • Sequence File Input Format
    • NLine Input Format
  • Joins
    • Map-side Joins
    • Reducer-side Joins
  • Word Count Example
  • Partition MapReduce Program
  • Side Data Distribution
  • Distributed Cache (with Program)
  • Counters (with Program)
  • Types of Counters
    • Task Counters
    • Job Counters
    • User Defined Counters
  • Propagation of Counters
  • Job Scheduling
PIG
  • Introduction to Apache PIG
  • Introduction to PIG Data Flow Engine
  • MapReduce vs. PIG in detail
  • When should PIG be used?
  • Data Types in PIG
  • Basic PIG programming
  • Modes of Execution in PIG
    • Local Mode and MapReduce Mode
  • Execution Mechanisms
    • Grunt Shell
    • Script
    • Embedded
  • Operators/Transformations in PIG
  • PIG UDFs with Program
  • Word Count Example in PIG
  • Difference between MapReduce and PIG
SQOOP
  • Introduction to SQOOP
  • Use of SQOOP
  • Connect to MySQL database
  • SQOOP commands
    • Import
    • Export
    • Eval
  • Joins in SQOOP
  • Export to MySQL
  • Export to HBase
OOZIE
  • Introduction to OOZIE
  • Use of OOZIE
  • Where to use?
Apache HIVE
  • Introduction to HIVE
  • HIVE Meta Store
  • HIVE Architecture
  • Tables in HIVE
    • Managed Tables
    • External Tables
  • Hive Data Types
    • Primitive Types
  • Partition
  • Joins in HIVE
  • HIVE UDFs and UDAFs with Programs
  • Word Count Example
MongoDB
  • What is MongoDB?
  • Where to Use?
  • Configuration On Windows
  • Inserting data into MongoDB
  • Reading data from MongoDB
Apache HBase
  • Introduction to HBASE
  • Basic Configurations of HBASE
  • Fundamentals of HBase
  • What is NoSQL?
  • HBase Data Model
    • Table and Row
    • Column Family and Column Qualifier
  • Categories of NoSQL Databases
    • Key-Value Database
    • Document Database
    • Column Family Database
  • HBASE Architecture
    • HMaster
    • Region Servers
    • Regions
    • MemStore
  • SQL vs. NOSQL
  • How HBASE differs from RDBMS
  • HDFS vs. HBase
  • Client-side buffering or bulk uploads
  • HBase Designing Tables
  • HBase Operations
    • Get
    • Scan
    • Put
    • Delete
Cluster Setup
  • Downloading and installing Ubuntu 12.x
  • Installing Java
  • Installing Hadoop
  • Creating Cluster
  • Increasing and Decreasing the Cluster Size
  • Monitoring the Cluster Health
  • Starting and Stopping the Nodes
Zookeeper
  • Introduction to Zookeeper
  • Data Model
  • Operations
Flume
  • Introduction to Flume
  • Uses of Flume
  • Flume Architecture
    • Flume Master
    • Flume Collectors
    • Flume Agents