Big Data (Hadoop + Spark & Scala) Course Content

Course Duration: 40 Hours

                      HADOOP SYLLABUS

Module 1: Introduction to Big Data
Topics – What is Big Data and where is it produced? Rise of Big Data, Hadoop vs.
traditional systems, Limitations of and Solutions for existing Data Analytics Architecture,
Attributes of Big Data, Types of data, Big Data vs. other technologies.

Module 2: Hadoop Architecture and HDFS
Topics – What is Hadoop? Hadoop History, Distributed Processing Systems, Core
Components of Hadoop, HDFS Architecture, Hadoop Master – Slave Architecture, Daemon
types – NameNode, DataNode, Secondary NameNode.

Module 3: Hadoop Clusters and the Hadoop Ecosystem
Topics – What is a Hadoop Cluster? Pseudo-Distributed mode, Types of clusters, Hadoop Ecosystem,
Pig, Hive, Oozie, Flume, Sqoop.

Module 4: Hadoop MapReduce Framework
Topics – Overview of the MapReduce Framework, MapReduce Architecture, JobTracker
and TaskTracker, Use cases of MapReduce, Anatomy of a MapReduce Program.

Module 5: MapReduce programs in Java
Topics – Basic MapReduce API Concepts, Writing MapReduce Drivers, Mappers,
and Reducers in Java, Speeding up Hadoop Development by Using Eclipse, Unit
Testing MapReduce Programs, and a Demo of the word count example.
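
The word count flow named in this module can be sketched outside Hadoop. The following is a minimal illustration in plain Scala collections, not the course's Java demo and not the Hadoop API: the object and method names (WordCountSketch, mapPhase, shuffle, reducePhase) and the sample lines are assumptions made up for this sketch.

```scala
// Plain-Scala sketch of the word count flow; no Hadoop classes are used.
// All names and sample data here are hypothetical.
object WordCountSketch {
  // Map phase: emit a (word, 1) pair for every word in one input line.
  def mapPhase(line: String): Seq[(String, Int)] =
    line.toLowerCase.split("\\W+").toSeq.filter(_.nonEmpty).map(word => (word, 1))

  // Shuffle phase: collect the emitted counts under their word,
  // which a MapReduce framework does between the map and reduce steps.
  def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (word, grouped) => (word, grouped.map(_._2)) }

  // Reduce phase: sum the counts collected for one word.
  def reducePhase(word: String, counts: Seq[Int]): (String, Int) =
    (word, counts.sum)

  // Driver: wire the three phases together for a small in-memory input.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    shuffle(lines.flatMap(mapPhase)).map { case (word, counts) => reducePhase(word, counts) }

  def main(args: Array[String]): Unit = {
    val lines = Seq("big data needs big clusters", "Hadoop stores big data")
    wordCount(lines).toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word\t$n") }
  }
}
```

In a real job the mapper and reducer run on different nodes and the framework performs the shuffle; here everything runs in one process so the three phases are easy to follow.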

Module 6: Hive and HiveQL
Topics – What is Hive? Hive vs MapReduce, Hive DDL – Create/Show/Drop Tables,
Internal and External Tables, Hive DML – Load Files & Insert Data, Hive Architecture
& Components, Difference between Hive and RDBMS, Partitions in Hive.

             SPARK COURSE SYLLABUS

Module 1 – Introduction to Spark
1. What is Spark?
2. Modes of Spark
3. Spark Installation
4. Spark Standalone Cluster
5. Capabilities and Ecosystem
6. Spark Components vs Hadoop
7. Loading a File in Spark Shell
8. Performing Some Basic Operations on Files in Spark Shell

Module 2 – RDD Fundamentals
1. Purpose and Structure of RDDs
2. Transformations, Actions, and DAG
3. RDD programming API

Module 3 – Spark SQL / DataFrames
1. Spark SQL and DataFrame Uses
2. DataFrame / SQL APIs
3. Catalyst Query Optimization

Module 4 – Spark Job Execution
1. Jobs, Stages, and Tasks
2. Partitions and Shuffles
3. Data Locality
4. Job Performance

Module 5 – Spark Streaming
1. Streaming Sources and Tasks
2. DStream APIs and Stateful Streams
3. Reliability and Fault Recovery

Module 6 – Spark MLlib
1. Classification Algorithm
2. Clustering Algorithm
3. Sequence Mining Algorithm
4. Collaborative Filtering

Module 7 – Spark GraphX
1. Graph analysis with Spark
2. GraphX for graphs
3. Graph

Apache Hive Training Course Syllabus

Introduction
• Hadoop
• What is Hive?
• Features of Hive
• Architecture of Hive
• Working of Hive
HIVE Installation
• Verifying JAVA Installation
• Verifying Hadoop Installation
• Downloading Hive
• Installing Hive
• Configuring Hive
• Downloading and Installing Apache Derby
• Configuring Metastore of Hive
• Verifying Hive Installation
HIVE Data Types
• Column Types
• Literals
• Null Value
• Complex Types
Create Database
• Create Database Statement
DROP Database
• Drop Database Statement
CREATE Table
• Create Table Statement
• Load Data Statement

ALTER Table
• Alter Table Statement
• Rename To… Statement
• Change Statement
• Add Columns Statement
• Replace Statement
DROP Table
• Drop Table Statement
Partitioning
• Adding a Partition
• Renaming a Partition
• Dropping a Partition
BUILT-IN OPERATORS
• Relational Operators
• Arithmetic Operators
• Logical Operators
• Complex Operators
BUILT-IN FUNCTIONS
• Built-In Functions
• Aggregate Functions
Views and Indexes
• Creating a View
• Example
• Dropping a View
• Creating an Index
• Example
• Dropping an Index
HIVEQL SELECT…WHERE
• Syntax
• Example
HIVEQL SELECT…ORDER BY
• Syntax
• Example

HIVEQL GROUP BY
• Syntax
• Example
HIVEQL JOINS
• Syntax
• Example
• JOIN
• LEFT OUTER JOIN
• RIGHT OUTER JOIN
• FULL OUTER JOIN

SCALA COURSE SYLLABUS

 Module 1 – Introduction
1. Introduction to Scala
2. Creating a Scala Doc
3. Creating a Scala Project
4. The Scala REPL
5. Scala Documentation

 Module 2 – Basic Object-Oriented Programming
1. Classes
2. Immutable and Mutable Fields
3. Methods
4. Default and Named Arguments
5. Objects

 Module 3 – Case Objects and Classes
1. Companion Objects
2. Case Classes and Case Objects
3. Apply and Unapply
4. Synthetic Methods
5. Immutability and Thread Safety

 Module 4 – Collections
1. Collections Overview
2. Sequences and Sets
3. Options
4. Tuples and Maps
5. Higher-Order Functions
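
As a rough preview of the topics in this module, the sketch below touches a sequence of tuples, a Map, a Set, an Option, and a few higher-order functions from the Scala standard library. The object name CollectionsSketch, the field names, and the sample names and scores are hypothetical and exist only for illustration.

```scala
// Hypothetical sample data and names; only standard-library collections are used.
object CollectionsSketch {
  // A sequence of tuples: (learner name, quiz score).
  val scores: Seq[(String, Int)] = Seq(("alice", 72), ("bob", 55), ("carol", 91))

  // Higher-order functions: filter and map each take a function as an argument.
  val passed: Seq[String] = scores.filter(_._2 >= 60).map(_._1)

  // A Map built from the tuples; get returns an Option rather than null.
  val scoreByName: Map[String, Int] = scores.toMap
  def scoreFor(name: String): Option[Int] = scoreByName.get(name)

  // A Set keeps only distinct elements.
  val distinctScores: Set[Int] = Set(72, 55, 91, 72)

  // foldLeft threads an accumulator through the sequence.
  val total: Int = scores.foldLeft(0)((sum, entry) => sum + entry._2)

  def main(args: Array[String]): Unit = {
    println(passed)                         // List(alice, carol)
    println(scoreFor("bob"))                // Some(55)
    println(scoreFor("dave").getOrElse(0))  // 0
    println(total)                          // 218
  }
}
```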

 Module 5 – Idiomatic Scala
1. For expressions
2. Pattern Matching
3. Handling Options
4. Handling Failures
5. Handling Futures
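
The five topics in this module tend to appear together in practice. The sketch below is one possible way to combine them, assuming a made-up configuration lookup: the object IdiomaticSketch, its method names, and the "port" setting are hypothetical, while Option, Try, Future, and Await come from the Scala standard library.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical example: read a port number from a settings map.
object IdiomaticSketch {
  // Handling failures: Try captures a thrown exception as a value.
  def parsePort(raw: String): Try[Int] = Try(raw.trim.toInt)

  // Handling options: a key may be absent, so the result is an Option.
  def lookup(settings: Map[String, String], key: String): Option[String] =
    settings.get(key)

  // For expression: each step runs only if the previous one produced a value.
  def configuredPort(settings: Map[String, String]): Option[Int] =
    for {
      raw   <- lookup(settings, "port")
      value <- parsePort(raw).toOption
    } yield value

  // Pattern matching, including a guard, on the Option result.
  def describe(result: Option[Int]): String = result match {
    case Some(p) if p > 0 => s"port $p"
    case Some(p)          => s"invalid port $p"
    case None             => "no usable port"
  }

  // Handling futures: compute asynchronously on the global execution context.
  def describeAsync(settings: Map[String, String]): Future[String] =
    Future(describe(configuredPort(settings)))

  def main(args: Array[String]): Unit = {
    println(describe(configuredPort(Map("port" -> "8080")))) // port 8080
    println(describe(configuredPort(Map("port" -> "abc"))))  // no usable port
    // Blocking with Await is kept to the edge of the program.
    println(Await.result(describeAsync(Map.empty), 2.seconds))
  }
}
```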
