APACHE SPARK & SCALA Course Content

SCALA (Object Oriented and Functional Programming)

  • Getting started with Scala
  • Scala background, Scala vs. Java, and basics
  • Interactive Scala – REPL, data types, variables, expressions, simple functions
  • Running the program with Scala Compiler
  • Explore the type lattice and use type inference
  • Define Methods and Pattern Matching

Scala Environment Setup

  • Scala setup on Windows and UNIX

Functional Programming

  • What is Functional Programming?
  • Differences between OOP and FP

Collections

  • Iterating, mapping, filtering, and counting
  • Regular expressions and matching with them
  • Maps, Sets, groupBy, Options, flatten, flatMap
  • Word count, IO operations, file access, flatMap

Object-Oriented Programming

  • Classes and Properties
  • Objects, Packaging, and Imports
  • Traits
  • Objects, classes, inheritance, Lists with multiple related types, apply

Integrations

  • What is SBT?
  • Integration of Scala in Eclipse IDE
  • Integration of SBT with Eclipse

SPARK CORE

  • Batch versus real-time data processing
  • Introduction to Spark, Spark versus Hadoop
  • The architecture of Spark
  • Coding Spark jobs in Scala
  • Exploring the Spark shell and creating the SparkContext
  • RDD Programming
  • Operations on RDD
  • Transformations
  • Actions
  • Loading Data and Saving Data
  • Key-value pair RDDs
  • Broadcast variables

Persistence

  • Configuring and running the Spark cluster
  • Exploring a multi-node Spark cluster
  • Cluster management
  • Submitting Spark jobs and running in the cluster mode
  • Developing Spark applications in Eclipse
  • Tuning and Debugging Spark

Spark Streaming

  • Introduction to Spark Streaming
  • Architecture of Spark Streaming
  • Processing Distributed Log Files in Real Time
  • Discretized streams (DStreams) and RDDs
  • Applying Transformations and Actions on Streaming Data
  • Integration with Flume and Kafka
  • Integration with Cassandra
  • Monitoring streaming jobs

Spark SQL

  • Introduction to Apache Spark SQL
  • The SQL context
  • Importing and saving data
  • Processing text, JSON, and Parquet files
  • DataFrames
  • User-defined functions
  • Using Hive
  • Local Hive Metastore server

Spark MLlib

  • Introduction to Machine Learning
  • Types of Machine Learning
  • Introduction to Apache Spark MLlib algorithms
  • Machine learning data types and working with MLlib
  • Regression and Classification Algorithms
  • Decision Trees in depth
  • Classification with SVM, Naive Bayes
  • Clustering with K-Means
  • Building the Spark server