Big Data Comprehensive Training (Practical)

The Big Data foundation course provides you with an understanding of Big Data, potential data sources that can be used for solving real business problems, and an overview of data mining and the tools used in it.

Course Features

  • 4-Day Workshop
  • Completion Certificate awarded by GKK

Course Outline

Module 1: Big Data – History, Overview, and Characteristics

  • History
  • Big Data Definition
  • Big Data Benefits
  • Big Data Characteristics
  • Volume
  • Velocity
  • Variety

Big Data Technologies – Overview

  • Big Data Success Stories

Big Data – Privacy and Ethics

  • Privacy – Compliance
  • Privacy – Challenges
  • Privacy – Approach
  • Ethics

Big Data Projects

  • Who Should Be Involved?
  • What Is Involved?

Module 2: Big Data Sources

2.1 Enterprise Data Sources

  • Enterprise Systems
  • Oracle
  • SAP
  • Microsoft
  • Data Warehouses
  • Unstructured Data – Introduction
  • Unstructured Data – Metadata

2.2 Social Media Data Source

  • Introduction
  • Facebook – Introduction
  • Facebook – Public Feed API
  • Facebook – Keyword Insights API
  • Facebook – Graph API
  • Twitter – Introduction
  • Twitter – Streaming APIs
  • Twitter – REST APIs
  • Other Social Media

2.3 Public Data Sources

  • Introduction
  • Weather
  • Economics
  • Finance
  • Regulatory Bodies

Module 3: Data Mining – Concepts and Tools

3.1 Data Mining – Introduction

  • Introduction
  • Types of Data Mining – Overview
  • Types of Data Mining – Classification
  • Types of Data Mining – Association
  • Types of Data Mining – Clustering

3.2 Data Mining – Tools

  • Introduction
  • Weka
  • Modules of Weka Applications
  • KNIME
  • KNIME – Example
  • R Language

Module 4: The Hadoop Distributed File System (HDFS)

4.1 Hadoop Fundamentals

  • Introduction
  • Main Components of Hadoop
  • Additional Components of Hadoop

4.2 The Hadoop Distributed File System (HDFS)

  • Overview of HDFS
  • Launching HDFS in Pseudo-Distributed Mode
  • Core HDFS Services
  • Installing and Configuring HDFS
  • HDFS Commands
  • HDFS Safe Mode
  • Checkpointing HDFS
  • Federated and High Availability HDFS
  • Running a Fully-Distributed HDFS Cluster with Docker

4.3 MapReduce with Hadoop

  • MapReduce from the Linux Command Line
  • Scaling MapReduce on a Cluster
  • Introducing Apache Hadoop
  • Overview of YARN
  • Launching YARN in Pseudo-Distributed Mode
  • Demonstration of the Hadoop Streaming API
  • Demonstration of MapReduce with Java

Module 5: Apache

5.1 Introduction to Apache Spark

  • Why Spark?
  • Spark Architecture
  • Spark Drivers and Executors
  • Spark on YARN
  • Spark and the Hive Metastore
  • Structured APIs, DataFrames, and Datasets
  • The Core API and Resilient Distributed Datasets (RDDs)
  • Overview of Functional Programming
  • MapReduce with Python

5.2 Apache Hive

  • Hive as a Data Warehouse
  • Hive Architecture
  • Understanding the Hive Metastore and HCatalog
  • Interacting with Hive Using the Beeline Interface
  • Creating Hive Tables
  • Loading Text Data Files into Hive
  • Exploring the Hive Query Language
  • Partitions and Buckets
  • Built-in and Aggregation Functions
  • Invoking MapReduce Scripts from Hive
  • Common File Formats for Big Data Processing
  • Creating Avro and Parquet Files with Hive
  • Creating Hive Tables from Pig
  • Accessing Hive Tables with the Spark SQL Shell

5.3 Persisting Data with Apache HBase

  • Features and Use Cases
  • HBase Architecture
  • The Data Model
  • Command Line Shell
  • Schema Creation
  • Considerations for Row Key Design

5.4 Apache Storm

  • Processing Real-Time Streaming Data
  • Storm Architecture: Nimbus, Supervisors, and ZooKeeper
  • Application Design: Topologies, Spouts, and Bolts

Module 6: Data Modelling with Document Databases

6.1 MongoDB Fundamentals

  • Introduction
  • Replication
  • Sharding
  • Sharding and Replication
  • MongoDB Ecosystem – Languages and Drivers
  • MongoDB Ecosystem – Hadoop Integration
  • MongoDB Ecosystem – Tools

6.2 Install and Configure

  • Download
  • How to Install and Configure

6.3 Document Databases

  • Introduction
  • Documents
  • Document Design Considerations
  • Fields

6.4 Data Modelling with Document Databases

  • Introduction
  • Twitter Sentiment Analysis
  • Twitter Sentiment Analysis – Algorithm
  • Network Log Analysis
  • Network Log Analysis – Algorithm

FAQ

All trainees should have the following:

i) Required knowledge for attendees

  • Familiarity with an imperative programming language such as C
  • Knowledge of SQL queries

ii) Hardware Requirement

Minimum laptop configuration:

  • Memory (RAM): 8 GB
  • Free Disk Space 30 GB
  • 4 CPU cores

iii) Software Requirement:

Windows or Mac

Oracle VirtualBox (https://www.virtualbox.org/wiki/Downloads)

Who should attend:

  1. Software developers
  2. IT managers
  3. Service management professionals
  4. Technology managers

Call Us at 03-22014533


Payment Methods

We offer the following options:
  • Cash
  • HRDF Claimable
  • Maybank Ezpay (Up to 24 months @ 0% Interest)
  • CIMB Easy Pay (Up to 24 months @ 0% Interest)
  • Cash Installment (case-by-case basis)
