LGIT Smart Solutions

Uplifting: Business - People - Community

Course 20773: Analyzing Big Data with Microsoft R

Duration: 90-days online access
ILT Classroom: 3 days
Audience: IT Pros / Developers
Certification: MCSA / MCSE
Exam: 70-773
Level: 300

Purchase now

Overview

About this course
The main purpose of this course is to teach students how to use Microsoft R Server to create and run analyses on large datasets, and how to use it in big data environments such as a Hadoop or Spark cluster or a SQL Server database.

Audience profile
The primary audience for this course is people who wish to analyze large datasets within a big data environment.
The secondary audience is developers who need to integrate R analyses into their solutions.

At course completion
After completing this course, students will be able to:

  • Explain how Microsoft R Server and Microsoft R Client work
  • Use R Client with R Server to explore big data held in different data stores
  • Visualize data by using graphs and plots
  • Transform and clean big data sets
  • Implement options for splitting analysis jobs into parallel tasks
  • Build and evaluate regression models generated from big data
  • Create, score, and deploy partitioning models generated from big data
  • Use R in the SQL Server and Hadoop environments

Prerequisites
In addition to their professional experience, students who attend this course should have:

  • Programming experience using R, and familiarity with common R packages
  • Knowledge of common statistical methods and data analysis best practices
  • Basic knowledge of the Microsoft Windows operating system and its core functionality

Course Outline

  • Module 1: Microsoft R Server and R Client
    Explain how Microsoft R Server and Microsoft R Client work.
    - What is Microsoft R Server
    - Using Microsoft R Client
    - The ScaleR functions
  • Module 2: Exploring Big Data
    Explain how to use R Client with R Server to explore big data held in different data stores.
    - Understanding ScaleR data sources
    - Reading data into an XDF object
    - Summarizing data in an XDF object
  • Module 3: Visualizing Big Data
    Explain how to visualize data by using graphs and plots.
    - Visualizing in-memory data
    - Visualizing big data
  • Module 4: Processing Big Data
    Explain how to transform and clean big data sets.
    - Transforming Big Data
    - Managing datasets
  • Module 5: Parallelizing Analysis Operations
    Explain how to implement options for splitting analysis jobs into parallel tasks.
    - Using the RxLocalParallel compute context with rxExec
    - Using the RevoPemaR package
  • Module 6: Creating and Evaluating Regression Models
    Explain how to build and evaluate regression models generated from big data.
    - Clustering Big Data
    - Generating regression models and making predictions
  • Module 7: Creating and Evaluating Partitioning Models
    Explain how to create and score partitioning models generated from big data.
    - Creating partitioning models based on decision trees
    - Testing partitioning models by making and comparing predictions
  • Module 8: Processing Big Data in SQL Server and Hadoop
    Explain how to use R in the SQL Server and Hadoop environments.
    - Using R in SQL Server
    - Using Hadoop Map/Reduce
    - Using Hadoop Spark
