This introduction to working with Apache Hadoop is designed for statisticians, as well as experienced SAS programmers with a background in statistics. The course is problem-driven and focuses on helping you understand what data scientists do, the problems they solve, and the methods they use. Through a practical approach that includes multiple hands-on exercises, you will leave class with skills that you can immediately apply to real-world problems. You will also learn how recommender systems can be applied in industries such as health care, finance, and telecom.

Course Highlights

  • 5 live classes of 3 hours each, led by industry practitioners

Upon successful completion of this course, participants should be able to:

  • describe the role and responsibilities of a data scientist
  • explain several ways in which data scientists create value for organizations across many industries
  • locate and acquire data from diverse sources
  • use transformation and normalization techniques on both structured and unstructured data
  • determine the most appropriate type of analysis and modeling tool to use for a given problem
  • implement an automated recommendation system
  • develop, evaluate, and refine scoring systems for recommenders
  • understand the considerations involved in working at scale
  • identify meaningful, actionable, and business-oriented results from the analysis.

  • Introduction
  • Data Science Overview
  • Apache Hadoop Overview
  • Use Cases
  • Project Lifecycle
  • Data Acquisition
  • Evaluating Input Data
  • Data Transformation
  • Fundamentals of Machine Learning
  • Recommender Overview
  • Implementing Recommenders with MapReduce and SAS
  • Experimentation and Evaluation

Who is this Course for?

Data scientists, SAS programmers, and statisticians who have some basic familiarity with Apache Hadoop

Before attending this course, you must:

  • be familiar with Base SAS software
  • be able to write basic SQL queries
  • have predictive modelling knowledge at the level acquired in Predictive Modelling Using Logistic Regression
  • have a basic understanding of Apache Hadoop at the level acquired in Introduction to SAS and Hadoop

This course addresses SAS/STAT software.

Participants can take the certification exam.

To be certified in Data Science with SAS, a participant must pass the online exam with a minimum score of 80%.


1. Who will deliver the training?

Highly qualified, certified instructors with industry-relevant experience deliver the training.