Data Science

Machine Learning & Analytics

Course Description

The Data Science course gives you the practical foundations needed to take up and execute Big Data and other analytics projects effectively. The program covers topics ranging from Big Data to the Data Analytics Life Cycle, and understanding them helps in addressing business challenges that leverage Big Data.

The course covers both basic and advanced analytic methods and introduces participants to Big Data technologies and tools such as MapR and Hadoop. Our state-of-the-art infrastructure lets students see how these methods and tools are applied by gaining hands-on experience alongside practicing data scientists. The program takes an open approach and ends with a final lab session, in which participants work through Big Data analytics challenges by applying the concepts covered during the program in the context of the Data Analytics Life Cycle.

Duration: 30 Hours

Course Objectives

  • Be part of a data science team and work on Big Data and other analytics projects
  • Apply the Data Analytics Life Cycle to Big Data projects
  • Reframe a business challenge as an analytics problem
  • Understand which analytics techniques and tools suit a specific Big Data analysis
  • Create statistical models and understand which insights can lead to actionable results
  • Select appropriate data visualizations to communicate analytics insights clearly to business sponsors and analytics audiences
  • Use Big Data tools and techniques such as Hadoop, MapR, R, in-database analytics, and MADlib functions
  • Understand how to leverage advanced analytics to create a competitive advantage, and how the roles of data scientists and BI analysts differ

Prerequisites

  • Good understanding of basic statistical concepts and a strong quantitative background
  • Knowledge of a programming or scripting language such as Java, Perl, Python, or R; most modules in the course use R, an open-source statistical tool and programming language
  • Knowledge and experience of SQL

Knowledge of these prerequisites will help participants understand the advanced tools and methods covered during the program more effectively.

Who Should Attend

  • Managers from any field who want to use analytics to support their decisions
  • Business Analysts and Data Analysts who wish to upgrade their data analytics skills
  • Database professionals who aspire to move into the field of Big Data by acquiring analytics skills
  • Fresh graduates who wish to build a career in Big Data or Data Science

Data Science Overview

  • What is Data Science?
  • Skill set required
  • Job opportunities

Descriptive & Inferential Statistics

  • Continuous vs. Categorical variables
  • Mean, Median, Mode, Standard Deviation, Quartile, IQR
  • Hypothesis testing, z-test, t-test

Data Analytics using R Programming – Fundamentals

Installation of RStudio

  • Overview of RStudio components
  • Data Structures
      • Vector
      • List
      • Matrix
      • Data Frame
      • Factor
  • Slicing and Subsetting
      • Vector
      • List
      • Matrix
      • Data Frame

Functions in R

  • In-built functions
  • User-defined functions

Loops in R

  • while
  • for
  • break
  • next

Data Import in R

Data Analytics using R Programming – Advanced

Apply Family of Functions

  • lapply
  • sapply
  • tapply

Data Manipulation Using dplyr

Data Visualization Using ggplot2

Machine Learning using R – Part 1

What is Machine Learning?

Supervised vs. Unsupervised Learning

Exploratory Data Analysis

  • Univariate analysis
      • Boxplot
  • Bivariate analysis
      • Scatterplot
      • Correlation
  • Outliers
  • Duplicate removal
  • Missing value imputation

Underfitting vs. Overfitting

Linear Regression

  • Simple
  • Multiple
  • Assumptions of Linear Regression
  • Evaluating model accuracy: k-fold cross-validation

Logistic Regression

  • Confusion Matrix
  • ROC Curve

Time Series Forecasting

  • Moving Average
  • Exponential smoothing
  • Holt-Winters
  • ARIMA

Machine Learning using R – Part 2

  • Naïve Bayes
  • Support Vector Machine
  • K-Nearest Neighbors
  • Decision Tree
  • Random Forest
  • K-Means Clustering

Big Data using Hadoop & Spark

  • Introduction to Big Data
  • Overview of Hadoop & its Ecosystem
  • Introduction to NoSQL
  • Overview of Apache Spark
