#### Why Master Data Science Level III?

If you have five or more years of experience and high career aspirations, this course is meant for you. It prepares you not just for data science but also for the disciplines that surround it.

As a senior professional, you won't be restricted to the data science team. You will also interact with people from big data engineering, data engineering, QA, DevOps, full stack development, UI/UX, cloud, infrastructure and, last but not least, senior management.

Data Science ++ prepares you to work with all of these groups. Senior roles require you to make key decisions, and if you don't understand these adjacent disciplines you won't be able to contribute to the discussions. We provide a precise way of learning all of them, so that you are fully equipped to join as a leader rather than a silent spectator!

This course also improves your chances of taking on more responsibility in the team and of earning a promotion. Whenever a new project comes up, your CV is likely to be among the first picked, because after this course you will be a jack of all trades and a master of data science.

If you have ten or more years of experience, you will see the potential of this course to help you secure senior leadership roles on analytics projects! Why compromise on your salary or designation when you can master both data science and the subjects around it (listed in the curriculum section below)?

## CURRICULUM

- Big data engineering team
- UI/UX team
- QA/Testing team
- Cloud team
- Senior stakeholders
- Full stack developers / DevOps teams
- Subjects you also need to know as part of working with these teams:
  - Jenkins: CI/CD pipelines
  - Unit testing: JUnit, MRUnit, RUnit test frameworks
  - AWS and Azure components and usage
  - Orchestration managers: Oozie, Airflow
  - Automation scripts: Terraform, CloudFormation
  - Docker containers and related concepts
  - Deep learning along with machine learning
  - Spark along with big data
  - TensorFlow, Keras, H2O

- Introduction to Analytics
- What is business analytics
- Applications of analytics
- Importance of Business Analytics
- Steps of analytics
- Career opportunities and career path
- Companies using R for analytics extensively
- Role of a Data Scientist
- Problems solved by Data Science
- Roadmap to become Data Scientist

- Why R
- Installation procedure
- R interface and using RStudio
- Setting Your Work Directory
- Downloading new packages
- Using R help
- Popular websites for R reference
- Installing Packages and Libraries in RStudio
- Source R script
- Overview of important R packages
- Data Mining GUI in R
- Graph GUI in R

- Basic Math functions
- Variables
- Data structures
- Understanding Vectors, data frames, Lists
- Loops, Control statement

- Data Types: Arrays & General Array Operations, Lists & General List Operations, Data Frames & General Data Frame Operations
- Factors, Data Acquisition (Import & Export), Subsetting Variables, Creating new variables

- Renaming and Recoding Variables, Reshaping Data, Merging & Concatenating Datasets
- Using dplyr to manipulate data frames
- Data Type Conversion
- Data Values

- Control and Flow Operators
- Make a Script in R
- Writing Functions in R
- Creating R package

- Types of visualization
- Graphs in R
- Line Plots
- Bar Charts
- Pie Charts
- Histograms & Density Plots
- Scatter Plots
- 3-D, Parallel Coordinates

- Why study statistics
- Applications of statistics
- Types of statistics
- Population vs Sample
- Types of data
- Types of statistical variables
- Summarize the data
- Make decisions using summary statistics

Data extraction, cleaning, transformation and visualization in R

- Summarize the data
- Understand mean, median, mode
- Standard deviation
- Quartiles, box plot
- Correlation
- Probability
- Probability distribution
- Normal Distribution
- Skewness, Kurtosis
- Poisson Distribution

- Random Variable
- Normal Distribution
- Sampling Concept
- Use statistical methods for managerial decision making
- Discuss applications of normal distribution

- Central limit theorem
- Confidence Interval
- How to interpret confidence interval
- Make statistical inferences through Confidence Intervals
- Calculate confidence intervals for population mean with known population standard deviation
- Calculate confidence intervals for population mean with unknown population standard deviation
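
As a rough illustration of the known-standard-deviation case listed above, here is a minimal Python sketch (the course itself teaches these topics in R). The function name `mean_ci_known_sigma` and the sample values are hypothetical, chosen only for this example. It uses the normal critical value; when the population standard deviation is unknown, the sample standard deviation and a t critical value would take its place.

```python
from math import sqrt
from statistics import NormalDist, mean


def mean_ci_known_sigma(sample, sigma, confidence=0.95):
    """Two-sided confidence interval for the population mean, sigma known."""
    n = len(sample)
    # Critical value z such that P(-z < Z < z) equals the confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / sqrt(n)
    centre = mean(sample)
    return centre - margin, centre + margin


# Hypothetical measurements, for illustration only.
sample = [52.1, 49.8, 50.5, 51.2, 48.9, 50.7, 49.5, 51.3]
low, high = mean_ci_known_sigma(sample, sigma=1.0)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

A wider confidence level or a smaller sample widens the interval, which is the trade-off the interpretation topics above deal with.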

- Learn how to state null and alternative hypotheses
- Business implications of hypothesis testing
- Understand type-I and type-II errors
- Conduct one-sided hypothesis test for population mean
- t test

- Understanding ANOVA
- One way Analysis of Variance (ANOVA)
- The ANOVA table in regression analysis
- F Ratio

- Introduction to regression methods
- Scatter plot, covariance, correlation coefficient
- Correlation and causality
- Linear regression
- Regressors
- Scatter plot matrix
- Ordinary Least Squares method (OLS)
- Assumptions of Linear Regression

- Interpretation of coefficient estimates
- Standard errors
- t-values, p-values, R² and adjusted R²
- ANOVA table
- Residuals analysis
- Deletion diagnostics

- Partial correlation
- Plots – Fitted values vs Residuals, Regressors vs Residuals, Normal probability plot
- Collinearity detection – correlation matrix, VIF, variance proportion table, AIC
- Subset selection, best subset
- Problem of insignificance of important regressors

- Generalized linear models
- Likelihood profiling
- Logistic regression on tabular data
- Prediction
- What are the types of Logistic Regression techniques?
- How does Logistic Regression work?
- How can you evaluate Logistic Regression’s model fit accuracy?
- How is it different from linear regression?
- Why is logistic regression called regression and not classification?

- What is machine learning?
- Learning system model
- Training and testing
- Performance
- Algorithms
- Machine learning structure
- What are we seeking?
- Learning techniques
- Applications
- KNN algorithm
- Instance Based Classifiers
- Nearest-Neighbor Classifiers
- Lazy vs. Eager Learning
- k-NN variations
- How to determine a good value for k
- When to Consider Nearest Neighbors
- Condensing
- Nearest Neighbour Issues

- Naïve Bayes Learning
- Conditional Probability
- Bayesian Theorem: Basics
- The Bayes Classifier
- Model Parameters
- Naïve Bayes Training
- Types of errors
- Sensitivity and Specificity
- ROC Curve
- Holdout estimation
- Cross-validation

- Key Requirements
- Decision Tree as a Rule Set
- How to Create a Decision Tree
- Choosing Attributes
- ID3 Heuristic
- Entropy
- Pruning Trees – Pre and Post
- Subtree Replacement, Raising
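
To make the entropy and attribute-selection topics above concrete, the following is a minimal Python sketch of the entropy measure and the information gain that an ID3-style heuristic uses to choose a split. The function names, the `windy`/`play` attribute names and the tiny dataset are hypothetical, introduced only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting rows on one attribute."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy dataset, for illustration only.
rows = [
    {"windy": True, "play": "no"},
    {"windy": True, "play": "no"},
    {"windy": False, "play": "yes"},
    {"windy": False, "play": "yes"},
]
print(information_gain(rows, "windy", "play"))
```

At each node, an ID3-style tree builder would compute this gain for every candidate attribute and split on the one with the highest value.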

- Ensemble Approaches
- Bagging Model
- Boosting
- The AdaBoost Algorithm
- Gradient Boosting
- Random Forests
- Advantages, Disadvantages

- Background of Brain and Neuron
- Neural Networks
- Neurons Diagram
- Neuron Models: step function
- Perceptrons
- Network Architectures
- Single-layer feed-forward

- Support Vector Machines for Classification
- Linear Discrimination
- Nonlinear Discrimination
- SVM Mathematically
- Extensions
- Application in Drug Design
- Data Classification
- Kernel Functions

- XGBoost for Classification
- What is XGBoost? Why is it so good?
- How does XGBoost work?
- Understanding XGBoost Tuning Parameters

- What is Cluster Analysis?
- Types of Data in Cluster Analysis
- A Categorization of Major Clustering Methods
- Partitioning Methods
- K means clustering
- Hierarchical Methods
- Density-Based Methods
- Grid-Based Methods
- Model-Based Clustering Methods
- Supervised Classification

- Curse of Dimensionality
- Dimension Reduction
- Why Factor or Component Analysis?
- Principal Component Analysis
- PCs, Variance and Least-Squares
- Eigenvectors of a Correlation Matrix
- Factor Analysis
- PCA process Steps

- Overfitting
- Regularization
- The L1 regularization (also called Lasso)
- The L2 regularization (also called Ridge)
- The L1/L2 regularization (also called Elastic net)
- Missing Data Imputation Techniques
- A/B Testing

- Basic Time Series and its components
- Moving Averages (Simple and Exponential)
- R’s inbuilt function ts()
- Plotting of time series
- Business Forecasting using moving average methods
- The ARIMA model
- Application of ARIMA model in Business
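
The simple and exponential moving averages listed above can be sketched in a few lines. The example below is in Python rather than R, and the function names and the `alpha` smoothing parameter are assumptions made for this illustration, not part of the course material.

```python
def simple_moving_average(values, window):
    """Mean of each run of `window` consecutive observations."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def exponential_moving_average(values, alpha):
    """Exponentially weighted average; alpha in (0, 1] weights recent data."""
    smoothed = [values[0]]  # seed with the first observation
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly sales figures, for illustration only.
sales = [120, 130, 125, 140, 150, 145]
print(simple_moving_average(sales, window=3))
print(exponential_moving_average(sales, alpha=0.5))
```

A naive one-step forecast takes the last smoothed value as the prediction for the next period, which is the basic idea behind forecasting with moving average methods.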

- NLP Resources
- Language as a probabilistic phenomenon
- Zipf’s law, Word collocations
- NLP and text retrieval basics
- N-gram language models
- Hidden Markov model (HMM)
- Part of speech tagging
- Decision trees, Naive Bayes, Support Vector Machines
- Feature selection schemes
- Latent semantics and clustering problem
- Introduction to Bayes nets and PGMs
- Latent Dirichlet Allocation
- Aspect extraction, Deception and Opinion spam
- Topic Modelling
- Word2Vec model
- Text preprocessing
  - Noise Removal
  - Lexicon Normalization: Lemmatization, Stemming
  - Syntactical Parsing: Part of Speech Tagging
  - Entity Parsing: Phrase Detection, Named Entity Recognition, Topic Modelling, N-Grams
  - TF-IDF
  - Text Matching: Levenshtein Distance, Phonetic Matching, Cosine Similarity
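
Two of the text matching measures named above are small enough to sketch directly. The Python example below is a minimal illustration, assuming whitespace tokenisation and raw word counts for the cosine similarity; the function names are hypothetical and a real pipeline would normally apply the preprocessing steps listed earlier first.

```python
from collections import Counter
from math import sqrt


def levenshtein(a, b):
    """Minimum number of single-character edits turning string a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


print(levenshtein("kitten", "sitting"))
print(cosine_similarity("data science course", "data science training"))
```

Levenshtein distance works at the character level and suits spelling variants, while cosine similarity compares whole documents by their vocabulary.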

- Basic Introduction, Sets and Dates
- Conditional Field, Parameters and Logical Statement
- Table Calculations and Different Types of Graphs
- Level of Detail
- Working with Multiple Files, Map Graphs and Dynamic Properties to Graphs
- Advanced Analysis and Context
- Dashboards, Story Boards and Actions
- Tableau Public and Live Connection Data
- Dashboard Case Study

- SQL Overview
- SQL SELECT statements
- Functions and expressions
- SQL updating
- Joins
- SQL with multiple tables
- Summarization
- SQL: preparing for the real world

- Big Data – What, Why & Where?
- Big Data Technologies
- Hadoop and MapReduce
- Introduction to Spark
- Spark vs Hadoop
- Hadoop Basics – Hive and HDFS
- Spark using Scala and Scala Basics (Data Structures, Collections, Conditional Statements, Case Class)
- Data Import and Export – CSV files, XML, Oracle DB, MySQL DB data processing
- SparkR and PySpark
- What are RDDs? – RDD Partitions / Lineage
- RDD Transformations and Actions
- RDD Persisting and Caching
- Spark Context, SQL Context, Hive Context
- DataFrame APIs, Dataset APIs
- AWS: RDS and EMR

- Python Overview
- Environment
- Basic Syntax
- Variable Types
- Basic Operators
- Decision Making
- Loops
- Numbers
- Strings
- Lists
- Unpacking a Sequence into Separate Variables
- Dictionary
- Calculation with Collections
- Extracting a subset of a Collection
- Manipulation of List and all Collections
- Iterators and generators
- Self-iterators
- Iterating in reverse
- Set theory
- Skipping values
- Tuples
- Date and Time
- Functions
- Modules
- Files I/O
- Classes and Objects
- Regular Expressions
- CGI Programming
- Database Access
- Multithreading
- XML Processing
- GUI Programming
- Further Extensions