#### Why **Master Data Science II?**

**Master Data Science II** is a rigorous course. Its content is more complex because senior professionals are expected to know more than juniors.

The material therefore goes deeper, and you will need to prepare the way a senior professional would.

If you only want the basics, you can learn them on your own through MOOCs without paying anyone.

But if you are aiming for senior roles, you should know everything listed here.

Later on, you can choose to master 2 of the 5 streams: big data, predictive modelling, machine learning, time series forecasting, and NLP.

While other institutes offer these as separate courses, we recommend studying everything listed here together to become an all-round data science professional.

The more you know, the more favourably employers will view you.

Your CV will have a better chance of being shortlisted, since you will know far more than other candidates.

That knowledge will also help you move up the corporate ladder.

## CURRICULUM

- What is business analytics
- Applications of analytics
- Importance of Business Analytics
- Steps of analytics
- Career opportunities and career path
- Companies using R for analytics extensively
- Role of a Data Scientist
- Problems solved by Data Science
- Roadmap to become Data Scientist

- Basic shell scripting

- Why R
- Installation procedure
- R interface and using RStudio
- Setting Your Working Directory
- Downloading new packages
- Using R help
- Popular websites for R reference
- Installing Packages and Libraries in RStudio
- Source R script
- Overview of important R packages
- Data Mining GUI in R
- Graph GUI in R

- Basic Math functions
- Variables
- Data structures
- Understanding Vectors
- Understanding data frames
- Understanding Lists
- Loops
- Control statements

- Data Types: Arrays & General Array Operations
- Data Types: Lists & General List Operations
- Data Types: Data Frame & General Data Frame Operations
- Factors
- Data Acquisition (Import & Export)
- Subsetting Variables
- Creating new variables

- Renaming and Recoding Variables
- Reshaping Data
- Merging & Concatenating Datasets
- Using dplyr to manipulate data frames
- Data Type Conversion
- Data Values

- Control and Flow Operators
- Make a Script in R
- Writing Functions in R
- Creating R package

- Types of visualization
- Graphs in R
- Line Plots
- Bar Charts
- Pie Charts
- Histograms & Density Plots
- Scatter Plots
- 3-D Plots, Parallel Coordinates

- Why study statistics
- Applications of statistics
- Types of statistics
- Population vs Sample
- Types of data
- Types of statistical variables
- Summarize the data
- Make decisions using summary statistics

Data extraction, cleaning, transformation, and visualization in R

- Summarize the data
- Understand mean, median, mode
- Standard deviation
- Quartiles, box plot
- Correlation
- Probability
- Probability distribution
- Normal Distribution
- Skewness, Kurtosis
- Poisson Distribution

- Random Variable
- Normal Distribution
- Sampling Concept
- Use statistical methods for managerial decision making
- Discuss applications of normal distribution

- Central limit theorem
- Confidence Interval
- How to interpret confidence interval
- Make statistical inferences through Confidence Intervals
- Calculate confidence intervals for population mean with known population standard deviation
- Calculate confidence intervals for population mean with unknown population standard deviation
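As a taste of this module, here is a minimal sketch of a confidence interval for a population mean when the population standard deviation is known. It is written in Python with the standard library only; the function name, the sample values, and the assumed sigma of 2.0 are illustrative and not part of the course material. The unknown-sigma case replaces the normal critical value with one from the t distribution, which needs an extra library and is not shown here.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_confidence_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the population mean when sigma is known."""
    n = len(sample)
    x_bar = mean(sample)
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical sample of 8 measurements; sigma = 2.0 is an assumption.
sample = [48.2, 51.1, 49.8, 50.5, 52.0, 47.9, 50.3, 49.6]
low, high = z_confidence_interval(sample, sigma=2.0)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Raising `confidence` widens the interval, while a larger sample narrows it, because the margin shrinks with the square root of the sample size.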

- Learn how to state null and alternative hypotheses
- Business implications of hypothesis testing
- Understand type-I and type-II errors
- Conduct one-sided hypothesis test for population mean
- t test

- Understanding ANOVA
- One way Analysis of Variance (ANOVA)
- The ANOVA table in regression analysis
- F Ratio

- Regression
- Scatter plot, covariance, correlation coefficient
- Correlation and causality
- Linear regression
- Regressors
- Scatter plot matrix
- Ordinary Least Squares (OLS) method
- Assumptions of Linear Regression
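The OLS idea listed above can be sketched for the single-regressor case, where the slope is the covariance of x and y divided by the variance of x. This is an illustrative Python sketch under that assumption, not the course's own implementation; the function name and the data are hypothetical.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: one regressor x and a response y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = ols_fit(x, y)

# Residuals are the gaps between observed and fitted values.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

With an intercept in the model, the residuals sum to zero up to rounding, which is a quick sanity check on any OLS fit.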

- Interpretation of coefficient estimates
- Standard errors
- t-values, p-values, R² and adjusted R²
- ANOVA table
- Residuals analysis
- Deletion diagnostics

- Partial correlation
- Plots – Fitted values vs Residuals, Regressors vs Residuals, Normal probability plot.
- Collinearity; Detection – correlation matrix, VIF, variance proportion table, AIC
- Subset selection, best subset
- Problem of insignificance of important regressors

- Generalized linear models
- Likelihood profiling
- Logistic regression on tabular data
- Prediction
- What are the types of logistic regression techniques?
- How does logistic regression work?
- How can you evaluate a logistic regression model's fit and accuracy?
- How is it different from linear regression?
- Why is logistic regression called regression and not classification?

- What is machine learning?
- Learning system model
- Training and testing
- Performance
- Algorithms
- Machine learning structure
- What are we seeking?
- Learning techniques
- Applications

- Data Mining- Cluster Analysis
- What is Cluster Analysis?
- Types of Data in Cluster Analysis
- A Categorization of Major Clustering Methods
- Partitioning Methods
- K means clustering
- Hierarchical Methods
- Density-Based Methods
- Grid-Based Methods
- Model-Based Clustering Methods
- Supervised Classification

Data Mining – Factor Analysis and PCA

- Curse of Dimensionality
- Dimension Reduction
- Why Factor or Component Analysis?
- Principal Component Analysis
- PCs, Variance and Least-Squares
- Eigenvectors of a Correlation Matrix
- Factor Analysis
- PCA process Steps

- Basic Time Series and its components
- Moving Averages (Simple and Exponential)
- R’s inbuilt function ts()
- Plotting of time series
- Business Forecasting using moving average methods
- The ARIMA model
- Application of ARIMA model in Business
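The two moving averages named in this module can be sketched in a few lines. The example below is plain Python rather than R's `ts()`; the function names, the sales figures, and the choice to seed the exponential average with the first observation are all assumptions made for illustration.

```python
def simple_moving_average(series, window):
    """Mean of each run of `window` consecutive observations."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_moving_average(series, alpha):
    """Exponentially weighted average; alpha in (0, 1] weights the newest value."""
    smoothed = [series[0]]  # seed with the first observation (one common convention)
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly sales figures.
sales = [10, 12, 14, 13, 15, 18]
sma = simple_moving_average(sales, window=3)
ema = exponential_moving_average(sales, alpha=0.5)

# Naive one-step-ahead forecast: carry the last smoothed value forward.
forecast = sma[-1]
print(sma, ema, forecast)
```

A larger `window` or a smaller `alpha` smooths more aggressively but reacts more slowly to recent changes.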

- NLP Resources, Language as a probabilistic phenomenon
- Zipf’s law, Word collocations
- NLP and text retrieval basics
- N-gram language models
- Hidden Markov model (HMM)
- Part of speech tagging
- Decision trees, Naive Bayes, Support Vector Machines
- Feature selection schemes
- Latent semantics and clustering problem
- Introduction to Bayes nets and PGMs
- Latent Dirichlet Allocation
- Aspect extraction, Deception and Opinion spam
- Topic Modelling
- Word2Vec model
- Text preprocessing
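To make the N-gram language model topic concrete, here is a minimal bigram sketch in Python using maximum-likelihood counts with no smoothing. The function names, the `<s>` and `</s>` boundary markers, and the three-sentence corpus are assumptions chosen for illustration; a real model would need smoothing for unseen word pairs.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Basic preprocessing: lowercase, whitespace tokenization, boundary markers.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts


def bigram_probability(counts, prev, curr):
    """Maximum-likelihood estimate of P(curr | prev); 0.0 if prev was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][curr] / total if total else 0.0


# Hypothetical toy corpus.
corpus = ["the cat sat", "the dog sat", "the cat ran"]
model = train_bigram_model(corpus)
p_cat_given_the = bigram_probability(model, "the", "cat")
print(f"P(cat | the) = {p_cat_given_the:.3f}")
```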

- Basic Introduction, Sets and Dates
- Conditional Field, Parameters and Logical Statement
- Table Calculations and Different Types of Graphs
- Level of Detail
- Working with Multiple Files, Map Graphs and Dynamic Properties to Graphs
- Advanced Analysis and Context
- Dashboards, Storyboards, and Actions
- Tableau Public and Live Connection Data
- Dashboard Case Study

- SQL Overview
- SQL SELECT statements
- Functions and expressions
- SQL updating
- Joins
- SQL with multiple tables
- Summarization
- SQL: preparing for the real world
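Several of these SQL topics (SELECT, updating, joins across multiple tables, and summarization) fit in one short example. The sketch below runs them against an in-memory SQLite database through Python's built-in `sqlite3` module; the table names, columns, and rows are hypothetical, and the course may well use a different database.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0);
""")

# Updating: correct the amount of a single order.
conn.execute("UPDATE orders SET amount = 60.0 WHERE id = 3")

# Join + summarization: order count and total per customer.
# LEFT JOIN keeps customers who have no orders at all.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()
conn.close()

for name, n_orders, total in rows:
    print(name, n_orders, total)
```

Swapping `LEFT JOIN` for `INNER JOIN` would drop the customer with no orders from the result.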

- Big Data – What, Why & Where?
- Big Data Technologies
- Hadoop and MapReduce
- Introduction to Spark
- Spark vs Hadoop
- Hadoop Basics – Hive and HDFS
- Spark using Scala and Scala Basics (Data Structures, Collections, Conditional Statements, Case Class)
- Data Import and Export – CSV files, XML, Oracle DB, MySQL DB data processing
- SparkR and PySpark
- What are RDDs? – RDD Partitions / Lineage
- RDD Transformation and Actions
- RDD Persisting and Caching
- Spark Context, SQL Context, Hive Context
- DataFrame APIs, Dataset APIs
- AWS – RDS and EMR

- Python Overview
- Environment
- Basic Syntax
- Variable Types
- Basic Operators
- Decision Making
- Loops
- Numbers
- Strings
- Lists
- Unpacking a Sequence into Separate Variables
- Dictionary
- Calculation with Collections
- Extracting a subset of a Collection
- Manipulation of List and all Collections
- Iterators and generators
- Self-iterators
- Iterating in reverse
- Set theory
- Skipping values
- Tuples
- Date and Time
- Functions
- Modules
- Files I/O
- Classes and Objects
- Regular Expressions
- CGI Programming
- Database Access
- Multithreading
- XML Processing
- GUI Programming
- Further Extensions
- Getting Started with Raw Data
- The world of arrays with NumPy
- Creating an array
- Mathematical operations
- Array subtraction
- Squaring an array
- A trigonometric function performed on the array
- Conditional operations
- Matrix multiplication
- Indexing and slicing
- Shape manipulation
- Empowering data analysis with pandas
- The data structure of pandas
- Series
- DataFrame
- Panel
- Inserting and exporting data
- CSV
- XLS
- JSON
- Database
- Checking the missing data
- Filling the missing data
- String operations
- Merging data
- Aggregation operations
- The inner join
- The left outer join
- The full outer join
- The groupby function