Data Science with Python

Curriculum for Data Science with Python

  • What are Analytics & Data Science?
  • Common Terms in Analytics
  • Analytics vs. Data warehousing, OLAP, MIS Reporting
  • Relevance in industry and need of the hour
  • Types of problems and business objectives in various industries
  • How leading companies are harnessing the power of analytics
  • Critical success drivers
  • Overview of analytics tools & their popularity
  • Analytics Methodology & problem-solving framework
  • List of steps in Analytics projects
  • Identify the most appropriate solution design for the given problem statement
  • Project plan for an analytics project & key milestones based on effort estimates
  • Build a resource plan for an analytics project
  • Why Python for data science?
  • Overview of Python – Starting with Python
  • Introduction to installation of Python
  • Introduction to Python Editors & IDEs (Canopy, PyCharm, Jupyter, Rodeo, IPython, etc.)
  • Understand Jupyter Notebook & Customize Settings
  • Concept of Packages/Libraries – Important packages (NumPy, SciPy, scikit-learn, Pandas, Matplotlib, etc.)
  • Installing & loading Packages & Namespaces
  • Data Types & Data objects/structures (Strings, Tuples, Lists, Dictionaries)
  • List and Dictionary Comprehensions
  • Variable & Value Labels – Date & Time Values
  • Basic Operations – Mathematical – String – Date
  • Reading and writing data
  • Simple plotting
  • Control flow & conditional statements
  • Debugging & Code profiling
  • How to create classes and modules and how to call them
  • NumPy, SciPy, pandas, scikit-learn, statsmodels, NLTK, etc.
  • Importing Data from various sources (CSV, TXT, Excel, Access, etc.)
  • Database Input (Connecting to database)
  • Viewing Data objects – subsetting, methods
  • Exporting Data to various formats
  • Important Python modules: Pandas, Beautiful Soup
  • Cleansing Data with Python
  • Data Manipulation steps (Sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, Data type conversions, renaming, formatting, etc.)
  • Data manipulation tools (Operators, Functions, Packages, control structures, Loops, arrays, etc.)
  • Python Built-in Functions (Text, numeric, date, utility functions)
  • Python User Defined Functions
  • Stripping out extraneous information
  • Normalizing data
  • Formatting data
  • Important Python modules for data manipulation (Pandas, NumPy, re, math, string, datetime, etc.)
  • Introduction to exploratory data analysis
  • Descriptive statistics, Frequency Tables and summarization
  • Univariate Analysis (Distribution of data & Graphical Analysis)
  • Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
  • Creating Graphs (Bar/pie/line chart/histogram/boxplot/scatter/density, etc.)
  • Important Packages for Exploratory Analysis (NumPy Arrays, Matplotlib, seaborn, Pandas and scipy.stats, etc.)
  • Basic Statistics – Measures of Central Tendency and Variance
  • Building blocks – Probability Distributions – Normal distribution – Central Limit Theorem
  • Inferential Statistics – Sampling – Concept of Hypothesis Testing
  • Statistical Methods – Z/t-tests (One-sample, Independent, Paired), ANOVA, Correlations and Chi-square
  • Important modules for statistical methods: Numpy, Scipy, Pandas
  • Concept of a model in analytics and how it is used
  • Common terminology used in analytics & modeling process
  • Popular modeling algorithms
  • Types of Business problems – Mapping of Techniques
  • Different Phases of Predictive Modeling
  • Need for structured exploratory data analysis
  • EDA framework for exploring the data and identifying any problems with the data (Data Audit Report)
  • Identify missing data
  • Identify outliers
  • Visualize the data trends and patterns
  • Need for data preparation
  • Consolidation/Aggregation – Outlier treatment – Flat Liners – Missing values – Dummy creation – Variable Reduction
  • Variable Reduction Techniques – Factor Analysis & PCA
  • Linear Regression – Introduction – Applications
  • Assumptions of Linear Regression
  • Building Linear Regression Model
  • Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis, etc.)
  • Assess the overall effectiveness of the model
  • Validation of Models (Re-running Vs. Scoring)
  • Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, drivers etc.)
  • Interpretation of Results – Business Validation – Implementation on new data
  • Logistic Regression – Introduction – Applications
  • Linear Regression Vs. Logistic Regression Vs. Generalized Linear Models
  • Building Logistic Regression Model (Binary Logistic Model)
  • Understanding standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification, ROC Curve, etc.)
  • Validation of Logistic Regression Models (Re-running Vs. Scoring)
  • Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, Drivers or variable importance, etc)
  • Interpretation of Results – Business Validation – Implementation on new data
  • Introduction to Segmentation
  • Types of Segmentation (Subjective Vs Objective, Heuristic Vs. Statistical)
  • Heuristic Segmentation Techniques (Value Based, RFM Segmentation and Life Stage Segmentation)
  • Behavioral Segmentation Techniques (K-Means Cluster Analysis)
  • Cluster evaluation and profiling – Identify cluster characteristics
  • Interpretation of results – Implementation on new data
  • Time Series Forecasting – Introduction – Applications
  • Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
  • Classification of Techniques (Pattern-based – Patternless)
  • Basic Techniques – Averages, Smoothing, etc.
  • Advanced Techniques – AR Models, ARIMA, etc
  • Understanding Forecasting Accuracy – MAPE, MAD, MSE, etc
  • Introduction to Machine Learning & Predictive Modeling
  • Types of Business problems – Mapping of Techniques – Regression vs. classification vs. segmentation vs. Forecasting
  • Major Classes of Learning Algorithms – Supervised vs. Unsupervised Learning
  • Different Phases of Predictive Modeling (Data Pre-processing, Sampling, Model Building, Validation)
  • Overfitting (Bias-Variance Trade-off) & Performance Metrics
  • Feature engineering & dimension reduction
  • Concept of optimization & cost function
  • Overview of gradient descent algorithm
  • Overview of Cross-validation (Bootstrapping, K-Fold validation, etc.)
  • Model performance metrics (R-square, Adjusted R-square, RMSE, MAPE, AUC, ROC curve, recall, precision, sensitivity, specificity, confusion matrix)
  • What is segmentation & Role of ML in Segmentation?
  • Concept of Distance and related math background
  • K-Means Clustering
  • Expectation Maximization
  • Hierarchical Clustering
  • Spectral Clustering; Density-Based Clustering (DBSCAN)
  • Principal Component Analysis (PCA)
  • Decision Trees – Introduction – Applications
  • Types of Decision Tree Algorithms
  • Construction of Decision Trees through Simplified Examples; Choosing the “Best” attribute at each Non-Leaf node; Entropy; Information Gain, Gini Index, Chi Square, Regression Trees
  • Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with Numerical Variables; other Measures of Randomness
  • Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as Rules
  • Decision Trees – Validation
  • Overfitting – Best Practices to Avoid It
  • Concept of Ensembling
  • Manual Ensembling Vs. Automated Ensembling
  • Methods of Ensembling (Stacking, Mixture of Experts)
  • Bagging (Logic, Practical Applications)
  • Random forest (Logic, Practical Applications)
  • Boosting (Logic, Practical Applications)
  • AdaBoost
  • Gradient Boosting Machines (GBM)
  • XGBoost
  • Motivation for Neural Networks and Their Applications
  • Perceptron and Single Layer Neural Network, and Hand Calculations
  • Learning in a Multi-Layered Neural Net: Backpropagation and Conjugate Gradient Techniques
  • Neural Networks for Regression
  • Neural Networks for Classification
  • Interpretation of Outputs and Fine-Tuning the Models with Hyperparameters
  • Validating ANN models
  • Motivation for Support Vector Machine & Applications
  • Support Vector Regression
  • Support vector classifier (Linear & Non-Linear)
  • Mathematical Intuition (Kernel Methods Revisited, Quadratic Optimization and Soft Constraints)
  • Interpretation of Outputs and Fine-Tuning the Models with Hyperparameters
  • Validating SVM models
  • What is KNN & Applications?
  • KNN for missing value treatment
  • KNN for solving regression problems
  • KNN for solving classification problems
  • Validating KNN model
  • Model fine-tuning with hyperparameters
  • Concept of Conditional Probability
  • Bayes Theorem and Its Applications
  • Naïve Bayes for classification
  • Applications of Naïve Bayes in Classifications
  • Taming big text, Unstructured vs. Semi-structured Data; Fundamentals of information retrieval, Properties of words; Creating Term-Document (TxD) Matrices; Similarity measures; Low-level processes (Sentence Splitting; Tokenization; Part-of-Speech Tagging; Stemming; Chunking)
  • Finding patterns in text: text mining, text as a graph
  • Natural Language Processing (NLP)
  • Text Analytics – Sentiment Analysis using Python
  • Text Analytics – Word cloud analysis using Python
  • Text Analytics – Segmentation using K-Means/Hierarchical Clustering
  • Text Analytics – Classification (Spam/Not spam)
  • Applications of Social Media Analytics
  • Metrics (Measures, Actions) in social media analytics
  • Examples & Actionable Insights using Social Media Analytics
  • Important Python modules for Machine Learning (scikit-learn, statsmodels, SciPy, NLTK, etc.)
  • Fine-tuning the models using hyperparameters, grid search, pipelines, etc.
  • Banking
  • Healthcare
  • Tourism
  • Marketing
  • Retail
  • Telecom
  • HR
  • Energy
  • Insurance
  • Education
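The forecasting-accuracy topic names MAPE, MAD and MSE. Below is a minimal pure-Python sketch of the three measures. The function name `forecast_errors` and the sample numbers are hypothetical. MAD is taken here to mean the mean absolute deviation of the forecast errors, and MAPE assumes that no actual value is zero.

```python
from statistics import mean


def forecast_errors(actual, forecast):
    """Return (MAPE in %, MAD, MSE) for paired actual and forecast values.

    Assumptions: both sequences are non-empty and the same length, and no
    actual value is zero (MAPE divides by the actual value).
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal in length")
    errors = [a - f for a, f in zip(actual, forecast)]
    # MAPE: mean of |error / actual|, expressed as a percentage
    mape = 100 * mean(abs(e / a) for e, a in zip(errors, actual))
    # MAD: mean absolute deviation of the forecast errors
    mad = mean(abs(e) for e in errors)
    # MSE: mean of the squared forecast errors
    mse = mean(e ** 2 for e in errors)
    return mape, mad, mse


# Hypothetical monthly demand and forecasts
actual = [100, 120, 130, 110]
forecast = [110, 115, 125, 120]
mape, mad, mse = forecast_errors(actual, forecast)
print(f"MAPE={mape:.2f}%  MAD={mad:.2f}  MSE={mse:.2f}")
```

For these numbers the script prints `MAPE=6.78%  MAD=7.50  MSE=62.50`. MSE penalises the two 10-unit misses far more heavily than MAD does, which is one reason both measures are usually reported together.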
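The cross-validation overview mentions K-fold validation. Below is a minimal sketch of how K-fold splitting partitions row indices into train and test folds. `k_fold_indices` is a hypothetical helper written only for illustration. In practice, scikit-learn's `sklearn.model_selection.KFold` provides equivalent splitting.

```python
import random


def k_fold_indices(n_samples, k=5, shuffle=True, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    if shuffle:
        # A seeded generator makes the folds reproducible
        random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in k_fold_indices(10, k=3):
    print(len(train_idx), len(test_idx))
```

With 10 rows and k=3 the fold sizes are 4, 3 and 3. Every row appears in exactly one test fold, so each observation is used for validation once and for training k-1 times.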
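The decision-tree construction topic lists entropy, information gain and the Gini index as criteria for choosing the best attribute at a node. Below is a minimal pure-Python sketch of these measures for a single categorical feature. The function names and the toy data are hypothetical. Production libraries also handle numeric thresholds, pruning and more. For example, scikit-learn's `DecisionTreeClassifier` selects its split measure through the `criterion` parameter (for example `"gini"` or `"entropy"`).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """Entropy reduction obtained by splitting `labels` on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the child nodes
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


# Hypothetical toy data: did a customer respond to an offer, by segment?
responded = ["yes", "yes", "no", "no", "yes", "no"]
segment = ["A", "A", "B", "B", "A", "A"]
print(round(entropy(responded), 3))                    # parent node entropy
print(round(gini(responded), 3))                       # parent node Gini impurity
print(round(information_gain(responded, segment), 3))  # gain from splitting on segment
```

The three lines print 1.0, 0.5 and 0.459. The parent node is an even yes/no split, so both impurity measures are at their maximum for two classes. Splitting on `segment` produces one pure child (B), which accounts for most of the gain.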