Code to Sample, Explore, Transform, Train, and Validate Models
This repository is a collection of code snippets I created while studying Python and translating methods I previously used with SAS and R. It also reflects some of my professional experience working with these techniques.
Please note that this is not a Python package but rather a supplementary resource to the official documentation for libraries like scikit-learn, pandas, and NumPy. Each section typically includes a README file and a Python script with examples. In some cases, I've also added a Jupyter Notebook to demonstrate the functions and methods in action.
The README files usually explain the underlying technique and its practical applications, and give guidance on when and how to tune parameters for better results.
This resource does not focus on deep learning. It is better suited to scenarios where the priorities are staying close to the data, exploring it, interpreting relationships, and building statistical models. It follows a structured framework:
- Sample: Access the data.
- Explore and Transform: Iterate within this loop to understand and prepare the data.
- Train: Build models.
- Validate: Assess and refine results.

The repository currently covers the following topics:

- Table Overall Structure
- Univariate Analysis
- Bivariate (Pair-wise) Analysis
- Cluster Analysis
- Dimension Reduction
- Merge and Append Tables
- Aggregate Functions
- Variable Recoding
- Standardization
- Math Transformations
- Missing-Value Handling
- String Cleaning for Text Categorization
- Linear Regression
- Generalized Linear Models
- Logistic Regression
- Decision Trees
- Regression Trees
- Random Forest
- Boosting for Classification
- Gradient Boosting for Quantification
- Neural Networks
- Support Vector Machine
- Naive Bayes
- K-Nearest Neighbors
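
The topics above follow the Sample, Explore/Transform, Train, and Validate loop described earlier. The following is a minimal sketch of how those phases might chain together with pandas and scikit-learn. It is not taken from the repository's own scripts. It assumes scikit-learn 0.23 or later (for `as_frame=True`), uses the bundled breast-cancer dataset as a stand-in for real data, and uses logistic regression purely as an illustrative model; `run_workflow` is a hypothetical helper name.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def run_workflow(random_state: int = 0) -> dict:
    """Hypothetical helper: one pass through Sample -> Explore/Transform -> Train -> Validate."""
    # Sample: load a bundled example dataset (assumption: stand-in for real data)
    # and hold out a stratified validation split.
    data = load_breast_cancer(as_frame=True)
    X, y = data.data, data.target
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )

    # Explore: univariate summary per column and pair-wise correlation with the target,
    # computed on the training split only.
    summary = X_train.describe().T
    corr_with_target = X_train.corrwith(y_train)

    # Transform + Train: imputation and standardization are fitted inside the pipeline,
    # so they learn their parameters from the training split only (no leakage).
    # This dataset has no missing values; the imputer is included only to show the step.
    model = make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)

    # Validate: score data the model never saw during fitting.
    proba = model.predict_proba(X_valid)[:, 1]
    return {
        "n_rows": len(X),
        "n_features": X.shape[1],
        "summary_shape": summary.shape,
        "strongest_target_corr": corr_with_target.abs().idxmax(),
        "accuracy": accuracy_score(y_valid, model.predict(X_valid)),
        "roc_auc": roc_auc_score(y_valid, proba),
    }


if __name__ == "__main__":
    for key, value in run_workflow().items():
        print(f"{key}: {value}")
```

Wrapping the transformations and the model in a single pipeline keeps the Explore/Transform loop honest: any step that learns from the data (medians, means, standard deviations) is refit on each training split, which also makes the same object usable later with cross-validation.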