Note: Some notebooks have rendering issues on GitHub. Please download them to open on a local machine, or use Codespaces.
This repository provides step-by-step tutorials on Text Analysis and Natural Language Processing (NLP). It is designed to guide learners and practitioners from foundational concepts to advanced techniques such as transformer-based models. Each tutorial includes PowerPoint slides, code examples, and practical applications.
- Tokenization
- Stemming and Lemmatization
- Stop-word Removal
- Handling special characters and punctuation
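As a taste of these preprocessing steps, here is a minimal stdlib-only sketch. Real projects would use NLTK or spaCy tokenizers and a proper stemmer (e.g. Porter); the stop-word list and suffix rule below are illustrative stand-ins.

```python
import re

# Tiny illustrative stop-word list; real work would use NLTK's or spaCy's.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "of", "to", "in"}

def preprocess(text):
    """Lowercase, tokenize, remove stop words, apply crude suffix stemming."""
    tokens = re.findall(r"[a-z']+", text.lower())   # tokenize on letter runs
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Naive suffix stripping as a stand-in for a real stemmer:
    return [re.sub(r"(ing|ed|s)$", "", t) if len(t) > 3 else t for t in tokens]

print(preprocess("The cats are running in the garden!"))
# → ['cat', 'runn', 'garden']
```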
- Bag-of-Words approach
- Vectorization
- TF-IDF
- Dictionary Methods
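A dictionary method in its simplest form counts matches against hand-built word lists. The two word sets below are toy examples, not a real lexicon such as VADER:

```python
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def dictionary_score(text):
    """Score = (# positive matches) - (# negative matches)."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg

print(dictionary_score("great service but terrible food"))  # 1 - 1 = 0
```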
- Cosine Similarity
- Jaccard Index
- Euclidean Distance
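These three similarity/distance measures are compact enough to implement directly (a sketch with NumPy; the example vectors are arbitrary):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: angle between vectors, ignores magnitude."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def jaccard(a, b):
    """Jaccard index: overlap of two token sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def euclidean(u, v):
    """Euclidean distance: straight-line distance between vectors."""
    return float(np.linalg.norm(np.asarray(u) - np.asarray(v)))

u, v = np.array([1.0, 2.0, 0.0]), np.array([2.0, 4.0, 0.0])
print(cosine(u, v))  # 1.0 — same direction, different magnitude
print(jaccard("the cat sat".split(), "the cat ran".split()))  # 2/4 = 0.5
print(euclidean(u, v))
```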
- Understand word embeddings and their applications: Word2Vec, GloVe, and FastText.
- Use embeddings to calculate text similarity.
- Dimensionality reduction: PCA and t-SNE
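Dimensionality reduction is often used to project embeddings into 2-D for inspection. The matrix below is a random stand-in for real embeddings (which would come from e.g. a trained Word2Vec or pretrained GloVe model), used here only to show the PCA call:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for real word embeddings: 6 "words" in a 10-dimensional space.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 10))

# PCA projects the vectors down to 2-D for plotting/inspection;
# t-SNE (sklearn.manifold.TSNE) is a nonlinear alternative.
coords = PCA(n_components=2).fit_transform(embeddings)
print(coords.shape)  # (6, 2)
```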
- Vectorizing text data using Bag-of-Words and TF-IDF
- Obtain text similarity and conduct regression analysis
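Combining the two ideas above, a sketch that computes pairwise similarity from TF-IDF vectors and regresses a numeric outcome on the same features (documents and ratings are toy data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics.pairwise import cosine_similarity

docs = ["great product works well", "terrible broken waste",
        "good value works", "awful broken bad"]
ratings = [5.0, 1.0, 4.0, 1.0]  # toy numeric outcome per document

X = TfidfVectorizer().fit_transform(docs)

# Pairwise document similarity from the TF-IDF vectors:
sim = cosine_similarity(X)
print(sim.round(2))

# Regression of the outcome on TF-IDF features:
model = Ridge(alpha=1.0).fit(X, ratings)
print(model.predict(X[:1]))
```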
- Building classifiers for text classification.
- Model evaluation: confusion matrix and related metrics
- Regularization: LASSO, Ridge, Elastic Net
- Ensemble Learning: Boosting and Stacking
- Model optimization
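A minimal version of the classification-and-evaluation workflow, on six made-up training sentences. L2 regularization is the scikit-learn default for logistic regression (`C` is the inverse regularization strength; `penalty="l1"` with a compatible solver gives LASSO-style sparsity):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.pipeline import make_pipeline

texts = ["love this movie", "great film", "enjoyed it a lot",
         "hated it", "awful movie", "terrible film"]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF features + L2-regularized logistic regression in one pipeline.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(C=1.0))
clf.fit(texts, labels)

# Evaluated on the training set here only for brevity; use a held-out split.
preds = clf.predict(texts)
print(confusion_matrix(labels, preds))
print(accuracy_score(labels, preds))
```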
- Understanding transformer architecture
- How transformer architecture revolutionized NLP
- Understand architectures like BERT, GPT, and RoBERTa.
- What we can achieve with transformer-based models: text classification, text generation, and semantic similarity.
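The operation at the heart of the transformer architecture is scaled dot-product attention, which can be written out in a few lines of NumPy (shapes below are arbitrary toy values):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core transformer op: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 query positions, d_k = 4
K = rng.normal(size=(5, 4))   # 5 key positions
V = rng.normal(size=(5, 4))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.sum(axis=-1))  # (3, 4); each attention row sums to 1
```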
- Text classification with LLMs.
- Comparison of ML-based and transformer-based methods.
- Practical applications: classifying customer complaints with BERT, RoBERTa, and T5.
- Lexicon-Based Methods: Using sentiment dictionaries like VADER and SentiWordNet.
- Machine Learning-Based Models: Training classifiers using labeled sentiment datasets.
- Transformer-Based Methods: Fine-tuning models like BERT for advanced sentiment prediction.
- Practical applications: social media analysis with Twitter-roBERTa-base-sentiment-latest.
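Running the model named above takes only a few lines with the Hugging Face `pipeline` API. This sketch assumes the `transformers` and `torch` packages are installed and downloads the model weights on first run:

```python
from transformers import pipeline

# Model name taken from the tutorial above; downloaded on first use.
sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)
result = sentiment("I love this new phone!")[0]
print(result["label"], round(result["score"], 3))
```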
- Latent Dirichlet Allocation (LDA)
- Structural Topic Models
- BERTopic
- Extract structured entities like names, dates, and locations from text.
- Pre-trained spaCy models
- Custom NER models using transformer-based architectures
- Build end-to-end NLP pipelines for real-world applications.
- Multi-step text processing workflows (e.g., preprocessing → classification → summarization).
- Integrating multiple NLP tasks into a single cohesive system.
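A multi-step workflow of this kind can be sketched end to end: preprocess, classify, then summarize. The labels and training sentences below are hypothetical, and "summarization" here is a simple extractive heuristic (pick the sentence with the highest TF-IDF mass), not an abstractive model:

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Step 1: preprocessing — normalize whitespace and lowercase.
def preprocess(text):
    return re.sub(r"\s+", " ", text).strip().lower()

# Step 2: classification — toy complaint classifier (hypothetical labels).
train = ["refund my order now", "package arrived broken", "great fast delivery"]
labels = ["billing", "damage", "praise"]
clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(train, labels)

# Step 3: extractive "summary" — sentence with the highest TF-IDF mass.
def summarize(text):
    sentences = re.split(r"(?<=[.!?])\s+", text)
    X = TfidfVectorizer().fit_transform(sentences)
    return sentences[int(X.sum(axis=1).argmax())]

document = preprocess("My package arrived broken.  "
                      "The box was crushed and the item inside cracked.")
print(clf.predict([document])[0])
print(summarize(document))
```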