
namigabbasov/text-analysis-and-nlp


Text Analysis and NLP Tutorials

Note: Some files have rendering issues on GitHub. Please download them and open them locally, or use Codespaces.

Overview

This repository provides step-by-step tutorials on Text Analysis and Natural Language Processing (NLP). It is designed to guide learners and practitioners from foundational concepts to advanced techniques like transformer-based models. Each tutorial includes PowerPoint slides, code examples, and practical applications.

Topics

1. Text Preprocessing

  • Tokenization
  • Stemming and Lemmatization
  • Stop-word Removal
  • Handling special characters and punctuation
  • Bag of Words Approach
  • Vectorization
  • TF-IDF

2. Dictionary Methods and Text Similarity

  • Dictionary Methods
  • Cosine Similarity
  • Jaccard Index
  • Euclidean Distance
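The three similarity measures above have short closed forms; a minimal sketch with NumPy and toy count vectors (the vocabulary and vectors are invented for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    # Angle-based similarity: dot product over the product of norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def jaccard_index(tokens_a, tokens_b):
    # Set overlap: |intersection| / |union|
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b)

def euclidean_distance(a, b):
    # Straight-line distance between the two vectors
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

# Toy count vectors over the vocabulary ["cat", "dog", "mat"]
v1 = np.array([2.0, 0.0, 1.0])
v2 = np.array([1.0, 1.0, 0.0])

print(cosine_similarity(v1, v2))
print(jaccard_index("the cat sat".split(), "the cat ran".split()))
print(euclidean_distance(v1, v2))
```

Note that cosine similarity and Jaccard grow with similarity, while Euclidean distance shrinks; dictionary methods typically reduce to counting hits from a word list, much like the Jaccard token sets here.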

3. Word Embeddings

  • Word embeddings and their applications: Word2Vec, GloVe, and FastText
  • Using embeddings to compute text similarity
  • Dimensionality reduction: PCA and t-SNE
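A minimal sketch of the embedding workflow, using hand-made toy vectors in place of trained Word2Vec/GloVe/FastText vectors (real ones would come from a library such as gensim):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy 4-dimensional "embeddings"; the values are invented so that
# "king" and "queen" point in a similar direction and "apple" does not
embeddings = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.8, 0.9, 0.2, 0.1]),
    "apple": np.array([0.1, 0.0, 0.9, 0.8]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantically close words should score higher
sim_royal = cosine(embeddings["king"], embeddings["queen"])
sim_fruit = cosine(embeddings["king"], embeddings["apple"])
print(sim_royal, sim_fruit)

# Project to 2D for plotting with PCA; t-SNE works the same way
# via sklearn.manifold.TSNE
matrix = np.stack(list(embeddings.values()))
coords = PCA(n_components=2).fit_transform(matrix)
print(coords.shape)  # (3, 2)
```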

4. Text Classification with Traditional ML I

  • Vectorizing text data with Bag-of-Words and TF-IDF
  • Computing text similarity and running regression analysis
  • Building classifiers for text classification
  • Evaluating models with confusion matrices and related metrics
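The vectorize-train-evaluate loop above can be sketched with scikit-learn; the six labeled reviews here are made up for illustration, and evaluating on the training set is only to keep the example short (the tutorials would hold out a test set):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.pipeline import make_pipeline

# Tiny invented dataset: 1 = positive, 0 = negative
texts = [
    "great product, works perfectly", "awful quality, broke in a day",
    "love it, highly recommend", "terrible, waste of money",
    "excellent value and fast shipping", "horrible experience, do not buy",
]
labels = [1, 0, 1, 0, 1, 0]

# Chain TF-IDF vectorization and a logistic regression classifier
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

preds = clf.predict(texts)
print(confusion_matrix(labels, preds))  # rows: true class, columns: predicted
print(accuracy_score(labels, preds))
```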

5. Text Classification with Traditional ML II

  • Regularization: LASSO, Ridge, Elastic Net
  • Ensemble Learning: Boosting and Stacking
  • Model optimization
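A small sketch of the regularizers and of grid-search tuning, on synthetic data where only the first feature matters (boosting and stacking follow the same `fit`/`predict` pattern via sklearn's ensemble estimators):

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression problem: y depends only on feature 0
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=100)

# LASSO (L1) drives irrelevant coefficients exactly to zero;
# Ridge (L2) only shrinks them; Elastic Net blends both penalties
lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

print((lasso.coef_ == 0).sum())  # most irrelevant features zeroed out
print(lasso.coef_[0])            # the true signal survives the penalty

# Model optimization: cross-validated search over the penalty strength
grid = GridSearchCV(Lasso(), {"alpha": [0.01, 0.1, 1.0]}, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```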

6. Transformer-Based Models for NLP

  • Understanding the transformer architecture and how it revolutionized NLP
  • Architectures such as BERT, GPT, and RoBERTa
  • Applications of transformer-based models: text classification, text generation, and semantic similarity
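The core operation of the transformer architecture is scaled dot-product attention; a minimal NumPy sketch on random toy token vectors (real models add learned projections, multiple heads, and layer stacking):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the heart of the transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise token affinities
    # Row-wise softmax (subtracting the max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Three toy token representations of dimension d_k = 4
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))

# Self-attention: queries, keys, and values all come from the same input
output, attn = scaled_dot_product_attention(X, X, X)
print(attn)  # each row is a distribution over the input tokens
```

Because every token attends to every other token in one step, the model captures long-range context without recurrence, which is what made this architecture so effective for NLP.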

7. Text Classification with LLMs

  • Text classification with LLMs
  • Comparing ML-based and transformer-based methods
  • Practical applications: classifying customer complaints with BERT, RoBERTa, and T5

8. Sentiment Analysis

  • Lexicon-Based Methods: Using sentiment dictionaries like VADER and SentiWordNet.
  • Machine Learning-Based Models: Training classifiers using labeled sentiment datasets.
  • Transformer-Based Methods: Fine-tuning models like BERT for advanced sentiment prediction.
  • Practical applications: social media analysis with Twitter-roBERTa-base-sentiment-latest.
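The lexicon-based idea can be shown with a toy scorer; the five-word lexicon below is invented purely for illustration, whereas real resources like VADER and SentiWordNet carry thousands of scored entries plus rules for negation, intensifiers, and punctuation:

```python
# Toy sentiment lexicon: word -> polarity score
LEXICON = {"good": 1.0, "great": 1.5, "love": 2.0, "bad": -1.0, "terrible": -2.0}

def lexicon_sentiment(text):
    """Average the polarity of the lexicon words found in the text."""
    tokens = text.lower().split()
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(lexicon_sentiment("what a great phone i love it"))    # positive score
print(lexicon_sentiment("terrible battery and bad screen")) # negative score
```

The ML-based and transformer-based routes replace this hand-built scoring with a trained classifier, as in the text-classification sections above.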

9. Topic Modeling

  • Latent Dirichlet Allocation (LDA)
  • Structural Topic Models
  • BERTopic
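A minimal LDA sketch with scikit-learn on four invented documents (two finance-flavored, two sports-flavored); BERTopic and structural topic models follow the same fit-then-inspect pattern with different machinery underneath:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the economy grew and markets rallied",
    "stocks fell as inflation worried markets",
    "the team won the match in extra time",
    "the striker scored twice in the final match",
]

# LDA works on raw counts, not TF-IDF
counts = CountVectorizer(stop_words="english")
X = counts.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
doc_topics = lda.transform(X)  # per-document topic proportions
print(doc_topics.shape)        # (4, 2); each row sums to 1
```

Inspecting `lda.components_` row by row gives each topic's word weights, which is how the topics are labeled by hand.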

10. Named Entity Recognition (NER)

  • Extract structured entities like names, dates, and locations from text.
  • Pre-trained spaCy models
  • Custom NER models using transformer-based architectures
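To show the input/output shape of NER without a model download, here is a tiny rule-and-gazetteer extractor; it is illustration only, since a pre-trained spaCy model (e.g. `spacy.load("en_core_web_sm")`, then reading `doc.ents`) does this statistically and far more robustly:

```python
import re

# Hand-made "gazetteer" of known names, and a crude year pattern;
# both are stand-ins for what a statistical NER model learns
PERSONS = {"Ada Lovelace", "Alan Turing"}
DATE_PATTERN = re.compile(r"\b\d{4}\b")

def extract_entities(text):
    """Return (span, label) pairs, mimicking a NER model's output."""
    entities = []
    for name in PERSONS:
        if name in text:
            entities.append((name, "PERSON"))
    for match in DATE_PATTERN.finditer(text):
        entities.append((match.group(), "DATE"))
    return entities

ents = extract_entities("Ada Lovelace published her notes in 1843.")
print(ents)
```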

11. Custom NLP Pipelines from Hugging Face Models

  • Build end-to-end NLP pipelines for real-world applications.
  • Multi-step text processing workflows (e.g., preprocessing → classification → summarization).
  • Integrating multiple NLP tasks into a single cohesive system.
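The preprocessing → classification → summarization flow can be sketched as plain function composition; each stage below is a deliberately trivial stand-in (the keyword rule and word-truncation "summary" are invented), where the real tutorials would wrap Hugging Face pipeline objects:

```python
def preprocess(text):
    # Normalize case and whitespace
    return " ".join(text.lower().split())

def classify(text):
    # Stand-in for a fine-tuned classifier: a crude keyword rule
    return "complaint" if "broken" in text or "refund" in text else "other"

def summarize(text, max_words=5):
    # Stand-in for a summarization model: keep the first few words
    return " ".join(text.split()[:max_words])

def run_pipeline(text):
    """Chain the stages into one end-to-end call."""
    clean = preprocess(text)
    return {"label": classify(clean), "summary": summarize(clean)}

result = run_pipeline("My order arrived BROKEN and I want a refund now")
print(result)
```

The point of the chaining is that each stage consumes the previous stage's output, so swapping a toy function for a real model changes nothing about the pipeline's shape.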
