added text classification tutorial

DangerStep · Mar 20, 2020 · d5a8a37
1 parent ec0f854
commit d5a8a37
Showing 15 changed files with 100,691 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -31,6 +31,7 @@ This is a repository of all the tutorials of [The Python Code](https://www.thepy
     - ### [Natural Language Processing](https://www.thepythoncode.com/topic/nlp)
         - [How to Build a Spam Classifier using Keras in Python](https://www.thepythoncode.com/article/build-spam-classifier-keras-python). ([code](machine-learning/nlp/spam-classifier))
         - [How to Build a Text Generator using Keras in Python](https://www.thepythoncode.com/article/text-generation-keras-python). ([code](machine-learning/nlp/text-generator))
+        - [How to Perform Text Classification in Python using Tensorflow 2 and Keras](https://www.thepythoncode.com/article/text-classification-using-tensorflow-2-and-keras-in-python). ([code](machine-learning/nlp/text-classification))
     - ### [Computer Vision](https://www.thepythoncode.com/topic/computer-vision)
         - [How to Detect Human Faces in Python using OpenCV](https://www.thepythoncode.com/article/detect-faces-opencv-python). ([code](machine-learning/face_detection))
         - [How to Make an Image Classifier in Python using Keras](https://www.thepythoncode.com/article/image-classification-keras-python). ([code](machine-learning/image-classifier))

diff --git a/machine-learning/nlp/text-classification/20_news_group_classification.py b/machine-learning/nlp/text-classification/20_news_group_classification.py
@@ -0,0 +1,43 @@
+from tensorflow.keras.callbacks import TensorBoard
+
+import os
+
+from parameters import *
+from utils import create_model, load_20_newsgroup_data
+
+# create these folders if they do not exist
+if not os.path.isdir("results"):
+    os.mkdir("results")
+
+if not os.path.isdir("logs"):
+    os.mkdir("logs")
+
+if not os.path.isdir("data"):
+    os.mkdir("data")
+
+# dataset name: the 20 Newsgroups dataset
+dataset_name = "20_news_group"
+# get the unique model name based on the hyperparameters in parameters.py
+model_name = get_model_name(dataset_name)
+
+# load the data
+data = load_20_newsgroup_data(N_WORDS, SEQUENCE_LENGTH, TEST_SIZE, oov_token=OOV_TOKEN)
+
+model = create_model(data["tokenizer"].word_index, units=UNITS, n_layers=N_LAYERS, 
+                    cell=RNN_CELL, bidirectional=IS_BIDIRECTIONAL, embedding_size=EMBEDDING_SIZE, 
+                    sequence_length=SEQUENCE_LENGTH, dropout=DROPOUT, 
+                    loss=LOSS, optimizer=OPTIMIZER, output_length=data["y_train"][0].shape[0])
+
+model.summary()
+
+tensorboard = TensorBoard(log_dir=os.path.join("logs", model_name))
+
+history = model.fit(data["X_train"], data["y_train"],
+                    batch_size=BATCH_SIZE,
+                    epochs=EPOCHS,
+                    validation_data=(data["X_test"], data["y_test"]),
+                    callbacks=[tensorboard],
+                    verbose=1)
+
+
+model.save(os.path.join("results", model_name) + ".h5")
diff --git a/machine-learning/nlp/text-classification/README.md b/machine-learning/nlp/text-classification/README.md
@@ -0,0 +1,6 @@
+# [How to Perform Text Classification in Python using Tensorflow 2 and Keras](https://www.thepythoncode.com/article/text-classification-using-tensorflow-2-and-keras-in-python)
+To use this:
+- `pip3 install -r requirements.txt`
+- Please read [this tutorial](https://www.thepythoncode.com/article/text-classification-using-tensorflow-2-and-keras-in-python) before using this code.
+- Tweak the hyperparameters in `parameters.py` and train the model.
+- For testing, consider using `test.py`
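
The training script in this commit calls `get_model_name(dataset_name)` to derive a unique run name from the hyperparameters, and reuses that name for both the TensorBoard log directory and the saved `.h5` file. Its definition is not shown in this part of the diff, so the following is only a minimal sketch of how such a helper could work. The constant values, the `RNN_CELL_NAME` stand-in, and the name format are all assumptions for illustration, not the repository's actual implementation.

```python
# Hypothetical sketch: NOT the repository's actual get_model_name.
# The constant names mirror those used in the training script;
# every value below is an illustrative placeholder.
SEQUENCE_LENGTH = 300
EMBEDDING_SIZE = 300
N_WORDS = 10000
UNITS = 128
N_LAYERS = 1
DROPOUT = 0.4
BATCH_SIZE = 64
IS_BIDIRECTIONAL = False
RNN_CELL_NAME = "LSTM"  # assumed stand-in for something like RNN_CELL.__name__


def get_model_name(dataset_name: str) -> str:
    """Build a run name that changes whenever a hyperparameter changes."""
    parts = [
        dataset_name,
        RNN_CELL_NAME,
        f"seq-{SEQUENCE_LENGTH}",
        f"em-{EMBEDDING_SIZE}",
        f"w-{N_WORDS}",
        f"layers-{N_LAYERS}",
        f"units-{UNITS}",
        f"d-{DROPOUT}",
        f"bs-{BATCH_SIZE}",
    ]
    # Only tag bidirectional runs, so unidirectional names stay shorter.
    if IS_BIDIRECTIONAL:
        parts.append("bidirectional")
    return "-".join(parts)


model_name = get_model_name("20_news_group")
```

Encoding the hyperparameters in the name means that two runs with different settings write to different folders under `logs` and different files under `results`, so tweaking `parameters.py` does not overwrite an earlier experiment.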