Can not find a better model for this NLP Problem. Any help? #430

MiguelCanalGarcia · 2022-08-16T12:56:51Z

MiguelCanalGarcia
Aug 16, 2022

Hello, I am trying to use the imdb reviews dataset to get a 0.9 accuracy on the test set. However I am not able to think a about a model capable of that. Could someone help me?

LOADING DATA

import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np

(train, test), metadata = tfds.load('imdb_reviews',
split=['train', 'test'],
as_supervised=True,
with_info=True)

PREPARING DATA

training_sentences = []
training_labels = []

testing_sentences = []
testing_labels = []

for s, l in train:
training_sentences.append(s.numpy().decode('utf8'))
training_labels.append(l.numpy())

for s, l in test:
testing_sentences.append(s.numpy().decode('utf8'))
testing_labels.append(l.numpy())

training_labels_final = np.array(training_labels)
testing_labels_final = np.array(testing_labels)

TOKENIZATION

vocab_size = 1000
embedding_dim = 12
max_len = 120
trun_type = 'post'
oov_tok = ''

tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=vocab_size,
oov_token=oov_tok)

tokenizer.fit_on_texts(training_sentences)

sequences = tokenizer.texts_to_sequences(training_sentences)

padded_seq = tf.keras.preprocessing.sequence.pad_sequences(sequences,
maxlen=max_len,
truncating=trun_type)

test_sequences = tokenizer.texts_to_sequences(testing_sentences)
test_padded_seq = tf.keras.preprocessing.sequence.pad_sequences(test_sequences,
maxlen=max_len,
truncating=trun_type)

MODEL

model_3 = tf.keras.Sequential([
tf.keras.layers.InputLayer(input_shape=(120,)),
tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_len ),
tf.keras.layers.Conv1D(filters=36, kernel_size=5, padding='same', activation='relu'),

tf.keras.layers.MaxPooling1D(),
tf.keras.layers.Bidirectional(tf.keras.layers.GRU(36, return_sequences=True)),
tf.keras.layers.LSTM(36),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(36,activation = 'relu'),
tf.keras.layers.Dense(1, activation='sigmoid')

])

model_3.compile(
loss='binary_crossentropy',
optimizer=tf.keras.optimizers.Adam(),
metrics=['accuracy']
)

history = model_3.fit(
padded_seq, training_labels_final,
validation_data=(test_padded_seq, testing_labels_final),
epochs=10,
)

hazam328 · 2022-08-18T07:20:51Z

hazam328
Aug 18, 2022

what accuracy you get when you build this model, and evaluate on test dataset?

Because if your training loss and validation loss isn't near to each other there is definitely overfitting or underfitting your model or if loss is okay but accuracy isn't as good then you must add more layers or tune some units of layers that must help as overall it helps for many people.
I recommend you should also try pre-trained model from tensorflow hub as it works extremely perfect on most NLP problems

1 reply

MiguelCanalGarcia Aug 18, 2022
Author

Hello @hazam328, I have added this preprocesion cleaning and the accuracy has risen to 0.84 more or less.

**def clean_up(review):
clean = re.sub('[^A-Za-z0-9 ]', " ", review)
# replace multiple space by a single
clean = re.sub(' +', ' ', clean)

word_tokens = clean.lower().split()

# 4. Remove stopwords
le = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))
stop_words.add("co")
stop_words.add("http")
word_tokens = [le.lemmatize(w) for w in word_tokens if not w in stop_words]

cleaned = " ".join(word_tokens)




return cleaned**

I am not using thensorflow hub because I dont know if I can use it in the tensorflow exam, did you use it ? Also, have you got a better clean_up function? Thanks!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Can not find a better model for this NLP Problem. Any help? #430

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Select a reply

Uh oh!

Can not find a better model for this NLP Problem. Any help? #430

Uh oh!

MiguelCanalGarcia Aug 16, 2022

Replies: 1 comment · 1 reply

Uh oh!

hazam328 Aug 18, 2022

Uh oh!

Uh oh!

MiguelCanalGarcia Aug 18, 2022 Author

MiguelCanalGarcia
Aug 16, 2022

Replies: 1 comment 1 reply

hazam328
Aug 18, 2022

MiguelCanalGarcia Aug 18, 2022
Author