This notebook was created by Jean de Dieu Nyandwi out of love for the machine learning community. For any feedback, errors, or suggestions, he can be reached by email (johnjw7084 at gmail dot com), Twitter, or LinkedIn.
Using Convolutional Neural Networks for Texts Classification¶
1. Intro to CNNs for Texts¶
Convolutional neural networks are remarkably well-suited architectures for images, and for computer vision tasks in general, because of their ability to extract features from images using filters.
Recent studies (Text Understanding from Scratch by Xiang Zhang & Yann LeCun, and Recurrent Convolutional Neural Networks for Text Classification by Siwei Lai, Liheng Xu, Kang Liu & Jun Zhao) have shown that CNNs can also perform well on texts, just as they do on images.
Quoting the LeCun paper:
- CNNs don't require knowledge of words, which means it is fine to pass characters to the network (previously, we have been using words).
- CNNs do not require knowledge of syntax or semantic structures.
- While RNNs can be expensive to run, CNNs are way cheaper.
In text applications, or for sequential data in general, we use Conv1D instead of Conv2D. The design of Conv1D is quite similar to Conv2D; the only difference is the dimension of the input. As with images, the network is built from convolution and pooling layers.
Below is a typical CNN-inspired design for texts. Source: Text Understanding from Scratch, Yann LeCun.
Here is an overview of the main parameters of Conv1D. If you have worked with CNNs before, nothing here will be new: filters specifies the number of filters, kernel_size is the size of each filter (remember it is one-dimensional), strides is the number of steps the filter takes as it moves over the input text, padding controls zero padding (valid for no padding, same for padding), and activation is the activation function to use.
tf.keras.layers.Conv1D(
filters,
kernel_size,
strides=1,
padding="valid",
activation=None,
)
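To make the shapes concrete, here is a minimal sketch (not part of the original notebook) that passes a dummy batch of embedded sequences through a Conv1D layer; the sizes are arbitrary choices for illustration.
import tensorflow as tf
# A dummy batch of 8 sequences, each 100 timesteps long with 64 features per step
dummy_embeddings = tf.random.normal((8, 100, 64))
conv = tf.keras.layers.Conv1D(filters=32, kernel_size=5, strides=1, padding='valid', activation='relu')
# With 'valid' padding the output length shrinks to 100 - 5 + 1 = 96
print(conv(dummy_embeddings).shape)   # (8, 96, 32)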
2. CNNs for Texts in Practice: News Classification¶
2.1 Getting the Data¶
In this practice, we will use a news dataset, ag_news_subset, which is available in TensorFlow Datasets.
TensorFlow Datasets is a collection of ready-to-use datasets that can be loaded right away with little preparation.
AG is a collection of more than 1 million news articles gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity.
The AG dataset contains 4 classes that are: World(0), Sports(1), Business(2), Sci/Tech(3). The total number of training samples is 120,000 and testing 7,600. Each class contains 30,000 training samples and 1,900 testing samples.
You can learn more about the dataset here, or read the original paper that used it to explore character-level convolutional networks (ConvNets) for text classification, by Xiang Zhang, Junbo Zhao, and Yann LeCun.
Let's get the dataset from TensorFlow datasets.
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
import numpy as np
import pandas as pd
The dataset that we are going to download has only one version so far. Let's specify it so that our lab will not be affected when it is updated. If loading the data fails, run the cell again; that happens sometimes when loading datasets from TensorFlow Datasets.
Originally, the training set contains 120,000 news samples, whereas the test set contains 7,600 samples. Let's move 10% of the training set into the validation split (together with the test set) so that we have a slightly larger validation set.
(train_data, val_data), info = tfds.load('ag_news_subset:1.0.0', #version 1.0.0
split=['train[:90%]', 'train[90%:]+test'],
with_info=True,
as_supervised=True
)
We can display the info that we loaded with the dataset.
print(info)
tfds.core.DatasetInfo( name='ag_news_subset', version=1.0.0, description='AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004. The dataset is provided by the academic comunity for research purposes in data mining (clustering, classification, etc), information retrieval (ranking, search, etc), xml, data compression, data streaming, and any other non-commercial activity. For more information, please refer to the link http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html . The AG's news topic classification dataset is constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the dataset above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015). The AG's news topic classification dataset is constructed by choosing 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples. The total number of training samples is 120,000 and testing 7,600.', homepage='https://arxiv.org/abs/1509.01626', features=FeaturesDict({ 'description': Text(shape=(), dtype=tf.string), 'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=4), 'title': Text(shape=(), dtype=tf.string), }), total_num_examples=127600, splits={ 'test': 7600, 'train': 120000, }, supervised_keys=('description', 'label'), citation="""@misc{zhang2015characterlevel, title={Character-level Convolutional Networks for Text Classification}, author={Xiang Zhang and Junbo Zhao and Yann LeCun}, year={2015}, eprint={1509.01626}, archivePrefix={arXiv}, primaryClass={cs.LG} }""", redistribution_info=, )
# Displaying the classes
class_names = info.features['label'].names
num_classes = info.features['label'].num_classes
print(f'The news are grouped into {num_classes} classes that are :{class_names}')
The news are grouped into 4 classes that are :['World', 'Sports', 'Business', 'Sci/Tech']
num_train = info.splits['train'].num_examples
num_val = info.splits['test'].num_examples # note: our actual validation split above also includes the last 10% of the training set
print(f'The number of training samples: {num_train} \nThe number of validation samples: {num_val}')
The number of training samples: 120000 The number of validation samples: 7600
We can also display the first 10 news samples. We can use tfds.as_dataframe to display them as a dataframe.
news_df = tfds.as_dataframe(train_data.take(10), info)
news_df.head(10)
| | description | label |
|---|---|---|
| 0 | b'AMD #39;s new dual-core Opteron chip is desi... | 3 |
| 1 | b'Reuters - Major League Baseball\\Monday anno... | 1 |
| 2 | b'President Bush #39;s quot;revenue-neutral q... | 2 |
| 3 | b'Britain will run out of leading scientists u... | 3 |
| 4 | b'London, England (Sports Network) - England m... | 1 |
| 5 | b'TOKYO - Sony Corp. is banking on the \\$3 bi... | 0 |
| 6 | b'Giant pandas may well prefer bamboo to lapto... | 3 |
| 7 | b'VILNIUS, Lithuania - Lithuania #39;s main pa... | 0 |
| 8 | b'Witnesses in the trial of a US soldier charg... | 0 |
| 9 | b'Dan Olsen of Ponte Vedra Beach, Fla., shot a... | 1 |
Let's display some news samples in full.
for i in range(0, 4):
    print(f"Sample news {i}\n \
    Label: {news_df['label'][i]} {class_names[news_df['label'][i]]}\n \
    Description: {news_df['description'][i]}\n----------\n")
Sample news 0
 Label: 3 Sci/Tech
 Description: b'AMD #39;s new dual-core Opteron chip is designed mainly for corporate computing applications, including databases, Web services, and financial transactions.'
----------

Sample news 1
 Label: 1 Sports
 Description: b'Reuters - Major League Baseball\\Monday announced a decision on the appeal filed by Chicago Cubs\\pitcher Kerry Wood regarding a suspension stemming from an\\incident earlier this season.'
----------

Sample news 2
 Label: 2 Business
 Description: b'President Bush #39;s quot;revenue-neutral quot; tax reform needs losers to balance its winners, and people claiming the federal deduction for state and local taxes may be in administration planners #39; sights, news reports say.'
----------

Sample news 3
 Label: 3 Sci/Tech
 Description: b'Britain will run out of leading scientists unless science education is improved, says Professor Colin Pillinger.'
----------
2.2 Preparing the Data¶
We cannot feed the raw texts that we loaded from TensorFlow Datasets directly to the model. We have to do some preparation work.
One of the major things we have to do is to vectorize the texts, i.e. convert them into numeric tokens.
We will use the TextVectorization layer, but before we get there, let's first shuffle and batch the training data.
For the validation data, we don't shuffle; we only batch it.
buffer_size = 1000
batch_size = 32
train_data = train_data.shuffle(buffer_size)
train_data = train_data.batch(batch_size).prefetch(1)
val_data = val_data.batch(batch_size).prefetch(1)
for news, label in train_data.take(1):
    print(f'Sample news\n----\n {news.numpy()[:4]} \n----\nCorresponding labels: {label.numpy()[:4]}')
Sample news ---- [b'About seven or eight years ago, Harry Shatel had that conversation with Charlie Weis about the future. Shatel, the Morristown High School baseball coach, doesn #39;t remember the exact date ' b"AFP - Australia's foreign minister will pay a rare visit to North Korea this week for talks on its nuclear programme after creating a stir here by warning a North Korean missile would be able to hit Sydney." b' quot;The Electoral Commission set the date of January 30 as the date of the election, quot; spokesman Farid Ayar told Reuters on Sunday.' b'That photo above is no longer accurate. I #39;ve shaven my head to protest the firing of ... Kevin from The Apprentice. Dude has a bachelor #39;s from Penn, an MBA from Emory and is working on a law degree from the '] ---- Corresponding labels: [1 0 0 1]
As you can see from the above, the training data is in batches of descriptions and their corresponding labels.
Now, we can use the Keras TextVectorization layer to handle all the required text preprocessing. It converts the texts into tokens, turns them into sequences, and pads the sequences. It also removes punctuation and lowercases the text.
That's all it does by default.
max_features = 10000
text_vectorizer = tf.keras.layers.TextVectorization(max_tokens=max_features)
After creating the layer, we can use adapt to pass the dataset through it. Notice that we use a lambda function to separate the description from the label.
text_vectorizer.adapt(train_data.map(lambda description, label : description))
We can get the vocabulary. The vocabulary is the list of unique tokens that the layer learned from the training texts.
vocab = text_vectorizer.get_vocabulary()
vocab[:10]
['', '[UNK]', 'the', 'a', 'to', 'of', 'in', 'and', 'on', 'for']
Let's pass some new sentences to text_vectorizer. The vectorized sequences will be padded to the length of the longest sentence in the batch, but if you want a fixed size, you can set output_sequence_length to any value when initializing the layer.
sample_news = ['This weekend there is a sport match between Man U and Fc Barcelona',
'Tesla has unveiled its humanoid robot that appeared dancing during the show!']
vectorized_news = text_vectorizer(sample_news)
vectorized_news.numpy()
array([[ 40, 494, 186, 16, 3, 1567, 570, 159, 370, 1, 7, 7486, 2556], [ 1, 20, 876, 13, 1, 4845, 10, 1273, 1, 160, 2, 532, 0]])
If you look at the tensors above, the second sentence was padded with 0. Also, the words Tesla and humanoid have an index of 1 because they were not part of the training data (that we adapted the text_vectorizer to).
This means that index 1 is reserved for all words that are new to the layer. In the vocabulary, these kinds of words are replaced by [UNK].
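If you want every vectorized sentence to have the same length, here is a minimal sketch assuming a fixed length of 20 tokens (the value and the fixed_len_vectorizer name are arbitrary choices for illustration):
# Hypothetical second vectorizer with a fixed output length (illustration only)
fixed_len_vectorizer = tf.keras.layers.TextVectorization(max_tokens=max_features, output_sequence_length=20)
fixed_len_vectorizer.adapt(train_data.map(lambda description, label: description))
# Shorter sentences are padded with 0, longer ones are truncated
print(fixed_len_vectorizer(sample_news).shape)   # (2, 20)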
We will use text_vectorizer as part of the model.
2.3 Creating and Training the Model¶
We are going to create a Keras Sequential model that takes text as input and outputs the class of the input text.
The model is going to be made of the following layers:
- A TextVectorization layer for text preprocessing.
- An Embedding layer that represents the tokens as trainable feature vectors in a high-dimensional space. Because the feature vectors are trainable, words that have similar semantic meaning will end up with similar vectors (pointing in similar directions) after training.
- Conv1D layers for processing the sequences. Yes, ConvNets can be used for text processing. We will start off with only ConvNets, and later use RNNs as well.
- Dense layers for classification. The final Dense layer converts the feature vector into 4 class probabilities, one per news category.
input_dim = len(text_vectorizer.get_vocabulary())
input_dim
10000
model = tf.keras.Sequential([
text_vectorizer,
tf.keras.layers.Embedding(input_dim=input_dim, output_dim=64, mask_zero=True),
tf.keras.layers.Conv1D(64, 5, activation='relu'),
tf.keras.layers.MaxPooling1D(),
tf.keras.layers.Conv1D(64, 5, activation='relu'),
tf.keras.layers.GlobalMaxPool1D(),
tf.keras.layers.Dense(32, activation='relu'),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Dense(4, activation='softmax')
])
Getting the model summary
model.summary()
Model: "sequential_7" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= text_vectorization (TextVect (None, None) 0 _________________________________________________________________ embedding_7 (Embedding) (None, None, 64) 640000 _________________________________________________________________ conv1d_8 (Conv1D) (None, None, 64) 20544 _________________________________________________________________ max_pooling1d_4 (MaxPooling1 (None, None, 64) 0 _________________________________________________________________ conv1d_9 (Conv1D) (None, None, 64) 20544 _________________________________________________________________ global_max_pooling1d_1 (Glob (None, 64) 0 _________________________________________________________________ dense_14 (Dense) (None, 32) 2080 _________________________________________________________________ dropout_7 (Dropout) (None, 32) 0 _________________________________________________________________ dense_15 (Dense) (None, 4) 132 ================================================================= Total params: 683,300 Trainable params: 683,300 Non-trainable params: 0 _________________________________________________________________
We can also plot the model with Keras utils' plot_model.
from tensorflow.keras.utils import plot_model
plot_model(model)
# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
batch_size = 32
# Note: train_data is already batched, so len(train_data) is the number of batches (~3375),
# and dividing by batch_size again means only ~105 of those batches are used per epoch.
train_steps = int(len(train_data)/batch_size)
val_steps = int(len(val_data)/batch_size)
# Train the model
history = model.fit(train_data,
epochs=25,
validation_data=val_data,
steps_per_epoch=train_steps,
validation_steps=val_steps
)
Epoch 1/25 105/105 [==============================] - 3s 23ms/step - loss: 1.3313 - accuracy: 0.3470 - val_loss: 1.0612 - val_accuracy: 0.4753 Epoch 2/25 105/105 [==============================] - 2s 22ms/step - loss: 0.9455 - accuracy: 0.5565 - val_loss: 0.7538 - val_accuracy: 0.7368 Epoch 3/25 105/105 [==============================] - 2s 21ms/step - loss: 0.6512 - accuracy: 0.7580 - val_loss: 0.4937 - val_accuracy: 0.8240 Epoch 4/25 105/105 [==============================] - 2s 21ms/step - loss: 0.4899 - accuracy: 0.8339 - val_loss: 0.4605 - val_accuracy: 0.8339 Epoch 5/25 105/105 [==============================] - 2s 21ms/step - loss: 0.4775 - accuracy: 0.8318 - val_loss: 0.4126 - val_accuracy: 0.8586 Epoch 6/25 105/105 [==============================] - 2s 21ms/step - loss: 0.4359 - accuracy: 0.8518 - val_loss: 0.3818 - val_accuracy: 0.8783 Epoch 7/25 105/105 [==============================] - 2s 22ms/step - loss: 0.4106 - accuracy: 0.8717 - val_loss: 0.3710 - val_accuracy: 0.8799 Epoch 8/25 105/105 [==============================] - 2s 21ms/step - loss: 0.4174 - accuracy: 0.8679 - val_loss: 0.3561 - val_accuracy: 0.8980 Epoch 9/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3927 - accuracy: 0.8762 - val_loss: 0.3509 - val_accuracy: 0.8964 Epoch 10/25 105/105 [==============================] - 2s 21ms/step - loss: 0.4001 - accuracy: 0.8687 - val_loss: 0.3444 - val_accuracy: 0.8865 Epoch 11/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3778 - accuracy: 0.8815 - val_loss: 0.3540 - val_accuracy: 0.8914 Epoch 12/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3796 - accuracy: 0.8815 - val_loss: 0.3159 - val_accuracy: 0.8997 Epoch 13/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3678 - accuracy: 0.8783 - val_loss: 0.3173 - val_accuracy: 0.9013 Epoch 14/25 105/105 [==============================] - 2s 22ms/step - loss: 0.3780 - accuracy: 0.8804 - val_loss: 0.3228 - val_accuracy: 0.9013 Epoch 15/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3827 - accuracy: 0.8815 - val_loss: 0.2875 - val_accuracy: 0.8997 Epoch 16/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3578 - accuracy: 0.8857 - val_loss: 0.2855 - val_accuracy: 0.9161 Epoch 17/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3807 - accuracy: 0.8789 - val_loss: 0.2834 - val_accuracy: 0.9128 Epoch 18/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3572 - accuracy: 0.8875 - val_loss: 0.3027 - val_accuracy: 0.9013 Epoch 19/25 105/105 [==============================] - 2s 22ms/step - loss: 0.3630 - accuracy: 0.8815 - val_loss: 0.2976 - val_accuracy: 0.9030 Epoch 20/25 105/105 [==============================] - 2s 22ms/step - loss: 0.3646 - accuracy: 0.8851 - val_loss: 0.2832 - val_accuracy: 0.9112 Epoch 21/25 105/105 [==============================] - 2s 22ms/step - loss: 0.3542 - accuracy: 0.8908 - val_loss: 0.2755 - val_accuracy: 0.9145 Epoch 22/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3363 - accuracy: 0.8884 - val_loss: 0.2807 - val_accuracy: 0.9145 Epoch 23/25 105/105 [==============================] - 2s 21ms/step - loss: 0.2961 - accuracy: 0.9080 - val_loss: 0.2766 - val_accuracy: 0.9178 Epoch 24/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3437 - accuracy: 0.8911 - val_loss: 0.2758 - val_accuracy: 0.9145 Epoch 25/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3452 - accuracy: 0.8908 - 
val_loss: 0.2707 - val_accuracy: 0.9145
2.4 Visualizing the Results¶
It is always good practice to plot the model results. When doing error analysis, plotting the learning curves becomes helpful.
Let's plot the loss and accuracy over the course of the epochs.
import matplotlib.pyplot as plt
# function to plot accuracy and loss
def plot_acc_loss(history):
    model_history = history.history
    acc = model_history['accuracy']
    val_acc = model_history['val_accuracy']
    loss = model_history['loss']
    val_loss = model_history['val_loss']
    epochs = history.epoch

    plt.figure(figsize=(10,5))
    plt.plot(epochs, acc, 'r', label='Training Accuracy')
    plt.plot(epochs, val_acc, 'g', label='Validation Accuracy')
    plt.title('Training and Validation Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend(loc=0)

    # Create a new figure for the loss curves
    plt.figure(figsize=(10,5))
    plt.plot(epochs, loss, 'b', label='Training Loss')
    plt.plot(epochs, val_loss, 'y', label='Validation Loss')
    plt.title('Training and Validation Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend(loc=0)

    plt.show()
plot_acc_loss(history)
That's not bad at all. Let's combine CNNs and LSTMs to see if we can improve the results further.
3. Combining CNNs and RNNs for Text Classification¶
3.1 Convnets and RNNs Model¶
Just like CNNs extract features from images, they are also able to learn hierarchical representations of words, phrases, and sentences in order to understand text (Yann LeCun, 2016). The downside of using CNNs alone is that they don't maintain the order of the timesteps.
We can also introduce Recurrent Neural Networks (LSTMs specifically) to give the network sequence-handling capability. Conv1D can extract meaningful features from the input sentences, and the RNN can preserve the order of the sequence contained in the output features from the CNN.
I am only going to add one bidirectional LSTM layer to process the CNN features from both directions.
conv_rnn_model = tf.keras.Sequential([
text_vectorizer,
tf.keras.layers.Embedding(input_dim=input_dim, output_dim=64, mask_zero=True),
tf.keras.layers.Conv1D(64, 5, activation='relu'),
tf.keras.layers.MaxPooling1D(),
tf.keras.layers.Conv1D(64, 5, activation='relu'),
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
tf.keras.layers.Dense(32, activation='relu'),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Dense(4, activation='softmax')
])
conv_rnn_model.summary()
Model: "sequential_9" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= text_vectorization (TextVect (None, None) 0 _________________________________________________________________ embedding_9 (Embedding) (None, None, 64) 640000 _________________________________________________________________ conv1d_12 (Conv1D) (None, None, 64) 20544 _________________________________________________________________ max_pooling1d_6 (MaxPooling1 (None, None, 64) 0 _________________________________________________________________ conv1d_13 (Conv1D) (None, None, 64) 20544 _________________________________________________________________ bidirectional_1 (Bidirection (None, 128) 66048 _________________________________________________________________ dense_18 (Dense) (None, 32) 4128 _________________________________________________________________ dropout_9 (Dropout) (None, 32) 0 _________________________________________________________________ dense_19 (Dense) (None, 4) 132 ================================================================= Total params: 751,396 Trainable params: 751,396 Non-trainable params: 0 _________________________________________________________________
# Compile the model
conv_rnn_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
batch_size = 32
train_steps = int(len(train_data)/batch_size)
val_steps = int(len(val_data)/batch_size)
# Train the model
history = conv_rnn_model.fit(train_data,
epochs=25,
validation_data=val_data,
steps_per_epoch=train_steps,
validation_steps=val_steps
)
Epoch 1/25 105/105 [==============================] - 9s 53ms/step - loss: 1.3051 - accuracy: 0.3119 - val_loss: 0.9660 - val_accuracy: 0.5164 Epoch 2/25 105/105 [==============================] - 5s 47ms/step - loss: 0.8398 - accuracy: 0.5753 - val_loss: 0.6016 - val_accuracy: 0.7122 Epoch 3/25 105/105 [==============================] - 5s 45ms/step - loss: 0.6768 - accuracy: 0.6732 - val_loss: 0.5675 - val_accuracy: 0.7023 Epoch 4/25 105/105 [==============================] - 5s 45ms/step - loss: 0.6061 - accuracy: 0.7125 - val_loss: 0.4932 - val_accuracy: 0.8092 Epoch 5/25 105/105 [==============================] - 5s 45ms/step - loss: 0.5031 - accuracy: 0.8176 - val_loss: 0.3908 - val_accuracy: 0.8618 Epoch 6/25 105/105 [==============================] - 5s 45ms/step - loss: 0.4422 - accuracy: 0.8530 - val_loss: 0.3892 - val_accuracy: 0.8651 Epoch 7/25 105/105 [==============================] - 5s 46ms/step - loss: 0.4099 - accuracy: 0.8682 - val_loss: 0.3633 - val_accuracy: 0.8766 Epoch 8/25 105/105 [==============================] - 5s 46ms/step - loss: 0.3913 - accuracy: 0.8723 - val_loss: 0.3623 - val_accuracy: 0.8931 Epoch 9/25 105/105 [==============================] - 5s 44ms/step - loss: 0.4082 - accuracy: 0.8729 - val_loss: 0.3184 - val_accuracy: 0.8914 Epoch 10/25 105/105 [==============================] - 5s 46ms/step - loss: 0.3846 - accuracy: 0.8774 - val_loss: 0.3063 - val_accuracy: 0.9062 Epoch 11/25 105/105 [==============================] - 5s 45ms/step - loss: 0.3521 - accuracy: 0.8893 - val_loss: 0.3091 - val_accuracy: 0.8947 Epoch 12/25 105/105 [==============================] - 5s 45ms/step - loss: 0.3430 - accuracy: 0.8860 - val_loss: 0.3341 - val_accuracy: 0.8832 Epoch 13/25 105/105 [==============================] - 5s 47ms/step - loss: 0.3542 - accuracy: 0.8824 - val_loss: 0.3040 - val_accuracy: 0.9013 Epoch 14/25 105/105 [==============================] - 5s 47ms/step - loss: 0.3754 - accuracy: 0.8783 - val_loss: 0.2929 - val_accuracy: 0.8931 Epoch 15/25 105/105 [==============================] - 5s 44ms/step - loss: 0.3500 - accuracy: 0.8911 - val_loss: 0.2831 - val_accuracy: 0.8997 Epoch 16/25 105/105 [==============================] - 5s 45ms/step - loss: 0.3486 - accuracy: 0.8872 - val_loss: 0.2908 - val_accuracy: 0.9079 Epoch 17/25 105/105 [==============================] - 5s 48ms/step - loss: 0.3634 - accuracy: 0.8869 - val_loss: 0.2785 - val_accuracy: 0.9013 Epoch 18/25 105/105 [==============================] - 5s 45ms/step - loss: 0.3377 - accuracy: 0.8923 - val_loss: 0.2645 - val_accuracy: 0.9095 Epoch 19/25 105/105 [==============================] - 5s 46ms/step - loss: 0.3374 - accuracy: 0.8881 - val_loss: 0.2703 - val_accuracy: 0.9145 Epoch 20/25 105/105 [==============================] - 5s 46ms/step - loss: 0.3429 - accuracy: 0.8872 - val_loss: 0.2572 - val_accuracy: 0.9095 Epoch 21/25 105/105 [==============================] - 5s 45ms/step - loss: 0.3282 - accuracy: 0.8955 - val_loss: 0.2549 - val_accuracy: 0.9030 Epoch 22/25 105/105 [==============================] - 5s 45ms/step - loss: 0.3187 - accuracy: 0.8914 - val_loss: 0.2502 - val_accuracy: 0.9095 Epoch 23/25 105/105 [==============================] - 5s 45ms/step - loss: 0.3061 - accuracy: 0.8970 - val_loss: 0.2420 - val_accuracy: 0.9161 Epoch 24/25 105/105 [==============================] - 5s 47ms/step - loss: 0.3204 - accuracy: 0.8988 - val_loss: 0.2544 - val_accuracy: 0.9128 Epoch 25/25 105/105 [==============================] - 5s 46ms/step - loss: 0.3244 - accuracy: 0.8973 - 
val_loss: 0.2571 - val_accuracy: 0.9062
3.2 Visualizing the Results¶
Plotting the model results...
plot_acc_loss(history)
There is not much difference from the former model, but the results we get by combining ConvNets and RNNs are way better than using RNNs only. In addition, this model is cheaper to train.
For now, let's test our latest network on new texts.
3.3 Performing Inference on New Texts¶
def predict(model, sample_news, class_names):
    # Convert sample news into an array
    sample_news = np.array(sample_news)
    # Predict the news type
    preds = model.predict(sample_news)
    pred_class = np.argmax(preds[0])
    print(f'predicted class: {pred_class} \nPredicted Class name: {class_names[pred_class]}')
sample_news = ['Tesla, a self driving car company is also planning to make a humanoid robot. This humanoid robot appeared dancing in the latest Tesla AI day']
predict(conv_rnn_model, sample_news, class_names)
predicted class: 3 Predicted Class name: Sci/Tech
sample_news = ["In the last weeks, there has been many transfer suprises in footbal. Ronaldo went back to Old Trafford, "
"while Messi went to Paris Saint Germain to join his former colleague Neymar."
"We can't wait to see these two clubs will perform in upcoming leagues"]
predict(conv_rnn_model, sample_news, class_names)
predicted class: 1 Predicted Class name: Sports
sample_news = ["In the latest business news: The tech giant NVIDIA has acquired ARM, a microproccessor company"]
predict(conv_rnn_model, sample_news, class_names)
predicted class: 2 Predicted Class name: Business
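If you want to classify several news items in one call, you can predict on a whole batch and take the argmax of each row. Below is a minimal sketch; the headlines are made-up examples, not taken from the dataset.
# Made-up headlines for illustration only
batch_news = np.array(['The central bank raised interest rates to curb inflation',
                       'The national team won the championship final in extra time',
                       'Researchers unveiled a new quantum processor prototype'])
probs = conv_rnn_model.predict(batch_news)            # shape: (3, 4) class probabilities
for text, pred in zip(batch_news, np.argmax(probs, axis=1)):
    print(f'{class_names[pred]}: {text}')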
4. Further Learning¶
This notebook has been about using Convnets for text classification. We also brought RNNs into the picture.
If you would like to learn more about sequence models, I recommend the following courses:
- Deep Learning Specialization - Course 5 on Sequence Models. This course is available on Coursera and YouTube.
- MIT Intro to Deep Learning - Lecture 2 on Recurrent Neural Networks, available on YouTube.
For ConvNets and RNNs, I recommend the following two papers: