This notebook was created by Jean de Dieu Nyandwi out of love for the machine learning community. For any feedback, errors, or suggestions, he can be reached by email (johnjw7084 at gmail dot com), Twitter, or LinkedIn.
Using Convolutional Neural Networks for Texts Classification¶
1. Intro to CNNs for Texts¶
Convolutional neural networks are remarkably well-suited architectures for images, and for computer vision tasks in general, because of their ability to extract features from images using filters.
Recent studies (Text Understanding from Scratch by Xiang Zhang & Yann LeCun, and Recurrent Convolutional Neural Networks for Text Classification by Siwei Lai, Liheng Xu, Kang Liu & Jun Zhao) have shown that CNNs can also perform well on texts, just as they do on images.
Quoting the LeCun paper:
- CNNs don't require knowledge of words, which means it is fine to pass characters to the network (previously, we have been using words).
- CNNs do not require knowledge of syntax or semantic structures.
- While RNNs can be expensive to run, CNNs are way cheaper.
In text applications, or for sequential data in general, we use Conv1D instead of Conv2D. The design of Conv1D is quite similar to Conv2D; the only difference is the dimension of the input. As with images, the network is built from convolution and pooling layers.
Below is a typical CNN-inspired design for texts. Source: Text Understanding from Scratch, Yann LeCun.
Here is an overview of the main parameters of Conv1D. If you have worked with CNNs before, nothing here will be new: filters specifies the number of filters, kernel_size is the size of each filter (remember it is one-dimensional), strides is the number of steps the filter takes as it moves over the input text, padding controls zero padding (valid for no padding, same for padding), and activation is the activation function to use.
tf.keras.layers.Conv1D(
filters,
kernel_size,
strides=1,
padding="valid",
activation=None,
)
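To make the shapes concrete, here is a minimal sketch (not part of the original notebook) that passes a dummy batch of embedded sequences through a Conv1D layer; the sizes are arbitrary choices for illustration.
import tensorflow as tf
# A dummy batch of 8 sequences, each 100 timesteps long with 64 features per step
dummy_embeddings = tf.random.normal((8, 100, 64))
conv = tf.keras.layers.Conv1D(filters=32, kernel_size=5, strides=1, padding='valid', activation='relu')
# With 'valid' padding the output length shrinks to 100 - 5 + 1 = 96
print(conv(dummy_embeddings).shape)   # (8, 96, 32)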
2. CNNs for Texts in Practice: News Classification¶
2.1 Getting the Data¶
In this practice, we will use a news dataset, ag_news_subset, which is available in TensorFlow Datasets.
TensorFlow Datasets is a collection of ready-to-use datasets that can be loaded right away with little preparation.
AG is a collection of more than 1 million news articles gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity.
The AG dataset contains 4 classes that are: World(0), Sports(1), Business(2), Sci/Tech(3). The total number of training samples is 120,000 and testing 7,600. Each class contains 30,000 training samples and 1,900 testing samples.
You can learn more about the dataset here, or read the original paper that used it to explore character-level convolutional networks (ConvNets) for text classification, by Xiang Zhang, Junbo Zhao, and Yann LeCun.
Let's get the dataset from TensorFlow datasets.
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
import numpy as np
import pandas as pd
The dataset that we are going to download has only one version so far. Let's specify it so that our lab will not be affected when it is updated. If loading the data fails, run the cell again; that happens sometimes when loading datasets from TensorFlow Datasets.
Originally, the training set contains 120,000 news samples, whereas the test set contains 7,600 samples. Let's move 10% of the training set into the validation split (together with the test set) so that we have a slightly larger validation set.
(train_data, val_data), info = tfds.load('ag_news_subset:1.0.0', #version 1.0.0
split=['train[:90%]', 'train[90%:]+test'],
with_info=True,
as_supervised=True
)
We can display the info that we loaded with the dataset.
print(info)
tfds.core.DatasetInfo( name='ag_news_subset', version=1.0.0, description='AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004. The dataset is provided by the academic comunity for research purposes in data mining (clustering, classification, etc), information retrieval (ranking, search, etc), xml, data compression, data streaming, and any other non-commercial activity. For more information, please refer to the link http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html . The AG's news topic classification dataset is constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the dataset above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015). The AG's news topic classification dataset is constructed by choosing 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples. The total number of training samples is 120,000 and testing 7,600.', homepage='https://arxiv.org/abs/1509.01626', features=FeaturesDict({ 'description': Text(shape=(), dtype=tf.string), 'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=4), 'title': Text(shape=(), dtype=tf.string), }), total_num_examples=127600, splits={ 'test': 7600, 'train': 120000, }, supervised_keys=('description', 'label'), citation="""@misc{zhang2015characterlevel, title={Character-level Convolutional Networks for Text Classification}, author={Xiang Zhang and Junbo Zhao and Yann LeCun}, year={2015}, eprint={1509.01626}, archivePrefix={arXiv}, primaryClass={cs.LG} }""", redistribution_info=, )
# Displaying the classes
class_names = info.features['label'].names
num_classes = info.features['label'].num_classes
print(f'The news are grouped into {num_classes} classes that are :{class_names}')
The news are grouped into 4 classes that are :['World', 'Sports', 'Business', 'Sci/Tech']
num_train = info.splits['train'].num_examples
num_val = info.splits['test'].num_examples # note: our actual validation split above also includes the last 10% of the training set
print(f'The number of training samples: {num_train} \nThe number of validation samples: {num_val}')
The number of training samples: 120000 The number of validation samples: 7600
We can also display the first 10 news samples. We can use tfds.as_dataframe to display them as a dataframe.
news_df = tfds.as_dataframe(train_data.take(10), info)
news_df.head(10)
| | description | label |
|---|---|---|
| 0 | b'AMD #39;s new dual-core Opteron chip is desi... | 3 |
| 1 | b'Reuters - Major League Baseball\\Monday anno... | 1 |
| 2 | b'President Bush #39;s quot;revenue-neutral q... | 2 |
| 3 | b'Britain will run out of leading scientists u... | 3 |
| 4 | b'London, England (Sports Network) - England m... | 1 |
| 5 | b'TOKYO - Sony Corp. is banking on the \\$3 bi... | 0 |
| 6 | b'Giant pandas may well prefer bamboo to lapto... | 3 |
| 7 | b'VILNIUS, Lithuania - Lithuania #39;s main pa... | 0 |
| 8 | b'Witnesses in the trial of a US soldier charg... | 0 |
| 9 | b'Dan Olsen of Ponte Vedra Beach, Fla., shot a... | 1 |
Let's display some news samples in full.
for i in range(0, 4):
    print(f"Sample news {i}\n \
    Label: {news_df['label'][i]} {class_names[news_df['label'][i]]}\n \
    Description: {news_df['description'][i]}\n----------\n")
Sample news 0
 Label: 3 Sci/Tech
 Description: b'AMD #39;s new dual-core Opteron chip is designed mainly for corporate computing applications, including databases, Web services, and financial transactions.'
----------

Sample news 1
 Label: 1 Sports
 Description: b'Reuters - Major League Baseball\\Monday announced a decision on the appeal filed by Chicago Cubs\\pitcher Kerry Wood regarding a suspension stemming from an\\incident earlier this season.'
----------

Sample news 2
 Label: 2 Business
 Description: b'President Bush #39;s quot;revenue-neutral quot; tax reform needs losers to balance its winners, and people claiming the federal deduction for state and local taxes may be in administration planners #39; sights, news reports say.'
----------

Sample news 3
 Label: 3 Sci/Tech
 Description: b'Britain will run out of leading scientists unless science education is improved, says Professor Colin Pillinger.'
----------
2.2 Preparing the Data¶
We cannot feed the raw texts that we loaded from TensorFlow Datasets directly to the model. We have to do some preparation work.
One of the major things we have to do is to vectorize the texts, i.e. convert them into numeric tokens.
We will use the TextVectorization layer, but before we get there, let's first shuffle and batch the training data.
For the validation data, we don't shuffle; we only batch it.
buffer_size = 1000
batch_size = 32
train_data = train_data.shuffle(buffer_size)
train_data = train_data.batch(batch_size).prefetch(1)
val_data = val_data.batch(batch_size).prefetch(1)
for news, label in train_data.take(1):
    print(f'Sample news\n----\n {news.numpy()[:4]} \n----\nCorresponding labels: {label.numpy()[:4]}')
Sample news ---- [b'About seven or eight years ago, Harry Shatel had that conversation with Charlie Weis about the future. Shatel, the Morristown High School baseball coach, doesn #39;t remember the exact date ' b"AFP - Australia's foreign minister will pay a rare visit to North Korea this week for talks on its nuclear programme after creating a stir here by warning a North Korean missile would be able to hit Sydney." b' quot;The Electoral Commission set the date of January 30 as the date of the election, quot; spokesman Farid Ayar told Reuters on Sunday.' b'That photo above is no longer accurate. I #39;ve shaven my head to protest the firing of ... Kevin from The Apprentice. Dude has a bachelor #39;s from Penn, an MBA from Emory and is working on a law degree from the '] ---- Corresponding labels: [1 0 0 1]
As you can see from the above, the training data is in batches of descriptions and their corresponding labels.
Now, we can use the Keras TextVectorization layer to handle all the required text preprocessing. It converts the texts into tokens, turns them into sequences, and pads the sequences. It also removes punctuation and lowercases the text.
That's all it does by default.
max_features = 10000
text_vectorizer = tf.keras.layers.TextVectorization(max_tokens=max_features)
After creating the layer, we can use adapt to pass the dataset through it. Notice that we use a lambda function to separate the description from the label.
text_vectorizer.adapt(train_data.map(lambda description, label : description))
We can get the vocabulary. The vocabulary is the list of unique tokens that the layer learned from the training texts.
vocab = text_vectorizer.get_vocabulary()
vocab[:10]
['', '[UNK]', 'the', 'a', 'to', 'of', 'in', 'and', 'on', 'for']
Let's pass some new sentences to text_vectorizer. The vectorized sequences will be padded to the length of the longest sentence in the batch, but if you want a fixed size, you can set output_sequence_length to any value when initializing the layer.
sample_news = ['This weekend there is a sport match between Man U and Fc Barcelona',
'Tesla has unveiled its humanoid robot that appeared dancing during the show!']
vectorized_news = text_vectorizer(sample_news)
vectorized_news.numpy()
array([[ 40, 494, 186, 16, 3, 1567, 570, 159, 370, 1, 7, 7486, 2556], [ 1, 20, 876, 13, 1, 4845, 10, 1273, 1, 160, 2, 532, 0]])
If you look at the tensors above, the second sentence was padded with 0. Also, the words Tesla and humanoid have an index of 1 because they were not part of the training data (that we adapted the text_vectorizer to).
This means that index 1 is reserved for all words that are new to the layer. In the vocabulary, these kinds of words are replaced by [UNK].
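If you want every vectorized sentence to have the same length, here is a minimal sketch assuming a fixed length of 20 tokens (the value and the fixed_len_vectorizer name are arbitrary choices for illustration):
# Hypothetical second vectorizer with a fixed output length (illustration only)
fixed_len_vectorizer = tf.keras.layers.TextVectorization(max_tokens=max_features, output_sequence_length=20)
fixed_len_vectorizer.adapt(train_data.map(lambda description, label: description))
# Shorter sentences are padded with 0, longer ones are truncated
print(fixed_len_vectorizer(sample_news).shape)   # (2, 20)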
We will use text_vectorizer as part of the model.
2.3 Creating and Training the Model¶
We are going to create a Keras Sequential model that takes text as input and outputs the class of the input text.
The model is going to be made of the following layers:
- A TextVectorization layer for text preprocessing.
- An Embedding layer that represents the tokens as trainable feature vectors in a high-dimensional space. Because the feature vectors are trainable, words that have similar semantic meaning will end up with similar vectors (pointing in similar directions) after training.
- Conv1D layers for processing the sequences. Yes, ConvNets can be used for text processing. We will start off with only ConvNets, and later use RNNs as well.
- Dense layers for classification. The final Dense layer converts the feature vector into 4 class probabilities, one per news category.
input_dim = len(text_vectorizer.get_vocabulary())
input_dim
10000
model = tf.keras.Sequential([
text_vectorizer,
tf.keras.layers.Embedding(input_dim=input_dim, output_dim=64, mask_zero=True),
tf.keras.layers.Conv1D(64, 5, activation='relu'),
tf.keras.layers.MaxPooling1D(),
tf.keras.layers.Conv1D(64, 5, activation='relu'),
tf.keras.layers.GlobalMaxPool1D(),
tf.keras.layers.Dense(32, activation='relu'),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Dense(4, activation='softmax')
])
Getting the model summary
model.summary()
Model: "sequential_7" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= text_vectorization (TextVect (None, None) 0 _________________________________________________________________ embedding_7 (Embedding) (None, None, 64) 640000 _________________________________________________________________ conv1d_8 (Conv1D) (None, None, 64) 20544 _________________________________________________________________ max_pooling1d_4 (MaxPooling1 (None, None, 64) 0 _________________________________________________________________ conv1d_9 (Conv1D) (None, None, 64) 20544 _________________________________________________________________ global_max_pooling1d_1 (Glob (None, 64) 0 _________________________________________________________________ dense_14 (Dense) (None, 32) 2080 _________________________________________________________________ dropout_7 (Dropout) (None, 32) 0 _________________________________________________________________ dense_15 (Dense) (None, 4) 132 ================================================================= Total params: 683,300 Trainable params: 683,300 Non-trainable params: 0 _________________________________________________________________
We can also plot the model with Keras utils' plot_model.
from tensorflow.keras.utils import plot_model
plot_model(model)
# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
batch_size = 32
# Note: train_data is already batched, so len(train_data) is the number of batches (~3375),
# and dividing by batch_size again means only ~105 of those batches are used per epoch.
train_steps = int(len(train_data)/batch_size)
val_steps = int(len(val_data)/batch_size)
# Train the model
history = model.fit(train_data,
epochs=25,
validation_data=val_data,
steps_per_epoch=train_steps,
validation_steps=val_steps
)
Epoch 1/25 105/105 [==============================] - 3s 23ms/step - loss: 1.3313 - accuracy: 0.3470 - val_loss: 1.0612 - val_accuracy: 0.4753 Epoch 2/25 105/105 [==============================] - 2s 22ms/step - loss: 0.9455 - accuracy: 0.5565 - val_loss: 0.7538 - val_accuracy: 0.7368 Epoch 3/25 105/105 [==============================] - 2s 21ms/step - loss: 0.6512 - accuracy: 0.7580 - val_loss: 0.4937 - val_accuracy: 0.8240 Epoch 4/25 105/105 [==============================] - 2s 21ms/step - loss: 0.4899 - accuracy: 0.8339 - val_loss: 0.4605 - val_accuracy: 0.8339 Epoch 5/25 105/105 [==============================] - 2s 21ms/step - loss: 0.4775 - accuracy: 0.8318 - val_loss: 0.4126 - val_accuracy: 0.8586 Epoch 6/25 105/105 [==============================] - 2s 21ms/step - loss: 0.4359 - accuracy: 0.8518 - val_loss: 0.3818 - val_accuracy: 0.8783 Epoch 7/25 105/105 [==============================] - 2s 22ms/step - loss: 0.4106 - accuracy: 0.8717 - val_loss: 0.3710 - val_accuracy: 0.8799 Epoch 8/25 105/105 [==============================] - 2s 21ms/step - loss: 0.4174 - accuracy: 0.8679 - val_loss: 0.3561 - val_accuracy: 0.8980 Epoch 9/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3927 - accuracy: 0.8762 - val_loss: 0.3509 - val_accuracy: 0.8964 Epoch 10/25 105/105 [==============================] - 2s 21ms/step - loss: 0.4001 - accuracy: 0.8687 - val_loss: 0.3444 - val_accuracy: 0.8865 Epoch 11/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3778 - accuracy: 0.8815 - val_loss: 0.3540 - val_accuracy: 0.8914 Epoch 12/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3796 - accuracy: 0.8815 - val_loss: 0.3159 - val_accuracy: 0.8997 Epoch 13/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3678 - accuracy: 0.8783 - val_loss: 0.3173 - val_accuracy: 0.9013 Epoch 14/25 105/105 [==============================] - 2s 22ms/step - loss: 0.3780 - accuracy: 0.8804 - val_loss: 0.3228 - val_accuracy: 0.9013 Epoch 15/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3827 - accuracy: 0.8815 - val_loss: 0.2875 - val_accuracy: 0.8997 Epoch 16/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3578 - accuracy: 0.8857 - val_loss: 0.2855 - val_accuracy: 0.9161 Epoch 17/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3807 - accuracy: 0.8789 - val_loss: 0.2834 - val_accuracy: 0.9128 Epoch 18/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3572 - accuracy: 0.8875 - val_loss: 0.3027 - val_accuracy: 0.9013 Epoch 19/25 105/105 [==============================] - 2s 22ms/step - loss: 0.3630 - accuracy: 0.8815 - val_loss: 0.2976 - val_accuracy: 0.9030 Epoch 20/25 105/105 [==============================] - 2s 22ms/step - loss: 0.3646 - accuracy: 0.8851 - val_loss: 0.2832 - val_accuracy: 0.9112 Epoch 21/25 105/105 [==============================] - 2s 22ms/step - loss: 0.3542 - accuracy: 0.8908 - val_loss: 0.2755 - val_accuracy: 0.9145 Epoch 22/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3363 - accuracy: 0.8884 - val_loss: 0.2807 - val_accuracy: 0.9145 Epoch 23/25 105/105 [==============================] - 2s 21ms/step - loss: 0.2961 - accuracy: 0.9080 - val_loss: 0.2766 - val_accuracy: 0.9178 Epoch 24/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3437 - accuracy: 0.8911 - val_loss: 0.2758 - val_accuracy: 0.9145 Epoch 25/25 105/105 [==============================] - 2s 21ms/step - loss: 0.3452 - accuracy: 0.8908 - 
val_loss: 0.2707 - val_accuracy: 0.9145
2.4 Visualizing the Results¶
It is always good practice to plot the model results. When doing error analysis, plotting the learning curves becomes helpful.
Let's plot the loss and accuracy over the course of the epochs.
import matplotlib.pyplot as plt
# function to plot accuracy and loss
def plot_acc_loss(history):
    model_history = history.history
    acc = model_history['accuracy']
    val_acc = model_history['val_accuracy']
    loss = model_history['loss']
    val_loss = model_history['val_loss']
    epochs = history.epoch

    plt.figure(figsize=(10,5))
    plt.plot(epochs, acc, 'r', label='Training Accuracy')
    plt.plot(epochs, val_acc, 'g', label='Validation Accuracy')
    plt.title('Training and Validation Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend(loc=0)

    # Create a new figure for the loss curves
    plt.figure(figsize=(10,5))
    plt.plot(epochs, loss, 'b', label='Training Loss')
    plt.plot(epochs, val_loss, 'y', label='Validation Loss')
    plt.title('Training and Validation Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend(loc=0)

    plt.show()
plot_acc_loss(history)
That's not bad at all. Let's combine CNNs and LSTMs to see if we can improve the results further.
3. Combining CNNs and RNNs for Text Classification¶
3.1 Convnets and RNNs Model¶
Just like CNNs extract features from images, they are also able to learn hierarchical representations of words, phrases, and sentences in order to understand text (Yann LeCun, 2016). The downside of using CNNs alone is that they don't maintain the order of the timesteps.
We can also introduce Recurrent Neural Networks (LSTMs specifically) to give the network sequence-handling capability. Conv1D can extract meaningful features from the input sentences, and the RNN can preserve the order of the sequence contained in the output features from the CNN.
I am only going to add one bidirectional LSTM layer to process the CNN features from both directions.
conv_rnn_model = tf.keras.Sequential([
text_vectorizer,
tf.keras.layers.Embedding(input_dim=input_dim, output_dim=64, mask_zero=True),
tf.keras.layers.Conv1D(64, 5, activation='relu'),
tf.keras.layers.MaxPooling1D(),
tf.keras.layers.Conv1D(64, 5, activation='relu'),
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
tf.keras.layers.Dense(32, activation='relu'),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Dense(4, activation='softmax')
])
conv_rnn_model.summary()
Model: "sequential_9" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= text_vectorization (TextVect (None, None) 0 _________________________________________________________________ embedding_9 (Embedding) (None, None, 64) 640000 _________________________________________________________________ conv1d_12 (Conv1D) (None, None, 64) 20544 _________________________________________________________________ max_pooling1d_6 (MaxPooling1 (None, None, 64) 0 _________________________________________________________________ conv1d_13 (Conv1D) (None, None, 64) 20544 _________________________________________________________________ bidirectional_1 (Bidirection (None, 128) 66048 _________________________________________________________________ dense_18 (Dense) (None, 32) 4128 _________________________________________________________________ dropout_9 (Dropout) (None, 32) 0 _________________________________________________________________ dense_19 (Dense) (None, 4) 132 ================================================================= Total params: 751,396 Trainable params: 751,396 Non-trainable params: 0 _________________________________________________________________
# Compile the model
conv_rnn_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
batch_size = 32
train_steps = int(len(train_data)/batch_size)
val_steps = int(len(val_data)/batch_size)
# Train the model
history = conv_rnn_model.fit(train_data,
epochs=25,
validation_data=val_data,
steps_per_epoch=train_steps,
validation_steps=val_steps
)
Epoch 1/25 105/105 [==============================] - 9s 53ms/step - loss: 1.3051 - accuracy: 0.3119 - val_loss: 0.9660 - val_accuracy: 0.5164 Epoch 2/25 105/105 [==============================] - 5s 47ms/step - loss: 0.8398 - accuracy: 0.5753 - val_loss: 0.6016 - val_accuracy: 0.7122 Epoch 3/25 105/105 [==============================] - 5s 45ms/step - loss: 0.6768 - accuracy: 0.6732 - val_loss: 0.5675 - val_accuracy: 0.7023 Epoch 4/25 105/105 [==============================] - 5s 45ms/step - loss: 0.6061 - accuracy: 0.7125 - val_loss: 0.4932 - val_accuracy: 0.8092 Epoch 5/25 105/105 [==============================] - 5s 45ms/step - loss: 0.5031 - accuracy: 0.8176 - val_loss: 0.3908 - val_accuracy: 0.8618 Epoch 6/25 105/105 [==============================] - 5s 45ms/step - loss: 0.4422 - accuracy: 0.8530 - val_loss: 0.3892 - val_accuracy: 0.8651 Epoch 7/25 105/105 [==============================] - 5s 46ms/step - loss: 0.4099 - accuracy: 0.8682 - val_loss: 0.3633 - val_accuracy: 0.8766 Epoch 8/25 105/105 [==============================] - 5s 46ms/step - loss: 0.3913 - accuracy: 0.8723 - val_loss: 0.3623 - val_accuracy: 0.8931 Epoch 9/25 105/105 [==============================] - 5s 44ms/step - loss: 0.4082 - accuracy: 0.8729 - val_loss: 0.3184 - val_accuracy: 0.8914 Epoch 10/25 105/105 [==============================] - 5s 46ms/step - loss: 0.3846 - accuracy: 0.8774 - val_loss: 0.3063 - val_accuracy: 0.9062 Epoch 11/25 105/105 [==============================] - 5s 45ms/step - loss: 0.3521 - accuracy: 0.8893 - val_loss: 0.3091 - val_accuracy: 0.8947 Epoch 12/25 105/105 [==============================] - 5s 45ms/step - loss: 0.3430 - accuracy: 0.8860 - val_loss: 0.3341 - val_accuracy: 0.8832 Epoch 13/25 105/105 [==============================] - 5s 47ms/step - loss: 0.3542 - accuracy: 0.8824 - val_loss: 0.3040 - val_accuracy: 0.9013 Epoch 14/25 105/105 [==============================] - 5s 47ms/step - loss: 0.3754 - accuracy: 0.8783 - val_loss: 0.2929 - val_accuracy: 0.8931 Epoch 15/25 105/105 [==============================] - 5s 44ms/step - loss: 0.3500 - accuracy: 0.8911 - val_loss: 0.2831 - val_accuracy: 0.8997 Epoch 16/25 105/105 [==============================] - 5s 45ms/step - loss: 0.3486 - accuracy: 0.8872 - val_loss: 0.2908 - val_accuracy: 0.9079 Epoch 17/25 105/105 [==============================] - 5s 48ms/step - loss: 0.3634 - accuracy: 0.8869 - val_loss: 0.2785 - val_accuracy: 0.9013 Epoch 18/25 105/105 [==============================] - 5s 45ms/step - loss: 0.3377 - accuracy: 0.8923 - val_loss: 0.2645 - val_accuracy: 0.9095 Epoch 19/25 105/105 [==============================] - 5s 46ms/step - loss: 0.3374 - accuracy: 0.8881 - val_loss: 0.2703 - val_accuracy: 0.9145 Epoch 20/25 105/105 [==============================] - 5s 46ms/step - loss: 0.3429 - accuracy: 0.8872 - val_loss: 0.2572 - val_accuracy: 0.9095 Epoch 21/25 105/105 [==============================] - 5s 45ms/step - loss: 0.3282 - accuracy: 0.8955 - val_loss: 0.2549 - val_accuracy: 0.9030 Epoch 22/25 105/105 [==============================] - 5s 45ms/step - loss: 0.3187 - accuracy: 0.8914 - val_loss: 0.2502 - val_accuracy: 0.9095 Epoch 23/25 105/105 [==============================] - 5s 45ms/step - loss: 0.3061 - accuracy: 0.8970 - val_loss: 0.2420 - val_accuracy: 0.9161 Epoch 24/25 105/105 [==============================] - 5s 47ms/step - loss: 0.3204 - accuracy: 0.8988 - val_loss: 0.2544 - val_accuracy: 0.9128 Epoch 25/25 105/105 [==============================] - 5s 46ms/step - loss: 0.3244 - accuracy: 0.8973 - 
val_loss: 0.2571 - val_accuracy: 0.9062
3.2 Visualizing the Results¶
Plotting the model results...
plot_acc_loss(history)
There is not much difference from the former model, but the results we get by combining ConvNets and RNNs are way better than using RNNs only. In addition, this model is cheaper to train.
For now, let's test our latest network on new texts.
3.3 Performing Inference on New Texts¶
def predict(model, sample_news, class_names):
    # Convert sample news into an array
    sample_news = np.array(sample_news)
    # Predict the news type
    preds = model.predict(sample_news)
    pred_class = np.argmax(preds[0])
    print(f'predicted class: {pred_class} \nPredicted Class name: {class_names[pred_class]}')
sample_news = ['Tesla, a self driving car company is also planning to make a humanoid robot. This humanoid robot appeared dancing in the latest Tesla AI day']
predict(conv_rnn_model, sample_news, class_names)
predicted class: 3 Predicted Class name: Sci/Tech
sample_news = ["In the last weeks, there has been many transfer suprises in footbal. Ronaldo went back to Old Trafford, "
"while Messi went to Paris Saint Germain to join his former colleague Neymar."
"We can't wait to see these two clubs will perform in upcoming leagues"]
predict(conv_rnn_model, sample_news, class_names)
predicted class: 1 Predicted Class name: Sports
sample_news = ["In the latest business news: The tech giant NVIDIA has acquired ARM, a microproccessor company"]
predict(conv_rnn_model, sample_news, class_names)
predicted class: 2 Predicted Class name: Business
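If you want to classify several news items in one call, you can predict on a whole batch and take the argmax of each row. Below is a minimal sketch; the headlines are made-up examples, not taken from the dataset.
# Made-up headlines for illustration only
batch_news = np.array(['The central bank raised interest rates to curb inflation',
                       'The national team won the championship final in extra time',
                       'Researchers unveiled a new quantum processor prototype'])
probs = conv_rnn_model.predict(batch_news)            # shape: (3, 4) class probabilities
for text, pred in zip(batch_news, np.argmax(probs, axis=1)):
    print(f'{class_names[pred]}: {text}')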
4. Further Learning¶
This notebook has been about using Convnets for text classification. We also brought RNNs into the picture.
If you would like to learn more about sequence models, I recommend the following courses:
- Deep Learning Specialization - Course 5 on Sequence Models. This course is available on Coursera and YouTube.
- MIT Intro to Deep Learning - Lecture 2 on Recurrent Neural Networks, available on YouTube.
For ConvNets and RNNs, I recommend the following two papers: