

How can I use Python and Keras to perform text classification?


Text classification assigns a category or label to a piece of text. With Python and Keras, a typical workflow is to tokenize the text into integer sequences, pad the sequences to a fixed length, build a model from Keras layers, and train it on a labeled dataset.
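As a quick illustration of the tokenize-and-pad step, the sketch below runs the Keras Tokenizer and pad_sequences on a couple of made-up sentences (the sentences, the num_words value and the padded length are just placeholders):

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

texts = ["the movie was great", "the movie was terrible"]  # hypothetical examples

tokenizer = Tokenizer(num_words=1000)
tokenizer.fit_on_texts(texts)                    # build the word index from the texts
sequences = tokenizer.texts_to_sequences(texts)  # e.g. [[1, 2, 3, 4], [1, 2, 3, 5]]
padded = pad_sequences(sequences, maxlen=6)      # zero-pad each sequence to length 6
print(padded.shape)                              # (2, 6)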

Example code

# Import necessary modules
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense, Flatten, Embedding, Conv1D, MaxPooling1D

# Define parameters for the tokenizer and padding
num_words = 1000   # keep only the 1000 most frequent words
max_tokens = 100   # pad/truncate every sequence to this length (pick a value that fits your data)
tokenizer = Tokenizer(num_words=num_words)

# Fit the tokenizer on the training data (X_train is a list of raw text strings)
tokenizer.fit_on_texts(X_train)

# Create a sequence of tokens for each review
X_train_tokens = tokenizer.texts_to_sequences(X_train)

# Pad sequences so that all inputs have the same length
X_train_pad = pad_sequences(X_train_tokens, maxlen=max_tokens)

# Create a neural network model
model = Sequential()

# Add an embedding layer
model.add(Embedding(input_dim=num_words,
                    output_dim=8,
                    input_length=max_tokens,
                    name='layer_embedding'))

# Add a convolutional layer
model.add(Conv1D(16, 8, activation='relu'))

# Add a max pooling layer
model.add(MaxPooling1D(4))

# Add a flatten layer
model.add(Flatten())

# Add a fully connected layer
model.add(Dense(10, activation='relu'))

# Add a classifier layer
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Train the model (y_train holds the binary 0/1 labels for X_train)
model.fit(X_train_pad, y_train,
          epochs=5,
          batch_size=32,
          validation_split=0.2)

The code above uses Keras to perform binary text classification. It tokenizes the training texts, pads the resulting sequences to a fixed length, and then builds a model consisting of an embedding layer, a 1D convolutional layer, a max pooling layer, a flatten layer, a fully connected layer, and a sigmoid classifier layer. The model is compiled with the Adam optimizer and binary cross-entropy loss and trained on the labeled data.
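Once training finishes, the same tokenizer and padding length can be reused to classify new text. A minimal sketch, assuming the tokenizer, max_tokens and model from the example above (the review string is made up):

# Classify a new, unseen piece of text
new_texts = ["this was a great movie"]              # hypothetical unseen review
new_tokens = tokenizer.texts_to_sequences(new_texts)
new_pad = pad_sequences(new_tokens, maxlen=max_tokens)

pred = model.predict(new_pad)                       # sigmoid output between 0 and 1
label = (pred > 0.5).astype("int32")                # threshold at 0.5 for a binary label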

Code explanation

  • from keras.preprocessing.text import Tokenizer: imports the Tokenizer module from Keras, which is used for tokenizing text into words.

  • from keras.preprocessing.sequence import pad_sequences: imports the pad_sequences module from Keras, which is used for padding sequences so that all inputs have the same length.

  • num_words = 1000: limits the tokenizer's vocabulary to the 1000 most frequent words.

  • tokenizer = Tokenizer(num_words=num_words): creates a Tokenizer object with the specified number of words.

  • tokenizer.fit_on_texts(X_train): fits the tokenizer on the training data.

  • X_train_tokens = tokenizer.texts_to_sequences(X_train): converts each training text into a sequence of integer word indices.

  • X_train_pad = pad_sequences(X_train_tokens, maxlen=max_tokens): pads sequences so that all inputs have the same length.

  • model = Sequential(): creates a neural network model.

  • model.add(Embedding(input_dim=num_words, output_dim=8, input_length=max_tokens, name='layer_embedding')): adds an embedding layer to the model.

  • model.add(Conv1D(16, 8, activation='relu')): adds a convolutional layer to the model.

  • model.add(MaxPooling1D(4)): adds a max pooling layer to the model.

  • model.add(Flatten()): adds a flatten layer to the model.

  • model.add(Dense(10, activation='relu')): adds a fully connected layer to the model.

  • model.add(Dense(1, activation='sigmoid')): adds a classifier layer to the model.

  • model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']): compiles the model.

  • model.fit(X_train_pad, y_train, epochs=5, batch_size=32, validation_split=0.2): trains the model.
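To check how well the trained model generalises, held-out data can be prepared with the same tokenizer and padding and passed to model.evaluate. A minimal sketch, assuming hypothetical X_test texts and y_test labels formatted like the training data:

# Evaluate on held-out data (X_test and y_test are hypothetical)
X_test_tokens = tokenizer.texts_to_sequences(X_test)
X_test_pad = pad_sequences(X_test_tokens, maxlen=max_tokens)

loss, accuracy = model.evaluate(X_test_pad, y_test)
print("Test accuracy:", accuracy)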
