Classification


Table of contents

  1. Introduction to the task

  2. Get started with the model

  3. Use the model for prediction

    3.1. Predict using Python

    3.2. Predict using CLI

  4. Evaluation

    4.1. from Python

    4.2. from CLI

  5. Train the model on your data

    5.1. from Python

    5.2. from CLI

  6. Models list

1. Introduction to the task

This section describes a family of BERT-based models that solve a variety of classification tasks.

Insult detection is a binary classification task of identifying whether a given sequence insults another participant in the conversation.

Sentiment analysis is the task of classifying the polarity of a given sequence. The number of classes depends on the data: binary positive/negative classification, multiclass classification with an added neutral class, or classification into a number of different emotions.

Paraphrase detection models determine whether two sentences phrased with different words convey the same meaning.

2. Get started with the model

First, make sure the DeepPavlov Library is installed. More information about the first installation is available in the installation guide.

!pip install -q deeppavlov

Then make sure that all the required packages for the model are installed.

!python -m deeppavlov install insults_kaggle_bert_torch

Here, insults_kaggle_bert_torch is the name of the model's config_file.

A configuration file defines the model and describes its hyperparameters. To use another model, replace the config name in this and all the following commands. The full list of classification models with their config names is given in the table in section 6, Models list.
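To make the idea concrete, the sketch below parses a heavily trimmed, hypothetical config skeleton with Python's json module. Only the top-level section names (dataset_reader, dataset_iterator, chainer, train, metadata) are meant to reflect real DeepPavlov configs; every value is a placeholder, so open the actual insults_kaggle_bert_torch config to see its real contents.

```python
import json

# Hypothetical, heavily trimmed config skeleton: the top-level section names
# mirror real DeepPavlov configs, but every value here is a placeholder.
config_text = """
{
  "dataset_reader": {
    "class_name": "basic_classification_reader",
    "data_path": "{DOWNLOADS_PATH}/insults_data"
  },
  "dataset_iterator": {},
  "chainer": {"in": ["x"], "in_y": ["y"], "pipe": [], "out": ["y_pred"]},
  "train": {"epochs": 1, "batch_size": 8},
  "metadata": {"variables": {}, "download": []}
}
"""

config = json.loads(config_text)

# One section per stage: reading data, batching it, the model pipeline,
# training settings, and metadata such as download links.
for section in config:
    print(section)

# The training data location lives under dataset_reader.
print(config["dataset_reader"]["data_path"])
```

Because a config is plain nested JSON, changing a dataset path or a hyperparameter amounts to editing one key in this structure.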

3. Use the model for prediction

3.1 Predict using Python

After installing the model, build it from the config and predict.

from deeppavlov import configs, build_model

model = build_model(configs.classifiers.insults_kaggle_bert_torch, download=True)

Input format: List[sentences]

Output format: List[labels]
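Since the model maps a list of sentences to a list of labels of the same length, a long input list can be split into chunks before calling it. The helper predict_in_chunks below is hypothetical (it is not part of DeepPavlov), and a toy keyword-based stand-in replaces the real model so the sketch runs without downloading anything.

```python
from typing import Callable, List

# Anything that maps List[sentences] -> List[labels].
Model = Callable[[List[str]], List[str]]


def predict_in_chunks(model: Model, sentences: List[str], chunk_size: int = 32) -> List[str]:
    """Call `model` on consecutive slices of `sentences` and concatenate the labels."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    labels: List[str] = []
    for start in range(0, len(sentences), chunk_size):
        chunk = sentences[start:start + chunk_size]
        chunk_labels = model(chunk)
        # The List[sentences] -> List[labels] contract implies one label per input.
        if len(chunk_labels) != len(chunk):
            raise RuntimeError("model returned a different number of labels than inputs")
        labels.extend(chunk_labels)
    return labels


# Hypothetical toy stand-in for the real classifier, for illustration only.
def toy_model(batch: List[str]) -> List[str]:
    return ["Insult" if "stupid" in s.lower() else "Not Insult" for s in batch]


print(predict_in_chunks(toy_model, ["You are kind of stupid", "You are a wonderful person!"], chunk_size=1))
# ['Insult', 'Not Insult']
```

With the real classifier, pass the object returned by build_model in place of toy_model; the length check guards the one-label-per-sentence assumption.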

model(['You are kind of stupid', 'You are a wonderful person!'])
# ['Insult', 'Not Insult']

3.2 Predict using CLI

You can also get predictions interactively through the CLI.

! python -m deeppavlov interact insults_kaggle_bert_torch -d

-d is an optional download flag (the CLI counterpart of download=True in Python code). It downloads the pre-trained model along with the embeddings and all other files needed to run it.

You can also make predictions for samples read from stdin, or from a file passed with -f:

! python -m deeppavlov predict insults_kaggle_bert_torch -f <file-name>
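An input file for predict can be prepared from Python. This sketch assumes one sample per line, which suits a single-input model such as the insults classifier; check the DeepPavlov CLI documentation for the exact format expected by models with several inputs, such as the paraphrase models. The file name samples.txt and the helper write_samples are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def write_samples(path: Path, samples: Iterable[str]) -> int:
    """Write one sample per line, flattening embedded newlines; return the count."""
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for sample in samples:
            # An embedded newline would otherwise split one sample across two lines.
            f.write(" ".join(sample.splitlines()) + "\n")
            count += 1
    return count


n = write_samples(Path("samples.txt"), ["You are kind of stupid", "You are a wonderful person!"])
print(n)  # 2
```

The resulting file can then be passed as the -f argument of the predict command.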

4. Evaluation

4.1 Evaluate from Python

from deeppavlov import configs, evaluate_model

# evaluate_model returns the evaluation scores, not a model
scores = evaluate_model(configs.classifiers.insults_kaggle_bert_torch, download=True)
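To pull a single metric out of the returned scores, a small helper can walk the report. The exact shape of the report may differ between library versions, so this sketch assumes only that it maps split names such as 'valid' and 'test' either to a metrics dictionary or to a dictionary holding one under 'metrics'. The helper collect_metric, the metric key roc_auc and the example report are hypothetical; the numbers are placeholders, not results.

```python
from typing import Any, Dict, Mapping


def collect_metric(report: Mapping[str, Mapping[str, Any]], metric: str) -> Dict[str, float]:
    """Return {split: value} for every split that reports `metric`."""
    values: Dict[str, float] = {}
    for split, split_report in report.items():
        # Accept both {"metrics": {...}} and a flat metrics mapping.
        metrics = split_report.get("metrics", split_report)
        if metric in metrics:
            values[split] = float(metrics[metric])
    return values


# Made-up report with the assumed shape; placeholder numbers only.
example_report = {
    "valid": {"metrics": {"roc_auc": 0.5}},
    "test": {"metrics": {"roc_auc": 0.25}},
}
print(collect_metric(example_report, "roc_auc"))  # {'valid': 0.5, 'test': 0.25}
```

Printing the real scores object once shows which shape and metric names your version uses.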

4.2 Evaluate from CLI

! python -m deeppavlov evaluate insults_kaggle_bert_torch -d

5. Train the model on your data

5.1 Train your model from Python

Provide your data path

To train the model on your data, you need to change the path to the training data in the config_file.

Parse the config_file and change the path to your data from Python.

from deeppavlov import configs, train_model
from deeppavlov.core.commands.utils import parse_config

model_config = parse_config(configs.classifiers.insults_kaggle_bert_torch)

# the dataset that the model was trained on
print(model_config['dataset_reader']['data_path'])
~/.deeppavlov/downloads/insults_data

Provide a data_path to your own dataset. You can also change any of the hyperparameters of the model.

# download and extract an example dataset
!wget http://files.deeppavlov.ai/datasets/insults_data.tar.gz
!tar -xzvf "insults_data.tar.gz"
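The data_path must point at the directory that holds the train, valid and test files, and that directory depends on how the archive is laid out. The hypothetical helper below extracts a .tar.gz archive with the standard tarfile module and lists every directory that directly contains .csv files; the .csv extension is an assumption about the dataset files, so adjust suffix if yours differ.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_and_find_data_dirs(archive: str, dest: str = ".", suffix: str = ".csv") -> List[Path]:
    """Extract `archive` into `dest` and return the sorted dirs holding `suffix` files."""
    with tarfile.open(archive, "r:gz") as tar:
        # Only extract archives that come from a source you trust.
        tar.extractall(dest)
    dirs = {p.parent for p in Path(dest).rglob(f"*{suffix}") if p.is_file()}
    return sorted(dirs)


archive = Path("insults_data.tar.gz")
if archive.exists():
    print(extract_and_find_data_dirs(str(archive)))
```

One of the printed directories is the value to assign to data_path.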
# provide a path to the directory with your train, valid and test files
model_config["dataset_reader"]["data_path"] = "./contents/"
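Hyperparameters live in the same nested dictionary, so they can be changed the same way as data_path. The helper set_by_path below is hypothetical and takes a dotted key; the key train.epochs is an assumption about the config layout, so print model_config first to confirm the exact names in your config.

```python
from typing import Any, MutableMapping


def set_by_path(config: MutableMapping[str, Any], dotted_key: str, value: Any) -> None:
    """Set config[a][b][c] = value for dotted_key 'a.b.c'; intermediate keys must exist."""
    *parents, last = dotted_key.split(".")
    node = config
    for key in parents:
        node = node[key]  # a KeyError here flags a typo in the key path
    node[last] = value


# Hypothetical stand-in for the parsed config, for illustration only.
demo_config = {"dataset_reader": {"data_path": "old"}, "train": {"epochs": 1}}
set_by_path(demo_config, "dataset_reader.data_path", "./my_data/")
set_by_path(demo_config, "train.epochs", 5)
print(demo_config)
```

Raising on a missing intermediate key, instead of silently creating it, keeps a misspelled section name from producing a setting that the model never reads.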

Train dataset format

Train the model using the new config:

model = train_model(model_config)

Use your model for prediction.

model(['You are kind of stupid', 'You are a wonderful person!'])
# ['Insult', 'Not Insult']

5.2 Train your model from CLI

To train the model on your data, create a copy of a config file and change the data_path variable in it. After that, train the model using your new config_file. You can also change any of the hyperparameters of the model.

! python -m deeppavlov train model_config.json
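The copy-and-edit step described for the CLI can also be scripted with the standard json module. In this sketch, original_config.json is a placeholder for wherever your copy of the insults_kaggle_bert_torch config lives (locating it inside the installed package is not covered here), write_config_copy is a hypothetical helper, and data_path is assumed to sit under dataset_reader, as the Python training example shows.

```python
import json
from pathlib import Path


def write_config_copy(src: Path, dst: Path, data_path: str) -> dict:
    """Copy the JSON config at `src` to `dst` with dataset_reader.data_path replaced."""
    config = json.loads(src.read_text(encoding="utf-8"))
    config["dataset_reader"]["data_path"] = data_path
    dst.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return config


src = Path("original_config.json")  # placeholder path
if src.exists():
    write_config_copy(src, Path("model_config.json"), "./my_data/")
```

The written model_config.json is then the file passed to the train command.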

6. Models list

The table lists all the classification models available in the DeepPavlov Library.

Config name                         | Task                 | Dataset      | Language | Model Size | Score
insults_kaggle_bert                 | Insult Detection     | Insults      | En       | 1.1 GB     | ROC-AUC: 0.877
paraphraser_bert                    | Paraphrase Detection | ?            | En       | ?          | ?
paraphraser_convers_distilrubert_2L | Paraphrase Detection | ?            | Ru       | ?          | ?
paraphraser_convers_distilrubert_6L | Paraphrase Detection | ?            | Ru       | ?          | ?
paraphraser_rubert                  | Paraphrase Detection | ?            | Ru       | ?          | ?
sentiment_sst_conv_bert             | Sentiment Analysis   | SST          | En       | ?          | ?
rusentiment_bert                    | Sentiment Analysis   | RuSentiment  | Ru       | ?          | ?
rusentiment_convers_bert            | Sentiment Analysis   | RuSentiment  | Ru       | ?          | ?
rusentiment_convers_distilrubert_2L | Sentiment Analysis   | RuSentiment  | Ru       | ?          | ?
rusentiment_convers_distilrubert_6L | Sentiment Analysis   | RuSentiment  | Ru       | ?          | ?
sentiment_twitter                   | Sentiment Analysis   | Twitter Data | Ru?      | ?          | ?