What libraries can I use for Tabular Classification?

The sklearn library is compatible with Tabular Classification.

What models can I use for Tabular Classification?

The scikit-learn/cancer-prediction-trees model can be used for Tabular Classification.

What datasets can I use for Tabular Classification?

The inria-soda/tabular-benchmark dataset can be used for Tabular Classification.

What metrics can I use for Tabular Classification?

The accuracy, recall, precision, and f1 metrics can be used for Tabular Classification.

Tabular Classification

Tabular classification is the task of classifying a target category (a group) based on a set of attributes.

Inputs

Glucose	Blood Pressure	Skin Thickness	Insulin	BMI
148	72	35	0	33.6
150	50	30	0	35.1
141	60	29	1	39.2

Tabular Classification Model

Output

Diabetes
1
1
0
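
The input and output tables above can be reproduced with a toy sketch. It assumes scikit-learn is installed and uses a decision tree as a stand-in for the "Tabular Classification Model"; the page does not say which model produced the example. Three rows are far too few to train anything meaningful, so the snippet only illustrates the shape of the inputs and outputs.

```python
from sklearn.tree import DecisionTreeClassifier

# Feature columns, in the order shown in the Inputs table.
feature_names = ["Glucose", "Blood Pressure", "Skin Thickness", "Insulin", "BMI"]

# The three example rows from the Inputs table.
X = [
    [148, 72, 35, 0, 33.6],
    [150, 50, 30, 0, 35.1],
    [141, 60, 29, 1, 39.2],
]

# The matching "Diabetes" labels from the Output table.
y = [1, 1, 0]

# Assumption: a decision tree stands in for the unspecified model.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Predicting on the training rows only demonstrates the input/output format;
# it says nothing about how the model would generalize.
predictions = model.predict(X).tolist()
print(predictions)
```

In practice you would train on many more rows and evaluate on held-out data.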

About Tabular Classification

About the Task

Tabular classification is the task of assigning a label or class given a limited number of attributes. For example, the input can be data related to a customer (account balance, how long they have been a customer, and so on) and the output can be whether the customer will churn from the service or not. There are three types of categorical variables:

Binary variables: Variables that can take two values, like yes or no, open or closed. The task of predicting binary variables is called binary classification.
Ordinal variables: Variables with a ranking relationship, e.g., good, insignificant, and bad product reviews. The task of predicting ordinal variables is called ordinal classification.
Nominal variables: Variables with no ranking relationship among them, e.g., predicting an animal from its weight and height, where categories are cat, dog, or bird. The task of predicting nominal variables is called multinomial classification.
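
The three variable types above are usually turned into numbers before training. The following is a minimal sketch of one common encoding for each, using only the Python standard library. The names `BINARY_CODES`, `ORDINAL_ORDER`, `NOMINAL_CATEGORIES`, and `one_hot` are hypothetical and chosen for illustration.

```python
# Binary variable: two possible values, mapped to 0 and 1.
BINARY_CODES = {"no": 0, "yes": 1}

# Ordinal variable: integer codes preserve the ranking
# bad < insignificant < good.
ORDINAL_ORDER = ["bad", "insignificant", "good"]
ORDINAL_CODES = {label: rank for rank, label in enumerate(ORDINAL_ORDER)}

# Nominal variable: no ranking, so each category gets its own
# indicator position instead of an ordered integer.
NOMINAL_CATEGORIES = ["cat", "dog", "bird"]


def one_hot(label: str) -> list[int]:
    """Return a one-hot vector for a nominal label."""
    if label not in NOMINAL_CATEGORIES:
        raise ValueError(f"unknown category: {label!r}")
    return [1 if label == category else 0 for category in NOMINAL_CATEGORIES]


binary_values = [BINARY_CODES[v] for v in ["yes", "no", "yes"]]
ordinal_values = [ORDINAL_CODES[v] for v in ["good", "bad", "insignificant"]]
nominal_values = [one_hot(v) for v in ["dog", "bird"]]

print(binary_values)
print(ordinal_values)
print(nominal_values)
```

Using ordered integers for a nominal variable would suggest a ranking that does not exist, which is why the sketch switches to one-hot vectors for that case.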

Use Cases

Fraud Detection

Tabular classification models can be used in detecting fraudulent credit card transactions, where the features could be the amount of the transaction and the account balance, and the target to predict could be whether the transaction is fraudulent or not. This is an example of binary classification.

Churn Prediction

Tabular classification models can be used in predicting customer churn in telecommunication. An example dataset for the task is hosted here.

Model Hosting and Inference

You can use skops for model hosting and inference on the Hugging Face Hub. This library is built to improve production workflows of various libraries that are used to train tabular models, including sklearn and xgboost. Using skops you can:

Easily use Inference Endpoints,
Build neat UIs with one line of code,
Programmatically create model cards,
Securely serialize your scikit-learn model. (See limitations of using pickle here.)

You can push your model as follows:

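from skops import hub_utils
# Initialize a local repository with a trained model.
# `model` stands for your trained model artifact. Depending on the skops version,
# init may expect the path to the saved model file and further arguments
# (such as requirements, task, and data); check the skops documentation.
local_repo = "/path_to_new_repo"
hub_utils.init(model, dst=local_repo)
# Push the local repository to the Hub.
hub_utils.push("username/my-awesome-model", source=local_repo)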

Once the model is pushed, you can run inference on it.

from skops import hub_utils
import pandas as pd
# `your_data` is a placeholder for your own input rows
data = pd.DataFrame(your_data)
# Get predictions from the model hosted on the Hub
res = hub_utils.get_model_output("username/my-awesome-model", data)

You can launch a UI for your model with only one line of code!

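import gradio as gr
# Newer Gradio releases provide gr.load for this purpose; check the Gradio
# documentation for the call that matches your installed version.
gr.Interface.load("huggingface/username/my-awesome-model").launch()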

Useful Resources

Check out the scikit-learn organization to learn more about different algorithms used for this task.
Skops documentation
Skops announcement blog
Notebook: Persisting your scikit-learn model using skops
Check out interactive sklearn examples built with ❤️ using Gradio.

Training your own model in just a few seconds

We have built a baseline trainer application to which you can drag and drop your dataset. It will train a baseline and push it to your Hugging Face Hub profile with a model card containing information about the model.

Compatible libraries

Scikit-learn

Tabular Classification demo

using scikit-learn/tabular-playground

Models for Tabular Classification

scikit-learn/cancer-prediction-trees

Tabular Classification • Updated Dec 1, 2022

Note: Breast cancer prediction model based on decision trees.

Datasets for Tabular Classification

inria-soda/tabular-benchmark

Updated Sep 4, 2023

Note: A comprehensive curation of datasets covering all benchmarks.

Spaces using Tabular Classification

scikit-learn/tabular-playground

Note: An application that can predict defective products on a production line.

Metrics for Tabular Classification

accuracy: Accuracy is the proportion of correct predictions among the total number of cases processed. It can be computed with: Accuracy = (TP + TN) / (TP + TN + FP + FN), where TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives.

recall: Recall is the fraction of the positive examples that were correctly labeled by the model as positive. It can be computed with the equation: Recall = TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives.

precision: Precision is the fraction of correctly labeled positive examples out of all of the examples that were labeled as positive. It is computed via the equation: Precision = TP / (TP + FP), where TP is the number of true positives (i.e. the examples correctly labeled as positive) and FP is the number of false positives (i.e. the examples incorrectly labeled as positive).

f1: The F1 score is the harmonic mean of the precision and recall. It can be computed with the equation: F1 = 2 * (precision * recall) / (precision + recall)
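
The four formulas above can be checked by hand with a short sketch. It uses only the Python standard library and hypothetical labels (`y_true`, `y_pred`) made up for illustration. Returning 0.0 when a denominator is zero is an assumption of this sketch, not something the definitions prescribe.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, recall, precision, and F1 from the counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + tn + fp + fn
    # Assumption: a metric with a zero denominator is reported as 0.0.
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denominator = precision + recall
    f1 = 2 * (precision * recall) / denominator if denominator else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


# Hypothetical labels: 3 true positives, 3 true negatives,
# 1 false positive, and 1 false negative.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

For real projects, an established metrics library is usually preferable; the sketch only makes the arithmetic explicit.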