Image Classification
Image classification is the task of assigning a label or class to an entire image. Each image is expected to belong to exactly one class. Image classification models take an image as input and return a prediction about which class the image belongs to.
About Image Classification
Use Cases
Image classification models are useful when you only need to know what an image depicts as a whole, not the location or shape of individual object instances.
Keyword Classification
Image classification models are used widely in stock photography to assign each image a keyword.
Image Search
Models trained for image classification can improve user experience by organizing and categorizing photo galleries on a phone or in the cloud, based on keywords or tags.
Inference
With the `transformers` library, you can use the `image-classification` pipeline to infer with image classification models. You can initialize the pipeline with a model id from the Hub; if you do not provide one, it is initialized with `google/vit-base-patch16-224` by default. When calling the pipeline, you just need to specify a path, an HTTP link, or an image loaded in PIL. You can also provide a `top_k` parameter that determines how many results it should return.
from transformers import pipeline
clf = pipeline("image-classification")
clf("path_to_a_cat_image")
[{'label': 'tabby cat', 'score': 0.731},
...
]
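The pipeline also accepts the other input forms mentioned above. A minimal sketch, assuming `transformers` with a PyTorch backend is installed; the in-memory PIL image here is a placeholder standing in for a real photo:

```python
from PIL import Image
from transformers import pipeline

# Defaults to google/vit-base-patch16-224 when no model id is given
clf = pipeline("image-classification")

# The input can be a local path, an http(s) URL, or a PIL image;
# top_k caps how many {label, score} results are returned.
img = Image.new("RGB", (224, 224), color="gray")  # placeholder image for illustration
results = clf(img, top_k=3)
for r in results:
    print(f"{r['label']}: {r['score']:.3f}")
```

On a real photograph the top label is usually meaningful; on this uniform gray placeholder the scores are essentially arbitrary, which is why no expected labels are shown.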
You can use huggingface.js to classify images using models on the Hugging Face Hub.
import { HfInference } from "@huggingface/inference";

const inference = new HfInference(HF_TOKEN);
await inference.imageClassification({
  data: await (await fetch("https://picsum.photos/300/300")).blob(),
  model: "microsoft/resnet-50",
});
Useful Resources
- Let's Play Pictionary with Machine Learning!
- Fine-Tune ViT for Image Classification with 🤗Transformers
- Walkthrough of Computer Vision Ecosystem in Hugging Face - CV Study Group
- Computer Vision Study Group: Swin Transformer
- Computer Vision Study Group: Masked Autoencoders Paper Walkthrough
- Image classification task guide
Creating your own image classifier in just a few minutes
With HuggingPics, you can fine-tune Vision Transformers for anything using images found on the web. This project downloads images of classes defined by you, trains a model, and pushes it to the Hub. You even get to try out the model directly with a working widget in the browser, ready to be shared with all your friends!
- accuracy
- Accuracy is the proportion of correct predictions among the total number of cases processed. It can be computed as: Accuracy = (TP + TN) / (TP + TN + FP + FN), where TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives.
- recall
- Recall is the fraction of the positive examples that were correctly labeled by the model as positive. It can be computed with the equation: Recall = TP / (TP + FN) Where TP is the true positives and FN is the false negatives.
- precision
- Precision is the fraction of correctly labeled positive examples out of all of the examples that were labeled as positive. It is computed via the equation: Precision = TP / (TP + FP) where TP is the True positives (i.e. the examples correctly labeled as positive) and FP is the False positive examples (i.e. the examples incorrectly labeled as positive).
- f1
- The F1 score is the harmonic mean of the precision and recall. It can be computed with the equation: F1 = 2 * (precision * recall) / (precision + recall)
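The four formulas above can be sanity-checked in plain Python; the confusion-matrix counts below are made up purely for illustration:

```python
# Hypothetical confusion-matrix counts for a binary classifier
tp, tn, fp, fn = 90, 80, 10, 20

# Accuracy: correct predictions over all predictions
accuracy = (tp + tn) / (tp + tn + fp + fn)

# Recall: fraction of actual positives the model found
recall = tp / (tp + fn)

# Precision: fraction of predicted positives that were correct
precision = tp / (tp + fp)

# F1: harmonic mean of precision and recall
f1 = 2 * (precision * recall) / (precision + recall)

print(accuracy)   # 0.85
print(recall)     # roughly 0.818
print(precision)  # 0.9
print(f1)         # roughly 0.857
```

Note that with these counts precision (0.9) exceeds recall (≈0.818), so F1 lands between the two, closer to the smaller value, as a harmonic mean always does.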