🤗 Dataset viewer

The dataset viewer is a lightweight web API for visualizing and exploring all types of datasets - computer vision, speech, text, and tabular - stored on the Hugging Face Hub.

The main feature of the dataset viewer is to auto-convert all the Hub datasets to Parquet. Read more in the Parquet section.

As datasets increase in size and data type richness, preprocessing them becomes costly in both storage and compute, and can be challenging and time-consuming. To help users access these modern datasets, the dataset viewer runs a server behind the scenes that generates the API responses ahead of time and stores them in a database, so they are returned instantly when you query the API.

Let the dataset viewer take care of the heavy lifting so you can use a simple REST API on any of the 30,000+ datasets on Hugging Face to:

List the dataset splits, column names and data types
Get the dataset size (in number of rows or bytes)
Download and view rows at any index in the dataset
Search for a word in the dataset
Filter rows based on a query string
Get insightful statistics about the data
Access the dataset as Parquet files to use in your favorite processing or analytics framework
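As a rough illustration of what querying the API for the operations above might look like, here is a minimal Python sketch using only the standard library. The base URL, the endpoint names (`splits`, `size`, `rows`, `search`, `filter`, `statistics`, `parquet`), and the dataset identifier `allenai/openbookqa` are assumptions made for this example and are not stated on this page; check the API reference for the exact endpoints and parameters before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: public base URL of the dataset viewer API (not given in the text above).
BASE_URL = "https://datasets-server.huggingface.co"


def build_url(endpoint, **params):
    """Build the request URL for an endpoint name such as 'splits' or 'rows'."""
    # Skip parameters left as None so optional arguments can simply be omitted.
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return f"{BASE_URL}/{endpoint}?{query}"


def fetch(endpoint, **params):
    """Send a GET request to the endpoint and return the decoded JSON response."""
    with urlopen(build_url(endpoint, **params), timeout=30) as response:
        return json.load(response)
```

With these helpers, `fetch("splits", dataset="allenai/openbookqa")` would request the list of splits, and a call such as `fetch("rows", dataset=..., config=..., split=..., offset=0, length=10)` would request a page of rows. The config and split names depend on the dataset, so they are left elided here. Keeping URL construction separate from the network call makes the request easy to inspect before anything is sent.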

Dataset viewer of the OpenBookQA dataset

Join the growing community on the forum or Discord today, and give the dataset viewer repository a ⭐️ if you’re interested in the latest updates!
