Languages: English
Multilinguality: monolingual
Size Categories: 10K<n<100K
Language Creators: other
Source Datasets: original

Dataset Card for DOCCI

Dataset Summary

DOCCI (Descriptions of Connected and Contrasting Images) is a collection of images paired with detailed descriptions. The descriptions explain the key elements of each image, as well as secondary information such as background, lighting, and setting. The images were taken specifically to support the assessment of precise visual properties. DOCCI also includes many groups of related images that differ from one another in key ways. All descriptions are manually annotated to ensure they adequately distinguish each image from its counterparts.

Supported Tasks

Text-to-Image and Image-to-Text generation

Languages

English

Dataset Structure

Data Instances

{
    'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1536x2048>,
    'example_id': 'qual_dev_00000',
    'description': 'An indoor angled down medium close-up front view of a real sized stuffed dog with white and black colored fur wearing a blue hard hat with a light on it. A couple inches to the right of the dog is a real sized black and white penguin that is also wearing a blue hard hat with a light on it. The dog is sitting, and is facing slightly towards the right while looking to its right with its mouth slightly open, showing its pink tongue. The dog and penguin are placed on a gray and white carpet, and placed against a white drawer that has a large gray cushion on top of it. Behind the gray cushion is a transparent window showing green trees on the outside.'
}

Data Fields

Name         Explanation
image        PIL.JpegImagePlugin.JpegImageFile
example_id   The unique ID of an example, in the format <SPLIT_NAME>_<EXAMPLE_NUMBER>.
description  Text description of the associated image.
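
As an illustration of the example_id format, the following minimal sketch splits an ID into its split name and example number. It assumes that the example number is the run of digits after the last underscore and that the split name itself may contain underscores (as in 'qual_dev_00000'). The names ExampleId and parse_example_id are hypothetical helpers, not part of the dataset or its loading code.

```python
from typing import NamedTuple


class ExampleId(NamedTuple):
    """Hypothetical container for the two parts of an example_id."""
    split: str
    number: int


def parse_example_id(example_id: str) -> ExampleId:
    """Split '<SPLIT_NAME>_<EXAMPLE_NUMBER>' at the last underscore.

    Assumption: the example number is the trailing run of digits.
    """
    split, sep, number = example_id.rpartition("_")
    if not sep or not split or not number.isdigit():
        raise ValueError(f"unexpected example_id format: {example_id!r}")
    return ExampleId(split=split, number=int(number))


parsed = parse_example_id("qual_dev_00000")
print(parsed.split, parsed.number)  # prints: qual_dev 0
```

Splitting at the last underscore rather than the first keeps multi-word split names such as qual_dev intact.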

Data Splits

Dataset    Train  Test   Qual Dev  Qual Test
DOCCI      9,647  5,000  100       100
DOCCI-AAR  4,932  5,000  --        --
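
A quick sanity check after loading is to compare the observed split sizes against the table above. The sketch below hard-codes the table and reports any mismatch. The split keys ("train", "test", "qual_dev", "qual_test") and the helper names are assumptions for illustration and may not match the names used by the actual loading code.

```python
from typing import Dict, List, Optional

# Split sizes copied from the table above; "--" entries are recorded as None.
# The split keys are assumed names, not confirmed by the dataset card.
SPLIT_SIZES: Dict[str, Dict[str, Optional[int]]] = {
    "DOCCI": {"train": 9647, "test": 5000, "qual_dev": 100, "qual_test": 100},
    "DOCCI-AAR": {"train": 4932, "test": 5000, "qual_dev": None, "qual_test": None},
}


def total_examples(name: str) -> int:
    """Sum the sizes of all splits that exist for the given dataset."""
    return sum(n for n in SPLIT_SIZES[name].values() if n is not None)


def find_mismatches(name: str, actual: Dict[str, int]) -> List[str]:
    """Return a message for each split whose observed size differs from the table."""
    problems = []
    for split, expected in SPLIT_SIZES[name].items():
        if expected is None:
            continue  # split not provided for this dataset
        observed = actual.get(split)
        if observed != expected:
            problems.append(f"{split}: expected {expected}, got {observed}")
    return problems


print(total_examples("DOCCI"))      # prints: 14847
print(total_examples("DOCCI-AAR"))  # prints: 9932
```

The DOCCI total of 14,847 is consistent with the 10K<n<100K size category.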

Dataset Creation

Curation Rationale

DOCCI is designed as an evaluation dataset for both text-to-image (T2I) and image-to-text (I2T) generation. Please see our paper for more details.

Source Data

Initial Data Collection

All images were taken by one of the authors and their family.

Annotations

Annotation process

All text descriptions were written by human annotators. We do not rely on any automated process in our data annotation pipeline.

Personal and Sensitive Information

We manually reviewed all images for personally identifiable information (PII), removing some images and blurring detected faces, phone numbers, and URLs to protect privacy. For text descriptions, we instructed annotators to exclude any PII, such as people's names, phone numbers, and URLs. After the annotation phase, we employed automatic tools to scan for PII, ensuring the descriptions remained free of such information.
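
The card does not name the automatic tools used for the PII scan. Purely as an illustration of what such a pass over the descriptions might look like, the sketch below flags URLs and simple 10-digit phone numbers with regular expressions. The patterns and the scan_for_pii helper are assumptions, not the authors' pipeline, and they would miss many real-world formats (and people's names entirely).

```python
import re
from typing import Dict, List

# Illustrative patterns only; not the tools used to build DOCCI.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")
# Matches 10-digit numbers such as 555-123-4567, 555.123.4567, or 5551234567.
PHONE_PATTERN = re.compile(r"\b\d{3}[\s.-]?\d{3}[\s.-]?\d{4}\b")


def scan_for_pii(text: str) -> Dict[str, List[str]]:
    """Return the URL-like and phone-like substrings found in the text."""
    return {
        "urls": URL_PATTERN.findall(text),
        "phones": PHONE_PATTERN.findall(text),
    }


clean = "The dog and penguin are placed on a gray and white carpet."
flagged = "A sign reads call 555-123-4567 or visit https://example.com today."

print(scan_for_pii(clean))    # prints: {'urls': [], 'phones': []}
print(scan_for_pii(flagged))  # prints: {'urls': ['https://example.com'], 'phones': ['555-123-4567']}
```

A scan like this can only flag candidates, so any hits would still need manual review.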

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Licensing Information

CC BY 4.0

Citation Information

@inproceedings{OnoeDocci2024,
  author        = {Yasumasa Onoe and Sunayana Rane and Zachary Berger and Yonatan Bitton and Jaemin Cho and Roopal Garg and
    Alexander Ku and Zarana Parekh and Jordi Pont-Tuset and Garrett Tanzer and Su Wang and Jason Baldridge},
  title         = {{DOCCI: Descriptions of Connected and Contrasting Images}},
  booktitle     = {arXiv},
  year          = {2024}
}