BigBirdPegasus model (large)

BigBird is a sparse-attention-based transformer that extends Transformer-based models, such as BERT, to much longer sequences. BigBird also comes with a theoretical analysis of which capabilities of a full transformer the sparse model retains.

BigBird was introduced in the paper "Big Bird: Transformers for Longer Sequences" (arXiv:2007.14062) and first released in the google-research/bigbird repository.

Disclaimer: The team releasing BigBird did not write a model card for this model so this model card has been written by the Hugging Face team.

Model description

BigBird relies on block sparse attention instead of normal (full) attention such as BERT's, and can handle sequences of up to 4,096 tokens at a much lower compute cost than BERT. It has achieved state-of-the-art results on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.
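To make the cost argument concrete, here is a sketch that builds a block-level attention mask in the spirit of BigBird's block sparse pattern: a few global blocks, a sliding window of neighbouring blocks, and a few random blocks per query block. It is a simplified illustration under our own assumptions (one global block at each end, a 3-block window, and the card's default of num_random_blocks=3). It does not reproduce the layout used by the transformers implementation, and the function name bigbird_block_mask is hypothetical.

```python
import random


def bigbird_block_mask(num_blocks, num_random_blocks=3, window=3, num_global=1, seed=0):
    """Hypothetical helper: return a num_blocks x num_blocks 0/1 list of lists.

    mask[i][j] == 1 means query block i attends to key block j.
    Simplified pattern (an assumption, not the transformers layout):
    - the first and last `num_global` blocks are global,
    - every block sees a sliding window of `window` neighbouring blocks,
    - every non-global block also sees `num_random_blocks` random blocks.
    """
    rng = random.Random(seed)
    mask = [[0] * num_blocks for _ in range(num_blocks)]
    global_blocks = set(range(num_global)) | set(range(num_blocks - num_global, num_blocks))
    half = window // 2
    for i in range(num_blocks):
        if i in global_blocks:
            # Global query blocks attend to every key block.
            mask[i] = [1] * num_blocks
            continue
        # Every query block attends to the global key blocks.
        for g in global_blocks:
            mask[i][g] = 1
        # Sliding window centred on the diagonal.
        for j in range(max(0, i - half), min(num_blocks, i + half + 1)):
            mask[i][j] = 1
        # Random key blocks, drawn from those not already attended.
        candidates = [j for j in range(num_blocks) if mask[i][j] == 0]
        for j in rng.sample(candidates, min(num_random_blocks, len(candidates))):
            mask[i][j] = 1
    return mask


# 4,096 tokens split into blocks of 64 tokens -> 64 blocks.
mask = bigbird_block_mask(num_blocks=4096 // 64)
attended = sum(map(sum, mask))
print(f"attended block pairs: {attended} of {len(mask) ** 2}")
```

With these assumptions, a 4,096-token input with block_size=64 gives 64 blocks, and the mask covers 622 of the 4,096 block pairs, roughly 15% of full attention. Each non-global query block attends to at most 8 blocks however long the sequence grows, which is why the cost scales linearly rather than quadratically.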

How to use

Here is how to use this model in PyTorch to generate a summary of a given text:

from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")

# By default, encoder attention is `block_sparse` with num_random_blocks=3 and block_size=64
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv")

# Decoder attention cannot be changed and is always "original_full".
# You can switch the encoder's `attention_type` to full attention like this
# (each from_pretrained call below replaces the previous `model`; keep the one you need):
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv", attention_type="original_full")

# Or change `block_size` and `num_random_blocks` like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv", block_size=16, num_random_blocks=2)

text = "Replace me by any text you'd like."
inputs = tokenizer(text, return_tensors="pt")
# Generate summary token IDs, then decode them to text without special tokens
prediction = model.generate(**inputs)
prediction = tokenizer.batch_decode(prediction, skip_special_tokens=True)
print(prediction[0])
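The block-sparse settings above only pay off for long inputs. As far as we can tell from the transformers source, BigBirdPegasus falls back to "original_full" encoder attention (with a warning) when the input has no more than (5 + 2 * num_random_blocks) * block_size tokens, and otherwise pads the input up to a multiple of block_size. The hypothetical helper effective_encoder_attention below encodes that rule so you can check in advance which attention a given input length will get. Treat the threshold as an assumption and confirm it against your installed version.

```python
import math


def effective_encoder_attention(seq_len, block_size=64, num_random_blocks=3):
    """Hypothetical helper: predict BigBirdPegasus's encoder attention for an input.

    Assumption based on our reading of the transformers source: block-sparse
    attention is only used when the input is longer than
    (5 + 2 * num_random_blocks) * block_size tokens; otherwise the encoder
    falls back to "original_full". Block-sparse inputs are padded to a
    multiple of block_size.

    Returns (attention_type, length the encoder actually processes).
    """
    max_tokens_to_attend = (5 + 2 * num_random_blocks) * block_size
    if seq_len <= max_tokens_to_attend:
        return "original_full", seq_len
    # Pad up to the next multiple of block_size.
    padded_len = math.ceil(seq_len / block_size) * block_size
    return "block_sparse", padded_len


print(effective_encoder_attention(12))    # a short input with the default settings
print(effective_encoder_attention(3000))  # a long document with the default settings
print(effective_encoder_attention(1000, block_size=16, num_random_blocks=2))
```

With the defaults (block_size=64, num_random_blocks=3) the threshold works out to 704 tokens, so the short example sentence above would run with full attention. With block_size=16 and num_random_blocks=2 it drops to 144 tokens.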

Training Procedure

This checkpoint was obtained by fine-tuning BigBirdPegasusForConditionalGeneration for summarization on the arxiv subset of the scientific_papers dataset.

BibTeX entry and citation info

@misc{zaheer2021big,
      title={Big Bird: Transformers for Longer Sequences}, 
      author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
      year={2021},
      eprint={2007.14062},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Evaluation results

All results are self-reported on the test split of each dataset; n/a marks a metric that was not reported.

Dataset             ROUGE-1  ROUGE-2  ROUGE-L  ROUGE-LSUM   loss  METEOR  gen_len
scientific_papers    36.028   13.417   21.961      29.648  2.774   0.282  209.254
cnn_dailymail         9.088    1.032    7.318       8.146    NaN     n/a  210.476
xsum                  4.979    0.353    4.368       4.172    NaN     n/a  230.489
scientific_papers    43.470   17.430   26.259      35.559  2.111     n/a  183.370
samsum                3.621    0.170    3.202       3.327  7.664     n/a  233.811
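The ROUGE scores above are on a 0 to 100 scale. As a rough guide to what ROUGE-1 measures, the sketch below computes a unigram-overlap F-measure between a generated summary and a reference. It is a simplified, hypothetical helper (lowercased whitespace tokens, no stemming or punctuation handling). The card does not say which scorer produced the reported numbers, so do not expect this to reproduce them exactly.

```python
from collections import Counter


def rouge1_f(candidate: str, reference: str) -> float:
    """Hypothetical helper: ROUGE-1 F-measure on a 0-100 scale.

    Simplified: lowercased whitespace tokens, no stemming or punctuation handling.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped unigram overlap: each token counts at most min(count in cand, count in ref).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 100 * 2 * precision * recall / (precision + recall)


print(round(rouge1_f("the cat sat on the mat", "the cat is on the mat"), 2))  # 83.33
```

Here 5 of the 6 unigrams overlap in each direction, so precision and recall are both 5/6 and the F-measure is about 83.33.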
