
Text2Video-Zero Model Card - ControlNet Canny GTA-5 Style

Text2Video-Zero is a zero-shot text-to-video generator. It supports zero-shot text-to-video generation, Video Instruct-Pix2Pix (instruction-guided video editing), text- and pose-conditional video generation, text- and canny-edge-conditional video generation, and text-, canny-edge-, and DreamBooth-conditional video generation. For more information about this work, please see our paper and our demo on Hugging Face Spaces. Our code works with any Stable Diffusion base model.

This model provides DreamBooth weights for the GTA-5 style, to be used with edge guidance (via ControlNet) in Text2Video-Zero.
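As a rough illustration of edge-guided video generation with these weights, the sketch below follows the cross-frame attention pattern used for Text2Video-Zero in the diffusers library. Several things here are assumptions rather than statements from this card: the repository id `PAIR/text2video-zero-controlnet-canny-gta5`, the ControlNet checkpoint `lllyasviel/sd-controlnet-canny`, 512x512 frames, a CUDA device, and the helper names `build_prompts` and `generate_frames`, which are hypothetical. `canny_edges` is assumed to be a list of PIL edge-map images, one per frame.

```python
MODEL_ID = "PAIR/text2video-zero-controlnet-canny-gta5"  # assumed repository id
CONTROLNET_ID = "lllyasviel/sd-controlnet-canny"         # assumed canny ControlNet checkpoint
STYLE_KEYWORD = "gtav style"                             # DreamBooth keyword from this card


def build_prompts(description, num_frames):
    """Return one prompt per frame, appending the DreamBooth keyword if it is missing."""
    if STYLE_KEYWORD not in description.lower():
        description = f"{description}, {STYLE_KEYWORD}"
    return [description] * num_frames


def generate_frames(canny_edges, description, seed=0):
    """Generate one stylized frame per edge map (hypothetical helper, needs a GPU)."""
    # Heavy imports are kept local so the prompt helper works without them.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_zero import (
        CrossFrameAttnProcessor,
    )

    controlnet = ControlNetModel.from_pretrained(CONTROLNET_ID, torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        MODEL_ID, controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")

    # Cross-frame attention lets every frame attend to the first frame.
    # batch_size=2 accounts for classifier-free guidance (unconditional + conditional).
    pipe.unet.set_attn_processor(CrossFrameAttnProcessor(batch_size=2))
    pipe.controlnet.set_attn_processor(CrossFrameAttnProcessor(batch_size=2))

    # Share one initial latent across all frames; 64x64 latents assume 512x512 frames.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    latents = torch.randn(
        (1, 4, 64, 64), generator=generator, device="cuda", dtype=torch.float16
    ).repeat(len(canny_edges), 1, 1, 1)

    prompts = build_prompts(description, len(canny_edges))
    return pipe(prompt=prompts, image=canny_edges, latents=latents).images
```

Calling `generate_frames(canny_edges, "a man walking down a street")` would return a list of PIL frames that can then be written to a video file with a tool of your choice. Sharing the initial latent and using cross-frame attention is what keeps the frames visually consistent in this setup.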

Weights for Text2Video-Zero

We converted the original weights to the diffusers format and made them usable with ControlNet edge guidance, following: https://github.com/lllyasviel/ControlNet/discussions/12.

Model Details

  • Developed by: Levon Khachatryan, Andranik Movsisyan, Vahram Tadevosyan, Roberto Henschel, Zhangyang Wang, Shant Navasardyan and Humphrey Shi

  • Model type: DreamBooth text-to-image and text-to-video generation model with edge control for Text2Video-Zero

  • Language(s): English

  • License: The CreativeML OpenRAIL M license.

  • Model Description: This is a model for Text2Video-Zero with edge guidance and the GTA-5 style. It can also be used with ControlNet in a text-to-image setup with edge guidance.

  • DreamBooth Keyword: gtav style

  • Resources for more information: GitHub, Paper, CIVITAI.

  • Cite as:

    @article{text2video-zero,
      title={Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators},
      author={Khachatryan, Levon and Movsisyan, Andranik and Tadevosyan, Vahram and Henschel, Roberto and Wang, Zhangyang and Navasardyan, Shant and Shi, Humphrey},
      journal={arXiv preprint arXiv:2303.13439},
      year={2023}
    }
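The text-to-image setup with edge guidance mentioned in the model description can be sketched for a single image as follows. This is a minimal example under stated assumptions, not the authors' reference code: the repository id, the ControlNet checkpoint `lllyasviel/sd-controlnet-canny`, the Canny thresholds (100 and 200), and the helper names `styled_prompt`, `canny_edge_map`, and `generate_image` are all assumptions or hypothetical names. It also assumes OpenCV, NumPy, Pillow, PyTorch, and diffusers are installed and a CUDA device is available.

```python
MODEL_ID = "PAIR/text2video-zero-controlnet-canny-gta5"  # assumed repository id
CONTROLNET_ID = "lllyasviel/sd-controlnet-canny"         # assumed canny ControlNet checkpoint
STYLE_KEYWORD = "gtav style"                             # DreamBooth keyword from this card


def styled_prompt(description):
    """Append the DreamBooth keyword to a prompt unless it is already present."""
    if STYLE_KEYWORD in description.lower():
        return description
    return f"{description}, {STYLE_KEYWORD}"


def canny_edge_map(image, low_threshold=100, high_threshold=200):
    """Turn a PIL image into a 3-channel Canny edge map (thresholds are illustrative)."""
    import cv2
    import numpy as np
    from PIL import Image

    edges = cv2.Canny(np.array(image.convert("RGB")), low_threshold, high_threshold)
    # ControlNet conditioning expects three channels, so repeat the edge channel.
    return Image.fromarray(np.stack([edges] * 3, axis=-1))


def generate_image(source_image, description):
    """Generate one GTA-5 style image guided by the edges of source_image."""
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(CONTROLNET_ID, torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        MODEL_ID, controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt=styled_prompt(description), image=canny_edge_map(source_image)).images[0]
```

The returned value is a PIL image. Including the keyword in the prompt is what activates the DreamBooth style, while the edge map constrains the layout.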
    

Original Weights

The DreamBooth weights for the GTA-5 style were taken from CIVITAI.

Model Details

  • Developed by: Quiet_Joker (Username listed on CIVITAI)
  • Model type: DreamBooth text-to-image generation model
  • Language(s): English
  • License: The CreativeML OpenRAIL M license.
  • Model Description: This model was created with DreamBooth to generate GTA-5 style images from text prompts.
  • DreamBooth Keyword: gtav style
  • Resources for more information: CIVITAI.

Bias and content acknowledgement:

Beware that Text2Video-Zero may output content that reinforces or exacerbates societal biases, as well as realistic faces, pornography, and violence. Text2Video-Zero in this demo is meant only for research purposes.
