Diffusers documentation

ScoreSdeVeScheduler

ScoreSdeVeScheduler is a variance exploding stochastic differential equation (SDE) scheduler. It was introduced in the Score-Based Generative Modeling through Stochastic Differential Equations paper by Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole.

The abstract from the paper is:

Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (a.k.a. score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.

ScoreSdeVeScheduler

class diffusers.ScoreSdeVeScheduler

( num_train_timesteps: int = 2000, snr: float = 0.15, sigma_min: float = 0.01, sigma_max: float = 1348.0, sampling_eps: float = 1e-05, correct_steps: int = 1 )

Parameters

  • num_train_timesteps (int, defaults to 2000) — The number of diffusion steps to train the model.
  • snr (float, defaults to 0.15) — A coefficient weighting the step from the model_output sample (from the network) to the random noise.
  • sigma_min (float, defaults to 0.01) — The initial noise scale for the sigma sequence in the sampling procedure. The minimum sigma should mirror the distribution of the data.
  • sigma_max (float, defaults to 1348.0) — The maximum value used for the range of continuous timesteps passed into the model.
  • sampling_eps (float, defaults to 1e-5) — The end value of sampling where timesteps decrease progressively from 1 to epsilon.
  • correct_steps (int, defaults to 1) — The number of correction steps performed on a produced sample.

ScoreSdeVeScheduler is a variance exploding stochastic differential equation (SDE) scheduler.

This scheduler inherits from SchedulerMixin and ConfigMixin. Check the superclass documentation for the generic methods the library implements for all schedulers, such as loading and saving.

scale_model_input

( sample: FloatTensor, timestep: Optional = None ) → torch.FloatTensor

Parameters

  • sample (torch.FloatTensor) — The input sample.
  • timestep (int, optional) — The current timestep in the diffusion chain.

Returns

torch.FloatTensor

A scaled input sample.

Ensures interchangeability with schedulers that need to scale the denoising model input depending on the current timestep.

set_sigmas

( num_inference_steps: int, sigma_min: float = None, sigma_max: float = None, sampling_eps: float = None )

Parameters

  • num_inference_steps (int) — The number of diffusion steps used when generating samples with a pre-trained model.
  • sigma_min (float, optional) — The initial noise scale value (overrides value given during scheduler instantiation).
  • sigma_max (float, optional) — The final noise scale value (overrides value given during scheduler instantiation).
  • sampling_eps (float, optional) — The final timestep value (overrides value given during scheduler instantiation).

Sets the noise scales used for the diffusion chain (to be run before inference). The sigmas control the weight of the drift and diffusion components of the sample update.
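As a rough illustration of what a variance exploding noise schedule looks like, the sketch below evaluates the geometric schedule from the paper, sigma(t) = sigma_min * (sigma_max / sigma_min) ** t, at a few continuous times. It is a plain-Python approximation under that assumption, not the library's implementation; the helper name `ve_sigmas` is hypothetical.

```python
import math
from typing import List, Sequence


def ve_sigmas(timesteps: Sequence[float],
              sigma_min: float = 0.01,
              sigma_max: float = 1348.0) -> List[float]:
    """Geometric VE schedule: sigma(t) = sigma_min * (sigma_max / sigma_min) ** t.

    Hypothetical helper for illustration; `timesteps` are continuous times in [0, 1].
    """
    ratio = sigma_max / sigma_min
    return [sigma_min * ratio ** t for t in timesteps]


# Noise scales at the start (t = 1), the midpoint, and the end (t = 0) of sampling.
sigmas = ve_sigmas([1.0, 0.5, 0.0])
print(sigmas)
```

The noise scale shrinks by several orders of magnitude between t = 1 and t = 0, which is why the schedule is spaced geometrically rather than linearly.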

set_timesteps

( num_inference_steps: int, sampling_eps: float = None, device: Union = None )

Parameters

  • num_inference_steps (int) — The number of diffusion steps used when generating samples with a pre-trained model.
  • sampling_eps (float, optional) — The final timestep value (overrides value given during scheduler instantiation).
  • device (str or torch.device, optional) — The device to which the timesteps should be moved. If None, the timesteps are not moved.

Sets the continuous timesteps used for the diffusion chain (to be run before inference).
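To make "continuous timesteps" concrete, here is a minimal sketch that assumes the timesteps are evenly spaced from 1 down to sampling_eps, as the sampling_eps parameter description suggests. The helper `continuous_timesteps` is hypothetical and uses plain Python lists rather than tensors.

```python
from typing import List


def continuous_timesteps(num_inference_steps: int,
                         sampling_eps: float = 1e-5) -> List[float]:
    """Evenly spaced times from 1.0 down to sampling_eps, both ends included.

    Hypothetical helper for illustration only.
    """
    if num_inference_steps < 2:
        raise ValueError("need at least two steps to span [sampling_eps, 1]")
    step = (1.0 - sampling_eps) / (num_inference_steps - 1)
    return [1.0 - i * step for i in range(num_inference_steps)]


timesteps = continuous_timesteps(5)
print(timesteps)
```

Stopping at a small positive epsilon instead of exactly 0 follows the parameter description above, where timesteps decrease from 1 to epsilon.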

step_correct

( model_output: FloatTensor, sample: FloatTensor, generator: Optional = None, return_dict: bool = True ) → SdeVeOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — The direct output from the learned diffusion model.
  • sample (torch.FloatTensor) — A current instance of a sample created by the diffusion process.
  • generator (torch.Generator, optional) — A random number generator.
  • return_dict (bool, optional, defaults to True) — Whether or not to return a SdeVeOutput or tuple.

Returns

SdeVeOutput or tuple

If return_dict is True, SdeVeOutput is returned, otherwise a tuple is returned where the first element is the sample tensor.

Correct the predicted sample based on the model_output of the network. This is often run repeatedly after making the prediction for the previous timestep.
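The correction can be pictured as one Langevin dynamics step, following the corrector in the paper's predictor-corrector framework. The sketch below is an assumption-laden illustration on flat Python lists: the function name `langevin_corrector_step` is hypothetical, the model output is treated as a score estimate, and the noise is passed in explicitly so the result is reproducible.

```python
import math
from typing import List, Sequence, Tuple


def langevin_corrector_step(sample: Sequence[float],
                            score: Sequence[float],
                            noise: Sequence[float],
                            snr: float = 0.15) -> Tuple[List[float], List[float]]:
    """One Langevin corrector step (hypothetical helper, single sample).

    The step size is chosen from the signal-to-noise ratio `snr`:
        step_size = 2 * (snr * ||noise|| / ||score||) ** 2
    """
    grad_norm = math.sqrt(sum(s * s for s in score))
    noise_norm = math.sqrt(sum(z * z for z in noise))
    step_size = 2.0 * (snr * noise_norm / grad_norm) ** 2
    # Move along the score, then add noise scaled by sqrt(2 * step_size).
    mean = [x + step_size * s for x, s in zip(sample, score)]
    scale = math.sqrt(2.0 * step_size)
    corrected = [m + scale * z for m, z in zip(mean, noise)]
    return corrected, mean


corrected, mean = langevin_corrector_step(
    sample=[0.0, 0.0], score=[3.0, 4.0], noise=[0.6, 0.8], snr=0.5
)
print(corrected, mean)
```

A larger snr gives a larger step along the score and proportionally more injected noise; correct_steps controls how many times such a step is repeated per timestep.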

step_pred

( model_output: FloatTensor, timestep: int, sample: FloatTensor, generator: Optional = None, return_dict: bool = True ) → SdeVeOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — The direct output from the learned diffusion model.
  • timestep (int) — The current discrete timestep in the diffusion chain.
  • sample (torch.FloatTensor) — A current instance of a sample created by the diffusion process.
  • generator (torch.Generator, optional) — A random number generator.
  • return_dict (bool, optional, defaults to True) — Whether or not to return a SdeVeOutput or tuple.

Returns

SdeVeOutput or tuple

If return_dict is True, SdeVeOutput is returned, otherwise a tuple is returned where the first element is the sample tensor.

Predict the sample from the previous timestep by reversing the SDE. This function propagates the diffusion process from the learned model outputs (most often the predicted noise).
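A reverse-diffusion predictor step for a variance exploding SDE can be sketched as follows. This is an illustrative approximation in plain Python, not the library code: `ve_predictor_step` is a hypothetical name, the model output is assumed to be a score estimate, and `adjacent_sigma` stands for the next, smaller noise scale in the schedule.

```python
import math
from typing import List, Sequence, Tuple


def ve_predictor_step(sample: Sequence[float],
                      score: Sequence[float],
                      sigma: float,
                      adjacent_sigma: float,
                      noise: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One reverse-diffusion predictor step for a VE SDE (hypothetical helper).

    Returns (prev_sample, prev_sample_mean): the noisy sample for the next
    iteration and its mean before noise is added.
    """
    diffusion_sq = sigma ** 2 - adjacent_sigma ** 2  # variance removed by this step
    diffusion = math.sqrt(diffusion_sq)
    prev_mean = [x + diffusion_sq * s for x, s in zip(sample, score)]
    prev_sample = [m + diffusion * z for m, z in zip(prev_mean, noise)]
    return prev_sample, prev_mean


prev_sample, prev_mean = ve_predictor_step(
    sample=[1.0, -2.0], score=[-0.5, 1.0],
    sigma=1.0, adjacent_sigma=0.6, noise=[1.0, -1.0],
)
print(prev_sample, prev_mean)
```

The two return values mirror the prev_sample and prev_sample_mean fields of the output described in this page.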

SdeVeOutput

class diffusers.schedulers.scheduling_sde_ve.SdeVeOutput

( prev_sample: FloatTensor, prev_sample_mean: FloatTensor )

Parameters

  • prev_sample (torch.FloatTensor of shape (batch_size, num_channels, height, width) for images) — Computed sample (x_{t-1}) of previous timestep. prev_sample should be used as next model input in the denoising loop.
  • prev_sample_mean (torch.FloatTensor of shape (batch_size, num_channels, height, width) for images) — The mean of prev_sample, that is, the predicted sample for the previous timestep before random noise is added.

Output class for the scheduler’s step function output.
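To show how the two output fields might be used together, the toy loop below runs predictor-only sampling on a one-dimensional problem where the score is known in closed form (the data is a single point `mu`). Everything here is hypothetical scaffolding: `ToyOutput` stands in for SdeVeOutput, and `toy_step` and `sample_point_mass` are illustrative helpers, not library functions. The noisy prev_sample is fed back into the loop, while the final prev_sample_mean is taken as the denoised result, which is one reasonable design choice rather than a requirement.

```python
import math
import random
from typing import NamedTuple


class ToyOutput(NamedTuple):
    """Hypothetical stand-in for SdeVeOutput, holding scalars instead of tensors."""
    prev_sample: float
    prev_sample_mean: float


def toy_step(score: float, sample: float, sigma: float,
             adjacent_sigma: float, rng: random.Random) -> ToyOutput:
    """Reverse-diffusion predictor step for a VE SDE on a scalar sample."""
    diffusion_sq = sigma ** 2 - adjacent_sigma ** 2
    mean = sample + diffusion_sq * score
    noisy = mean + math.sqrt(diffusion_sq) * rng.gauss(0.0, 1.0)
    return ToyOutput(prev_sample=noisy, prev_sample_mean=mean)


def sample_point_mass(mu: float = 3.0, num_steps: int = 50,
                      sigma_min: float = 0.01, sigma_max: float = 50.0,
                      seed: int = 0) -> ToyOutput:
    rng = random.Random(seed)
    # Geometrically spaced noise scales, largest first; the trailing 0.0 means
    # the last step removes all remaining noise.
    sigmas = [sigma_max * (sigma_min / sigma_max) ** (i / (num_steps - 1))
              for i in range(num_steps)] + [0.0]
    sample = rng.gauss(0.0, sigma_max)  # draw from the wide prior
    out = ToyOutput(sample, sample)
    for sigma, adjacent_sigma in zip(sigmas[:-1], sigmas[1:]):
        # Exact score of N(mu, sigma^2); a trained network would estimate this.
        score = -(sample - mu) / sigma ** 2
        out = toy_step(score, sample, sigma, adjacent_sigma, rng)
        sample = out.prev_sample  # the noisy sample is the next model input
    return out


final = sample_point_mass()
print(final.prev_sample, final.prev_sample_mean)
```

In this toy setting the final prev_sample_mean lands on `mu` up to floating-point rounding, while prev_sample still carries a small amount of noise at the sigma_min scale.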