Transformers.js documentation

processors

Processors are used to prepare non-textual inputs (e.g., image or audio) for a model.

Example: Using a WhisperProcessor to prepare an audio input for a model.

import { AutoProcessor, read_audio } from '@xenova/transformers';

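// Load the processor that matches the checkpoint, then decode the audio at 16 kHz.
const processor = await AutoProcessor.from_pretrained('openai/whisper-tiny.en');
const audio = await read_audio('https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac', 16000);
// Convert the raw waveform into the log-mel features the model expects.
const { input_features } = await processor(audio);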
// Tensor {
//   data: Float32Array(240000) [0.4752984642982483, 0.5597258806228638, 0.56434166431427, ...],
//   dims: [1, 80, 3000],
//   type: 'float32',
//   size: 240000,
// }
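
The `dims` in the output above can be sanity-checked with a little arithmetic. The sketch below assumes Whisper's usual front-end settings (30-second windows, a hop of 160 samples at 16 kHz, and 80 mel bins). None of these values is stated on this page, so treat them as assumptions rather than as the library's configuration.

```javascript
// Assumed Whisper front-end settings (not taken from this page).
const samplingRate = 16000; // Hz, matches the read_audio call above
const chunkSeconds = 30;    // assumed fixed window length
const hopLength = 160;      // assumed number of samples between frames
const numMelBins = 80;      // assumed mel filterbank size

// One frame per hop over the whole window.
const numFrames = (samplingRate * chunkSeconds) / hopLength;
const dims = [1, numMelBins, numFrames];
const size = dims.reduce((a, b) => a * b, 1);

console.log(dims, size); // [ 1, 80, 3000 ] 240000
```

Under these assumptions the batch dimension is 1, so `size` is simply the number of mel bins times the number of frames.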

processors
- static
  - .FeatureExtractor ⇐ Callable
    - new FeatureExtractor(config)
  - .ImageFeatureExtractor ⇐ FeatureExtractor
    - new ImageFeatureExtractor(config)
    - .thumbnail(image, size, [resample]) ⇒ Promise.<RawImage>
    - .crop_margin(image, gray_threshold) ⇒ Promise.<RawImage>
    - .pad_image(pixelData, imgDims, padSize, options) ⇒ *
    - .rescale(pixelData) ⇒ void
    - .get_resize_output_image_size(image, size) ⇒ *
    - .resize(image) ⇒ Promise.<RawImage>
    - .preprocess(image, overrides) ⇒ Promise.<PreprocessedImage>
    - ._call(images, ...args) ⇒ Promise.<ImageFeatureExtractorResult>
  - .DetrFeatureExtractor ⇐ ImageFeatureExtractor
    - ._call(images) ⇒ Promise.<DetrFeatureExtractorResult>
    - .post_process_object_detection() : post_process_object_detection
    - .remove_low_and_no_objects(class_logits, mask_logits, object_mask_threshold, num_labels) ⇒ *
    - .check_segment_validity(mask_labels, mask_probs, k, mask_threshold, overlap_mask_area_threshold) ⇒ *
    - .compute_segments(mask_probs, pred_scores, pred_labels, mask_threshold, overlap_mask_area_threshold, label_ids_to_fuse, target_size) ⇒ *
    - .post_process_panoptic_segmentation(outputs, [threshold], [mask_threshold], [overlap_mask_area_threshold], [label_ids_to_fuse], [target_sizes]) ⇒ Array.<{segmentation: Tensor, segments_info: Array<{id: number, label_id: number, score: number}>}>
  - .Processor ⇐ Callable
    - new Processor(feature_extractor)
    - ._call(input, ...args) ⇒ Promise.<any>
  - .WhisperProcessor ⇐ Processor
    - ._call(audio) ⇒ Promise.<any>
  - .AutoProcessor
    - .from_pretrained(pretrained_model_name_or_path, options) ⇒ Promise.<Processor>
- inner
  - ~center_to_corners_format(arr) ⇒ Array.<number>
  - ~enforce_size_divisibility(size, divisor) ⇒ *
  - ~HeightWidth : *
  - ~ImageFeatureExtractorResult : object
  - ~PreprocessedImage : object
  - ~DetrFeatureExtractorResult : object
  - ~SamImageProcessorResult : object

processors.FeatureExtractor ⇐ `Callable`

Base class for feature extractors.

Kind: static class of processors
Extends: Callable

new FeatureExtractor(config)

Constructs a new FeatureExtractor instance.

Param	Type	Description
config	`Object`	The configuration for the feature extractor.

processors.ImageFeatureExtractor ⇐ `FeatureExtractor`

Feature extractor for image models.

Kind: static class of processors
Extends: FeatureExtractor

.ImageFeatureExtractor ⇐ FeatureExtractor
- new ImageFeatureExtractor(config)
- .thumbnail(image, size, [resample]) ⇒ Promise.<RawImage>
- .crop_margin(image, gray_threshold) ⇒ Promise.<RawImage>
- .pad_image(pixelData, imgDims, padSize, options) ⇒ *
- .rescale(pixelData) ⇒ void
- .get_resize_output_image_size(image, size) ⇒ *
- .resize(image) ⇒ Promise.<RawImage>
- .preprocess(image, overrides) ⇒ Promise.<PreprocessedImage>
- ._call(images, ...args) ⇒ Promise.<ImageFeatureExtractorResult>

new ImageFeatureExtractor(config)

Constructs a new ImageFeatureExtractor instance.

Param	Type	Default	Description
config	`Object`		The configuration for the feature extractor.
config.image_mean	`Array.<number>`		The mean values for image normalization.
config.image_std	`Array.<number>`		The standard deviation values for image normalization.
config.do_rescale	`boolean`		Whether to rescale the image pixel values to the [0,1] range.
config.rescale_factor	`number`		The factor to use for rescaling the image pixel values.
config.do_normalize	`boolean`		Whether to normalize the image pixel values.
config.do_resize	`boolean`		Whether to resize the image.
config.resample	`number`		What method to use for resampling.
config.size	`number` \| `Object`		The size to resize the image to.
[config.do_flip_channel_order]	`boolean`	`false`	Whether to flip the color channels from RGB to BGR. Can be overridden by the `do_flip_channel_order` parameter in the `preprocess` method.
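
As a rough illustration of how `image_mean` and `image_std` are typically applied when `do_normalize` is set, the sketch below normalizes each channel independently. `normalizeChannels` is a hypothetical helper, not a library function, and the channels-last (interleaved) pixel layout is an assumption.

```javascript
// Hypothetical helper, not part of the library.
// Assumes channels-last data: [r, g, b, r, g, b, ...].
function normalizeChannels(pixelData, imageMean, imageStd) {
  const numChannels = imageMean.length;
  const out = new Float32Array(pixelData.length);
  for (let i = 0; i < pixelData.length; ++i) {
    const c = i % numChannels; // channel index of this value
    out[i] = (pixelData[i] - imageMean[c]) / imageStd[c];
  }
  return out;
}

// Two RGB pixels whose values were already rescaled to [0, 1].
const normalized = normalizeChannels(
  new Float32Array([0, 0.5, 1, 1, 0.5, 0]),
  [0.5, 0.5, 0.5],
  [0.5, 0.5, 0.5],
);
console.log(Array.from(normalized)); // [ -1, 0, 1, 1, 0, -1 ]
```

With a mean and standard deviation of 0.5, values in [0, 1] map onto [-1, 1].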

imageFeatureExtractor.thumbnail(image, size, [resample]) ⇒ `Promise.<RawImage>`

Resize the image to make a thumbnail. The image is resized so that no dimension is larger than any corresponding dimension of the specified size.

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<RawImage> - The resized image.

Param	Type	Default	Description
image	`RawImage`		The image to be resized.
size	`Object`		The size `{"height": h, "width": w}` to resize the image to.
[resample]	`string` \| `0` \| `1` \| `2` \| `3` \| `4` \| `5`	`2`	The resampling filter to use.
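
The size rule described above can be sketched as follows. This computes only the target dimensions, not the resampled pixels. `thumbnailSize` is a hypothetical helper, and the library's exact rounding may differ.

```javascript
// Hypothetical helper: shrink (never enlarge) while preserving aspect ratio,
// so that neither output dimension exceeds the requested size.
function thumbnailSize(inputHeight, inputWidth, size) {
  if (inputHeight <= size.height && inputWidth <= size.width) {
    return { height: inputHeight, width: inputWidth }; // already fits
  }
  // Compare the two scale factors without dividing:
  // size.height / inputHeight <= size.width / inputWidth
  if (size.height * inputWidth <= size.width * inputHeight) {
    // Height is the tighter constraint.
    const height = size.height;
    const width = Math.max(1, Math.floor((inputWidth * height) / inputHeight));
    return { height, width };
  }
  // Width is the tighter constraint.
  const width = size.width;
  const height = Math.max(1, Math.floor((inputHeight * width) / inputWidth));
  return { height, width };
}

console.log(thumbnailSize(533, 800, { height: 224, width: 224 }));
// { height: 149, width: 224 }
```

Comparing cross-products instead of ratios keeps the choice of limiting dimension free of floating-point error.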

imageFeatureExtractor.crop_margin(image, gray_threshold) ⇒ `Promise.<RawImage>`

Crops the margin of the image. Gray pixels are considered margin (i.e., pixels with a value below the threshold).

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<RawImage> - The cropped image.

Param	Type	Default	Description
image	`RawImage`		The image to be cropped.
gray_threshold	`number`	`200`	Value below which pixels are considered to be gray.

imageFeatureExtractor.pad_image(pixelData, imgDims, padSize, options) ⇒ `*`

Pad the image by a certain amount.

Kind: instance method of ImageFeatureExtractor
Returns: * - The padded pixel data and image dimensions.

Param	Type	Default	Description
pixelData	`Float32Array`		The pixel data to pad.
imgDims	`Array.<number>`		The dimensions of the image (height, width, channels).
padSize	`*`		The dimensions of the padded image.
options	`Object`		The options for padding.
[options.mode]	`'constant'` \| `'symmetric'`	`'constant'`	The type of padding to add.
[options.center]	`boolean`	`false`	Whether to center the image.
[options.constant_values]	`number`	`0`	The constant value to use for padding.
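
A minimal sketch of the `'constant'` mode follows, assuming channels-last pixel data and a `padSize` of the form `{ height, width }` (the page only types it as `*`). `padConstant` is a hypothetical helper. It mirrors the documented parameters but is not the library's implementation.

```javascript
// Hypothetical helper illustrating constant padding of (height, width, channels) data.
function padConstant(pixelData, imgDims, padSize, options = {}) {
  const { center = false, constant_values = 0 } = options;
  const [height, width, channels] = imgDims;
  const { height: paddedHeight, width: paddedWidth } = padSize;
  if (paddedHeight < height || paddedWidth < width) {
    throw new RangeError('padSize must be at least as large as the image');
  }
  // Offsets of the original image inside the padded canvas.
  const top = center ? Math.floor((paddedHeight - height) / 2) : 0;
  const left = center ? Math.floor((paddedWidth - width) / 2) : 0;

  const padded = new Float32Array(paddedHeight * paddedWidth * channels).fill(constant_values);
  for (let y = 0; y < height; ++y) {
    // Copy one source row into its position in the padded canvas.
    const srcStart = y * width * channels;
    const dstStart = ((y + top) * paddedWidth + left) * channels;
    padded.set(pixelData.subarray(srcStart, srcStart + width * channels), dstStart);
  }
  return [padded, [paddedHeight, paddedWidth, channels]];
}

const [paddedPixels, paddedDims] = padConstant(
  new Float32Array([7]), [1, 1, 1], { height: 3, width: 3 }, { center: true },
);
console.log(Array.from(paddedPixels), paddedDims);
// [ 0, 0, 0, 0, 7, 0, 0, 0, 0 ] [ 3, 3, 1 ]
```

With `center: false`, the image stays in the top-left corner and all padding is added to the bottom and right.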

imageFeatureExtractor.rescale(pixelData) ⇒ `void`

Rescales the image's pixel values by this.rescale_factor.

Kind: instance method of ImageFeatureExtractor

Param	Type	Description
pixelData	`Float32Array`	The pixel data to rescale.
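
Because the method returns `void`, the rescaling presumably happens in place. The sketch below shows that pattern with a hypothetical `rescaleInPlace` helper. The factor `1 / 255` is an assumed, commonly used value and is not taken from this page.

```javascript
// Hypothetical helper: multiplies every value in place and returns nothing.
function rescaleInPlace(pixelData, rescaleFactor) {
  for (let i = 0; i < pixelData.length; ++i) {
    pixelData[i] *= rescaleFactor;
  }
}

const pixels = new Float32Array([0, 51, 255]);
rescaleInPlace(pixels, 1 / 255); // assumed factor mapping [0, 255] to [0, 1]
console.log(pixels); // values close to 0, 0.2 and 1
```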

imageFeatureExtractor.get_resize_output_image_size(image, size) ⇒ `*`

Find the target (width, height) dimension of the output image after resizing given the input image and the desired size.

Kind: instance method of ImageFeatureExtractor
Returns: * - The target (width, height) dimension of the output image after resizing.

Param	Type	Description
image	`RawImage`	The image to resize.
size	`any`	The size to use for resizing the image.

imageFeatureExtractor.resize(image) ⇒ `Promise.<RawImage>`

Resizes the image.

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<RawImage> - The resized image.

Param	Type	Description
image	`RawImage`	The image to resize.

imageFeatureExtractor.preprocess(image, overrides) ⇒ `Promise.<PreprocessedImage>`

Preprocesses the given image.

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<PreprocessedImage> - The preprocessed image.

Param	Type	Description
image	`RawImage`	The image to preprocess.
overrides	`Object`	The overrides for the preprocessing options.

imageFeatureExtractor._call(images, ...args) ⇒ `Promise.<ImageFeatureExtractorResult>`

Calls the feature extraction process on an array of images, preprocesses each image, and concatenates the resulting features into a single Tensor.

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<ImageFeatureExtractorResult> - An object containing the concatenated pixel values (and other metadata) of the preprocessed images.

Param	Type	Description
images	`Array.<RawImage>`	The image(s) to extract features from.
...args	`any`	Additional arguments.

processors.DetrFeatureExtractor ⇐ `ImageFeatureExtractor`

Detr Feature Extractor.

Kind: static class of processors
Extends: ImageFeatureExtractor

.DetrFeatureExtractor ⇐ ImageFeatureExtractor
- ._call(images) ⇒ Promise.<DetrFeatureExtractorResult>
- .post_process_object_detection() : post_process_object_detection
- .remove_low_and_no_objects(class_logits, mask_logits, object_mask_threshold, num_labels) ⇒ *
- .check_segment_validity(mask_labels, mask_probs, k, mask_threshold, overlap_mask_area_threshold) ⇒ *
- .compute_segments(mask_probs, pred_scores, pred_labels, mask_threshold, overlap_mask_area_threshold, label_ids_to_fuse, target_size) ⇒ *
- .post_process_panoptic_segmentation(outputs, [threshold], [mask_threshold], [overlap_mask_area_threshold], [label_ids_to_fuse], [target_sizes]) ⇒ Array.<{segmentation: Tensor, segments_info: Array<{id: number, label_id: number, score: number}>}>

detrFeatureExtractor._call(images) ⇒ `Promise.<DetrFeatureExtractorResult>`

Calls the feature extraction process on an array of images, preprocesses each image, and concatenates the resulting features into a single Tensor.

Kind: instance method of DetrFeatureExtractor
Returns: Promise.<DetrFeatureExtractorResult> - An object containing the concatenated pixel values of the preprocessed images.

Param	Type	Description
images	`Array.<RawImage>`	The image(s) to extract features from.

detrFeatureExtractor.post_process_object_detection() : `post_process_object_detection`

Kind: instance method of DetrFeatureExtractor

detrFeatureExtractor.remove_low_and_no_objects(class_logits, mask_logits, object_mask_threshold, num_labels) ⇒ `*`

Binarizes the given masks using object_mask_threshold and returns the associated masks, scores, and labels.

Kind: instance method of DetrFeatureExtractor
Returns: * - The binarized masks, the scores, and the labels.

Param	Type	Description
class_logits	`Tensor`	The class logits.
mask_logits	`Tensor`	The mask logits.
object_mask_threshold	`number`	A number between 0 and 1 used to binarize the masks.
num_labels	`number`	The number of labels.
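
One plausible reading of this filtering step, sketched under stated assumptions: each query's class logits have `num_labels + 1` entries, the last index stands for "no object", and a query is kept only if its best class is a real label whose probability exceeds the threshold. Plain arrays stand in for `Tensor` objects, and `removeLowAndNoObjects` is a hypothetical helper. The library's own logic may differ.

```javascript
// Numerically stable softmax over one query's class logits.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((x) => x / sum);
}

// Hypothetical helper. Assumes index `numLabels` is the "no object" class.
function removeLowAndNoObjects(classLogits, maskLogits, objectMaskThreshold, numLabels) {
  const masks = [];
  const scores = [];
  const labels = [];
  classLogits.forEach((logits, query) => {
    const probs = softmax(logits);
    let label = 0;
    for (let c = 1; c < probs.length; ++c) {
      if (probs[c] > probs[label]) label = c;
    }
    const score = probs[label];
    // Keep confident predictions of real classes only.
    if (label !== numLabels && score > objectMaskThreshold) {
      masks.push(maskLogits[query]);
      scores.push(score);
      labels.push(label);
    }
  });
  return [masks, scores, labels];
}

const [keptMasks, keptScores, keptLabels] = removeLowAndNoObjects(
  [[5, 0, 0], [0, 0, 5], [0.1, 0, 0]], // three queries, two labels + "no object"
  ['mask-a', 'mask-b', 'mask-c'],      // placeholders for per-query mask logits
  0.5,
  2,
);
console.log(keptMasks, keptLabels); // [ 'mask-a' ] [ 0 ]
```

The second query is dropped because its best class is "no object", and the third because its best score is below the threshold.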

detrFeatureExtractor.check_segment_validity(mask_labels, mask_probs, k, mask_threshold, overlap_mask_area_threshold) ⇒ `*`

Checks whether the segment is valid or not.

Kind: instance method of DetrFeatureExtractor
Returns: * - Whether the segment is valid or not, and the indices of the valid labels.

Param	Type	Default	Description
mask_labels	`Int32Array`		Labels for each pixel in the mask.
mask_probs	`Array.<Tensor>`		Probabilities for each pixel in the masks.
k	`number`		The class id of the segment.
mask_threshold	`number`	`0.5`	The mask threshold.
overlap_mask_area_threshold	`number`	`0.8`	The overlap mask area threshold.
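
The validity test can be sketched as an area-ratio check. The pixels finally assigned to class `k` are compared with the pixels where the mask for `k` clears `mask_threshold`, and the segment counts as valid when the ratio exceeds `overlap_mask_area_threshold`. This follows a common panoptic post-processing recipe and is an assumption about the method's behavior. Typed arrays stand in for `Tensor` objects, and `checkSegmentValidity` is a hypothetical helper.

```javascript
// Hypothetical helper mirroring the documented parameters and defaults.
function checkSegmentValidity(maskLabels, maskProbs, k, maskThreshold = 0.5, overlapMaskAreaThreshold = 0.8) {
  const maskK = [];     // indices of pixels assigned to class k
  let originalArea = 0; // pixels where mask k alone clears the threshold
  for (let i = 0; i < maskLabels.length; ++i) {
    if (maskLabels[i] === k) maskK.push(i);
    if (maskProbs[k][i] >= maskThreshold) ++originalArea;
  }
  let maskExists = maskK.length > 0 && originalArea > 0;
  if (maskExists) {
    // Reject segments that lost too much of their area to other classes.
    maskExists = maskK.length / originalArea > overlapMaskAreaThreshold;
  }
  return [maskExists, maskK];
}

const maskLabels = new Int32Array([0, 0, 1, 1]);
const maskProbs = [
  new Float32Array([0.9, 0.8, 0.1, 0.2]),
  new Float32Array([0.1, 0.6, 0.9, 0.8]),
];
console.log(checkSegmentValidity(maskLabels, maskProbs, 0)); // [ true, [ 0, 1 ] ]
console.log(checkSegmentValidity(maskLabels, maskProbs, 1)); // [ false, [ 2, 3 ] ]
```

Class 1 fails because only 2 of the 3 pixels where its mask clears the threshold were assigned to it, a ratio of about 0.67.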

detrFeatureExtractor.compute_segments(mask_probs, pred_scores, pred_labels, mask_threshold, overlap_mask_area_threshold, label_ids_to_fuse, target_size) ⇒ `*`

Computes the segments.

Kind: instance method of DetrFeatureExtractor
Returns: * - The computed segments.

Param	Type	Description
mask_probs	`Array.<Tensor>`	The mask probabilities.
pred_scores	`Array.<number>`	The predicted scores.
pred_labels	`Array.<number>`	The predicted labels.
mask_threshold	`number`	The mask threshold.
overlap_mask_area_threshold	`number`	The overlap mask area threshold.
label_ids_to_fuse	`Set.<number>`	The label ids to fuse.
target_size	`Array.<number>`	The target size of the image.

detrFeatureExtractor.post_process_panoptic_segmentation(outputs, [threshold], [mask_threshold], [overlap_mask_area_threshold], [label_ids_to_fuse], [target_sizes]) ⇒ `Array.<{segmentation: Tensor, segments_info: Array<{id: number, label_id: number, score: number}>}>`

Post-process the model output to generate the final panoptic segmentation.

Kind: instance method of DetrFeatureExtractor

Param	Type	Default	Description
outputs	`*`		The model output to post process
[threshold]	`number`	`0.5`	The probability score threshold to keep predicted instance masks.
[mask_threshold]	`number`	`0.5`	Threshold to use when turning the predicted masks into binary values.
[overlap_mask_area_threshold]	`number`	`0.8`	The overlap mask area threshold to merge or discard small disconnected parts within each binary instance mask.
[label_ids_to_fuse]	`Set.<number>`		The labels in this set will have all their instances fused together.
[target_sizes]	`Array.<Array<number>>`		The target sizes to resize the masks to.

processors.Processor ⇐ `Callable`

Represents a Processor that extracts features from an input.

Kind: static class of processors
Extends: Callable

.Processor ⇐ Callable
- new Processor(feature_extractor)
- ._call(input, ...args) ⇒ Promise.<any>

new Processor(feature_extractor)

Creates a new Processor with the given feature extractor.

Param	Type	Description
feature_extractor	`FeatureExtractor`	The function used to extract features from the input.

processor._call(input, ...args) ⇒ `Promise.<any>`

Calls the feature_extractor function with the given input.

Kind: instance method of Processor
Returns: Promise.<any> - A Promise that resolves with the extracted features.

Param	Type	Description
input	`any`	The input to extract features from.
...args	`any`	Additional arguments.

processors.WhisperProcessor ⇐ `Processor`

Represents a WhisperProcessor that extracts features from an audio input.

Kind: static class of processors
Extends: Processor

whisperProcessor._call(audio) ⇒ `Promise.<any>`

Calls the feature_extractor function with the given audio input.

Kind: instance method of WhisperProcessor
Returns: Promise.<any> - A Promise that resolves with the extracted features.

Param	Type	Description
audio	`any`	The audio input to extract features from.

processors.AutoProcessor

Helper class which is used to instantiate pretrained processors with the from_pretrained function. The chosen processor class is determined by the type specified in the processor config.

Example: Load a processor using from_pretrained.

let processor = await AutoProcessor.from_pretrained('openai/whisper-tiny.en');

Example: Run an image through a processor.

import { AutoProcessor, RawImage } from '@xenova/transformers';

let processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
let image_inputs = await processor(image);
// {
//   "pixel_values": {
//     "dims": [ 1, 3, 224, 224 ],
//     "type": "float32",
//     "data": Float32Array [ -1.558687686920166, -1.558687686920166, -1.5440893173217773, ... ],
//     "size": 150528
//   },
//   "original_sizes": [
//     [ 533, 800 ]
//   ],
//   "reshaped_input_sizes": [
//     [ 224, 224 ]
//   ]
// }

Kind: static class of processors

AutoProcessor.from_pretrained(pretrained_model_name_or_path, options) ⇒ `Promise.<Processor>`

Instantiate one of the processor classes of the library from a pretrained model.

The processor class to instantiate is selected based on the feature_extractor_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible).

Kind: static method of AutoProcessor
Returns: Promise.<Processor> - A new instance of the Processor class.

Param	Type	Description
pretrained_model_name_or_path	`string`	The name or path of the pretrained model. Can be either: (1) a string, the model id of a pretrained processor hosted inside a model repo on huggingface.co, where valid model ids can be located at the root level, like `bert-base-uncased`, or namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`; or (2) a path to a directory containing processor files, e.g., `./my_model_directory/`.
options	`*`	Additional options for loading the processor.
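
Selecting a class from `feature_extractor_type` amounts to a registry lookup. The sketch below shows the general pattern with stand-in classes. `DemoAudioExtractor`, `DemoImageExtractor`, `EXTRACTOR_REGISTRY`, and `selectFeatureExtractor` are all hypothetical names, not library exports, and the real method also loads the config from the model repo or directory.

```javascript
// Stand-in classes for illustration only.
class DemoFeatureExtractor {
  constructor(config) {
    this.config = config;
  }
}
class DemoAudioExtractor extends DemoFeatureExtractor {}
class DemoImageExtractor extends DemoFeatureExtractor {}

// Maps the type name found in the config to the class to instantiate.
const EXTRACTOR_REGISTRY = new Map([
  ['DemoAudioExtractor', DemoAudioExtractor],
  ['DemoImageExtractor', DemoImageExtractor],
]);

function selectFeatureExtractor(config) {
  const ExtractorClass = EXTRACTOR_REGISTRY.get(config.feature_extractor_type);
  if (!ExtractorClass) {
    throw new Error(`Unknown feature_extractor_type: ${config.feature_extractor_type}`);
  }
  return new ExtractorClass(config);
}

const extractor = selectFeatureExtractor({ feature_extractor_type: 'DemoAudioExtractor' });
console.log(extractor instanceof DemoAudioExtractor); // true
```

Failing loudly on an unknown type is this sketch's choice. A real loader might instead fall back to a default class.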

processors~center_to_corners_format(arr) ⇒ `Array.<number>`

Converts bounding boxes from center format to corners format.

Kind: inner method of processors
Returns: Array.<number> - The coordinates for the top-left and bottom-right corners of the box (top_left_x, top_left_y, bottom_right_x, bottom_right_y)

Param	Type	Description
arr	`Array.<number>`	The coordinate for the center of the box and its width, height dimensions (center_x, center_y, width, height)
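
The conversion is a matter of subtracting and adding half the width and height. A minimal sketch, with `centerToCorners` as a hypothetical stand-in for the inner function:

```javascript
// (center_x, center_y, width, height) -> (top_left_x, top_left_y, bottom_right_x, bottom_right_y)
function centerToCorners([centerX, centerY, width, height]) {
  return [
    centerX - width / 2,
    centerY - height / 2,
    centerX + width / 2,
    centerY + height / 2,
  ];
}

console.log(centerToCorners([50, 40, 20, 10])); // [ 40, 35, 60, 45 ]
```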

processors~enforce_size_divisibility(size, divisor) ⇒ `*`

Rounds the height and width down to the closest multiple of divisor.

Kind: inner method of processors
Returns: * - The rounded size.

Param	Type	Description
size	`*`	The size of the image
divisor	`number`	The divisor to use.
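
A sketch of the rounding, assuming `size` is a two-element `[width, height]` pair (the page only types it as `*`). `roundDownToMultiple` is a hypothetical helper, and clamping to at least one multiple of the divisor is this sketch's choice.

```javascript
// Hypothetical helper: round each dimension down to a multiple of `divisor`,
// but never below `divisor` itself.
function roundDownToMultiple([width, height], divisor) {
  return [
    Math.max(Math.floor(width / divisor), 1) * divisor,
    Math.max(Math.floor(height / divisor), 1) * divisor,
  ];
}

console.log(roundDownToMultiple([1333, 800], 32)); // [ 1312, 800 ]
```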

processors~HeightWidth : `*`

Named tuple to indicate that the order used is (height x width), even though the graphics industry standard is (width x height).

Kind: inner typedef of processors

processors~ImageFeatureExtractorResult : `object`

Kind: inner typedef of processors
Properties

Name	Type	Description
pixel_values	`Tensor`	The pixel values of the batched preprocessed images.
original_sizes	`Array.<HeightWidth>`	Array of two-dimensional tuples like [[480, 640]].
reshaped_input_sizes	`Array.<HeightWidth>`	Array of two-dimensional tuples like [[1000, 1330]].

processors~PreprocessedImage : `object`

Kind: inner typedef of processors
Properties

Name	Type	Description
original_size	`HeightWidth`	The original size of the image.
reshaped_input_size	`HeightWidth`	The reshaped input size of the image.
pixel_values	`Tensor`	The pixel values of the preprocessed image.

processors~DetrFeatureExtractorResult : `object`

Kind: inner typedef of processors
Properties

Name	Type
pixel_mask	`Tensor`

processors~SamImageProcessorResult : `object`

Kind: inner typedef of processors
Properties

Name	Type
pixel_values	`Tensor`
original_sizes	`Array.<HeightWidth>`
reshaped_input_sizes	`Array.<HeightWidth>`
[input_points]	`Tensor`
[input_labels]	`Tensor`
