
processors

Processors are used to prepare non-textual inputs (e.g., image or audio) for a model.

Example: Using a WhisperProcessor to prepare an audio input for a model.

import { AutoProcessor, read_audio } from '@xenova/transformers';

let processor = await AutoProcessor.from_pretrained('openai/whisper-tiny.en');
let audio = await read_audio('https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac', 16000);
let { input_features } = await processor(audio);
// Tensor {
//   data: Float32Array(240000) [0.4752984642982483, 0.5597258806228638, 0.56434166431427, ...],
//   dims: [1, 80, 3000],
//   type: 'float32',
//   size: 240000,
// }

processors.FeatureExtractor ⇐ <code> Callable </code>

Base class for feature extractors.

Kind: static class of processors
Extends: Callable


new FeatureExtractor(config)

Constructs a new FeatureExtractor instance.

| Param | Type | Description |
| --- | --- | --- |
| config | <code>Object</code> | The configuration for the feature extractor. |


processors.ImageFeatureExtractor ⇐ <code> FeatureExtractor </code>

Feature extractor for image models.

Kind: static class of processors
Extends: FeatureExtractor


new ImageFeatureExtractor(config)

Constructs a new ImageFeatureExtractor instance.

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| config | <code>Object</code> |  | The configuration for the feature extractor. |
| config.image_mean | <code>Array.<number></code> |  | The mean values for image normalization. |
| config.image_std | <code>Array.<number></code> |  | The standard deviation values for image normalization. |
| config.do_rescale | <code>boolean</code> |  | Whether to rescale the image pixel values to the [0,1] range. |
| config.rescale_factor | <code>number</code> |  | The factor to use for rescaling the image pixel values. |
| config.do_normalize | <code>boolean</code> |  | Whether to normalize the image pixel values. |
| config.do_resize | <code>boolean</code> |  | Whether to resize the image. |
| config.resample | <code>number</code> |  | What method to use for resampling. |
| config.size | <code>number</code> \| <code>Object</code> |  | The size to resize the image to. |
| [config.do_flip_channel_order] | <code>boolean</code> | <code>false</code> | Whether to flip the color channels from RGB to BGR. Can be overridden by the do_flip_channel_order parameter in the preprocess method. |
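As a rough illustration, these options are read from the checkpoint's preprocessor configuration and exposed on the extractor instance (the methods below refer to them, e.g. this.rescale_factor). The checkpoint id below is only an assumed example; any image model with a processor config should behave similarly.

```js
import { AutoProcessor } from '@xenova/transformers';

// Assumed example checkpoint, used here only for illustration.
let processor = await AutoProcessor.from_pretrained('Xenova/vit-base-patch16-224');
let extractor = processor.feature_extractor;

// Options copied from the checkpoint's preprocessor config:
console.log(extractor.image_mean, extractor.image_std);
console.log(extractor.do_resize, extractor.size);
console.log(extractor.do_rescale, extractor.rescale_factor, extractor.do_normalize);
```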


imageFeatureExtractor.thumbnail(image, size, [resample]) ⇒ <code> Promise.<RawImage> </code>

Resize the image to make a thumbnail. The image is resized so that no dimension is larger than any corresponding dimension of the specified size.

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<RawImage> - The resized image.

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| image | <code>RawImage</code> |  | The image to be resized. |
| size | <code>Object</code> |  | The size {"height": h, "width": w} to resize the image to. |
| [resample] | <code>string</code> \| <code>0</code> \| <code>1</code> \| <code>2</code> \| <code>3</code> \| <code>4</code> \| <code>5</code> | <code>2</code> | The resampling filter to use. |
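A minimal sketch of calling thumbnail directly on a loaded extractor. The checkpoint id is an assumed example, and the image URL is reused from the AutoProcessor example further down.

```js
import { AutoProcessor, RawImage } from '@xenova/transformers';

// Assumed example checkpoint whose processor uses an ImageFeatureExtractor.
let processor = await AutoProcessor.from_pretrained('Xenova/vit-base-patch16-224');
let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');

// Shrink the image so that neither dimension exceeds the requested size.
let thumb = await processor.feature_extractor.thumbnail(image, { height: 256, width: 256 });
console.log(thumb.width, thumb.height); // both at most 256
```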


imageFeatureExtractor.crop_margin(image, gray_threshold) ⇒ <code> Promise.<RawImage> </code>

Crops the margin of the image. Gray pixels are considered margin (i.e., pixels with a value below the threshold).

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<RawImage> - The cropped image.

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| image | <code>RawImage</code> |  | The image to be cropped. |
| gray_threshold | <code>number</code> | <code>200</code> | Value below which pixels are considered to be gray. |
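A hedged sketch of calling crop_margin directly; it is mainly useful for document-style images with flat borders. The checkpoint id is an assumed example and the image URL is just a placeholder reused from the example further down.

```js
import { AutoProcessor, RawImage } from '@xenova/transformers';

// Assumed example checkpoint whose processor uses an ImageFeatureExtractor.
let processor = await AutoProcessor.from_pretrained('Xenova/vit-base-patch16-224');
let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');

// Pixels below the gray threshold (default 200) are treated as margin and cropped away.
let cropped = await processor.feature_extractor.crop_margin(image, 200);
console.log(cropped.width, cropped.height);
```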


imageFeatureExtractor.pad_image(pixelData, imgDims, padSize, options) ⇒ <code> * </code>

Pad the image by a certain amount.

Kind: instance method of ImageFeatureExtractor
Returns: * - The padded pixel data and image dimensions.

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| pixelData | <code>Float32Array</code> |  | The pixel data to pad. |
| imgDims | <code>Array.<number></code> |  | The dimensions of the image (height, width, channels). |
| padSize | <code>*</code> |  | The dimensions of the padded image. |
| options | <code>Object</code> |  | The options for padding. |
| [options.mode] | <code>'constant'</code> \| <code>'symmetric'</code> | <code>'constant'</code> | The type of padding to add. |
| [options.center] | <code>boolean</code> | <code>false</code> | Whether to center the image. |
| [options.constant_values] | <code>number</code> | <code>0</code> | The constant value to use for padding. |


imageFeatureExtractor.rescale(pixelData) ⇒ <code> void </code>

Rescales the image's pixel values by this.rescale_factor.

Kind: instance method of ImageFeatureExtractor

| Param | Type | Description |
| --- | --- | --- |
| pixelData | <code>Float32Array</code> | The pixel data to rescale. |
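Conceptually, rescaling multiplies every pixel value in place by the extractor's rescale_factor. The sketch below is plain arithmetic (not the library's exact code) and assumes the common factor of 1/255 used to map 8-bit pixel data into the [0, 1] range.

```js
// Standalone sketch of the rescaling arithmetic.
const rescale_factor = 1 / 255; // assumed typical value from preprocessor configs
const pixelData = Float32Array.from([0, 127, 255]);
for (let i = 0; i < pixelData.length; ++i) {
    pixelData[i] *= rescale_factor; // modified in place, matching the void return type
}
console.log(pixelData); // Float32Array [0, ~0.498, 1]
```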


imageFeatureExtractor.get_resize_output_image_size(image, size) ⇒ <code> * </code>

Find the target (width, height) dimension of the output image after resizing given the input image and the desired size.

Kind: instance method of ImageFeatureExtractor
Returns: * - The target (width, height) dimension of the output image after resizing.

| Param | Type | Description |
| --- | --- | --- |
| image | <code>RawImage</code> | The image to resize. |
| size | <code>any</code> | The size to use for resizing the image. |


imageFeatureExtractor.resize(image) ⇒ <code> Promise.<RawImage> </code>

Resizes the image.

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<RawImage> - The resized image.

| Param | Type | Description |
| --- | --- | --- |
| image | <code>RawImage</code> | The image to resize. |


imageFeatureExtractor.preprocess(image, overrides) ⇒ <code> Promise.<PreprocessedImage> </code>

Preprocesses the given image.

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<PreprocessedImage> - The preprocessed image.

| Param | Type | Description |
| --- | --- | --- |
| image | <code>RawImage</code> | The image to preprocess. |
| overrides | <code>Object</code> | The overrides for the preprocessing options. |
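A minimal sketch of preprocessing a single image. The checkpoint id is an assumed example, and the image URL is taken from the AutoProcessor example further down.

```js
import { AutoProcessor, RawImage } from '@xenova/transformers';

let processor = await AutoProcessor.from_pretrained('Xenova/vit-base-patch16-224'); // assumed checkpoint
let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');

// Resize / rescale / normalize one image according to the extractor's config.
let { pixel_values, original_size, reshaped_input_size } =
    await processor.feature_extractor.preprocess(image);
console.log(original_size, reshaped_input_size, pixel_values.dims);
```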


imageFeatureExtractor._call(images, ...args) ⇒ <code> Promise.<ImageFeatureExtractorResult> </code>

Calls the feature extraction process on an array of images, preprocesses each image, and concatenates the resulting features into a single Tensor.

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<ImageFeatureExtractorResult> - An object containing the concatenated pixel values (and other metadata) of the preprocessed images.

| Param | Type | Description |
| --- | --- | --- |
| images | <code>Array.<RawImage></code> | The image(s) to extract features from. |
| ...args | <code>any</code> | Additional arguments. |
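Since the processor's own call delegates to the feature extractor, passing an array of images batches the results into a single tensor. A sketch, again with an assumed checkpoint id and an image URL reused from the example further down:

```js
import { AutoProcessor, RawImage } from '@xenova/transformers';

let processor = await AutoProcessor.from_pretrained('Xenova/vit-base-patch16-224'); // assumed checkpoint
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg';
let images = await Promise.all([RawImage.read(url), RawImage.read(url)]);

// Preprocess both images and concatenate them into a single batched tensor.
let { pixel_values, original_sizes, reshaped_input_sizes } = await processor(images);
console.log(pixel_values.dims); // e.g. [2, 3, 224, 224] for two RGB images
console.log(original_sizes, reshaped_input_sizes);
```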


processors.DetrFeatureExtractor ⇐ <code> ImageFeatureExtractor </code>

Detr Feature Extractor.

Kind: static class of processors
Extends: ImageFeatureExtractor


detrFeatureExtractor._call(images) ⇒ <code> Promise.<DetrFeatureExtractorResult> </code>

Calls the feature extraction process on an array of images, preprocesses each image, and concatenates the resulting features into a single Tensor.

Kind: instance method of DetrFeatureExtractor
Returns: Promise.<DetrFeatureExtractorResult> - An object containing the concatenated pixel values of the preprocessed images.

| Param | Type | Description |
| --- | --- | --- |
| images | <code>Array.<RawImage></code> | The image(s) to extract features from. |


detrFeatureExtractor.post_process_object_detection() : <code> post_process_object_detection </code>

Kind: instance method of DetrFeatureExtractor
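A hedged end-to-end sketch of how this method is typically used after running a DETR-style detection model. The checkpoint id ('Xenova/detr-resnet-50'), the argument order (outputs, score threshold, original [height, width] target sizes), and the exact shape of the returned detections are assumptions for illustration, not guaranteed by this page.

```js
import { AutoProcessor, AutoModelForObjectDetection, RawImage } from '@xenova/transformers';

// Assumed example checkpoint for DETR object detection.
let processor = await AutoProcessor.from_pretrained('Xenova/detr-resnet-50');
let model = await AutoModelForObjectDetection.from_pretrained('Xenova/detr-resnet-50');

let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
let { pixel_values, pixel_mask } = await processor(image);
let outputs = await model({ pixel_values, pixel_mask });

// Keep detections above the score threshold, with boxes mapped back to the
// original image size (target sizes given as [height, width]).
let [detections] = processor.feature_extractor.post_process_object_detection(
    outputs, 0.9, [[image.height, image.width]]
);
console.log(detections); // per-image boxes, scores and class indices (exact keys may vary)
```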


detrFeatureExtractor.remove_low_and_no_objects(class_logits, mask_logits, object_mask_threshold, num_labels) ⇒ <code> * </code>

Binarizes the given masks using object_mask_threshold and returns the associated masks, scores, and labels.

Kind: instance method of DetrFeatureExtractor
Returns: * - The binarized masks, the scores, and the labels.

| Param | Type | Description |
| --- | --- | --- |
| class_logits | <code>Tensor</code> | The class logits. |
| mask_logits | <code>Tensor</code> | The mask logits. |
| object_mask_threshold | <code>number</code> | A number between 0 and 1 used to binarize the masks. |
| num_labels | <code>number</code> | The number of labels. |


detrFeatureExtractor.check_segment_validity(mask_labels, mask_probs, k, mask_threshold, overlap_mask_area_threshold) ⇒ <code> * </code>

Checks whether the segment is valid or not.

Kind: instance method of DetrFeatureExtractor
Returns: * - Whether the segment is valid or not, and the indices of the valid labels.

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| mask_labels | <code>Int32Array</code> |  | Labels for each pixel in the mask. |
| mask_probs | <code>Array.<Tensor></code> |  | Probabilities for each pixel in the masks. |
| k | <code>number</code> |  | The class id of the segment. |
| mask_threshold | <code>number</code> | <code>0.5</code> | The mask threshold. |
| overlap_mask_area_threshold | <code>number</code> | <code>0.8</code> | The overlap mask area threshold. |


detrFeatureExtractor.compute_segments(mask_probs, pred_scores, pred_labels, mask_threshold, overlap_mask_area_threshold, label_ids_to_fuse, target_size) ⇒ <code> * </code>

Computes the segments.

Kind: instance method of DetrFeatureExtractor
Returns: * - The computed segments.

| Param | Type | Description |
| --- | --- | --- |
| mask_probs | <code>Array.<Tensor></code> | The mask probabilities. |
| pred_scores | <code>Array.<number></code> | The predicted scores. |
| pred_labels | <code>Array.<number></code> | The predicted labels. |
| mask_threshold | <code>number</code> | The mask threshold. |
| overlap_mask_area_threshold | <code>number</code> | The overlap mask area threshold. |
| label_ids_to_fuse | <code>Set.<number></code> | The label ids to fuse. |
| target_size | <code>Array.<number></code> | The target size of the image. |


detrFeatureExtractor.post_process_panoptic_segmentation(outputs, [threshold], [mask_threshold], [overlap_mask_area_threshold], [label_ids_to_fuse], [target_sizes]) ⇒ <code> Array.<{segmentation: Tensor, segments_info: Array<{id: number, label_id: number, score: number}>}> </code>

Post-process the model output to generate the final panoptic segmentation.

Kind: instance method of DetrFeatureExtractor

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| outputs | <code>*</code> |  | The model output to post-process. |
| [threshold] | <code>number</code> | <code>0.5</code> | The probability score threshold to keep predicted instance masks. |
| [mask_threshold] | <code>number</code> | <code>0.5</code> | Threshold to use when turning the predicted masks into binary values. |
| [overlap_mask_area_threshold] | <code>number</code> | <code>0.8</code> | The overlap mask area threshold to merge or discard small disconnected parts within each binary instance mask. |
| [label_ids_to_fuse] | <code>Set.<number></code> |  | The labels in this state will have all their instances be fused together. |
| [target_sizes] | <code>Array.<Array<number>></code> |  | The target sizes to resize the masks to. |
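A hedged sketch of panoptic post-processing after running a DETR panoptic model. The checkpoint id ('Xenova/detr-resnet-50-panoptic') and the use of AutoModelForImageSegmentation are assumptions for illustration; the positional arguments follow the parameter list above.

```js
import { AutoProcessor, AutoModelForImageSegmentation, RawImage } from '@xenova/transformers';

// Assumed example checkpoint for DETR panoptic segmentation.
let processor = await AutoProcessor.from_pretrained('Xenova/detr-resnet-50-panoptic');
let model = await AutoModelForImageSegmentation.from_pretrained('Xenova/detr-resnet-50-panoptic');

let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
let { pixel_values, pixel_mask } = await processor(image);
let outputs = await model({ pixel_values, pixel_mask });

// One { segmentation, segments_info } entry per target size, resized back to the original image.
let [{ segmentation, segments_info }] =
    processor.feature_extractor.post_process_panoptic_segmentation(
        outputs, 0.9, 0.5, 0.8, null, [[image.height, image.width]]
    );
console.log(segments_info);     // [{ id, label_id, score }, ...]
console.log(segmentation.dims); // e.g. [image.height, image.width]
```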


processors.Processor ⇐ <code> Callable </code>

Represents a Processor that extracts features from an input.

Kind: static class of processors
Extends: Callable


new Processor(feature_extractor)

Creates a new Processor with the given feature extractor.

| Param | Type | Description |
| --- | --- | --- |
| feature_extractor | <code>FeatureExtractor</code> | The function used to extract features from the input. |


processor._call(input, ...args) ⇒ <code> Promise.<any> </code>

Calls the feature_extractor function with the given input.

Kind: instance method of Processor
Returns: Promise.<any> - A Promise that resolves with the extracted features.

| Param | Type | Description |
| --- | --- | --- |
| input | <code>any</code> | The input to extract features from. |
| ...args | <code>any</code> | Additional arguments. |


processors.WhisperProcessor ⇐ <code> Processor </code>

Represents a WhisperProcessor that extracts features from an audio input.

Kind: static class of processors
Extends: Processor


whisperProcessor._call(audio) ⇒ <code> Promise.<any> </code>

Calls the feature_extractor function with the given audio input.

Kind: instance method of WhisperProcessor
Returns: Promise.<any> - A Promise that resolves with the extracted features.

| Param | Type | Description |
| --- | --- | --- |
| audio | <code>any</code> | The audio input to extract features from. |


processors.AutoProcessor

Helper class which is used to instantiate pretrained processors with the from_pretrained function. The chosen processor class is determined by the type specified in the processor config.

Example: Load a processor using from_pretrained.

let processor = await AutoProcessor.from_pretrained('openai/whisper-tiny.en');

Example: Run an image through a processor.

import { AutoProcessor, RawImage } from '@xenova/transformers';

let processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
let image_inputs = await processor(image);
// {
//   "pixel_values": {
//     "dims": [ 1, 3, 224, 224 ],
//     "type": "float32",
//     "data": Float32Array [ -1.558687686920166, -1.558687686920166, -1.5440893173217773, ... ],
//     "size": 150528
//   },
//   "original_sizes": [
//     [ 533, 800 ]
//   ],
//   "reshaped_input_sizes": [
//     [ 224, 224 ]
//   ]
// }

Kind: static class of processors


AutoProcessor.from_pretrained(pretrained_model_name_or_path, options) ⇒ <code> Promise.<Processor> </code>

Instantiate one of the processor classes of the library from a pretrained model.

The processor class to instantiate is selected based on the feature_extractor_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path, if possible).

Kind: static method of AutoProcessor
Returns: Promise.<Processor> - A new instance of the Processor class.

| Param | Type | Description |
| --- | --- | --- |
| pretrained_model_name_or_path | <code>string</code> | The name or path of the pretrained model. Can be either: <ul><li>A string, the model id of a pretrained processor hosted inside a model repo on huggingface.co. Valid model ids can be located at the root-level, like <code>bert-base-uncased</code>, or namespaced under a user or organization name, like <code>dbmdz/bert-base-german-cased</code>.</li><li>A path to a directory containing processor files, e.g., <code>./my_model_directory/</code>.</li></ul> |
| options | <code>*</code> | Additional options for loading the processor. |


processors~center_to_corners_format(arr) ⇒ <code> Array.<number> </code>

Converts bounding boxes from center format to corners format.

Kind: inner method of processors
Returns: Array.<number> - The coordinates of the top-left and bottom-right corners of the box (top_left_x, top_left_y, bottom_right_x, bottom_right_y).

| Param | Type | Description |
| --- | --- | --- |
| arr | <code>Array.<number></code> | The coordinates of the center of the box and its width and height dimensions (center_x, center_y, width, height). |
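A small worked example of the conversion (plain arithmetic, not a call into the library):

```js
// A box centered at (50, 40) with width 20 and height 10
// becomes its top-left / bottom-right corner coordinates.
const [center_x, center_y, width, height] = [50, 40, 20, 10];
const corners = [
    center_x - width / 2,   // top_left_x     = 40
    center_y - height / 2,  // top_left_y     = 35
    center_x + width / 2,   // bottom_right_x = 60
    center_y + height / 2,  // bottom_right_y = 45
];
console.log(corners); // [40, 35, 60, 45]
```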


processors~enforce_size_divisibility(size, divisor) ⇒ <code> * </code>

Rounds the height and width down to the closest multiple of size_divisibility.

Kind: inner method of processors
Returns: * - The rounded size.

| Param | Type | Description |
| --- | --- | --- |
| size | <code>*</code> | The size of the image. |
| divisor | <code>number</code> | The divisor to use. |
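A small worked example of the documented rounding behaviour, assuming size is a [width, height] pair; plain arithmetic rather than a library call:

```js
// Round 1023x767 down to the closest multiples of 32.
const [width, height] = [1023, 767];
const divisor = 32;
const rounded = [
    Math.floor(width / divisor) * divisor,  // 992
    Math.floor(height / divisor) * divisor, // 736
];
console.log(rounded); // [992, 736]
```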


processors~HeightWidth : <code> * </code>

Named tuple to indicate the order we are using is (height x width), even though the graphics industry standard is (width x height).

Kind: inner typedef of processors


processors~ImageFeatureExtractorResult : <code> object </code>

Kind: inner typedef of processors
Properties

| Name | Type | Description |
| --- | --- | --- |
| pixel_values | <code>Tensor</code> | The pixel values of the batched preprocessed images. |
| original_sizes | <code>Array.<HeightWidth></code> | Array of two-dimensional tuples like [[480, 640]]. |
| reshaped_input_sizes | <code>Array.<HeightWidth></code> | Array of two-dimensional tuples like [[1000, 1330]]. |


processors~PreprocessedImage : <code> object </code>

Kind: inner typedef of processors
Properties

| Name | Type | Description |
| --- | --- | --- |
| original_size | <code>HeightWidth</code> | The original size of the image. |
| reshaped_input_size | <code>HeightWidth</code> | The reshaped input size of the image. |
| pixel_values | <code>Tensor</code> | The pixel values of the preprocessed image. |


processors~DetrFeatureExtractorResult : <code> object </code>

Kind: inner typedef of processors
Properties

| Name | Type |
| --- | --- |
| pixel_mask | <code>Tensor</code> |

processors~SamImageProcessorResult : <code> object </code>

Kind: inner typedef of processors
Properties

| Name | Type |
| --- | --- |
| pixel_values | <code>Tensor</code> |
| original_sizes | <code>Array.<HeightWidth></code> |
| reshaped_input_sizes | <code>Array.<HeightWidth></code> |
| [input_points] | <code>Tensor</code> |
| [input_labels] | <code>Tensor</code> |
