processors
Processors are used to prepare non-textual inputs (e.g., image or audio) for a model.
Example: Using a WhisperProcessor to prepare an audio input for a model.
import { AutoProcessor, read_audio } from '@xenova/transformers';
let processor = await AutoProcessor.from_pretrained('openai/whisper-tiny.en');
let audio = await read_audio('https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac', 16000);
let { input_features } = await processor(audio);
// Tensor {
// data: Float32Array(240000) [0.4752984642982483, 0.5597258806228638, 0.56434166431427, ...],
// dims: [1, 80, 3000],
// type: 'float32',
// size: 240000,
// }
- processors
  - static
    - .FeatureExtractor ⇐ Callable
    - .ImageFeatureExtractor ⇐ FeatureExtractor
      - new ImageFeatureExtractor(config)
      - .thumbnail(image, size, [resample]) ⇒ Promise.<RawImage>
      - .crop_margin(image, gray_threshold) ⇒ Promise.<RawImage>
      - .pad_image(pixelData, imgDims, padSize, options) ⇒ *
      - .rescale(pixelData) ⇒ void
      - .get_resize_output_image_size(image, size) ⇒ *
      - .resize(image) ⇒ Promise.<RawImage>
      - .preprocess(image, overrides) ⇒ Promise.<PreprocessedImage>
      - ._call(images, ...args) ⇒ Promise.<ImageFeatureExtractorResult>
    - .DetrFeatureExtractor ⇐ ImageFeatureExtractor
      - ._call(images) ⇒ Promise.<DetrFeatureExtractorResult>
      - .post_process_object_detection() : post_process_object_detection
      - .remove_low_and_no_objects(class_logits, mask_logits, object_mask_threshold, num_labels) ⇒ *
      - .check_segment_validity(mask_labels, mask_probs, k, mask_threshold, overlap_mask_area_threshold) ⇒ *
      - .compute_segments(mask_probs, pred_scores, pred_labels, mask_threshold, overlap_mask_area_threshold, label_ids_to_fuse, target_size) ⇒ *
      - .post_process_panoptic_segmentation(outputs, [threshold], [mask_threshold], [overlap_mask_area_threshold], [label_ids_to_fuse], [target_sizes]) ⇒ Array.<{segmentation: Tensor, segments_info: Array<{id: number, label_id: number, score: number}>}>
    - .Processor ⇐ Callable
      - new Processor(feature_extractor)
      - ._call(input, ...args) ⇒ Promise.<any>
    - .WhisperProcessor ⇐ Processor
      - ._call(audio) ⇒ Promise.<any>
    - .AutoProcessor
      - .from_pretrained(pretrained_model_name_or_path, options) ⇒ Promise.<Processor>
  - inner
    - ~center_to_corners_format(arr) ⇒ Array.<number>
    - ~enforce_size_divisibility(size, divisor) ⇒ *
    - ~HeightWidth : *
    - ~ImageFeatureExtractorResult : object
    - ~PreprocessedImage : object
    - ~DetrFeatureExtractorResult : object
    - ~SamImageProcessorResult : object
processors.FeatureExtractor ⇐ <code>Callable</code>
Base class for feature extractors.
Kind: static class of processors
Extends: Callable
new FeatureExtractor(config)
Constructs a new FeatureExtractor instance.
Param | Type | Description |
---|---|---|
config | Object | The configuration for the feature extractor. |
processors.ImageFeatureExtractor ⇐ <code>FeatureExtractor</code>
Feature extractor for image models.
Kind: static class of processors
Extends: FeatureExtractor
new ImageFeatureExtractor(config)
Constructs a new ImageFeatureExtractor instance.
Param | Type | Default | Description |
---|---|---|---|
config | Object | | The configuration for the feature extractor. |
config.image_mean | Array.<number> | | The mean values for image normalization. |
config.image_std | Array.<number> | | The standard deviation values for image normalization. |
config.do_rescale | boolean | | Whether to rescale the image pixel values to the [0,1] range. |
config.rescale_factor | number | | The factor to use for rescaling the image pixel values. |
config.do_normalize | boolean | | Whether to normalize the image pixel values. |
config.do_resize | boolean | | Whether to resize the image. |
config.resample | number | | What method to use for resampling. |
config.size | number \| Object | | The size to resize the image to. |
[config.do_flip_channel_order] | boolean | false | Whether to flip the color channels from RGB to BGR. Can be overridden by the |
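To make the interplay of do_rescale, rescale_factor, do_normalize, image_mean, and image_std more concrete, the following is a minimal sketch of how such options are conventionally applied to interleaved (height, width, channels) pixel data. The helper name applyRescaleAndNormalize and the default values are assumptions made for illustration. This is not the library's implementation, only the usual formula (pixel * rescale_factor - mean) / std applied per channel.

```javascript
// Hypothetical helper (not part of the library): applies rescaling and
// per-channel normalization to interleaved HWC pixel data.
function applyRescaleAndNormalize(pixelData, config) {
  const {
    do_rescale = true,          // assumed default for this sketch
    rescale_factor = 1 / 255,   // assumed default for this sketch
    do_normalize = true,        // assumed default for this sketch
    image_mean,
    image_std,
  } = config;
  const channels = image_mean.length;
  const out = new Float32Array(pixelData.length);
  for (let i = 0; i < pixelData.length; ++i) {
    let value = pixelData[i];
    if (do_rescale) value *= rescale_factor;
    if (do_normalize) {
      const c = i % channels; // channel index of this interleaved value
      value = (value - image_mean[c]) / image_std[c];
    }
    out[i] = value;
  }
  return out;
}

// One RGB pixel with values 0, 127.5 and 255.
const normalized = applyRescaleAndNormalize(Float32Array.from([0, 127.5, 255]), {
  image_mean: [0.5, 0.5, 0.5],
  image_std: [0.5, 0.5, 0.5],
});
console.log(Array.from(normalized)); // approximately [-1, 0, 1]
```

With a mean and standard deviation of 0.5, the [0, 255] input range maps to roughly [-1, 1].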
imageFeatureExtractor.thumbnail(image, size, [resample]) ⇒ <code>Promise.&lt;RawImage&gt;</code>
Resize the image to make a thumbnail. The image is resized so that no dimension is larger than any corresponding dimension of the specified size.
Kind: instance method of ImageFeatureExtractor
Returns: Promise.<RawImage>
- The resized image.
Param | Type | Default | Description |
---|---|---|---|
image | RawImage | | The image to be resized. |
size | Object | | The maximum size of the thumbnail. |
[resample] | string \| 0 \| 1 \| 2 \| 3 \| 4 \| 5 | 2 | The resampling filter to use. |
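The size rule described above (no output dimension larger than the corresponding dimension of size) can be sketched as a small pure function. The name thumbnailSize, the { height, width } shape of size, and the choice to preserve the aspect ratio are assumptions for illustration, not the library's code.

```javascript
// Hypothetical helper: computes thumbnail dimensions that fit inside `size`,
// keep the aspect ratio, and never enlarge the image.
function thumbnailSize(inputHeight, inputWidth, size) {
  // Already small enough: keep the original dimensions.
  if (inputHeight <= size.height && inputWidth <= size.width) {
    return { height: inputHeight, width: inputWidth };
  }
  // Integer cross-multiplication tells us which side is the limiting one.
  if (size.height * inputWidth <= size.width * inputHeight) {
    const height = size.height;
    const width = Math.max(1, Math.floor((inputWidth * height) / inputHeight));
    return { height, width };
  }
  const width = size.width;
  const height = Math.max(1, Math.floor((inputHeight * width) / inputWidth));
  return { height, width };
}

const thumb = thumbnailSize(533, 800, { height: 224, width: 224 });
console.log(thumb); // { height: 149, width: 224 }
```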
imageFeatureExtractor.crop_margin(image, gray_threshold) ⇒ <code>Promise.&lt;RawImage&gt;</code>
Crops the margin of the image. Gray pixels are considered margin (i.e., pixels with a value below the threshold).
Kind: instance method of ImageFeatureExtractor
Returns: Promise.<RawImage>
- The cropped image.
Param | Type | Default | Description |
---|---|---|---|
image | RawImage | | The image to be cropped. |
gray_threshold | number | 200 | Value below which pixels are considered to be gray. |
imageFeatureExtractor.pad_image(pixelData, imgDims, padSize, options) ⇒ <code>*</code>
Pad the image by a certain amount.
Kind: instance method of ImageFeatureExtractor
Returns: *
- The padded pixel data and image dimensions.
Param | Type | Default | Description |
---|---|---|---|
pixelData | Float32Array | | The pixel data to pad. |
imgDims | Array.<number> | | The dimensions of the image (height, width, channels). |
padSize | * | | The dimensions of the padded image. |
options | Object | | The options for padding. |
[options.mode] | 'constant' \| 'symmetric' | 'constant' | The type of padding to add. |
[options.center] | boolean | false | Whether to center the image. |
[options.constant_values] | number | 0 | The constant value to use for padding. |
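As an illustration of the 'constant' mode together with the center and constant_values options, here is a minimal sketch that pads interleaved (height, width, channels) data. The function name padConstant and the [height, width] form of padSize are assumptions (the documented type of padSize is `*`), and the real method may behave differently in details.

```javascript
// Hypothetical helper: pads interleaved (height, width, channels) pixel data
// with a constant value. `padSize` is assumed to be [paddedHeight, paddedWidth].
function padConstant(pixelData, imgDims, padSize, options = {}) {
  const { center = false, constant_values = 0 } = options;
  const [height, width, channels] = imgDims;
  const [paddedHeight, paddedWidth] = padSize;
  if (paddedHeight < height || paddedWidth < width) {
    throw new Error('padSize must be at least as large as the image.');
  }
  const padded = new Float32Array(paddedHeight * paddedWidth * channels).fill(constant_values);
  // Offsets of the original image inside the padded canvas.
  const top = center ? Math.floor((paddedHeight - height) / 2) : 0;
  const left = center ? Math.floor((paddedWidth - width) / 2) : 0;
  for (let y = 0; y < height; ++y) {
    // Copy one full row of the source image at a time.
    const srcStart = y * width * channels;
    const dstStart = ((y + top) * paddedWidth + left) * channels;
    padded.set(pixelData.subarray(srcStart, srcStart + width * channels), dstStart);
  }
  return [padded, [paddedHeight, paddedWidth, channels]];
}

const [paddedData, paddedDims] = padConstant(
  Float32Array.from([1, 2, 3, 4]), [2, 2, 1], [3, 3], { constant_values: 9 },
);
console.log(Array.from(paddedData)); // [1, 2, 9, 3, 4, 9, 9, 9, 9]
console.log(paddedDims);             // [3, 3, 1]
```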
imageFeatureExtractor.rescale(pixelData) ⇒ <code>void</code>
Rescale the image's pixel values by this.rescale_factor.
Kind: instance method of ImageFeatureExtractor
Param | Type | Description |
---|---|---|
pixelData | Float32Array | The pixel data to rescale. |
imageFeatureExtractor.get_resize_output_image_size(image, size) ⇒ <code>*</code>
Find the target (width, height) dimension of the output image after resizing given the input image and the desired size.
Kind: instance method of ImageFeatureExtractor
Returns: *
- The target (width, height) dimension of the output image after resizing.
Param | Type | Description |
---|---|---|
image | RawImage | The image to resize. |
size | any | The size to use for resizing the image. |
imageFeatureExtractor.resize(image) ⇒ <code>Promise.&lt;RawImage&gt;</code>
Resizes the image.
Kind: instance method of ImageFeatureExtractor
Returns: Promise.<RawImage>
- The resized image.
Param | Type | Description |
---|---|---|
image | RawImage | The image to resize. |
imageFeatureExtractor.preprocess(image, overrides) ⇒ <code>Promise.&lt;PreprocessedImage&gt;</code>
Preprocesses the given image.
Kind: instance method of ImageFeatureExtractor
Returns: Promise.<PreprocessedImage>
- The preprocessed image.
Param | Type | Description |
---|---|---|
image | RawImage | The image to preprocess. |
overrides | Object | The overrides for the preprocessing options. |
imageFeatureExtractor._call(images, ...args) ⇒ <code>Promise.&lt;ImageFeatureExtractorResult&gt;</code>
Calls the feature extraction process on an array of images, preprocesses each image, and concatenates the resulting features into a single Tensor.
Kind: instance method of ImageFeatureExtractor
Returns: Promise.<ImageFeatureExtractorResult>
- An object containing the concatenated pixel values (and other metadata) of the preprocessed images.
Param | Type | Description |
---|---|---|
images | Array.<RawImage> | The image(s) to extract features from. |
...args | any | Additional arguments. |
processors.DetrFeatureExtractor ⇐ <code>ImageFeatureExtractor</code>
Detr Feature Extractor.
Kind: static class of processors
Extends: ImageFeatureExtractor
detrFeatureExtractor._call(images) ⇒ <code>Promise.&lt;DetrFeatureExtractorResult&gt;</code>
Calls the feature extraction process on an array of images, preprocesses each image, and concatenates the resulting features into a single Tensor.
Kind: instance method of DetrFeatureExtractor
Returns: Promise.<DetrFeatureExtractorResult>
- An object containing the concatenated pixel values of the preprocessed images.
Param | Type | Description |
---|---|---|
images | Array.<RawImage> | The image(s) to extract features from. |
detrFeatureExtractor.post_process_object_detection() : <code>post_process_object_detection</code>
Kind: instance method of DetrFeatureExtractor
detrFeatureExtractor.remove_low_and_no_objects(class_logits, mask_logits, object_mask_threshold, num_labels) ⇒ <code>*</code>
Binarizes the given masks using object_mask_threshold and returns the associated values of masks, scores, and labels.
Kind: instance method of DetrFeatureExtractor
Returns: *
- The binarized masks, the scores, and the labels.
Param | Type | Description |
---|---|---|
class_logits | Tensor | The class logits. |
mask_logits | Tensor | The mask logits. |
object_mask_threshold | number | A number between 0 and 1 used to binarize the masks. |
num_labels | number | The number of labels. |
detrFeatureExtractor.check_segment_validity(mask_labels, mask_probs, k, mask_threshold, overlap_mask_area_threshold) ⇒ <code>*</code>
Checks whether the segment is valid or not.
Kind: instance method of DetrFeatureExtractor
Returns: *
- Whether the segment is valid or not, and the indices of the valid labels.
Param | Type | Default | Description |
---|---|---|---|
mask_labels | Int32Array | | Labels for each pixel in the mask. |
mask_probs | Array.<Tensor> | | Probabilities for each pixel in the masks. |
k | number | | The class id of the segment. |
mask_threshold | number | 0.5 | The mask threshold. |
overlap_mask_area_threshold | number | 0.8 | The overlap mask area threshold. |
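One plausible way to read this check, sketched on plain arrays rather than Tensors: a segment is kept only if class k still owns some pixels and the area it kept is a large enough fraction of the area its own mask claimed. The function name checkSegmentValidity, the use of plain arrays, and the exact comparison are assumptions for illustration, not a copy of the library's logic.

```javascript
// Hypothetical sketch on plain arrays (the documented method works on Tensors).
function checkSegmentValidity(
  maskLabels, maskProbs, k, maskThreshold = 0.5, overlapMaskAreaThreshold = 0.8,
) {
  // Indices of the pixels currently assigned to class k.
  const maskK = [];
  for (let i = 0; i < maskLabels.length; ++i) {
    if (maskLabels[i] === k) maskK.push(i);
  }
  // Number of pixels the k-th mask claims on its own.
  let originalArea = 0;
  for (const p of maskProbs[k]) {
    if (p >= maskThreshold) ++originalArea;
  }
  let maskExists = maskK.length > 0 && originalArea > 0;
  if (maskExists) {
    // Reject segments that lost too much of their area to other segments.
    maskExists = maskK.length / originalArea > overlapMaskAreaThreshold;
  }
  return [maskExists, maskK];
}

const pixelLabels = Int32Array.from([0, 0, 1, 1]);
const pixelProbs = [
  [0.9, 0.8, 0.7, 0.1], // mask 0 claims three pixels but only owns two
  [0.1, 0.2, 0.6, 0.9], // mask 1 claims two pixels and owns both
];
console.log(checkSegmentValidity(pixelLabels, pixelProbs, 0)); // [false, [0, 1]]
console.log(checkSegmentValidity(pixelLabels, pixelProbs, 1)); // [true, [2, 3]]
```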
detrFeatureExtractor.compute_segments(mask_probs, pred_scores, pred_labels, mask_threshold, overlap_mask_area_threshold, label_ids_to_fuse, target_size) ⇒ <code>*</code>
Computes the segments.
Kind: instance method of DetrFeatureExtractor
Returns: *
- The computed segments.
Param | Type | Default | Description |
---|---|---|---|
mask_probs | Array.<Tensor> | | The mask probabilities. |
pred_scores | Array.<number> | | The predicted scores. |
pred_labels | Array.<number> | | The predicted labels. |
mask_threshold | number | | The mask threshold. |
overlap_mask_area_threshold | number | | The overlap mask area threshold. |
label_ids_to_fuse | Set.<number> | | The label ids to fuse. |
target_size | Array.<number> | | The target size of the image. |
detrFeatureExtractor.post_process_panoptic_segmentation(outputs, [threshold], [mask_threshold], [overlap_mask_area_threshold], [label_ids_to_fuse], [target_sizes]) ⇒ <code>Array.&lt;{segmentation: Tensor, segments_info: Array&lt;{id: number, label_id: number, score: number}&gt;}&gt;</code>
Post-process the model output to generate the final panoptic segmentation.
Kind: instance method of DetrFeatureExtractor
Param | Type | Default | Description |
---|---|---|---|
outputs | * | | The model output to post-process. |
[threshold] | number | 0.5 | The probability score threshold to keep predicted instance masks. |
[mask_threshold] | number | 0.5 | Threshold to use when turning the predicted masks into binary values. |
[overlap_mask_area_threshold] | number | 0.8 | The overlap mask area threshold to merge or discard small disconnected parts within each binary instance mask. |
[label_ids_to_fuse] | Set.<number> | | The labels in this set will have all their instances fused together. |
[target_sizes] | Array.<Array<number>> | | The target sizes to resize the masks to. |
processors.Processor ⇐ <code>Callable</code>
Represents a Processor that extracts features from an input.
Kind: static class of processors
Extends: Callable
new Processor(feature_extractor)
Creates a new Processor with the given feature extractor.
Param | Type | Description |
---|---|---|
feature_extractor | FeatureExtractor | The function used to extract features from the input. |
processor._call(input, ...args) ⇒ <code>Promise.&lt;any&gt;</code>
Calls the feature_extractor function with the given input.
Kind: instance method of Processor
Returns: Promise.<any>
- A Promise that resolves with the extracted features.
Param | Type | Description |
---|---|---|
input | any | The input to extract features from. |
...args | any | Additional arguments. |
processors.WhisperProcessor ⇐ <code>Processor</code>
Represents a WhisperProcessor that extracts features from an audio input.
Kind: static class of processors
Extends: Processor
whisperProcessor._call(audio) ⇒ <code>Promise.&lt;any&gt;</code>
Calls the feature_extractor function with the given audio input.
Kind: instance method of WhisperProcessor
Returns: Promise.<any>
- A Promise that resolves with the extracted features.
Param | Type | Description |
---|---|---|
audio | any | The audio input to extract features from. |
processors.AutoProcessor
Helper class used to instantiate pretrained processors with the from_pretrained function.
The chosen processor class is determined by the type specified in the processor config.
Example: Load a processor using from_pretrained.
let processor = await AutoProcessor.from_pretrained('openai/whisper-tiny.en');
Example: Run an image through a processor.
import { AutoProcessor, RawImage } from '@xenova/transformers';
let processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
let image_inputs = await processor(image);
// {
// "pixel_values": {
// "dims": [ 1, 3, 224, 224 ],
// "type": "float32",
// "data": Float32Array [ -1.558687686920166, -1.558687686920166, -1.5440893173217773, ... ],
// "size": 150528
// },
// "original_sizes": [
// [ 533, 800 ]
// ],
// "reshaped_input_sizes": [
// [ 224, 224 ]
// ]
// }
Kind: static class of processors
AutoProcessor.from_pretrained(pretrained_model_name_or_path, options) ⇒ <code>Promise.&lt;Processor&gt;</code>
Instantiate one of the processor classes of the library from a pretrained model.
The processor class to instantiate is selected based on the feature_extractor_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible).
Kind: static method of AutoProcessor
Returns: Promise.<Processor>
- A new instance of the Processor class.
Param | Type | Description |
---|---|---|
pretrained_model_name_or_path | string | The name or path of the pretrained model, for example a model id such as 'openai/whisper-tiny.en'. |
options | * | Additional options for loading the processor. |
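The type-based selection described above can be pictured as a lookup from the feature_extractor_type string to a class. The sketch below uses stand-in classes and a hypothetical registry and selectByType function. It only illustrates the dispatch idea and is not how the library is actually implemented.

```javascript
// Stand-in classes for illustration only; the real library defines its own.
class ImageFeatureExtractorStub {
  constructor(config) { this.config = config; }
}
class AudioFeatureExtractorStub {
  constructor(config) { this.config = config; }
}

// Hypothetical registry keyed by the value of `feature_extractor_type`.
const REGISTRY = new Map([
  ['ImageFeatureExtractor', ImageFeatureExtractorStub],
  ['AudioFeatureExtractor', AudioFeatureExtractorStub],
]);

// Picks the class named in the config and instantiates it with that config.
function selectByType(config) {
  const cls = REGISTRY.get(config.feature_extractor_type);
  if (!cls) {
    throw new Error(`Unknown feature_extractor_type: ${config.feature_extractor_type}`);
  }
  return new cls(config);
}

const selected = selectByType({ feature_extractor_type: 'AudioFeatureExtractor' });
console.log(selected instanceof AudioFeatureExtractorStub); // true
```

Using a Map rather than a plain object avoids accidental matches on inherited keys such as 'constructor'.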
processors~center_to_corners_format(arr) ⇒ <code>Array.&lt;number&gt;</code>
Converts bounding boxes from center format to corners format.
Kind: inner method of processors
Returns: Array.<number>
- The coordinates for the top-left and bottom-right corners of the box (top_left_x, top_left_y, bottom_right_x, bottom_right_y)
Param | Type | Description |
---|---|---|
arr | Array.<number> | The coordinates of the center of the box and its width and height (center_x, center_y, width, height) |
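The conversion is simple arithmetic: subtract half the width and height from the center to get the top-left corner, and add them to get the bottom-right corner. A minimal sketch, with the hypothetical name centerToCorners standing in for the inner method:

```javascript
// Hypothetical equivalent of the conversion described above.
// Input:  [center_x, center_y, width, height]
// Output: [top_left_x, top_left_y, bottom_right_x, bottom_right_y]
function centerToCorners([centerX, centerY, width, height]) {
  return [
    centerX - width / 2,
    centerY - height / 2,
    centerX + width / 2,
    centerY + height / 2,
  ];
}

const corners = centerToCorners([50, 40, 20, 10]);
console.log(corners); // [40, 35, 60, 45]
```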
processors~enforce_size_divisibility(size, divisor) ⇒ <code>*</code>
Rounds the height and width down to the closest multiple of size_divisibility.
Kind: inner method of processors
Returns: *
- The rounded size.
Param | Type | Description |
---|---|---|
size | * | The size of the image |
divisor | number | The divisor to use. |
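A minimal sketch of this rounding, assuming size is a two-element array of dimensions (the documented type is `*`, so this shape is an assumption). The name enforceSizeDivisibility is hypothetical, and the clamp that keeps each side at least one multiple of the divisor is a choice made for this sketch.

```javascript
// Hypothetical sketch: rounds each dimension down to a multiple of `divisor`.
// `size` is assumed to be a two-element array of dimensions.
function enforceSizeDivisibility([first, second], divisor) {
  // Clamp to at least one multiple so a side never collapses to 0.
  const roundDown = (x) => Math.max(Math.floor(x / divisor), 1) * divisor;
  return [roundDown(first), roundDown(second)];
}

const rounded = enforceSizeDivisibility([1333, 800], 32);
console.log(rounded); // [1312, 800]
```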
processors~HeightWidth : <code>*</code>
Named tuple to indicate the order we are using is (height x width), even though the graphics industry standard is (width x height).
Kind: inner typedef of processors
processors~ImageFeatureExtractorResult : <code>object</code>
Kind: inner typedef of processors
Properties
Name | Type | Description |
---|---|---|
pixel_values | Tensor | The pixel values of the batched preprocessed images. |
original_sizes | Array.<HeightWidth> | Array of two-dimensional tuples like [[480, 640]]. |
reshaped_input_sizes | Array.<HeightWidth> | Array of two-dimensional tuples like [[1000, 1330]]. |
processors~PreprocessedImage : <code>object</code>
Kind: inner typedef of processors
Properties
Name | Type | Description |
---|---|---|
original_size | HeightWidth | The original size of the image. |
reshaped_input_size | HeightWidth | The reshaped input size of the image. |
pixel_values | Tensor | The pixel values of the preprocessed image. |
processors~DetrFeatureExtractorResult : <code>object</code>
Kind: inner typedef of processors
Properties
Name | Type |
---|---|
pixel_mask | Tensor |
processors~SamImageProcessorResult : <code>object</code>
Kind: inner typedef of processors
Properties
Name | Type |
---|---|
pixel_values | Tensor |
original_sizes | Array.<HeightWidth> |
reshaped_input_sizes | Array.<HeightWidth> |
[input_points] | Tensor |
[input_labels] | Tensor |