How To: Normalized Coordinates of the Bounding Box [ymin, xmin, ymax, xmax] for TFDS, TensorFlow

Paul Xiong
Sep 29, 2022

#tfds.features.BBoxFeature

TFDS BBoxFeature uses normalized coordinates.

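What are normalized coordinates, then?

For bounding boxes, normalized means each coordinate is divided by the image size: x values by the image width and y values by the image height. Every value then falls between 0 and 1, regardless of the image resolution. This is not the same thing as normalized device coordinates (NDC) from computer graphics, a screen-independent display coordinate system that encompasses a cube where the x, y, and z components range from −1 to 1.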

An example for xmin, ymin, xmax, ymax

The bounding box has the following (x, y) coordinates of its corners: top-left is (x_min, y_min) or (98px, 345px), top-right is (x_max, y_min) or (420px, 345px), bottom-left is (x_min, y_max) or (98px, 462px), bottom-right is (x_max, y_max) or (420px, 462px). As you see, coordinates of the bounding box's corners are calculated with respect to the top-left corner of the image which has (x, y) coordinates (0, 0).
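As a minimal sketch of the corner layout described above, the four corners can be derived from the two extreme points. The helper name `box_corners` is hypothetical and used only for illustration.

```python
def box_corners(x_min, y_min, x_max, y_max):
    """Return the four corners of a box in pixel coordinates.

    The origin (0, 0) is the top-left corner of the image,
    so y grows downward.
    """
    return {
        "top_left": (x_min, y_min),
        "top_right": (x_max, y_min),
        "bottom_left": (x_min, y_max),
        "bottom_right": (x_max, y_max),
    }


# The example box from the text, in pixels.
corners = box_corners(98, 345, 420, 462)
for name, point in corners.items():
    print(name, point)
```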

pascal_voc

pascal_voc is a format used by the Pascal VOC dataset. Coordinates of a bounding box are encoded with four values in pixels: [x_min, y_min, x_max, y_max]. x_min and y_min are the coordinates of the top-left corner of the bounding box. x_max and y_max are the coordinates of the bottom-right corner of the bounding box.

Coordinates of the example bounding box in this format are [98, 345, 420, 462].

albumentations

albumentations is similar to pascal_voc, because it also uses four values [x_min, y_min, x_max, y_max] to represent a bounding box. But unlike pascal_voc, albumentations uses normalized values. To normalize values, we divide the x-coordinates in pixels by the width of the image and the y-coordinates in pixels by the height of the image.

For an image that is 640px wide and 480px high, coordinates of the example bounding box in this format are [98 / 640, 345 / 480, 420 / 640, 462 / 480] which are [0.153125, 0.71875, 0.65625, 0.9625].
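The division can be sketched in a few lines of plain Python. The function name `normalize_pascal_voc` is hypothetical, and the 640 by 480 image size is taken from the divisors in the example.

```python
def normalize_pascal_voc(box, image_width, image_height):
    """Convert a pixel box [x_min, y_min, x_max, y_max] to normalized values.

    x values are divided by the image width, y values by the image height.
    """
    x_min, y_min, x_max, y_max = box
    return [
        x_min / image_width,
        y_min / image_height,
        x_max / image_width,
        y_max / image_height,
    ]


# Example box in pixels on a 640 x 480 image.
normalized = normalize_pascal_voc([98, 345, 420, 462], 640, 480)
print(normalized)
```

Multiplying each value back by the width or height recovers the pixel coordinates.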

Albumentations uses this format internally to work with bounding boxes and augment them.
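The TFDS order named in the title, [ymin, xmin, ymax, xmax], uses normalized values too, but puts y before x. Here is a rough sketch of the conversion from a pixel box in pascal_voc order. The function name `pascal_voc_to_tfds_order` is hypothetical, and the 640 by 480 image size is an assumption carried over from the example.

```python
def pascal_voc_to_tfds_order(box, image_width, image_height):
    """Convert pixel [x_min, y_min, x_max, y_max] to normalized
    (ymin, xmin, ymax, xmax).

    Two things change: values are scaled to the 0..1 range,
    and each y value comes before its x value.
    """
    x_min, y_min, x_max, y_max = box
    return (
        y_min / image_height,  # ymin
        x_min / image_width,   # xmin
        y_max / image_height,  # ymax
        x_max / image_width,   # xmax
    )


# Example box in pixels on a 640 x 480 image (assumed size).
ymin, xmin, ymax, xmax = pascal_voc_to_tfds_order([98, 345, 420, 462], 640, 480)
print([ymin, xmin, ymax, xmax])

# With tensorflow_datasets installed, these four values could be passed as
# tfds.features.BBox(ymin=ymin, xmin=xmin, ymax=ymax, xmax=xmax).
```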
