How To: Normalized Coordinates of the Bounding Box [ymin, xmin, ymax, xmax] for TFDS, TensorFlow
#tfds.features.BBoxFeature
TFDS BBoxFeature uses normalized coordinates.
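What are normalized coordinates, then?
In TFDS, normalized means that each pixel coordinate is divided by the corresponding image dimension (y by the image height, x by the image width), so every value lies between 0 and 1, and a box is given in the order [ymin, xmin, ymax, xmax]. This should not be confused with normalized device coordinates (NDC) from computer graphics, a screen-independent display coordinate system that encompasses a cube where the x, y, and z components range from −1 to 1.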
An example for xmin, ymin, xmax, ymax
The bounding box has the following (x, y) coordinates of its corners: top-left is (x_min, y_min) or (98px, 345px), top-right is (x_max, y_min) or (420px, 345px), bottom-left is (x_min, y_max) or (98px, 462px), bottom-right is (x_max, y_max) or (420px, 462px). As you see, coordinates of the bounding box's corners are calculated with respect to the top-left corner of the image, which has (x, y) coordinates (0, 0).
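As a minimal sketch of the corner layout described above, the four corners can be derived from the two extreme points. The function name bbox_corners is only an illustrative helper, not part of any library.

```python
def bbox_corners(x_min, y_min, x_max, y_max):
    """Return the four corners as (x, y) pairs.

    The origin (0, 0) is the top-left corner of the image,
    x grows to the right and y grows downwards.
    """
    return {
        "top_left": (x_min, y_min),
        "top_right": (x_max, y_min),
        "bottom_left": (x_min, y_max),
        "bottom_right": (x_max, y_max),
    }


# Pixel values of the example bounding box
corners = bbox_corners(98, 345, 420, 462)
for name, point in corners.items():
    print(name, point)
```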
pascal_voc
pascal_voc is a format used by the Pascal VOC dataset. Coordinates of a bounding box are encoded with four values in pixels: [x_min, y_min, x_max, y_max]. x_min and y_min are coordinates of the top-left corner of the bounding box. x_max and y_max are coordinates of the bottom-right corner of the bounding box.
Coordinates of the example bounding box in this format are [98, 345, 420, 462].
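A small sanity check can catch boxes whose values are in the wrong order or fall outside the image. The sketch below assumes the example image is 640 pixels wide and 480 pixels high (the size used in the normalization example); check_pascal_voc is a hypothetical helper name.

```python
def check_pascal_voc(box, image_width, image_height):
    """Validate a [x_min, y_min, x_max, y_max] pixel box.

    Returns the (width, height) of the box in pixels, or raises
    ValueError if the box is inverted or outside the image.
    """
    x_min, y_min, x_max, y_max = box
    if not 0 <= x_min < x_max <= image_width:
        raise ValueError(f"invalid x range: {x_min}..{x_max}")
    if not 0 <= y_min < y_max <= image_height:
        raise ValueError(f"invalid y range: {y_min}..{y_max}")
    return x_max - x_min, y_max - y_min


# Example box, assuming a 640x480 image
box_width, box_height = check_pascal_voc([98, 345, 420, 462], 640, 480)
print(box_width, box_height)
```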
albumentations
albumentations is similar to pascal_voc, because it also uses four values [x_min, y_min, x_max, y_max] to represent a bounding box. But unlike pascal_voc, albumentations uses normalized values. To normalize values, we divide the x-coordinates in pixels by the width of the image and the y-coordinates by its height.
Coordinates of the example bounding box in this format, for an image 640 pixels wide and 480 pixels high, are [98 / 640, 345 / 480, 420 / 640, 462 / 480], which are [0.153125, 0.71875, 0.65625, 0.9625].
Albumentations uses this format internally to work with bounding boxes and augment them.
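To connect this back to TFDS, the sketch below normalizes the example pascal_voc box and then reorders it into the [ymin, xmin, ymax, xmax] layout that tfds.features.BBoxFeature expects. It is plain Python under the assumption of a 640x480 image; both function names are illustrative helpers, not library functions.

```python
def pascal_voc_to_albumentations(box, image_width, image_height):
    """Normalize a pixel box [x_min, y_min, x_max, y_max] to the 0..1 range."""
    x_min, y_min, x_max, y_max = box
    return [
        x_min / image_width,
        y_min / image_height,
        x_max / image_width,
        y_max / image_height,
    ]


def albumentations_to_tfds_order(box):
    """Reorder [x_min, y_min, x_max, y_max] to [ymin, xmin, ymax, xmax]."""
    x_min, y_min, x_max, y_max = box
    return [y_min, x_min, y_max, x_max]


normalized = pascal_voc_to_albumentations([98, 345, 420, 462], 640, 480)
tfds_box = albumentations_to_tfds_order(normalized)
print(normalized)
print(tfds_box)

# With tensorflow_datasets installed, these four values could be passed as
# tfds.features.BBox(ymin=..., xmin=..., ymax=..., xmax=...).
```

Only the order of the values changes between the two normalized layouts; the numbers themselves are identical.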