New VOC Dataset for Anno-Robot, Step by Step

How to create a new dataset for Anno-Robot

Clone the TensorFlow Datasets source code. Don't worry, we are not going to modify it; we only need it to copy files from.

$ git clone

Use tfds to create a new, empty dataset template.

$ tfds new anno_dataset

Copy to replace

$ cd anno_dataset
$ cp ../datasets/tensorflow_datasets/object_detection/ ./

In, modify line as:

Comment out the first two of the three VocConfig entries and add the last one with …year="2022". In practice, only the filenames={} entry matters.

Inside the Docker container, start an HTTP server (default port 8000):

$ cd /
$ python3 -m http.server
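
For reference, the same kind of server can be started from Python, which is handy for checking that a file is really reachable over HTTP before running tfds build. This is only a minimal sketch using the standard library: the helper names start_server and fetch, the temporary directory and hello.txt are placeholders for illustration, not part of the original setup. It binds to 127.0.0.1 by default, whereas python3 -m http.server listens on all interfaces.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path


def start_server(directory: str, port: int = 8000, host: str = "127.0.0.1"):
    """Serve `directory` over HTTP from a background thread.

    Pass host="0.0.0.0" to listen on all interfaces, as
    `python3 -m http.server` does. port=0 picks a free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(url: str) -> bytes:
    """Download `url`, bypassing any proxy configured in the environment."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder content: a temporary directory with a single small file.
    with tempfile.TemporaryDirectory() as root:
        (Path(root) / "hello.txt").write_bytes(b"ok")
        server = start_server(root, port=0)
        port = server.server_address[1]
        print(fetch(f"http://127.0.0.1:{port}/hello.txt"))
        server.shutdown()
        server.server_close()
```

In the real setup you would point start_server at the directory that holds the tar file and fetch the same URL that the dataset code downloads from.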

To test the dataset:

  • The first time you build the dataset, or if it has never been built successfully:
# cd anno_dataset
# tfds build
# tfds build --register_checksums
  • For the second and any later build:
# tfds build --overwrite
# tfds build --register_checksums

Please note: "tfds build" will appear to pass without actually rebuilding anything if you don't run it with --overwrite.
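
The two cases above can be wrapped in a small helper so the right flag is not forgotten. This is a hedged sketch, not part of the original workflow: build_commands and run_build are hypothetical names, and the only tfds flags used are the ones already shown above (--overwrite and --register_checksums). By default it only prints the commands.

```python
import shlex
import subprocess
from typing import List


def build_commands(first_build: bool) -> List[List[str]]:
    """Return the tfds commands for a first build or a rebuild, in order."""
    if first_build:
        build = ["tfds", "build"]
    else:
        # Without --overwrite a rebuild can look successful without rebuilding.
        build = ["tfds", "build", "--overwrite"]
    return [build, ["tfds", "build", "--register_checksums"]]


def run_build(dataset_dir: str, first_build: bool, dry_run: bool = True) -> None:
    """Print the commands and, unless dry_run is set, run them in dataset_dir."""
    for command in build_commands(first_build):
        print("#", shlex.join(command))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(command, cwd=dataset_dir, check=True)


if __name__ == "__main__":
    # Dry run for a rebuild: prints the two commands without executing them.
    run_build("anno_dataset", first_build=False)
```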

How to make a smaller dataset in Pascal VOC format

`-- VOCdevkit
    `-- VOC2012
        |-- Annotations
        |-- Annotations1
        |-- ImageSets
        |   |-- Action
        |   |-- Layout
        |   |-- Main
        |   `-- Segmentation
        |-- JPEGImages
        |-- JPEGImages1
        |-- SegmentationClass
        `-- SegmentationObject
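
One plausible way to shrink a dataset with this layout is to copy only the first few image/annotation pairs into a fresh tree. The sketch below is an assumption about how this could be done, not the procedure used in the original project: make_subset is a hypothetical helper, and it relies only on the Pascal VOC convention that JPEGImages/NAME.jpg is described by Annotations/NAME.xml.

```python
import shutil
from pathlib import Path
from typing import List


def make_subset(src_voc: Path, dst_voc: Path, max_images: int) -> List[str]:
    """Copy up to `max_images` image/annotation pairs into a new VOC tree.

    Returns the image IDs (file names without extension) that were copied.
    """
    for sub in ("JPEGImages", "Annotations", "ImageSets/Main"):
        (dst_voc / sub).mkdir(parents=True, exist_ok=True)

    kept: List[str] = []
    for image in sorted((src_voc / "JPEGImages").glob("*.jpg")):
        if len(kept) >= max_images:
            break
        annotation = src_voc / "Annotations" / (image.stem + ".xml")
        if not annotation.exists():
            continue  # skip images that have no matching annotation
        shutil.copy2(image, dst_voc / "JPEGImages" / image.name)
        shutil.copy2(annotation, dst_voc / "Annotations" / annotation.name)
        kept.append(image.stem)
    return kept
```

For example, make_subset(Path("VOCdevkit/VOC2012"), Path("small/VOCdevkit/VOC2012"), 100) would keep at most 100 pairs; the destination path here is a placeholder.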

When you run tfds build --register_checksums, the data is split into three possible datasets according to the following files:

|--train.txt, test.txt, val.txt

Make train.txt and val.txt from the images in '/mnt/anno_dataset/data/tmp_test/train/VOCdevkit/VOC2012/JPEGImages':

ImageSets/Main# python3
# cp train.txt val.txt
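
The script used above is not shown, so here is a minimal sketch of what such a step could look like. It assumes the standard Pascal VOC convention that ImageSets/Main/train.txt lists one image ID (the file name without extension) per line; write_image_sets is a hypothetical helper name.

```python
from pathlib import Path
from typing import List


def write_image_sets(voc_root: Path) -> List[str]:
    """Write ImageSets/Main/train.txt and val.txt from the files in JPEGImages."""
    image_ids = sorted(
        path.stem
        for path in (voc_root / "JPEGImages").iterdir()
        if path.suffix.lower() in {".jpg", ".jpeg"}
    )
    main_dir = voc_root / "ImageSets" / "Main"
    main_dir.mkdir(parents=True, exist_ok=True)

    listing = "".join(f"{image_id}\n" for image_id in image_ids)
    (main_dir / "train.txt").write_text(listing)
    # Mirrors `cp train.txt val.txt`: validation reuses the training list.
    (main_dir / "val.txt").write_text(listing)
    return image_ids
```

Called as write_image_sets(Path("/mnt/anno_dataset/data/tmp_test/train/VOCdevkit/VOC2012")), it would overwrite both files with the IDs of every JPEG currently in JPEGImages.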

Make a new tar file:

$ cd /mnt/anno_dataset/data/tmp_test/train
$ tar -cvf VOCdevkit.tar ./VOCdevkit
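
If you prefer to do this step from Python, the standard tarfile module can produce an equivalent uncompressed archive and list its contents as a quick sanity check. This is a sketch with hypothetical helper names (make_tar, list_members); note that member names start with VOCdevkit/ here, without the leading ./ that the tar command above produces.

```python
import tarfile
from pathlib import Path
from typing import List


def make_tar(train_dir: Path) -> Path:
    """Pack train_dir/VOCdevkit into train_dir/VOCdevkit.tar (no compression)."""
    tar_path = train_dir / "VOCdevkit.tar"
    with tarfile.open(str(tar_path), "w") as archive:
        # arcname keeps paths inside the archive relative to train_dir.
        archive.add(str(train_dir / "VOCdevkit"), arcname="VOCdevkit")
    return tar_path


def list_members(tar_path: Path) -> List[str]:
    """Return the names of all entries stored in the archive."""
    with tarfile.open(str(tar_path), "r") as archive:
        return archive.getnames()
```

Checking that list_members(...) contains VOCdevkit/VOC2012/ImageSets/Main/train.txt is a cheap way to catch an archive built from the wrong directory.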

How to modify the code to point to a new dataset for Anno-Robot

What files are needed:


What file to modify (

Register the above files with TFDS:

# tfds build --overwrite
# tfds build --register_checksums

To train the model



Paul Xiong