New VOC Dataset for Anno-Robot, Step by Step
--
How to create a new dataset for Anno-Robot
Clone the TensorFlow Datasets source code. Don't worry, we are not going to modify it; we only need to copy voc.py from it.
$ git clone https://github.com/tensorflow/datasets.git
Use tfds to create a new, empty dataset template:
$ tfds new anno_dataset
Copy voc.py over the generated template file (anno_dataset.py):
$ cd anno_dataset
$ cp ../datasets/tensorflow_datasets/object_detection/voc.py ./anno_dataset.py
In anno_dataset.py, modify the builder configs as follows:
Comment out the first two of the three VocConfig entries and, in the last one, set …year="2022". In practice, only the filenames={} argument matters.
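The shape of the edited config list can be sketched as follows. This is only an illustration: `VocConfigSketch` is a hypothetical stand-in written for this example, not the real VocConfig class from voc.py, and the dictionary keys `trainval` and `test` are an assumption about what your copy of voc.py expects, so check them against the file. The tar names are the ones listed later in this post.

```python
from dataclasses import dataclass, field


@dataclass
class VocConfigSketch:
    """Hypothetical stand-in for VocConfig in voc.py (NOT the TFDS class)."""

    year: str
    filenames: dict = field(default_factory=dict)


BUILDER_CONFIGS = [
    # VocConfigSketch(...),  # first original entry, commented out
    # VocConfigSketch(...),  # second original entry, commented out
    VocConfigSketch(
        year="2022",
        filenames={
            # Key names are assumed; verify them in your copy of voc.py.
            "trainval": "VOCOtrain.tar",
            "test": "VOCOtest.tar",
        },
    ),
]

print(BUILDER_CONFIGS[0])
```

Keeping a single active entry means there is only one config to build, and the `filenames` mapping is the part that decides which archives get used.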
Inside the Docker container, start an HTTP server (default port 8000):
$ cd /
$ python3 -m http.server
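Before running the build, it can help to confirm that the server really serves the archive. The following is a minimal sketch and not part of the original workflow: it starts the same standard-library server in a background thread on a temporary directory and downloads one file from it. The helper name `fetch_from_local_server` and the placeholder file are made up for this example.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request


def fetch_from_local_server(directory: str, filename: str) -> bytes:
    """Serve `directory` over HTTP on a free port and download `filename` once."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Port 0 lets the OS pick a free port; `python3 -m http.server` uses 8000.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        url = f"http://127.0.0.1:{port}/{filename}"
        # An empty ProxyHandler bypasses any proxy set in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=10) as response:
            return response.read()
    finally:
        server.shutdown()
        server.server_close()


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder payload standing in for a real tar archive.
    (pathlib.Path(tmp) / "VOCdevkit.tar").write_bytes(b"placeholder")
    served_bytes = fetch_from_local_server(tmp, "VOCdevkit.tar")
    print(len(served_bytes), "bytes served")
```

Against the real server you would only need the download half, pointed at port 8000 and the path of your tar file relative to the directory where the server was started.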
To test the dataset:
- the first time you build the dataset, or if it has never been built successfully:
# cd anno_dataset
# tfds build
# tfds build --register_checksums
- for the second and later builds:
# tfds build --overwrite
# tfds build --register_checksums
Please note: without --overwrite, "tfds build" will appear to pass even though the dataset was not actually rebuilt.
How to make a smaller dataset in Pascal VOC format
`-- VOCdevkit
    `-- VOC2012
        |-- Annotations
        |-- Annotations1
        |-- ImageSets
        |   |-- Action
        |   |-- Layout
        |   |-- Main
        |   `-- Segmentation
        |-- JPEGImages
        |-- JPEGImages1
        |-- SegmentationClass
        `-- SegmentationObject
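The post does not say how the smaller set of images is picked. One possible approach, sketched here under that assumption, is to copy the first few image/annotation pairs into a fresh folder with the same JPEGImages/Annotations layout. The function name `copy_subset` and the `.jpg`/`.xml` pairing by file name are choices made for this example.

```python
import pathlib
import shutil
import tempfile


def copy_subset(voc_dir: str, out_dir: str, limit: int) -> list:
    """Copy up to `limit` image/annotation pairs (sorted by ID) into out_dir."""
    src = pathlib.Path(voc_dir)
    dst = pathlib.Path(out_dir)
    (dst / "JPEGImages").mkdir(parents=True, exist_ok=True)
    (dst / "Annotations").mkdir(parents=True, exist_ok=True)
    kept = []
    for image in sorted((src / "JPEGImages").glob("*.jpg")):
        if len(kept) >= limit:
            break
        annotation = src / "Annotations" / f"{image.stem}.xml"
        if not annotation.exists():
            continue  # skip images that have no matching annotation
        shutil.copy2(image, dst / "JPEGImages" / image.name)
        shutil.copy2(annotation, dst / "Annotations" / annotation.name)
        kept.append(image.stem)
    return kept


with tempfile.TemporaryDirectory() as tmp:
    full = pathlib.Path(tmp) / "full"
    (full / "JPEGImages").mkdir(parents=True)
    (full / "Annotations").mkdir(parents=True)
    for stem in ("a", "b", "c"):
        (full / "JPEGImages" / f"{stem}.jpg").touch()
    for stem in ("a", "c"):  # "b" deliberately has no annotation
        (full / "Annotations" / f"{stem}.xml").touch()
    kept_ids = copy_subset(str(full), str(pathlib.Path(tmp) / "small"), limit=2)
    print(kept_ids)
```

Skipping unpaired images keeps the small dataset consistent, since each listed image ID needs an annotation file with the same name.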
When you run tfds build --register_checksums, the data is split into up to three datasets (train, test, val) according to the following file definitions:
Main
|--train.txt, test.txt, val.txt
Make train.txt and val.txt from the images in '/mnt/anno_dataset/data/tmp_test/train/VOCdevkit/VOC2012/JPEGImages':
ImageSets/Main# python3 run_make_traintxt.py
# cp train.txt val.txt
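The script run_make_traintxt.py is not shown in this post. As a rough sketch of what such a script might do, assuming it simply lists the image IDs (file names without extension), one per line:

```python
import pathlib
import tempfile


def write_image_set(jpeg_dir: str, output_file: str) -> list:
    """Write one image ID (file name without extension) per line, sorted."""
    image_ids = sorted(p.stem for p in pathlib.Path(jpeg_dir).glob("*.jpg"))
    pathlib.Path(output_file).write_text("".join(f"{i}\n" for i in image_ids))
    return image_ids


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    jpeg_dir = root / "JPEGImages"
    main_dir = root / "ImageSets" / "Main"
    jpeg_dir.mkdir()
    main_dir.mkdir(parents=True)
    # Two images plus one unrelated file that should be ignored.
    for name in ("b.jpg", "a.jpg", "notes.txt"):
        (jpeg_dir / name).touch()
    ids = write_image_set(str(jpeg_dir), str(main_dir / "train.txt"))
    train_txt = (main_dir / "train.txt").read_text()
    print(train_txt)
```

For the real data, the call would take the JPEGImages path above as `jpeg_dir` and ImageSets/Main/train.txt as `output_file`. The function name `write_image_set` is made up for this sketch.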
Make the new tar:
$ cd /mnt/anno_dataset/data/tmp_test/train
$ tar -cvf VOCdevkit.tar ./VOCdevkit
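If you prefer to script this step, the same kind of uncompressed archive can be produced with Python's standard tarfile module. This is a minimal sketch; the helper name `make_tar` is hypothetical and the demo runs on a throwaway directory rather than the real data.

```python
import pathlib
import tarfile
import tempfile


def make_tar(parent_dir: str, folder_name: str = "VOCdevkit") -> list:
    """Create <parent_dir>/<folder_name>.tar and return its member names."""
    parent = pathlib.Path(parent_dir)
    tar_path = parent / f"{folder_name}.tar"
    # Mode "w" writes an uncompressed archive, like `tar -cf`.
    with tarfile.open(tar_path, "w") as archive:
        archive.add(parent / folder_name, arcname=folder_name)
    with tarfile.open(tar_path, "r") as archive:
        return archive.getnames()


with tempfile.TemporaryDirectory() as tmp:
    main_dir = pathlib.Path(tmp) / "VOCdevkit" / "VOC2012" / "ImageSets" / "Main"
    main_dir.mkdir(parents=True)
    (main_dir / "train.txt").write_text("a\n")
    member_names = make_tar(tmp)
    print(member_names)
```

One small difference from the shell command: the members here are stored as VOCdevkit/..., without the leading ./ that `tar -cvf VOCdevkit.tar ./VOCdevkit` records.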
How to modify the code to point to a new dataset for Anno-Robot
What files are needed:
- VOCOtrain.tar
- VOCOtest.tar
- config.json
Which file to modify: anno_dataset.py
Register the above files with TFDS:
# tfds build --overwrite
# tfds build --register_checksums