How to Customize CIFAR-10 with TensorFlow
Many open-source projects use open datasets such as CIFAR-10 and CIFAR-100. To modify these projects, you may need to customize the datasets.
Introduction
Although TensorFlow provides documentation for customizing datasets, there is no simple, complete example. Here I record my steps as a practical, step-by-step walkthrough.
Generating a custom dataset code template
> git clone https://github.com/tensorflow/datasets.git
> cd datasets
> python tensorflow_datasets/scripts/create_new_dataset.py --dataset mycervical --type image
Dataset generated in /usr/local/lib/python3.5/dist-packages/tensorflow_datasets
mycervical is the name of my dataset. We can start by searching for TODO(mycervical) in the directory /usr/local/lib/python3.5/dist-packages/tensorflow_datasets
>/usr/local/lib/python3.5/dist-packages/tensorflow_datasets# find . -type f | grep mycervical
./image/mycervical.py
> vim /usr/local/lib/python3.5/dist-packages/tensorflow_datasets/image/mycervical.py
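The same search can be done from Python, which is handy when grep is not available. This is a minimal sketch: the helper name find_todos is my own and is not part of tensorflow_datasets, and it assumes the generated files are plain .py files under the package directory.

```python
import os


def find_todos(root, marker="TODO(mycervical)"):
    """Return (path, line_number, text) for every .py line under root containing marker."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if marker in line:
                        hits.append((path, number, line.strip()))
    return hits


# Example call (the path is the install location used in this post):
# for path, number, text in find_todos(
#         "/usr/local/lib/python3.5/dist-packages/tensorflow_datasets"):
#     print(path, number, text)
```

Each hit is a place in the generated template that still needs to be filled in.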
Using a local CIFAR-10 copy instead of downloading from the cloud
Method 1: via local webserver
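For this method, one possible setup is to serve a folder that already contains the CIFAR-10 archive with Python's built-in http.server, and then point the download URL in mycervical.py at localhost. This is only a sketch under those assumptions: the helper name serve_directory, the folder path, and port 8000 are placeholders, and the handler's directory argument requires Python 3.7 or newer.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000):
    """Serve `directory` over HTTP on 127.0.0.1 in a background thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


# Example (hypothetical folder holding the downloaded archive):
# server = serve_directory("/path/to/cifar_archives", port=8000)
# ... run the dataset build, fetching from http://127.0.0.1:8000/<archive name> ...
# server.shutdown()
```

Binding to 127.0.0.1 keeps the files visible to the local machine only.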
Method 2: via local folder
According to the TensorFlow documentation, we only need to modify mycervical.py at the places marked TODO(mycervical). Most of the code can be copied from Cifar10.py into mycervical.py. However, running it then fails with the following error:
...
File "/Users/boxiong/opt/anaconda3/lib/python3.7/site-packages/tensorflow_datasets/core/features/class_label_feature.py", line 148, in encode_example
(example_data, self._num_classes))
ValueError: Class label 246 greater than configured num_classes 10
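The message says that a label value of 246 reached a class-label feature configured for 10 classes. While debugging, it can help to rule out a problem in the data files themselves. The sketch below checks the label bytes directly, assuming the standard CIFAR-10 binary layout in which each record is one label byte followed by 3072 pixel bytes. The helper name out_of_range_labels is my own, and this does not reproduce how tensorflow_datasets parses the files.

```python
RECORD_BYTES = 1 + 32 * 32 * 3  # one label byte followed by 3072 pixel bytes


def out_of_range_labels(raw, num_classes=10, record_bytes=RECORD_BYTES):
    """Return (record_index, label) for every record whose label is >= num_classes."""
    if len(raw) % record_bytes != 0:
        raise ValueError("file size is not a multiple of the record size")
    bad = []
    for index in range(len(raw) // record_bytes):
        label = raw[index * record_bytes]  # first byte of each record is the label
        if label >= num_classes:
            bad.append((index, label))
    return bad


# Example with one of the files from the cifar-10-batches-bin directory:
# with open("cifar-10-batches-bin/data_batch_1.bin", "rb") as handle:
#     print(out_of_range_labels(handle.read()))
```

An empty result means every label in the file fits in the configured number of classes, so the cause lies elsewhere.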
After debugging, I found that dataset_info.json has to be added manually to data_dir (defined in mycervical.py; the VERSION number is used as the directory name).
More details on how the CIFAR-10 data is copied and where it ends up:
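As a sketch of this manual step, the helper below copies an existing dataset_info.json into the version directory. It assumes the directory layout is data_dir/mycervical/3.0.0, and that a suitable dataset_info.json is already available somewhere on disk. The helper name install_dataset_info and all paths in the example are hypothetical.

```python
import os
import shutil


def install_dataset_info(info_json, data_dir, name="mycervical", version="3.0.0"):
    """Copy info_json to <data_dir>/<name>/<version>/dataset_info.json; return the target path."""
    version_dir = os.path.join(data_dir, name, version)  # assumed layout
    os.makedirs(version_dir, exist_ok=True)
    target = os.path.join(version_dir, "dataset_info.json")
    shutil.copyfile(info_json, target)
    return target


# Example (both paths are placeholders):
# install_dataset_info("/path/to/dataset_info.json",
#                      os.path.expanduser("~/tensorflow_datasets"))
```

The version argument should match the VERSION defined in mycervical.py, since that number is used as the directory name.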
The tensorflow_datasets module first looks for the 3.0.0 directory (the version defined in mycervical.py) inside data_dir. If that directory does not exist, it then looks for the cifar-10-batches-bin directory.