How to Customize CIFAR-10 with TensorFlow

Paul Xiong
2 min read · Mar 25, 2020

Many open source projects use open datasets such as CIFAR-10 and CIFAR-100. To adapt these projects to your own data, you may need to customize the dataset.

Introduction

Although TensorFlow provides documentation on customizing datasets, there is no simple, complete example. Here I record my steps as a step-by-step practice example.

Generating the dataset code template

> git clone https://github.com/tensorflow/datasets.git 
> cd datasets
> python tensorflow_datasets/scripts/create_new_dataset.py --dataset mycervical --type image

Dataset generated in /usr/local/lib/python3.5/dist-packages/tensorflow_datasets

mycervical is my dataset’s name. We can start by searching for TODO(mycervical) in the directory /usr/local/lib/python3.5/dist-packages/tensorflow_datasets:

>/usr/local/lib/python3.5/dist-packages/tensorflow_datasets# find . -type f  |grep mycervical
./image/mycervical.py
>vim /usr/local/lib/python3.5/dist-packages/tensorflow_datasets/image/mycervical.py

In mycervical.py, I copied most of the classes from cifar.py. If your data size differs from CIFAR-10, you need to modify the corresponding values.
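As a rough aid for deciding which values to change, the sketch below works out the per-record size and the example count for a CIFAR-style binary file. It assumes the CIFAR-10 binary layout (one label byte followed by the raw pixel bytes of a 32x32x3 image). The function names are hypothetical and are not part of tensorflow_datasets.

```python
from math import prod


def record_bytes(image_shape=(32, 32, 3), label_bytes=1):
    """Bytes per example: the label byte(s) plus one byte per pixel channel."""
    return label_bytes + prod(image_shape)


def count_examples(file_size, image_shape=(32, 32, 3), label_bytes=1):
    """Number of whole records in a file of `file_size` bytes.

    Raises ValueError if the size is not a multiple of the record size,
    which usually means the assumed image shape or label width is wrong.
    """
    size = record_bytes(image_shape, label_bytes)
    if file_size % size != 0:
        raise ValueError(f"{file_size} bytes is not a multiple of {size}")
    return file_size // size


print(record_bytes())              # 3073
print(count_examples(30_730_000))  # 10000
```

If your images are, say, 64x64x3, `record_bytes((64, 64, 3))` gives the record size your copied code would need to expect.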

Use a local copy of CIFAR-10 instead of downloading from the cloud

Method 1: via local webserver
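One possible way to set this up, sketched with Python's standard library only: serve the folder that holds the downloaded archive over HTTP on localhost, then point the download URL in mycervical.py at it. The helper names, the port, and the archive file name used in the note below are assumptions for illustration.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_folder(folder, port=8000):
    """Serve `folder` over HTTP on localhost from a background thread."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=folder)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def local_url(server, filename):
    """Build the URL under which `filename` is reachable on the running server."""
    host, port = server.server_address[:2]
    return f"http://{host}:{port}/{filename}"
```

For example, `server = serve_folder("/path/to/archives")` followed by `local_url(server, "cifar-10-binary.tar.gz")` would give a URL to use in place of the remote one, assuming the archive has that name. Call `server.shutdown()` when you are done.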

Method 2: via local folder

Per the TensorFlow documentation, we only need to modify mycervical.py at the places marked TODO(mycervical). We can copy most of the code from cifar.py into mycervical.py. However, doing so runs into the following error:

...File "/Users/boxiong/opt/anaconda3/lib/python3.7/site-packages/tensorflow_datasets/core/features/class_label_feature.py", line 148, in encode_example
    (example_data, self._num_classes))
ValueError: Class label 246 greater than configured num_classes 10
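The error says that a label value read from the data (246) falls outside the range allowed by a 10-class label feature. One way to rule out the data files themselves is to scan them for out-of-range labels. The sketch below assumes the CIFAR-10 binary layout (one label byte followed by 3072 pixel bytes per record); `find_bad_labels` is a hypothetical helper, not a tensorflow_datasets function.

```python
def find_bad_labels(path, num_classes=10, image_bytes=32 * 32 * 3):
    """Return (record_index, label) pairs whose label is >= num_classes."""
    record = 1 + image_bytes  # one label byte, then the pixel bytes
    bad = []
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(record)
            if len(chunk) < record:
                break  # end of file; a trailing partial record is ignored
            if chunk[0] >= num_classes:
                bad.append((index, chunk[0]))
            index += 1
    return bad
```

An empty result suggests the files match the assumed layout, so the problem is more likely in how the dataset is configured or located than in the data itself.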

After debugging, I found that we need to manually add dataset_info.json to the data_dir. The VERSION number defined in mycervical.py is used as the directory name.
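A minimal sketch of that manual step, assuming you already have a dataset_info.json to copy and that the target layout is `<data_dir>/<name>/<version>/`. The function name and the default arguments are hypothetical; adjust them to the name and VERSION in your own mycervical.py.

```python
import shutil
from pathlib import Path


def install_dataset_info(info_json, data_dir, name="mycervical", version="3.0.0"):
    """Copy dataset_info.json into <data_dir>/<name>/<version>/ and return its path."""
    # Assumed layout: the version string is the last directory component.
    target_dir = Path(data_dir).expanduser() / name / version
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(info_json, target_dir / "dataset_info.json"))
```

Calling `install_dataset_info("dataset_info.json", "~/tensorflow_datasets")` would place the file under the version directory, creating the directory if it does not exist yet.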

More details about how the CIFAR-10 data is copied and where it ends up:

The tensorflow_datasets module will first look at

If the 3.0.0 directory (the version defined in mycervical.py) does not exist, it will look at:

for the cifar-10-batches-bin directory.
