How to Convert a Pascal VOC Dataset for YOLOv5

A step-by-step introduction using a cervical cell dataset.

Our cervical cell dataset is in Pascal VOC format. YOLOv5 cannot train on Pascal VOC annotations directly, so we have to convert the dataset from Pascal VOC to YOLOv5 format:

1. Download the conversion tool code from GitHub. The Pascal VOC dataset directory tree should look like this:

xml: label files (.xml annotations)
img: image files
file list: name.txt

  • Each image has its original image file, e.g. .tiff, .png or .jpg.
  • Each image also has a label file with the same file name but a different suffix (.xml).
  • The class file contains:
    1) the class name,
    2) the class index, counted from 0 (0, 1, 2, …).
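The class file ties each class name to the integer ID that ends up in the YOLO label files. Below is a minimal Python sketch of reading it. It assumes, hypothetically, one class name per line with the ID given by the line's position counted from 0. If your class file stores the ID explicitly next to the name, parse that column instead. The function names are illustrative only.

```python
from pathlib import Path


def parse_class_ids(text: str) -> dict[str, int]:
    """Map class names to zero-based IDs, assuming one class name per line."""
    # Strip whitespace and drop blank lines so a trailing newline adds no class.
    names = [line.strip() for line in text.splitlines() if line.strip()]
    # The ID is the position in the file, counted from 0.
    return {name: idx for idx, name in enumerate(names)}


def load_class_ids(class_file: str) -> dict[str, int]:
    """Read a class file from disk (hypothetical path) and parse it."""
    return parse_class_ids(Path(class_file).read_text(encoding="utf-8"))
```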

2. Run the following command to convert VOC to YOLOv5 format:

After running the command, the ./output folder contains the converted labels:
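Each converted label is a plain-text .txt file with one line per object: `class_id x_center y_center width height`. The box values are normalized to the 0 to 1 range by the image width and height. The sketch below shows that arithmetic for one Pascal VOC XML file, using only Python's standard `xml.etree.ElementTree`. It is not the GitHub tool's code, and `voc_xml_to_yolo_lines` is a hypothetical name. Objects whose class is missing from the class mapping are skipped.

```python
import xml.etree.ElementTree as ET


def voc_xml_to_yolo_lines(xml_text: str, class_ids: dict[str, int]) -> list[str]:
    """Convert one Pascal VOC annotation into YOLO label lines (sketch)."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)

    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text.strip()
        if name not in class_ids:
            continue  # class not listed in the class file: skip this box
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # VOC stores pixel corners; YOLO wants a normalized center plus size.
        x_center = (xmin + xmax) / 2.0 / img_w
        y_center = (ymin + ymax) / 2.0 / img_h
        width = (xmax - xmin) / img_w
        height = (ymax - ymin) / img_h
        lines.append(
            f"{class_ids[name]} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
        )
    return lines
```

Write the returned lines, joined with newlines, to a .txt file with the same base name as the image.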

PS: If you are not familiar with the Pascal VOC format, note that the conversion above works from a file list (names.txt) of the images to convert; here we have chosen train.txt as that list:
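A VOC file list such as train.txt usually holds one image ID per line, without a suffix. Below is a sketch of turning such a list into the XML label paths to convert. It assumes the `xml` directory from step 1, and the helper name is hypothetical. Any extra columns on a line are ignored, since some VOC lists carry a `1`/`-1` flag after the ID.

```python
from pathlib import Path


def label_paths_from_list(list_text: str, xml_dir: str) -> list[Path]:
    """Turn a VOC-style file list into the matching .xml label paths."""
    paths = []
    for line in list_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        image_id = tokens[0]  # first column is the image ID; extra flags ignored
        paths.append(Path(xml_dir) / f"{image_id}.xml")
    return paths
```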

3. Now create a new directory for YOLOv5 and copy 2000 annotated files into it. You can choose more, but don't choose too few: when I used only 400 files, the trained model could not recognize the cells correctly:
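Here is a minimal sketch of this copy step. It assumes the converted .txt labels and the images sit in two separate folders. It also assumes the new directory follows the usual YOLOv5 layout, with `images/train` next to `labels/train`, because YOLOv5 finds each label by swapping `images` for `labels` in the image path. The 2000 limit comes from the text. The folder names, suffix list, and function names are assumptions. Only images that actually have a label are copied.

```python
import shutil
from pathlib import Path

# Assumed image suffixes; extend if your dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def select_pairs(image_names, label_names, limit=2000):
    """Pick up to `limit` (image, label) name pairs where a .txt label exists."""
    label_stems = {Path(n).stem for n in label_names if Path(n).suffix.lower() == ".txt"}
    pairs = []
    for name in sorted(image_names):
        if len(pairs) >= limit:
            break
        p = Path(name)
        if p.suffix.lower() in IMAGE_SUFFIXES and p.stem in label_stems:
            pairs.append((name, p.stem + ".txt"))
    return pairs


def copy_subset(src_images: Path, src_labels: Path, dst_root: Path,
                limit=2000, split="train"):
    """Copy selected pairs into dst_root/images/<split> and dst_root/labels/<split>."""
    img_out = dst_root / "images" / split
    lbl_out = dst_root / "labels" / split
    img_out.mkdir(parents=True, exist_ok=True)
    lbl_out.mkdir(parents=True, exist_ok=True)
    pairs = select_pairs([p.name for p in src_images.iterdir()],
                         [p.name for p in src_labels.iterdir()], limit)
    for img_name, lbl_name in pairs:
        shutil.copy2(src_images / img_name, img_out / img_name)
        shutil.copy2(src_labels / lbl_name, lbl_out / lbl_name)
    return len(pairs)
```

For example, `copy_subset(Path("img"), Path("output"), Path("yolov5_data"))` would copy up to 2000 pairs into `yolov5_data/images/train` and `yolov5_data/labels/train` (all paths hypothetical). Files are picked in name order. If the names follow acquisition order, a random sample may give a more representative subset.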

4. The data.yaml file describes which directories are used for training and validation:
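A typical YOLOv5 data.yaml sets the dataset root (`path`), the training and validation image folders relative to it (`train`, `val`), the number of classes (`nc`) and the class names (`names`). The sketch below builds such a file as a string using only the standard library. The root path, the split folders and the existence of a separate `val` split are assumptions. `make_data_yaml` is a hypothetical helper, not part of YOLOv5.

```python
def make_data_yaml(root, class_names, train="images/train", val="images/val"):
    """Build a minimal YOLOv5-style data.yaml as text (sketch)."""
    # YAML single-quoted strings escape a quote by doubling it.
    quoted = ", ".join("'" + n.replace("'", "''") + "'" for n in class_names)
    return (
        f"path: {root}\n"
        f"train: {train}\n"
        f"val: {val}\n"
        f"nc: {len(class_names)}\n"
        f"names: [{quoted}]\n"
    )
```

Save it with something like `Path("data.yaml").write_text(make_data_yaml("../yolov5_data", names))`. The class names must be listed in the same order as the IDs used in the label files.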

5. The last step is to make sure the image files and label files are in sync, so that every image has a matching label file and vice versa; otherwise YOLO training will fail with errors. Run the following script in Visual Studio.
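Here is a sketch of such a sync check in Python. It assumes a YOLOv5-style layout with separate image and label folders (hypothetical paths). It reports image files without a matching .txt label and label files without a matching image. `remove_unsynced` only deletes them when called with `dry_run=False`, so you can review the list first. Function names and the suffix list are illustrative.

```python
from pathlib import Path

# Assumed image suffixes; extend if your dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def find_unsynced(image_names, label_names):
    """Return (images_without_labels, labels_without_images) as sorted stems."""
    image_stems = {Path(n).stem for n in image_names if Path(n).suffix.lower() in IMAGE_SUFFIXES}
    label_stems = {Path(n).stem for n in label_names if Path(n).suffix.lower() == ".txt"}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


def remove_unsynced(image_dir, label_dir, dry_run=True):
    """List, and unless dry_run is True delete, files whose counterpart is missing."""
    images = [p for p in Path(image_dir).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES]
    # Note: a stray classes.txt in the label folder is reported as unsynced too.
    labels = [p for p in Path(label_dir).iterdir() if p.suffix.lower() == ".txt"]
    orphan_images, orphan_labels = find_unsynced(
        [p.name for p in images], [p.name for p in labels]
    )
    orphan_images, orphan_labels = set(orphan_images), set(orphan_labels)
    to_delete = [p for p in images if p.stem in orphan_images]
    to_delete += [p for p in labels if p.stem in orphan_labels]
    if not dry_run:
        for p in to_delete:
            p.unlink()
    return to_delete
```

For example, `remove_unsynced("yolov5_data/images/train", "yolov5_data/labels/train")` returns the mismatched paths without deleting anything.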

Now the YOLOv5 training-ready directory looks like this:



Paul Xiong

Coding, implementing, optimizing ML annotation with self-supervised learning, TLDR: doctor’s labeling is the 1st priority for our Cervical AI project.