How to split folders with files (e.g. images) into training, validation and test (dataset) folders.

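The split-folders package handles this task: it copies (or moves) the files of each class folder into train, val and test subfolders.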

First, install the split-folders package.

pip install split-folders

The original folder should be organized as follows:

input/
    class1/
        img1.jpg
        img2.jpg
        ...
    class2/
        imgWhatever.jpg
        ...
    ...
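
Before splitting, it can help to confirm that the input folder really follows this layout. The snippet below is a minimal sketch using only the standard library; `find_layout_problems` is a hypothetical helper written for this post, not part of split-folders, and it only checks for two easy-to-spot deviations (files lying outside a class folder, and class folders without files).

```python
from pathlib import Path


def find_layout_problems(input_dir):
    """Return a list of human-readable problems with the class-folder layout.

    Hypothetical helper for illustration; not part of the split-folders package.
    """
    problems = []
    for entry in sorted(Path(input_dir).iterdir()):
        if entry.is_file():
            # Files are expected to live inside a class folder, not next to them.
            problems.append(f"file outside a class folder: {entry.name}")
        elif entry.is_dir() and not any(f.is_file() for f in entry.iterdir()):
            # A class folder without files has nothing to split.
            problems.append(f"class folder has no files: {entry.name}")
    return problems
```

An empty list means neither of the two deviations was found, e.g. `find_layout_problems("input")` for the tree above.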

Then run the split from Python:

import splitfolders

# Split with a ratio.
# To split into training and validation sets only, pass a two-element tuple to `ratio`, e.g. `(.8, .2)`.
splitfolders.ratio("input_folder", output="output",
    seed=1337, ratio=(.8, .1, .1), group_prefix=None, move=False) # default values

# Split val/test with a fixed number of items, e.g. `(100, 100)`, for each set.
# To split into training and validation sets only, pass a single number to `fixed`, e.g. `10`.
# Pass three values, e.g. `(300, 100, 100)`, to also limit the number of training items.
splitfolders.fixed("input_folder", output="output",
    seed=1337, fixed=(100, 100), oversample=False, group_prefix=None, move=False) # default values
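
To make the roles of `ratio` and `seed` more concrete, here is a rough sketch of what a seeded ratio split does to the file names of a single class. It uses only the standard library and is not the package's actual implementation; `split_by_ratio` is a hypothetical function, and the rounding rule (any remainder goes to the test set) is an assumption of this sketch.

```python
import random


def split_by_ratio(items, ratio=(0.8, 0.1, 0.1), seed=1337):
    """Shuffle items reproducibly and cut them into train/val/test lists.

    Illustration only; split-folders may round and order files differently.
    """
    items = sorted(items)               # fixed starting order, so the seed alone decides the shuffle
    random.Random(seed).shuffle(items)  # same seed, same shuffle
    n_train = int(len(items) * ratio[0])
    n_val = int(len(items) * ratio[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # assumption: the remainder goes to the test set
    return train, val, test
```

Calling it twice with the same seed gives the same three lists, which is why fixing `seed` makes a split reproducible.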

The output structure will be organized as follows:

output/
    train/
        class1/
            img1.jpg
            ...
        class2/
            imga.jpg
            ...
    val/
        class1/
            img2.jpg
            ...
        class2/
            imgb.jpg
            ...
    test/
        class1/
            img3.jpg
            ...
        class2/
            imgc.jpg
            ...
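
After the split, a quick count of files per set and class shows whether the result matches the requested ratio or fixed sizes. This is a small sketch with the standard library; `summarize_split` is a hypothetical helper, and the set names `train`, `val` and `test` are taken from the output tree above.

```python
from pathlib import Path


def summarize_split(output_dir, splits=("train", "val", "test")):
    """Return {split: {class_name: file_count}} for the split folders that exist."""
    root = Path(output_dir)
    summary = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            continue  # e.g. no test folder when only train/val were requested
        summary[split] = {
            class_dir.name: sum(1 for f in class_dir.iterdir() if f.is_file())
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return summary
```

For example, `summarize_split("output")` returns a nested dictionary that can be printed or compared against the expected counts.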
