How to do video data augmentation using the Python VIDAUG library
Video data is becoming increasingly important in various applications such as action recognition, object detection, and video classification. However, one of the major challenges of working with video data is the scarcity of labeled data. Data augmentation is a technique used to create more training data by generating new samples from existing ones, and it has been proven to be effective in improving the performance of deep learning models. In this blog post, we will explore how to perform video data augmentation using the VIDAUG library in Python. VIDAUG is a powerful and easy-to-use library that offers various types of augmentations to create new training data for video-based machine learning tasks. By the end of this tutorial, you will be equipped with the necessary skills to perform video data augmentation using VIDAUG and enhance the performance of your machine learning models.
VIDAUG Library
VIDAUG is an open-source Python library that provides a simple and easy-to-use interface for augmenting videos. It offers a wide range of video augmentation techniques that can be used to generate more training data, improve model performance, and reduce overfitting. VIDAUG can work with various video file formats such as MP4, AVI, and MOV, making it a versatile choice for video data augmentation.
One of the key features of VIDAUG is its ability to perform various types of augmentations, such as flipping, cropping, and resizing. For example, flipping can be used to generate horizontally or vertically mirrored versions of videos, while cropping can be used to extract a specific region of interest from a video. Resizing, on the other hand, can be used to change the resolution of a video to match the requirements of the task or model.
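To build intuition for what these spatial augmentations do, here is a minimal sketch with plain NumPy on a toy stack of frames shaped `(frames, height, width, channels)`. This is an illustration of the operations themselves, not VIDAUG's API; the array and sizes are made up for the example:

```python
import numpy as np

# A toy "video": 4 frames of 6x8 RGB pixels (frames, height, width, channels)
video = np.arange(4 * 6 * 8 * 3, dtype=np.uint8).reshape(4, 6, 8, 3)

# Horizontal flip: mirror every frame along the width axis
flipped = video[:, :, ::-1, :]

# Center crop: keep a 4x4 region of interest from every frame
h, w = video.shape[1], video.shape[2]
ch, cw = 4, 4
top, left = (h - ch) // 2, (w - cw) // 2
cropped = video[:, top:top + ch, left:left + cw, :]

# Nearest-neighbour resize to half resolution (simple index-based downscale)
resized = video[:, ::2, ::2, :]

print(flipped.shape, cropped.shape, resized.shape)
```

VIDAUG wraps these ideas in ready-made augmentors (for example `HorizontalFlip`, `CenterCrop`, and `RandomResize`) that operate on a whole list of frames at once.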
In addition to these basic augmentations, VIDAUG also offers more advanced techniques such as rotation, translation, and color space manipulation. Rotation can be used to rotate the video by a certain degree, while translation can be used to shift the video frame horizontally or vertically. Color space manipulation can be used to change the color of the video or adjust the brightness and contrast.
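The advanced augmentations can be sketched the same way. The snippet below shows a 90-degree rotation, a one-pixel translation, and a brightness/contrast adjustment on a toy single-channel clip; again, this is a NumPy illustration of the concepts rather than VIDAUG's own implementation:

```python
import numpy as np

# Toy video: 2 frames of 4x4 single-channel pixels (frames, height, width)
video = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)

# Rotation: rotate every frame 90 degrees in the height-width plane
rotated = np.rot90(video, k=1, axes=(1, 2))

# Translation: shift every frame 1 pixel right, filling the vacated column with 0
translated = np.zeros_like(video)
translated[:, :, 1:] = video[:, :, :-1]

# Brightness/contrast: scale (contrast) and offset (brightness), then clip
adjusted = np.clip(video * 1.2 + 10.0, 0, 255)

print(rotated.shape, translated.shape, adjusted.shape)
```

In VIDAUG itself, comparable effects are provided by augmentors such as `RandomRotate`, `RandomTranslate`, and the pixel-wise `Add` and `Multiply`.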
Overall, VIDAUG is a powerful and versatile library that can be used to perform a wide range of video data augmentations. In the next section, we will demonstrate how to install and use VIDAUG for video data augmentation in Python.
How to Install and Use VIDAUG
Before we can start using VIDAUG for video data augmentation, we first need to install it. You can refer to the README.md file of the VIDAUG GitHub repository for more details.
You need to have the following Python packages:
- NumPy
- PIL (installed via the Pillow package)
- SciPy
- skimage (installed via the scikit-image package)
- OpenCV
To install VIDAUG, we can use the following pip command:
pip install git+https://github.com/okankop/vidaug
or, the repository can be downloaded via
git clone https://github.com/okankop/vidaug
and installed by using
python setup.py sdist && pip install dist/vidaug-0.1.tar.gz
Once we have installed VIDAUG, we can start using it to perform video data augmentation. Let’s take a look at an example that applies an augmentation pipeline (color inversion plus salt-and-pepper noise) to every video in a directory.
import os
import cv2
import numpy as np
import vidaug.augmentors as va

# Define the paths to the input and output video directories
input_dir = './videos/input/'
output_dir = './videos/augmented/'

# Define the augmentation pipeline once, outside the loop.
# va.Sometimes applies the wrapped augmentor with 50% probability.
sometimes = lambda aug: va.Sometimes(0.5, aug)
seq = va.Sequential([
    sometimes(va.InvertColor()),
    sometimes(va.Salt()),
    sometimes(va.Pepper()),
])

# Loop over each video file in the input directory
for file in os.listdir(input_dir):
    video_path = os.path.join(input_dir, file)

    # Load the video file
    cap = cv2.VideoCapture(video_path)

    # Read the metadata of the input video
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    frame_size = (width, height)

    # Define the output video writer with the same properties
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out_path = os.path.join(output_dir, file)
    out = cv2.VideoWriter(out_path, fourcc, fps, frame_size)

    # Collect all frames of the video
    frames = []
    while cap.isOpened():
        ret, frame = cap.read()
        if ret:
            frames.append(frame)
        else:
            break
    cap.release()

    # Apply the augmentation pipeline to the whole list of frames
    video_aug = seq(frames)

    # Write the augmented frames to the output video
    for frame in video_aug:
        out.write(frame)
    out.release()
This code demonstrates how to use the VIDAUG library to augment a set of videos and store the augmented versions as separate files. The input and output directories are defined at the beginning of the script, and the code then loops over each video file in the input directory. Each video is loaded with OpenCV, and an output video writer is created using the frame rate and frame size of the input so that the augmented video keeps the same properties.
Next, the frames of the video are collected into a list. The VIDAUG pipeline consists of three augmentors: InvertColor, Salt, and Pepper, each wrapped in Sometimes so that it is applied with 50% probability. The Sequential augmentor chains multiple augmentors so that they are applied one after another to the whole list of frames.
Finally, the augmented frames are written to the output video file using the output video writer. The video writer writes each frame to the output file and releases the writer when all frames have been processed.
This code can be modified to include different types of augmentors, change the probability of applying the augmentors, or apply the augmentors to specific frames of the video.
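The probability logic behind this kind of pipeline can be sketched in a few lines of plain Python. The helper names below (`sometimes`, `sequential`, `invert`) are made up for the illustration; VIDAUG's `va.Sometimes` and `va.Sequential` implement the same idea internally:

```python
import random

def sometimes(p, aug):
    """Apply `aug` with probability p, otherwise pass frames through unchanged."""
    return lambda frames: aug(frames) if random.random() < p else frames

def sequential(augs):
    """Apply a list of augmentors one after another, like a pipeline."""
    def apply(frames):
        for aug in augs:
            frames = aug(frames)
        return frames
    return apply

# A dummy "augmentor" that tags each frame so we can see whether it ran
invert = lambda frames: [f"inverted({f})" for f in frames]

pipeline = sequential([sometimes(0.5, invert)])
print(pipeline(["frame0", "frame1"]))
```

Raising or lowering the probability passed to the wrapper is how you control how often each augmentor fires across the dataset.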
In the next section, we will provide some tips and best practices for using VIDAUG effectively.
Tips and Best Practices
While VIDAUG provides a powerful set of tools for augmenting video data, it’s important to use these tools effectively to get the best results. Here are some tips and best practices to keep in mind when using VIDAUG for video data augmentation:
- Choose appropriate augmentations based on the task:
Different tasks may require different types of augmentations. For example, if you’re working on action recognition, you may want to apply temporal augmentations such as frame sampling or time warping. On the other hand, if you’re working on object detection, you may want to apply spatial augmentations such as cropping and resizing. It’s important to choose augmentations that are relevant to your task and can help improve model performance.
- Balance the amount of augmentation with the amount of training data available:
While augmenting data can help improve model performance, it’s important to balance the amount of augmentation with the amount of training data available. If you have a small amount of training data, you may want to use more augmentations to create more diverse samples. However, if you have a large amount of training data, you may want to use fewer augmentations to avoid overfitting.
- Ensure that the augmented data is representative of the real-world data:
When augmenting data, it’s important to ensure that the augmented data is representative of real-world data. For example, if you’re applying color space manipulation, you may want to ensure that the colors are still recognizable and don’t distort the objects in the video. It’s important to keep the original context of the video intact while applying augmentations.
- Evaluate the effectiveness of the augmentations:
Finally, it’s important to evaluate the effectiveness of the augmentations you apply. You can do this by comparing the performance of a model trained on the augmented data with the performance of a model trained on the original data. This can help you determine which augmentations are effective and which are not, and fine-tune your augmentation pipeline for better results.
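As a concrete illustration of the temporal augmentations mentioned above, frame sampling can be sketched with NumPy as follows. The toy clip and window size are invented for the example; VIDAUG ships comparable ready-made augmentors such as `Downsample` and `TemporalRandomCrop`:

```python
import numpy as np

# Toy video: 16 frames of 2x2 single-channel pixels
video = np.arange(16 * 2 * 2, dtype=np.uint8).reshape(16, 2, 2)

# Temporal downsampling: keep every 2nd frame (halves the clip length)
downsampled = video[::2]

# Temporal random crop: keep a random contiguous window of 8 frames
rng = np.random.default_rng(seed=42)
start = rng.integers(0, len(video) - 8 + 1)
window = video[start:start + 8]

print(downsampled.shape, window.shape)
```

Both techniques change when the model sees motion rather than what each frame looks like, which is why they suit tasks like action recognition.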
By keeping these tips and best practices in mind, you can use VIDAUG to create more diverse and representative training data for your machine learning models, and improve their performance on video-based tasks.
In conclusion, video data augmentation using VIDAUG is a powerful technique for improving the performance of machine learning models on video-based tasks. In this blog post, we’ve explored what VIDAUG is, how to install and use it for video data augmentation, and some tips and best practices for using VIDAUG effectively.
By choosing appropriate augmentations based on the task, balancing the amount of augmentation with the amount of training data available, ensuring that the augmented data is representative of the real-world data, and evaluating the effectiveness of the augmentations, we can create more diverse and representative training data for our models, and improve their performance on video-based tasks.