# torchvideo.datasets

## Datasets

### VideoDataset

class torchvideo.datasets.VideoDataset(root_path, label_set=None, sampler=FullVideoSampler(), transform=None)

Bases: torch.utils.data.dataset.Dataset

Abstract base class that all VideoDatasets inherit from. If you are implementing your own VideoDataset, you should inherit from this class.

Parameters
__getitem__(index)

Parameters

index (int) – index of the example within the dataset.

Return type
Returns

The example transformed by transform if one was passed during instantiation; otherwise the example is converted to a tensor with no transformations applied. If a label set is present, the method returns a tuple: (video_tensor, label)

__len__()

Total number of examples in the dataset

Return type

int

labels = None

The labels corresponding to the examples in the dataset. To get the label for the example at index i, simply call dataset.labels[i]. The label is also returned by __getitem__ when this field is not None.
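The relationship between labels and __getitem__ can be illustrated with a small stand-in. This is only a sketch of the documented contract, not torchvideo's implementation: ToyVideoDataset is a hypothetical class, plain lists stand in for video tensors, and a dict stands in for a label set.

```python
from typing import Any, Dict, List, Optional


class ToyVideoDataset:
    """Hypothetical stand-in mimicking the documented VideoDataset contract."""

    def __init__(self, videos: Dict[str, List[Any]],
                 label_set: Optional[Dict[str, Any]] = None):
        self.video_names = sorted(videos)
        self.videos = videos
        # `labels` stays None when no label set is given.
        self.labels = (
            [label_set[name] for name in self.video_names]
            if label_set is not None else None
        )

    def __len__(self) -> int:
        return len(self.video_names)

    def __getitem__(self, index: int):
        video = self.videos[self.video_names[index]]
        if self.labels is not None:
            # With a label set, each example is a (video, label) tuple.
            return video, self.labels[index]
        return video


frames = {"video1": [0, 1, 2], "video2": [3, 4]}
unlabelled = ToyVideoDataset(frames)
labelled = ToyVideoDataset(frames, label_set={"video1": "run", "video2": "jump"})

print(unlabelled[0])       # the video only
print(labelled[0])         # a (video, label) tuple
print(labelled.labels[1])  # the label alone, without fetching the video
```

Reading dataset.labels[i] directly is useful when only the labels are needed, for example to compute class frequencies without decoding any video.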

### ImageFolderVideoDataset

class torchvideo.datasets.ImageFolderVideoDataset(root_path, filename_template, filter=None, label_set=None, sampler=FullVideoSampler(), transform=None, frame_counter=None)

Bases: torchvideo.datasets.video_dataset.VideoDataset

Dataset stored as a folder containing folders of images, where each folder represents a video.

The folder hierarchy should look something like this:

root/video1/frame_000001.jpg
root/video1/frame_000002.jpg
root/video1/frame_000003.jpg
...

root/video2/frame_000001.jpg
root/video2/frame_000002.jpg
root/video2/frame_000003.jpg
root/video2/frame_000004.jpg
...
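
To make the layout concrete, the following sketch builds a toy version of this hierarchy in a temporary directory and counts the frames in each video folder. It uses only the standard library and does not call torchvideo. FILENAME_TEMPLATE, build_toy_root, and count_frames are hypothetical names, and the Python format-string style of the template is an assumption about how frame filenames might be described.

```python
import tempfile
from pathlib import Path

# Assumption: frame filenames follow a Python format-string template.
FILENAME_TEMPLATE = "frame_{:06d}.jpg"


def build_toy_root(root: Path, frame_counts: dict) -> None:
    """Create empty placeholder frame files in the documented layout."""
    for video_name, n_frames in frame_counts.items():
        video_dir = root / video_name
        video_dir.mkdir(parents=True)
        for i in range(1, n_frames + 1):
            (video_dir / FILENAME_TEMPLATE.format(i)).touch()


def count_frames(root: Path) -> dict:
    """Map each video folder name to its number of .jpg frames."""
    return {
        video_dir.name: len(list(video_dir.glob("*.jpg")))
        for video_dir in sorted(root.iterdir())
        if video_dir.is_dir()
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "root"
    build_toy_root(root, {"video1": 3, "video2": 4})
    frame_counts = count_frames(root)
    first_frame_exists = (root / "video1" / "frame_000001.jpg").exists()

print(frame_counts)
```

Each subfolder of root is one example, so a dataset over this toy root would have two examples.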

Parameters
__getitem__(index)

Parameters

index (int) – index of the example within the dataset.

Return type
Returns

The example transformed by transform if one was passed during instantiation; otherwise the example is converted to a tensor with no transformations applied. If a label set is present, the method returns a tuple: (video_tensor, label)

__len__()

Total number of examples in the dataset

Return type

int

### VideoFolderDataset

class torchvideo.datasets.VideoFolderDataset(root_path, filter=None, label_set=None, sampler=FullVideoSampler(), transform=None, frame_counter=None)

Bases: torchvideo.datasets.video_dataset.VideoDataset

Dataset stored as a folder of videos, where each video is a single example in the dataset.

The folder hierarchy should look something like this:

root/video1.mp4
root/video2.mp4
...

Parameters
__getitem__(index)

Parameters

index (int) – index of the example within the dataset.

Return type
Returns

The example transformed by transform if one was passed during instantiation; otherwise the example is converted to a tensor with no transformations applied. If a label set is present, the method returns a tuple: (video_tensor, label)

__len__()

Total number of examples in the dataset

Return type

int

### GulpVideoDataset

class torchvideo.datasets.GulpVideoDataset(root_path, *, gulp_directory=None, filter=None, label_field=None, label_set=None, sampler=FullVideoSampler(), transform=None)

Bases: torchvideo.datasets.video_dataset.VideoDataset

GulpIO Video dataset.

The folder hierarchy should look something like this:

root/data_0.gulp
root/data_1.gulp
...

root/meta_0.gulp
root/meta_1.gulp
...

Parameters
__getitem__(index)

Parameters

index (int) – index of the example within the dataset.

Return type
Returns

The example transformed by transform if one was passed during instantiation; otherwise the example is converted to a tensor with no transformations applied. If a label set is present, the method returns a tuple: (video_tensor, label)

__len__()

Total number of examples in the dataset

Return type

int

## Label Sets

Label sets are an abstraction over how your video data is labelled. Keeping labels separate from the dataset lets you swap storage formats and labelling schemes independently. All datasets optionally take a LabelSet that maps each example to its label.

### LabelSet

class torchvideo.datasets.LabelSet

Bases: abc.ABC

Abstract base class that all LabelSets inherit from

If you are implementing your own LabelSet, you should inherit from this class.

__getitem__(video_name)
Parameters

video_name (str) – The filename or id of the video

Return type

Any

Returns

The corresponding label
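
A custom label set only needs to map a video name to a label through __getitem__. The sketch below shows the idea with a dict-backed label set. So that it runs without torchvideo installed, it defines a local stand-in for the LabelSet base class and assumes __getitem__ is abstract; with torchvideo available you would subclass torchvideo.datasets.LabelSet instead. DictLabelSet is a hypothetical name, not part of the library.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class LabelSet(ABC):
    """Local stand-in for torchvideo.datasets.LabelSet (assumed interface)."""

    @abstractmethod
    def __getitem__(self, video_name: str) -> Any:
        raise NotImplementedError


class DictLabelSet(LabelSet):
    """Hypothetical label set backed by a plain dict."""

    def __init__(self, labels: Dict[str, Any]):
        self._labels = labels

    def __getitem__(self, video_name: str) -> Any:
        # Look up the label by filename or id of the video.
        return self._labels[video_name]


label_set = DictLabelSet({"video1": "run", "video2": "jump"})
print(label_set["video2"])
```

Because the label is returned as Any, the same pattern works for integer class ids, strings, or richer structures such as dicts of annotations.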

### DummyLabelSet

class torchvideo.datasets.DummyLabelSet(label=0)

Bases: torchvideo.datasets.label_sets.label_set.LabelSet

A dummy label set that returns the same label regardless of video

Parameters

label (Any) – The label given to any video

__getitem__(video_name)
Parameters

video_name (str) – The filename or id of the video

Return type

Any

Returns

The corresponding label

### GulpLabelSet

class torchvideo.datasets.GulpLabelSet(merged_meta_dict, label_field='label')

Bases: torchvideo.datasets.label_sets.label_set.LabelSet

LabelSet for GulpIO datasets where the label is contained within the metadata of the gulp directory. Assuming you’ve written the label of each video to a field called 'label' in the metadata, you can create a LabelSet like this: GulpLabelSet(gulp_dir.merged_meta_dict, label_field='label')

__getitem__(video_name)
Parameters

video_name (str) – The filename or id of the video

Return type

Any

Returns

The corresponding label
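
The lookup that label_field implies can be sketched on a hand-written dictionary. Everything here is an assumption for illustration: the page does not describe the shape of merged_meta_dict, so the sketch assumes each video id maps to a dict whose "meta_data" entry is a one-element list of user-written fields. lookup_label is a hypothetical helper, not a torchvideo or GulpIO function.

```python
from typing import Any, Dict

# Assumed layout (not confirmed by this page): video id -> dict with a
# "meta_data" list holding the fields written at gulp time.
toy_merged_meta_dict: Dict[str, Dict[str, Any]] = {
    "video1": {"meta_data": [{"label": "run"}], "frame_info": []},
    "video2": {"meta_data": [{"label": "jump"}], "frame_info": []},
}


def lookup_label(merged_meta_dict: Dict[str, Dict[str, Any]],
                 video_name: str, label_field: str = "label") -> Any:
    """Hypothetical helper: read one metadata field for a video."""
    return merged_meta_dict[video_name]["meta_data"][0][label_field]


print(lookup_label(toy_merged_meta_dict, "video1"))
```

If the labels were written under a different field name, only the label_field argument would change.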

### CsvLabelSet

class torchvideo.datasets.CsvLabelSet(df, col='label')

Bases: torchvideo.datasets.label_sets.label_set.LabelSet

LabelSet for a pandas DataFrame or Series. The index of the DataFrame/Series is assumed to be the set of video names. For a Series, the values are the labels; for a DataFrame, the col kwarg specifies which column to use as the label.

Examples

>>> import pandas as pd
>>> from torchvideo.datasets import CsvLabelSet
>>> df = pd.DataFrame({'video': ['video1', 'video2'],
...                    'label': [1, 2]}).set_index('video')
>>> label_set = CsvLabelSet(df, col='label')
>>> label_set['video1']
1

Parameters
• df – pandas DataFrame or Series containing video names/ids and their corresponding labels.

• col (Optional[str]) – The column to read the label from when df is a DataFrame.

__getitem__(video_name)
Parameters

video_name (str) – The filename or id of the video

Return type

Any

Returns

The corresponding label