Benchmarks module

This module provides popular continual learning benchmarks and generic facilities to build custom benchmarks.
  • Popular benchmarks (like SplitMNIST, PermutedMNIST, SplitCIFAR, …) are contained in the classic sub-module.

  • Dataset implementations are available in the datasets sub-module.

  • One can create new benchmarks by using the utilities found in the generators sub-module.

  • Avalanche uses custom dataset and dataloader implementations contained in the utils sub-module. More information can be found in the related How-Tos on data loading and on AvalancheDataset.
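For instance, a classic benchmark can be created with a single constructor call; a minimal sketch, keeping the library's default download location and arguments:

    from avalanche.benchmarks.classic import SplitMNIST

    # Split MNIST's 10 digits into 5 experiences of 2 classes each.
    # The raw data is downloaded automatically on first use.
    benchmark = SplitMNIST(n_experiences=5)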


Continual Learning Scenarios

Generic definitions for scenarios, streams and experiences. All the continual learning benchmarks are specific instantiations of these concepts.
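As a quick sketch of these concepts, every benchmark exposes named streams (typically train_stream and test_stream), and each element of a stream is an experience carrying a dataset plus metadata:

    from avalanche.benchmarks.classic import SplitMNIST

    benchmark = SplitMNIST(n_experiences=5)
    for experience in benchmark.train_stream:
        # Each experience knows its position in the stream, the
        # classes it contains, and wraps a regular PyTorch dataset.
        print(experience.current_experience,
              experience.classes_in_this_experience,
              len(experience.dataset))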



CLScenario(streams)

Continual Learning benchmark.

OnlineCLScenario(original_streams[, ...])

Online continual learning scenario.

ExModelCLScenario(original_benchmark, ...)

Ex-Model CL Scenario.

NCScenario(train_dataset, test_dataset, ...)

This class defines a "New Classes" scenario.

NIScenario(train_dataset, test_dataset, ...)

This class defines a "New Instance" scenario.


benchmark_with_validation_stream(benchmark, ...)

Helper to obtain a benchmark with a validation stream.


CLStream(name, exps_iter[, benchmark, ...])

A CL stream is a named iterator of experiences.

EagerCLStream(name, exps[, benchmark, ...])

A CL stream built from a pre-initialized list of experiences.

ClassificationStream(name, benchmark, *[, ...])

A stream of classification experiences.


CLExperience(current_experience, origin_stream)

Base Experience.

ClassificationExperience(origin_stream, ...)

Definition of a learning experience based on a GenericCLScenario instance.

NCExperience(origin_stream, current_experience)

Defines a "New Classes" experience.

NIExperience(origin_stream, current_experience)

Defines a "New Instances" experience.

OnlineCLExperience(*, dataset, origin_experience)

Online CL (OCL) Experience.

ExModelExperience(expert_model, ...[, ...])

Ex-Model CL Experience.

ExperienceAttribute(value[, use_in_train, ...])

Experience attributes are used to define data belonging to an experience which may only be available at train or eval time.

Classic Benchmarks

The classic benchmarks sub-module covers all mainstream benchmarks. Expect this list to grow over time!
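Most classic benchmarks share the same constructor pattern; a minimal sketch (arguments beyond n_experiences and seed are left to their defaults):

    from avalanche.benchmarks.classic import SplitCIFAR100

    # 10 experiences of 10 classes each; `seed` fixes the otherwise
    # random assignment of classes to experiences.
    benchmark = SplitCIFAR100(n_experiences=10, seed=1234)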
CORe50-based benchmarks

Benchmarks based on the CORe50 dataset.

CORe50(*[, scenario, run, object_lvl, mini, ...])

Creates a CL benchmark for CORe50.

CIFAR-based benchmarks

Benchmarks based on the CIFAR-10 and CIFAR-100 datasets.

SplitCIFAR10(n_experiences, *[, ...])

Creates a CL benchmark using the CIFAR10 dataset.

SplitCIFAR100(n_experiences, *[, ...])

Creates a CL benchmark using the CIFAR100 dataset.

SplitCIFAR110(n_experiences, *[, seed, ...])

Creates a CL benchmark using both the CIFAR100 and CIFAR10 datasets.

CUB200-based benchmarks

Benchmarks based on the Caltech-UCSD Birds 200 dataset.

SplitCUB200([n_experiences, ...])

Creates a CL benchmark using the CUB-200 dataset.

EndlessCLSim-based benchmarks

Benchmarks based on the EndlessCLSim derived datasets.

EndlessCLSim(*[, scenario, patch_size, ...])

Creates a CL scenario for datasets derived from the Endless-Continual-Learning Simulator, or for custom datasets created with the simulator's standalone application.

FashionMNIST-based benchmarks

Benchmarks based on the Fashion MNIST dataset.

SplitFMNIST(n_experiences, *[, ...])

Creates a CL benchmark using the Fashion MNIST dataset.

ImageNet-based benchmarks

Benchmarks based on the ImageNet ILSVRC-2012 dataset.

SplitImageNet(dataset_root, *[, ...])

Creates a CL benchmark using the ImageNet dataset.

SplitTinyImageNet([n_experiences, ...])

Creates a CL benchmark using the Tiny ImageNet dataset.

iNaturalist-based benchmarks

Benchmarks based on the iNaturalist-2018 dataset.

SplitInaturalist(*[, super_categories, ...])

Creates a CL benchmark using the iNaturalist2018 dataset.

MNIST-based benchmarks

Benchmarks based on the MNIST dataset.

SplitMNIST(n_experiences, *[, ...])

Creates a CL benchmark using the MNIST dataset.

PermutedMNIST(n_experiences, *[, ...])

Creates a Permuted MNIST benchmark.

RotatedMNIST(n_experiences, *[, ...])

Creates a Rotated MNIST benchmark.

Omniglot-based benchmarks

Benchmarks based on the Omniglot dataset.

SplitOmniglot(n_experiences, *[, ...])

Creates a CL benchmark using the Omniglot dataset.

OpenLORIS-based benchmarks

Benchmarks based on the OpenLORIS dataset.

OpenLORIS(*[, factor, train_transform, ...])

Creates a CL benchmark for OpenLORIS.

Stream51-based benchmarks

Benchmarks based on the Stream-51 dataset.

CLStream51(*[, scenario, seed, eval_num, ...])

Creates a CL benchmark for Stream-51.

CLEAR-based benchmarks

Benchmarks based on the CLEAR dataset.

CLEAR(*[, data_name, evaluation_protocol, ...])

Creates a Domain-Incremental benchmark for CLEAR-10 and CLEAR-100, with 10 and 100 illustrative classes respectively, plus an (n+1)-th background class.

Ex-Model benchmarks

Benchmarks for learning from pretrained models or multi-agent continual learning scenarios. Based on the Ex-Model paper. Pretrained models are downloaded automatically.
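A minimal sketch, assuming ExMLMNIST is exported from the classic sub-module and keeping its default arguments:

    from avalanche.benchmarks.classic import ExMLMNIST

    # Pretrained expert models are fetched automatically.
    benchmark = ExMLMNIST()
    for experience in benchmark.train_stream:
        # Each ex-model experience exposes the frozen expert model
        # to learn from (see ExModelExperience above).
        expert = experience.expert_model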

ExMLMNIST([scenario, run_id])

ExML scenario on MNIST data.

ExMLCoRE50([scenario, run_id])

ExML scenario on CORe50.

ExMLCIFAR10([scenario, run_id])

ExML scenario on CIFAR10.


Datasets

The datasets sub-module provides PyTorch dataset implementations for datasets missing from the torchvision/torchaudio libraries. These datasets can also be used in a standalone way!
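For example, a dataset can be used on its own, outside any benchmark; a sketch where "./data" is a hypothetical root directory:

    from avalanche.benchmarks.datasets import TinyImagenet

    # Behaves like any map-style PyTorch dataset; the data is
    # downloaded under the given root if missing (default download
    # behavior assumed here).
    dataset = TinyImagenet("./data", train=True)
    image, target = dataset[0]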

CORe50Dataset(root, *[, train, transform, ...])

CORe50 PyTorch Dataset.

CUB200(root, *[, train, transform, ...])

Basic CUB200 PathsDataset to be used as a standard PyTorch Dataset.

EndlessCLSimDataset([root, scenario, ...])

Endless Continual Learning Simulator Dataset.

INATURALIST2018([root, split, transform, ...])

iNaturalist-2018 PyTorch Dataset.

MiniImageNetDataset(imagenet_path, split, ...)

The MiniImageNet dataset.

Omniglot(root[, train, transform, ...])

Custom class used to adapt Omniglot (from Torchvision) and make it compatible with the Avalanche API.

OpenLORIS(root, *[, train, transform, ...])

OpenLORIS PyTorch Dataset.

Stream51(root, *[, train, transform, ...])

Stream-51 PyTorch Dataset.

TinyImagenet(root, *, train[, transform, ...])

Tiny ImageNet PyTorch Dataset.

CLEARDataset([root, data_name, download, ...])

CLEAR base dataset for downloading and loading metadata.

Datasets of audio sequences from TorchAudio.

torchaudio_wrapper.SpeechCommands([root, ...])

Parameters:

  • root: dataset root location.

  • url: version name of the dataset.

  • download: automatically download the dataset, if not present.

  • subset: one of 'training', 'validation', 'testing'.

  • mfcc_preprocessing: an optional torchaudio.transforms.MFCC instance used to preprocess each audio sample. Warning: this may slow down execution, since preprocessing is applied on the fly each time a sample is retrieved from the dataset.

Benchmark Generators

The generators sub-module provides a set of functions for building new benchmarks.
These cover the most common use cases (Class-Incremental, Task-Incremental, Domain-Incremental, …) while also allowing the creation of entirely custom benchmarks from AvalancheDatasets.
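A minimal sketch of the generator API, where train_ds and test_ds stand for hypothetical pre-built Avalanche classification datasets and the num_experiences keyword is an assumption to be checked against the API reference:

    from avalanche.benchmarks import class_incremental_benchmark

    # Split the classes of the given datasets into 5 experiences;
    # `train_ds` / `test_ds` are hypothetical placeholders.
    benchmark = class_incremental_benchmark(
        {"train": train_ds, "test": test_ds},
        num_experiences=5,
    )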


benchmark_from_datasets(**dataset_streams)

Creates a benchmark given a list of datasets for each stream.

class_incremental_benchmark(datasets_dict, *)

Splits datasets according to a class-incremental scenario.

new_instances_benchmark(train_dataset, ...)

Benchmark generator for "New Instances" (NI) scenarios.

task_incremental_benchmark(bm[, ...])

Creates a task-incremental benchmark from a dataset scenario.

If you want to add attributes to experiences (such as classes_in_this_experience or task_labels), you can use the generic decorators below (see the sketch after these entries):


with_classes_timeline(obj)

Add ClassesTimeline attributes.


with_task_labels(obj)

Add TaskAware attributes.
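A sketch of how the decorators compose, assuming they are importable from the scenarios sub-module and accept a benchmark (or stream) to wrap:

    from avalanche.benchmarks.scenarios import (
        with_classes_timeline,
        with_task_labels,
    )

    # `benchmark` stands for a previously created benchmark; the
    # wrapped version exposes class-timeline and task-label
    # attributes on each experience.
    benchmark = with_task_labels(with_classes_timeline(benchmark))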

Online streams, where experiences are made of small mini-batches (see the sketch after these entries):

split_online_stream(original_stream, ...[, ...])

Split a stream of large batches to create an online stream of small mini-batches.


Creates a stream of sub-experiences from a list of overlapped experiences.
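A sketch of online splitting, assuming the import path and that experience_size is the keyword controlling the size of each online experience:

    from avalanche.benchmarks.scenarios.online import split_online_stream

    # Turn each large experience into a sequence of small online
    # experiences of 10 samples each.
    online_stream = split_online_stream(
        benchmark.train_stream, experience_size=10
    )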

Train/Validation splits for streams:


benchmark_with_validation_stream(benchmark, ...)

Helper to obtain a benchmark with a validation stream.


split_validation_class_balanced(validation_size, dataset)

Class-balanced dataset split.

split_validation_random(validation_size, shuffle)

Splits an AvalancheDataset into two splits.
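A sketch of the validation-stream helper, assuming the validation_size keyword and that the new stream is exposed as valid_stream:

    from avalanche.benchmarks import benchmark_with_validation_stream

    # Reserve 20% of each training experience for validation.
    benchmark = benchmark_with_validation_stream(
        benchmark, validation_size=0.2
    )
    for experience in benchmark.valid_stream:  # assumed stream name
        ...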

Utils (Data Loading and AvalancheDataset)

The custom dataset and dataloader implementations contained in this sub-module are described in more detail in the How-Tos about data loading and replay and about AvalancheDataset.
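A sketch of replay-style loading, where current_data and buffer are hypothetical AvalancheDatasets and batch_size is an assumed pass-through keyword:

    from avalanche.benchmarks.utils.data_loader import ReplayDataLoader

    # Interleave mini-batches from the current experience's data
    # with mini-batches drawn from the replay buffer.
    loader = ReplayDataLoader(current_data, memory=buffer, batch_size=32)
    for batch in loader:
        ...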

TaskBalancedDataLoader(data[, batch_size, ...])

Task-balanced data loader for Avalanche's datasets.

GroupBalancedDataLoader(datasets[, ...])

Data loader that balances data from multiple datasets.

ReplayDataLoader(data[, memory, ...])

Custom data loader for rehearsal/replay strategies.

GroupBalancedInfiniteDataLoader(datasets[, ...])

Data loader that balances data from multiple datasets emitting an infinite stream.

AvalancheDataset(datasets, *[, indices, ...])

Avalanche Dataset.

make_avalanche_dataset(dataset, *[, ...])

Avalanche Dataset.


TaskSet(data)

A lazy mapping for <task-label -> task dataset>.

DataAttribute(data, name[, use_in_getitem])

Data attributes manage sample-wise information such as task or class labels.