Training module


Training Templates

Templates define the training/eval loop for each setting (supervised CL, online CL, RL, …). Each template supports a set of callbacks that plugins can use to execute code inside the training/eval loops.
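As a rough illustration of how a template can expose callbacks to plugins, the sketch below uses a toy loop. `ToyTemplate`, `ToyPlugin`, and `EventLogger` are hypothetical names invented for this example, and the two callback names are assumptions chosen for illustration. It is not the library's implementation.

```python
class ToyPlugin:
    """Hypothetical plugin base class: every callback is a no-op by default."""

    def before_training_exp(self, template):
        pass

    def after_training_exp(self, template):
        pass


class EventLogger(ToyPlugin):
    """Records which callbacks fired and for which experience."""

    def __init__(self):
        self.events = []

    def before_training_exp(self, template):
        self.events.append(("before", template.current_experience))

    def after_training_exp(self, template):
        self.events.append(("after", template.current_experience))


class ToyTemplate:
    """Hypothetical template: owns the loop and fires callbacks around each step."""

    def __init__(self, plugins=None):
        self.plugins = list(plugins or [])
        self.current_experience = None

    def _trigger(self, callback_name):
        # Call the same-named method on every registered plugin, in order.
        for plugin in self.plugins:
            getattr(plugin, callback_name)(self)

    def train(self, experiences):
        for experience in experiences:
            self.current_experience = experience
            self._trigger("before_training_exp")
            # The actual work (forward, backward, optimizer step) would go here.
            self._trigger("after_training_exp")


logger = EventLogger()
ToyTemplate(plugins=[logger]).train(["exp0", "exp1"])
print(logger.events)
```

The template owns the loop and the plugin only overrides the callbacks it cares about, which is what lets the same plugin be attached to different templates.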


Templates are defined in the module.

BaseTemplate(*, model[, device, plugins])

Base class for continual learning skeletons.

BaseSGDTemplate(*, model, optimizer, ...[, ...])

Base SGD class for continual learning skeletons.

SupervisedTemplate(*, model, optimizer, ...)

Base class for continual learning strategies.

Plugins ABCs

ABCs for plugins are available in avalanche.core.


ABC for BaseTemplate plugins.


ABC for BaseSGDTemplate plugins.


ABC for SupervisedTemplate plugins.

Training Strategies

Ready-to-use continual learning strategies.

Cumulative(*, model, optimizer, criterion, ...)

Cumulative training strategy.

JointTraining(*, model, optimizer, ...[, ...])

Joint training on the entire stream.

Naive(*, model, optimizer, criterion, ...[, ...])

Naive finetuning.
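The difference between the three baselines listed so far (Cumulative, JointTraining, Naive) is mostly a matter of which data is visible at each training step. The sketch below shows that schedule on a toy stream of lists. The function names are hypothetical and no model is trained; it only illustrates the idea.

```python
def naive_schedule(stream):
    """Naive finetuning: train on each experience alone, in order."""
    return [list(exp) for exp in stream]


def cumulative_schedule(stream):
    """Cumulative: at step t, train on the union of experiences 0..t."""
    seen, schedule = [], []
    for exp in stream:
        seen.extend(exp)
        schedule.append(list(seen))
    return schedule


def joint_schedule(stream):
    """Joint training: a single training step over the entire stream."""
    return [[sample for exp in stream for sample in exp]]


stream = [["a1", "a2"], ["b1"], ["c1", "c2"]]
for name, fn in [("naive", naive_schedule),
                 ("cumulative", cumulative_schedule),
                 ("joint", joint_schedule)]:
    print(name, fn(stream))
```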

AR1(*, criterion, ~torch.Tensor], ...[, ...])

AR1 with Latent Replay.

StreamingLDA(*, slda_model, criterion, ...)

Deep Streaming Linear Discriminant Analysis.

ICaRL(*, feature_extractor, classifier, ...)

iCaRL Strategy.

PNNStrategy(*, model, optimizer[, ...])

Progressive Neural Network strategy.

CWRStar(*, model, optimizer, criterion, ...)

CWR* Strategy.

Replay(*, model, optimizer, criterion, ...)

Experience replay strategy.

GSS_greedy(*, model, optimizer, criterion, ...)

Experience replay strategy.

GDumb(*, model, optimizer, criterion, ...[, ...])

GDumb strategy.

LwF(*, model, optimizer, criterion, ...[, ...])

Learning without Forgetting (LwF) strategy.

AGEM(*, model, optimizer, criterion, ...[, ...])

Average Gradient Episodic Memory (A-GEM) strategy.

GEM(*, model, optimizer, criterion, ...[, ...])

Gradient Episodic Memory (GEM) strategy.

EWC(*, model, optimizer, criterion, ...[, ...])

Elastic Weight Consolidation (EWC) strategy.

SynapticIntelligence(*, model, optimizer, ...)

Synaptic Intelligence strategy.

CoPE(*, model, optimizer, criterion, ...[, ...])

Continual Prototype Evolution strategy.

LFL(*, model, optimizer, criterion, ...[, ...])

Less Forgetful Learning strategy.

GenerativeReplay(*, model, optimizer, ...[, ...])

Generative Replay strategy.

MAS(*, model, optimizer, criterion, ...[, ...])

Memory Aware Synapses (MAS) strategy.

BiC(*, model, optimizer, criterion, ...[, ...])

Bias Correction (BiC) strategy.

MIR(*, model, optimizer, criterion, ...[, ...])

Maximally Interfered Replay strategy. See the ER_MIR plugin for details.

MER(*, model, optimizer, criterion, ...[, ...])

ER_ACE(*, model, optimizer, criterion, ...)

ER ACE, as proposed in "New Insights on Reducing Abrupt Representation Change in Online Continual Learning" by Lucas Caccia et al.

LearningToPrompt(*, model_name, criterion, ...)

Learning to Prompt (L2P) strategy.

SCR(*, model, optimizer[, augmentations, ...])

Supervised Contrastive Replay from

FromScratchTraining(*, model, optimizer, ...)

From scratch training strategy.

ExpertGateStrategy(*, model, optimizer, ...)

Expert Gate strategy.

DER(*, model, optimizer, criterion, ...[, ...])

Implements the DER and DER++ strategies from the "Dark Experience For General Continual Learning" paper, Buzzega et al.

supervised.lamaml.LaMAML(*, model, ...[, ...])

supervised.lamaml_v2.LaMAML(*, model, ...[, ...])

Replay Buffers and Selection Strategies

Buffers to store past samples according to different policies and selection strategies.



ABC for rehearsal buffers to store exemplars.


Buffer updated with reservoir sampling.
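For readers unfamiliar with reservoir sampling, the sketch below shows the classic single-pass algorithm (often called Algorithm R) on a fixed-size buffer. `ToyReservoirBuffer` is a hypothetical class written for this illustration. The library's buffer may implement the same idea differently.

```python
import random


class ToyReservoirBuffer:
    """Keeps a uniform random sample of at most `max_size` items from a stream."""

    def __init__(self, max_size, seed=0):
        self.max_size = max_size
        self.items = []
        self.n_seen = 0
        self._rng = random.Random(seed)

    def add(self, item):
        self.n_seen += 1
        if len(self.items) < self.max_size:
            # Buffer not full yet: always keep the item.
            self.items.append(item)
            return
        # Buffer full: keep the new item with probability max_size / n_seen,
        # overwriting a uniformly chosen slot.
        j = self._rng.randrange(self.n_seen)
        if j < self.max_size:
            self.items[j] = item


buffer = ToyReservoirBuffer(max_size=10)
for sample in range(1000):
    buffer.add(sample)
print(len(buffer.items), buffer.n_seen)
```

Each item of the stream ends up in the buffer with equal probability, without knowing the stream length in advance.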

BalancedExemplarsBuffer(max_size[, ...])

A buffer that stores exemplars for rehearsal in separate groups.

ExperienceBalancedBuffer(max_size[, ...])

Rehearsal buffer with samples balanced over experiences.

ClassBalancedBuffer(max_size[, ...])

Stores samples for replay, equally divided over classes.
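A minimal sketch of what "equally divided over classes" can mean for a fixed capacity: split `max_size` into per-class quotas. The helper `class_quotas` is hypothetical, and the way the remainder is distributed is an assumption made for this example.

```python
def class_quotas(max_size, classes):
    """Split max_size slots as evenly as possible over the given classes."""
    classes = sorted(classes)
    if not classes:
        return {}
    base, extra = divmod(max_size, len(classes))
    # Assumption: the first `extra` classes get one additional slot
    # so that no capacity is wasted.
    return {c: base + (1 if i < extra else 0) for i, c in enumerate(classes)}


print(class_quotas(10, [0, 1]))
print(class_quotas(10, [0, 1, 2]))
```

As new classes appear, the quotas of the existing classes shrink, so some stored samples must be dropped to make room.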

ParametricBuffer(max_size[, groupby, ...])

Stores samples for replay using a custom selection strategy and grouping.

Selection strategies


Base class to define how to select a subset of exemplars from a dataset.


Select the exemplars at random in the dataset


Base class to select exemplars from their features

HerdingSelectionStrategy(model, layer_name)

The herding strategy as described in iCaRL.
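The herding rule from the iCaRL paper can be sketched in plain Python as follows: at each step, pick the exemplar that brings the mean of the selected features closest to the mean of all features. `herding_order` is a hypothetical helper operating on lists of floats, not the library class, which works on model features.

```python
def herding_order(features, n_select):
    """Return indices chosen by greedy herding over a list of feature vectors."""
    n, dim = len(features), len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    selected = []
    running_sum = [0.0] * dim
    for k in range(1, min(n_select, n) + 1):
        best_idx, best_dist = None, float("inf")
        for i, f in enumerate(features):
            if i in selected:
                continue
            # Squared distance between the overall mean and the mean of
            # the selected set if exemplar i were added as the k-th element.
            dist = sum(
                (mean[d] - (running_sum[d] + f[d]) / k) ** 2 for d in range(dim)
            )
            if dist < best_dist:
                best_idx, best_dist = i, dist
        selected.append(best_idx)
        running_sum = [running_sum[d] + features[best_idx][d] for d in range(dim)]
    return selected


toy_features = [[0.0], [1.0], [2.0], [10.0]]
print(herding_order(toy_features, 3))
```

Unlike a plain nearest-to-mean ranking, later picks compensate for earlier ones, which is why the outlier at index 3 is chosen third in this toy example.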

ClosestToCenterSelectionStrategy(model, ...)

A greedy algorithm that selects the remaining exemplar that is the closest to the center of all elements (in feature space).
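Because the center is fixed, repeatedly taking the remaining exemplar closest to it is equivalent to sorting by distance to the center. The sketch below assumes the center is the mean feature vector and uses a hypothetical helper name. It is an illustration only.

```python
def closest_to_center_order(features, n_select):
    """Return the indices of the n_select vectors nearest to the mean vector."""
    n, dim = len(features), len(features[0])
    center = [sum(f[d] for f in features) / n for d in range(dim)]

    def sq_dist(f):
        return sum((f[d] - center[d]) ** 2 for d in range(dim))

    order = sorted(range(n), key=lambda i: sq_dist(features[i]))
    return order[:n_select]


center_features = [[0.0], [1.0], [2.0], [10.0]]
print(closest_to_center_order(center_features, 3))
```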

Loss Functions


Similar to the Knowledge Distillation Loss.


RegularizationMethod implements regularization strategies.

LearningWithoutForgetting([alpha, temperature])

Learning Without Forgetting.
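To show what the `alpha` and `temperature` parameters typically control, here is a sketch of a common distillation-style penalty: the cross-entropy between temperature-softened outputs of the previous model and of the current model, weighted by `alpha`. The function names and default values are assumptions for this example, and the library's exact formulation may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(new_logits, old_logits, temperature=2.0):
    """Cross-entropy between softened old and new predictions."""
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits, temperature)
    return -sum(po * math.log(pn) for po, pn in zip(p_old, p_new))


def total_loss(task_loss, new_logits, old_logits, alpha=1.0, temperature=2.0):
    """Task loss plus the alpha-weighted distillation penalty."""
    return task_loss + alpha * distillation_loss(new_logits, old_logits, temperature)


old = [2.0, 0.5, -1.0]
print(distillation_loss(old, old), distillation_loss([0.0, 2.0, 1.0], old))
```

The penalty is smallest when the current model reproduces the previous model's outputs, which is how the method discourages forgetting.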


Asymmetric cross-entropy (ACE) Criterion used in "New Insights on Reducing Abrupt Representation Change in Online Continual Learning" by Lucas Caccia et al.

SCRLoss([temperature, contrast_mode, ...])

Supervised Contrastive Replay Loss as defined in Eq.

Training Plugins

Plugins can be added to any CL strategy to support additional behavior.

Utilities in

EarlyStoppingPlugin(patience, val_stream_name)

Early stopping and model checkpoint plugin.
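The role of the `patience` argument can be sketched with a small tracker: training stops once the validation score has failed to improve for `patience` consecutive checks. `PatienceTracker` is a hypothetical class, and "higher is better" is an assumption made for this example.

```python
class PatienceTracker:
    """Tracks the best validation score and signals when to stop."""

    def __init__(self, patience):
        self.patience = patience
        self.best_value = None
        self.best_step = None
        self.bad_steps = 0

    def update(self, step, value):
        """Record a score (higher is better); return True if training should stop."""
        if self.best_value is None or value > self.best_value:
            self.best_value, self.best_step = value, step
            self.bad_steps = 0
        else:
            self.bad_steps += 1
        return self.bad_steps >= self.patience


tracker = PatienceTracker(patience=2)
stopped_at = None
for step, score in enumerate([0.5, 0.6, 0.58, 0.59, 0.7]):
    if tracker.update(step, score):
        stopped_at = step
        break
print(stopped_at, tracker.best_step)
```

In this run the last score (0.7) is never seen: the tracker stops after two checks without improvement, and a checkpointing plugin would restore the model from `best_step`.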

EvaluationPlugin(*metrics[, loggers, ...])

Manager for logging and metrics.

LRSchedulerPlugin(scheduler[, ...])

Learning Rate Scheduler Plugin.

Strategy implemented as plugins in

AGEMPlugin(patterns_per_experience, sample_size)

Average Gradient Episodic Memory Plugin.
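The core step of A-GEM, as described in the original paper, is a gradient projection: if the current gradient conflicts with a reference gradient computed on memory samples (negative dot product), the conflicting component is removed. The sketch below uses plain lists and a hypothetical function name.

```python
def agem_project(grad, ref_grad):
    """Project grad so it no longer conflicts with ref_grad."""
    dot = sum(g * r for g, r in zip(grad, ref_grad))
    if dot >= 0:
        # No conflict with the memory gradient: leave the update untouched.
        return list(grad)
    ref_norm_sq = sum(r * r for r in ref_grad)
    scale = dot / ref_norm_sq
    return [g - scale * r for g, r in zip(grad, ref_grad)]


print(agem_project([1.0, 0.0], [1.0, 1.0]))
print(agem_project([-1.0, 0.0], [1.0, 1.0]))
```

After projection the update is orthogonal to the reference gradient, so to first order it does not increase the loss on the memory samples.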

CoPEPlugin([mem_size, n_classes, p_size, ...])

Continual Prototype Evolution plugin.

CWRStarPlugin(model[, cwr_layer_name, ...])

CWR* Strategy.

EWCPlugin(ewc_lambda[, mode, decay_factor, ...])

Elastic Weight Consolidation (EWC) plugin.
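The penalty that `ewc_lambda` scales can be sketched as follows, using the quadratic form from the EWC paper: each parameter's squared drift from its saved value, weighted by an importance estimate (the diagonal Fisher information). The function name is hypothetical, and the constant factor of 0.5 follows the paper; the library's scaling may differ.

```python
def ewc_penalty(params, saved_params, importances, ewc_lambda):
    """Quadratic EWC penalty over flat lists of parameters."""
    drift = sum(
        f * (p - s) ** 2 for p, s, f in zip(params, saved_params, importances)
    )
    return 0.5 * ewc_lambda * drift


# Only the first parameter moved, and it has high importance.
print(ewc_penalty([1.0, 2.0], [0.0, 2.0], [4.0, 1.0], ewc_lambda=0.5))
```

Parameters with high importance are pulled strongly back toward their saved values, while unimportant ones remain free to adapt to the new experience.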


GDumb plugin.

RWalkPlugin([ewc_lambda, ewc_alpha, delta_t])

Riemannian Walk (RWalk) plugin.

GEMPlugin(patterns_per_experience, ...)

Gradient Episodic Memory Plugin.

GSS_greedyPlugin([mem_size, mem_strength, ...])

GSSPlugin replay plugin.


Less-Forgetful Learning (LFL) Plugin.

LwFPlugin([alpha, temperature])

Learning without Forgetting plugin.

ReplayPlugin([mem_size, batch_size, ...])

Experience replay plugin.

SynapticIntelligencePlugin(si_lambda[, eps, ...])

Synaptic Intelligence plugin.

MASPlugin([lambda_reg, alpha, verbose])

Memory Aware Synapses (MAS) plugin.


TrainGeneratorAfterExpPlugin makes sure that after each experience of training the solver of a scholar model, we also train the generator on the data of the current experience.

GenerativeReplayPlugin([generator_strategy, ...])

Experience generative replay plugin.

BiCPlugin([mem_size, batch_size, ...])

Bias Correction (BiC) plugin.

MIRPlugin(batch_size_mem[, mem_size, subsample])

Maximally Interfered Retrieval plugin. Implements the strategy defined in "Online Continual Learning with Maximally Interfered Retrieval".

RARPlugin(batch_size_mem[, mem_size, ...])

Retrospective Adversarial Replay for Continual Learning.

Continual learning is an emerging research challenge in machine learning that addresses the problem where models quickly fit the most recently trained-on data and are prone to catastrophic forgetting due to distribution shifts. Replay-based methods address this by maintaining a small historical replay buffer. To avoid these problems, this paper proposes a method, "Retrospective Adversarial Replay (RAR)", that synthesizes adversarial samples near the forgetting boundary. RAR perturbs a buffered sample towards its nearest neighbor drawn from the current task in a latent representation space. By replaying such samples, we are able to refine the boundary between previous and current tasks, hence combating forgetting and reducing bias towards the current task. To mitigate the severity of a small replay buffer, we develop a novel MixUp-based strategy to increase replay variation by replaying mixed augmentations. Combined with RAR, this achieves a holistic framework that helps to alleviate catastrophic forgetting. We show that this excels on broadly-used benchmarks and outperforms other continual learning baselines especially when only a small buffer is used. We conduct a thorough ablation study over each key component as well as a hyperparameter sensitivity analysis to demonstrate the effectiveness and robustness of RAR.


From Scratch Training Plugin.