Evaluation module

This module provides a number of metrics to monitor continual learning performance.
Metrics subclass the PluginMetric class, which provides all the callbacks needed to include custom metric logic at specific points of the continual learning workflow.
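As a rough sketch of that callback structure, a custom plugin metric can be built on top of a standalone metric. The class name below is hypothetical, and attributes such as strategy.loss and strategy.clock.train_iterations are assumptions that may differ across Avalanche versions:

```python
from avalanche.evaluation.metric_definitions import PluginMetric
from avalanche.evaluation.metric_results import MetricValue
from avalanche.evaluation.metrics import Mean


class RunningLossPlugin(PluginMetric[float]):
    """Hypothetical plugin metric: running training loss, one value per iteration."""

    def __init__(self):
        super().__init__()
        self._mean = Mean()  # standalone metric doing the actual computation

    def reset(self) -> None:
        self._mean.reset()

    def result(self) -> float:
        return self._mean.result()

    def before_training_epoch(self, strategy):
        self.reset()  # start every epoch from a clean state

    def after_training_iteration(self, strategy):
        # strategy.loss is assumed to hold the last minibatch loss (a tensor)
        self._mean.update(strategy.loss.item())
        return [MetricValue(self, "RunningLoss", self.result(),
                            x_plot=strategy.clock.train_iterations)]

    def __str__(self):
        return "RunningLoss"
```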

evaluation.metrics

Metrics helper functions

High-level functions to get specific plugin metric objects (to be passed to the EvaluationPlugin).
This is the recommended way to build metrics. Use these functions when available.
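For example, the helpers can be combined into a single EvaluationPlugin and attached to a strategy. This is a minimal sketch following common Avalanche usage; adapt the loggers and strategy wiring to your setup:

```python
from avalanche.evaluation.metrics import (
    accuracy_metrics,
    forgetting_metrics,
    loss_metrics,
    timing_metrics,
)
from avalanche.logging import InteractiveLogger
from avalanche.training.plugins import EvaluationPlugin

eval_plugin = EvaluationPlugin(
    # each helper returns one plugin metric per requested granularity
    accuracy_metrics(minibatch=True, epoch=True, experience=True, stream=True),
    loss_metrics(epoch=True, experience=True, stream=True),
    forgetting_metrics(experience=True, stream=True),
    timing_metrics(epoch=True),
    loggers=[InteractiveLogger()],
)

# the plugin is then passed to a strategy, e.g.:
# strategy = Naive(model, optimizer, criterion, evaluator=eval_plugin)
```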

accuracy_metrics(*[, minibatch, epoch, ...])

Helper method that can be used to obtain the desired set of plugin metrics.

class_accuracy_metrics(*[, minibatch, ...])

Helper method that can be used to obtain the desired set of plugin metrics.

amca_metrics([streams])

Helper method that can be used to obtain the desired set of plugin metrics.

topk_acc_metrics(*[, top_k, minibatch, ...])

Helper method that can be used to obtain the desired set of plugin metrics.

loss_metrics(*[, minibatch, epoch, ...])

Helper method that can be used to obtain the desired set of plugin metrics.

bwt_metrics(*[, experience, stream])

Helper method that can be used to obtain the desired set of plugin metrics.

forgetting_metrics(*[, experience, stream])

Helper method that can be used to obtain the desired set of plugin metrics.

forward_transfer_metrics(*[, experience, stream])

Helper method that can be used to obtain the desired set of plugin metrics.

confusion_matrix_metrics([num_classes, ...])

Helper method that can be used to obtain the desired set of plugin metrics.

cpu_usage_metrics(*[, minibatch, epoch, ...])

Helper method that can be used to obtain the desired set of plugin metrics.

disk_usage_metrics(*[, paths_to_monitor, ...])

Helper method that can be used to obtain the desired set of plugin metrics.

gpu_usage_metrics(gpu_id[, every, ...])

Helper method that can be used to obtain the desired set of plugin metrics.

ram_usage_metrics(*[, every, minibatch, ...])

Helper method that can be used to obtain the desired set of plugin metrics.

timing_metrics(*[, minibatch, epoch, ...])

Helper method that can be used to obtain the desired set of plugin metrics.

MAC_metrics(*[, minibatch, epoch, experience])

Helper method that can be used to obtain the desired set of plugin metrics.

images_samples_metrics(*[, n_rows, n_cols, ...])

Create the plugins to log some image samples in grids.

labels_repartition_metrics(*[, on_train, ...])

Create plugins to monitor the labels repartition.

mean_scores_metrics(*[, on_train, on_eval, ...])

Helper to create plugins to show the scores of the true class, averaged by new and old classes.

Stream Metrics

Stream metrics work at eval time only. Stream metrics return the average of metric results over all the experiences present in the evaluation stream.
If the evaluation stream is sliced at test time (e.g., strategy.eval(benchmark.test_stream[0:2])), the average is computed only over the experiences actually passed to eval().
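As a brief illustration (assuming a strategy and benchmark set up as in the example above):

```python
# Stream metrics such as StreamAccuracy average over the experiences
# actually passed to eval(): here, only the first two test experiences.
results = strategy.eval(benchmark.test_stream[0:2])
```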

StreamAccuracy()

At the end of the entire stream of experiences, this plugin metric reports the average accuracy over all patterns seen in all experiences.

StreamClassAccuracy([classes])

At the end of the entire stream of experiences, this plugin metric reports the average accuracy over all patterns seen in all experiences (separately for each class).

AMCAPluginMetric([classes, streams, ...])

Plugin metric for the Average Mean Class Accuracy (AMCA).

TrainedExperienceAccuracy()

At the end of each experience, this plugin metric reports the average accuracy for only the experiences that the model has been trained on so far.

StreamLoss()

At the end of the entire stream of experiences, this metric reports the average loss over all patterns seen in all experiences.

StreamBWT()

The StreamBWT metric, emitting the average BWT across all experiences encountered during training.

StreamForgetting()

The StreamForgetting metric, describing the average evaluation accuracy loss detected over all experiences observed during training.

StreamForwardTransfer()

The Forward Transfer averaged over all the evaluation experiences.

StreamConfusionMatrix(num_classes, ...)

The Stream Confusion Matrix metric.

WandBStreamConfusionMatrix([class_names])

Confusion Matrix metric compatible with Weights and Biases logger.

StreamCPUUsage()

The average stream CPU usage metric.

StreamDiskUsage(paths_to_monitor)

The average stream Disk usage metric.

StreamTime()

The stream time metric.

StreamMaxRAM([every])

The Stream Max RAM metric.

StreamMaxGPU(gpu_id[, every])

The Stream Max GPU metric.

StreamTopkAccuracy(top_k)

At the end of the entire stream of experiences, this plugin metric reports the average top-k accuracy over all patterns seen in all experiences.

MeanScoresEvalPluginMetric(image_creator, ...)

Plugin to show the scores of the true class during evaluation, averaged by new and old classes.

Experience Metrics

Experience metrics are updated after each experience. Most of them work at eval time only and return the average metric results over all the patterns in the experience.

ExperienceAccuracy()

At the end of each experience, this plugin metric reports the average accuracy over all patterns seen in that experience.

ExperienceClassAccuracy([classes])

At the end of each experience, this plugin metric reports the average accuracy over all patterns seen in that experience (separately for each class).

ExperienceLoss()

At the end of each experience, this metric reports the average loss over all patterns seen in that experience.

ExperienceBWT()

The Experience Backward Transfer metric.

ExperienceForgetting()

The ExperienceForgetting metric, describing the accuracy loss detected for a certain experience.

ExperienceForwardTransfer()

The Forward Transfer computed on each experience separately.

ExperienceCPUUsage()

The average experience CPU usage metric.

ExperienceDiskUsage(paths_to_monitor)

The average experience Disk usage metric.

ExperienceTime()

The experience time metric.

ExperienceMAC()

At the end of each experience, this metric reports the MAC computed on a single pattern.

ExperienceMaxRAM([every])

The Experience Max RAM metric.

ExperienceMaxGPU(gpu_id[, every])

The Experience Max GPU metric.

ExperienceTopkAccuracy(top_k)

At the end of each experience, this plugin metric reports the average top-k accuracy over all patterns seen in that experience.

ImagesSamplePlugin(*, mode, n_cols, n_rows)

Metric used to sample random images.

Epoch Metrics

Epoch metrics work at train time only. Epoch metrics return the average metric results over all the patterns in the training dataset.

EpochAccuracy()

The average accuracy over a single training epoch.

EpochClassAccuracy([classes])

The average class accuracy over a single training epoch.

EpochLoss()

The average loss over a single training epoch.

EpochCPUUsage()

The Epoch CPU usage metric.

EpochDiskUsage(paths_to_monitor)

The Epoch Disk usage metric.

EpochTime()

The epoch elapsed time metric.

EpochMAC()

The MAC at the end of each epoch computed on a single pattern.

EpochMaxRAM([every])

The Epoch Max RAM metric.

EpochMaxGPU(gpu_id[, every])

The Epoch Max GPU metric.

MeanScoresTrainPluginMetric(image_creator, ...)

Plugin to show the scores of the true class during the last training epochs of each experience, averaged by new and old classes.

EpochTopkAccuracy(top_k)

The average top-k accuracy over a single training epoch.

RunningEpoch Metrics

RunningEpoch metrics work at train time only. RunningEpoch metrics return the average metric results over all the patterns encountered up to the current iteration in the training epoch.

RunningEpochAccuracy()

The average accuracy across all minibatches up to the current epoch iteration.

RunningEpochClassAccuracy([classes])

The average class accuracy across all minibatches up to the current epoch iteration.

RunningEpochTopkAccuracy(top_k)

The average top-k accuracy across all minibatches up to the current epoch iteration.

RunningEpochLoss()

The average loss across all minibatches up to the current epoch iteration.

RunningEpochCPUUsage()

The running epoch CPU usage metric.

RunningEpochTime()

The running epoch time metric.

Minibatch Metrics

Minibatch metrics work at train time only. Minibatch metrics return the average metric results over all the patterns in the current minibatch.

MinibatchAccuracy()

The minibatch plugin accuracy metric.

MinibatchClassAccuracy([classes])

The minibatch plugin class accuracy metric.

MinibatchLoss()

The minibatch loss metric.

MinibatchCPUUsage()

The minibatch CPU usage metric.

MinibatchDiskUsage(paths_to_monitor)

The minibatch Disk usage metric.

MinibatchTime()

The minibatch time metric.

MinibatchMAC()

The minibatch MAC metric.

MinibatchMaxRAM([every])

The Minibatch Max RAM metric.

MinibatchMaxGPU(gpu_id[, every])

The Minibatch Max GPU metric.

MinibatchTopkAccuracy(top_k)

The minibatch plugin top-k accuracy metric.

Other Plugin Metrics

WeightCheckpoint()

The WeightCheckpoint Metric.

Standalone Metrics

Standalone metrics define the metric computation itself. Unlike metric plugins, they cannot be used in Avalanche strategies directly. However, they can be easily used without Avalanche.
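For instance, a standalone metric can be driven directly from tensors. This is a minimal sketch; the exact update() signature and result() return type have changed between Avalanche versions, so treat them as assumptions:

```python
import torch

from avalanche.evaluation.metrics import Accuracy

acc = Accuracy()
# update() accumulates predictions vs. targets; result() returns the running
# value and reset() clears the internal state.
acc.update(torch.tensor([0, 1, 2, 1]), torch.tensor([0, 1, 1, 1]))
print(acc.result())  # 0.75 (3 of 4 predictions correct)
acc.reset()
```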

Accuracy()

The Accuracy metric.

AverageMeanClassAccuracy([classes])

The Average Mean Class Accuracy (AMCA) metric.

BWT()

The standalone Backward Transfer metric.

CPUUsage()

The standalone CPU usage metric.

ClassAccuracy([classes])

The Class Accuracy metric.

ConfusionMatrix([num_classes, normalize])

The standalone confusion matrix metric.

DiskUsage([paths_to_monitor])

The standalone disk usage metric.

ElapsedTime()

The standalone Elapsed Time metric.

Forgetting()

The standalone Forgetting metric.

ForwardTransfer()

The standalone Forward Transfer metric.

LabelsRepartition()

Metric used to monitor the labels repartition.

Loss()

The standalone Loss metric.

MAC()

Standalone Multiply-and-accumulate metric.

MaxGPU(gpu_id[, every])

The standalone GPU usage metric.

MaxRAM([every])

The standalone RAM usage metric.

Mean()

The standalone mean metric.

MeanNewOldScores()

Average the scores of the true class by old and new classes.

MeanScores()

Average the scores of the true class by label.

MultiStreamAMCA([classes, streams])

An extension of the Average Mean Class Accuracy (AMCA) metric (AverageMeanClassAccuracy) able to compute the AMCA separately for each stream.

Sum()

The standalone sum metric.

TopkAccuracy(top_k)

The Top-k Accuracy metric.

TrainedExperienceTopkAccuracy(top_k)

At the end of each experience, this plugin metric reports the average top-k accuracy for only the experiences that the model has been trained on so far.

evaluation.metrics.detection

Metrics for Object Detection tasks. Please take a look at the examples in the examples folder of Avalanche to better understand how to use these metrics.

make_lvis_metrics([save_folder, ...])

Returns an instance of DetectionMetrics initialized for the LVIS dataset.

get_detection_api_from_dataset(dataset[, ...])

Adapted from: https://github.com/pytorch/vision/blob/main/references/detection/engine.py

DetectionMetrics(*, evaluator_factory, ...)

Metric used to compute the detection and segmentation metrics using the dataset-specific API.

evaluation.metric_definitions

General interfaces on which metrics are built.

Metric(*args, **kwargs)

Standalone metric.

PluginMetric()

A metric that can be used together with EvaluationPlugin.

GenericPluginMetric(metric[, reset_at, ...])

This class provides a generic implementation of a Plugin Metric.

evaluation.metric_results

Metric result types

MetricValue(origin, name, value, x_plot[, ...])

The result of a Metric.

LoggingType(value)

A type for MetricValues.