avalanche.benchmarks.generators.benchmark_with_validation_stream

avalanche.benchmarks.generators.benchmark_with_validation_stream(benchmark_instance: GenericCLScenario, validation_size: Union[int, float] = 0.5, shuffle: bool = False, input_stream: str = 'train', output_stream: str = 'valid', custom_split_strategy: Optional[Callable[[ClassificationExperience], Tuple[make_classification_dataset, make_classification_dataset]]] = None, *, experience_factory: Optional[Callable[[ClassificationStream, int], ClassificationExperience]] = None, lazy_splitting: Optional[bool] = None)

Helper that can be used to obtain a benchmark with a validation stream.

This generator accepts an existing benchmark instance and returns a version of it in which a validation stream has been added.

In its base form, this generator splits each train experience to extract a validation experience of a fixed, configurable size, expressed either as a number of instances or as a fraction of the experience. The split can also be performed on other streams, and the name of the resulting validation stream can be configured too.

Each validation experience will be extracted directly from a single training experience. Patterns selected for the validation experience will be removed from the training one.
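The fixed-size split of a single experience can be sketched in plain Python over its pattern indices. This is a minimal sketch, not Avalanche's implementation: the helper name split_indices is hypothetical, an int size is read as a number of instances and a float as a fraction of the experience (rounding down is an assumption), and the non-shuffled case keeps the first instances for training and the last ones for validation, as described for the shuffle parameter.

```python
import random
from typing import List, Optional, Tuple, Union


def split_indices(
    n: int,
    validation_size: Union[int, float],
    shuffle: bool = False,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: split range(n) into disjoint train/validation index lists."""
    if isinstance(validation_size, float):
        if not 0.0 <= validation_size <= 1.0:
            raise ValueError("a float validation_size must be in [0, 1]")
        n_valid = int(validation_size * n)  # assumption: fraction rounded down
    else:
        if not 0 <= validation_size <= n:
            raise ValueError("an int validation_size must be in [0, n]")
        n_valid = validation_size

    indices = list(range(n))
    if shuffle:
        # Random assignment; no class balancing is attempted.
        (rng or random.Random()).shuffle(indices)

    # The two parts are disjoint: validation patterns are removed from training.
    n_train = n - n_valid
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = split_indices(10, 0.2)
print(valid_idx)  # [8, 9]
```

Without shuffling the split is deterministic, which makes it easy to check that the last instances end up in the validation part.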

If shuffle is True, patterns are assigned to the validation stream randomly. Beware that no class balancing is performed.

The custom_split_strategy parameter can be used if a more specific splitting strategy is required.

Please note that the resulting experiences will have the same task labels as the originating experience.
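At a high level, the generator maps each experience of the input stream to a (train, validation) pair, replaces the input stream with the train parts, and collects the validation parts into the output stream, copying the task labels of the originating experience. The sketch below models a benchmark as a plain dict of streams and an experience as a small dataclass; Experience, add_validation_stream and last_k_split are hypothetical names used only for illustration, not Avalanche classes or functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

Split = Tuple[tuple, tuple]


@dataclass(frozen=True)
class Experience:
    """Hypothetical stand-in for an experience: a dataset plus its task labels."""
    dataset: tuple
    task_labels: FrozenSet[int]


def add_validation_stream(
    streams: Dict[str, List[Experience]],
    split_strategy: Callable[[Experience], Split],
    input_stream: str = "train",
    output_stream: str = "valid",
) -> Dict[str, List[Experience]]:
    """Return a new dict of streams with output_stream added; the input dict is not modified."""
    new_train: List[Experience] = []
    new_valid: List[Experience] = []
    for exp in streams[input_stream]:
        train_data, valid_data = split_strategy(exp)
        # Both parts keep the task labels of the originating experience.
        new_train.append(Experience(train_data, exp.task_labels))
        new_valid.append(Experience(valid_data, exp.task_labels))
    result = dict(streams)  # other streams (e.g. "test") are carried over unchanged
    result[input_stream] = new_train  # validation patterns are no longer in the train part
    result[output_stream] = new_valid
    return result


def last_k_split(exp: Experience, k: int = 1) -> Split:
    """Toy strategy: the last k samples of the experience go to validation."""
    cut = len(exp.dataset) - k
    return exp.dataset[:cut], exp.dataset[cut:]


bm = {
    "train": [Experience((1, 2, 3, 4), frozenset({0})), Experience((5, 6), frozenset({1}))],
    "test": [Experience((7, 8), frozenset({0}))],
}
out = add_validation_stream(bm, last_k_split)
print([e.dataset for e in out["valid"]])  # [(4,), (6,)]
```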

Experience splitting can be executed lazily. This behavior is controlled by the lazy_splitting parameter. By default, experiences are split lazily only when the input stream is itself lazily generated.
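The difference between lazy and eager splitting can be illustrated with Python generators: an eager split processes every experience up front, while a lazy split only processes an experience when the stream is iterated. This is a conceptual sketch under that assumption, not Avalanche's lazy-stream machinery; half_split, split_eagerly, split_lazily, resolve_laziness and counting_split are hypothetical names. resolve_laziness mirrors the documented default of lazy_splitting=None (follow the laziness of the input stream).

```python
from typing import Callable, Iterable, Iterator, List, Optional, Sequence, Tuple

Split = Tuple[Sequence[int], Sequence[int]]
Strategy = Callable[[Sequence[int]], Split]


def half_split(dataset: Sequence[int]) -> Split:
    """Toy strategy: the second half of each dataset goes to validation."""
    cut = len(dataset) - len(dataset) // 2
    return dataset[:cut], dataset[cut:]


def split_eagerly(stream: Iterable[Sequence[int]], strategy: Strategy) -> List[Split]:
    # Every experience is split immediately, when the stream is created.
    return [strategy(d) for d in stream]


def split_lazily(stream: Iterable[Sequence[int]], strategy: Strategy) -> Iterator[Split]:
    # An experience is split only when the caller asks for it.
    for d in stream:
        yield strategy(d)


def resolve_laziness(lazy_splitting: Optional[bool], input_is_lazy: bool) -> bool:
    # None means: follow the laziness of the input stream.
    return input_is_lazy if lazy_splitting is None else lazy_splitting


calls: List[int] = []


def counting_split(dataset: Sequence[int]) -> Split:
    """Wraps half_split and records each call, to observe when splitting happens."""
    calls.append(len(dataset))
    return half_split(dataset)


lazy = split_lazily([[1, 2, 3, 4], [5, 6]], counting_split)
print(len(calls))  # 0: nothing has been split yet
first = next(lazy)
print(first, len(calls))  # ([1, 2], [3, 4]) 1

calls.clear()
eager = split_eagerly([[1, 2, 3, 4], [5, 6]], counting_split)
print(len(calls))  # 2: both experiences were split up front
```

Lazy splitting defers the cost of splitting to the moment an experience is actually used, which matters when the input stream is itself generated on demand.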

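The default splitting strategy, random_validation_split_strategy(), splits each experience according to validation_size and shuffle (the split is random only if shuffle is True). A class-balanced split is also available through class_balanced_split_strategy:

from avalanche.benchmarks.generators import benchmark_with_validation_stream
# Import path assumed from Avalanche's source layout; it may differ across versions.
from avalanche.benchmarks.generators.benchmark_generators import class_balanced_split_strategy

# bm: an existing benchmark instance.
validation_size = 0.2
split_strategy = lambda exp: class_balanced_split_strategy(validation_size, exp)
bm = benchmark_with_validation_stream(bm, custom_split_strategy=split_strategy)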
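The idea behind a class-balanced split can be sketched without Avalanche: group the indices of an experience by label and take the same fraction from each group, so that the validation part roughly preserves the class proportions of the experience. This is a simplified illustration under that assumption, not the body of class_balanced_split_strategy; the name class_balanced_indices is hypothetical, and it only handles a float fraction.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def class_balanced_indices(
    labels: Sequence[int],
    validation_fraction: float,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: per-class split of sample indices into train/validation."""
    if not 0.0 <= validation_fraction <= 1.0:
        raise ValueError("validation_fraction must be in [0, 1]")
    rng = rng or random.Random()

    # Group sample indices by class label.
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train: List[int] = []
    valid: List[int] = []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        # Same fraction from every class (rounded down per class).
        n_valid = int(validation_fraction * len(idxs))
        valid.extend(idxs[:n_valid])
        train.extend(idxs[n_valid:])
    return sorted(train), sorted(valid)


labels = [0] * 10 + [1] * 5
train_idx, valid_idx = class_balanced_indices(labels, 0.2, random.Random(0))
print(sorted(labels[i] for i in valid_idx))  # [0, 0, 1]
```

With 10 samples of class 0 and 5 of class 1, a 20% split takes two and one samples respectively, keeping the 2:1 ratio in the validation part.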

Parameters

benchmark_instance – The benchmark to split.
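validation_size – The size of each validation experience: an int is interpreted as an absolute number of instances, a float (between 0 and 1) as a fraction of the originating experience. Defaults to 0.5. Ignored if custom_split_strategy is used.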
shuffle – If True, patterns are allocated to the validation stream randomly, using the default PyTorch random number generator in its current state. If False, the first instances are allocated to the training dataset and the last ones to the validation dataset. Defaults to False. Ignored if custom_split_strategy is used.
input_stream – The name of the stream to split. Defaults to ‘train’.
output_stream – The name of the resulting validation stream. Defaults to ‘valid’.
custom_split_strategy – A function that implements a custom splitting strategy. It must accept an experience and return a tuple containing the new train and validation datasets. Defaults to None, which means that the standard splitting strategy is used (creating experiences according to validation_size and shuffle). A good starting point for understanding the mechanism is the implementation of the standard splitting function random_validation_split_strategy().
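experience_factory – The experience factory: a callable that receives a stream and an experience index and returns the corresponding experience. Defaults to GenericExperience.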
lazy_splitting – If True, the stream will be split in a lazy way. If False, the stream will be split immediately. Defaults to None, which means that the stream will be split in a lazy or non-lazy way depending on the laziness of the input_stream.
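A custom strategy only has to honor the contract described for custom_split_strategy: take one experience and return a (train, validation) pair of datasets. The sketch below models an experience with a hypothetical Experience dataclass whose dataset is a list of (input, label) pairs; every_kth_split is a made-up strategy that sends every k-th pattern to validation. It only illustrates the contract and the use of functools.partial to bind extra arguments, the same role the lambda plays in the class-balanced example; real Avalanche strategies return Avalanche datasets, not lists.

```python
from dataclasses import dataclass
from functools import partial
from typing import Any, List, Tuple

Pattern = Tuple[Any, int]  # (input, label)


@dataclass
class Experience:
    """Hypothetical stand-in for an experience."""
    dataset: List[Pattern]


def every_kth_split(k: int, exp: Experience) -> Tuple[List[Pattern], List[Pattern]]:
    """Made-up strategy: every k-th pattern goes to validation, the rest to training."""
    if k < 1:
        raise ValueError("k must be >= 1")
    train = [p for i, p in enumerate(exp.dataset) if (i + 1) % k != 0]
    valid = [p for i, p in enumerate(exp.dataset) if (i + 1) % k == 0]
    return train, valid


# Bind the extra argument so the result has the required one-argument signature.
strategy = partial(every_kth_split, 3)

exp = Experience(dataset=[(f"x{i}", i % 2) for i in range(6)])
train, valid = strategy(exp)
print([x for x, _ in valid])  # ['x2', 'x5']
```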

Returns

A benchmark instance in which the validation stream has been added.