avalanche.benchmarks.classic.SplitInaturalist

avalanche.benchmarks.classic.SplitInaturalist(*, super_categories=None, return_task_id=False, download=False, seed=0, train_transform: Optional[Any] = Compose(Resize(size=256, interpolation=bilinear, max_size=None, antialias=None), CenterCrop(size=(224, 224)), ToTensor(), Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])), eval_transform: Optional[Any] = Compose(Resize(size=256, interpolation=bilinear, max_size=None, antialias=None), CenterCrop(size=(224, 224)), ToTensor(), Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])), dataset_root: Optional[Union[str, Path]] = None)

Creates a CL benchmark using the iNaturalist2018 dataset.

A selection of supercategories (10 by default) defines the experiences. Note that the supercategories are highly imbalanced, both in the number of classes and in the amount of data available.
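As a rough illustration of why the experiences end up imbalanced, the sketch below groups class ids by supercategory and counts them. The `toy_categories` records and their `id`/`supercategory` keys are made-up data modelled on COCO-style annotations, and the helper `classes_per_supercategory` is hypothetical; neither reflects the real iNaturalist2018 contents or Avalanche's internal code.

```python
from collections import defaultdict


def classes_per_supercategory(categories, super_categories):
    """Map each selected supercategory to the sorted class ids it contains."""
    grouped = defaultdict(list)
    for cat in categories:
        if cat["supercategory"] in super_categories:
            grouped[cat["supercategory"]].append(cat["id"])
    # Keep the order in which the supercategories were requested.
    return {name: sorted(grouped[name]) for name in super_categories}


# Hypothetical toy annotations (not real iNaturalist2018 data).
toy_categories = [
    {"id": 0, "supercategory": "Aves"},
    {"id": 1, "supercategory": "Aves"},
    {"id": 2, "supercategory": "Aves"},
    {"id": 3, "supercategory": "Fungi"},
    {"id": 4, "supercategory": "Plantae"},
]

split = classes_per_supercategory(toy_categories, ["Aves", "Fungi"])
classes_per_experience = [len(ids) for ids in split.values()]
print(split)                   # one entry per experience
print(classes_per_experience)  # class counts differ between experiences
```

In this toy case the first experience would hold three classes and the second only one, which mirrors the kind of imbalance described above.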

If the dataset is not present on the computer and download=True, this method will automatically download and store it (120 GB for the train/val sets).

To parse the dataset JSON files, you need to install an additional dependency, “pycocotools”. You can install it with the command: conda install -c conda-forge pycocotools
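Before starting a long download, it may be worth checking that the dependency is importable. The following is a minimal sketch using only the standard library; the helper name `has_pycocotools` is hypothetical and not part of Avalanche.

```python
import importlib.util


def has_pycocotools():
    """Return True if the 'pycocotools' package can be found for import."""
    return importlib.util.find_spec("pycocotools") is not None


if not has_pycocotools():
    print("pycocotools is missing; install it before creating the benchmark.")
```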

Implementation is based on the CL survey (https://ieeexplore.ieee.org/document/9349197) but differs slightly. The survey uses only the original iNaturalist2018 training dataset split into 70/10/20 for train/val/test streams. This method instead uses the full iNaturalist2018 training set to make the train_stream, whereas the test_stream is defined by the original iNaturalist2018 validation data.
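The survey-style split can be pictured with a small sketch. It assumes a plain random shuffle of sample indices, which may not match how the survey actually partitions the data, and the helper name `split_70_10_20` is hypothetical.

```python
import random


def split_70_10_20(n_samples, seed=0):
    """Shuffle sample indices and cut them into 70% / 10% / 20% parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = n_samples * 70 // 100
    n_val = n_samples * 10 // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Survey-style: all three parts are carved out of the original training set.
train, val, test = split_70_10_20(1000)
print(len(train), len(val), len(test))  # 700 100 200
```

This generator needs no such re-splitting: the whole original training set feeds train_stream, and the original validation set feeds test_stream.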

The returned benchmark yields experiences containing all patterns of a subset of classes, which means that each class is seen only “once”. This is one of the most common scenarios in the Continual Learning literature. Common names used in the literature to describe this kind of scenario are “Class Incremental”, “New Classes”, etc. By default, an equal number of classes will be assigned to each experience.

This generator doesn’t force a choice on the availability of task labels, a choice that is left to the user (see the return_task_id parameter for more info on task labels).

The benchmark instance returned by this method will have two fields, train_stream and test_stream, which can be iterated to obtain the training and test Experience objects. Each Experience contains the dataset and the associated task label.
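Iterating the two streams might look like the sketch below. It assumes that avalanche, pycocotools and the dataset are available, and that each experience exposes `current_experience`, `task_label` and `dataset`; the helper names `load_split_inaturalist` and `summarize_stream` and the path in the usage comment are placeholders for illustration.

```python
def load_split_inaturalist(dataset_root=None, download=False):
    """Build the benchmark; needs avalanche, pycocotools and the dataset."""
    # Imported lazily so this sketch can be loaded without avalanche installed.
    from avalanche.benchmarks.classic import SplitInaturalist

    return SplitInaturalist(
        return_task_id=True,
        download=download,
        dataset_root=dataset_root,
    )


def summarize_stream(stream):
    """Collect (experience index, task label, sample count) per experience."""
    summary = []
    for experience in stream:
        summary.append(
            (
                experience.current_experience,
                experience.task_label,
                len(experience.dataset),
            )
        )
    return summary


# Typical use, once the dataset is available locally:
#   benchmark = load_split_inaturalist(dataset_root="/path/to/data")
#   print(summarize_stream(benchmark.train_stream))
#   print(summarize_stream(benchmark.test_stream))
```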

The benchmark API is quite simple and is uniform across all benchmark generators. It is recommended to check the tutorial of the “benchmark” API, which contains usage examples ranging from “basic” to “advanced”.

Parameters

super_categories – The list of supercategories which define the tasks, i.e. each task consists of all classes in a super-category.
download – If True and the dataset is not present on the computer, this method will automatically download and store it. This requires 120 GB for the train/val set.
return_task_id – If True, a progressive task id is returned for every experience. If False, all experiences will have a task ID of 0.
seed – A valid int used to initialize the random number generator. Can be None.
train_transform – The transformation to apply to the training data, e.g. a random crop, a normalization or a concatenation of different transformations (see the torchvision.transforms documentation for a comprehensive list of possible transformations). If no transformation is passed, the default train transformation will be used.
eval_transform – The transformation to apply to the test data, e.g. a random crop, a normalization or a concatenation of different transformations (see the torchvision.transforms documentation for a comprehensive list of possible transformations). If no transformation is passed, the default test transformation will be used.
dataset_root – The root path of the dataset. Defaults to None, which means that the default location for ‘inatuarlist2018’ will be used.
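
The effect of return_task_id on the task labels can be sketched in plain Python. This is only an illustration of the behaviour described above, not Avalanche's implementation; the helper name `task_labels` is hypothetical, and it assumes that progressive ids start at 0.

```python
def task_labels(n_experiences, return_task_id):
    """Task label attached to each experience, in stream order."""
    if return_task_id:
        # Progressive ids: experience i gets task label i (assumed 0-based).
        return list(range(n_experiences))
    # Single shared label: every experience gets task label 0.
    return [0] * n_experiences


print(task_labels(10, return_task_id=True))   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(task_labels(10, return_task_id=False))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```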

Returns

A properly initialized NCScenario instance.