avalanche.checkpointing.maybe_load_checkpoint

avalanche.checkpointing.maybe_load_checkpoint(strategy: T, fname: str | PathLike | BinaryIO | IO[bytes], map_location: Callable[[Tensor, str], Tensor] | device | str | Dict[str, str] | None = None, unique_objects: Any | None = None) → Tuple[T, int]

Load the strategy state from a checkpoint file.

The method returns the strategy with the state deserialized from the file, together with the index of the training experience from which to resume training.

If the file does not exist, the method returns the strategy unmodified and the index 0. As a result, the method can be safely called even if no checkpoint has been created yet (e.g. during the first run).
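The fallback behaviour and the returned index can be illustrated with a minimal, framework-free sketch. It is not Avalanche's implementation: `ToyStrategy`, `toy_save_checkpoint` and `toy_maybe_load_checkpoint` are hypothetical names, and plain `pickle` stands in for Avalanche's own serialization. The first call finds no file and returns index 0; after a simulated preemption, the second call restores the state into a freshly initialized strategy and returns the index of the next experience to train.

```python
import os
import pickle
import tempfile
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyStrategy:
    """Hypothetical stand-in for a strategy: it only records trained experiences."""
    seen: List[int] = field(default_factory=list)

    def train(self, exp: int) -> None:
        self.seen.append(exp)


def toy_save_checkpoint(strategy: ToyStrategy, fname: str) -> None:
    # Persist plain state only: experiences seen so far and the next index.
    with open(fname, "wb") as f:
        pickle.dump({"seen": strategy.seen, "exp_counter": len(strategy.seen)}, f)


def toy_maybe_load_checkpoint(strategy: ToyStrategy, fname: str) -> Tuple[ToyStrategy, int]:
    # No checkpoint yet (e.g. first run): return the strategy untouched and index 0.
    if not os.path.exists(fname):
        return strategy, 0
    with open(fname, "rb") as f:
        state = pickle.load(f)
    # Restore the saved state into the already-initialized strategy.
    strategy.seen = state["seen"]
    return strategy, state["exp_counter"]


train_stream = [0, 1, 2, 3]
fname = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")

# First run: no file exists; simulate a preemption after two experiences.
strategy, first_start = toy_maybe_load_checkpoint(ToyStrategy(), fname)
for exp in train_stream[first_start:2]:
    strategy.train(exp)
    toy_save_checkpoint(strategy, fname)

# Second run: the checkpoint is found, so training resumes from experience 2.
strategy, resumed_start = toy_maybe_load_checkpoint(ToyStrategy(), fname)
for exp in train_stream[resumed_start:]:
    strategy.train(exp)
    toy_save_checkpoint(strategy, fname)
```

Because a checkpoint is written after every experience, re-running the same script with the same file name skips the experiences that were already trained, which is the same pattern the Avalanche example follows.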

Example:

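```
from avalanche.checkpointing import maybe_load_checkpoint, save_checkpoint
from avalanche.training.supervised import Naive

# model, opt, benchmark and fname are assumed to be defined beforehand
strategy = Naive(model, opt, train_mb_size=128)
strategy, initial_exp = maybe_load_checkpoint(strategy, fname)

# Resume from the first experience that has not been trained yet.
for exp in benchmark.train_stream[initial_exp:]:
    strategy.train(exp)
    save_checkpoint(strategy, fname)  # checkpoint after each experience
```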

This method also supports de-duplicating unique objects, such as datasets. This avoids duplicating memory usage when loading a checkpoint that references a large number of datasets, or when an experiment that was already checkpointed is re-loaded multiple times.

Consider passing the benchmark object as unique_objects to avoid duplicating the memory associated with dataset(s).

In practice, de-duplication works by scanning unique_objects for instances of classes whose serialization was registered with constructor_based_serialization() and comparing their constructor arguments with those stored in the checkpoint. If an equivalent object is found in unique_objects, it is re-used instead of being re-loaded from the checkpoint. This prevents the checkpoint size from exploding when checkpointing and re-loading frequently (for example, when running on a SLURM cluster that frequently preempts your job).
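The principle behind this de-duplication can be sketched with the standard library's `pickle` persistent-ID hooks (`Pickler.persistent_id` and `Unpickler.persistent_load`). This is only an illustration under stated assumptions, not how Avalanche implements constructor_based_serialization(): `ToyDataset`, `ctor_key`, `DedupPickler`, `DedupUnpickler`, `save` and `load` are hypothetical, and a `ToyDataset` is assumed to be fully determined by its constructor arguments. Such objects are stored only as a key made of their class name and constructor arguments; on load, an object passed in `unique_objects` with the same key is reused, otherwise it is rebuilt from its arguments.

```python
import io
import pickle
from typing import Any, Dict, Iterable, Tuple


class ToyDataset:
    """Hypothetical dataset whose content is fully determined by its constructor args."""

    def __init__(self, name: str, size: int):
        self.ctor_args: Tuple[str, int] = (name, size)
        self.data = list(range(size))  # stands in for a large in-memory payload


def ctor_key(obj: ToyDataset) -> Tuple[str, Tuple[str, int]]:
    # Same class + same constructor arguments => considered the same object.
    return (type(obj).__qualname__, obj.ctor_args)


class DedupPickler(pickle.Pickler):
    def persistent_id(self, obj: Any):
        # Registered objects are stored by key only; their payload is never written.
        if isinstance(obj, ToyDataset):
            return ctor_key(obj)
        return None  # everything else is pickled normally


class DedupUnpickler(pickle.Unpickler):
    def __init__(self, file, unique_objects: Iterable[ToyDataset] = ()):
        super().__init__(file)
        self.known: Dict[Any, ToyDataset] = {ctor_key(o): o for o in unique_objects}

    def persistent_load(self, pid: Any) -> ToyDataset:
        if pid in self.known:
            return self.known[pid]  # re-use the live object
        _, args = pid
        obj = ToyDataset(*args)  # not provided: rebuild from constructor arguments
        self.known[pid] = obj
        return obj


def save(state: Any) -> bytes:
    buf = io.BytesIO()
    DedupPickler(buf).dump(state)
    return buf.getvalue()


def load(payload: bytes, unique_objects: Iterable[ToyDataset] = ()) -> Any:
    return DedupUnpickler(io.BytesIO(payload), unique_objects).load()


train_set = ToyDataset("train", 1000)
payload = save({"datasets": [train_set], "exp_counter": 3})
# Passing the live dataset as a unique object makes the loader reuse it.
restored = load(payload, unique_objects=[train_set])
```

Since only the key is serialized, the saved payload stays small no matter how large the dataset's in-memory data is, and repeated save/load cycles do not accumulate copies of it.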

Parameters:
  • strategy – the strategy into which the state is loaded. It must already be initialized.

  • fname – path of the checkpoint file, or a binary file-like object to read it from.

  • map_location – sets the location of the tensors after deserialization. Accepts the same kinds of values as map_location of torch.load (a callable, a dict, a torch.device object, or a string); when a device or a string is given, a corresponding map is created. The recommended way to use this parameter is to pass the reference device in use. In addition, all torch.device objects are un-pickled using that map (torch.load does not usually do this, but it is needed to properly manage devices in Avalanche). Defaults to None, which means that no mapping takes place.

  • unique_objects – a single unique object, or a list of them, that does not need to be unpickled. This is useful to avoid duplicating the memory associated with a dataset, or when an experiment that was already checkpointed is re-loaded multiple times. Classes of objects that need de-duplication must be registered as such using helpers such as constructor_based_serialization(). Defaults to None. Recommended: pass at least the benchmark object.

Returns:

A tuple (strategy, exp_counter): the strategy after deserialization and the index of the experience from which to resume training.