Data and Configurations
1. Dataloaders
When using Qadence, you can supply classical data to a quantum machine learning
algorithm by using a standard PyTorch DataLoader instance. Qadence also provides
the DictDataLoader convenience class, which allows you to build dictionaries of
DataLoader instances and easily iterate over them.
```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from qadence.ml_tools import DictDataLoader, to_dataloader


def dataloader(data_size: int = 25, batch_size: int = 5, infinite: bool = False) -> DataLoader:
    x = torch.linspace(0, 1, data_size).reshape(-1, 1)
    y = torch.sin(x)
    return to_dataloader(x, y, batch_size=batch_size, infinite=infinite)


def dictdataloader(data_size: int = 25, batch_size: int = 5) -> DictDataLoader:
    dls = {}
    for k in ["y1", "y2"]:
        x = torch.rand(data_size, 1)
        y = torch.sin(x)
        dls[k] = to_dataloader(x, y, batch_size=batch_size, infinite=True)
    return DictDataLoader(dls)


# iterate over standard DataLoader
for (x, y) in dataloader(data_size=6, batch_size=2):
    print(f"Standard {x = }")

# construct an infinite dataset which will keep sampling indefinitely
n_epochs = 5
dl = iter(dataloader(data_size=6, batch_size=2, infinite=True))
for _ in range(n_epochs):
    (x, y) = next(dl)
    print(f"Infinite {x = }")

# iterate over DictDataLoader
ddl = dictdataloader()
data = next(iter(ddl))
print(f"{data = }")
```

```
Standard x = tensor([[0.0000], [0.2000]])
Standard x = tensor([[0.4000], [0.6000]])
Standard x = tensor([[0.8000], [1.0000]])
Infinite x = tensor([[0.6000], [0.2000]])
Infinite x = tensor([[0.], [1.]])
Infinite x = tensor([[0.8000], [0.4000]])
Infinite x = tensor([[0.6000], [0.2000]])
Infinite x = tensor([[0.], [1.]])
data = {'y1': [tensor([[0.4171], [0.1440], [0.8342], [0.4124], [0.4072]]), tensor([[0.4051], [0.1435], [0.7408], [0.4008], [0.3961]])], 'y2': [tensor([[0.4978], [0.8297], [0.5417], [0.6601], [0.6967]]), tensor([[0.4775], [0.7378], [0.5156], [0.6132], [0.6417]])]}
```

Note:
With infinite=True, the dataloader iterator keeps providing random samples from the dataset indefinitely.
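As an illustration of how these dataloaders are typically consumed, here is a minimal sketch of a hand-written training loop. The simple linear model and optimizer below are placeholders chosen for this example, not part of Qadence:

```python
import torch
from qadence.ml_tools import to_dataloader

# placeholder model and optimizer for illustration only
model = torch.nn.Linear(1, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

x = torch.linspace(0, 1, 25).reshape(-1, 1)
y = torch.sin(x)

# an infinite dataloader yields one batch per call to next()
dl = iter(to_dataloader(x, y, batch_size=5, infinite=True))

for step in range(100):
    x_batch, y_batch = next(dl)
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x_batch), y_batch)
    loss.backward()
    optimizer.step()
```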
2. Training Configuration
The TrainConfig class provides a comprehensive configuration setup for training quantum machine learning models in Qadence. This configuration includes settings for batch size, logging, checkpointing, validation, and additional custom callbacks that control the training process's granularity and flexibility.
The TrainConfig tells Trainer what batch_size should be used, how many epochs to train,
at which intervals to print/log metrics, and how often to store intermediate checkpoints.
It is also possible to provide custom callback functions by instantiating a Callback
with a callback function.
For an example of how to use TrainConfig with Trainer, please see the Examples in Trainer.
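As a rough sketch of how a TrainConfig is typically passed to a Trainer (the exact Trainer constructor arguments and the fit call shown here are assumptions based on common usage; refer to the Trainer examples for the authoritative API):

```python
import torch
from qadence.ml_tools import TrainConfig, Trainer, to_dataloader

# placeholder model and optimizer for illustration only
model = torch.nn.Linear(1, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

x = torch.rand(100, 1)
y = torch.sin(x)
train_loader = to_dataloader(x, y, batch_size=10, infinite=True)

config = TrainConfig(max_iter=100, print_every=10, checkpoint_every=50)

# assumed usage: construct a Trainer with the model, optimizer and config,
# then fit it on the training dataloader
trainer = Trainer(model=model, optimizer=optimizer, config=config, loss_fn="mse")
trainer.fit(train_loader)
```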
2.1 Explanation of TrainConfig Attributes
| Attribute | Type | Default | Description |
|---|---|---|---|
| max_iter | int | 10000 | Total number of training epochs. |
| batch_size | int | 1 | Batch size for training. |
| print_every | int | 0 | Frequency of console output. Set to 0 to disable. |
| write_every | int | 0 | Frequency of logging metrics. Set to 0 to disable. |
| plot_every | int | 0 | Frequency of plotting metrics. Set to 0 to disable. |
| checkpoint_every | int | 0 | Frequency of saving checkpoints. Set to 0 to disable. |
| val_every | int | 0 | Frequency of validation checks. Set to 0 to disable. |
| val_epsilon | float | 1e-5 | Threshold for validation improvement. |
| validation_criterion | Callable | None | Function for validating metric improvement. |
| trainstop_criterion | Callable | None | Function to stop training early. |
| callbacks | list[Callback] | [] | List of custom callbacks. |
| root_folder | Path | "./qml_logs" | Root directory for saving logs and checkpoints. |
| log_folder | Path | "./qml_logs" | Logging directory for saving logs and checkpoints. |
| log_model | bool | False | Enables model logging. |
| verbose | bool | True | Enables detailed logging. |
| tracking_tool | ExperimentTrackingTool | TENSORBOARD | Tool for tracking training metrics. |
| plotting_functions | tuple | () | Functions for plotting metrics. |
| hyperparams | dict | {} | Dictionary of hyperparameters. |
| nprocs | int | 1 | Number of processes to use when spawning subprocesses; for multi-GPU setups, set this to the total number of GPUs. |
| compute_setup | str | "cpu" | Specifies the compute device: "auto", "gpu", or "cpu". |
| backend | str | "gloo" | Backend for distributed training communication (e.g., "gloo", "nccl", or "mpi"). |
| log_setup | str | "cpu" | Device setup for logging; use "cpu" to avoid GPU conflicts. |
| dtype | dtype or None | None | Data type for computations (e.g., torch.float32). |
| all_reduce_metrics | bool | False | If True, aggregates metrics (e.g., loss) across processes. |
```python
from qadence.ml_tools import OptimizeResult, TrainConfig
from qadence.ml_tools.callbacks import Callback

batch_size = 5
n_epochs = 100

print_parameters = lambda opt_res: print(opt_res.model.parameters())
condition_print = lambda opt_res: opt_res.loss < 1.0e-03
modify_extra_opt_res = {"n_epochs": n_epochs}
custom_callback = Callback(
    on="train_end",
    callback=print_parameters,
    callback_condition=condition_print,
    modify_optimize_result=modify_extra_opt_res,
    called_every=10,
)

config = TrainConfig(
    root_folder="some_path/",
    max_iter=n_epochs,
    checkpoint_every=100,
    write_every=100,
    batch_size=batch_size,
    callbacks=[custom_callback],
)
```

2.2 Key Configuration Options in TrainConfig
Iterations and Batch Size
- max_iter (int): Specifies the total number of training iterations (epochs). For an InfiniteTensorDataset, each epoch contains one batch; for a TensorDataset, it contains len(dataloader) batches.
- batch_size (int): Defines the number of samples processed in each training iteration.
Example:
```python
config = TrainConfig(max_iter=2000, batch_size=32)
```
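Since an infinite dataset yields one batch per epoch, max_iter effectively counts optimization steps in that case. A small sketch, reusing to_dataloader from the dataloader section above (the x and y tensors are illustrative):

```python
import torch
from qadence.ml_tools import TrainConfig, to_dataloader

x = torch.linspace(0, 1, 100).reshape(-1, 1)
y = torch.sin(x)

# infinite=True wraps the data in an infinite dataset:
# each training epoch then consumes exactly one batch
train_loader = to_dataloader(x, y, batch_size=32, infinite=True)

# with the infinite dataloader above, this runs through 2000 batches in total
config = TrainConfig(max_iter=2000, batch_size=32)
```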
Training Parameters

- print_every (int): Controls how often loss and metrics are printed to the console.
- write_every (int): Determines how frequently metrics are written to the tracking tool, such as TensorBoard or MLflow.
- checkpoint_every (int): Sets the frequency for saving model checkpoints.
Note: Set to 0 to disable.
Example:
```python
config = TrainConfig(print_every=100, write_every=50, checkpoint_every=50)
```

The user can provide either the root_folder or the log_folder for saving checkpoints and logging. When neither is provided, the default root_folder "./qml_logs" is used.
- root_folder (Path): The root directory for saving checkpoints and logs. All training logs will be saved inside a subfolder in this root directory. (The path to these subfolders can be accessed using config._subfolders, and the current logging folder is config.log_folder.)
- create_subfolder_per_run (bool): Creates a unique subfolder for each training run within the specified folder.
- tracking_tool (ExperimentTrackingTool): Specifies the tracking tool to log metrics, e.g., TensorBoard or MLflow.
- log_model (bool): Enables logging of a serialized version of the model, which is useful for model versioning. This happens at the end of training.
Note
- The user can also provide the log_folder argument, which will only be used when create_subfolder_per_run = False.
- log_folder (Path): The log folder used for saving checkpoints and logs.
Example:
```python
config = TrainConfig(root_folder="path/to/checkpoints", tracking_tool=ExperimentTrackingTool.MLFLOW, checkpoint_best_only=True)
```
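For completeness, a slightly fuller sketch of the logging-related options discussed above (the folder path and flag values are illustrative, and it is assumed here that ExperimentTrackingTool can be imported from qadence.types):

```python
from qadence.ml_tools import TrainConfig
from qadence.types import ExperimentTrackingTool

config = TrainConfig(
    root_folder="./qml_logs",            # all runs are stored under this root directory
    create_subfolder_per_run=True,       # each run gets its own subfolder
    tracking_tool=ExperimentTrackingTool.TENSORBOARD,
    log_model=True,                      # serialize and log the model at the end of training
)
```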
Validation Parameters

- checkpoint_best_only (bool): If set to True, saves checkpoints only when there is an improvement in the validation metric.
- val_every (int): Frequency of validation checks. Setting this to 0 disables validation.
- val_epsilon (float): A small threshold used to compare the current validation loss with previous best losses.
- validation_criterion (Callable): A custom function to assess if the validation metric meets a specified condition.
Example:
```python
config = TrainConfig(val_every=200, checkpoint_best_only=True, validation_criterion=lambda current, best: current < best - 0.001)
```

If it is desired to save only the "best" checkpoint, the following must be ensured:
(a) `checkpoint_best_only = True` is used while creating the configuration through `TrainConfig`,
(b) `val_every` is set to a valid integer value (for example, `val_every = 10`), which controls the number of iterations after which the validation data should be used to evaluate the model during training, and which can also be set through `TrainConfig`,
(c) a validation criterion is provided through the `validation_criterion`, set through `TrainConfig`, to quantify the definition of "best", and
(d) the validation dataloader passed to `Trainer` is of type `DataLoader`. In this case, it is expected that a validation dataloader is also provided along with the train dataloader since the validation data will be used to decide the "best" checkpoint.

The criterion used to decide the "best" checkpoint can be customized by validation_criterion, which should be a function that takes val_loss, best_val_loss, and val_epsilon arguments and returns a boolean value (True or False) indicating whether some validation metric is satisfied or not. An example of a simple validation_criterion is:
```python
def validation_criterion(val_loss: float, best_val_loss: float, val_epsilon: float) -> bool:
    return val_loss < (best_val_loss - val_epsilon)
```
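Putting requirements (a)-(d) together, a sketch of a configuration that keeps only the best checkpoint might look as follows (the Trainer call is an assumption based on the requirement that a validation dataloader of type DataLoader is passed to it; see the Trainer examples for the exact API):

```python
from qadence.ml_tools import TrainConfig


def validation_criterion(val_loss: float, best_val_loss: float, val_epsilon: float) -> bool:
    return val_loss < (best_val_loss - val_epsilon)


config = TrainConfig(
    max_iter=1000,
    val_every=10,                    # run validation every 10 iterations
    checkpoint_best_only=True,       # only keep the checkpoint with the best validation loss
    validation_criterion=validation_criterion,
)

# assumed usage: the validation dataloader must be a (finite) DataLoader
# trainer.fit(train_dataloader, val_dataloader)
```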
Custom Callbacks

TrainConfig supports custom callbacks that can be triggered at specific stages of training. The callbacks attribute accepts a list of callback instances, which allow for custom behaviors like early stopping or additional logging.
See Callbacks for more details.
- callbacks (list[Callback]): List of custom callbacks to execute during training.
Example:
```python
from qadence.ml_tools.callbacks import Callback


def callback_fn(trainer, config, writer):
    if trainer.opt_res.loss < 0.001:
        print("Custom Callback: Loss threshold reached!")


custom_callback = Callback(on="train_epoch_end", called_every=10, callback_function=callback_fn)

config = TrainConfig(callbacks=[custom_callback])
```

Hyperparameters and Plotting
- hyperparams (dict): A dictionary of hyperparameters (e.g., learning rate, regularization) to be tracked by the tracking tool.
- plot_every (int): Determines how frequently plots are saved to the tracking tool, such as TensorBoard or MLflow.
- plotting_functions (tuple[LoggablePlotFunction, ...]): Functions for in-training plotting of metrics or model state.
Note: Please ensure that plotting_functions are provided when plot_every > 0.
Example:
```python
config = TrainConfig(
    plot_every=10,
    hyperparams={"learning_rate": 0.001, "batch_size": 32},
    plotting_functions=(plot_loss_function,),
)
```
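The plot_loss_function referenced above is not defined in this snippet. A minimal sketch of what such a function could look like, assuming the plotting function receives the model and the current iteration and returns a plot name together with a matplotlib figure (check the LoggablePlotFunction definition in your Qadence version for the exact signature):

```python
import matplotlib.pyplot as plt
import torch

# hypothetical plotting function; the signature assumed here is
# (model, iteration) -> (plot name, matplotlib figure)
def plot_loss_function(model, iteration: int):
    fig, ax = plt.subplots()
    x = torch.linspace(0, 1, 50).reshape(-1, 1)
    with torch.no_grad():
        y = model(x)
    ax.plot(x.flatten(), y.flatten())
    ax.set_title(f"Model output at iteration {iteration}")
    return "model_output", fig
```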
Advanced Distributed Training

- nprocs (int): Specifies the number of processes to be used. For multi-GPU training, this should match the total number of GPUs available. When nprocs is greater than 1, Trainer spawns additional subprocesses for training. This is useful for parallel or distributed training setups.
- compute_setup (str): Determines the compute device configuration: 1. "auto" (automatically selects GPU if available), 2. "gpu" (forces GPU usage and errors if no GPU is detected), and 3. "cpu" (forces the use of the CPU).
- backend (str): Specifies the communication backend for distributed training. Common options are "gloo" (default), "nccl" (optimized for GPUs), or "mpi", depending on your setup. It should be one of the backends supported by torch.distributed. For further details, please look at torch backends (external).
Notes:
- Logging-specific callbacks: Logging is available only through the main process, i.e. process 0. Model logging, plotting, and logging of metrics will only be performed by a single process, even if multiple processes are run.
- Training-specific callbacks: Callbacks specific to training, e.g., EarlyStopping, LRSchedulerStepDecay, etc., will be called from each process.
- PrintMetrics (set through the print_every argument in TrainConfig) is available from all processes.
Example: For CPU multiprocessing

```python
config = TrainConfig(
    compute_setup="cpu",
    nprocs=5,
    backend="gloo",
)
```

Example: For GPU multiprocessing training

```python
config = TrainConfig(
    compute_setup="gpu",
    nprocs=2,  # world size / total number of GPUs
    backend="nccl",
)
```

Precision Options
- dtype (dtype or None): Sets the numerical precision (data type) for computations. For instance, you can use torch.float32 or torch.float16 depending on your performance and precision needs. Both the model parameters and the dataset will be of the provided precision.
  - If not specified or None, the default torch precision (usually torch.float32) is used.
  - If the provided dtype is a complex dtype, the appropriate precision for the data and model parameters will be used as follows:
| Data Type (dtype) | Data Precision | Model Precision | Model Parameters Precision (Real Part & Imaginary Part) |
|---|---|---|---|
| torch.float16 | 16-bit | 16-bit | N/A |
| torch.float32 | 32-bit | 32-bit | N/A |
| torch.float64 | 64-bit | 64-bit | N/A |
| torch.complex32 | 16-bit | 32-bit | 16-bit |
| torch.complex64 | 32-bit | 64-bit | 32-bit |
| torch.complex128 | 64-bit | 128-bit | 64-bit |

Complex Dtypes: Complex data types are useful for Quantum Neural Networks, such as the QNN provided by Qadence. The industry standard is to use torch.complex128; however, the user can also specify a lower precision (torch.complex64 or torch.complex32) for faster training.
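For example, a configuration for training in double complex precision might be specified as follows (the remaining arguments are illustrative):

```python
import torch
from qadence.ml_tools import TrainConfig

# both the dataset and the model parameters will follow the requested precision
config = TrainConfig(max_iter=500, dtype=torch.complex128)
```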
Furthermore, the user can also utilize the following options:
- log_setup (str): Configures the device used for logging. Using "cpu" ensures logging runs on the CPU (which may avoid conflicts with GPU operations), while "auto" aligns logging with the compute device.
- all_reduce_metrics (bool): When enabled, aggregates metrics (such as loss or accuracy) across all training processes to provide a unified summary, though it may introduce additional synchronization overhead (see the sketch below).
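A sketch combining these options for a two-GPU run (the values shown are illustrative):

```python
from qadence.ml_tools import TrainConfig

config = TrainConfig(
    compute_setup="gpu",
    nprocs=2,                  # total number of GPUs / processes
    backend="nccl",
    log_setup="cpu",           # keep logging on the CPU to avoid GPU conflicts
    all_reduce_metrics=True,   # aggregate metrics (e.g., loss) across processes
)
```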
3. Experiment tracking with mlflow
Qadence allows tracking runs and logging hyperparameters, models and plots with tensorboard (external) and mlflow (external). In the following, we demonstrate the integration with mlflow.
mlflow configuration
We have control over our tracking configuration by setting environment variables. First, let's look at the tracking URI. For the purpose of this demo we will be working with a local database, in a similar fashion as described here (external),
```bash
export MLFLOW_TRACKING_URI=sqlite:///mlruns.db
```

Qadence can also read the following two environment variables to define the mlflow experiment name and run name:

```bash
export MLFLOW_EXPERIMENT=test_experiment
export MLFLOW_RUN_NAME=run_0
```

If no tracking URI is provided, mlflow stores run information and artifacts in the local ./mlruns directory, and if no names are defined, the experiment and run will be named with random UUIDs.
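With these environment variables set, selecting mlflow as the tracking tool is then a matter of configuring TrainConfig accordingly. A sketch, assuming ExperimentTrackingTool can be imported from qadence.types and with illustrative values:

```python
from qadence.ml_tools import TrainConfig
from qadence.types import ExperimentTrackingTool

config = TrainConfig(
    max_iter=100,
    write_every=10,                                  # log metrics to mlflow every 10 iterations
    tracking_tool=ExperimentTrackingTool.MLFLOW,
    hyperparams={"learning_rate": 0.01},             # hyperparameters tracked by mlflow
    log_model=True,                                  # log a serialized model at the end of training
)
```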