module
qek.kernel.kernel
The Quantum Evolution Kernel itself, for use in a machine-learning pipeline.
Classes
-
BaseKernel — Base class for implementations of the Quantum Evolution Kernel.
-
FastQEK — An optimized, lower-level implementation of the Quantum Evolution Kernel that operates on processed data.
-
IntegratedQEK — A variant of the Quantum Evolution Kernel that supports fit/transform/fit_transform from raw data (graphs).
Functions
-
count_occupation_from_bitstring — Counts the number of '1' bits in a binary string.
-
dist_excitation_and_vec — Calculates the distribution of excitation energies from a dictionary of bitstrings to their respective counts.
class
BaseKernel
(mu: float, size_max: int | None = None, similarity: Callable[[NDArray[np.floating], NDArray[np.floating]], np.floating] | None = None)
Bases: abc.ABC, Generic[KernelData]
Base class for implementations of the Quantum Evolution Kernel.
Unless you are implementing a new kernel, you should probably use one of the subclasses:
-
FastQEK — lower-level API; requires processed data; optimized.
-
IntegratedQEK — higher-level API; accepts raw graphs; slower.
Initialize the kernel.
Attributes
-
X (Sequence[ProcessedData]) — Training data used for fitting the kernel.
-
kernel_matrix (np.ndarray) — Kernel matrix. This is assigned in the fit() method.
Parameters
-
mu : float — Scaling factor for the Jensen-Shannon divergence.
-
size_max : int, optional — If specified, only consider the first size_max qubits of bitstrings. Otherwise, consider all qubits. You may use this to trade precision in favor of speed.
-
similarity : optional — If specified, a custom similarity metric to use. Otherwise, use the Jensen-Shannon divergence.
Note: This class does not accept raw data, but rather ProcessedData. See class IntegratedQEK for a subclass that provides a more powerful API, at the expense of performance.
Methods
-
to_processed_data — Convert the raw data into features.
-
default_similarity — The Jensen-Shannon similarity metric used to compute the kernel, used when calling kernel(X1, X2).
-
similarity — Compute the similarity between two graphs using Jensen-Shannon divergence.
-
fit — Fit the kernel to the training dataset by storing the dataset.
-
transform — Transform the dataset into the kernel space with respect to the training dataset.
-
fit_transform — Fit the kernel to the training dataset and transform it.
-
create_train_kernel_matrix — Compute a kernel matrix for a given training dataset.
-
create_test_kernel_matrix — Compute a kernel matrix for a given testing dataset and training set.
-
set_params — Set multiple parameters for the kernel.
-
get_params — Retrieve the value of all parameters.
method
to_processed_data
(X: Sequence[KernelData]) → Sequence[ProcessedData]
Convert the raw data into features.
Raises
-
NotImplementedError — This base implementation is abstract; subclasses override this method.
method
default_similarity
(row: NDArray[np.floating], col: NDArray[np.floating]) → np.floating
The Jensen-Shannon similarity metric used to compute the kernel, used when calling kernel(X1, X2).
This is the default similarity, used if no similarity parameter is provided.
method
similarity
(graph_1: KernelData, graph_2: KernelData) → float
Compute the similarity between two graphs using Jensen-Shannon divergence.
This method computes the square of the Jensen-Shannon divergence (JSD) between two probability distributions over bitstrings. The JSD is a measure of the difference between two probability distributions, and it can be used as a kernel for machine learning algorithms that require a similarity function.
The input graphs are assumed to have been processed using the ProcessedData class from qek_os.data_io.dataset.
Parameters
-
graph_1 : KernelData — First graph.
-
graph_2 : KernelData — Second graph.
Returns
-
float — Similarity between the two graphs, scaled by a factor that depends on mu.
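For intuition, the sketch below shows one way such a similarity can be computed from two excitation distributions, assuming a kernel of the form exp(-mu · JSD); this is an illustration of the technique, not necessarily the library's exact formula.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def sketch_similarity(p1: np.ndarray, p2: np.ndarray, mu: float = 0.5) -> float:
    # scipy's jensenshannon() returns the JS *distance*; squaring it
    # yields the JS *divergence* mentioned in the docstring above.
    jsd = jensenshannon(p1, p2, base=2) ** 2
    # Map through an exponential scaled by mu, so identical
    # distributions yield a similarity of 1.0.
    return float(np.exp(-mu * jsd))
```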
method
fit
(X: Sequence[KernelData], y: list | None = None) → None
Fit the kernel to the training dataset by storing the dataset.
Parameters
-
X : Sequence[KernelData] — The training dataset.
-
y : list | None — Target variable for the dataset sequence. This argument is ignored, provided only for compatibility with machine-learning libraries.
method
transform
(X_test: Sequence[KernelData], y_test: list | None = None) → np.ndarray
Transform the dataset into the kernel space with respect to the training dataset.
Parameters
-
X_test : Sequence[KernelData] — The dataset to transform.
-
y_test : list | None — Target variable for the dataset sequence. This argument is ignored, provided only for compatibility with machine-learning libraries.
Returns
-
np.ndarray — Kernel matrix where each entry represents the similarity between the given dataset and the training dataset.
Raises
-
ValueError — If the kernel has not been fitted before calling transform.
method
fit_transform
(X: Sequence[KernelData], y: list | None = None) → np.ndarray
Fit the kernel to the training dataset and transform it.
Parameters
-
X : Sequence[KernelData] — The dataset to fit and transform.
-
y : list | None — Target variable for the dataset sequence. This argument is ignored, provided only for compatibility with machine-learning libraries.
Returns
-
np.ndarray — Kernel matrix for the training dataset.
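As a usage sketch, fit/transform/fit_transform follow the scikit-learn convention, so the resulting matrices plug directly into estimators that accept precomputed kernels. Here kernel, X_train, X_test and y_train are assumptions (e.g. a FastQEK instance and processed datasets):

```python
from sklearn.svm import SVC

# Assumed: kernel is a QEK instance (e.g. FastQEK(mu=0.5)); X_train and
# X_test are sequences of processed data; y_train holds the labels.
K_train = kernel.fit_transform(X_train)   # N x N training kernel matrix
K_test = kernel.transform(X_test)         # M x N test-vs-train matrix

model = SVC(kernel="precomputed")         # consume QEK as a precomputed kernel
model.fit(K_train, y_train)
predictions = model.predict(K_test)
```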
method
create_train_kernel_matrix
(train_dataset: Sequence[KernelData]) → np.ndarray
Compute a kernel matrix for a given training dataset.
This method computes a symmetric N x N kernel matrix from the Jensen-Shannon divergences between all pairs of graphs in the input dataset. The resulting matrix can be used as a similarity metric for machine learning algorithms.
Parameters
-
train_dataset : Sequence[KernelData] — A list of objects to compute the kernel matrix from.
Returns
-
np.ndarray — An N x N symmetric matrix where the entry at row i and column j represents the similarity between the graphs in positions i and j of the input dataset.
method
create_test_kernel_matrix
(test_dataset: Sequence[KernelData], train_dataset: Sequence[KernelData]) → np.ndarray
Compute a kernel matrix for a given testing dataset and training set.
This method computes an M x N kernel matrix from the Jensen-Shannon divergences between all pairs of graphs in the input testing dataset (of size M) and the training dataset (of size N). The resulting matrix can be used as a similarity metric for machine learning algorithms, particularly when evaluating performance on the test dataset with a trained model.
Parameters
-
test_dataset : Sequence[KernelData] — The testing dataset.
-
train_dataset : Sequence[KernelData] — The training set.
Returns
-
np.ndarray — An M x N matrix where the entry at row i and column j represents the similarity between the graph in position i of the test dataset and the graph in position j of the training set.
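Conceptually, both matrix builders reduce to pairwise calls to the similarity function; a naive sketch (not the optimized implementation used by FastQEK):

```python
import numpy as np

def sketch_test_kernel_matrix(test_dataset, train_dataset, similarity) -> np.ndarray:
    # M x N matrix: one row per test graph, one column per training graph.
    matrix = np.zeros((len(test_dataset), len(train_dataset)))
    for i, test_graph in enumerate(test_dataset):
        for j, train_graph in enumerate(train_dataset):
            matrix[i, j] = similarity(test_graph, train_graph)
    return matrix
```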
method
set_params
(**kwargs: dict[str, Any]) → None
Set multiple parameters for the kernel.
Parameters
-
**kwargs : dict[str, Any] — Arbitrary keyword arguments where keys are attribute names and values are their respective values.
method
get_params
(deep: bool = True) → dict[str, Any]
Retrieve the value of all parameters.
Parameters
-
deep : bool — Ignored for the time being. Added for compatibility with various machine learning libraries, such as scikit-learn.
Returns
-
dict — A dictionary of parameters and their respective values. Note that this method always returns a copy of the dictionary.
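These two methods follow the scikit-learn estimator convention, which is what lets generic tooling inspect and tune the kernel's parameters. A small illustrative round trip (the FastQEK instance and values are assumptions):

```python
kernel = FastQEK(mu=0.5)
kernel.set_params(mu=2.0)       # update parameters by attribute name
params = kernel.get_params()    # returns a copy, safe to mutate
print(params["mu"])             # 2.0
```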
class
FastQEK
(mu: float, size_max: int | None = None, similarity: Callable[[NDArray[np.floating], NDArray[np.floating]], np.floating] | None = None)
Bases: BaseKernel[ProcessedData]
An optimized, lower-level implementation of the Quantum Evolution Kernel. It operates on processed data (ProcessedData) rather than raw graphs.
Initialize the kernel.
Attributes
-
X (Sequence[ProcessedData]) — Training data used for fitting the kernel.
-
kernel_matrix (np.ndarray) — Kernel matrix. This is assigned in the fit() method.
Parameters
-
mu : float — Scaling factor for the Jensen-Shannon divergence.
-
size_max : int, optional — If specified, only consider the first size_max qubits of bitstrings. Otherwise, consider all qubits. You may use this to trade precision in favor of speed.
-
similarity : optional — If specified, a custom similarity metric to use. Otherwise, use the Jensen-Shannon divergence.
Note: This class does not accept raw data, but rather ProcessedData. See class IntegratedQEK for a variant that provides a more powerful API, at the expense of performance.
Methods
-
to_processed_data — Convert the raw data into features.
method
to_processed_data
(X: Sequence[ProcessedData]) → Sequence[ProcessedData]
Convert the raw data into features.
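A minimal usage sketch, assuming processed_dataset is a Sequence[ProcessedData] prepared ahead of time (for instance by an extractor):

```python
# Assumed: processed_dataset is a Sequence[ProcessedData] prepared beforehand.
kernel = FastQEK(mu=0.5)

# Pairwise similarity between two processed graphs.
score = kernel.similarity(processed_dataset[0], processed_dataset[1])

# Full N x N training kernel matrix.
K_train = kernel.fit_transform(processed_dataset)
```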
class
IntegratedQEK
(mu: float, extractor: BaseExtractor[GraphType], size_max: int | None = None, similarity: Callable[[NDArray[np.floating], NDArray[np.floating]], np.floating] | None = None)
Bases: BaseKernel[GraphType]
A variant of the Quantum Evolution Kernel that supports fit/transform/fit_transform from raw data (graphs).
Initialize an IntegratedQEK.
Performance note
This class uses an extractor to convert the raw data into features. This can be very slow if you use, for instance, a remote QPU, as the waitlines to access a QPU can be very long. If you are using this in an interactive application or a server, this will block the entire thread during the wait.
We recommend using this class only with local emulators.
Parameters
-
mu : float — Scaling factor for the Jensen-Shannon divergence.
-
extractor : BaseExtractor[GraphType] — An extractor (e.g. a QPU or a quantum emulator) used to convert the raw data (graphs) into features.
-
size_max : int, optional — If specified, only consider the first size_max qubits of bitstrings. Otherwise, consider all qubits. You may use this to trade precision in favor of speed.
-
similarity : optional — If specified, a custom similarity metric to use. Otherwise, use the Jensen-Shannon divergence.
Methods
-
to_processed_data — Convert the raw data into features.
method
to_processed_data
(X: Sequence[GraphType]) → Sequence[ProcessedData]
Convert the raw data into features.
Performance note
This method can be very slow if you use, for instance, a remote QPU, as the waitlines to access a QPU can be very long. If you are using this in an interactive application or a server, this will block the entire thread during the wait.
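An end-to-end sketch with a local emulator, as recommended above; the QutipExtractor name and its construction are assumptions, so substitute whichever BaseExtractor subclass your installation provides:

```python
# Hypothetical import; adapt to the extractors shipped with your version.
from qek.data.extractors import QutipExtractor

extractor = QutipExtractor()                 # local emulator: no QPU waitlines
kernel = IntegratedQEK(mu=0.5, extractor=extractor)

# fit_transform accepts raw graphs directly; feature extraction happens
# internally and may take a while even on a local emulator.
K_train = kernel.fit_transform(graphs)       # graphs: Sequence[GraphType]
```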
function
count_occupation_from_bitstring
(bitstring: str) → int
Counts the number of '1' bits in a binary string.
Parameters
-
bitstring : str — A binary string containing only '0's and '1's.
Returns
-
int — The number of '1' bits found in the input string.
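This behaves like counting '1' characters with Python's built-in str.count:

```python
count_occupation_from_bitstring("01011")   # 3
"01011".count("1")                         # equivalent, for intuition
```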
function
dist_excitation_and_vec
(count_bitstring: dict[str, int], size_max: int | None = None) → np.ndarray
Calculates the distribution of excitation energies from a dictionary of bitstrings to their respective counts.
Parameters
-
count_bitstring : dict[str, int] — A dictionary mapping binary strings to their counts.
-
size_max : int | None — If specified, only keep size_max entries of the excitation distribution in the output. Otherwise, keep all values.
Returns
-
np.ndarray — A NumPy array where each index is a number of '1' bits and the value at that index is the corresponding normalized count.
Raises
-
ValueError
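For intuition, a naive sketch of the computation described above (not the library's exact implementation): each bitstring contributes its count to the bin indexed by its number of '1' bits, and the histogram is normalized.

```python
import numpy as np

def sketch_dist_excitation(count_bitstring: dict[str, int]) -> np.ndarray:
    # One bin per possible excitation count: 0 .. number of qubits.
    n_qubits = len(next(iter(count_bitstring)))
    dist = np.zeros(n_qubits + 1)
    for bitstring, count in count_bitstring.items():
        dist[bitstring.count("1")] += count
    return dist / dist.sum()  # normalize so the bins sum to 1

print(sketch_dist_excitation({"00": 4, "01": 3, "11": 1}))  # [0.5 0.375 0.125]
```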