
QCNN model

In this tutorial, we’ll explore how to train a Quantum Convolutional Neural Network (QCNN) using Qadence, demonstrating its application on a simple yet insightful data structure. The tutorial begins by detailing the construction of the quantum circuit and the synthetic data generation process. Convolutional architectures excel at capturing spatial relationships in grid-like structures, so we design our data to emulate a system of five interconnected grid elements, where the collective behavior produces a single continuous output value. The key idea is to aggregate features from four neighboring elements into a fifth central node, generating a compact embedding that encodes structural information. By training the QCNN to predict the global property from this localized representation, we effectively enable the model to infer system-wide behavior from a single node’s contextual features—showcasing the potential of quantum machine learning for relational reasoning tasks.

A generic QCNN is described as

$$\Phi_{\theta,\lambda} = \bigcirc_{l=1}^{L} \big(\text{P}_{l}^{\lambda_{l}} \circ \text{C}_{l}^{\theta_{l}}\big)$$

where $\circ$ denotes function composition, and $\bigcirc_{l=1}^L$ represents the composition of $L$ functions applied sequentially. Each QCNN layer $l$ comprises a quantum convolutional layer $\text{C}_l^{\theta_l}$ followed by a quantum pooling layer $\text{P}_{l}^{\lambda_l}$, with $\theta_l$ and $\lambda_l$ being the convolution and pooling parameters, respectively. The alternating structure of the QCNN circuit processes and simplifies the input quantum state, starting with $\text{C}_1$ and ending with $\text{P}_L$.
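To make the composition notation concrete, the sketch below is purely illustrative (each layer is modeled as a plain Python callable) and applies the layers in the order $\text{C}_1, \text{P}_1, \dots, \text{C}_L, \text{P}_L$:

from functools import reduce

def compose_qcnn(layers):
    # layers = [(C_1, P_1), ..., (C_L, P_L)], each entry a callable acting on a state-like object
    def phi(state):
        # apply C_1 first, then P_1, ..., finishing with P_L
        return reduce(lambda s, f: f(s), [f for conv, pool in layers for f in (conv, pool)], state)
    return phi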

The convolutional layer $\text{C}_l^{\theta_l}: \mathcal{S}(\mathcal{H}_l) \rightarrow \mathcal{S}(\mathcal{H}_l)$ preserves the size of the quantum register. It reads

$$\text{C}_l^{\theta_l}(\cdot) = \bigcirc_{j=1}^{r} \left(\bigotimes_{i \in \text{S}(j)} W_l^{(i, i+1)}\left(\theta_l\right)\right)(\cdot)$$

Each convolutional layer acts on an even number of qubits, since the unitary $W_l$¹ convolves pairs of neighboring qubits $(i, i+1)$. In general, the operator $W$ acts on pairs of adjacent qubits defined by the set $S(j) = \{(i, i+1) \mid i \equiv j \pmod 2,\ 0 \leq i \leq N-2\}$, where $N$ denotes the total number of qubits². This construction ensures an alternating nearest-neighbor interaction pattern, determined by the parity of $j$: for even $j$, $W$ acts on the even-indexed pairs $(0,1)$, $(2,3)$, etc.; for odd $j$, it acts on the odd-indexed pairs $(1,2)$, $(3,4)$, etc. Each convolutional layer $\text{C}_l^{\theta_l}$ has an associated parameter $r$, representing its depth. For example, in a two-layer QCNN architecture, the depths of $\text{C}_1$ and $\text{C}_2$ are denoted as $r_1$ and $r_2$, respectively.
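As an illustration only (this helper is not part of Qadence), the pair set $S(j)$ can be generated in a few lines:

def pair_set(j: int, n_qubits: int) -> list[tuple[int, int]]:
    # pairs (i, i+1) whose first index i shares the parity of j (0-based indexing)
    return [(i, i + 1) for i in range(n_qubits - 1) if i % 2 == j % 2]

print(pair_set(0, 8))  # even-indexed pairs: [(0, 1), (2, 3), (4, 5), (6, 7)]
print(pair_set(1, 8))  # odd-indexed pairs:  [(1, 2), (3, 4), (5, 6)]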

The pooling layer $\text{P}_{l}^{\lambda_l}: \mathcal{S}(\mathcal{H}_l) \rightarrow \mathcal{S}(\mathcal{H}_{l+1})$ reduces the size of the quantum register by tracing out specific qubits, such that $\dim(\mathcal{H}_{l+1}) < \dim(\mathcal{H}_l)$, and is defined as

$$\text{P}_{l}^{\lambda_l}(\cdot) = \text{Tr}_{i}[(\cdot)]$$

where the $i$-th qubit is traced out in the $l$-th layer. The pooling layers typically discard half of the qubits at each step.
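For intuition, the sketch below (an assumption about a common pooling schedule, not Qadence's internal implementation) shows which qubit indices survive when half of the register is discarded at each layer:

def surviving_qubits(n_qubits: int, n_layers: int) -> list[list[int]]:
    qubits = list(range(n_qubits))
    kept = []
    for _ in range(n_layers):
        qubits = qubits[::2]  # keep every other qubit, trace out the rest
        kept.append(qubits)
    return kept

print(surviving_qubits(8, 3))  # [[0, 2, 4, 6], [0, 4], [0]]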

We adopt a simple architecture for $\text{C}_l^{\theta_l}$ and $\text{P}_l^{\lambda_l}$. The unitary $W$ is defined as in Ref. vatan_2004_optimal (external), where the $A$ gates are defined in terms of $R_G(\theta_l) = e^{-iX\theta^1_{l}/2}e^{-iZ\theta^2_{l}/2}e^{-iX\theta^3_{l}/2}$. Entanglement is achieved by applying non-parametrized CZ gates. In the pooling layers, entanglement is followed by local measurements on $\mathcal{O}$, enabled by the deferred measurement principle as in Ref. Nielsen (external). The chosen design reduces the complexity of the QCNN circuit via $\Phi_{\theta, \lambda} \to \Phi_{\theta}$.
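To give a flavor of such a cell, here is a minimal sketch of a two-qubit convolutional block built from parametrized RX–RZ–RX rotations followed by a CZ entangler. It is a simplified stand-in for illustration only, not the exact decomposition of Ref. vatan_2004_optimal nor the cell used internally by the QCNN model below:

from qadence import RX, RZ, CZ, chain, kron

def r_g(qubit: int, prefix: str):
    # single-qubit RX·RZ·RX rotation pattern with three named rotation angles
    return chain(RX(qubit, f"{prefix}_1"), RZ(qubit, f"{prefix}_2"), RX(qubit, f"{prefix}_3"))

def conv_cell(q0: int, q1: int, prefix: str):
    # rotations on both qubits, then a non-parametrized CZ entangler
    return chain(kron(r_g(q0, f"{prefix}a"), r_g(q1, f"{prefix}b")), CZ(q0, q1))

w_cell = conv_cell(0, 1, "w0")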

A schematic illustration of the full circuit can be found in our work on QGNNs (external), which shares a similar circuit design.

The synthetic data emulates the grid system described in the introduction:

  1. 4 grid elements (each with n_qubits features) influence a 5th element
  2. The relationship is encoded in a single continuous target value

Input and Target Tensors (dummy_input and dummy_target)

  1. Input
    • Shape: (4, n_qubits): 4 samples representing the 4 grid elements, with n_qubits features per grid element. Random values in [0, 1).
  2. Target
    • Shape: (1, 1): a single scalar value representing the aggregated property of the 5th connected grid element. Random value in [0, 1).
import torch

# 4 neighboring grid elements, each described by n_features input features
n_features = 8
dummy_input = torch.rand(4, n_features).float()
# single continuous target value for the 5th (central) element
dummy_target = torch.rand(1, 1).float()

Now we perform the training of the QCNN on the dummy data generated earlier.

First, we define a wrapper class that applies mean pooling to the QCNN outputs so that the model prediction matches the shape of the target value:

class qcnn_msg_passing(torch.nn.Module):
    """Wraps a QCNN and mean-pools its per-node outputs into a single prediction."""

    def __init__(self, qcnn: torch.nn.Module, output_size: int = 1):
        super(qcnn_msg_passing, self).__init__()
        self.qcnn = qcnn
        self.output_size = output_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.float()
        # one QCNN output per grid element, reshaped to (n_samples, output_size)
        x = self.qcnn(x).float().view(-1, self.output_size)
        # aggregate the 4 node outputs into a single value for the central element
        return torch.mean(x, dim=0, keepdim=True)
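As a quick, purely illustrative sanity check of the wrapper (using a plain linear layer as a stand-in for the QCNN), the mean pooling collapses the 4 per-node outputs into a single (1, 1) prediction:

stub = torch.nn.Linear(n_features, 1)          # stand-in for the QCNN, illustration only
wrapped = qcnn_msg_passing(stub, output_size=1)
print(wrapped(dummy_input).shape)              # torch.Size([1, 1])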

The model is then defined as:

# QCNN model
from qadence import RX, RZ, CZ, QCNN

n_qubits = n_features
hidden_depth = [1, 1, 1]
operations = [RX, RZ, RX]
entangler = CZ
random_meas = True

qcnn_circuit = QCNN(
    n_inputs=n_qubits,
    n_qubits=n_qubits,
    depth=hidden_depth,
    operations=operations,
    entangler=entangler,
    random_meas=random_meas,
    is_corr=False,
)
model = qcnn_msg_passing(qcnn_circuit, dummy_target.shape[1])
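Before training, it can be useful to verify that the model exposes trainable parameters and that its output shape matches the target. This optional check is not part of the original tutorial flow:

n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {n_params}")
print(model(dummy_input).shape)  # expected: torch.Size([1, 1]), same as dummy_target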

The training is performed as follows:

n_epochs = 100
lr = 0.01
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

# Training loop
for epoch in range(n_epochs + 1):
    optimizer.zero_grad()
    output = model(dummy_input)
    loss = torch.nn.functional.mse_loss(output, dummy_target)
    loss.backward()
    optimizer.step()
    if epoch % 10 == 0:
        print(f"Epoch={epoch:>4d} | Loss={loss.item():>10.6f}")
Epoch=   0 | Loss=  0.361984
Epoch=  10 | Loss=  0.019123
Epoch=  20 | Loss=  0.002146
Epoch=  30 | Loss=  0.002778
Epoch=  40 | Loss=  0.000004
Epoch=  50 | Loss=  0.000360
Epoch=  60 | Loss=  0.000022
Epoch=  70 | Loss=  0.000027
Epoch=  80 | Loss=  0.000014
Epoch=  90 | Loss=  0.000000
Epoch= 100 | Loss=  0.000002
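After training, the fitted prediction can be compared directly against the target. This is an optional check added here for illustration:

with torch.no_grad():
    prediction = model(dummy_input)
print(prediction, dummy_target)  # the two values should now be close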

  1. We also refer to $W$ as the convolutional cell.

  2. Qubit indexing is 0-based, i.e., $0, 1, \dots, N-1$.