Pasqal Documentation

Estimating Memory Consumption and Runtime

The presence of the max_bond_dim and max_krylov_dim config parameters means an upper bound on the memory consumption and on the complexity of the time-evolution algorithm can be estimated. By limiting the max_bond_dim of a simulation so that it fits in the available hardware memory, the simulation is guaranteed to run for arbitrary times and for an arbitrary number of qubits. Of course, the sources of error described on the error page imply that limiting the memory consumption of the program will negatively impact the quality of the results once a certain threshold is exceeded. The linked case study, for example, outlines how to determine whether emulation results are accurate.

Estimating the memory consumption of a simulation


In this section we outline how to estimate the memory consumption of a simulation, for a given max_bond_dim, a Krylov subspace of size max_krylov_dim, and N the number of qubits to be simulated. There are four contributions to the peak memory consumption of emu-mps that will be discussed in the next sections:

  • the state
  • the baths
  • the Krylov space
  • temporary tensors

The quantum state is stored in MPS format (see here). At worst, the bond dimensions of the tensors in an MPS grow exponentially inwards as

2, 4, 8, 16, …, 16, 8, 4, 2

in which case an MPS will take more memory than a state vector. Let χ denote the value of max_bond_dim. When χ < 2^(N/2), the bond dimensions in the center all cap at that fixed value of max_bond_dim. Since each tensor in the MPS has 2 bonds of size at most χ, and a physical index of size p = 2, where each element in the tensor takes s = 16 bytes (two 8-byte floats to store a complex number), the memory consumption of the state reads

|ψ| < spNχ² = 32Nχ²

Note that this is a strict over-estimation, because the outer bonds in the MPS will be much smaller than χ.
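As an illustration, the worst-case bond dimensions can be tabulated with a few lines of Python (the helper name is ours, for illustration only, not part of the emu-mps API):

```python
# Hypothetical helper: worst-case bond dimensions of an N-qubit MPS
# truncated at max_bond_dim = chi. Bond i sits between qubits i and i+1,
# so its exact worst-case size is 2**min(i+1, N-i-1), capped at chi.
def bond_dims(N: int, chi: int) -> list[int]:
    return [min(2 ** min(i + 1, N - i - 1), chi) for i in range(N - 1)]

print(bond_dims(8, 1000))  # untruncated: [2, 4, 8, 16, 8, 4, 2]
print(bond_dims(8, 10))    # capped at chi = 10 in the center
```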

For TDVP, a left and a right bath tensor are stored for each qubit. The bath tensors are used to compute an effective interaction between the 2-qubit subsystem being evolved and the rest of the system (see here). Each of them has 3 indices. Two of them have a size that depends on the state being evolved, here upper bounded by the maximum allowed value χ of the bond dimension. The size h of the third depends on the interaction type: concretely, for each qubit, h is the bond dimension of the MPO representation of the Hamiltonian. In summary, for the Rydberg Hamiltonian we expect h = 2 + floor(n/2), and for the XY Hamiltonian h = 2 + 2·floor(n/2), where in both cases n = min(i, N−i) for the bath associated with qubit i. The computation is slightly involved, but summing all the contributions leads to a total memory occupation of the baths:

|bath| < Nsχ²h = 4χ²N(N+10)
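To see how the state and bath bounds compare, here is a quick numerical check (the helper names are ours, chosen for illustration):

```python
# Memory bounds from the text, in bytes (s = 16 and p = 2 already folded in).
def state_bytes(N: int, chi: int) -> int:
    return 32 * N * chi**2            # |psi| < s*p*N*chi^2

def bath_bytes(N: int, chi: int) -> int:
    return 4 * chi**2 * N * (N + 10)  # |bath| < 4*chi^2*N*(N+10)

# The ratio bath/state = (N + 10)/8 exceeds 1 for every N >= 1,
# consistent with the claim that the baths always dominate the state.
for N in (10, 50, 100):
    print(N, bath_bytes(N, 1) / state_bytes(N, 1))
```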

Note that the baths always take up more memory than the state, and potentially much more. Furthermore, just as for the state, this is a strict over-estimation, because it assumes all the bonds in the state are of size χ.

The remainder of the memory consumption is to compute the time-evolution of qubit pairs in TDVP. This is done by contracting 2 tensors from the MPS together into a single 2-qubit tensor, and time-evolving it by applying an effective Hamiltonian constructed from the baths and the Hamiltonian MPO. Each 2-qubit tensor has a size bounded by sp²χ², so the memory of the Krylov vectors used in the Lanczos algorithm reads

|krylov| ≤ ksp²χ² = 64kχ²

where k is the value of max_krylov_dim. Recall that the default value is k = 100, and that if the Lanczos algorithm requires more Krylov vectors to converge to the tolerance, it will error rather than exceed the above bound.

Finally, to compute the above Krylov vectors, the effective two-site Hamiltonian has to be applied to the previous Krylov vector to obtain the next one. The resulting tensor network contraction cannot be done in place, so two intermediate results that get very large have to be stored. The intermediate results take the most memory at the center qubit, where the bond dimension of the Hamiltonian becomes h, giving

|intermediate| = 2shp²χ² = 128hχ²

It should be noted that the value of h cited above assumes that all qubits in the system interact via a two-body term, which is technically true for the Rydberg interaction.

Putting all of this together, for the total memory consumption m of the program, we can write the following bound:

m(N,χ,k) = |ψ| + |bath| + |krylov| + |intermediate| < 32Nχ² + 4χ²N(N+10) + 64kχ² + 64(N+4)χ² = 4χ²[N(N+34) + 16k + 64]
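This bound translates directly into a small estimator. A sketch (the function name is our own, not an emu-mps utility):

```python
# Peak memory bound m(N, chi, k) = 4*chi^2*[N*(N+34) + 16*k + 64], in bytes.
def peak_memory_bytes(N: int, chi: int, k: int = 100) -> int:
    return 4 * chi**2 * (N * (N + 34) + 16 * k + 64)

# Example: 49 qubits at chi = 1600, assuming k = 30 Krylov vectors.
print(peak_memory_bytes(49, 1600, k=30) / 1e9, "GB")  # about 47 GB
```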

Note that this estimate is pessimistic, since not all k Krylov vectors are likely to be needed, and not all tensors in ψ and the baths have the maximum bond dimension χ. On the other hand, the estimate for |intermediate| is likely to be accurate, since the maximum bond dimension χ is probably attained at the center qubit.

To test the accuracy of the above memory estimations, we run the TDVP time-evolution algorithm, fixing the bond dimension to a particular desired value. For different combinations of the number of atoms in the register N and the fixed bond dimension χ, we collect the maximum resident set size (RSS), which is expected to capture the maximum memory needed to run the emulation. We plot the RSS in the following picture (left), as a function of the number of qubits and for different bond dimensions. Notice that, once the RSS is normalized by χ², as suggested by our estimate above, all the points collapse onto the same functional dependence on the number of atoms. Moreover, when we plot the normalized function m(N,χ,k)/χ² for a reasonable estimate of the size of the Krylov subspace (k = 30), it is clear that our upper bound on memory occupation can be reasonably trusted over a wide range of qubit numbers and bond dimensions.

Finally, having established an estimate for the memory consumption, it makes sense to explore which regimes of qubit number and bond dimension can be reached for a given hardware capability. Since all heavy simulations will be run on an NVIDIA A100 (on Pasqal's DGX cluster), we have 40 GB of available memory. Therefore, above (right image), we show the contour lines of the RSS estimate m(N,χ,k=30) < 40 GB for particularly useful values of the total memory, allowing one to quickly estimate the memory footprint of an emu-mps emulation.
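Inverting the bound gives the largest bond dimension that fits a given memory budget; a minimal sketch (our own helper), assuming the same k = 30 as above:

```python
from math import isqrt

# Largest chi such that 4*chi^2*[N*(N+34) + 16*k + 64] fits in budget_bytes.
def max_bond_dim(N: int, budget_bytes: int, k: int = 30) -> int:
    return isqrt(budget_bytes // (4 * (N * (N + 34) + 16 * k + 64)))

# On a 40 GB NVIDIA A100:
for N in (20, 50, 100):
    print(N, max_bond_dim(N, 40_000_000_000))
```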

For example, the results from the case study were obtained using N = 49 and χ = 1600 on 2 GPUs. Taking the above formula, and halving the contributions from |ψ| and |bath| since they are split evenly over the GPUs, we reproduce the memory consumption of the program for k = 13. Notice that the actual number of Krylov vectors required to reach convergence is likely closer to 30, but here we underestimate it, since the contributions of |ψ| and |bath| are over-estimated.
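The arithmetic behind this check can be sketched as follows (again a helper of our own; only the state and bath terms are halved, since the Krylov and intermediate tensors live on each GPU in full):

```python
# Per-GPU memory estimate when the state and baths are split over n_gpus.
def per_gpu_bytes(N: int, chi: int, k: int, n_gpus: int = 2) -> float:
    state = 32 * N * chi**2 / n_gpus
    bath = 4 * chi**2 * N * (N + 10) / n_gpus
    krylov = 64 * k * chi**2
    intermediate = 64 * (N + 4) * chi**2
    return state + bath + krylov + intermediate

print(per_gpu_bytes(49, 1600, k=13) / 1e9, "GB per GPU")
```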

Estimating the runtime of a simulation

Similarly to the previous section, here we briefly estimate the complexity of the two-site TDVP algorithm we use to time evolve the state over a single pulse-sequence step. As before, the two relevant computational steps are

  • Computing the baths
  • Applying the effective Hamiltonian

In both cases, the analysis boils down to an exercise in complexity estimation of tensor network contractions. For simplicity, as before, we restrict to the worst-case scenario in which the bond dimension χ always takes the maximum allowed value. Importantly, another significant contribution to the runtime can come from computing complex observables like two-point correlation functions, which is not included here.

Contribution from computing the baths

Roughly, computing the baths involves the tensor network contraction represented below:

Each of these tensor multiplications takes O(phχ³), O(p²h²χ²), and O(phχ³), respectively. For an all-to-all Rydberg interaction, we already argued that the bond dimension of the Hamiltonian MPO scales as the number of atoms. Moreover, the left and right baths need to be computed roughly N times, so the overall expected complexity is O(N²χ³) + O(N³χ²).

Contribution from the effective Hamiltonian


Applying the effective two-body Hamiltonian is a slightly more involved tensor network contraction:

In steps, it consists of applying:

  • the left bath: O(p²hχ³)
  • a two-body term coming from the MPO Hamiltonian: O(p⁴h²χ²)
  • the right bath: O(p²hχ³)

As before, for an all-to-all Rydberg interaction we expect h ∼ N. Moreover, the effective Hamiltonian has to be applied k times for every qubit pair, to build the appropriate Krylov subspace. Finally, to complete the time evolution and bring the tensors of the state back into MPS form, a final singular value decomposition is required; performed for every pair, this contributes O(Nχ³) overall. The expected complexity is thus O(kN²χ³) + O(kN³χ²) + O(Nχ³).
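These scalings can be tallied in a back-of-the-envelope operation count (purely illustrative; constant factors and the physical dimension p = 2 are dropped, and h ∼ N is assumed as in the text):

```python
# Leading-order operation count for one two-site TDVP step.
def tdvp_step_ops(N: int, chi: int, k: int) -> int:
    baths = N**2 * chi**3 + N**3 * chi**2        # computing the baths
    heff = k * N * (N * chi**3 + N**2 * chi**2)  # k applications per pair, ~N pairs
    svd = N * chi**3                             # final SVDs back to MPS form
    return baths + heff + svd
```

The effective-Hamiltonian term is k times larger than the bath term, so for the default Krylov sizes it dominates the step by more than an order of magnitude.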

From the previous complexity estimations, we thus expect the complexity of the two-site TDVP algorithm to have two main contributions:

Δt_TDVP(N,χ,k) ∼ αN²χ³ + βN³χ²

To check this estimate, as before, we run TDVP multiple times, measuring the average runtime of a step. Below, we show the obtained results for different numbers of atoms in the register N at fixed bond dimension χ (left), and for different fixed N with increasing bond dimension (right). On top of these data points, we also plot the resulting fit of the complexity estimate presented in the equation above. Remarkably, with just the two parameters α and β we get good agreement.

To wrap up, and to provide a useful tool for runtime estimation for emu-mps, the time to perform a single time step in a sequence can be conveniently visualized (below) as contour lines in N and χ.

Superimposing the 40 GB hardware constraint derived in the previous section, it is easy to see that in the worst-case scenario, a TDVP step will take roughly 250 seconds to compute.