OneFlavor Algorithms for Simulation of Lattice QCD with DomainWall Fermion: EOFA versus RHMC
Abstract:
We compare the performances of the exact oneflavor algorithm (EOFA) and the rational hybrid Monte Carlo algorithm (RHMC), for dynamical simulations of lattice QCD with domainwall fermion.
1 Introduction
Recently, an exact pseudofermion action for hybrid Monte Carlo simulation (HMC) of lattice QCD with oneflavor of domainwall fermion (DWF) has been derived, with the effective 4dimensional Dirac operator equal to the optimal rational approximation of the overalp Dirac operator with kernel , where and are constants, and is the standard WilsonDirac operator plus a negative parameter () [1]. Since the action is exact without taking square root, it does not require a large memory space to compute the fermion force, unlike the widely used rational hybrid Monte Carlo algorithm (RHMC) [2]. In the following, we refer the HMC with the exact oneflavor pseudofermion action as the exact oneflavor algorithm (EOFA). Obviously, the memorysaving feature of EOFA is crucial for largescale simulations of lattice QCD on any platforms. This is especially true for GPUs, since each GPU has enormous floatingpoint computing power but limited device memory. For example, using EOFA, two GPUs (each of 6 GB device momory, e.g., Nvidia GTXTITAN) working together is capable to simulate lattice QCD with DWF quarks on the lattice, while this is infeasible for RHMC. About the computational efficiency of EOFA, our studies in Ref. [1] suggest that EOFA is compatible with RHMC. However, in Ref. [1], a salient feature of EOFA has not been exploited. Namely, and [see Eq. (23) of Ref. [1]] can be updated at two different time scales, since the fermion force of is much smaller than that of . Now, applying the multipletime scale method to and , we find that EOFA is more efficient than RHMC for all variants of DWF. In this paper, we demonstrate that this is the case for the conventional DWF with kernel (i.e, , and ), and the optimal DWF [3] with kernel (i.e., , , and the reflectionsymmetric ), and the tests are performed for and QCD respectively.
Since the details of EOFA have been given in Ref. [1], we do not repeat them here. In the following, we outline how we implement RHMC in our tests. The pseudofermion action for RHMC of DWF can be written as
(1) 
where is defined as [4],
and the number of poles in the optimal rational approximation of and is . The pseudofermion field is generated from the Gaussian noise field as follows.
At this point, we note that one can use the following DWF action to reduce the memory consumption in RHMC.
where and are independent pseudofermion fields. Then the fermion force due to poles can be computed in subsets, each with multipleshift CG. Thus the memory consumption can be reduced by a factor of . However, it takes more time to compute these subsets than just one set with poles. To save time, one may apply the multipletime scale method to these subsets. Nevertheless, one cannot apply the mass preconditioning for this action, which may be a drawback of this approach. In the following, we use (1) for RHMC in all tests.
2 Memory Requirements for EOFA and RHMC
Defining , then the link variables (with each matrix in the format of 2column storage) take , the conjugate momenta , and a 5D vector (on the 5dimensional lattice) , where is the extent in the fifth dimension.
For EOFA, it takes to store the old and new gauge configurations, for the conjugate momenta, for and (pseudofermion fields) of each fermion, and for the fermion force. To compute the fermion force by conjugate gradient, it needs for the working space. Thus the memory requirement for EOFA (with one heavy mass preconditioner) amounts to
(2) 
For RHMC, it takes (for old and new gauge configurations), (for conjugate momenta and fermion force), and for the pseudofermion fields (the light fermion and the heavy mass preconditioner) after taking into account of evenodd preconditioning. To compute the fermion force, it needs 5D vectors for multshift CG and working space, where is the number of poles used in the rational approximation. Thus the memory requirement for RHMC is
(3) 
From (2) and (3), the ratio is
independent of the size of the 4D lattice. For example, for and , the ratio is 6.58. In other words, for HMC of oneflavor QCD with DWF on the lattice, EOFA takes 12 GB, while RHMC with requires at least 79 GB. Obviously, the memorysaving feature of EOFA has significant impacts to largescale simulations on any platforms, especially for GPUs.
3 Computational Efficiencies of EOFA and RHMC
To compare the efficiencies of EOFA and RHMC, we perform the following tests:

QCD on the lattice

Conventional DWF with kernel (i.e., , , and ) and .

Optimal DWF with kernel (i.e., , , and the reflectionsymmetric with ) and .


and QCD on the lattice

Conventional DWF with kernel (i.e., , , and ) and .

In all cases, the gauge action is the Wilson plaquette action at . In the molecular dynamics, we use the Omelyan integrator [5], the multipletime scale method [6], and the auxiliary heavy fermion field [7]. For QCD, the seaquark mass is set to , with the heavy mass preconditioner for conventional (optimal) DWF. For QCD, the seaquark masses are set to with the heavy mass preconditioner , and the values of and its mass preconditioner . In RHMC, the number of poles in the optimal rational approximation of and is fixed to for the lattice size , while for .
3.1
(a)  (b) 
(a)  (b) 
First, we compare the HMC chracteristics of EOFA and RHMC. In Fig. 1, we plot the change of Hamiltonian versus the trajectory number (after thermalization), for EOFA and RHMC respectively. In both cases, is quite smooth without any spikes. Moreover, the measured values of are:
EOFA  RHMC  

Conventional DWF  1.0003(16)  1.0038(17) 
Optimal DWF  0.9991(17)  0.9994(18) 
They are all in good agreement with the condition which follows from the areapreserving property of HMC.
In Fig. 2, we plot the maximum force (averaged over all links) in each trajectory, for the gauge field, the heavy fermion field, and the light fermion field respectively. For both EOFA and RHMC, the fermion forces behave smoothly in all trajectories. However, the fermion forces of EOFA are substantially smaller than their counterparts in RHMC. The averages of the maximum fermion forces are:
EOFA  RHMC  

Conventional DWF  0.0046(1)  0.0331(2)  0.0609(1)  0.1318(2)  0.1076(1)  0.2855(1) 
Optimal DWF  0.0009(2)  0.0139(2)  0.0487(1)  0.1810(1)  0.0695(3)  0.3534(1) 
Note that for EOFA, the fermion forces of are much smaller than their counterparts of . This immediately implies that and can be updated at two different time scales. This will be exploited in the tests on the lattice.
For tests of QCD on the lattice, we set the multipletime scales as follows. With the length of the HMC trajectory equal to one, three different time scales are set to , and the fields are updated according to the following assignment:
Thus the smallest time interval in the molecular dynamic is , and the numbers of momentum updates for are respectively, according to the Omelyan integrator.
Using one core of Intel i73820 [email protected], we measure the average time per HMC trajectory (T) and the acceptance rate (A) after thermalization, and obtain the following results.
EOFA  RHMC  
T (seconds)  A  T (seconds)  A  
Conventional DWF  6293(77)  0.980(9)  7365(96)  0.996(4) 
Optimal DWF  8916(263)  0.980(9)  10657(538)  0.984(8) 
Thus, in both cases (conventional DWF and optimal DWF), EOFA outperforms RHMC for QCD on the lattice.
3.2
Next we turn to tests of and QCD on the lattice, for the conventional DWF with kernel . The details of the simulation of 2flavors of DWF have been presented in Ref. [4]. After the initial thermalization of 300 trajectories (done with a GPU), we pick one configuration and use 4 cores CPU of i74820K [email protected] to continue the HMC simulation with EOFA and RHMC respectively, and accumulate 5 trajectories in each case. With the length of the HMC trajectory equal to one, four different time scales are set to , and the fields are updated according to the following assignment:
Then the smallest time interval in the molecular dynamic is , and the numbers of momentum updates for are respectively, according to the Omelyan integrator.
With the statistics of five trajectories (all accepted), the average time (seconds) for generating one HMC trajectory (after thermalization) is listed below.
EOFA  RHMC  

93241(290)  119445(408)  
143099(833)  172569(588) 
These results suggest that EOFA outperforms RHMC for and QCD with the conventional DWF.
4 Conclusion
In this paper, we compare the performances of EOFA and RHMC, for and QCD with DWF, on the and lattices respectively. Our results suggest that EOFA outperforms RHMC, no matter in terms of the computational efficiency or the memory requirement. This makes EOFA a better choice for dynamical simulations of lattice QCD with DWF. Currently, TWQCD Collaboration is using EOFA to simulate lattice QCD with quarks on the and lattices, with Nvidia GPUs (GTXTITAN).
Acknowledgments.
This work is supported in part by the Ministry of Science and Technology (No. NSC1022112M002019MY3) and NTUCQSE (Nos. 103R891404).References
 [1] Y. C. Chen and T. W. Chiu [TWQCD Collaboration], Phys. Lett. B 738, 55 (2014)
 [2] M. A. Clark and A. D. Kennedy, Phys. Rev. Lett. 98, 051601 (2007)
 [3] T. W. Chiu, Phys. Rev. Lett. 90, 071601 (2003)
 [4] T. W. Chiu [TWQCD Collaboration], J. Phys. Conf. Ser. 454, 012044 (2013)
 [5] I.P. Omelyan, I.M. Mryglod, and R. Folk, Phys. Rev. Lett. 86, 898 (2001).
 [6] J. C. Sexton and D. H. Weingarten, Nucl. Phys. B 380, 665 (1992).
 [7] M. Hasenbusch, Phys. Lett. B 519, 177 (2001)