Abstract
The translation equivariance of convolutional layers enables convolutional neural networks to generalize well on image problems. While translation equivariance provides a powerful inductive bias for images, we often additionally desire equivariance to other transformations, such as rotations, especially for non-image data. We propose a general method to construct a convolutional layer that is equivariant to transformations from any specified Lie group with a surjective exponential map. Incorporating equivariance to a new group requires implementing only the group exponential and logarithm maps, enabling rapid prototyping. Showcasing the simplicity and generality of our method, we apply the same model architecture to images, ball-and-stick molecular data, and Hamiltonian dynamical systems. For Hamiltonian systems, the equivariance of our models is especially impactful, leading to exact conservation of linear and angular momentum.
Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data
Marc Finzi, Samuel Stanton, Pavel Izmailov, Andrew Gordon Wilson
New York University
Keywords: Equivariance, Point Clouds, Hamiltonian Neural Networks, Lie Groups
1 Introduction
Symmetry pervades the natural world. The same law of gravitation governs a game of catch, the orbits of our planets, and the formation of galaxies. It is precisely because of the order of the universe that we can hope to understand it. Once we started to understand the symmetries inherent in physical laws, we could predict behavior in galaxies billions of light-years away by studying our own local region of time and space. For statistical models to achieve their full potential, it is essential to incorporate our knowledge of naturally occurring symmetries into the design of algorithms and architectures. An example of this principle is the translation equivariance of convolutional layers in neural networks (LeCun et al., 1995): when an input (e.g. an image) is translated, the output of a convolutional layer is translated in the same way.
Group theory provides a mechanism to reason about symmetry and equivariance. Convolutional layers are equivariant to translations, and are a special case of group convolution. A group convolution is a general linear transformation equivariant to a given group, used in group equivariant convolutional networks (Cohen and Welling, 2016a).
In this paper, we develop a general framework for equivariant models on arbitrary continuous (point) data represented as coordinates x_i and values f_i. Point data is a broad category, including ball-and-stick representations of molecules, the coordinates of a dynamical system, and images (shown in Figure 1). When the inputs or group elements lie on a grid (e.g., image data) one can simply enumerate the values of the convolutional kernel at each group element. But in order to extend to continuous data, we define the convolutional kernel as a continuous function on the group parameterized by a neural network.
We consider the large class of continuous groups known as Lie groups. In most cases, Lie groups can be parameterized in terms of a vector space of infinitesimal generators (the Lie algebra) via the logarithm and exponential maps. Many useful transformations are Lie groups, including translations, rotations, and scalings. We propose LieConv, a convolutional layer that can be made equivariant to a given Lie group by defining exp and log maps. We demonstrate the expressivity and generality of LieConv with experiments on images, molecular data, and dynamical systems. We emphasize that we use the same network architecture for all transformation groups and data types. LieConv achieves state-of-the-art performance in these domains, even compared to domain-specific architectures. In short, the main contributions of this work are as follows:

We propose LieConv, a new convolutional layer equivariant to transformations from Lie groups. Models composed with LieConv layers can be applied to arbitrary continuous data.

We evaluate LieConv on the image classification benchmark dataset rotMNIST (Larochelle et al., 2007), and the regression benchmark dataset QM9 (Blum and Reymond, 2009; Rupp et al., 2012). LieConv outperforms state-of-the-art methods on some tasks in QM9, and in all cases achieves competitive results.

We apply LieConv to modeling the Hamiltonian of physical systems, where equivariance corresponds to the preservation of physical quantities (energy, angular momentum, etc.). LieConv outperforms state-of-the-art methods for the modeling of dynamical systems.
We make code available at
https://github.com/mfinzi/LieConv
2 Related Work
One approach to constructing equivariant CNNs, first introduced in Cohen and Welling (2016a), is to use standard convolutional kernels and transform them or the feature maps for each of the elements in the group. For discrete groups this approach leads to exact equivariance and uses the so-called regular representation of the group (Cohen et al., 2019). This approach is easy to implement, and has also been used when the feature maps are vector fields (Zhou et al., 2017; Marcos et al., 2017), and with other representations (Cohen and Welling, 2016b), but only on image data where locations are discrete and the group cardinality is small. This approach has the disadvantage that the computation grows quickly with the size of the group, and some groups like 3D rotations cannot be easily discretized onto a lattice that is also a subgroup.
Another approach, drawing on harmonic analysis, finds a basis of equivariant functions and parametrizes convolutional kernels in that basis (Worrall et al., 2017; Weiler and Cesa, 2019). These kernels can be used to construct networks that are exactly equivariant to continuous groups. While the approach has been applied on general data types like spherical images (Esteves et al., 2018; Cohen et al., 2018; Jiang et al., 2019), voxel data (Weiler et al., 2018), and point clouds (Thomas et al., 2018; Anderson et al., 2019), the requirement of working out the representation theory for the group can be cumbersome and timeconsuming, and is limited to compact groups. Our approach reduces the amount of work to implement equivariance to a new group, encouraging rapid prototyping.
There is also work applying Lie group theory to deep neural networks. Huang et al. (2017) define a network where the intermediate activations are 3D rotations representing skeletal poses, and embed elements into the Lie algebra using the log map. Bekkers (2019) uses B-splines to define convolutional kernels acting on a Lie algebra, which are evaluated on a grid and applied to image problems. However, their method is not readily applicable to point data and cannot be used when the input space is not a homogeneous space of the group. Both of these issues are addressed by our work.
3 Background
3.1 Equivariance
A mapping h is equivariant to a set of transformations S if when we apply any transformation s to the input of h, the output is also transformed by s. The most common example of equivariance in deep learning is the translation equivariance of convolutional layers: if we translate the input image by an integer number of pixels in x and y, the output is also translated by the same amount (ignoring the regions close to the boundary of the image). Formally, if h : A → A, and S is a set of transformations acting on A, we say h is equivariant to S if for all a ∈ A and all s ∈ S,

h(sa) = s h(a). (1)
The continuous convolution of a function f : ℝ → ℝ with a kernel k : ℝ → ℝ is equivariant to translations in the sense that L_t(f ∗ k) = (L_t f) ∗ k, where L_t translates the inputs of the function by t: L_t f(x) = f(x − t).
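As a quick numerical illustration (our sketch, not from the paper), the discrete analogue of this identity can be checked directly: circular convolution commutes with cyclic shifts.

```python
import numpy as np

# Numerical check (our sketch): discrete circular convolution commutes with
# translations, L_t(f * k) = (L_t f) * k.
def circ_conv(f, k):
    # (f * k)[x] = sum_t f[(x - t) mod n] k[t]
    n = len(f)
    return np.array([sum(f[(x - t) % n] * k[t] for t in range(n))
                     for x in range(n)])

rng = np.random.default_rng(0)
f, k = rng.normal(size=8), rng.normal(size=8)
shift = 3
lhs = np.roll(circ_conv(f, k), shift)   # L_t (f * k)
rhs = circ_conv(np.roll(f, shift), k)   # (L_t f) * k
assert np.allclose(lhs, rhs)
```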
It is easy to construct invariant functions, where transformations of the input leave the output unchanged, simply by discarding information. Constructing equivariant functions is harder, but necessary: strict invariance unnecessarily limits the expressive power of a model by discarding relevant information.
3.2 Groups of Transformations and Lie Groups
Many important sets of transformations form a group. To form a group the set must be closed under composition, include an identity transformation, each element must have an inverse, and composition must be associative. The set of 2D rotations, SO(2), is a simple and instructive example. Composing two rotations R_{θ1} and R_{θ2} yields another rotation R = R_{θ2} R_{θ1}. There exists an identity Id that maps every point in ℝ² to itself (i.e., rotation by a zero angle). And for every rotation R_θ, there exists an inverse rotation R_{−θ} such that R_θ R_{−θ} = R_{−θ} R_θ = Id. Finally, the composition of rotations is an associative operation: (R_{θ1} R_{θ2}) R_{θ3} = R_{θ1} (R_{θ2} R_{θ3}). So we see that SO(2) is indeed a group.
We can also adopt a more familiar view of SO(2) in terms of angles θ ∈ ℝ, where a rotation matrix is parametrized as R(θ) = exp(Jθ). Here J = [0 −1; 1 0] is an antisymmetric matrix (an infinitesimal generator of the group), and exp is the matrix exponential. Note that θ is totally unconstrained. Using θ we can add and subtract rotations: given R(θ1) and R(θ2), we can compute R(θ2)^{-1} R(θ1) = exp(−Jθ2) exp(Jθ1) = exp(J(θ1 − θ2)). Jθ is an example of the Lie algebra parametrization of the group, and SO(2) forms a Lie group.
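This parametrization can be verified numerically. The snippet below (our sketch, not the paper's code) implements the closed-form exp and log for SO(2) and checks that composing rotations adds angles in the Lie algebra.

```python
import numpy as np

# SO(2) Lie-algebra parametrization sketch: R(theta) = exp(J*theta) with
# generator J = [[0, -1], [1, 0]]; log recovers theta via atan2.
def exp_so2(theta):
    # closed form of the matrix exponential of J*theta
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def log_so2(R):
    # returns theta in (-pi, pi]
    return np.arctan2(R[1, 0], R[0, 0])

t1, t2 = 0.7, -0.4
R = exp_so2(t2).T @ exp_so2(t1)         # R(t2)^{-1} R(t1)
assert np.isclose(log_so2(R), t1 - t2)  # composition subtracts angles
```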
More generally, a Lie group G is a group whose elements form a smooth manifold. Since G is not necessarily a vector space, we cannot add or subtract group elements. However, the Lie algebra of G, the tangent space at the identity, 𝔤 = T_id G, is a vector space and can be understood informally as a space of infinitesimal transformations from the group. As a vector space, one can readily expand elements in a basis and use the components for calculations. The Lie bracket between two elements A, B ∈ 𝔤, which for matrix groups is [A, B] = AB − BA, measures the extent to which the infinitesimal transformations fail to commute.
The exponential map exp : 𝔤 → G gives a mapping from the Lie algebra to the Lie group, converting infinitesimal transformations to group elements. In many cases, the image of the exponential map covers the group, and an inverse mapping log : G → 𝔤 can be defined. For matrix groups the exp map coincides with the matrix exponential (exp(A) = I + A + A²/2! + ⋯), and the log map with the matrix logarithm. Matrix groups are particularly amenable to our method because in many cases the exp and log maps can be computed in closed form. For example, there are analytic solutions for the translation group T(d), the rotation group SO(d), the translation and rotation group SE(d) for d = 2, 3, the rotation-scale group, and many others (Eade, 2014). In the event that an analytic solution is not available there are reliable numerical methods at our disposal (Moler and Van Loan, 2003).
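For matrix groups without convenient closed forms, generic numerical routines suffice. A minimal sketch (our illustration) using SciPy's matrix exponential and logarithm on SO(3):

```python
import numpy as np
from scipy.linalg import expm, logm

# Hedged sketch: for matrix Lie groups, exp and log are the matrix exponential
# and logarithm. Round trip exp(log(u)) = u for an SO(3) element built from a
# random skew-symmetric generator (an element of the Lie algebra so(3)).
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3)) * 0.3
skew = A - A.T                          # Lie-algebra element
u = expm(skew)                          # group element in SO(3)
assert np.allclose(u @ u.T, np.eye(3), atol=1e-8)     # u is orthogonal
assert np.allclose(expm(logm(u).real), u, atol=1e-6)  # exp(log(u)) = u
```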
3.3 Group Convolutions
Adopting the convention of left equivariance, one can define a group convolution between two functions on the group, which generalizes the translation equivariance of convolution to other groups:
Definition 1.
Let f, k : G → ℝ and let μ(·) be the Haar measure on G. For any u ∈ G, the convolution of f and k on G at u is given by

(f ∗ k)(u) = ∫_G k(v^{-1} u) f(v) dμ(v). (2)
3.4 PointConv Trick
In order to extend learnable convolution layers to point clouds, which lack the regular grid structure of images, Dai et al. (2017), Simonovsky and Komodakis (2017), and Wu et al. (2019) go back to the continuous definition of a convolution for a single channel between a learned function (convolutional filter) k_θ and an input feature map f, yielding the function h,

h(x) = ∫ k_θ(x − y) f(y) dy. (3)
We approximate the integral using a discretization:

h(x_i) = (V/n) Σ_j k_θ(x_i − x_j) f(x_j). (4)
Here V is the volume of the space integrated over and n is the number of quadrature points. In a convolutional layer for images, where points fall on a uniform square grid, the filter k_θ has independent parameters for each of the input offsets x_i − x_j. In order to accommodate points that are not on a regular grid, k_θ can be parametrized as a small neural network mapping input offsets to filter matrices, as explored with MLPs in Simonovsky and Komodakis (2017). The compute and memory costs have severely limited this approach: for typical CIFAR-10 images, evaluating a single layer requires computing billions of values of k_θ.
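A minimal numerical sketch of Eq. (4) with an MLP-parametrized filter (our toy example; weights and sizes are arbitrary) also makes concrete why the offset-based parametrization is translation equivariant:

```python
import numpy as np

# Sketch of Eq. (4): h(x_i) ≈ (V/n) sum_j k_theta(x_i - x_j) f(x_j), where
# k_theta maps a coordinate offset to a c_out x c_in filter matrix.
rng = np.random.default_rng(0)
c_in, c_out, hidden = 3, 4, 16
W1 = rng.normal(size=(hidden, 2)) * 0.1        # offsets live in R^2
W2 = rng.normal(size=(c_out * c_in, hidden)) * 0.1

def k_theta(offset):
    # tiny 2-layer MLP standing in for the learned filter generator
    h = np.tanh(W1 @ offset)
    return (W2 @ h).reshape(c_out, c_in)

def pointconv(xs, fs, volume=1.0):
    n = len(xs)
    return np.stack([(volume / n) * sum(k_theta(xi - xj) @ fj
                                        for xj, fj in zip(xs, fs))
                     for xi in xs])

xs = rng.normal(size=(10, 2))
fs = rng.normal(size=(10, c_in))
out = pointconv(xs, fs)
assert out.shape == (10, c_out)

# Translating all coordinates leaves outputs unchanged at corresponding
# points, since k_theta depends only on offsets.
assert np.allclose(out, pointconv(xs + np.array([5.0, -2.0]), fs))
```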
In PointConv, Wu et al. (2019) develop a trick where clever reordering of the computation cuts memory and computational requirements by orders of magnitude, allowing them to scale to the point cloud classification and segmentation datasets ModelNet40 and ShapeNet, and the image dataset CIFAR-10. We review and generalize the Efficient-PointConv trick in Appendix A.1, and we use it to accelerate our method.
4 Convolutional Layers on Lie Groups
We now introduce LieConv, a new convolutional layer that can be made equivariant to a given Lie group. Models with LieConv layers can act on arbitrary collections of coordinates and values {(x_i, f_i)}, for x_i ∈ X and f_i ∈ V, where V is a vector space. This includes point cloud data (a collection of points forming an object in ℝ³), featurized ball-and-stick molecular data, the spatial arrangement of a multibody system, and image data.
We begin with a high-level overview of the method. In Section 4.1 we discuss transforming raw inputs into group elements on which we can perform group convolution. We refer to this process as lifting. Section 4.2 addresses the irregular and varied arrangements of group elements that result from lifting arbitrary continuous input data by parametrizing the convolutional kernel as a neural network. In Section 4.3, we show how to enforce the locality of the kernel by defining an invariant distance on the group. In Section 4.4, we define a Monte Carlo estimator for the group convolution integral in Eq. (2) and show that this estimator is equivariant in distribution. In Section 4.5, we extend the procedure to cases where the group does not cover the input space (i.e., when we cannot map any point to any other point with a transformation from the group). Additionally, in Appendix A.2, we show that our method generalizes coordinate transform equivariance when G is Abelian. At the end of Section 4.5 we provide a concise algorithmic description of the lifting procedure and our new convolution layer.
4.1 Lifting from X to G
If X is a homogeneous space of G, then every two elements in X are connected by an element in G, and one can lift elements x ∈ X by simply picking an origin o and defining Lift(x) = {u ∈ G : uo = x}: all elements in the group that map the origin to x. This procedure enables lifting tuples of coordinates and features {(x_i, f_i)} to tuples of group elements and features {(u_ik, f_i)}, with up to K group elements for each input.^{1} To find all the elements of Lift(x), one simply needs to find one element u_x and use the elements of H, the stabilizer of the origin o, to generate the rest with u_x H. For continuous groups the stabilizer may be infinite, and in these cases we sample from u_x H uniformly using the Haar measure, which is described in Appendix B.2. We visualize the lifting procedure for different groups in Figure 2. (^{1}Lifting in this way is equivalent to defining f↑(u) = f(uo), as in Kondor and Trivedi (2018).)
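For concreteness, a sketch of lifting for X = ℝ² and G = SE(2) (our illustration): here the stabilizer of the origin is SO(2), so we sample rotation angles uniformly.

```python
import numpy as np

# Hedged sketch of lifting R^2 points to SE(2): every group element mapping
# the origin to x has the form u = Trans(x) Rot(theta); we sample theta
# uniformly -- the Haar measure on the SO(2) stabilizer of the origin.
def lift_se2(x, n_samples, rng):
    thetas = rng.uniform(0, 2 * np.pi, size=n_samples)
    lifts = []
    for t in thetas:
        u = np.eye(3)                        # homogeneous-coordinate matrix
        u[:2, :2] = [[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]]
        u[:2, 2] = x
        lifts.append(u)
    return lifts

rng = np.random.default_rng(0)
x = np.array([2.0, -1.0])
origin = np.array([0.0, 0.0, 1.0])
for u in lift_se2(x, 4, rng):
    assert np.allclose((u @ origin)[:2], x)  # every lift maps the origin to x
```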
4.2 Parameterization of the Kernel
The conventional method for implementing an equivariant convolutional network (Cohen and Welling, 2016a) requires enumerating the values of the kernel over the elements of the group, with separate parameters for each element. This procedure is infeasible for irregularly sampled data and problematic even for a discretization because there is no generalization between different group elements. Instead of having a discrete mapping from each group element to the kernel values, we parametrize the convolutional kernel as a continuous function k_θ using a fully connected neural network with Swish activations, varying smoothly over the elements in the Lie group.
However, as neural networks are best suited to learning on Euclidean data and G does not form a vector space, we propose to model k by mapping onto the Lie algebra 𝔤, which is a vector space, and expanding in a basis for the space. To do so, we restrict our attention in this paper to Lie groups whose exponential maps are surjective, where every element has a logarithm. This means defining k(u) = (k̃ ∘ log)(u), where k̃ is a function parametrized by an MLP, mapping 𝔤 to filter matrices. Surjectivity of the exp map guarantees that exp ∘ log = id, although not in the other order.
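A minimal sketch of this parametrization for G = SO(2) (our toy network, standing in for the MLP k̃):

```python
import numpy as np

# Sketch of k = (k_tilde ∘ log) for G = SO(2): map a group element to its
# Lie-algebra coordinates, then feed those Euclidean coordinates to a network.
rng = np.random.default_rng(2)
W1 = rng.normal(size=(8, 1)) * 0.1
W2 = rng.normal(size=(1, 8)) * 0.1

def algebra_coords_so2(R):
    # so(2) is one-dimensional: log(R) = J*theta, so theta identifies log(R)
    return np.array([np.arctan2(R[1, 0], R[0, 0])])

def k(R):
    # kernel value varies smoothly with the group element through log
    return (W2 @ np.tanh(W1 @ algebra_coords_so2(R))).item()

def rot(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = 0.3
assert abs(k(rot(theta + 1e-4)) - k(rot(theta))) < 1e-3  # smooth in theta
```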
4.3 Enforcing Locality
Important both to the inductive biases of convolutional neural networks and to their computational efficiency is the fact that convolutional filters are local: k_θ(x_i − x_j) = 0 for ‖x_i − x_j‖ > r. In order to quantify locality on matrix groups, we introduce the function:
d(u, v) := ‖log(u^{-1} v)‖_F, (5)
where log is the matrix logarithm and ‖·‖_F is the Frobenius norm. The function d is left invariant, since d(wu, wv) = ‖log((wu)^{-1} wv)‖_F = ‖log(u^{-1} v)‖_F = d(u, v), and it is a semi-metric (it does not necessarily satisfy the triangle inequality). In Appendix A.3 we show the conditions under which d is additionally the distance along the geodesic connecting u and v for a left invariant metric tensor (satisfied for all groups we use except SE(d)), a generalization of the well-known formula for the geodesic distance between rotations (Kuffner, 2004).
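Left invariance of Eq. (5) is easy to verify numerically (our sketch, using SciPy for the matrix logarithm):

```python
import numpy as np
from scipy.linalg import expm, logm

# Numerical check of Eq. (5): d(u, v) = ||log(u^{-1} v)||_F is left
# invariant, d(wu, wv) = d(u, v), demonstrated here on SO(3).
def rand_rotation(rng):
    A = rng.normal(size=(3, 3)) * 0.3
    return expm(A - A.T)              # exp of a skew matrix is a rotation

def d(u, v):
    return np.linalg.norm(logm(np.linalg.inv(u) @ v).real, 'fro')

rng = np.random.default_rng(3)
u, v, w = rand_rotation(rng), rand_rotation(rng), rand_rotation(rng)
assert np.isclose(d(u, v), d(w @ u, w @ v), atol=1e-6)
```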
To enforce that our learned convolutional filter k is local, we can use our definition of distance to only evaluate the sum for d(u, v) < r, implicitly setting k(v^{-1} u) = 0 outside a local neighborhood nbhd(u) = {v : d(u, v) ≤ r},

h(u) = ∫_{nbhd(u)} k(v^{-1} u) f(v) dμ(v). (6)
This restriction to a local neighborhood does not break equivariance precisely because d is left invariant: the restriction is equivalent to multiplying by the indicator function 1[d(u, v) ≤ r], which depends only on v^{-1} u. Note that equivariance would have been broken if we had used neighborhoods that depend on fixed regions in the input space, such as a fixed square region. Figure 3 shows what these neighborhoods look like in terms of the input space.
4.4 Discretization of the Integral
Assuming that we have a collection of quadrature points {v_j} as input, along with the values f_j = f(v_j) of the function at these points, we can judiciously choose to evaluate the convolution at another set of group elements {u_i}, so as to have a set of quadrature points to approximate an integral in a subsequent layer. Because we have restricted the integral (6) to the compact neighborhood nbhd(u), we can define a proper sampling distribution μ|_{nbhd(u)} to estimate the integral, unlike for the possibly unbounded G. Computing the outputs only at these target points, we use the Monte Carlo estimator for (6) as

h(u_i) = (1/n_i) Σ_{j ∈ nbhd(i)} k(v_j^{-1} u_i) f(v_j), (7)

where n_i = |nbhd(i)| is the number of points in each neighborhood.
Proposition 1. For v_j ∼ μ|_{nbhd(i)}, the Monte Carlo estimator (7) is equivariant (in distribution).
Proof: Recalling that we can absorb the local neighborhood into the definition of k using an indicator function, we have

h(L_w f)(u_i) = (1/n_i) Σ_j k(v_j^{-1} u_i) f(w^{-1} v_j)
= (1/n_i) Σ_j k((w ṽ_j)^{-1} u_i) f(ṽ_j)
=_d (1/n_i) Σ_j k(v_j^{-1} (w^{-1} u_i)) f(v_j) = (L_w h(f))(u_i).

Here ṽ_j = w^{-1} v_j, and the last line follows from the fact that the random variables ṽ_j and v_j are equal in distribution because they are sampled from the Haar measure μ, which has the property dμ(wv) = dμ(v). Now that we have the discretization h_i = (1/n_i) Σ_{j ∈ nbhd(i)} k(v_j^{-1} u_i) f_j, we can accelerate this computation using the Efficient-PointConv trick, with log(v_j^{-1} u_i) as the argument for the MLP. See Appendix A.1 for more details. Note that we can also apply this discretization of the convolution when the inputs are not samples of an underlying function, but simply a collection of coordinates and values, and the mapping is still equivariant.
We also detail two methods for equivariantly subsampling the elements to further reduce the cost in Appendix A.4.
4.5 More Than One Orbit?
In this paper, we consider groups both large and small, and we require the ability to enable or disable equivariances such as translations. To achieve this functionality, we need to go beyond the usual setting of homogeneous spaces considered in the literature, where every pair of elements in X is related by an element in G. Instead, we consider the quotient space Q = X/G, consisting of the distinct orbits of G in X (visualized in Figure 4).^{2} Each of these orbits q ∈ Q is a homogeneous space of the group, and when X is a homogeneous space of G there is only a single orbit. But in general, there will be many distinct orbits, and lifting should not lose track of which orbit each point is on. (^{2}When X is a homogeneous space, the quantity of interest is instead the quotient with the stabilizer of the origin, G/H, which has been examined extensively in the literature. Here we are concerned with the separate quotient space Q = X/G, relevant when X is not a homogeneous space.)
Since the most general equivariant mappings will need to use and preserve this information, throughout the network the space of elements should not be G but rather G × Q, and x ∈ X is lifted to the tuples (u, q) for u ∈ G and q ∈ Q. This mapping may be one-to-one or one-to-many depending on the size of the stabilizer H, but will preserve the information in x, as x = u o_q, where o_q is the chosen origin for each orbit. In general, equivariant linear transforms will depend on both the input and output orbit, and equivariance only constrains the dependence on group elements and not the orbits.
When the space of orbits Q is continuous we can write the equivariant integral transform as

h(u, q) = ∫_{G, Q} k(v^{-1} u, q, q′) f(v, q′) dμ(v) dq′. (8)
When G is the trivial group {e}, this equation simplifies to the integral transform h(x) = ∫ k(x, x′) f(x′) dx′, where each element in X is in its own orbit.
In general, even if X is a smooth manifold and G is a Lie group, it is not guaranteed that X/G is a manifold (Kono and Ishitoya, 1987). However, in practice this is not an issue, as we only care about the discretization, which we can always perform. All we need is an invertible way of embedding the orbit information into a vector space to be fed into k̃. One option is to use an embedding of the orbit origin o_q, or simply to find enough invariants of the group to identify the orbit. To give a few examples:

1. X = ℝ^d and G = SO(d): the orbits are spheres, identified by the radius, Embed(q(x)) = ‖x‖.

2. X = ℝ^d and G = T(k) for k < d: the orbits are k-dimensional hyperplanes, identified by the remaining coordinates, Embed(q(x)) = (x_{k+1}, …, x_d).

3. X = ℝ and G = ℝ^{>0} (scaling): the orbits are {x < 0}, {0}, and {x > 0}, identified by Embed(q(x)) = sign(x).
Discretizing (8) as we did in (7), we get

h_i = (1/n_i) Σ_{j ∈ nbhd(i)} k(v_j^{-1} u_i, q_i, q_j) f_j, (9)
which again can be accelerated with the Efficient-PointConv trick by feeding the concatenation of log(v_j^{-1} u_i) and the orbit embeddings as input to the MLP. If we want the filter to be local over orbits as well, we can extend the distance to include orbit information; this extended distance need not be invariant to transformations on q. To the best of our knowledge, we are the first to systematically address equivariances of this kind, where X is not a homogeneous space of G.
To recap, Algorithms 1 and 2 give a concise overview of our lifting procedure and our new convolution layer respectively. Please consult Appendix B.1 for additional implementation details.
Baseline Methods  LieConv (Ours)

G-CNN  H-NET  ORN  TI-Pooling  RotEqNet  E(2)-Steerable  Trivial  T(2)  SO(2)  SO(2)×ℝ*  SE(2)
2.28  1.69  1.54  1.2  1.09  0.77  1.57  1.50  1.40  1.33  1.39

Table 1: Test error (%) on RotMNIST for baseline methods and for LieConv with different groups.
Task  α  Δε  ε_HOMO  ε_LUMO  μ  C_v  G  H  ⟨R²⟩  U  U₀  ZPVE

Units  bohr³  meV  meV  meV  D  cal/mol K  meV  meV  bohr²  meV  meV  meV
NMP  .092  69  43  38  .030  .040  19  17  .180  20  20  1.500
SchNet  .235  63  41  34  .033  .033  14  14  .073  19  14  1.700
Cormorant  .085  61  34  38  .038  .026  20  21  .961  21  22  2.027
LieConv(T3)  .084  49  30  25  .032  .038  22  24  .800  19  19  2.280

Table 2: Mean absolute error on the QM9 molecular property regression tasks.
5 Applications to Image and Molecular Data
First, we evaluate LieConv on two types of problems: classification on image data and regression on molecular data. With LieConv as the convolution layers, we implement a bottleneck ResNet architecture with a final global pooling layer (Figure 5). For a detailed architecture description, see Appendix B.3. We use the same model architecture for all tasks and achieve performance competitive with task-specific specialized methods.
5.1 Image Equivariance Benchmark
The RotMNIST dataset consists of 12k randomly rotated MNIST digits, with rotation angles sampled uniformly from [0, 2π]. This commonly used dataset has been a standard benchmark for equivariant CNNs focused on image data. To apply LieConv to image data we interpret each input image as a collection of points on ℝ² with associated binary values, {(x_i, f_i)}, to which we apply a circular center crop. We note that LieConv primarily targets generic continuous data, and more practical equivariant methods exist specifically for images (e.g. Weiler and Cesa (2019)). However, as we demonstrate in Table 1, we are able to easily incorporate equivariance to different groups without any changes to the method or the architecture of the network, while achieving performance competitive with methods that are not applicable beyond image data.
5.2 Molecular Data
Now we apply LieConv to the QM9 molecular property learning task (Wu et al., 2018). The QM9 regression dataset consists of small organic molecules, encoded as a collection of 3D spatial coordinates for each of the atoms together with their atomic charges. The labels consist of various properties of the molecules, such as heat capacity. This is a challenging task, as there is no canonical origin or orientation for each molecule, and the target distribution is invariant to E(3) (translation, rotation, and reflection) transformations of the coordinates. Successful models must generalize across different spatial locations and orientations.
We first perform an ablation study on the HOMO task of predicting the energy of the highest occupied molecular orbital of each molecule. We apply LieConv with different equivariance groups, combined with SO(3) data augmentation. The results are reported in Table 3. Of the groups, our T(3) network performs the best. We then apply T(3)-equivariant LieConv layers to the full range of tasks in the QM9 dataset and report the results in Table 2. We perform competitively with state-of-the-art methods (Gilmer et al., 2017; Schütt et al., 2018; Anderson et al., 2019), with the lowest MAE on several of the tasks.
SchNet  Trivial+aug  T(3)+aug  SE(3)+aug

41  33.9  29.6  44.9

Table 3: Test MAE (meV) on the HOMO task for LieConv with different groups.
6 Modeling Dynamical Systems
Accurate transition models for macroscopic physical systems are critical components in control systems (Lenz et al., 2015; Kamthe and Deisenroth, 2017; Chua et al., 2018) and data-efficient reinforcement learning algorithms (Nagabandi et al., 2018; Janner et al., 2019). In this section we show how incorporating equivariance into Hamiltonian dynamics models guarantees that our model system exhibits characteristics like conservation of angular momentum.
6.1 Predicting Trajectories with Hamiltonian Mechanics
For dynamical systems, the equations of motion can be written in terms of the state z and time t: ż = F(z, t). Many physically occurring systems have Hamiltonian structure, meaning that the state can be split into generalized coordinates and momenta, z = (q, p), and the dynamics can be written as

dq/dt = ∂H/∂p,  dp/dt = −∂H/∂q (10)

for some choice of scalar Hamiltonian H(q, p). H is often the total energy of the system, and can sometimes be split into kinetic and potential energy terms, H(q, p) = K(p) + V(q). The dynamics can also be written compactly as ż = J∇H for J = [0 I; −I 0].
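As a concrete sketch (ours, for a 1D mass-spring system rather than the paper's experiments), integrating ż = J∇H with a generic solver approximately conserves H:

```python
import numpy as np

# Hedged sketch of Eq. (10): H(q, p) = p^2/(2m) + k q^2/2, integrated with
# RK4; energy drift stays small for a reasonable step size.
m, k_spring = 1.0, 2.0

def H(z):
    q, p = z
    return p**2 / (2 * m) + k_spring * q**2 / 2

def grad_H(z):
    q, p = z
    return np.array([k_spring * q, p / m])   # (dH/dq, dH/dp)

J = np.array([[0.0, 1.0], [-1.0, 0.0]])      # symplectic form, z' = J grad H

def rk4_step(z, dt):
    f = lambda s: J @ grad_H(s)
    k1 = f(z); k2 = f(z + dt/2 * k1); k3 = f(z + dt/2 * k2); k4 = f(z + dt * k3)
    return z + dt/6 * (k1 + 2*k2 + 2*k3 + k4)

z = np.array([1.0, 0.0])
E0 = H(z)
for _ in range(1000):
    z = rk4_step(z, 0.01)
assert abs(H(z) - E0) < 1e-5   # energy approximately conserved
```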
As shown in Greydanus et al. (2019), a neural network parametrizing the Hamiltonian can be learned directly from trajectory data, providing substantial benefits in generalization over directly modeling F. We follow the approach of Sanchez-Gonzalez et al. (2019) and Zhong et al. (2019). Given an initial condition z_0 and times t_1, …, t_T, we employ a twice-differentiable model architecture and a differentiable ODE solver (Chen et al., 2018) to compute predicted states ẑ_1, …, ẑ_T. The parameters θ of the Hamiltonian model can be trained directly through the loss,

L(θ) = (1/T) Σ_{t=1}^{T} ‖ẑ_t − z_t‖². (11)
6.2 Exact Conservation of Momentum
While equivariance is broadly useful as an inductive bias, it has a very special implication for the modeling of Hamiltonian systems. Noether's Hamiltonian theorem states that for each continuous symmetry of the Hamiltonian of a dynamical system there exists a corresponding conserved quantity (Noether, 1971; Butterfield, 2006). Symmetry with respect to the continuous transformations of translation and rotation leads directly to conservation of the total linear and angular momentum of the system, an extremely valuable property for modeling its dynamics. See Appendix A.5 for a primer on Hamiltonian symmetries, Noether's theorem, and the implications in the current setting.
As shown in Section 4, we can construct models that are equivariant to a large variety of continuous Lie group symmetries, and thus exactly conserve associated quantities like linear and angular momentum. Figure 7 shows that using LieConv layers with a given T(2) and/or SO(2) symmetry, the model trajectories conserve linear and/or angular momentum with relative error close to machine epsilon, determined by the integrator tolerance. Note that this approach for enforcing conserved quantities would be infeasible with the approach from Cohen and Welling (2016a), not just because the data is not gridded and derivatives with respect to the input positions cannot be computed, but also because there is no corresponding Noether conservation law for discrete symmetry groups.
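The Noether correspondence can be checked numerically on a toy system (our sketch, not the paper's setup): a Hamiltonian that depends on positions only through their differences is translation invariant, and the resulting dynamics conserve total linear momentum exactly.

```python
import numpy as np

# Hedged sketch of Noether's theorem in this setting: two unit masses on a
# line joined by a spring. H depends on positions only through q1 - q2
# (translation invariant), so total momentum p1 + p2 is conserved by the
# dynamics q' = dH/dp, p' = -dH/dq.
k_spring = 3.0

def dynamics(z):
    q1, q2, p1, p2 = z
    dV = k_spring * (q1 - q2)          # dH/dq1 = -dH/dq2 = dV
    return np.array([p1, p2, -dV, dV])

def rk4_step(z, dt):
    k1 = dynamics(z); k2 = dynamics(z + dt/2 * k1)
    k3 = dynamics(z + dt/2 * k2); k4 = dynamics(z + dt * k3)
    return z + dt/6 * (k1 + 2*k2 + 2*k3 + k4)

z = np.array([0.0, 1.5, 0.2, -0.7])
P0 = z[2] + z[3]
for _ in range(500):
    z = rk4_step(z, 0.01)
assert np.isclose(z[2] + z[3], P0, atol=1e-10)  # momentum conserved exactly
```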
6.3 Results
For evaluation, we compare fully-connected (FC) feedforward networks, ODE graph networks (OGN) (Battaglia et al., 2016), and our own LieConv architecture on a canonical physical system prediction task, the spring problem. Figure 6 presents a visualization of the task, and our quantitative results are presented in Figure 7. In the spring problem, bodies with masses m_i interact through pairwise spring forces with constants k_ij. The system preserves energy, linear momentum, and angular momentum. The behavior of the system is complex and sensitive to both the values of the system parameters (i.e., the masses and spring constants) and the initial conditions z_0. The dynamics model must learn not only to predict trajectories across a broad range of initial conditions, but also to infer the dependence on varied system parameters, which are treated as additional inputs to the model. We compare models that attempt to learn the dynamics F directly against models that learn the Hamiltonian as described in Section 6.1. Models that learn the Hamiltonian are differentiated by prepending 'H' to the name (e.g. OGN vs. HOGN, LieConv vs. HLieConv).
In Figure 7(a) we show that by changing the invariance of our Hamiltonian models, we have direct control over the conservation of linear and angular momentum in the predicted trajectories. Figure 7(b) demonstrates that our method outperforms HOGN, a state-of-the-art architecture for dynamics problems, and achieves a significant improvement over the naïve fully-connected (FC) model. In Figure 7(c), we demonstrate that conservation requires both symmetry and modeling the Hamiltonian; an equivariant model alone is not enough. We summarize the various models and their symmetries in Table 4.
FC  
OGN  
HOGN  ★  
LieConvT(2)  ✪  
HLieConvTrivial  
HLieConvT(2)  ✪  
HLieConvSO(2)  ✪  
HLieConvSO(2)*  ★  ✪ 
Because the position vectors are mean-centered in the model forward pass, HOGN and HLieConvSO(2)* have additional T(2) invariance, yielding SE(2) invariance for HLieConvSO(2)*. We also experimented with an HLieConvSE(2) equivariant model, but found that the exponential map for SE(2) (involving Taylor expansions and masking) was not numerically stable enough for second derivatives, which are required for optimizing through the Hamiltonian dynamics. Layer equivariance is preferable, both because it avoids prematurely discarding useful information and for better modeling performance, but invariance alone is sufficient for the conservation laws. Additionally, since we know a priori that the spring problem has Euclidean coordinates, we need not model the kinetic energy K and can instead focus on modeling the potential V. We observe that this additional inductive bias of Euclidean coordinates improves model performance.
Finally, in Figure 8 we evaluate test MSE of the different models over a range of training dataset sizes, highlighting the additive improvements in generalization from the Hamiltonian, GraphNetwork, and equivariance inductive biases successively.
7 Discussion
We presented a general-purpose convolutional layer that can be made equivariant to transformations from a Lie group with a surjective exponential map. While the image, molecular, and dynamics experiments demonstrate the generality of our method, there are many exciting application domains and directions for future work. In Table 2, we believe the performance gap between LieConv-SE(3) and LieConv-T(3) could arise from the variance of the Monte Carlo estimate of Eq. (6). Approximating the integral with points selected from a low-discrepancy sequence may improve the convergence when the dimension of the group is large. Additionally, we believe that it will be possible to benefit from the inductive biases of HLieConv models even for systems that do not exactly preserve energy or momentum, such as those found in control systems and reinforcement learning.
The success of convolutional neural networks on images has highlighted the power of encoding symmetries in models for learning from raw sensory data. But the variety and complexity of other modalities of data is a significant challenge in further developing this approach. More general data may not be on a grid, it may possess other kinds of symmetries, or it may contain quantities that cannot be easily combined. We believe that central to solving this problem is a decoupling of convenient computational representations of data as dense arrays with generically composable vectors. We hope to move towards models that can ‘see’ molecules, dynamical systems, multiscale objects, heterogeneous measurements, and higher mathematical objects, in the way that convolutional neural networks perceive images.
References
 Anderson et al. [2019] Brandon Anderson, Truong Son Hy, and Risi Kondor. Cormorant: Covariant molecular neural networks. In Advances in Neural Information Processing Systems, pages 14510–14519, 2019.
 Battaglia et al. [2016] Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, et al. Interaction networks for learning about objects, relations and physics. In Advances in neural information processing systems, pages 4502–4510, 2016.
 Bekkers [2019] Erik J Bekkers. B-spline CNNs on Lie groups. arXiv preprint arXiv:1909.12057, 2019.
 Blum and Reymond [2009] L. C. Blum and J.-L. Reymond. 970 million drug-like small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc., 131:8732, 2009.
 Butterfield [2006] Jeremy Butterfield. On symmetry and conserved quantities in classical mechanics. In Physical theory and its interpretation, pages 43–100. Springer, 2006.
 Chen et al. [2018] Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. In Advances in neural information processing systems, pages 6571–6583, 2018.
 Chua et al. [2018] Kurtland Chua, Roberto Calandra, Rowan McAllister, and Sergey Levine. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In Advances in Neural Information Processing Systems, pages 4754–4765, 2018.
 Cohen and Welling [2016a] Taco Cohen and Max Welling. Group equivariant convolutional networks. In International conference on machine learning, pages 2990–2999, 2016a.
 Cohen and Welling [2016b] Taco S Cohen and Max Welling. Steerable CNNs. arXiv preprint arXiv:1612.08498, 2016b.
 Cohen et al. [2018] Taco S Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical CNNs. arXiv preprint arXiv:1801.10130, 2018.
 Cohen et al. [2019] Taco S Cohen, Mario Geiger, and Maurice Weiler. A general theory of equivariant cnns on homogeneous spaces. In Advances in Neural Information Processing Systems, pages 9142–9153, 2019.
 Dai et al. [2017] Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. Deformable convolutional networks. In Proceedings of the IEEE international conference on computer vision, pages 764–773, 2017.
 Eade [2014] Ethan Eade. Lie groups for computer vision. Cambridge Univ., Cambridge, UK, Tech. Rep, 2014.
 Esteves et al. [2017] Carlos Esteves, Christine Allen-Blanchette, Xiaowei Zhou, and Kostas Daniilidis. Polar transformer networks. arXiv preprint arXiv:1709.01889, 2017.
 Esteves et al. [2018] Carlos Esteves, Christine Allen-Blanchette, Ameesh Makadia, and Kostas Daniilidis. Learning SO(3) equivariant representations with spherical CNNs. In Proceedings of the European Conference on Computer Vision (ECCV), pages 52–68, 2018.
 Gilmer et al. [2017] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 1263–1272. JMLR.org, 2017.
 Greydanus et al. [2019] Samuel Greydanus, Misko Dzamba, and Jason Yosinski. Hamiltonian neural networks. In Advances in Neural Information Processing Systems, pages 15353–15363, 2019.
 He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
 Huang et al. [2017] Zhiwu Huang, Chengde Wan, Thomas Probst, and Luc Van Gool. Deep learning on Lie groups for skeleton-based action recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 6099–6108, 2017.
 Janner et al. [2019] Michael Janner, Justin Fu, Marvin Zhang, and Sergey Levine. When to trust your model: Model-based policy optimization. arXiv preprint arXiv:1906.08253, 2019.
 Jiang et al. [2019] Chiyu Jiang, Jingwei Huang, Karthik Kashinath, Philip Marcus, Matthias Niessner, et al. Spherical CNNs on unstructured grids. arXiv preprint arXiv:1901.02039, 2019.
 Kamthe and Deisenroth [2017] Sanket Kamthe and Marc Peter Deisenroth. Data-efficient reinforcement learning with probabilistic model predictive control. arXiv preprint arXiv:1706.06491, 2017.
 Kingma and Ba [2014] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 Kondor and Trivedi [2018] Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. arXiv preprint arXiv:1802.03690, 2018.
 Kono and Ishitoya [1987] Akira Kono and Kiminao Ishitoya. Squaring operations in mod 2 cohomology of quotients of compact Lie groups by maximal tori. In Algebraic Topology Barcelona 1986, pages 192–206. Springer, 1987.
 Kuffner [2004] James J Kuffner. Effective sampling and distance metrics for 3d rigid body path planning. In IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA’04. 2004, volume 4, pages 3993–3998. IEEE, 2004.
 Laptev et al. [2016] Dmitry Laptev, Nikolay Savinov, Joachim M Buhmann, and Marc Pollefeys. TI-pooling: Transformation-invariant pooling for feature learning in convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 289–297, 2016.
 Larochelle et al. [2007] Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th international conference on Machine learning, pages 473–480, 2007.
 LeCun et al. [1995] Yann LeCun, Yoshua Bengio, et al. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 3361(10):1995, 1995.
 Lenz et al. [2015] Ian Lenz, Ross A Knepper, and Ashutosh Saxena. DeepMPC: Learning deep latent features for model predictive control. In Robotics: Science and Systems. Rome, Italy, 2015.
 Marcos et al. [2017] Diego Marcos, Michele Volpi, Nikos Komodakis, and Devis Tuia. Rotation equivariant vector field networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 5048–5057, 2017.
 Moler and Van Loan [2003] Cleve Moler and Charles Van Loan. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM review, 45(1):3–49, 2003.
 Nagabandi et al. [2018] Anusha Nagabandi, Gregory Kahn, Ronald S Fearing, and Sergey Levine. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 7559–7566. IEEE, 2018.
 Noether [1971] Emmy Noether. Invariant variation problems. Transport Theory and Statistical Physics, 1(3):186–207, 1971.
 Ramachandran et al. [2017] Prajit Ramachandran, Barret Zoph, and Quoc V Le. Searching for activation functions. arXiv preprint arXiv:1710.05941, 2017.
 Rupp et al. [2012] M. Rupp, A. Tkatchenko, K.R. Müller, and O. A. von Lilienfeld. Fast and accurate modeling of molecular atomization energies with machine learning. Physical Review Letters, 108:058301, 2012.
 Sanchez-Gonzalez et al. [2019] Alvaro Sanchez-Gonzalez, Victor Bapst, Kyle Cranmer, and Peter Battaglia. Hamiltonian graph networks with ODE integrators. arXiv preprint arXiv:1909.12790, 2019.
 Schütt et al. [2018] Kristof T Schütt, Huziel E Sauceda, P.-J. Kindermans, Alexandre Tkatchenko, and K.-R. Müller. SchNet – a deep learning architecture for molecules and materials. The Journal of Chemical Physics, 148(24):241722, 2018.
 Simonovsky and Komodakis [2017] Martin Simonovsky and Nikos Komodakis. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In CVPR, 2017.
 Thomas et al. [2018] Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv preprint arXiv:1802.08219, 2018.
 Weiler and Cesa [2019] Maurice Weiler and Gabriele Cesa. General E(2)-equivariant steerable CNNs. In Advances in Neural Information Processing Systems, pages 14334–14345, 2019.
 Weiler et al. [2018] Maurice Weiler, Mario Geiger, Max Welling, Wouter Boomsma, and Taco S Cohen. 3D steerable CNNs: Learning rotationally equivariant features in volumetric data. In Advances in Neural Information Processing Systems, pages 10381–10392, 2018.
 Willson [2009] Benjamin Willson. Reiter nets for semidirect products of amenable groups and semigroups. Proceedings of the American Mathematical Society, 137(11):3823–3832, 2009.
 Worrall et al. [2017] Daniel E Worrall, Stephan J Garbin, Daniyar Turmukhambetov, and Gabriel J Brostow. Harmonic networks: Deep translation and rotation equivariance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5028–5037, 2017.
 Wu et al. [2019] Wenxuan Wu, Zhongang Qi, and Li Fuxin. PointConv: Deep convolutional networks on 3D point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9621–9630, 2019.
 Wu et al. [2018] Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: a benchmark for molecular machine learning. Chemical science, 9(2):513–530, 2018.
 Zagoruyko and Komodakis [2016] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
 Zhong et al. [2019] Yaofeng Desmond Zhong, Biswadip Dey, and Amit Chakraborty. Symplectic ODE-Net: Learning Hamiltonian dynamics with control. arXiv preprint arXiv:1909.12077, 2019.
 Zhou et al. [2017] Yanzhao Zhou, Qixiang Ye, Qiang Qiu, and Jianbin Jiao. Oriented response networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 519–528, 2017.
Appendix A Derivations and Additional Methodology
A.1 Generalized PointConv Trick
The matrix notation becomes very cumbersome for manipulating these higher-order arrays, so we will instead use index notation, with Latin indices $i, j$ indexing points, Greek indices $\alpha, \beta, \gamma$ indexing feature channels, and $c$ indexing the coordinate dimensions, of which there are $d$ for PointConv and $k$ for LieConv.³ As the objects are not geometric tensors but simply multidimensional arrays, we will make no distinction between upper and lower indices. After expanding into indices, it should be assumed that all values are scalars, and that any free indices can range over all of their values. (³Here $k$ is the dimension of the space into which $q$, the orbit identifiers, are embedded.)
Let $k^{\alpha\beta}_{ij}$ be the output of the MLP, which takes $a_{ij}$ as input and acts independently over the locations $i, j$. For PointConv the input is $a_{ij} = x_j - x_i$, and for LieConv the input is $a_{ij} = (\log(u_i^{-1}v_j),\, q_i,\, q_j)$.

We wish to compute

$$h^\alpha_i = \sum_{j,\beta} k^{\alpha\beta}_{ij} f^\beta_j. \qquad (12)$$

In Wu et al. [2019], it was observed that since $k^{\alpha\beta}_{ij}$ is the output of an MLP, $k^{\alpha\beta}_{ij} = \sum_\gamma W^{\alpha\beta}{}_\gamma s^\gamma_{ij}$ for some final weight matrix $W$ and penultimate activations $s$ ($s^\gamma_{ij}$ is simply the result of the MLP after the last nonlinearity). With this in mind, we can rewrite (12):

$$h^\alpha_i = \sum_{j,\beta,\gamma} W^{\alpha\beta}{}_\gamma s^\gamma_{ij} f^\beta_j \qquad (13)$$

$$\phantom{h^\alpha_i} = \sum_{\beta,\gamma} W^{\alpha\beta}{}_\gamma \Big(\sum_j s^\gamma_{ij} f^\beta_j\Big). \qquad (14)$$

In practice, the number of intermediate channels $c_{\mathrm{mid}}$ is much less than the product of $c_{\mathrm{in}}$ and $c_{\mathrm{out}}$, $c_{\mathrm{mid}} \ll c_{\mathrm{in}} c_{\mathrm{out}}$, so this reordering of the computation leads to a massive reduction in both memory and compute. Furthermore, $B^{\gamma\beta}_i = \sum_j s^\gamma_{ij} f^\beta_j$ can be implemented with regular matrix multiplication, and so can $\sum_{\beta,\gamma} W^{\alpha\beta}{}_\gamma B^{\gamma\beta}_i$ by flattening $(\beta, \gamma)$ into a single axis $\sigma$: $h^\alpha_i = \sum_\sigma W^{\alpha\sigma} B^\sigma_i$.
The sum over the index $j$ can be restricted to a subset (such as a chosen neighborhood) by computing $s^\gamma_{ij}$ at each of the required indices, padding to the size of the maximum subset with zeros, and computing the contraction over $j$ using dense matrix multiplication. Masking out the values at invalid indices $i$ and $j$ is also necessary when examples with different numbers of points are batched together using zero padding. The generalized PointConv trick can thus be applied in batch mode with a varied number of points per example and a varied number of points per neighborhood.
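The reordering in Eqs. (13)–(14) can be sketched with `numpy.einsum`. This is an illustrative check, not the paper's implementation: the array names (`s`, `W`, `f`) and the channel sizes are our own choices.

```python
import numpy as np

# Naive: h[i,a] = sum_{j,b} k[i,j,a,b] * f[j,b], where the kernel k is the
# output of an MLP with final weights W and penultimate activations s.
rng = np.random.default_rng(0)
n, c_in, c_out, c_mid = 8, 4, 5, 3
s = rng.normal(size=(n, n, c_mid))         # penultimate activations s^gamma_ij
W = rng.normal(size=(c_out, c_in, c_mid))  # final weight matrix W^{alpha beta}_gamma
f = rng.normal(size=(n, c_in))             # input features f^beta_j

# Naive computation: materializes the full n x n x c_out x c_in kernel.
k = np.einsum('abg,ijg->ijab', W, s)
h_naive = np.einsum('ijab,jb->ia', k, f)

# Reordered: first contract over points j (a plain matrix multiply),
# then contract the small W against the n x c_mid x c_in intermediate B.
B = np.einsum('ijg,jb->igb', s, f)         # B^{gamma beta}_i = sum_j s^gamma_ij f^beta_j
h_fast = np.einsum('abg,igb->ia', W, B)

assert np.allclose(h_naive, h_fast)
```

The intermediate `B` costs O(n² c_mid c_in) to form, versus O(n² c_out c_in) for the full kernel, which is the source of the savings when c_mid ≪ c_out.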
A.2 Abelian Groups and Coordinate Transforms
For Abelian groups that cover the input space in a single orbit, the computation is very similar to ordinary Euclidean convolution. Defining $a = \log u$ and $b = \log v$, and using the fact that the group is Abelian, we have $\log(v^{-1}u) = a - b$. Defining $\tilde{f} = f \circ \exp$ and $\tilde{h} = h \circ \exp$, we get

$$\tilde{h}(a) = \int k\big(P(a - b)\big)\,\tilde{f}(b)\,db, \qquad (15)$$

where $P$ projects to the image of the logarithm map. Apart from a projection and a change to logarithmic coordinates, this is equivalent to Euclidean convolution in a vector space with the dimensionality of the group. When the group is Abelian and the input space is a homogeneous space of the group, the dimension of the group equals the dimension of the input. In these cases we have a trivial stabilizer group and a single origin $o$, so we can view $f$ and $h$ as acting on the input space.
This directly generalizes some existing coordinate-transform methods for achieving equivariance from the literature, such as log-polar coordinates for rotation and scaling equivariance Esteves et al. [2017], and hyperbolic coordinates for squeeze and scaling equivariance.
Log-Polar Coordinates: Consider the Abelian Lie group of positive scalings and rotations, $G = \mathbb{R}^+ \times \mathrm{SO}(2)$, acting on $\mathbb{R}^2 \setminus \{0\}$. Elements of the group can be expressed as a matrix

$$u(r, \theta) = \begin{bmatrix} r\cos\theta & -r\sin\theta \\ r\sin\theta & r\cos\theta \end{bmatrix}$$

for $r \in \mathbb{R}^+$ and $\theta \in [0, 2\pi)$. The matrix logarithm is⁴

$$\log u(r, \theta) = \begin{bmatrix} \log r & -\theta \\ \theta & \log r \end{bmatrix},$$

or more compactly $\log(r) I + \theta J$, which is expressed in the basis $\{I, J\}$ for the Lie algebra, where $J = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$. It is clear that the projection $P$ is simply $\bmod\ 2\pi$ on the $\theta$ component. (⁴Here $\theta$ is defined to mean $\theta + 2\pi n$ for the integer $n$ such that the value is in $[-\pi, \pi)$, consistent with the principal matrix logarithm.)
As $\mathbb{R}^2 \setminus \{0\}$ is a homogeneous space of $G$, one can choose the global origin $o = (1, 0)$. A little algebra shows that lifting to the group yields the transformation $x \mapsto u(r, \theta)$ for each point $x$, where $(r, \theta)$ are the polar coordinates of the point $x$. Observe that the logarithm of $u_1^{-1}u_2$ has a simple expression highlighting the fact that it is invariant to scale and rotational transformations of the elements:

$$\log(u_1^{-1}u_2) = \log(r_2/r_1)\, I + (\theta_2 - \theta_1)\, J.$$
Now writing out our Monte Carlo estimation of the integral,

$$h(u_i) = \frac{1}{n}\sum_j k\big(\log(r_i/r_j),\, \theta_i - \theta_j\big)\, f(v_j),$$

which is a discretization of the log-polar convolution from Esteves et al. [2017]. This can be trivially extended to encompass cylindrical coordinates with the group $\mathrm{T}(1) \times \mathbb{R}^+ \times \mathrm{SO}(2)$.
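As a concrete check of the log-polar construction, the sketch below lifts points of $\mathbb{R}^2 \setminus \{0\}$ to log-polar coordinates and verifies that pairwise logarithms are invariant under a global rotation and scaling. The helper names (`lift`, `pairwise_log`) are our own, not from the paper's code.

```python
import numpy as np

def lift(x):
    """Map a point of R^2 \\ {0} to its log-polar coordinates (log r, theta)."""
    r = np.hypot(x[0], x[1])
    theta = np.arctan2(x[1], x[0])
    return np.array([np.log(r), theta])

def pairwise_log(x1, x2):
    """log(u_1^{-1} u_2) in the basis {I, J}, with the angle wrapped to [-pi, pi)."""
    d = lift(x2) - lift(x1)
    d[1] = (d[1] + np.pi) % (2 * np.pi) - np.pi
    return d

x1, x2 = np.array([1.0, 2.0]), np.array([-0.5, 1.5])
# Apply a global rotation by 0.7 rad combined with a scaling by 3.
c, s = np.cos(0.7), np.sin(0.7)
R = 3.0 * np.array([[c, -s], [s, c]])
# The lifted pairwise logarithm is unchanged by the group action.
assert np.allclose(pairwise_log(x1, x2), pairwise_log(R @ x1, R @ x2))
```

A kernel applied to `pairwise_log` values therefore yields outputs equivariant to rotations and scalings, exactly as in the log-polar convolution above.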
Hyperbolic coordinates: For another nontrivial example, consider the group $G$ of scalings and squeezes acting on the positive orthant $\mathbb{R}^2_{>0}$. Elements of the group can be expressed as the product of a squeeze mapping and a scaling,

$$u(a, b) = \begin{bmatrix} a & 0 \\ 0 & 1/a \end{bmatrix}\begin{bmatrix} b & 0 \\ 0 & b \end{bmatrix} = \begin{bmatrix} ab & 0 \\ 0 & b/a \end{bmatrix},$$

for any $a, b \in \mathbb{R}^+$. As the group is Abelian, the logarithm splits nicely in terms of the two generators $A_1 = \mathrm{diag}(1, -1)$ and $A_2 = I$:

$$\log u(a, b) = \log(a)\, A_1 + \log(b)\, A_2.$$

Again $\mathbb{R}^2_{>0}$ is a homogeneous space of $G$, and we choose a single origin $o = (1, 1)$. With a little algebra, it is clear that lifting yields $x \mapsto u(s, r)$, where $r = \sqrt{x_1 x_2}$ and $s = \sqrt{x_1/x_2}$ are the hyperbolic coordinates of $x$.

Expressed in the basis $\{A_1, A_2\}$ for the Lie algebra above, we see that

$$\log(u_1^{-1}u_2) = \log(s_2/s_1)\, A_1 + \log(r_2/r_1)\, A_2,$$

yielding the expression for convolution

$$h(u_i) = \frac{1}{n}\sum_j k\big(\log(s_i/s_j),\, \log(r_i/r_j)\big)\, f(v_j),$$

which is equivariant to squeezes and scalings.
As demonstrated, equivariance to groups that are Abelian and contain the input space in a single orbit can be achieved with a simple coordinate transform; however, our approach generalizes to groups that are both 'larger' and 'smaller' than the input space, including coordinate-transform equivariance as a special case.
A.3 Sufficient Conditions for Geodesic Distance
In general, the function $d(u, v) = \|\log(u^{-1}v)\|_F$, defined on the domain of $G$ covered by the exponential map, satisfies the first three conditions of a distance metric but not the triangle inequality, making it a semimetric:

1. $d(u, v) \ge 0$;
2. $d(u, v) = 0 \iff u = v$;
3. $d(u, v) = d(v, u)$;
4. $d(u, v) \le d(u, w) + d(w, v)$ (not satisfied in general).
However, for certain subgroups of $\mathrm{GL}(d)$ with additional structure, the triangle inequality holds and the function $d$ is the distance along geodesics connecting group elements $u$ and $v$ according to the metric tensor

$$\langle A, B \rangle_u = \mathrm{tr}\big(A^\top u^{-\top} u^{-1} B\big), \qquad (16)$$

where $u^{-\top}$ denotes the inverse transpose. Specifically, if the subgroup is in the image of the $\exp$ map and each infinitesimal generator commutes with its transpose, $[A, A^\top] = 0$ for $A \in \mathfrak{g}$, then $d(u, v)$ is the geodesic distance between $u$ and $v$.
Geodesic Equation: Geodesics of (16) satisfying $\gamma(0) = u$, $\gamma(1) = v$ can equivalently be derived by minimizing the energy functional

$$E[\gamma] = \int_0^1 \langle \dot\gamma, \dot\gamma \rangle_{\gamma}\, dt = \int_0^1 \mathrm{tr}\big((\gamma^{-1}\dot\gamma)^\top(\gamma^{-1}\dot\gamma)\big)\, dt$$

using the calculus of variations. Minimizing curves $\gamma$ connecting elements $u$ and $v$ ($\gamma(0) = u$, $\gamma(1) = v$) satisfy $\delta E[\gamma] = 0$ for all perturbations $\eta$ with $\eta(0) = \eta(1) = 0$. Noting that $\delta(\gamma^{-1}) = -\gamma^{-1}\eta\,\gamma^{-1}$, $\delta(\dot\gamma) = \dot\eta$, and the linearity of the trace,

$$\delta E = 2\int_0^1 \mathrm{tr}\big((\gamma^{-1}\dot\gamma)^\top(\gamma^{-1}\dot\eta - \gamma^{-1}\eta\,\gamma^{-1}\dot\gamma)\big)\, dt.$$

Using the cyclic property of the trace and integrating by parts, we have

$$\delta E = -2\int_0^1 \mathrm{tr}\Big(\Big[\tfrac{d}{dt}\big((\gamma^{-1}\dot\gamma)^\top\gamma^{-1}\big) + (\gamma^{-1}\dot\gamma)(\gamma^{-1}\dot\gamma)^\top\gamma^{-1}\Big]\,\eta\Big)\, dt,$$

where the boundary term vanishes since $\eta(0) = \eta(1) = 0$. As $\eta$ may be chosen to vary arbitrarily along the path, $\gamma$ must satisfy the geodesic equation

$$\frac{d}{dt}\big((\gamma^{-1}\dot\gamma)^\top\gamma^{-1}\big) + (\gamma^{-1}\dot\gamma)(\gamma^{-1}\dot\gamma)^\top\gamma^{-1} = 0. \qquad (17)$$

Solutions: When $A = \log(u^{-1}v)$ satisfies $[A, A^\top] = 0$, the curve $\gamma(t) = u\exp(tA)$ is a solution to the geodesic equation (17). Clearly $\gamma$ connects $\gamma(0) = u$ and $\gamma(1) = v$, and $\gamma^{-1}\dot\gamma = A$. Plugging $\gamma$ into the left-hand side of equation (17), we have

$$\frac{d}{dt}\big(A^\top \exp(-tA)\,u^{-1}\big) + A A^\top \exp(-tA)\,u^{-1} = \big(A A^\top - A^\top A\big)\exp(-tA)\,u^{-1} = [A, A^\top]\exp(-tA)\,u^{-1} = 0.$$

Length of $\gamma$: The length of the curve $\gamma$ connecting $u$ and $v$ is

$$L[\gamma] = \int_0^1 \sqrt{\langle \dot\gamma, \dot\gamma \rangle_\gamma}\, dt = \int_0^1 \|A\|_F\, dt = \|\log(u^{-1}v)\|_F = d(u, v).$$
Of the Lie groups that we consider in this paper, all of which have a single connected component, the groups T($d$), SO($d$), and $\mathbb{R}^+ \times \mathrm{SO}(d)$ satisfy this property that $[A, A^\top] = 0$ for $A \in \mathfrak{g}$; however, the SE($d$) groups do not.
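A quick numerical sanity check of these claims for SO(2), where the matrix logarithm has a closed form ($\log w = \varphi J$ for a rotation by $\varphi$, so $\|\log w\|_F = \sqrt{2}\,|\varphi|$). The helper names here are ours.

```python
import numpy as np

def rot(phi):
    """2x2 rotation matrix by angle phi."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def dist(u, v):
    """d(u, v) = ||log(u^{-1} v)||_F for SO(2)."""
    w = u.T @ v                          # u^{-1} v, since rotations are orthogonal
    phi = np.arctan2(w[1, 0], w[0, 0])   # principal log: log w = phi * J
    return abs(phi) * np.sqrt(2.0)       # Frobenius norm of phi * J

u, v, w = rot(0.3), rot(2.1), rot(-1.0)
g = rot(0.9)
# Left invariance: (gu)^{-1}(gv) = u^{-1}v.
assert np.isclose(dist(g @ u, g @ v), dist(u, v))
# For SO(2) the generators commute with their transposes, so the
# triangle inequality holds and d is a true geodesic distance.
assert dist(u, v) <= dist(u, w) + dist(w, v) + 1e-12
```

For groups like SE(2), where generators do not commute with their transposes, the second assertion can fail for some triples, which is why $d$ is only a semimetric there.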
A.4 Equivariant Subsampling
Even if all distances and neighborhoods are precomputed, the cost of computing equation (6) for all $n$ group elements is still quadratic, $O(n^2)$, because the number of points in each neighborhood grows linearly with $n$ as the convolution is more densely evaluated. So that our method can scale to handle a large number of points, we show two ways to equivariantly subsample the group elements, which we can use both for the locations at which we evaluate the convolution and for the locations that we use for the Monte Carlo estimator. Since the elements are spaced irregularly, we cannot readily use the coset pooling method described in Cohen and Welling [2016a]; instead we can perform one of the following.
Random Selection: Randomly selecting a subset of points from the original set preserves the original sampling distribution, so it can be used directly.
Farthest Point Sampling: Given a set of group elements $S = \{u_1, \dots, u_n\}$, we can select a subset of size $m$ that maximizes the minimum distance between any two elements in that subset,

$$\mathrm{FPS}_m(S) = \operatorname*{arg\,max}_{\bar{S} \subseteq S,\ |\bar{S}| = m}\ \min_{u, v \in \bar{S},\ u \ne v} d(u, v), \qquad (18)$$

farthest point sampling on the group. Acting on a set of elements, the farthest point subsampling is equivariant: $\mathrm{FPS}_m(wS) = w\,\mathrm{FPS}_m(S)$ for any $w \in G$. That is, applying a group element to each of the elements does not change the chosen indices in the subsampled set, because the distances are left invariant: $d(wu, wv) = d(u, v)$.
Now we can use either of these methods to equivariantly subsample the quadrature points in each neighborhood used to estimate the integral, down to a fixed number $m$:

$$h(u_i) = \frac{1}{m}\sum_{v_j \in \mathrm{FPS}_m(\mathrm{nbhd}(u_i))} k\big(v_j^{-1}u_i\big)\, f(v_j). \qquad (19)$$
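Finding the exact maximizer of Eq. (18) is combinatorial, so in practice a greedy approximation is standard: repeatedly pick the element farthest from the current subset. The sketch below is our own illustrative implementation, not the paper's code; since it consumes only pairwise distances, feeding it the left-invariant distances $d(u_i, u_j)$ makes the selected indices unchanged under a global group action.

```python
import numpy as np

def greedy_fps(D, m, start=0):
    """Greedily select m indices from a symmetric pairwise-distance matrix D."""
    chosen = [start]
    min_dist = D[start].copy()  # distance from each element to the chosen set
    for _ in range(m - 1):
        nxt = int(np.argmax(min_dist))       # farthest element from the chosen set
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, D[nxt])
    return chosen

# 1-D toy example: distances |x_i - x_j| are translation invariant,
# so the chosen indices are unchanged by a global shift.
x = np.array([0.0, 0.1, 0.2, 5.0, 9.9, 10.0])
D = np.abs(x[:, None] - x[None, :])
idx = greedy_fps(D, 3)
D_shift = np.abs((x + 7.0)[:, None] - (x + 7.0)[None, :])
assert greedy_fps(D_shift, 3) == idx
```

The greedy variant keeps the equivariance property of Eq. (18) (up to deterministic tie-breaking) while costing only O(nm) per call.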
Doing so has reduced the cost of estimating the convolution from