Computational aspects
Unfortunately, Theorem 1 is presented in the continuous frequency domain only (Sun et al. 2009), which makes it hard to apply directly in practical problems because \(\hat{g}(\omega )\) is a continuous function that must be discretized. A naive extension of Theorem 1 can be constructed by using circular convolution (Oppenheim et al. 1999) instead of regular convolution.
Proposition 2
If
\(H_n\)
is a circulant matrix, then the
\(n \times n\)
metric matrix (which is also circulant) defined by
\(G_n = H_n^T H_n\)
can be determined by
$$\begin{aligned} G_n (i, j) = g [i - j], \end{aligned}$$
where
g [i] is the autocorrelation function of
h [i], i.e.,
$$\begin{aligned} g [i] = h [i] \circledast _n h^{*} [i], \end{aligned}$$
with
\(h^{*} [i] = \overline{h [- i]}\)
, where
\({\circledast }_n\)
denotes the
\(n\)-point circular convolution, or equivalently in the frequency domain,
$$\begin{aligned} \hat{g} [j] = \left| \hat{h} [j] \right|^2 . \end{aligned}$$
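As a numerical illustration (our own sketch, not part of the derivation), Proposition 2 can be checked with a hypothetical random length-8 real filter: build the circulant matrix \(H_n\), form \(G_n = H_n^T H_n\), and compare it with the circular autocorrelation computed through the DFT.

```python
import numpy as np

# Hypothetical real filter of length n = 8 (real, so H^* = H^T).
n = 8
rng = np.random.default_rng(0)
h = rng.standard_normal(n)

# Circulant matrix H_n with entries H[i, j] = h[(i - j) mod n].
H = np.array([[h[(i - j) % n] for j in range(n)] for i in range(n)])
G = H.T @ H

# Circular autocorrelation via the DFT: ghat[j] = |hhat[j]|^2.
g = np.fft.ifft(np.abs(np.fft.fft(h)) ** 2).real

# Proposition 2: G(i, j) = g[(i - j) mod n].
G_from_g = np.array([[g[(i - j) % n] for j in range(n)] for i in range(n)])
assert np.allclose(G, G_from_g)
```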
The above extension is problematic. The first problem is that, for the same filter h[i], the induced metric filters \(g = h *h^{*}\) and \(\tilde{g} = h {\circledast }_n h^{*}\) are different, i.e.,
$$\begin{aligned} H^{*}_n H_n \ne H^{*}_{m, n} H_{m, n}, \end{aligned}$$
because linear convolution and circular convolution are not equal in general.
The second problem is even worse: to derive a translation-invariant transform in the discrete frequency domain, the matrix representation of the metric \({\mathbf {G}}\) must be a circulant matrix, which is not true in common cases, including both IMED and GED.
We adopt the following approach to overcome these problems: padding the finitely supported sequences to periodic sequences. Given h[i] supported on \([-m,m)\) and x[i] supported on [0, n), define \(\tilde{h}[i]\) and \(\tilde{x}[i]\) of period \(n+2m\) by
$$\begin{aligned} \tilde{h} \left[ i \right] = {\left\{ \begin{array}{ll} h [i], &{} i \in [-m, m)\\ 0, &{} i \in [m, m+n) \end{array}\right. } \end{aligned}$$
and
$$\begin{aligned} \tilde{x} \left[ i \right] = {\left\{ \begin{array}{ll} x \left[ i \right] , &{} i \in \left[ 0, n \right) \\ 0, &{} i \in \left[ - m, 0 \right) \cup \left[ n, m + n \right) . \end{array}\right. } \end{aligned}$$
By the circular convolution theorem (Oppenheim et al. 1999), the two types of convolution coincide:
$$\begin{aligned} h *x \left[ i \right] = {\left\{ \begin{array}{ll} \tilde{h} {\circledast }_{n+2m} \tilde{x} \left[ i \right] , &{} i \in \left[ - m, m + n \right) ;\\ 0, &{} \text {else} . \end{array}\right. } \end{aligned}$$
In other words, the linear convolution of h and x, on its support, is one period of the circular convolution of their periodic extensions \(\tilde{h}\) and \(\tilde{x}\).
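This padding argument can be verified numerically; the sketch below (our own, with hypothetical \(m=3\), \(n=10\) and random sequences) compares the linear convolution with one period of the circular convolution of the zero-padded sequences.

```python
import numpy as np

# h supported on [-m, m), x supported on [0, n); arrays store index -m first.
m, n = 3, 10
p = n + 2 * m                                 # common period n + 2m
rng = np.random.default_rng(1)
h = rng.standard_normal(2 * m)                # h[-m .. m-1]
x = rng.standard_normal(n)                    # x[0 .. n-1]

# Periodic (zero-padded) versions of period p.
h_pad = np.concatenate([h, np.zeros(p - 2 * m)])
x_pad = np.concatenate([np.zeros(m), x, np.zeros(m)])

# Circular convolution via the circular convolution theorem.
circ = np.fft.ifft(np.fft.fft(h_pad) * np.fft.fft(x_pad)).real

# Linear convolution; entry k corresponds to index i = k - m.
lin = np.convolve(h, x)                       # support [-m, m + n - 1)

# One period of the circular convolution reproduces h * x on its support.
k = np.arange(2 * m + n - 1)
assert np.allclose(lin, circ[(k + m) % p])
```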
Now consider the two versions of metric filter: \(g[i] = h *h^{*} [i]\) and \(\tilde{g} [i] = \tilde{h} {\circledast }_{n+2m} \tilde{h}^{*} [i]\). Because
$$\begin{aligned} \forall i \in \left[ 0,\; n + 2 m \right) ,\quad \tilde{h} \left[ i - m \right] = h \left[ i - m \right] , \end{aligned}$$
it follows that \(g \left[ i \right] = \tilde{g} \left[ i \right] \) if and only if
$$\begin{aligned} i \in \left[ - 2 m, n \right) \bigcap \left( - n, + \infty \right) . \end{aligned}$$
On the other hand, by definition the metric filter is conjugate symmetric, i.e.,
$$\begin{aligned} g [-i] = \overline{g[i]},\,\, \tilde{g} [-i] = \overline{\tilde{g} [i]}, \end{aligned}$$
so it can be asserted that \(g \left[ i \right] = \tilde{g} \left[ i \right] \) when \(i \in \left( - n, n \right) \).
The above statements show that, given a finitely supported translation-invariant transform h[i], the induced metric \(\tilde{g}[i]\) constructed from the padded periodic filter \(\tilde{h}[i]\) is also translation invariant.
Hence, the analogous version of Theorem 1 can be given as follows.
Theorem 3
Given the
\([-m, m)\)
supported metric filter
g[i], there exists a circular filter
\(\tilde{h} [i]\)
, such that
g[i] is equal to
\(\tilde{h} {\circledast }_{n+2m} \tilde{h} [i]\)
on its support.
Proof
Define the period-\((n + 2 m)\) sequence \(\tilde{g}\) by
$$\begin{aligned} \tilde{g} \left[ i \right] = {\left\{ \begin{array}{ll} g \left[ i \right] , &{} i \in \left[ - m, m \right) \\ 0, &{} i \in \left[ m, m + n \right) . \end{array}\right. } \end{aligned}$$
Let \(\tilde{h} [i] =\mathcal {F}^{- 1} \left( \sqrt{\widehat{\tilde{g}} \left[ j \right] } \right) \) and the proof is complete. \(\square \)
It is beneficial to derive the matrix representation of Theorem 3. Given the \(n \times n\) metric matrix \(G_n\): by Theorem 1, it determines a filter h[i] supported on \([-m,m)\), and hence the \((n+2m) \times n\) translation-invariant matrix \(H_{m,n}\); by Theorem 3, it determines a filter \(\tilde{h} [i]\) of period \(n+2m\), and hence the \((n+2m) \times (n+2m)\) circulant matrix \(\tilde{H}_{n+2m}\). Writing
$$\begin{aligned} G_n&= H_{m, n}^{*} H_{m, n}\\ \tilde{G}_{n + 2 m}&= \tilde{H}_{n + 2 m}^{*} \tilde{H}_{n + 2 m}, \end{aligned}$$
one can check that \(G_n\) is the upper-left \(n \times n\) block of \(\tilde{G}_{n + 2 m}\).
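This block structure can be checked numerically. The sketch below (our own, with hypothetical \(m=2\), \(n=6\) and a random real filter on \([-m, m)\)) builds both matrices explicitly and compares the blocks.

```python
import numpy as np

# Hypothetical real filter h[i] supported on [-m, m); h[i] stored at i + m.
m, n = 2, 6
p = n + 2 * m
rng = np.random.default_rng(2)
h = rng.standard_normal(2 * m)

def h_at(i):
    return h[i + m] if -m <= i < m else 0.0

# (n + 2m) x n translation-invariant matrix, rows indexed by i in [-m, m + n).
H = np.array([[h_at(i - j) for j in range(n)] for i in range(-m, m + n)])
G = H.T @ H                                   # the n x n metric G_n

# (n + 2m) x (n + 2m) circulant matrix of the periodized filter.
h_pad = np.array([h_at(((k + m) % p) - m) for k in range(p)])
H_t = np.array([[h_pad[(i - j) % p] for j in range(p)] for i in range(p)])
G_t = H_t.T @ H_t                             # circulant metric of size n + 2m

# G_n is the upper-left n x n block of the circulant metric.
assert np.allclose(G, G_t[:n, :n])
```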
The results in the discrete frequency domain can be extended to multidimensional signal spaces in the same way as in the continuous frequency domain (Sun et al. 2009). A convenient property of the extension is that multidimensional data (e.g., 2-D images) can be processed without vectorization.
The translationinvariant transforms of IMED and GED
To demonstrate that the proposed method can be applied to multidimensional cases directly, we write the metric matrices of IMED and GED in tensor form.

IMED The metric tensor \(\mathbbm {g}\) for IMED is defined in (Wang et al. 2005) by a Gaussian, i.e.,
$$\begin{aligned} \mathbbm {g}_{j_1 j_2}^{i_1 i_2} = \frac{1}{2 \pi } e^{- \frac{d^2}{2}}, \end{aligned}$$
where
$$\begin{aligned} d = \sqrt{(i_1 - j_1)^2 + (i_2 - j_2)^2}. \end{aligned}$$
The metric filter for IMED is separable, i.e.,
$$\begin{aligned} g[i_1, i_2] = \frac{1}{2 \pi } e^{- \frac{i_1^2 + i_2^2}{2}} = \frac{1}{\sqrt{2 \pi }} e^{- \frac{i_1^2}{2}} \cdot \frac{1}{\sqrt{2 \pi }} e^{- \frac{i_2^2}{2}} = g_0 [i_1] g_0 [i_2]. \end{aligned}$$
We choose the support length \(m_1 = m_2 = 4\) (\(g [4, 4] \approx 1.7911 \times 10^{- 8}\)), i.e., \(g [i_1, i_2]\) is supported on \([- 4, 4] \times [- 4, 4]\). For \(52 \times 52\) signals (\(n_1 = n_2 = 52\)), we build the period \(n_1 + 2 m_1 = 60\) sequence
$$\begin{aligned} \widetilde{g_0}[i]={\left\{ \begin{array}{ll} g_0 [i] = \frac{1}{\sqrt{2 \pi }} e^{-\frac{i^2}{2}}, &{} i \in [-4,4] \\ 0, &{} i \in (4,56). \\ \end{array}\right. } \end{aligned}$$
It is easy to validate that \(\widehat{\widetilde{g_0}} [j] \geqslant 0, \forall j\). Thus the separated period filter \(\widetilde{h_0} [i]\) can be constructed by
$$\begin{aligned} \widetilde{h_0} [i] =\mathcal {F}^{- 1} \left( \sqrt{\widehat{\widetilde{g_0}} [j]} \right) , \end{aligned}$$
and the overall filter is \(\tilde{h} [i_1, i_2] = \widetilde{h_0} [i_1] \widetilde{h_0} [i_2]\).
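The IMED construction above can be sketched in a few lines. The numbers (\(m=4\), \(n=52\), period 60) are those of the text; the implementation details are our own.

```python
import numpy as np

m, n = 4, 52
p = n + 2 * m                                   # period 60
i = np.arange(-m, m + 1)
g0 = np.exp(-i**2 / 2) / np.sqrt(2 * np.pi)     # Gaussian metric filter on [-4, 4]

# Periodic extension of period 60: indices -4..-1 wrap to positions 56..59.
g0_pad = np.zeros(p)
g0_pad[: m + 1] = g0[m:]
g0_pad[p - m:] = g0[:m]

g0_hat = np.fft.fft(g0_pad).real
assert g0_hat.min() >= 0                        # nonnegativity, as validated in the text

# Separated period filter and the separable 2-D filter.
h0 = np.fft.ifft(np.sqrt(g0_hat)).real
h2d = np.outer(h0, h0)                          # overall filter h0[i1] * h0[i2]

# Sanity check: the circular autocorrelation of h0 recovers g0 on its support.
recon = np.fft.ifft(np.fft.fft(h0) ** 2).real
assert np.allclose(recon[: m + 1], g0[m:])
```

The GED filter is built identically, with \(g_0[i] = r^{\vert i \vert}\) on \([-15, 15]\) and the same period 60.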

GED The metric tensor \(\mathbbm {g}\) for GED is defined in (Jean 1990) by a Laplacian, i.e.,
$$\begin{aligned} \mathbbm {g}_{j_1 j_2}^{i_1 i_2} = r^d = e^{d \log r}, \end{aligned}$$
where \(d = | i_1 - j_1 | + | i_2 - j_2 |\) is the \(l_1\) distance of the two pixels and \(r = 0.6\) is a decay constant. The metric filter for GED is separable, i.e.,
$$\begin{aligned} g [i_1, i_2] = r^{| i_1 | + | i_2 |} = r^{| i_1 |} \cdot r^{| i_2 |} = g_0 [i_1] g_0 [i_2]. \end{aligned}$$
We choose the support length \(m_1 = m_2 = 15\) (\(g [15, 15] \approx 2.2107 \times 10^{- 7}\)), i.e., \(g [i_1, i_2]\) is supported on \([- 15, 15] \times [- 15, 15]\). For \(30 \times 30\) signals (\(n_1 = n_2 = 30\)), we build the period \(n_1 + 2 m_1 = 60\) sequence
$$\begin{aligned} \widetilde{g_0}[i]={\left\{ \begin{array}{ll} g_0[i] = r^{\vert i \vert }, &{} i \in [-15,15] \\ 0, &{} i \in (15,45). \\ \end{array}\right. } \end{aligned}$$
We can validate that \(\widehat{\widetilde{g_0}} [j] \geqslant 0, \forall j\). Thus the separated period filter \(\widetilde{h_0} [i]\) can be constructed by
$$\begin{aligned} \widetilde{h_0} [i] =\mathcal {F}^{- 1} \left( \sqrt{\widehat{\widetilde{g_0}} [j]} \right) , \end{aligned}$$
and the overall filter is \(\tilde{h} [i_1, i_2] = \widetilde{h_0} [i_1] \widetilde{h_0} [i_2]\).
The translation-invariant transforms of IMED and GED in the space and frequency domains are drawn in Fig. 1. It clearly shows that applying GED or IMED is equivalent to a low-pass filtering process, which is robust to small perturbations of images.
The fast implementation of IMED and GED
The advantages of the filtering decomposition over the GET or ST lie not only in the physical interpretation but also in the time and space complexity. In general, the computational complexity of the filtering decomposition is \(O (n \log n)\) owing to the efficiency of the FFT (Oppenheim et al. 1999).
In the case of IMED and GED, since the corresponding filters decay rapidly (Fig. 1), e.g., \(g[4] = \frac{1}{\sqrt{2 \pi }} e^{- \frac{4^2}{2}} \approx 1.34 \times 10^{- 4}\) (IMED), the vector \({\mathbf {g}} = (g[0], \ldots , g [m], 0, \ldots , 0, g[-m], \ldots , g[-1])^T\) can be taken to be of length n. Therefore \(G \approx \tilde{G}\) and the transform can be applied to the original image X rather than the zero-padded image \(\tilde{X}\). Finally, the periodic filter \(\tilde{g}\) can be built using only a few significant values. The templates of IMED (\(\sigma =1\)) and GED (\(\alpha =2\)) are
$$\begin{aligned} \begin{pmatrix} 0 &{} 0.0012 &{} 0.0029 &{} 0.0012 &{} 0 \\ 0.0012 &{} 0.0471 &{} 0.1198 &{} 0.0471 &{} 0.0012 \\ 0.0029 &{} 0.1198 &{} 0.3046 &{} 0.1198 &{} 0.0029 \\ 0.0012 &{} 0.0471 &{} 0.1198 &{} 0.0471 &{} 0.0012 \\ 0 &{} 0.0012 &{} 0.0029 &{} 0.0012 &{} 0 \\ \end{pmatrix}, \end{aligned}$$
and
$$\begin{aligned} \begin{pmatrix} 0.0003 &{} 0.0025 &{} 0.0183 &{} 0.0025 &{} 0.0003 \\ 0.0025 &{} 0.0183 &{} 0.1353 &{} 0.0183 &{} 0.0025 \\ 0.0183 &{} 0.1353 &{} 1.0000 &{} 0.1353 &{} 0.0183 \\ 0.0025 &{} 0.0183 &{} 0.1353 &{} 0.0183 &{} 0.0025 \\ 0.0003 &{} 0.0025 &{} 0.0183 &{} 0.0025 &{} 0.0003 \\ \end{pmatrix}, \end{aligned}$$
respectively.
Since the filter is of fixed size, the fast implementation further reduces the space complexity from \(O(n^2)\) to O(1), and the time complexity from \(O(n^2)\) to O(n).
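As an illustration of the fixed-size fast path, the \(5 \times 5\) IMED template above can be applied by direct convolution; the helper below is our own sketch, not the authors' implementation. Because the template size is a constant, the cost is O(n) for n pixels.

```python
import numpy as np

# 5x5 IMED template from the text (sigma = 1).
T = np.array([
    [0.0,    0.0012, 0.0029, 0.0012, 0.0   ],
    [0.0012, 0.0471, 0.1198, 0.0471, 0.0012],
    [0.0029, 0.1198, 0.3046, 0.1198, 0.0029],
    [0.0012, 0.0471, 0.1198, 0.0471, 0.0012],
    [0.0,    0.0012, 0.0029, 0.0012, 0.0   ],
])

def apply_template(img, T):
    """Direct 2-D convolution with a small fixed template (zero boundary)."""
    k = T.shape[0] // 2
    padded = np.pad(img, k)
    out = np.zeros(img.shape)
    for di in range(T.shape[0]):
        for dj in range(T.shape[1]):
            out += T[di, dj] * padded[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out

# A unit impulse is smoothed into the template itself (T is symmetric).
delta = np.zeros((9, 9))
delta[4, 4] = 1.0
out = apply_template(delta, T)
assert np.isclose(out[4, 4], 0.3046)
```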
Transform domain metric learning
Generally, in order to learn a metric G, one optimizes with respect to G. For images of size \(n_1 \times n_2\), G has \(n_1^2 \times n_2^2\) elements, making the optimization intractable. Another problem is that G must satisfy the positive semidefinite constraint \(G \geqslant 0\), and it is not easy to find efficient algorithms under such a constraint.
Theorem 1 can be equivalently written as
$$\begin{aligned} x^T G x = \frac{1}{2\pi } \int _{-\pi }^{\pi } \hat{g} (\omega ) \left| \hat{x}(\omega ) \right|^2 \mathrm{{d}} \omega . \end{aligned}$$
(2)
Equation (2) introduces great simplifications to the optimization problem of metric learning. Under the translation-invariant assumption on G, the positive semidefinite constraint \(G \geqslant 0\) reduces to a bound constraint \(\hat{g} (\varvec{\omega } ) \geqslant 0\). Furthermore, the number of parameters equals the number of samples of \(\hat{g}\), which is usually chosen to be the same as the size of the input data. An additional benefit of the translation-invariant approach is that it applies to any dimensionality without modification, so it is unnecessary to stack multidimensional data into vectors.
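For a circulant (periodic) metric, the discrete analogue of (2) reads \(x^T G x = \frac{1}{N} \sum _j \hat{g}[j]\, |\hat{x}[j]|^2\); the following sketch (our own, with a hypothetical random filter) verifies it via Parseval's relation.

```python
import numpy as np

N = 16
rng = np.random.default_rng(3)
h = rng.standard_normal(N)          # hypothetical real filter
x = rng.standard_normal(N)          # a random signal

# Circulant G = H^T H, so that ghat[j] = |hhat[j]|^2.
H = np.array([[h[(i - j) % N] for j in range(N)] for i in range(N)])
G = H.T @ H

g_hat = np.abs(np.fft.fft(h)) ** 2
x_hat = np.fft.fft(x)

quad = x @ G @ x                                    # x^T G x = |Hx|^2
spectral = (g_hat * np.abs(x_hat) ** 2).sum() / N   # spectral (Parseval) form
assert np.isclose(quad, spectral)
```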
Suppose we have some data \(\left\{ x_i \right\} \) with labels \(\left\{ y_i \right\} \). Let \(f_i\) be the Fourier transform of \(x_i\); we compute the total “similar” and “dissimilar” power spectra:
$$\begin{aligned} p_w ( \varvec{\omega } ) = \sum _{i, j, y_i = y_j} \left| f_i ( \varvec{\omega } ) - f_j ( \varvec{\omega } ) \right|^2, \qquad p_b ( \varvec{\omega } ) = \sum _{i, j, y_i \ne y_j} \left| f_i ( \varvec{\omega } ) - f_j ( \varvec{\omega } ) \right|^2. \end{aligned}$$
The criterion here is that the filtered within-class distance is minimized and the filtered between-class distance is maximized simultaneously. This gives the objective functional
$$\begin{aligned} J_0 \left( g \right) = \frac{\int _{T^d} \hat{g} \left( \varvec{\omega } \right) p_w \left( \varvec{\omega } \right) \mathrm{d} \varvec{\omega }}{\int _{T^d} \hat{g} \left( \varvec{\omega } \right) p_b \left( \varvec{\omega } \right) \mathrm{d} \varvec{\omega }}. \end{aligned}$$
(3)
The objective (3) resembles the idea of LDA (Duda et al. 2000). In fact, TDML can be viewed as a translation-invariant solution to LDA.
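A discrete sketch of the criterion (our own toy setup: six hypothetical random 1-D signals in two classes, where the integrals in (3) become sums over DFT bins):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 32
X = rng.standard_normal((6, N))          # toy signals x_i
y = np.array([0, 0, 0, 1, 1, 1])         # their labels y_i

F = np.fft.fft(X, axis=1)                # f_i, the Fourier transforms

# Total "similar" (within-class) and "dissimilar" (between-class) power spectra.
p_w = np.zeros(N)
p_b = np.zeros(N)
for a in range(len(X)):
    for b in range(len(X)):
        d = np.abs(F[a] - F[b]) ** 2
        if y[a] == y[b]:
            p_w += d
        else:
            p_b += d

def J0(g_hat):
    """Discrete version of the objective (3): within/between power ratio."""
    return (g_hat * p_w).sum() / (g_hat * p_b).sum()

baseline = J0(np.ones(N))                # unfiltered (Euclidean) baseline
```

Minimizing J0 over nonnegative \(\hat{g}\) then favors frequency bins where between-class power dominates within-class power.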