Various applications in medical imaging, remote sensing and elsewhere require solving inverse problems of the form
where is an operator between Hilbert spaces modeling the forward problem, is the data perturbation, is the noisy data and is the sought signal. Inverse problems are well analyzed, and several established approaches for their stable solution exist [1, 2]. Recently, neural networks and deep learning have emerged as new paradigms for solving inverse problems [3–7]. Several approaches based on deep learning have been developed, including post-processing networks [8–12], regularizing null-space networks [13, 14], plug-and-play priors [15–17], deep image priors [18, 19], variational networks [20, 21], network cascades [22, 23], learned iterative schemes [24–29] and learned regularizers [30–33].
Classical deep learning approaches may lack data consistency for unknowns very different from the training data. To address this issue, in [31] a deep learning approach named NETT (NETwork Tikhonov) regularization has been introduced which considers minimizers of the NETT functional
Here, is a similarity measure, is a trained neural network, a functional and a regularization parameter. In [31] it is shown that under suitable assumptions, NETT yields a convergent regularization method. This in particular includes provable stability guarantees and error estimates. Moreover, a training strategy has been proposed, where is trained such that favors artifact-free reconstructions over reconstructions with artifacts.
1.1. The augmented NETT
One of the main assumptions for the analysis of [31] is the coercivity of the regularizer which requires special care in network design and training. In order to overcome this limitation, we propose an augmented form of the regularizer for which we are able to rigorously prove coercivity. More precisely, for fixed , we consider minimizers of the augmented NETT functional
Here, is a similarity measure and is an encoder-decoder network trained such that for any signal on a signal manifold we have and that is small. We term this approach augmented NETT (aNETT) regularization. In this work we provide a mathematical convergence analysis for aNETT, present a novel modular training strategy and investigate its practical performance.
The term implements learned prior knowledge on the encoder coefficients, while smallness of forces to be close to the signal manifold. The latter term also guarantees the coercivity of (1.3). In the original NETT version (1.2), coercivity of the regularizer requires coercivity conditions on the network involved. Indeed, in the numerical experiments, the authors of [31] observed a semi-convergence behavior when minimizing (1.2), so early stopping of the iterative minimization scheme was used as additional regularization. We attribute this semi-convergence behavior to a potential non-coercivity of the regularization term. In the present paper we address this issue systematically by augmenting the NETT functional, which guarantees coercivity and allows a more stable minimization. Coercivity is also a main ingredient for the mathematical convergence analysis.
An interesting practical instance of aNETT takes as a weighted ℓq-norm enforcing sparsity of the encoding coefficients [34, 35]. An important example for the similarity measure is given by the squared norm distance, which from a statistical viewpoint can be motivated by a Gaussian white noise model. General similarity measures allow us to adapt to different noise models which can be more appropriate for certain problems.
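For orientation, the generic structure of the NETT and aNETT functionals described above can be written schematically as follows. The inline formulas (1.2) and (1.3) did not survive extraction, so the notation below (forward operator F, data y, similarity measure d, trained network Φ, encoder E, decoder D, penalty ψ, parameters α and c) is our own shorthand and only a sketch of the structure described in the text.

```latex
% Schematic NETT functional (1.2): data fidelity plus a learned regularizer
% evaluated on the output of a trained network \Phi.
\mathcal{T}_{\alpha,y}(x) \;=\; d\bigl(F(x), y\bigr) \;+\; \alpha\,\psi\bigl(\Phi(x)\bigr)

% Schematic aNETT functional (1.3): the penalty acts on the encoder
% coefficients E(x) and is augmented by a distance-to-manifold term
% built from the autoencoder D \circ E.
\mathcal{T}_{\alpha,y}(x) \;=\; d\bigl(F(x), y\bigr)
   \;+\; \alpha \Bigl( \psi\bigl(E(x)\bigr)
   \;+\; \tfrac{c}{2}\, \bigl\| x - (D \circ E)(x) \bigr\|^{2} \Bigr)
```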
1.2. Main contributions
The contributions of this paper are threefold. As described in more detail below, we introduce the aNETT framework, mathematically analyze its convergence, and propose a practical implementation that is applied to tomographic limited data problems.
- The first contribution is to introduce the structure of the aNETT regularizer . A similar approach has been studied in [36] for a linear encoder . However, in this paper we do not assume that the image consists of two components u and v but rather assume that there is some transformation in which the signal has some desired property such as, for example, sparsity. The term enforces regularity of the analysis coefficients, which is an ingredient in most existing variational regularization techniques. For example, this includes sparse regularization in frames or dictionaries, regularization with Sobolev norms, or total variation regularization. The augmented term, on the other hand, penalizes the distance to the signal manifold. It is the combination of these two terms that results in a stable reconstruction scheme without the need for strong assumptions on the involved networks.
- The second main contribution is the theoretical analysis of aNETT (1.3) in the context of regularization theory. We investigate the case where the image domain of the encoder is given by for some countable set Λ, and is a coercive functional measuring the complexity of the encoder coefficients. The presented analysis is in the spirit of the analysis of NETT given in [31]. However, in contrast to NETT, the required coercivity property is derived naturally for the class of considered regularizers. This supports the use of the regularizer also from a theoretical side. Moreover, the convergence rates results presented here use assumptions significantly different from those in [31]. While we present our analysis for the transform domain , the encoder space could be replaced by a general Hilbert or Banach space.
- As a third main contribution we propose a modular strategy for training together with a possible network architecture. First, independent of the given inverse problem, we train a -penalized autoencoder that learns representing signals from the training data with low complexity. In the second step, we train a task-specific network which can be adapted to the specific inverse problem at hand. In our numerical experiments, we empirically found this modular training strategy to be superior to directly adapting the autoencoder to the inverse problem. For the -penalized autoencoder, we train the modified version described in [37] of the tight frame U-Net of [38] in a way such that poses additional constraints on the autoencoder during the training process.
1.3. Outline
In section 2 we present the mathematical convergence analysis of aNETT. In particular, as an auxiliary result, we establish the coercivity of the regularization term. Moreover, we prove stability and derive convergence rates. Section 3 presents practical aspects of aNETT. We propose a possible architecture and training strategy for the networks, and a possible ADMM-based scheme to obtain minimizers of the aNETT functional. In section 4, we present reconstruction results and compare aNETT with other deep learning based reconstruction methods. The paper concludes with a short summary and discussion. Parts of this paper were presented at the ISBI 2020 conference and the corresponding proceedings [39]. In contrast to the proceedings paper, this article treats a general similarity measure and considers a general complexity measure . Further, all proofs and all numerical results presented in this paper are new.
In this section we prove the stability and convergence of aNETT as a regularization method. Moreover, we derive convergence rates in the form of quantitative error estimates between exact solutions for noise-free data and aNETT regularized solutions for noisy data. To this end we assume that global minimizers of the functional (1.3) can be computed, and we analyze the properties of these solutions. This is a common assumption in variational regularization approaches and is adopted in this work. Extending the analysis to consider only local minima is considerably more difficult and is out of the scope of this paper.
2.1. Assumptions and coercivity results
For our convergence analysis we make use of the following assumptions on the underlying spaces and operators involved.
- (A1) and are Hilbert spaces.
- (A2) for a countable Λ.
- (A3) is weakly sequentially continuous.
- (A4) is weakly sequentially continuous.
- (A5) is weakly sequentially continuous.
- (A6) is coercive and weakly sequentially lower semi-continuous.
We set and, for given , define
which we refer to as the aNETT (or augmented NETT) regularizer.
According to (A4)–(A6), the aNETT regularizer is weakly sequentially lower semi-continuous. As a main ingredient for our analysis we next prove its coercivity.
Coercivity of the aNETT regularizer.
Theorem 2.2. If Condition 2.1 holds, then the regularizer as defined in (2.1) with is coercive.
Proof. Let be some sequence in such that is bounded. Then by definition of it follows that is bounded and by coercivity of we have that is also bounded. By assumption, is weakly sequentially continuous and thus must be bounded. Using that , we obtain the inequality . This shows that is bounded and therefore that is coercive.□
Sparse aNETT regularizer.
Example 2.3. To obtain a sparsity-promoting regularizer we can choose where and . Since we have and hence is coercive. As a sum of weakly sequentially lower semi-continuous functionals it is also weakly sequentially lower semi-continuous [35]. Therefore Condition (A6) is satisfied for the weighted ℓq-norm. Together with theorem 2.2, we conclude that the resulting weighted sparse aNETT regularizer is a coercive and weakly sequentially lower semi-continuous functional.
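As a small numerical illustration of this choice of complexity measure, the following snippet evaluates a weighted ℓq penalty on a vector of encoder coefficients. The weights, exponent and coefficient values are purely hypothetical; the code is a sketch and not taken from the paper.

```python
import numpy as np

def weighted_lq_penalty(coeffs, weights, q=1.0):
    """Weighted l^q penalty sum_i w_i * |c_i|^q of encoder coefficients.

    For q = 1 and weights bounded away from zero this is the
    sparsity-promoting choice of Example 2.3; coercivity holds because the
    penalty dominates a multiple of the norm of the coefficient vector.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * np.abs(coeffs) ** q))

# Toy usage with hypothetical coefficients and uniform weights.
c = np.array([0.0, 1.5, -0.2, 0.0, 3.0])
w = np.ones_like(c)
print(weighted_lq_penalty(c, w, q=1.0))  # 1.5 + 0.2 + 3.0 = 4.7
```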
For the further analysis we will make the following assumptions regarding the similarity measure .
Similarity measure.
- (B1) .
- (B2) is sequentially lower semi-continuous with respect to the weak topology in the first and the norm topology in the second argument.
- (B3) as ).
- (B4) as ().
- (B5) with .
While (B1)–(B4) restrict the choice of the similarity measure, (B5) is a technical assumption involving the forward operator, the regularizer and the similarity measure, that is required for the existence of minimizers. For a more detailed discussion of these assumptions we refer to [40].
Similarity measures using the norm.
Example 2.5. A classical example of a similarity measure satisfying (B1)–(B4) is given by for some and, more generally, by , where is a continuous and monotonically increasing function that satisfies .
Taking into account theorem 2.2, Conditions 2.1 and 2.4 imply that the aNETT functional defined by (1.3), (2.1) is proper, coercive and weakly sequentially lower semi-continuous. This in particular implies the existence of minimizers of for all data and regularization parameters (compare [2, 31]).
2.2. Stability
Next we prove the stability of minimizing the aNETT functional regarding perturbations of the data .
Stability.
Theorem 2.6. Let Conditions 2.1 and 2.4 hold, and . Moreover, let be a sequence of perturbed data with and consider minimizers . Then the sequence has at least one weak accumulation point and weak accumulation points are minimizers of . Moreover, for any weak accumulation point of and any subsequence with we have .
Proof. Let be such that . By definition of we have . Since by assumption and , we have . This implies that is bounded by some positive constant for sufficiently large n. By definition of we have . Since is coercive it follows that is a bounded sequence and hence it has a weakly convergent subsequence.
Let be a weakly convergent subsequence of and denote its limit by . By the lower semi-continuity we get and . Thus for all with we have
This shows that and, by considering in the above displayed equation, that . Moreover, we have
This shows as and concludes the proof.□
In the following we say that the similarity measure satisfies the quasi triangle-inequality if there is some such that
While this property is essential for deriving convergence rate results, we will show below that it is not enough to guarantee stability of minimizing the augmented NETT functional in the sense of theorem 2.6. Note that [31] assumes the quasi triangle-inequality (2.2) instead of Condition (B4). The following example shows that (2.2) is not sufficient for the stability result of theorem 2.6 to hold, and that Condition (B4) therefore has to be added to the list of assumptions in [31] required for stability.
Instability in the absence of Condition (B4).
Example 2.7. Consider the similarity measure defined by
where is defined by if and otherwise. Moreover, choose , let be the identity operator and suppose the regularizer takes the form .
- The similarity measure defined in (2.3) satisfies (B1)–(B3): Convergence with respect to is equivalent to convergence in norm which implies that (B3) is satisfied. Moreover, we have , which is (B1). Consider sequences and . The sequential lower semi-continuity stated in (B2) can be derived by separately looking at the cases and . In the first case, by the continuity of the norm we have . In the second case, we have for n sufficiently large. In both cases, the lower semi-continuity property follows from the weak lower semi-continuity property of the norm.
- The similarity measure defined by (2.3) does not satisfy (B4): To see this, we define where and is taken as a non-increasing sequence converging to zero. We have and hence also as . For any we have and as . In particular, does not converge to if and therefore (B4) does not hold. In summary, all requirements for theorem 2.6 are satisfied, except for the continuity assumption (B4).
- We have which implies that the similarity measure satisfies the quasi triangle-inequality (2.2). However as shown next, this is not sufficient for stable reconstruction in the sense of theorem 2.6. To that end, let with and and let . In particular, and . Therefore the minimizer of with perturbed data is given by and the minimizer of for data is given by . We see that , which is clearly different from . In particular, minimizing does not stably depend on data . Theorem 2.6 states that stability holds if (B4) is satisfied.
While the above example may seem somewhat artificial, it shows that one has to be careful when choosing the similarity measure in order to obtain a stable reconstruction scheme.
2.3. Convergence
In this subsection we consider the limit process as the noise-level δ tends to 0. Assuming that we would expect the regularized solutions to converge to some solution of the equation . This raises the obvious question whether this solution has any additional properties. In fact, we prove that the minimizers of the aNETT functional for noisy data converge to such a special kind of solution, namely solutions which minimize among all possible solutions. For that purpose, here and below we use the following notation.
-minimizing solutions.
Definition 2.8. For , we call an element an -minimizing solution of the equation if
An -minimizing solution always exists provided that data satisfies , which means that the equation has at least one solution with finite value of . To see this, consider a sequence of solutions with . Since is coercive there exists a weakly convergent subsequence with weak limit . Using the weak sequential lower semi-continuity of one concludes that is an -minimizing solution. We first show weak convergence.
Weak convergence of aNETT.
Theorem 2.9. Suppose Conditions 2.1 and 2.4 are satisfied. Let , with and let satisfy . Choose such that and let . Then the following hold:
(a) has at least one weakly convergent subsequence.
(b) All accumulation points of are -minimizing solutions of .
(c) For every convergent subsequence it holds .
(d) If the -minimizing solution is unique then .
Proof. (a): Because , there exists an -minimizing solution of the equation which we denote by . Because we have
Because this shows that is bounded. Due to the coercivity of the aNETT regularizer (see theorem 2.2), this implies that has a weakly convergent subsequence.
(b), (c): Let be a weakly convergent subsequence of with limit . From the weak lower semi-continuity we get , which shows that is a solution of . Moreover,
where for the second last inequality we used (2.4) and for the last equality we used that . Therefore, is an -minimizing solution of the equation . In a similar manner we derive which shows .
(d): If has a unique -minimizing solution , then every subsequence of has itself a subsequence weakly converging to , which implies that weakly converges to the -minimizing solution.□
Next we derive strong convergence of the regularized solutions. To this end we recall the absolute Bregman distance, the modulus of total nonlinearity and the notion of total nonlinearity, as defined in [31].
Absolute Bregman distance.
Definition 2.10. Let be Gâteaux differentiable at . The absolute Bregman distance at with respect to is defined by
Here and below denotes the Gâteaux derivative of at .
Modulus of total nonlinearity and total nonlinearity.
Definition 2.11. Let be Gâteaux differentiable at . We define the modulus of total nonlinearity of at as . We call totally nonlinear at if for all .
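Since the inline formulas in Definitions 2.10 and 2.11 did not survive extraction, the following schematic versions may help. They reflect our reading of the corresponding definitions in [31], with R denoting the regularizer and R'(x_0) its Gâteaux derivative; all symbols are our own notation.

```latex
% Absolute Bregman distance of R at x_0 (cf. Definition 2.10)
B_R(x, x_0) \;:=\; \bigl| R(x) - R(x_0) - R'(x_0)(x - x_0) \bigr|

% Modulus of total nonlinearity of R at x_0 (cf. Definition 2.11)
\nu(x_0, t) \;:=\; \inf \bigl\{ B_R(x, x_0) \,:\, \|x - x_0\| = t \bigr\},
\qquad t > 0 .
% R is called totally nonlinear at x_0 if \nu(x_0, t) > 0 for all t > 0.
```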
Using these definitions we get the following convergence result in the norm topology.
Strong convergence of aNETT.
Theorem 2.12. Let Conditions 2.1 and 2.4 hold, and let be totally nonlinear at all -minimizing solutions of . Let be as in theorem 2.9. Then there is a subsequence which converges in norm to an -minimizing solution of . If the -minimizing solution is unique, then as .
Proof. In [31, proposition 2.9] it is shown that the total nonlinearity of implies that for every bounded sequence with it holds that . Theorem 2.9 gives us a weakly converging subsequence of with weak limit and . By the definition of the absolute Bregman distance it follows that and hence, together with [31, proposition 2.9], that . If the -minimizing solution of is unique, then every subsequence has a subsequence converging to and hence the claim follows.□
2.4. Convergence rates
We will now prove convergence rates by deriving quantitative estimates for the absolute Bregman distance between -minimizing solutions for exact data and regularized solutions for noisy data. The convergence rates will be derived under the additional assumption that satisfies the quasi triangle-inequality (2.2).
Convergence rates for aNETT.
Proposition 2.13. Let the assumptions of theorem 2.12 be satisfied and suppose that satisfies the quasi triangle-inequality (2.2) for some . Let be an -minimizing solution of such that is Gâteaux differentiable at and assume there exist with
For any , let be noisy data satisfying , , and write . Then the following hold:
(a) For sufficiently small α, it holds .
(b) If , then as .
Proof. By definition of we have . By theorem 2.12, for sufficiently small α we can assume that and hence
Together with the inequality of arithmetic and geometric means for and this implies which shows (a). Item (b) is an immediate consequence of (a).□
The following result is our main convergence rates result. It is similar to [31, theorem 3.1], but uses different assumptions.
Convergence rates for finite rank operators.
Theorem 2.14. Let the assumptions of theorem 2.12 be satisfied, take , assume has finite dimensional range and that is Lipschitz continuous and Gâteaux differentiable. For any , let be noisy data satisfying and write . Then for the parameter choice we have the convergence rates result as .
Proof. According to proposition 2.13, it is sufficient to show that (2.5) holds with in place of . For that purpose, let denote the orthogonal projection onto the null-space and let L be a Lipschitz constant of . Since restricted to is injective with finite dimensional range, we can choose a constant such that .
We first show the estimates
To that end, let and write . Then . Since is an -minimizing solution, we have . Since , we have . The last two estimates prove (2.6). Because is an -minimizing solution, we have whenever . On the other hand, using that is Gâteaux differentiable and that has finite rank, shows for . This proves (2.7).
Inequality (2.6) implies . Together with (2.7) this yields
which proves (2.5) with .□
Note that the theoretical results stated remain valid if we replace by a general coercive and weakly lower semi-continuous regularizer .
In this section we investigate practical aspects of aNETT. We present a possible network architecture together with a possible training strategy in the discrete setting. Further, we discuss minimization of aNETT using the ADMM algorithm. For the sake of clarity we restrict our discussion to the finite dimensional case where and for a finite index set Λ.
3.1. Proposed modular aNETT training
To find a suitable network defining the aNETT regularizer , we propose a modular data-driven approach that comes in two separate steps. In a first step, we train a -regularized denoising autoencoder independently of the forward problem , whose purpose is to represent elements of a training data set well by low-complexity encoder coefficients. In a second step, we train a task-specific network that increases the ability of the aNETT regularizer to distinguish between clean images and images containing problem-specific artifacts.
Let denote the given set of artifact-free training phantoms.
- –regularized autoencoder:
First, an autoencoder is trained such that is close to and that is small for the given training signals. For that purpose, let be a family of autoencoder networks, where are encoder and decoder networks, respectively.
To achieve that unperturbed images are sparsely represented by , whereas disrupted images are not, we apply the following training strategy. We randomly generate images where is additive Gaussian white noise with a standard deviation proportional to the mean value of , and is a binary random variable that takes each value with probability 0.5. For the numerical results below we use a standard deviation of 0.05 times the mean value of . To select the particular autoencoder based on the training data, we consider the following training strategy
and set . Here are regularization parameters.
Including perturbed signals in (3.1) increases the robustness of the -regularized autoencoder. To enforce regularity of the encoder coefficients only on the noise-free images, the penalty is applied only to the noise-free inputs, reflected by the pre-factor . Using autoencoders, regularity for a signal class could also be achieved by means of dimensionality reduction techniques, where is used as a bottleneck in the network architecture. However, in order to get a regularizer that is able to distinguish between perturbed and unperturbed signals, we choose to be of sufficiently high dimensionality.
- Task-specific network:
Numerical simulations showed that the -regularized autoencoder alone was not able to distinguish sufficiently well between artifact-free training phantoms and images containing problem-specific artifacts. To address this issue, we compose the operator-independent network with another network , which is trained to distinguish between images with and without problem-specific artifacts.
For that purpose, we consider randomly generated images where either or with equal probability. Here is an approximate right inverse and are error terms. We choose a network architecture and select , where
for some regularization parameter . In particular, the image residuals now depend on the specific inverse problem and we can consider them to consist of operator and training signal specific artifacts.
The above training procedure ensures that the network adapts to the inverse problem at hand as well as to the -regularized autoencoder. We empirically found that training the network independently of , or directly training the autoencoder to distinguish between images with and without problem-specific artifacts, performs considerably worse. A schematic sketch of both training steps is given below.
The final autoencoder is then given as with modular decoder . For the numerical results we take as the tight frame U-Net of [38]. Moreover, we choose as the modified tight frame U-Net proposed in [37] for deep synthesis regularization. In particular, in contrast to the original tight frame U-Net, the modified tight frame U-Net does not involve skip connections.
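The following PyTorch sketch illustrates the two training steps described above. The tiny convolutional autoencoder is only a stand-in for the (modified) tight frame U-Net, and the loss weights, noise level, and the exact way the task-specific network is coupled to the autoencoder are our own assumptions rather than the paper's exact choices.

```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Placeholder encoder; the paper uses a (modified) tight frame U-Net."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 16, 3, padding=1))
    def forward(self, x):
        return self.net(x)

class ToyDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 3, padding=1))
    def forward(self, z):
        return self.net(z)

E, D = ToyEncoder(), ToyDecoder()
beta = 1e-3  # hypothetical weight of the l1 penalty on encoder coefficients

def step1_loss(x_clean):
    """Step 1: l1-regularized denoising autoencoder loss, in the spirit of (3.1)."""
    # perturb roughly half of the batch with Gaussian noise (std = 0.05 * mean)
    eps = (torch.rand(x_clean.shape[0], 1, 1, 1) < 0.5).float()
    noise = 0.05 * x_clean.mean() * torch.randn_like(x_clean)
    x_in = x_clean + eps * noise
    z = E(x_in)
    rec = D(z)
    data_fit = ((rec - x_clean) ** 2).mean()
    # l1 penalty on encoder coefficients, applied only to the noise-free inputs
    sparsity = ((1.0 - eps) * z.abs()).mean()
    return data_fit + beta * sparsity

def step2_loss(phi, x_clean, x_artifact):
    """Step 2: task-specific network phi, trained on clean images and images
    with problem-specific artifacts; E and D are kept fixed in this sketch."""
    pick = (torch.rand(x_clean.shape[0], 1, 1, 1) < 0.5).float()
    x_in = pick * x_artifact + (1.0 - pick) * x_clean
    out = phi(x_in)
    # fit to the artifact-free target plus a term tying phi to the autoencoder
    return ((out - x_clean) ** 2).mean() + beta * E(out).abs().mean()
```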
3.2. Possible aNETT minimization
For minimizing the aNETT functional (1.3) we use the alternating direction method of multipliers (ADMM) with scaled dual variable [41–43]. For that purpose, the aNETT minimization problem is rewritten as the following constrained minimization problem
The resulting ADMM update scheme with scaling parameter initialized by and then reads as follows:
(S1) .
(S2) .
(S3) .
One interesting feature of the above approach is that the signal update (S1) is independent of the possibly non-smooth penalty . Moreover, the encoder update (S2) uses the proximal mapping of , which in important special cases can be evaluated explicitly and is therefore fast and exact. It also guarantees regular encoder coefficients during each iteration. For example, if we choose the penalty as the ℓ1-norm, then (S2) is a soft-thresholding step which results in sparse encoder coefficients. Step (S1) in typical cases has to be computed iteratively via an inner iteration. To find an approximate solution for (S1) for the results presented below, we use gradient descent with at most 10 iterations. We stop the gradient descent updates early if the difference of the functional evaluated at two consecutive iterations is below our predefined tolerance of 10⁻⁵.
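A compact sketch of this scheme is given below. Since the precise splitting behind (S1)–(S3) is not fully recoverable from the text above, the code assumes the constraint couples the encoder coefficients to the image, keeps the distance-to-manifold term in the signal update, and uses the squared ℓ2-distance as similarity measure; all function and parameter names are our own and purely illustrative.

```python
import torch

def soft_threshold(v, tau):
    """Proximal mapping of tau * ||.||_1 (elementwise soft-thresholding)."""
    return torch.sign(v) * torch.clamp(v.abs() - tau, min=0.0)

def anett_admm(y, forward, encoder, decoder, x0, alpha=1e-4, c=1.0, rho=1.0,
               n_outer=50, n_inner=10, step=1e-3, tol=1e-5):
    """ADMM sketch for 0.5*||F(x)-y||^2 + alpha*(||xi||_1 + c/2*||x - D(xi)||^2)
    with the constraint xi = E(x) and scaled dual variable u."""
    x = x0.clone()
    xi = encoder(x).detach()
    u = torch.zeros_like(xi)
    for _ in range(n_outer):
        # (S1) early-stopped gradient descent on the smooth part
        prev = None
        for _ in range(n_inner):
            x = x.detach().requires_grad_(True)
            val = (0.5 * ((forward(x) - y) ** 2).sum()
                   + 0.5 * alpha * c * ((x - decoder(encoder(x))) ** 2).sum()
                   + 0.5 * rho * ((encoder(x) - xi + u) ** 2).sum())
            val.backward()
            with torch.no_grad():
                x = x - step * x.grad
            if prev is not None and abs(prev - val.item()) < tol:
                break
            prev = val.item()
        x = x.detach()
        with torch.no_grad():
            # (S2) proximal step for the l1 penalty: soft-thresholding
            xi = soft_threshold(encoder(x) + u, alpha / rho)
            # (S3) update of the scaled dual variable
            u = u + encoder(x) - xi
    return x
```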
The concrete implementation of the aNETT minimization requires specification of the similarity measure, the total number of outer iterations , the step-size γ for the iteration in (S1) and the parameters defining the aNETT functional. These specifications are selected depending on the inverse problem at hand. Table 1 lists the particular choices for the reconstruction scenarios considered in the following section.
Table 1. Parameter specifications for the proposed aNETT functional and its numerical minimization.
|  | α | γ |  | Nφ | noise model |
|---|---|---|---|---|---|
| Sparse view | 10⁻⁴ | 10² | 50 | 40 | Gaussian, |
| Low dose | 10² | 10⁻³ | 20 | 1138 | Poisson, |
| Universality | 10⁻⁴ | 10² | 50 | 160 | Gaussian, |
In order to choose the parameters for the numerical simulations we tested different values and manually chose the parameters which maximized performance among the considered candidates. Alternatively, these parameters could be learned from the data using a machine learning approach, or chosen via a bilevel approach similar to [44].
In the simulations we observed that choosing c larger tends to oversmooth the resulting reconstructions. Taking a smaller value for c, we observed that the manifold term tends to be undervalued, resulting in worse performance. In a similar fashion we found that choosing larger has a smoothing effect on the resulting reconstructions, while lowering makes the reconstructions less smooth.
The ADMM scheme for aNETT minimization shares similarities with existing iterative neural network based reconstruction methods. In particular, ADMM-inspired plug-and-play priors [15–17] may be most closely related. However, as opposed to the plug-and-play approach, we can deduce convergence from existing results for ADMM for non-convex problems [45]. While convergence of (S1)–(S3) and relations with plug-and-play priors are interesting and relevant, they are beyond the scope of this work. This also applies to the comparison with other iterative schemes for minimizing aNETT.
In this section we apply aNETT regularization to sparse view and low-dose computed tomography (CT). For the experiments we always choose to be the ℓ1-norm. The parameter specifications for the proposed aNETT functional and its numerical minimization are given in table 1. For quantitative evaluation, we use the peak-signal-to-noise-ratio (PSNR) defined by
Here is the ground truth image and its numerical reconstruction. A higher PSNR value indicates a better reconstruction.
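Since the displayed PSNR formula was lost in extraction, the following snippet shows one common definition; the peak value used here (the maximum of the ground truth) is an assumption and may differ from the paper's exact normalization.

```python
import numpy as np

def psnr(ground_truth, reconstruction):
    """Peak signal-to-noise ratio in dB, using max(ground truth) as peak value."""
    gt = np.asarray(ground_truth, dtype=float)
    rec = np.asarray(reconstruction, dtype=float)
    mse = np.mean((gt - rec) ** 2)
    return 10.0 * np.log10(np.max(gt) ** 2 / mse)
```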
4.1. Discretization and dataset
For sparse view CT as well as for low dose CT we work with a discretization of the Radon transform . The values are integrals of the function over lines orthogonal to for angle and signed distance . We discretize the Radon transform using the ODL library [46], where we assume that the function has compact support in and is sampled on an equidistant grid. We use Nφ equidistant samples of and Ns equidistant samples of . In both cases, we end up with an inverse problem of the form (1.1), where is the discretized linear forward operator. Elements will be referred to as CT images and the elements as sinograms.
For all results presented below we work with image size 512×512 and use Ns = 768. The number of angular samples Nφ is taken as 40 for sparse view CT and for the low dose example. In both cases we use the CT images from the Low Dose CT Grand Challenge dataset [47] provided by the Mayo Clinic. The dataset consists of 512×512 grayscale images of 10 different patients, where for each patient there are multiple CT scanning series available. We use the split for training, validation and testing which corresponds to CT images in the respective sets. The validation set is used to select the networks which achieve the minimal validation loss, and the test set is used to evaluate the final performance. Note that by splitting the dataset according to patient, we avoid validating and testing on images of patients that have already been seen during training. An example image and the corresponding simulated sparse view and low-dose sinogram are shown in figure 1.
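A minimal ODL sketch of this discretization is given below, using the image size and detector sampling stated above and the 40-angle sparse-view geometry; the Shepp-Logan phantom and all parameter choices are only illustrative and not part of the paper.

```python
# Requires ODL and a ray-transform backend such as astra or skimage.
import odl

# 512 x 512 pixel images with compact support in [-1, 1]^2
space = odl.uniform_discr(min_pt=[-1, -1], max_pt=[1, 1],
                          shape=[512, 512], dtype='float32')
# Parallel-beam geometry: 40 angles (sparse view), 768 detector bins
geometry = odl.tomo.parallel_beam_geometry(space, num_angles=40, det_shape=768)
ray_trafo = odl.tomo.RayTransform(space, geometry)   # discretized Radon transform
fbp = odl.tomo.fbp_op(ray_trafo)                     # filtered back-projection

phantom = odl.phantom.shepp_logan(space, modified=True)
sinogram = ray_trafo(phantom)                        # noise-free sparse-view data
reco_fbp = fbp(sinogram)                             # reference FBP reconstruction
```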
4.2. Numerical results
We compare results of aNETT to the learned primal-dual algorithm (LPD) [48], the tight frame U-Net [38] applied as a post-processing network (CNN), and the filtered back-projection (FBP). Minimization of the loss function for all methods was done using Adam [49] for 100 epochs, a cosine decay learning rate with in the t-th epoch, and a batch size of 4. For LPD we take the hyper-parameters and N = 7 network iterations and train according to [48]. Here, we choose to only use N = 7 network iterations because we observed instabilities during the training phase when this parameter was chosen larger, and we did not perform any further parameter tuning. For training of the tight frame U-Net we do not follow the patch approach of [38] but instead use full images obtained with FBP as CNN inputs. Training of all the networks was done on a GTX 1080 Ti with an Intel Xeon Bronze 3104 CPU.
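The optimizer setup roughly corresponds to the following sketch; the initial learning rate is an assumption, since its value did not survive extraction, and the model is only a placeholder.

```python
import torch

model = torch.nn.Conv2d(1, 1, 3, padding=1)                 # placeholder for the actual network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # assumed initial learning rate
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... loop over the training set in batches of 4, compute the loss,
    # and call loss.backward() and optimizer.step() per batch ...
    scheduler.step()                                         # cosine decay per epoch
```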
- Sparse View CT: to simulate sparse view data we evaluate the Radon transform for directions. We generate noisy data by adding Gaussian white noise with standard deviation taken as 0.02 times the mean value of . We use the ℓ2-norm distance as the similarity measure. Quantitative results evaluated on the test set are shown in table 2. All learning-based methods yield comparable performance in terms of PSNR and clearly outperform FBP. The reconstructions shown in figure 2 indicate that aNETT reconstructions are less smooth than CNN reconstructions and less blocky than LPD reconstructions.
- Low Dose CT: for the low dose problem, we use a fully sampled sinogram with and add Poisson noise corresponding to 10⁴ incident photons per pixel bin (see the noise-simulation sketch after this list). The Kullback-Leibler divergence is a more appropriate discrepancy term than the squared ℓ2-norm distance in the case of Poisson noise, and the reported values and reconstructions use the Kullback-Leibler divergence as the similarity measure. Quantitative results are shown in table 2. Again, all learning-based methods give similar results and significantly outperform FBP. Visual comparison of the reconstructions in figure 3 shows that CNN yields cartoon-like images and the LPD reconstruction again looks blocky. The aNETT reconstruction shows more texture than the CNN reconstruction and at the same time is less blocky than the LPD reconstruction.
- Universality: in practical applications, we may not have a fixed sampling pattern. If we have many different sampling patterns, then training a network for each sampling pattern is infeasible, and hence reconstruction methods should be applicable to different sampling scenarios. Additionally, it is desirable that an increased number of samples indeed increases performance. To test this, we consider the sparse view CT problem but with an increased number of angular samples, without retraining the networks. Due to the rigidity of the used framework, LPD cannot easily be adapted to this problem, and we therefore decided to only compare aNETT with the post-processing CNN. For the results presented here, no network was retrained. Quantitative evaluation for this scenario is given in table 2. We see that aNETT slightly outperforms the CNN in terms of PSNR. The advantage of aNETT over CNN, however, is best observed in figure 4. One observes that the CNN yields similar reconstructions for both angular sampling patterns. On the other hand, aNETT is able to synergistically combine the increased sampling rate of the sinogram with the network trained on coarsely sampled data. Despite using the network trained with only 40 angular samples, aNETT reconstructs small details which are not present in the reconstruction from 40 angular samples.
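The following sketch shows one way to realize the two noise models described above (Gaussian noise with standard deviation 0.02 times the mean of the clean sinogram, and Poisson noise for 10⁴ incident photons per bin). The count-domain simulation and log-transform in the Poisson case are our own assumptions; the paper's exact simulation may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(sinogram, rel_std=0.02):
    """Sparse-view case: Gaussian white noise with std = rel_std * mean(sinogram)."""
    sigma = rel_std * np.mean(sinogram)
    return sinogram + sigma * rng.standard_normal(sinogram.shape)

def add_poisson_noise(sinogram, n_photons=1e4):
    """Low-dose case: Poisson noise for n_photons incident photons per bin,
    simulated in the count domain and mapped back by a log-transform."""
    counts = rng.poisson(n_photons * np.exp(-sinogram))
    counts = np.maximum(counts, 1)   # avoid taking the log of zero counts
    return -np.log(counts / n_photons)
```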
Table 2. Overview of metric results evaluated on the test set. The values shown are the average PSNR ± the standard deviation calculated over the test dataset. The values in bold show the best results. The na entry means that LPD was not applied to this problem setting, as in the used framework there is no canonical way to use LPD with a modified sampling pattern.
| PSNR | FBP | LPD | Post | aNETT |
|---|---|---|---|---|
| Sparse view | 23.8±1.3 | 37.1±0.9 | 37.1±1.0 | |
| Low dose | 36.9±1.6 | 43.6±1.3 | 43.9±1.3 | |
| Universality | 32.4±1.6 | na | 37.7±0.8 | |
4.3. Discussion
The results show that the proposed aNETT regularization is competitive with prominent deep-learning methods such as LPD and post-processing CNNs. We found that the aNETT does not suffer as much from over-smoothing which is often observed in other deep-learning reconstruction methods. This can for example be seen in figure 3 where the CNN yields an over-smoothed reconstruction and the aNETT reconstruction shows more texture. Besides this, aNETT reconstructions are less blocky than LPD reconstructions. Moreover, aNETT is able to leverage higher sampling rates without retraining the networks to reconstruct small details while other deep-learning methods fail to do so. We conjecture that this advantage arises due to the fact that aNETT can make use of the higher sampling rate using the data-consistency term in (1.3), while the CNN is agnostic to this change in the sampling rate. In some scenarios, it may not be possible to retrain networks. Especially for learned iterative schemes network training is a time-consuming task. Training aNETT on the other hand is straightforward and, as demonstrated, yields a method which is robust to changes of the forward problem during testing time.
While a more extensive study with respect to the influence of noise could be done to further analyse the advantages and disadvantages of each method, this is not our main focus here and is thus postponed to a future study.
Finally, we note that aNETT relies on minimizing (1.3) iteratively. With the use of the ADMM minimization scheme presented in this article, aNETT is slower than the methods used for comparison in this article. Designing faster optimization schemes for (1.3) is beyond the scope of this work, but is an important and interesting aspect.
We have proposed the aNETT (augmented NETwork Tikhonov) for which we derived coercivity of the regularizer under quite mild assumptions on the networks involved. Using this coercivity we presented a convergence analysis of aNETT with a general similarity measure . We proposed a modular training strategy in which we first train an -regularized autoencoder independent of the problem at hand and then a network which is adapted to the problem and first autoencoder. Experimentally we found this training strategy to be superior to directly training the autoencoder on the full task. Lastly, we conducted numerical simulations demonstrating the feasibility of aNETT.
The experiments show that aNETT is able to keep up with classical post-processing CNNs and the learned primal-dual approach for sparse view and low dose CT. Typical deep learning methods work well for the fixed sampling pattern on which they have been trained. However, reconstruction methods are expected to perform better if we use an increased sampling rate. We have experimentally shown that aNETT is able to leverage higher sampling rates to reconstruct small details in the images which are not visible in the other reconstructions. This universality can be advantageous in applications where one is not fixed to one sampling pattern or is not able to train a network for every sampling pattern.
D O and M H acknowledge support of the Austrian Science Fund (FWF), project P 30747-N32. The research of L N has been supported by the National Science Foundation (NSF) Grants DMS 1212125 and DMS 1616904.
No new data were created or analysed in this study.