Image Reconstruction from Non-Cartesian K-Space Data¶
This tutorial notebook demonstrates how an image is reconstructed from non-Cartesian k-space data. We'll write the reconstruction algorithms from scratch, relying only on a few standard scientific Python libraries for basic functionality. We'll use a Shepp-Logan phantom, simulate variable-density k-space data from it, and then reconstruct the image from that k-space data.
NB: This tutorial has been adapted from Mark Chiew's MATLAB implementation, which you can find here
Loading the libraries¶
import numpy as np
import sigpy as sp
import matplotlib.pyplot as plt
from scipy.linalg import norm
from skimage.transform import resize
from scipy.sparse.linalg import cg, minres
from scipy.signal import convolve2d
# Load the phantom
N_x = 64 # size of image
x = sp.shepp_logan((N_x, N_x), dtype=np.float64)
x = x / np.max(x) # normalize image
# Show our source image
plt.figure()
plt.imshow(x, cmap='gray', vmin=0, vmax=1)
plt.title('Source Image')
plt.colorbar()
plt.show()
Define arbitrary k-space sampling locations¶
The 2D non-Cartesian k-space samples need to be defined over the 2D $(k_x, k_y)$ plane, where the number and location of the samples is defined by our k-space readout trajectory. In this tutorial, we simulate a randomly perturbed variable-density spiral. Other trajectories exist as well (e.g. Cartesian, radial), and any trajectory can be described simply as an array of $(k_x, k_y)$ samples.
Now, if we assume our image space is defined in voxel units (i.e. $\Delta x = \Delta y = 1$ voxel), then $k_x^{max} = \frac{1}{2\Delta x}$ and $k_y^{max} = \frac{1}{2\Delta y}$. That means our $(k_x, k_y)$ coordinates should each extend over $[-0.5, 0.5]$. We'll acquire $64\times64 = 4096$ samples along the perturbed spiral covering this k-space extent.
N_k = 64**2 # number of k-samples
t = np.linspace(0, np.sqrt(0.5), N_k) # dummy variable to parameterize spiral
k_x = (1 + np.random.randn(N_k) / 20) * t**2 * np.cos(2 * np.pi * 32 * t) # spiral kx-coords
k_y = (1 + np.random.randn(N_k) / 20) * t**2 * np.sin(2 * np.pi * 32 * t) # spiral ky-coords
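# Quick sanity check (illustrative addition, not part of the original tutorial):
# with Delta_x = 1 voxel the k-space coordinates should lie roughly within [-0.5, 0.5];
# the random perturbation can push a few samples slightly past this nominal limit.
print(f'max |k_x| = {np.max(np.abs(k_x)):.3f}, max |k_y| = {np.max(np.abs(k_y)):.3f}')
print(f'max k-space radius = {np.max(np.sqrt(k_x**2 + k_y**2)):.3f}')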
# Show scatterplot of trajectory
plt.figure()
plt.scatter(k_x, k_y, s=5)
plt.axis('square')
plt.grid(True)
plt.xlabel(f'$k_x$')
plt.ylabel(f'$k_y$')
plt.title('k-space Sampling Locations')
plt.show()
Define Discrete Fourier Encoding Matrix¶
In this part, we explicitly construct our forward model, which represents the acquisition of measurements by the scanner. An MRI scanner takes samples from (2D/3D) k-space; here we simulate this by transforming an image from image space to k-space and taking samples of that k-space. So, our forward model can be written as $d = Fx$, where $d$ represents the k-space data we actually obtain from the scanner, $F$ is the Fourier encoding matrix (representing the Fourier transform), and $x$ is the image we want to recover from our acquired measurements.
The data at each k-space location corresponds to the sum over all space of the image multiplied by a 2D complex exponential with spatial frequencies $k_x$ and $k_y$ in the $x$ and $y$ directions, respectively. This describes a linear function (no surprise, as the Fourier transform is linear), and we can represent it in the discrete case as an encoding matrix $F$. Each row of the encoding matrix represents the measurement at a single k-space location.
NOTE: Our forward model is discrete here, but in a real-world scenario the forward encoding of MR data is essentially a continuous process.
Explanation: In MR, we actually measure the magnetization at the time of readout where the spins live in continuous space. So, the MR signal is the integral of magnetization over continuous $x$ and $y$. The equation can be expressed as: $$ d(k_x, k_y) = \int_{x=-\infty}^{\infty}\int_{y=-\infty}^{\infty}M(x, y)e^{-j2\pi[k_xx + k_yy]}\,dx\,dy $$
In our example, we replace the continuous image by a grid of voxels, so we can write $M(x, y)$ as: $$ M(x, y) \approx \sum_{n=1}^{N}x_nb_n(x, y) $$ where $x_n$ represents a single pixel/voxel value and $b_n(x, y)$ is the basis function of that voxel (a basis function defines the shape of one voxel in continuous space; we choose a rectangle because it corresponds to a constant-valued voxel).
Inserting $M(x, y)$ into the continuous expression, we get: $$ d(k_x, k_y) = \sum_{n=1}^{N}x_n\int b_n(x, y)e^{-j2\pi[k_xx + k_yy]}\,dx\,dy $$ If the voxel basis functions are small rectangles of width $\Delta x$ and $\Delta y$ centred at $(x_n, y_n)$, the integral is approximately: $$ e^{-j2\pi[k_xx_n + k_yy_n]}\,\Delta x\,\Delta y $$ Thus (taking $\Delta x = \Delta y = 1$ voxel), the discrete forward model becomes: $$ d(k_x, k_y) = \sum_{n=1}^{N}x_ne^{-j2\pi[k_xx_n + k_yy_n]} $$ or, in matrix form, simply: $$ d = Fx $$
If $x\in\mathbb{C}^N$ is the vectorized image and $d\in\mathbb{C}^M$ are the measurements then:
- columns of F correspond to $n = 1$ to $N$ (voxels).
- rows of F correspond to $m = 1$ to $M$ (k-space samples).
Each element is: $$ F_{m, n} = e^{-j2\pi[k_{mx}x_n + k_{my}y_n]} $$
# Define discrete Fourier encoding matrix
xidx, yidx = np.meshgrid(np.arange(-N_x / 2, N_x / 2), np.arange(-N_x / 2, N_x / 2))
# Loop over each k-location to construct its required spatial phase modulation
F = np.zeros((N_k, N_x, N_x), dtype=complex)
for i in range(N_k):
    F[i, :, :] = np.exp(-1j * 2 * np.pi * (k_x[i] * xidx + k_y[i] * yidx)) # 2D Fourier exponential, matching F_{m,n} above
# Reshape F so that each row is a single k-space encoding
F = F.reshape(N_k, -1)
# Show an example encoding
plt.figure()
plt.imshow(np.angle(F[2000, :].reshape(N_x, N_x)), cmap='hsv', vmin=-np.pi, vmax=np.pi)
plt.title('Phase of Example 2D Fourier Encoding Modulation')
plt.colorbar()
plt.show()
Helper Functions¶
# HELPER FUNCTION
def nrmse(x, y, s):
    """
    Normalized Root-Mean-Square Error.
    """
    print(f'{s} NRMSE: {norm(x.flatten() - y.flatten()) / norm(y.flatten())}')
def plt_image(x, y, N, title):
    """
    Plots Image Estimate Alongside a 5x Magnified Difference Image
    """
    plt.figure()
    plt.subplot(1, 2, 1)
    plt.imshow(np.abs(x).reshape(N, N), cmap='gray', vmin=0, vmax=1)
    plt.title(title)
    plt.subplot(1, 2, 2)
    plt.imshow(np.abs(x).reshape(N, N) - y, cmap='gray', vmin=-0.2, vmax=0.2)
    plt.title('Difference x5')
    plt.show()
Perform Forward Encoding/Measurement¶
We simulate data acquisition by running our image through the forward model: we first vectorize the image, then simply perform a matrix-vector multiplication. In reality, our image reconstruction problem starts here, with $d$ provided by the scanner, and the job of image reconstruction is to find the image that corresponds "best" to the measured data in some sense. That is, we solve the inverse problem of finding $x$ given $d$ and $F$.
d = F @ x.flatten()
Direct Inverse Reconstruction¶
We now have the k-space samples, so an inverse Fourier transform should give us the image back. However, a direct inverse is often impractical or impossible in non-Cartesian imaging because:
- F is often never explicitly constructed.
- F has no inverse unless it is square (same number of k-space points and voxels) and non-singular.
- when F is square and non-singular, it can still be very poorly conditioned.
- when F is large enough, explicit inversion is slow and computationally intensive.
In this case, we explicitly defined our $F$ matrix to be square, so we perform an explicit inverse reconstruction to examine the impact of system conditioning. The short calculation below also illustrates why explicitly storing $F$ quickly becomes impractical at larger image sizes.
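To get a sense of the scale behind these points, the snippet below (an illustrative back-of-the-envelope calculation, not part of the original tutorial) estimates the memory needed just to store $F$ explicitly as complex128: manageable for our $64\times64$ example, but already tens of gigabytes for a $256\times256$ image with one k-space sample per voxel.
# Illustrative back-of-the-envelope: memory to store F explicitly (complex128 = 16 bytes/element)
for n in (64, 256):
    n_elements = (n**2) * (n**2) # rows = k-space samples (one per voxel here), columns = voxels
    print(f'{n}x{n} image: explicit F needs ~{n_elements * 16 / 1e9:.1f} GB')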
# Direct Inverse Reconstruction
print(f'Condition number: {np.linalg.cond(F):.2G}') # condition number of F
est0 = np.linalg.solve(F, d) # compute a naive inverse (warning: slow)
# Print NRMSE to the ground truth and show image and difference image
nrmse(est0, x, 'Direct Inverse')
plt_image(est0, x, N_x, 'Direct Inverse Est.')
Condition number: 9.1E+19
Direct Inverse NRMSE: 556.0718844833709
Direct Regularized Least Squares¶
To improve conditioning, and to accommodate the general case where the inverse problem is ill-posed (more or fewer k-space samples than voxels), we now choose to solve for $x$ in the least-squares sense, with an additional L2-regularization term if desired/necessary. The extra term can help identify a unique solution and prevent over-fitting. We want to minimize over $x$: $$ C(x) = ||Fx - d||_2^2 + \lambda||Lx||_2^{2} $$ where $L$ is some other linear operator or transform, and $\lambda$ is a relative weighting parameter between the data consistency (first term) and the $L_2$ regularizer (second term). $C(x)$ is often referred to as the cost function. We solve for $x$ by finding the value of $x$ that minimizes the cost. Note that the optimal value of $x$ depends on how you define your cost (i.e. what regularisation is used, at what weighting), so what you get out depends entirely on what you ask for.
To solve this directly, we simply take the gradient (i.e. the multivariate derivative) of $C(x)$ and set it to zero to find the value of $x$ that optimises it. We take the gradient because our image is multivariate, where each voxel (or index in $x$) corresponds to its own variable. $$ \nabla C(x) = 2F^{*}(Fx - d) + 2\lambda L^{*}(Lx) $$ $$ \nabla C(x) = 2F^{*}Fx -2F^{*}d + 2\lambda L^{*}Lx $$ Set $\nabla C(x) = 0$: $$ F^{*}Fx - F^{*}d + \lambda L^{*}Lx = 0 $$ $$ (F^{*}F + \lambda L^{*}L)x = F^{*}d $$ So our optimal $x$ is: $$ x = (F^{*}F + \lambda L^{*}L)^{-1}F^{*}d $$ For $L_2$ regularization penalties, the optimal estimator is a linear function of the data $d$ (it does not depend on $x$ itself). Furthermore, when $\lambda = 0$ (no regularization), this reduces to the simple pseudoinverse. Note that $F^*$ denotes the adjoint (or conjugate transpose in the discrete case) of $F$; it is also called the Hermitian transpose and is often written as $F^H$. Here we choose $L = I$, the identity matrix, which effectively penalises the energy of $x$.
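As an optional aside (not part of the original tutorial), a small helper like the hypothetical l2_cost below can be used to evaluate $C(x)$ with $L = I$, for example to monitor convergence of the iterative reconstructions later in this tutorial.
def l2_cost(x_est, F, d, lam):
    """
    Evaluate C(x) = ||Fx - d||^2 + lam * ||x||^2 (the cost above with L = I).
    Illustrative helper; not used elsewhere in this tutorial.
    """
    return norm(F @ x_est - d)**2 + lam * norm(x_est)**2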
# Direct Regularised Least Squares
lambda_ = 1E-4 # regularisation weighting
# Compute the regularized matrix
A = F.conj().T @ F + lambda_ * np.eye(N_x**2)
# Solve the linear system instead of explicitly computing the inverse
E = np.linalg.solve(A, F.conj().T) # E = A \ F', i.e. A^{-1} F^H
# Compute the condition number of A (not E, as E is not square)
print(f'Condition number: {np.linalg.cond(A):.2G}') # condition number of A
# Perform the reconstruction
est1 = E @ d # regularised reconstruction
# Print NRMSE to the ground truth and show image and difference image
nrmse(est1, x, 'Regularised Inverse')
plt_image(est1, x, N_x, 'Regularised Inverse')
Condition number: 2E+10
Regularised Inverse NRMSE: 0.3512716868141881
Iterative Regularised Least Squares¶
Because direct inversion is still slow and costly, we can minimise the same cost function without any explicit inverse operations by using iterative schemes. Let's consider the simplest case, a steepest gradient descent algorithm: $$ x^{n+1} = x^n - \mu \nabla C(x^n) $$ where $x^n$ is the $n$-th estimate of $x$, $\mu$ is the step size and $\nabla C(x^n)$ is the gradient of the cost evaluated at $x^n$. Dropping the constant factor of 2 (which can be absorbed into the step size), we can write the gradient $\nabla C$ as: $$ \nabla C(x) = Ax - b $$ where $A = (F^{*}F + \lambda L^{*}L)$ and $b = F^{*}d$.
While it does seem like extra work, this procedure, and other first-order (gradient-only) iterative methods like it (e.g. conjugate gradient methods), are often much more computationally efficient, particularly when the dimensionality of $x$ gets large.
Here, we continue using $L = I$, and we define a small fixed step size for simplicity (although you can estimate a more efficient, adaptive step size $\mu_n$ at every iteration; see the sketch below for one such choice). We define a maximum number of iterations, but will stop iterating if the relative change in our estimate $x^n$ is small enough.
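As an aside, for this quadratic cost the adaptive step size mentioned above has a closed form: an exact line search along the negative gradient $g_n = \nabla C(x^n)$ gives $\mu_n = \frac{g_n^{*}g_n}{g_n^{*}Ag_n}$. The hypothetical grad_desc_exact_step below is an illustrative sketch of this variant (it is not used in the rest of the tutorial) and assumes $A$ and $b$ as defined above.
def grad_desc_exact_step(x, A, b, max_iter, tol):
    """
    Steepest descent with exact line search for the quadratic cost whose
    gradient is g = A @ x - b (A Hermitian positive semi-definite).
    """
    for _ in range(max_iter):
        g = A @ x - b # gradient at the current estimate
        denom = np.vdot(g, A @ g).real
        if denom == 0: # gradient is zero (or lies in the null space of A)
            break
        mu = np.vdot(g, g).real / denom # exact line-search step size
        update = mu * g
        x = x - update
        if norm(update) / (norm(x) + 1e-10) < tol: # stop when the relative change is small
            break
    return x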
# HELPER FUNCTION
def grad_desc(x, grad, step, max_iter, tol):
    """
    Steepest Gradient Descent
    """
    ii = 0 # iteration counter
    dx = np.inf # relative change measure
    eps = 1e-10 # small value to avoid division by zero
    while ii < max_iter and dx > tol: # check loop exit conditions
        tmp = step * grad(x) # compute gradient step
        dx = norm(tmp) / (norm(x) + eps) # compute relative change metric
        x = x - tmp # update estimate
        ii += 1 # update iteration count
    return x
# Simplifying substitutions
A = F.conj().T @ F + lambda_ * np.eye(N_x**2)
b = F.conj().T @ d
# Define the gradient as a function, grad(x) = gradient(C(x))
grad = lambda x: A @ x - b
# Define a fixed step size, max number of iterations, and relative change tolerance
step = 1E-6
max_iter = 1000
tol = 1E-6
# Define an initial guess (starting point)
est2 = np.zeros(N_x**2)
# Run through steepest gradient descent
est2 = grad_desc(est2, grad, step, max_iter, tol)
# Print NRMSE to the ground truth and show image and difference image
nrmse(est2, x, 'Iterative L2 Reg')
plt_image(est2, x, N_x, 'Iterative L2 Reg')
Iterative L2 Reg NRMSE: 0.4108886640780802
Using Built-in Linear Solvers for L2 Penalties¶
We can solve the equation $Ax = b$ using built-in linear solvers such as Conjugate Gradient (CG). You can learn more about CG and other iterative optimization algorithms here
# Using CG
est3, _ = cg(A, b, maxiter=max_iter)
nrmse(est3, x, 'CG')
plt_image(est3, x, N_x, 'CG Estimate')
CG NRMSE: 0.3772678270996327
Compressed Sensing (L1 Regularized) Reconstruction Using Total Variation (TV)¶
In the compressed sensing problem, we want to minimize: $$ C(x) = \frac{1}{2}||Fx - d||_2^{2} + \lambda||Lx||_1 $$ where $||Lx||_1$ is the sparsity-enforcing term, and $L$ is a sparsifying transform. Now, as before, we want to take the gradient of $C(x)$ and solve for $x$. However, $||Lx||_1$ is not differentiable everywhere (think of the absolute value function at the origin). So, we choose to use the Huber loss function as a smooth approximation, with: $$ h_\alpha(z) = \begin{cases} \dfrac{z^2}{2\alpha}, & |z| < \alpha, \\[8pt] |z| - \dfrac{\alpha}{2}, & |z| \ge \alpha, \end{cases} \qquad\text{and}\qquad \nabla h_\alpha(z) = \begin{cases} \dfrac{z}{\alpha}, & |z| < \alpha, \\[8pt] \dfrac{z}{|z|}, & |z| \ge \alpha. \end{cases} $$ such that $||Lx||_1 \approx \sum_i h_\alpha((Lx)_i)$
def huber(x, a):
    """
    Differentiable Huber Loss to Replace L1.
    """
    y = np.zeros_like(x)
    ii = np.abs(x) < a # find values of x < a
    y[ii] = np.abs(x[ii])**2 / (2 * a) # smooth quadratic part
    ii = np.abs(x) >= a # find values of x >= a
    y[ii] = np.abs(x[ii]) - a / 2 # abs(x) part
    return y
def huber_grad(x, a):
    """
    Gradient of Differentiable Huber Loss.
    """
    y = np.zeros_like(x)
    ii = np.abs(x) < a # find values of x < a
    y[ii] = x[ii] / a # smooth quadratic part
    ii = np.abs(x) >= a # find values of x >= a
    y[ii] = x[ii] / np.abs(x[ii]) # abs(x) part
    return y
# Plotting the L1 and Huber Approx and their gradients
xx = np.linspace(-5e-6, 5e-6, 1000)
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(xx, np.abs(xx), linewidth=3, label='L1 Norm')
plt.plot(xx, huber(xx, 1e-6), linewidth=3, label='Huber Approx')
plt.title('Smoothed loss')
plt.xlim([-5e-6, 5e-6])
plt.ylim([0, 5e-6])
plt.xlabel('x')
plt.ylabel('|x|')
plt.legend()
plt.grid(True)
plt.subplot(1, 2, 2)
dx = xx[1] - xx[0] # Numerical gradient of abs(x)
grad_abs = np.diff(np.abs(xx)) / dx
plt.plot(xx[:-1], grad_abs, linewidth=3, label='Gradient of L1 Norm')
plt.plot(xx, huber_grad(xx, 1e-6), linewidth=3, label='Gradient of Huber Loss')
plt.title('Gradients')
plt.xlim([-5e-6, 5e-6])
plt.ylim([-3, 3])
plt.xlabel('x')
plt.ylabel(r'$\nabla |x|$')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
Now, with the Huber approximation, we want to minimize $$\tilde{C}(x) = \tfrac{1}{2}\|F x - d\|_2^2 + \lambda\, h_\alpha(Lx)$$ with gradient $$\nabla \tilde{C}(x) = F^{*}(Fx - d) + \lambda\, L^{*}\nabla h_\alpha(Lx)$$ where $F$ is the MRI forward operator, $d$ is the measured k-space data, $Lx$ are the image gradients (or another sparsifying transform, e.g. TV) and $h_\alpha$ is the Huber penalty applied to each element of $Lx$. So, the cost combines data consistency with Huber-smoothed total variation regularization.
However, in $\nabla \tilde{C}(x)$, although $Lx$ is linear, $\nabla h_\alpha(z)$ is non-linear in $z$ because the Huber function switches between quadratic and linear behaviour depending on $|z|$. So $\nabla h_\alpha(Lx)$ is non-linear in $x$. Therefore, we cannot write $Ax = b$, and we cannot use Conjugate Gradient or any other linear solver. Instead, we must use gradient descent or a proximal algorithm (a proximal sketch is given below). Steepest descent depends only on the gradient of the cost function, which is now well defined thanks to the Huber smoothing.
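For reference, a proximal method such as ISTA handles the non-smooth $\ell_1$ term exactly through a soft-thresholding (proximal) step instead of smoothing it. The sketch below is purely illustrative: it uses $L = I$ (sparsity directly in the image domain), because the proximal operator of the TV penalty used in this tutorial has no such simple closed form, and soft_thresh and ista are hypothetical helper names rather than part of the original tutorial.
def soft_thresh(z, t):
    """
    Soft-thresholding: the proximal operator of t * ||.||_1 (valid for complex z).
    """
    mag = np.abs(z)
    return np.where(mag > t, (1 - t / np.maximum(mag, 1e-30)) * z, 0)

def ista(F, d, lam, mu, n_iter):
    """
    Iterative Shrinkage-Thresholding Algorithm for (1/2)||Fx - d||^2 + lam * ||x||_1.
    The step size mu should satisfy mu <= 1 / ||F^H F|| for convergence.
    """
    x_est = np.zeros(F.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad_dc = F.conj().T @ (F @ x_est - d) # gradient of the data-consistency term
        x_est = soft_thresh(x_est - mu * grad_dc, mu * lam) # gradient step + proximal step
    return x_est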
Now, to compute $L^{*}\nabla h_\alpha(Lx)$, we need $L$ (the forward finite-difference operator) and $L^{*}$ (its adjoint).
Here we use a first-order finite-difference operator $D$ with circular boundary conditions to penalise spatial total variation, so $L=D$, which is defined as:
$$ D(f) = \begin{bmatrix} \dfrac{\partial}{\partial x} \\ \dfrac{\partial}{\partial y} \end{bmatrix} f = \begin{bmatrix} \dfrac{f - \mathrm{circshift}(f,\Delta x)}{\Delta x} \\[10pt] \dfrac{f - \mathrm{circshift}(f,\Delta y)}{\Delta y} \end{bmatrix}, $$
and
$$ D^{*}(g) = \begin{bmatrix} \dfrac{\partial^{*}}{\partial x} \quad \dfrac{\partial^{*}}{\partial y} \end{bmatrix} g = \dfrac{\partial^{*} g_x}{\partial x} + \dfrac{\partial^{*} g_y}{\partial y} = \dfrac{g_x - \mathrm{circshift}(g_x,-\Delta x)}{\Delta x} + \dfrac{g_y - \mathrm{circshift}(g_y,-\Delta y)}{\Delta y}. $$
Here the input to $D$ is a single image, and its output is two images (an $x$-difference and a $y$-difference). The input to $D^{*}$ is a pair of images, and the output is a single image. Succinctly, $$ D : (f) \mapsto (f_x,f_y), \qquad D^{*} : (f_x,f_y) \mapsto (f). $$ which is like a gradient/divergence pair.
a = 1E-16 # Huber-smoothing constant
lambda_ = 5E1 # sparsity regularisation factor
FF = F.conj().T @ F # pre-compute F'F
b = F.conj().T @ d # pre-compute F'd
# Define forward and adjoint finite difference operators
def D_fwd(img, N_x):
    """
    Forward finite-difference operator D: image -> (x-difference, y-difference).
    """
    img = img.reshape(N_x, N_x)
    Dx = (img - np.roll(img, -1, axis=0)).flatten()
    Dy = (img - np.roll(img, -1, axis=1)).flatten()
    return np.column_stack((Dx, Dy))

def D_adj(g, N_x):
    """
    Adjoint finite-difference operator D*: (x-difference, y-difference) -> image.
    """
    g = g.reshape(N_x, N_x, 2)
    dx = (g[:, :, 0] - np.roll(g[:, :, 0], 1, axis=0)).flatten()
    dy = (g[:, :, 1] - np.roll(g[:, :, 1], 1, axis=1)).flatten()
    return dx + dy
L = lambda x: D_fwd(x, N_x) # define forward sparsifying transform
Ladj = lambda x: D_adj(x, N_x) # define adjoint sparsifying transform
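# Quick adjoint check (illustrative addition, not part of the original tutorial):
# for any u and v we should have <D u, v> = <u, D* v> up to round-off error.
u_test = np.random.randn(N_x**2)
v_test = np.random.randn(N_x**2, 2)
lhs = np.vdot(D_fwd(u_test, N_x), v_test) # <D u, v>
rhs = np.vdot(u_test, D_adj(v_test, N_x)) # <u, D* v>
print(f'Adjoint check: <Du, v> = {lhs:.6f}, <u, D*v> = {rhs:.6f}')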
# Define gradient of CS cost
grad = lambda x: FF @ x - b + lambda_ * Ladj(huber_grad(L(x), a))
est_TV = np.zeros(N_x**2) # initial guess
step = 1E-6 # step size
tol = 1E-6 # minimum change tolerance
# Run through steepest gradient descent
est_TV = grad_desc(est_TV, grad, step, max_iter, tol)
# Print NRMSE to the ground truth and show image and difference image
nrmse(est_TV, x, 'CS-TV 1')
plt_image(est_TV, x, N_x, f'CS-TV estimate, {max_iter} iters')
CS-TV 1 NRMSE: 0.2878061081322516
# Run for more iterations
est_TV = grad_desc(est_TV, grad, step, 2*max_iter, tol)
nrmse(est_TV, x, 'CS-TV 2')
plt_image(est_TV, x, N_x, f'CS-TV estimate, {2 * max_iter} iters')
CS-TV 2 NRMSE: 0.15548629701792
# Run for even more iterations
est_TV = grad_desc(est_TV, grad, step, 3*max_iter, tol)
nrmse(est_TV, x, 'CS-TV 3')
plt_image(est_TV, x, N_x, f'CS-TV estimate, {3 * max_iter} iters')
CS-TV 3 NRMSE: 0.12078289690537965
# Run for even more iterations
est_TV = grad_desc(est_TV, grad, step, 5*max_iter, tol)
nrmse(est_TV, x, 'CS-TV 4')
plt_image(est_TV, x, N_x, f'CS-TV estimate, {5 * max_iter} iters')
CS-TV 4 NRMSE: 0.10434487283818164
We performed an iterative, non-Cartesian image reconstruction by optimising both L2-regularized and compressed sensing (L1-regularized) problems, and we did so without relying on any dedicated reconstruction toolboxes. One of the goals of this tutorial was to demystify some of the steps involved in non-Cartesian MR image reconstruction and iterative, non-linear reconstruction algorithms, by explicitly defining everything in plain Python.
There are many ways in which the procedures above can be sped up or further optimized. For example, using a NUFFT (e.g. Fessler's NUFFT toolbox), the computation $y = Fx$ can be replaced by the function call nufft(x, ...), and similarly $x = F^{*}y$ can be replaced by nufft_adj(y, ...). Faster convergence can also be achieved using conjugate gradient or quasi-Newton methods (if the gradient is well defined), or by using one of the many fast algorithms designed for L1-regularized problems.
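As a pointer in that direction, here is a minimal sketch using sigpy's NUFFT routines (sigpy is already imported as sp above). The coordinate scaling, axis ordering and sign/normalisation conventions used here are assumptions and may not exactly match the explicit $F$ built in this tutorial, so treat this as a starting point and check the sigpy documentation before relying on it.
# Hypothetical NUFFT-based forward/adjoint (conventions assumed; verify against the sigpy docs)
coord = np.stack([k_y, k_x], axis=-1) * N_x # assumed scaling: sigpy expects coordinates in roughly [-N/2, N/2]
d_nufft = sp.nufft(x.astype(complex), coord) # forward model: plays the role of d = F @ x.flatten()
x_adj = sp.nufft_adjoint(d_nufft, coord, oshape=(N_x, N_x)) # adjoint: plays the role of F.conj().T @ d (up to scaling)
print(d_nufft.shape, x_adj.shape)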