March 21, 2019

Matrices for Linear Maps

If you already knew some linear algebra before reading my posts, you might be wondering where the heck all the matrices are. The goal of this post is to connect the theory of linear maps and vector spaces to the theory of matrices and computation.

The key observation is that linear maps between finite-dimensional vector spaces can be completely described solely in terms of how they act on basis vectors. Let's explore this idea in a more precise setting.

Suppose $T : U \to V$ is a linear map with $\dim U = n$ and $\dim V = m$. Let $(e_1, \ldots, e_n)$ be a basis for $U$ and let $(f_1, \ldots, f_m)$ be a basis for $V$. Then for any vector $u \in U$, we may express $u$ uniquely as a linear combination of basis vectors,

$$u = \sum_{i=1}^n a_i e_i,$$

where $a_i \in \mathbf{F}$ for $1 \leq i \leq n$. Since $T$ is a linear map, this means that

$$Tu = T \left( \sum_{i=1}^n a_i e_i \right) = \sum_{i=1}^n a_i T e_i. \tag{1}$$

We can go further and decompose each $T e_i \in V$ as a linear combination of basis vectors in $V$,

$$T e_i = \sum_{j=1}^m b_{j,i} f_j,$$

where $b_{j,i} \in \mathbf{F}$ for $1 \leq j \leq m$. Thus, we may rewrite equation $(1)$ as

$$Tu = \sum_{i=1}^n \sum_{j=1}^m a_i b_{j,i} f_j.$$

Since the scalars $a_i$ depend only on the input vector $u$ and the basis vectors $f_j$ are fixed, this means that the way $T$ acts on $u$ is determined entirely by the scalars $b_{j,i}$.

To put it another way, say we've chosen bases for $U$ and $V$. Then everything about the linear map is encoded in a collection of $m \times n$ scalars, which describe how the map acts on basis vectors in $U$ in terms of basis vectors in $V$.

If we were, for instance, to write the scalars $b_{j,i}$ in the following form:

$$[b]_{j,i} = \begin{bmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,n} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m,1} & b_{m,2} & \cdots & b_{m,n} \end{bmatrix},$$

then we would have a compact and easy-to-read way of describing the linear map $T$.

This is what we call an $m \times n$ matrix. In this case, $[b]_{j,i}$ is the matrix corresponding to the linear transformation $T$ with respect to the bases $(e_1, \ldots, e_n)$ for $U$ and $(f_1, \ldots, f_m)$ for $V$. We sometimes write $M(T)$ to denote the matrix for $T$, assuming we have already established which bases we are using for our domain and codomain.

It can be a bit confusing at first trying to remember what the shape of a linear map's matrix should be. The number of columns corresponds to the dimension of the domain, and the number of rows corresponds to the dimension of the codomain. The $i$th column of the matrix is really just the components of $T e_i$. That is, we may think of the matrix $M(T)$ as

$$M(T) = \begin{bmatrix} T e_1 & T e_2 & \cdots & T e_n \end{bmatrix}.$$

The scalar $b_{j,i}$ in the $j$th row and $i$th column of $M(T)$ is thus the coefficient of the $j$th basis vector $f_j$ for $V$ in the representation of $T e_i$.
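
To make the column-by-column picture concrete, here is a minimal NumPy sketch. The map $T$ below is made up purely for illustration; the point is that stacking the vectors $T e_i$ as columns recovers $M(T)$ with respect to the standard bases.

```python
import numpy as np

# A made-up linear map T : R^3 -> R^2, T(x, y, z) = (x + 2y, 3z),
# used only to illustrate the construction.
def T(u):
    x, y, z = u
    return np.array([x + 2 * y, 3 * z])

n = 3  # dimension of the domain
# The i-th column of M(T) holds the components of T applied to the
# i-th standard basis vector e_i of R^3.
M_T = np.column_stack([T(e) for e in np.eye(n)])

print(M_T)
# [[1. 2. 0.]
#  [0. 0. 3.]]
```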

If $T : \mathbf{F}^n \to \mathbf{F}^m$ is a linear map between vector spaces $\mathbf{F}^n$ and $\mathbf{F}^m$, where $\mathbf{F}$ is a field, then unless otherwise specified it should be assumed that its matrix $M(T)$ is with respect to the standard bases on $\mathbf{F}^n$ and $\mathbf{F}^m$. That is, the collections of $n$- and $m$-tuples, respectively, with a $1$ in a single slot and zeros everywhere else. Otherwise, it does not make sense to talk about a matrix for a linear map without first explicitly choosing bases for the vector spaces involved. This is very important.

Example. Recall the zero map we defined last time, $0 : U \to V$, where $0(u) = 0$ for every $u \in U$. Suppose $(e_1, \ldots, e_n)$ is any basis for $U$ and $(f_1, \ldots, f_m)$ is any basis for $V$.

To compute the matrix for the zero map with respect to these bases, all we need to do is figure out how it acts on basis vectors. Choose any $i$ for which $1 \leq i \leq n$. Then

$$0(e_i) = \sum_{j=1}^m b_{j,i} f_j = 0$$

for some scalars $b_{j,i} \in \mathbf{F}$. But since $(f_1, \ldots, f_m)$ is a basis for $V$, it is linearly independent and so this is only possible if $b_{j,i} = 0$ for $1 \leq j \leq m$. That means the matrix for the zero map looks like this:

$$M(0) = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}.$$

And it looks like this for any choice of bases for $U$ and $V$, since there's only one way to write the zero vector in terms of any basis.

We call this the zero matrix of dimension $m \times n$.

Example. Recall the identity map we defined last time, $I : V \to V$, where $Iv = v$ for every $v \in V$. Suppose $(e_1, \ldots, e_n)$ is any basis for $V$.

How does $I$ act on basis vectors? Choose any $i$ with $1 \leq i \leq n$. Then

$$I e_i = \sum_{j=1}^n b_{j,i} e_j = e_i.$$

Since $(e_1, \ldots, e_n)$ is linearly independent, this is only possible if $b_{j,i} = 0$ for every $j \neq i$ and $b_{i,i} = 1$. That is,

$$b_{j,i} = \begin{cases} 0 & \text{if } j \neq i, \\ 1 & \text{if } j = i. \end{cases} \tag{2}$$

This means that the matrix for the identity map is a square matrix which looks like this:

$$M(I) = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix},$$

with 1s along the main diagonal and 0s everywhere else. And it looks like this for any choice of basis for $V$, as long as we use the same basis for both copies of $V$ (the domain and the codomain). Otherwise, all bets are off.

We call this the identity matrix of dimension $n \times n$.

We also give a name to formula $(2)$ for the scalars $b_{j,i}$. The Kronecker delta is a function of two variables defined by

$$\delta_{j,i} = \begin{cases} 0 & \text{if } j \neq i, \\ 1 & \text{if } j = i. \end{cases}$$

We can rewrite the identity matrix in terms of the Kronecker delta, if we want:

$$M(I) = \begin{bmatrix} \delta_{1,1} & \delta_{1,2} & \cdots & \delta_{1,n} \\ \delta_{2,1} & \delta_{2,2} & \cdots & \delta_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \delta_{n,1} & \delta_{n,2} & \cdots & \delta_{n,n} \end{bmatrix}.$$
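
If you like, you can check this description numerically. Here is a small sketch (the dimension $n = 4$ is arbitrary) that builds the identity matrix entry by entry from the Kronecker delta and compares it with NumPy's built-in np.eye.

```python
import numpy as np

n = 4  # arbitrary dimension, chosen only for illustration

# Build M(I) entry by entry: the (j, i) entry is 1 when j == i and 0 otherwise,
# which is exactly the Kronecker delta.
I = np.array([[1.0 if j == i else 0.0 for i in range(n)] for j in range(n)])

# NumPy's built-in identity matrix agrees with this construction.
assert np.array_equal(I, np.eye(n))
print(I)
```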

There are a number of important operations we can define on matrices.

Definition. If $[a]_{j,i}$ and $[b]_{j,i}$ are $m \times n$ matrices, their sum is defined component-wise:

$$[a]_{j,i} + [b]_{j,i} = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix} + \begin{bmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,n} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m,1} & b_{m,2} & \cdots & b_{m,n} \end{bmatrix} = \begin{bmatrix} a_{1,1} + b_{1,1} & a_{1,2} + b_{1,2} & \cdots & a_{1,n} + b_{1,n} \\ a_{2,1} + b_{2,1} & a_{2,2} + b_{2,2} & \cdots & a_{2,n} + b_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} + b_{m,1} & a_{m,2} + b_{m,2} & \cdots & a_{m,n} + b_{m,n} \end{bmatrix}.$$

It is easy to show that with this definition and the normal operation of addition of functions, $M(T_1 + T_2) = M(T_1) + M(T_2)$ for any linear maps $T_1, T_2 : U \to V$.

Addition of matrices is not defined for matrices of different sizes, since this would be like trying to add functions with different domains and codomains.

Definition. If $[a]_{j,i}$ is an $m \times n$ matrix, its scalar product with $k \in \mathbf{F}$ is defined component-wise:

$$k[a]_{j,i} = k \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix} = \begin{bmatrix} k a_{1,1} & k a_{1,2} & \cdots & k a_{1,n} \\ k a_{2,1} & k a_{2,2} & \cdots & k a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ k a_{m,1} & k a_{m,2} & \cdots & k a_{m,n} \end{bmatrix}.$$
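
Both operations are one-liners in NumPy. The matrices and the scalar below are made up just to show the component-wise behavior; note that the sum is only defined when the shapes match.

```python
import numpy as np

# Two arbitrary 3 x 2 matrices and a scalar, for illustration only.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
B = np.array([[0.5, 0.0], [1.0, 1.0], [2.0, 3.0]])
k = 2.0

print(A + B)  # component-wise sum; requires matching shapes
print(k * A)  # every entry multiplied by k
```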

So far we have seen a lot of matrices, but I have not shown you what they are useful for. The rest of this post will address their uses.

Suppose we have a linear map $T : U \to V$ and bases $(e_1, \ldots, e_n)$ for $U$ and $(f_1, \ldots, f_m)$ for $V$. Then for any $u \in U$, we may write

$$u = \sum_{i=1}^n u_i e_i,$$

where the scalars $u_i \in \mathbf{F}$ are unique. We can define a new sort of matrix, the matrix of the vector $u$, as follows:

$$M(u) = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}.$$

The natural question to ask is this: given the matrix $M(T)$ for the linear map $T$ and the matrix $M(u)$ for the vector $u$, what does the matrix $M(Tu)$ for the vector $Tu \in V$ look like? All we need to do is calculate its components in terms of the basis $(f_1, \ldots, f_m)$,

$$Tu = \sum_{i=1}^n \sum_{j=1}^m u_i b_{j,i} f_j.$$

So the matrix for $Tu$ is

$$M(Tu) = \begin{bmatrix} (Tu)_1 \\ (Tu)_2 \\ \vdots \\ (Tu)_m \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^n u_i b_{1,i} \\ \sum_{i=1}^n u_i b_{2,i} \\ \vdots \\ \sum_{i=1}^n u_i b_{m,i} \end{bmatrix}.$$

Observe that all of the information about $M(Tu)$ is encoded in the matrices $M(T)$ and $M(u)$. So we make the following definition:

Definition. If $[b]_{j,i}$ is an $m \times n$ matrix for a linear map and $[u]_i$ is an $n \times 1$ matrix for a vector, we define matrix-vector multiplication as follows:

$$[b]_{j,i} [u]_i = \begin{bmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,n} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m,1} & b_{m,2} & \cdots & b_{m,n} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^n u_i b_{1,i} \\ \sum_{i=1}^n u_i b_{2,i} \\ \vdots \\ \sum_{i=1}^n u_i b_{m,i} \end{bmatrix} = \left[ \sum_{i=1}^n u_i b_{j,i} \right]_j.$$

By definition then, we have that $M(Tu) = M(T) M(u)$. This means that if we have a matrix representation of a linear map and a matrix representation of a vector, both in terms of some bases, then we can work entirely in terms of matrices. Incidentally, this is why it is common to write $Tu$ to mean $T(u)$: the action of the linear map $T$ on the vector $u$ can be calculated via multiplication of their matrices.
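
Here is the same fact checked numerically. The map $T$ is the same made-up example from earlier (not something from the post itself); applying $T$ directly and multiplying by its matrix give identical results.

```python
import numpy as np

# The same made-up map as before: T : R^3 -> R^2, T(x, y, z) = (x + 2y, 3z).
def T(u):
    x, y, z = u
    return np.array([x + 2 * y, 3 * z])

# Build M(T) column by column from the standard basis of R^3.
M_T = np.column_stack([T(e) for e in np.eye(3)])

u = np.array([1.0, -2.0, 4.0])  # components of u in the standard basis
print(M_T @ u)  # the matrix-vector product M(T) M(u)
print(T(u))     # applying the linear map directly
# Both print [-3. 12.], illustrating M(Tu) = M(T) M(u).
```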

Now, suppose we have three vector spaces $U$, $V$ and $W$, of dimensions $n$, $m$ and $l$, respectively. Let $T : U \to V$ and $S : V \to W$ be linear maps, and let $(e_1, \ldots, e_n)$ be a basis for $U$, $(f_1, \ldots, f_m)$ be a basis for $V$, and $(g_1, \ldots, g_l)$ be a basis for $W$. We have already seen that if

$$u = \sum_{i=1}^n u_i e_i$$

and

$$T e_i = \sum_{j=1}^m b_{j,i} f_j$$

then

$$Tu = \sum_{i=1}^n \sum_{j=1}^m u_i b_{j,i} f_j.$$

Let's take this one step further and compute $S(Tu)$. Suppose $S$ acts on each basis vector $f_j$ as follows:

$$S f_j = \sum_{k=1}^l c_{k,j} g_k.$$

Then

$$S(Tu) = S \left( \sum_{i=1}^n \sum_{j=1}^m u_i b_{j,i} f_j \right) = \sum_{i=1}^n \sum_{j=1}^m u_i b_{j,i} S f_j = \sum_{i=1}^n \sum_{j=1}^m \sum_{k=1}^l u_i b_{j,i} c_{k,j} g_k.$$

We thus make the following definition:

Definition. If $[b]_{j,i}$ is an $m \times n$ matrix and $[c]_{k,j}$ is an $l \times m$ matrix, we define matrix-matrix multiplication as follows:

$$[c]_{k,j} [b]_{j,i} = \begin{bmatrix} c_{1,1} & c_{1,2} & \cdots & c_{1,m} \\ c_{2,1} & c_{2,2} & \cdots & c_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ c_{l,1} & c_{l,2} & \cdots & c_{l,m} \end{bmatrix} \begin{bmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,n} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m,1} & b_{m,2} & \cdots & b_{m,n} \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^m b_{j,1} c_{1,j} & \sum_{j=1}^m b_{j,2} c_{1,j} & \cdots & \sum_{j=1}^m b_{j,n} c_{1,j} \\ \sum_{j=1}^m b_{j,1} c_{2,j} & \sum_{j=1}^m b_{j,2} c_{2,j} & \cdots & \sum_{j=1}^m b_{j,n} c_{2,j} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{j=1}^m b_{j,1} c_{l,j} & \sum_{j=1}^m b_{j,2} c_{l,j} & \cdots & \sum_{j=1}^m b_{j,n} c_{l,j} \end{bmatrix} = \left[ \sum_{j=1}^m b_{j,i} c_{k,j} \right]_{k,i}.$$

Note that the product is an $l \times n$ matrix.

By definition then, we have that $M(S) M(T) = M(S \circ T)$, which is super awesome! Matrix multiplication corresponds to function composition of linear maps. And of course the dimensions of the product should be $l \times n$, since $S \circ T : U \to W$. We often write $S \circ T$ as simply $ST$, since this is reminiscent of multiplication.
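
As a quick sanity check, here is a sketch with two made-up maps (the same $T$ as before, plus an $S$ invented for this example): composing the maps directly and multiplying their matrices give the same answer.

```python
import numpy as np

# Two made-up maps for illustration: T : R^3 -> R^2 and S : R^2 -> R^2.
def T(u):
    x, y, z = u
    return np.array([x + 2 * y, 3 * z])

def S(v):
    a, b = v
    return np.array([a - b, 4 * a])

# Matrices with respect to the standard bases, built column by column.
M_T = np.column_stack([T(e) for e in np.eye(3)])
M_S = np.column_stack([S(f) for f in np.eye(2)])

u = np.array([1.0, -2.0, 4.0])
print((M_S @ M_T) @ u)  # matrix of the composition, applied to u
print(S(T(u)))          # composing the linear maps directly
# Both print [-15. -12.], illustrating M(S)M(T) = M(S o T).
```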

I'm going to end here for now because all of these indices are making my head spin. But we'll see matrices again all over the place.
