March 21, 2019

Matrices for Linear Maps

If you already knew some linear algebra before reading my posts, you might be wondering where the heck all the matrices are. The goal of this post is to connect the theory of linear maps and vector spaces to the theory of matrices and computation.

The key observation is that linear maps between finite-dimensional vector spaces can be completely described solely in terms of how they act on basis vectors. Let's explore this idea in a more precise setting.

Suppose $T : U \to V$ is a linear map with $\dim U = n$ and $\dim V = m$. Let $(e_1, \ldots, e_n)$ be a basis for $U$ and let $(f_1, \ldots, f_m)$ be a basis for $V$. Then for any vector $u \in U$, we may express $u$ uniquely as a linear combination of basis vectors,

$$u = \sum_{i=1}^n a_i e_i,$$

where $a_i \in \mathbf{F}$ for $1 \leq i \leq n$. Since $T$ is a linear map, this means that

$$Tu = T \left( \sum_{i=1}^n a_i e_i \right) = \sum_{i=1}^n a_i T e_i. \tag{1}$$

We can go further and decompose each $T e_i \in V$ as a linear combination of basis vectors in $V$,

$$T e_i = \sum_{j=1}^m b_{j,i} f_j,$$

where $b_{j,i} \in \mathbf{F}$ for $1 \leq j \leq m$. Thus, we may rewrite equation $(1)$ as

$$Tu = \sum_{i=1}^n \sum_{j=1}^m a_i b_{j,i} f_j.$$

Since the scalars $a_i$ depend only on the input vector $u$ and the basis vectors $f_j$ are fixed, this means that the way $T$ acts on $u$ is determined entirely by the scalars $b_{j,i}$.

To put it another way, say we've chosen bases for $U$ and $V$. Then everything about the linear map is encoded in a collection of $m \times n$ scalars, which describe how the map acts on basis vectors in $U$ in terms of basis vectors in $V$.

If we were, for instance, to write the scalars $b_{j,i}$ in the following form:

$$[b]_{j,i} = \begin{bmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,n} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m,1} & b_{m,2} & \cdots & b_{m,n} \end{bmatrix},$$

then we would have a compact and easy-to-read way of describing the linear map $T$.

This is what we call an $m \times n$ matrix. In this case, $[b]_{j,i}$ is the matrix corresponding to the linear transformation $T$ with respect to the bases $(e_1, \ldots, e_n)$ for $U$ and $(f_1, \ldots, f_m)$ for $V$. We sometimes write $M(T)$ to denote the matrix for $T$, assuming we have already established which bases we are using for our domain and codomain.

It can be a bit confusing at first trying to remember what the shape of a linear map's matrix should be. The number of columns corresponds to the dimension of the domain, and the number of rows corresponds to the dimension of the codomain. The $i$th column of the matrix is really just the components of $T e_i$. That is, we may think of the matrix $M(T)$ as

$$M(T) = \begin{bmatrix} T e_1 & T e_2 & \cdots & T e_n \end{bmatrix}.$$

The scalar $b_{j,i}$ in the $j$th row and $i$th column of $M(T)$ is thus the coefficient of the $j$th basis vector $f_j$ for $V$ in the representation of $T e_i$.
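
To make the column-by-column picture concrete, here is a minimal NumPy sketch. The map $T$ below is made up purely for illustration; the point is that stacking the vectors $T e_i$ as columns recovers $M(T)$ with respect to the standard bases.

```python
import numpy as np

# A made-up linear map T : R^3 -> R^2, T(x, y, z) = (x + 2y, 3z),
# used only to illustrate the construction.
def T(u):
    x, y, z = u
    return np.array([x + 2 * y, 3 * z])

n = 3  # dimension of the domain
# The i-th column of M(T) holds the components of T applied to the
# i-th standard basis vector e_i of R^3.
M_T = np.column_stack([T(e) for e in np.eye(n)])

print(M_T)
# [[1. 2. 0.]
#  [0. 0. 3.]]
```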

If $T : \mathbf{F}^n \to \mathbf{F}^m$ is a linear map between vector spaces $\mathbf{F}^n$ and $\mathbf{F}^m$, where $\mathbf{F}$ is a field, then unless otherwise specified it should be assumed that its matrix $M(T)$ is with respect to the standard bases on $\mathbf{F}^n$ and $\mathbf{F}^m$. That is, the collections of $n$- and $m$-tuples, respectively, with a $1$ in a single slot and zeros everywhere else. Otherwise, it does not make sense to talk about a matrix for a linear map without first explicitly choosing bases for the vector spaces involved. This is very important.

Example. Recall the zero map we defined last time, $0 : U \to V$, where $0(u) = 0$ for every $u \in U$. Suppose $(e_1, \ldots, e_n)$ is any basis for $U$ and $(f_1, \ldots, f_m)$ is any basis for $V$.

To compute the matrix for the zero map with respect to these bases, all we need to do is figure out how it acts on basis vectors. Choose any $i$ for which $1 \leq i \leq n$. Then

$$0(e_i) = \sum_{j=1}^m b_{j,i} f_j = 0$$

for some scalars $b_{j,i} \in \mathbf{F}$. But since $(f_1, \ldots, f_m)$ is a basis for $V$, it is linearly independent and so this is only possible if $b_{j,i} = 0$ for $1 \leq j \leq m$. That means the matrix for the zero map looks like this:

$$M(0) = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}.$$

And it looks like this for any choice of bases for $U$ and $V$, since there's only one way to write the zero vector in terms of any basis.

We call this the zero matrix of dimension $m \times n$.

Example. Recall the identity map we defined last time, $I : V \to V$, where $Iv = v$ for every $v \in V$. Suppose $(e_1, \ldots, e_n)$ is any basis for $V$.

How does $I$ act on basis vectors? Choose any $i$ with $1 \leq i \leq n$. Then

$$I e_i = \sum_{j=1}^n b_{j,i} e_j = e_i.$$

Since $(e_1, \ldots, e_n)$ is linearly independent, this is only possible if $b_{j,i} = 0$ for every $j \neq i$ and $b_{i,i} = 1$. That is,

$$b_{j,i} = \begin{cases} 0 & \text{if } j \neq i, \\ 1 & \text{if } j = i. \end{cases} \tag{2}$$

This means that the matrix for the identity map is a square matrix which looks like this:

$$M(I) = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix},$$

with 1s along the main diagonal and 0s everywhere else. And it looks like this for any choice of basis for $V$, as long as we use the same basis for both copies of $V$ (the domain and the codomain). Otherwise, all bets are off.

We call this the identity matrix of dimension $n \times n$.

We also give a name to formula $(2)$ for the scalars $b_{j,i}$. The Kronecker delta is a function of two variables defined by

$$\delta_{j,i} = \begin{cases} 0 & \text{if } j \neq i, \\ 1 & \text{if } j = i. \end{cases}$$

We can rewrite the identity matrix in terms of the Kronecker delta, if we want:

$$M(I) = \begin{bmatrix} \delta_{1,1} & \delta_{1,2} & \cdots & \delta_{1,n} \\ \delta_{2,1} & \delta_{2,2} & \cdots & \delta_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \delta_{n,1} & \delta_{n,2} & \cdots & \delta_{n,n} \end{bmatrix}.$$
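
If you like, you can check this description numerically. Here is a small sketch (the dimension $n = 4$ is arbitrary) that builds the identity matrix entry by entry from the Kronecker delta and compares it with NumPy's built-in np.eye.

```python
import numpy as np

n = 4  # arbitrary dimension, chosen only for illustration

# Build M(I) entry by entry: the (j, i) entry is 1 when j == i and 0 otherwise,
# which is exactly the Kronecker delta.
I = np.array([[1.0 if j == i else 0.0 for i in range(n)] for j in range(n)])

# NumPy's built-in identity matrix agrees with this construction.
assert np.array_equal(I, np.eye(n))
print(I)
```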

There are a number of important operations we can define on matrices.

Definition. If $[a]_{j,i}$ and $[b]_{j,i}$ are $m \times n$ matrices, their sum is defined component-wise:

$$[a]_{j,i} + [b]_{j,i} = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix} + \begin{bmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,n} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m,1} & b_{m,2} & \cdots & b_{m,n} \end{bmatrix} = \begin{bmatrix} a_{1,1} + b_{1,1} & a_{1,2} + b_{1,2} & \cdots & a_{1,n} + b_{1,n} \\ a_{2,1} + b_{2,1} & a_{2,2} + b_{2,2} & \cdots & a_{2,n} + b_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} + b_{m,1} & a_{m,2} + b_{m,2} & \cdots & a_{m,n} + b_{m,n} \end{bmatrix}.$$

It is easy to show that with this definition and the normal operation of addition of functions, $M(T_1 + T_2) = M(T_1) + M(T_2)$ for any linear maps $T_1, T_2 : U \to V$.

Addition of matrices is not defined for matrices of different sizes, since this would be like trying to add functions with different domains and codomains.

Definition. If $[a]_{j,i}$ is an $m \times n$ matrix, its scalar product with $k \in \mathbf{F}$ is defined component-wise:

$$k[a]_{j,i} = k \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix} = \begin{bmatrix} k a_{1,1} & k a_{1,2} & \cdots & k a_{1,n} \\ k a_{2,1} & k a_{2,2} & \cdots & k a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ k a_{m,1} & k a_{m,2} & \cdots & k a_{m,n} \end{bmatrix}.$$
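
Both operations are one-liners in NumPy. The matrices and the scalar below are made up just to show the component-wise behavior; note that the sum is only defined when the shapes match.

```python
import numpy as np

# Two arbitrary 3 x 2 matrices and a scalar, for illustration only.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
B = np.array([[0.5, 0.0], [1.0, 1.0], [2.0, 3.0]])
k = 2.0

print(A + B)  # component-wise sum; requires matching shapes
print(k * A)  # every entry multiplied by k
```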

So far we have seen a lot of matrices, but I have not shown you what they are useful for. The rest of this post will address their uses.

Suppose we have a linear map $T : U \to V$ and bases $(e_1, \ldots, e_n)$ for $U$ and $(f_1, \ldots, f_m)$ for $V$. Then for any $u \in U$, we may write

$$u = \sum_{i=1}^n u_i e_i,$$

where the scalars $u_i \in \mathbf{F}$ are unique. We can define a new sort of matrix, the matrix of the vector $u$, as follows:

$$M(u) = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}.$$

The natural question to ask is this: given the matrix $M(T)$ for the linear map $T$ and the matrix $M(u)$ for the vector $u$, what does the matrix $M(Tu)$ for the vector $Tu \in V$ look like? All we need to do is calculate its components in terms of the basis $(f_1, \ldots, f_m)$,

$$Tu = \sum_{i=1}^n \sum_{j=1}^m u_i b_{j,i} f_j.$$

So the matrix for $Tu$ is

$$M(Tu) = \begin{bmatrix} (Tu)_1 \\ (Tu)_2 \\ \vdots \\ (Tu)_m \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^n u_i b_{1,i} \\ \sum_{i=1}^n u_i b_{2,i} \\ \vdots \\ \sum_{i=1}^n u_i b_{m,i} \end{bmatrix}.$$

Observe that all of the information about $M(Tu)$ is encoded in the matrices $M(T)$ and $M(u)$. So we make the following definition:

Definition. If $[b]_{j,i}$ is an $m \times n$ matrix for a linear map and $[u]_i$ is an $n \times 1$ matrix for a vector, we define matrix-vector multiplication as follows:

$$[b]_{j,i} [u]_i = \begin{bmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,n} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m,1} & b_{m,2} & \cdots & b_{m,n} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^n u_i b_{1,i} \\ \sum_{i=1}^n u_i b_{2,i} \\ \vdots \\ \sum_{i=1}^n u_i b_{m,i} \end{bmatrix} = \left[ \sum_{i=1}^n u_i b_{j,i} \right]_j.$$

By definition then, we have that $M(Tu) = M(T) M(u)$. This means that if we have a matrix representation of a linear map and a matrix representation of a vector, both in terms of some bases, then we can work entirely in terms of matrices. Incidentally, this is why it is common to write $Tu$ to mean $T(u)$: the action of the linear map $T$ on the vector $u$ can be calculated via multiplication of their matrices.
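
Here is the same fact checked numerically. The map $T$ is the same made-up example from earlier (not something from the post itself); applying $T$ directly and multiplying by its matrix give identical results.

```python
import numpy as np

# The same made-up map as before: T : R^3 -> R^2, T(x, y, z) = (x + 2y, 3z).
def T(u):
    x, y, z = u
    return np.array([x + 2 * y, 3 * z])

# Build M(T) column by column from the standard basis of R^3.
M_T = np.column_stack([T(e) for e in np.eye(3)])

u = np.array([1.0, -2.0, 4.0])  # components of u in the standard basis
print(M_T @ u)  # the matrix-vector product M(T) M(u)
print(T(u))     # applying the linear map directly
# Both print [-3. 12.], illustrating M(Tu) = M(T) M(u).
```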

Now, suppose we have three vector spaces $U$, $V$ and $W$, of dimensions $n$, $m$ and $l$, respectively. Let $T : U \to V$ and $S : V \to W$ be linear maps, and let $(e_1, \ldots, e_n)$ be a basis for $U$, $(f_1, \ldots, f_m)$ be a basis for $V$, and $(g_1, \ldots, g_l)$ be a basis for $W$. We have already seen that if

$$u = \sum_{i=1}^n u_i e_i$$

and

$$T e_i = \sum_{j=1}^m b_{j,i} f_j$$

then

$$Tu = \sum_{i=1}^n \sum_{j=1}^m u_i b_{j,i} f_j.$$

Let's take this one step further and compute $S(Tu)$. Suppose $S$ acts on each basis vector $f_j$ as follows:

$$S f_j = \sum_{k=1}^l c_{k,j} g_k.$$

Then

$$S(Tu) = S \left( \sum_{i=1}^n \sum_{j=1}^m u_i b_{j,i} f_j \right) = \sum_{i=1}^n \sum_{j=1}^m u_i b_{j,i} S f_j = \sum_{i=1}^n \sum_{j=1}^m \sum_{k=1}^l u_i b_{j,i} c_{k,j} g_k.$$

We thus make the following definition:

Definition. If $[b]_{j,i}$ is an $m \times n$ matrix and $[c]_{k,j}$ is an $l \times m$ matrix, we define matrix-matrix multiplication as follows:

$$[c]_{k,j} [b]_{j,i} = \begin{bmatrix} c_{1,1} & c_{1,2} & \cdots & c_{1,m} \\ c_{2,1} & c_{2,2} & \cdots & c_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ c_{l,1} & c_{l,2} & \cdots & c_{l,m} \end{bmatrix} \begin{bmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,n} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m,1} & b_{m,2} & \cdots & b_{m,n} \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^m b_{j,1} c_{1,j} & \sum_{j=1}^m b_{j,2} c_{1,j} & \cdots & \sum_{j=1}^m b_{j,n} c_{1,j} \\ \sum_{j=1}^m b_{j,1} c_{2,j} & \sum_{j=1}^m b_{j,2} c_{2,j} & \cdots & \sum_{j=1}^m b_{j,n} c_{2,j} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{j=1}^m b_{j,1} c_{l,j} & \sum_{j=1}^m b_{j,2} c_{l,j} & \cdots & \sum_{j=1}^m b_{j,n} c_{l,j} \end{bmatrix} = \left[ \sum_{j=1}^m b_{j,i} c_{k,j} \right]_{k,i}.$$

Note that the product is an $l \times n$ matrix.

By definition then, we have that $M(S) M(T) = M(S \circ T)$, which is super awesome! Matrix multiplication corresponds to function composition of linear maps. And of course the dimensions of the product should be $l \times n$, since $S \circ T : U \to W$. We often write $S \circ T$ as simply $ST$, since this is reminiscent of multiplication.
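
As a quick sanity check, here is a sketch with two made-up maps (the same $T$ as before, plus an $S$ invented for this example): composing the maps directly and multiplying their matrices give the same answer.

```python
import numpy as np

# Two made-up maps for illustration: T : R^3 -> R^2 and S : R^2 -> R^2.
def T(u):
    x, y, z = u
    return np.array([x + 2 * y, 3 * z])

def S(v):
    a, b = v
    return np.array([a - b, 4 * a])

# Matrices with respect to the standard bases, built column by column.
M_T = np.column_stack([T(e) for e in np.eye(3)])
M_S = np.column_stack([S(f) for f in np.eye(2)])

u = np.array([1.0, -2.0, 4.0])
print((M_S @ M_T) @ u)  # matrix of the composition, applied to u
print(S(T(u)))          # composing the linear maps directly
# Both print [-15. -12.], illustrating M(S)M(T) = M(S o T).
```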

I'm going to end here for now because all of these indices are making my head spin. But we'll see matrices again all over the place.
