March 16, 2019

Vector Spaces (2) - Direct Sums, Span and Linear Independence

  1. Direct Sums
  2. Span and Linear Combinations
  3. Linear Independence

Direct Sums

I left off last time with an example of a sum of subspaces with a rather important property. Namely, every vector in the sum had a unique representation as a sum of vectors in the component subspaces. Not all sums have this property, but it is so important that it deserves special consideration.

Definition. Let $U$ and $W$ be subspaces of a vector space $V$ for which $V=U+W$. This sum is a direct sum if for every $\v\in V$, the representation

$$\v=\u+\w,$$

where $\u\in U$ and $\w\in W$, is unique. If a sum is direct, it is expressed symbolically as

$$ V=U\oplus W.$$

Similarly, if $U_1,U_2,\ldots,U_n$ are subspaces of $V$ for which

$$\begin{align}
V &= \sum_{i=1}^n U_i \\
&= U_1 + U_2 + \cdots + U_n,
\end{align}$$

then this sum is direct if for every $\v\in V$, the representation

$$\begin{align}
\v &= \sum_{i=1}^n \u_i \\
&= \u_1 + \u_2 + \cdots + \u_n,
\end{align}$$

where $\u_i\in U_i$ for every $i$, is unique. Such a direct sum is expressed symbolically as

$$\begin{align}
V &= \bigoplus_{i=1}^n U_i \\
&= U_1 \oplus U_2 \oplus \cdots \oplus U_n.
\end{align}$$

Notice that a direct sum of linear subspaces is not really its own thing. It is a normal sum which happens to also have the property of being direct. You do not start with two subspaces and take their direct sum. You take the sum of subspaces, and that sum may happen to be direct.

We have already seen an example of a sum which is direct, so it may be more enlightening to exhibit an example of a sum which is not.

Example. Consider the subspaces

$$\begin{align}
U &= \set{(x,y,0)\in\R^3\mid x,y\in\R}, \\
W &= \set{(0,y,z)\in\R^3\mid y,z\in\R}
\end{align}$$

of the vector space $\R^3$. Clearly $\R^3=U+W$. The vector $(1,1,1)\in\R^3$ can be expressed as

$$(1,1,1)=(1,1,0)+(0,0,1),$$

where $(1,1,0)\in U$ and $(0,0,1)\in W$. However, it can also be expressed as

$$(1,1,1)=(1,2,0)+(0,-1,1),$$

where $(1,2,0)\in U$ and $(0,-1,1)\in W$. In fact, there are infinitely many ways to decompose $(1,1,1)$ into such a sum, since

$$(1,1,1)=(1,1-t,0)+(0,t,1)$$

for every $t\in\R$; the second components need only add up to $1$. Since the representation of $(1,1,1)$ as a sum of a vector in $U$ and a vector in $W$ is not unique, it follows that the sum $\R^3=U+W$ is not direct.

It is sometimes difficult to determine whether a sum is direct using only the definition. This would require some sort of insight as to why it must be direct, or the demonstration of a counterexample. This next proposition gives us a somewhat simplified method for checking whether a sum of subspaces is direct.

Proposition. Suppose $U_1,U_2,\ldots,U_n$ are subspaces of a vector space $V$ for which

$$V=\sum_{i=1}^n U_i.$$

Then

$$V=\bigoplus_{i=1}^n U_i$$

if and only if the choices of $\u_i\in U_i$ which satisfy

$$\zv=\u_1+\u_2+\cdots+\u_n$$

are the vectors $\u_1=\u_2=\cdots=\u_n=\zv$.

Proof. First, suppose that

$$V=\bigoplus_{i=1}^n U_i,$$

and for each $i$ choose $\u_i\in U_i$ so that

$$\zv=\u_1+\u_2+\cdots+\u_n.$$

Because the above sum is direct, this representation of $\zv$ must be unique. It must be the case then that $\u_1=\u_2=\cdots=\u_n=\zv$.

Suppose next that the only way to satisfy

$$\zv=\u_1+\u_2+\cdots+\u_n$$

where $\u_i\in U_i$ for each $i$, is by taking $\u_1=\u_2=\cdots=\u_n=\zv$. For any $\v\in V$, we may write

$$\v=\v_1+\v_2+\cdots+\v_n, \tag{1}$$

where $\v_i\in U_i$ for each $i$. Suppose that we can also write

$$\v=\w_1+\w_2+\cdots+\w_n, \tag{2}$$

where $\w_i\in U_i$ for each $i$. Subtracting $(2)$ from $(1)$ yields

$$\zv=(\v_1-\w_1)+(\v_2-\w_2)+\cdots+(\v_n-\w_n).$$

For each $i$, we have that $\v_i-\w_i\in U_i$, since each $U_i$ is a subspace and thus closed under scalar multiplication and vector addition. Thus, since we have assumed $\zv$ to have a unique expression of this form, it follows that $\v_i-\w_i=\zv$ for all $i$. That is, $\v_i=\w_i$ for all $i$. We have shown that the expression for $\v$ is unique, and thus

$$V=\bigoplus_{i=1}^n U_i.$$

This result tells us that all we need to do in order to check that a sum of subspaces is direct is to make sure that there is only one way to express the zero vector as a sum of vectors from each subspace. This is certainly simpler than checking that every vector has a unique representation of this form. However, in the case of two subspaces, we can do even better.
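
In coordinates, this check is easy to automate. The sketch below is a minimal illustration, assuming each subspace of a real coordinate space is handed to us as a matrix whose columns span it; it also leans on a standard rank fact we have not proven here, namely that a sum of subspaces is direct exactly when the rank of the concatenated matrix equals the sum of the individual ranks.

```python
import numpy as np

# A minimal sketch, assuming each subspace is represented by a matrix whose
# columns span it. It uses a standard fact not proven in this post: a sum of
# subspaces is direct exactly when the rank of the concatenated matrix
# equals the sum of the individual ranks.
def sum_is_direct(*spanning_matrices):
    combined = np.hstack(spanning_matrices)
    return np.linalg.matrix_rank(combined) == sum(
        np.linalg.matrix_rank(A) for A in spanning_matrices
    )

# The example from earlier: U is the xy-plane and W is the yz-plane in R^3.
U = np.array([[1, 0], [0, 1], [0, 0]])  # columns span U
W = np.array([[0, 0], [1, 0], [0, 1]])  # columns span W
print(sum_is_direct(U, W))  # False: U + W = R^3, but the sum is not direct
```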

Corollary. Suppose $U$ and $W$ are subspaces of a vector space $V$ and that $V=U+W$. Then $V=U\oplus W$ if and only if $U\cap W=\set{\zv}$.

Proof. Suppose first that $V=U\oplus W$ and choose $\v\in U\cap W$. Then $\v\in U$ and $\v\in W$, so we also have that $-\v\in W$ because $W$ is a subspace of $V$ and so it is closed under scalar multiplication (in this case by $-1$). We can thus write

$$\zv=\v+(-\v),$$

and by the above proposition the only way to express $\zv$ as a sum of a vector in $U$ and a vector in $W$ is with both vectors equal to $\zv$. Hence $\v=\zv$, and thus $U\cap W=\set{\zv}$.

Next, suppose that $U\cap W=\set{\zv}$ and let

$$ \zv=\u+\w,$$

where $\u\in U$ and $\w\in W$. Then $\u=-\w$ and so $-\w\in U$, which implies that $\w\in U$. Since $\w\in U$ and $\w\in W$, it follows that $\w\in U\cap W$. By hypothesis, this means that $\w=\zv$ and thus $\u=\zv$. We have shown that the only way to decompose $\zv$ in the form above is as the sum of zero vectors. Thus, it follows from the above proposition that $V=U\oplus W$, completing the proof.

This result tells us that the sum of two subspaces is direct precisely when their intersection is trivial. Unfortunately, this result does not generalize to sums of three or more subspaces. In fact, in the case of three or more subspaces, even pairwise intersections being trivial is not enough to guarantee that a sum is direct.
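
To see this concretely, consider the subspaces

$$\begin{align}
U_1 &= \set{(x,0)\in\R^2\mid x\in\R}, \\
U_2 &= \set{(0,y)\in\R^2\mid y\in\R}, \\
U_3 &= \set{(z,z)\in\R^2\mid z\in\R}
\end{align}$$

of $\R^2$. The pairwise intersections $U_1\cap U_2$, $U_1\cap U_3$ and $U_2\cap U_3$ are all trivial, and $\R^2=U_1+U_2+U_3$. However,

$$\zv=(1,0)+(0,1)+(-1,-1),$$

so the zero vector has a representation other than the sum of zero vectors, and the sum is not direct.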

Span and Linear Combinations

Given a vector space $V$ and a vector $\v\in V$, what is the smallest subspace of $V$ containing $\v$? It's not a trick question; the answer is actually somewhat obvious. It's the set $U=\set{a\v\in V\mid a\in\F}$ of all scalar multiples of $\v$. It is easy to check that $U$ satisfies the criteria of a subspace. Furthermore, any subspace containing $\v$ is closed under scalar multiplication and so must contain every scalar multiple of $\v$; that is, it must contain all of $U$. Thus $U$ is the smallest subspace of $V$ which contains $\v$. This is a very important idea, and it leads us to our next definition.

Definition. If $V$ is a vector space with $\v\in V$, then the span of $\v$ is the subspace

$$\span\v = \set{a\v\in V\mid a\in\F}.$$

Furthermore, if $(\v_1,\ldots,\v_n)$ is a list of vectors in $V$, then the span of $(\v_1,\ldots,\v_n)$ is the subspace

$$\begin{align}
\span(\v_1,\ldots,\v_n) &= \sum_{i=1}^n \span\v_i \\
&= \sum_{i=1}^n \set{a\v_i\in V\mid a\in\F} \\
&= \bset{\sum_{i=1}^n a_i\v_i\in V\mid a_1,a_2,\ldots,a_n\in\F}.
\end{align}$$

For consistency, we define the span of the empty list to be

$$\span()=\set{\zv}.$$

If $V=\span(\v_1,\ldots,\v_n)$, we say that $V$ is spanned by the list $(\v_1,\ldots,\v_n)$, or that this list spans $V$.

It is important to distinguish between lists and sets. In a list, the order of the entries matters and entries can appear more than once.

In the above definition, we first defined the span of one vector to be the smallest subspace containing that vector. We then defined the span of a list of vectors to be the sum of the spans of each vector in the list.

Equivalently, the span of a list of vectors is the subspace consisting of every possible sum of scalar multiples of those vectors. In order to avoid having to repeat the phrase "sum of scalar multiples" over and over, we define a new term which means the same thing.

Definition. Let $V$ be a vector space with $\v_1,\v_2,\ldots,\v_n\in V$. A linear combination of the vectors $\v_1,\v_2,\ldots,\v_n$ is an expression of the form

$$\sum_{i=1}^n a_i\v_i = a_1\v_1+a_2\v_2+\cdots+a_n\v_n,$$

where $a_1,a_2,\ldots,a_n\in\F$.

Example. Consider the vectors $(1,2)$, $(3,4)$ and $(0,-1)$ in the vector space $\R^2$. Some linear combinations of these vectors include

$$\begin{align}
(9,9)&=3(1,2)+2(3,4)+5(0,-1), \\
(-1,0)&=2(1,2)-\phantom{1}(3,4)+0(0,-1), \\
(1,2)&=\phantom{1}(1,2)+0(3,4)+0(0,-1).
\end{align}$$

With this definition, we can now think of the span of a list of vectors as the set of all linear combinations of those vectors. In addition, we can now see that a vector space $V$ is spanned by a list of vectors if every vector in $V$ can be expressed as a linear combination of vectors in that list.
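
This characterization also gives a practical membership test. Here is a small sketch, assuming real coordinate vectors: appending $\v$ to a list leaves the rank of the associated column matrix unchanged exactly when $\v$ is a linear combination of the list.

```python
import numpy as np

# A small sketch assuming real coordinate vectors: v lies in the span of a
# list exactly when appending v does not increase the rank of the matrix
# whose columns are the vectors of the list.
def in_span(v, vectors):
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(np.column_stack([A, v])) == np.linalg.matrix_rank(A)

vectors = [np.array([1, 2]), np.array([3, 4]), np.array([0, -1])]
print(in_span(np.array([9, 9]), vectors))  # True: (9,9) = 3(1,2) + 2(3,4) + 5(0,-1)
```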

Although we are not quite ready to define the dimension of a vector space, we now have the necessary machinery to distinguish between finite-dimensional and infinite-dimensional vector spaces.

Definition. A vector space $V$ is finite-dimensional if it is spanned by a finite list of vectors in $V$.

If no finite list of vectors spans $V$, then $V$ is infinite-dimensional.

We will be primarily concerned with finite-dimensional vector spaces. These have much nicer properties and are considerably easier to work with and conceptualize. Infinite-dimensional vector spaces can be extremely poorly behaved and are dealt with in a branch of mathematics called functional analysis.

Example. For any positive integer $n$, the vector space $\R^n$ is finite-dimensional. This is because

$$\begin{align}
\R &= \span(1), \\
\R^2 &= \span\big((1,0), (0,1)\big), \\
\R^3 &= \span\big((1,0,0), (0,1,0), (0,0,1)\big), \\
\R^4 &= \span\big((1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1)\big), \\
&\;\; \vdots
\end{align}$$

Of course, there are other lists of vectors that span each $\R^n$, but to show that a vector space is finite-dimensional, we need only demonstrate that one such list exists.

Example. We have already been introduced to an infinite-dimensional vector space, namely $\P(\F)$. This is the set of polynomials with coefficients in some field $\F$.

To show that it is infinite-dimensional, consider any finite list of polynomials in $\P(\F)$ and let $n$ be the highest degree of the polynomials in the list. Then the polynomial $x^{n+1}$ is not in the span of this list, since every linear combination of the listed polynomials has degree at most $n$. It follows that no finite list of polynomials spans $\P(\F)$.

Example. On the other hand, the vector space $\P_n(\F)$ of polynomials whose degree is at most $n$ is finite-dimensional, since it is spanned by the finite list $(1,x,x^2,\ldots,x^n)$.

It is tempting to try to define the dimension of a finite-dimensional vector space using the machinery we have already developed. We could, perhaps, attempt to define the dimension of $V$ as the length of a list of vectors which spans $V$. Unfortunately, this is not well-defined because a vector space can have many spanning lists of different lengths. For example,

$$\begin{align}
\R^2 &= \span\big((0,1),(1,0)\big) \\
&= \span\big((0,1),(1,0),(1,0)\big) \\
&= \span\big((1,2),(2,3),(3,4),(4,5)\big),
\end{align}$$

because any vector in $\R^2$ can be expressed as a linear combination of vectors in each list. In fact, we can always take a list which spans a vector space $V$ and add to it any vector in $V$, resulting in a new list which also spans $V$. This means that the length of a spanning list is certainly not unique, and so we cannot define dimension in this way.
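
As a quick sanity check, here is a sketch (under the same coordinate-vector assumption as before) verifying that each of the three lists above spans $\R^2$: a list spans $\R^2$ exactly when the matrix with those vectors as columns has rank $2$.

```python
import numpy as np

# A list of vectors spans R^2 exactly when the matrix having those vectors
# as columns has rank 2. Each of the three lists above passes this test.
lists = [
    [(0, 1), (1, 0)],
    [(0, 1), (1, 0), (1, 0)],
    [(1, 2), (2, 3), (3, 4), (4, 5)],
]
for vectors in lists:
    print(np.linalg.matrix_rank(np.column_stack(vectors)) == 2)  # True each time
```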

Linear Independence

Intuition tells us that we are on the correct track, though. It should be somewhat obvious that the vector space $\R^2$ cannot possibly be spanned by any list containing fewer than two vectors. If this is truly the case, it should make sense to define the dimension of $\R^2$ to be $2$. We will therefore work toward figuring out the minimum length of a spanning list of vectors. The important concept of linear independence, which we will now define, will help us with this.

Definition. A list $(\v_1,\v_2,\ldots,\v_n)$ of vectors in a vector space $V$ is linearly independent if the only choices of $a_i\in\F$ which satisfy

$$\sum_{i=1}^n a_i\v_i = \zv \tag{3}$$

are the scalars $a_1=a_2=\cdots=a_n=0$.

If there exist $a_i\in\F$, not all zero, for which $(3)$ holds, then the list $(\v_1,\v_2,\ldots,\v_n)$ is linearly dependent.

In every vector space, any list containing the zero vector must be linearly dependent. This is because we can take the coefficient of $\zv$ to be any nonzero scalar and the coefficient of every other vector to be zero, producing a representation of $\zv$ whose coefficients are not all zero.

Let's look at some more interesting examples.

Example. In $\R^2$, the list

$$\big((0,1),(1,0)\big)$$

is linearly independent because

$$a_1(0,1)+a_2(1,0)=(0,0)$$

only if $a_1=a_2=0$. No matter how hard we try, we cannot come up with any other linear combination of these vectors which yields the zero vector.

However, if we add $(1,0)$ to the list, the resulting list

$$\big((0,1),(1,0),(1,0)\big)$$

is linearly dependent because

$$0(0,1)+(1,0)-(1,0)=(0,0).$$

Example. As a less trivial example, the list

$$\big((1,2),(2,3),(3,4),(4,5)\big)$$

is also linearly dependent because

$$(1,2)+(2,3)-5(3,4)+3(4,5)=(0,0).$$
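
For coordinate vectors, linear independence can be tested with the same rank idea as before. The sketch below checks the lists we have discussed; a list is linearly independent exactly when the matrix with the vectors as columns has rank equal to the length of the list.

```python
import numpy as np

# A list of coordinate vectors is linearly independent exactly when the
# matrix with those vectors as columns has rank equal to the list's length.
def is_linearly_independent(vectors):
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

print(is_linearly_independent([(0, 1), (1, 0)]))                  # True
print(is_linearly_independent([(0, 1), (1, 0), (1, 0)]))          # False
print(is_linearly_independent([(1, 2), (2, 3), (3, 4), (4, 5)]))  # False
```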

Up until now, we have sort of been weaving a twisted web of interconnected ideas. Our next theorem relates them all and gives an alternate characterization of linear independence in terms of spans and direct sums.

Theorem. A list $(\v_1,\v_2,\ldots,\v_n)$ of nonzero vectors in a vector space $V$ is linearly independent if and only if

$$\span(\v_1,\v_2,\ldots,\v_n) = \bigoplus_{i=1}^n \span\v_i.$$

Proof. First, suppose that the list $(\v_1,\v_2,\ldots,\v_n)$ is linearly independent. Then by the definition of span, we have that

$$\span(\v_1,\v_2,\ldots,\v_n) = \sum_{i=1}^n \span\v_i.$$

By our earlier proposition, it suffices to show that the only choices of $\u_i\in\span\v_i$ which satisfy

$$\zv=\u_1+\u_2+\cdots+\u_n$$

are the vectors

$$\u_1=\u_2=\cdots=\u_n=\zv.$$

Since $\u_i\in\span\v_i$ for each $i$, we have that $\u_i=a_i\v_i$ for some $a_i\in\F$. Since $(\v_1,\v_2,\ldots,\v_n)$ is linearly independent, we have that

$$\begin{align}
\zv &= a_1\v_1+a_2\v_2+\cdots+a_n\v_n \\
&= \u_1+\u_2+\cdots+\u_n
\end{align}$$

only if $a_1=a_2=\cdots=a_n=0$, which implies that $\u_i=\zv$ for all $i$. It follows that the desired sum is direct.

Suppose next that

$$\span(\v_1,\v_2,\ldots,\v_n) = \bigoplus_{i=1}^n \span\v_i.$$

Again from the above proposition, we know that the only choices of $\u_i\in\span\v_i$ which satisfy

$$\zv=\u_1+\u_2+\cdots+\u_n$$

are the vectors $\u_1=\u_2=\cdots=\u_n=\zv$. Now choose scalars $a_1,a_2,\ldots,a_n\in\F$ for which

$$\zv=a_1\v_1+a_2\v_2+\cdots+a_n\v_n.$$

Since $a_i\v_i\in\span\v_i$ for each $i$, the directness of the sum forces $a_i\v_i=\zv$ for every $i$. Since each $\v_i\ne\zv$, it must be that each $a_i=0$. Thus, the list $(\v_1,\v_2,\ldots,\v_n)$ is linearly independent.

This theorem establishes a somewhat different way of thinking about linear independence, and since the two are equivalent we are free to use either as a definition of linear independence.

Linearly dependent lists have an interesting, yet fairly undesirable, property which is explained in the next theorem. This result is commonly called the linear dependence lemma, and it will be crucial for many future arguments. It states that any linearly dependent list contains a vector which is in the span of the previous vectors in the list. Furthermore, it asserts that removing this vector from the list does not affect the list's span.

Linear Dependence Lemma/Steinitz Exchange Lemma. Suppose $(\v_1,\v_2,\ldots,\v_n)$ is a linearly dependent list of vectors in a vector space $V$ for which $\v_1\ne\zv$. Then there exists an index $i$ with $2\le i\le n$ for which

$(a) \;\;$ $\v_i\in\span(\v_1,\v_2,\ldots,\v_{i-1})$,
$(b) \;\;$ removing $\v_i$ from $(\v_1,\v_2,\ldots,\v_n)$ results in a list with the same span.

Proof. Since $(\v_1,\v_2,\ldots,\v_n)$ is linearly dependent, there exist $a_1,a_2,\ldots,a_n\in\F$, not all zero, for which

$$a_1\v_1+a_2\v_2+\cdots+a_n\v_n = \zv.$$

We have assumed that $\v_1\ne\zv$, so not all of $a_2,a_3,\ldots,a_n$ can be zero. If they were, we would have $a_1\v_1=\zv$ with $\v_1\ne\zv$, forcing $a_1=0$ as well and contradicting the fact that not all of the scalars are zero. Let $i$ be the largest index for which $a_i\ne 0$, and note that $i\ge 2$. Then

$$a_1\v_1+a_2\v_2+\cdots+a_i\v_i = \zv,$$

so we may write

$$\v_i = -\frac{a_1}{a_i}\v_1-\frac{a_2}{a_i}\v_2 - \cdots - \frac{a_{i-1}}{a_i}\v_{i-1}.$$

Since we have expressed $\v_i$ as a linear combination of $\v_1,\v_2,\ldots,\v_{i-1}$, it follows that $\v_i\in\span(\v_1,\v_2,\ldots,\v_{i-1})$. This completes the proof of $(a)$.

Next, choose any $\u\in\span(\v_1,\v_2,\ldots,\v_n)$. By definition, there exist $a_1,a_2,\ldots,a_n\in\F$ for which

$$\u=a_1\v_1+a_2\v_2+\cdots+a_n\v_n.$$

Using the result of $(a)$, we can replace $\v_i$ in the above equation with a linear combination of the vectors $\v_1,\v_2,\ldots,\v_{i-1}$. Thus, $\u$ is also in the span of the list which results from removing $\v_i$. The reverse inclusion is immediate, since any linear combination of the shortened list is also a linear combination of the original list. This completes the proof of $(b)$.
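
The proof of $(a)$ is essentially an algorithm, and it can be sketched in code. The function below is illustrative only, assuming real coordinate vectors and $0$-based indices: it scans the list and reports a position whose vector already lies in the span of its predecessors, detected by comparing ranks.

```python
import numpy as np

# An illustrative sketch for real coordinate vectors (0-based indices):
# find an index i >= 1 whose vector lies in the span of its predecessors.
# For a linearly dependent list whose first vector is nonzero, the linear
# dependence lemma guarantees such an index exists.
def dependent_index(vectors):
    vectors = [np.asarray(v, dtype=float) for v in vectors]
    for i in range(1, len(vectors)):
        prefix_rank = np.linalg.matrix_rank(np.column_stack(vectors[:i]))
        full_rank = np.linalg.matrix_rank(np.column_stack(vectors[: i + 1]))
        if full_rank == prefix_rank:
            return i  # v_i is a linear combination of v_0, ..., v_{i-1}
    return None  # the list is linearly independent

vs = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(dependent_index(vs))  # 2, since (3,4) = -(1,2) + 2(2,3)
```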

Now we will establish that linearly independent lists possess the property I mentioned earlier. Namely, we will show that a linearly independent list can never be longer than a spanning list.

Theorem. In any finite-dimensional vector space $V$, the length of a linearly independent list of vectors never exceeds the length of a list which spans $V$.

Proof. Suppose $(\u_1,\u_2,\ldots,\u_m)$ is a list of vectors which spans $V$, and that $(\v_1,\v_2,\ldots,\v_n)$ is a linearly independent list. This implies $\v_i\ne\zv$ for every $i$. We will show that $m\ge n$ through the following algorithmic construction.

Step 1.
Because $(\u_1,\u_2,\ldots,\u_m)$ spans $V$, adding any vector to this list must result in a list which is linearly dependent. Namely, the list $L_{1,1}=(\v_1,\u_1,\u_2,\ldots,\u_m)$ is linearly dependent. It follows from the linear dependence lemma that there exists a positive integer $i\le m$ for which the removal of $\u_i$ from $L_{1,1}$ results in a new list $L_{1,2}$ which still spans $V$. Notice that $L_{1,2}$ is a list of length $m$. To simplify matters, we replace the index of every $\u_j$ for which $j>i$ with $j-1$, so that we can write

$$L_{1,2}=(\v_1,\u_1,\u_2,\ldots,\u_{m-1}).$$

Step k.
Because the list $L_{k-1,2}$ from the previous step spans $V$, adding any vector to this list must result in a list which is linearly dependent. Namely, the list $L_{k,1}$ formed by adding $\v_k$ to the beginning of $L_{k-1,2}$ is linearly dependent. It follows from the linear dependence lemma that we can remove some vector from $L_{k,1}$ without affecting its span. Since

$$L_{k,1}=(\v_k,\v_{k-1},\ldots,\v_1,\u_1,\u_2,\ldots,\u_{m-k+1})$$

and the sublist $(\v_k,\v_{k-1},\ldots,\v_1)$ is linearly independent, the vector whose removal the lemma guarantees cannot be one of the $\v$'s. There must therefore exist a positive integer $i\le m-k+1$ for which the removal of $\u_i$ from $L_{k,1}$ results in a new list $L_{k,2}$ which still spans $V$. Notice that $L_{k,2}$ is a list of length $m$. Next, we replace the index of every $\u_j$ for which $j>i$ with $j-1$, so that we can write

$$L_{k,2}=(\v_k,\v_{k-1},\ldots,\v_1,\u_1,\u_2,\ldots,\u_{m-k}).$$

At the end of step $n$, we have used up every $\v_i$ and so the algorithm terminates. It cannot fail before this point: at each step $k\le n$, the list $L_{k,1}$ is linearly dependent while its $\v$'s are linearly independent, so there is always some $\u_i$ left to remove. This means the original spanning list must contain at least $n$ vectors, and so $m\ge n$, as desired.
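
For the curious, here is an illustrative sketch of the exchange procedure for real coordinate vectors, following the proof step by step: prepend the next $\v_k$, then delete one of the remaining $\u_j$ whose removal preserves the span. The coordinate representation and the rank test for equal spans are assumptions of the sketch, not part of the proof.

```python
import numpy as np

# A sketch of the exchange procedure for real coordinate vectors: prepend
# each v in turn, then remove one of the remaining u's whose removal leaves
# the span (detected here by comparing ranks) unchanged.
def exchange(spanning, independent):
    current = [np.asarray(u, dtype=float) for u in spanning]
    for k, v in enumerate(independent):
        current.insert(0, np.asarray(v, dtype=float))
        target = np.linalg.matrix_rank(np.column_stack(current))
        # The u's occupy positions k+1 onward; the v's at the front are
        # linearly independent, so some u must be removable.
        for j in range(k + 1, len(current)):
            trial = current[:j] + current[j + 1:]
            if np.linalg.matrix_rank(np.column_stack(trial)) == target:
                del current[j]
                break
        else:
            raise ValueError("independent list is longer than the spanning list allows")
    return current

us = [(1, 2), (2, 3), (3, 4)]  # spans R^2
vs = [(1, 0), (0, 1)]          # linearly independent
print(len(exchange(us, vs)))   # 3: the length never changes, so m >= n
```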

Algorithmic proofs like the above are common in linear algebra. Linear algebra has applications in almost every field, partially because it lends itself so nicely to computation.

I will end here for now, and next time I will talk about bases for vector spaces.