July 3, 2021

Inner Products

I started writing a post on Riemannian metrics on smooth manifolds, only to realize that I had not talked about inner products here yet. Rather than trying to cram all the relevant knowledge into the introductory section of another post (which is what I initially attempted), I decided that the topic was deserving of its own post. So here we are.

An inner product is an additional piece of structure on a vector space which allows us to define things like the "magnitude" (or "norm") of a vector, as well as the "angle" between two vectors.

Inner products are generally motivated by the dot product in Euclidean geometry, so that is where we shall start. Suppose we have two vectors, $\vec{x}=\sum_{i=1}^n x^i e_i$ and $\vec{y}=\sum_{i=1}^n y^i e_i$ in $\R^n$, where $(e_1,\ldots,e_n)$ is the standard basis. Traditionally, we think of such vectors as arrows sitting in space and possessing a magnitude and direction. The dot product of $\vec{x}$ and $\vec{y}$ is defined to be the real number

$$
\vec{x}\cdot\vec{y} = \sum_{i=1}^n x^i y^i.
$$

That is, it is simply the sum of products of corresponding components. The norm (or magnitude) of the vector $\vec{x}$ is given by

$$
\abs{\vec{x}} = \sqrt{\sum_{i=1}^n (x^i)^2}.
$$

Here, of course, the superscript in $x^i$ is an upper index rather than an exponent, while $(x^i)^2$ denotes the square of the $i$th component of $\vec{x}$.
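As a quick worked example (with the vectors chosen purely for illustration), take $\vec{x} = 2e_1 + e_2 + 2e_3$ and $\vec{y} = e_1 - 2e_2 + 2e_3$ in $\R^3$. Then

$$
\vec{x}\cdot\vec{y} = (2)(1) + (1)(-2) + (2)(2) = 4,
\qquad
\abs{\vec{x}} = \sqrt{2^2 + 1^2 + 2^2} = 3,
\qquad
\abs{\vec{y}} = \sqrt{1^2 + (-2)^2 + 2^2} = 3.
$$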

Note, however, that

$$
\abs {\vec{x}} = \sqrt{\vec{x}\cdot\vec{x}}.
$$

That is, we can recover the norm of a vector from the dot product by simply taking the square root of the dot product of that vector with itself!

Of course, for any vector $\vec{x}\in\R^n$, it is true that

$$
\vec{x}\cdot\vec{x} \ge 0,
$$

with equality if and only if $\vec{x}$ is the zero vector, since a sum of squares vanishes only when every term does.

It is also always true that, for any two vectors $\vec{x}$ and $\vec{y}$ in $\R^n$,

$$
\vec{x}\cdot\vec{y} = \vec{y}\cdot\vec{x}.
$$

The above two results are seen easily enough from the definition of the dot product.

Furthermore, as I mentioned above, it is possible to establish a relationship between the dot product of two vectors and the angle between them.

Theorem. If $\vec{x}=\sum_{i=1}^n x^i e_i$ and $\vec{y}=\sum_{i=1}^n y^i e_i$ are vectors in $\R^n$, then

$$\vec{x}\cdot\vec{y} = \abs{\vec{x}}\abs{\vec{y}}\cos\theta,$$

where $\theta$ is the angle between the vectors $\vec{x}$ and $\vec{y}$.

Proof. The key realization here is that the vectors $\vec{x}$, $\vec{y}$, and $\vec{x} - \vec{y}$ form the sides of a triangle in some plane, and thus we may freely use the familiar tools of trigonometry — namely, the law of cosines. (Of course, the vector $\vec{x} - \vec{y}$ is technically based at the origin, as are all vectors in $\R^n$, but it is useful to visualize it as being translated to form the third side of the triangle.)

[Figure: a triangle with sides $\vec{x}$, $\vec{y}$, and $\vec{x} - \vec{y}$, with the angle $\theta$ between $\vec{x}$ and $\vec{y}$.]

We will proceed by expanding $\abs{\vec{x}-\vec{y}}^2$ in two different ways, after which the result will be obtained by equating the two expressions.

First, observe that the law of cosines immediately yields
$$\abs{\vec{x} - \vec{y}}^2 = \abs{\vec{x}}^2 + \abs{\vec{y}}^2 - 2\abs{\vec{x}}\abs{\vec{y}}\cos\theta.$$

On the other hand, using the definition of the norm and the dot product, we see that
$$\begin{align}
\abs{\vec{x} - \vec{y}}^2 &= (\vec{x} - \vec{y}) \cdot (\vec{x} -\vec{y}) \\
&= \left( \sum_{i=1}^n (x^i - y^i)e_i \right) \cdot \left( \sum_{i=1}^n (x^i - y^i)e_i \right) \\
&= \sum_{i=1}^n (x^i - y^i)^2 \\
&= \sum_{i=1}^n \left( (x^i)^2 + (y^i)^2 - 2x^iy^i\right) \\
&= \sum_{i=1}^n (x^i)^2 + \sum_{i=1}^n (y^i)^2 - 2\sum_{i=1}^n x^iy^i \\
&= \abs{\vec{x}}^2 + \abs{\vec{y}}^2 - 2\vec{x}\cdot\vec{y}.
\end{align}$$

Equating the two expressions we've found for $\abs{\vec{x} - \vec{y}}^2$ gives

$$\abs{\vec{x}}^2 + \abs{\vec{y}}^2 - 2\abs{\vec{x}}\abs{\vec{y}}\cos\theta = \abs{\vec{x}}^2 + \abs{\vec{y}}^2 - 2\vec{x}\cdot\vec{y},$$

from which the result follows by some basic algebraic manipulation.

Of course, this means that we can recover the angle between two vectors from their dot product by a simple rearrangement of the above formula:

$$
\theta = \arccos\left(\frac{\vec{x}\cdot\vec{y}}{\abs{\vec{x}}\abs{\vec{y}}}\right).
$$
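Continuing the earlier example with $\vec{x} = 2e_1 + e_2 + 2e_3$ and $\vec{y} = e_1 - 2e_2 + 2e_3$, we find

$$
\theta = \arccos\left(\frac{4}{3\cdot 3}\right) = \arccos\left(\frac{4}{9}\right) \approx 63.6^\circ.
$$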

Note of course that if two vectors are at a right angle to each other, their dot product is always zero because $\cos\frac{\pi}{2}=0$. We can flip this on its head by defining two vectors to be orthogonal if their dot product is zero. Similarly, if two vectors point in the same direction, then their dot product is simply the product of their norms, because $\cos 0 = 1$.

This is all to say that, as long as we are equipped with the dot product, we can determine magnitudes of vectors and angles between them. This is our motivation for generalizing the dot product to the concept of an inner product on arbitrary vector spaces. Doing so will allow us to define magnitude and angle in a manner precisely analogous to the above.

Here's one last important property of the dot product:

Theorem. The dot product in $\R^n$ is a bilinear map.

Proof. We need to show that the dot product is homogeneous and additive in each argument. Choose any vectors $\vec{x}=\sum_{i=1}^n x^i e_i$, $\vec{y}=\sum_{i=1}^n y^i e_i$ and $\vec{z}=\sum_{i=1}^n z^i e_i$ in $\R^n$.

To see that it is homogeneous, note that, for any real number $k$,

$$\begin{align}
k(\vec{x}\cdot\vec{y}) &= k\sum_{i=1}^n x^iy^i \\
&= \sum_{i=1}^n kx^iy^i \\[.7em]
&= (k\vec{x})\cdot\vec{y} \\[1em]
&= \vec{x}\cdot(k\vec{y}).
\end{align}$$

To see that it is additive in the first argument,

$$\begin{align}
(\vec{x}+\vec{y}) \cdot \vec{z} &= \sum_{i=1}^n (\vec{x}+\vec{y})^i z^i \\
&= \sum_{i=1}^n (x^i+y^i) z^i \\
&= \sum_{i=1}^n (x^i z^i + y^i z^i) \\
&= \sum_{i=1}^n x^i z^i + \sum_{i=1}^n y^i z^i\\
&= \vec{x}\cdot\vec{z} + \vec{y}\cdot\vec{z}.
\end{align}$$

Additivity in the second argument follows from an analogous computation.
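To see bilinearity in action with the example vectors $\vec{x}$ and $\vec{y}$ from earlier, note for instance that $(2\vec{x})\cdot\vec{y} = (4)(1) + (2)(-2) + (4)(2) = 8 = 2(\vec{x}\cdot\vec{y})$, just as homogeneity promises.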

I could of course go on and on about the dot product. However, for our purposes it serves merely as motivation for the definition of the more general concept of an inner product on a real vector space.

Definition. An inner product on a vector space $V$ over the field $\R$ is a bilinear map $g:V\times V\to\R$ satisfying two additional properties:

  1. Symmetry
    $g(u, v) = g(v, u)$ for all vectors $u$ and $v$ in $V$.

  2. Positive Definiteness
    $g(v, v) \ge 0$ for any vector $v$ in $V$, with $g(v, v) = 0$ if and only if $v$ is the zero vector.

A vector space with an inner product is sometimes called an inner product space, and can be written as the pair $(V, g)$.

It is possible to define inner products on complex vector spaces as well, but doing so requires modifying the symmetry condition (to conjugate symmetry) and is not necessary for the purposes of this blog.

Immediately we see that the dot product in Euclidean space is an example of an inner product. Furthermore, it should be evident that we can define inner products on any finite-dimensional vector space over the field $\R$, since each such vector space is isomorphic to Euclidean space, and we can use this isomorphism to transform the regular dot product into an inner product on that vector space. That is, since there exists an isomorphism $T:V\to\R^n$, where $n=\dim V$, we can define an inner product $g$ on $V$ so that $g(u, v) = T(u) \cdot T(v)$, and all the axioms of an inner product follow easily for $g$.
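As a small illustration (with the space and the isomorphism chosen just for this example), consider the vector space of real polynomials of degree at most one, together with the isomorphism $T(a + bt) = a e_1 + b e_2$ into $\R^2$. The recipe above produces the inner product

$$
g(a + bt,\, c + dt) = T(a + bt)\cdot T(c + dt) = ac + bd.
$$

Of course, a different choice of isomorphism would generally produce a different (but equally valid) inner product on the same space.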

In any inner product space, we can immediately define a few important concepts, just as we did above for the dot product.

Definition. Given an inner product space $(V, g)$, we make the following definitions:

  1. The norm of a vector $v\in V$ is given by $$\abs {v}=\sqrt{g(v, v)}.$$
  2. The angle between two nonzero vectors $u,v\in V$ is given by $$\theta = \arccos\left(\frac{g(u, v)}{\abs {u}\abs {v}}\right).$$
  3. Two vectors are orthogonal if their inner product is $0$. This occurs either when the angle between them is $\frac{\pi}{2}$ or at least one of them is the zero vector.

Since an inner product $g$ is bilinear (that is, a rank $(0, 2)$ tensor), if $(e_1,\ldots, e_n)$ is any basis for $V$ and $(e^1,\ldots,e^n)$ is the corresponding dual basis for $V^*$, then we may write $g$ as a linear combination of tensor products

$$g=\sum_{i=1}^n \sum_{j=1}^n g_{ij} e^i \otimes e^j,$$

for some collection of real numbers $(g_{ij})_{i,j=1}^n$. These scalars may of course be computed by

$$g_{ij} = g(e_i, e_j).$$

(If you don't believe me, try plugging any two basis vectors into the tensor product expression above and using the definitions of the tensor product and dual basis.)
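For the skeptical, here is that computation spelled out. Recall that the dual basis satisfies $e^i(e_k) = \delta^i_k$, where $\delta^i_k$ equals $1$ if $i = k$ and $0$ otherwise. Plugging the basis vectors $e_k$ and $e_l$ into the expression above gives

$$
g(e_k, e_l) = \sum_{i=1}^n \sum_{j=1}^n g_{ij}\, e^i(e_k)\, e^j(e_l) = \sum_{i=1}^n \sum_{j=1}^n g_{ij}\, \delta^i_k\, \delta^j_l = g_{kl}.
$$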

That is, as is usually the case in linear algebra, if we know how an inner product $g$ acts on all possible pairs of basis vectors, then we can use bilinearity to extrapolate how it will act on any arbitrary pair of vectors.

We can in fact make some additional assertions about the scalars $(g_{ij})$. Symmetry of the inner product demands that $g_{ij} = g_{ji}$ for all $i,j\in\{1,\ldots, n\}$, whereas positive definiteness requires that $g_{ii} > 0$ for all $i\in\{1,\ldots,n\}$, since $g_{ii} = g(e_i, e_i)$ and each $e_i$ is nonzero. In fact, positive definiteness demands even more than this: if we were to write out these scalars in the form of a matrix, the matrix would need to be symmetric about the main diagonal and positive definite, meaning that $\sum_{i,j} g_{ij} v^i v^j > 0$ for every nonzero choice of components $(v^1,\ldots,v^n)$. Interestingly, this also means that we are free to define an inner product by choosing a basis for $V$ together with any symmetric, positive definite collection of scalars, if we so desire.
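For a small concrete example (in $\R^2$ with the standard basis and its dual, chosen purely for illustration), the symmetric, positive definite choice

$$
g = 2\, e^1 \otimes e^1 + e^1 \otimes e^2 + e^2 \otimes e^1 + 2\, e^2 \otimes e^2
$$

defines a perfectly good inner product, namely $g(u, v) = 2u^1v^1 + u^1v^2 + u^2v^1 + 2u^2v^2$. Its matrix of components is symmetric, and it is positive definite because

$$
g(v, v) = 2(v^1)^2 + 2v^1v^2 + 2(v^2)^2 = (v^1)^2 + (v^2)^2 + (v^1 + v^2)^2 > 0
$$

whenever $v$ is nonzero.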

Now, there are many many interesting properties of inner products. So many, in fact, that this post will barely scratch the surface. To start with, there is a generalized Pythagorean Theorem which holds for any inner product space.

Pythagorean Theorem. If $(V, g)$ is an inner product space and $u$ and $v$ are orthogonal vectors in $V$, then

$$\abs{u-v}^2 = \abs{u}^2 + \abs{v}^2.$$

Proof. Since $u$ and $v$ are orthogonal, by definition we have that $g(u, v) = 0$. Thus, using the definition of the norm and the bilinearity and symmetry of $g$,

$$\begin{align}
\abs{u-v}^2 &= g(u-v, u-v) \\
&= g(u, u-v) -g(v, u-v) \\
&= g(u, u) - g(u, v) - g(v, u) + g(v, v) \\
&= g(u, u) + g(v, v) - 2g(u, v) \\
&= g(u, u) + g(v, v) \\
&= \abs{u}^2 + \abs{v}^2.
\end{align}$$
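For a quick concrete check, take the dot product on $\R^2$ with $u = 3e_1$ and $v = 4e_2$, which are orthogonal. Then $u - v = 3e_1 - 4e_2$, and indeed $\abs{u - v}^2 = 9 + 16 = 25 = \abs{u}^2 + \abs{v}^2$, recovering the familiar $3$-$4$-$5$ right triangle.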

I am sure the above theorem has many proofs, as does the following theorem. In fact, I think I have seen more proofs of the Cauchy-Schwarz Inequality than I have of any other result in all of mathematics (besides maybe the Fundamental Theorem of Algebra). Here is one.

Cauchy-Schwarz Inequality. If $(V,g)$ is an inner product space and $u$ and $v$ are any vectors in $V$, then
$$
\abs{g(u, v)} \le \abs{u} \abs{v}.
$$
Proof. If either $u$ or $v$ is the zero vector, then $g(u, v) = 0$ and $\abs{u}\abs{v} = 0$, so the inequality holds (as an equality). Equality also holds when $u$ and $v$ are collinear, say $v = cu$ for some scalar $c$, since then $\abs{g(u, v)} = \abs{c}\abs{u}^2 = \abs{u}\abs{v}$.

Suppose then that $u$ and $v$ are nonzero. Then we may define a new vector

$$w = \frac{u}{\abs{u}} - \frac{v}{\abs{v}}.$$

Of course, since $g$ is an inner product it is positive definite, meaning that $g(w, w) \ge 0$. Thus,

$$\begin{align}
g(w, w) &= g \left( \frac{u}{\abs{u}} - \frac{v}{\abs{v}}, \frac{u}{\abs{u}} - \frac{v}{\abs{v}} \right) \\
&= g \left( \frac{u}{\abs{u}}, \frac{u}{\abs{u}} \right) + g \left( \frac{v}{\abs{v}}, \frac{v}{\abs{v}} \right) - 2g \left( \frac{u}{\abs{u}}, \frac{v}{\abs{v}} \right) \\
&= \frac{1}{\abs{u}^2} g(u, u) + \frac{1}{\abs{v}^2} g(v, v) - \frac{2}{\abs{u}\abs{v}} g(u, v) \\
&= \frac{\abs{u}^2}{\abs{u}^2} + \frac{\abs{v}^2}{\abs{v}^2} - \frac{2}{\abs{u}\abs{v}} g(u, v) \\
&= 2 - \frac{2}{\abs{u}\abs{v}} g(u, v) \\
&\ge 0.
\end{align}$$

Dividing by two and rearranging the above inequality gives $g(u, v) \le \abs{u}\abs{v}$. Running the same computation with $-u$ in place of $u$ (note that $\abs{-u} = \abs{u}$) gives $-g(u, v) \le \abs{u}\abs{v}$ as well, and together these two inequalities yield the result.
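As a quick sanity check with the dot product on $\R^3$ and the example vectors from earlier, $\abs{\vec{x}\cdot\vec{y}} = 4 \le 9 = \abs{\vec{x}}\abs{\vec{y}}$, just as the inequality predicts.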

We are nearing the end of the results we need, although I could go on and on if time allowed. I'll conclude by mentioning that, for any nonzero vector $u$, the vector $\frac{u}{\abs{u}}$ has norm $1$ and is collinear with $u$. We sometimes refer to this act of dividing a vector by its norm as "normalizing" the vector. We call a normalized vector (a vector whose norm is one) a unit vector, and a collection of pairwise orthogonal unit vectors is called orthonormal. As we will see, it is often useful to construct an orthonormal basis for an inner product space, and this can always be done (for finite-dimensional inner product spaces at least) using a process called the Gram-Schmidt orthonormalization algorithm. Having an orthonormal basis is often desirable because it makes computations considerably simpler, but it is rarely necessary. I may detail this algorithm at some point in the future, but for now I will instead go to sleep. Nighty night.