
7. Systems of linear differential equations and equations of higher order

Part 1.: Problems, solutions.
Part 2.: Problems.

Systems of differential linear equations

A system of differential linear equations is a system of the following form

    \[\begin{cases}x_1'(t)=a_{1,1}x_1+a_{1,2}x_2+\ldots+ a_{1,n}x_n+f_1(t)\\ x_2'(t)=a_{2,1}x_1+a_{2,2}x_2+\ldots+ a_{2,n}x_n+f_2(t)\\ \ldots \\ x_n'(t)=a_{n,1}x_1+a_{n,2}x_2+\ldots+ a_{n,n}x_n+f_n(t)\\ \end{cases}\]

Additionally we may have boundary conditions: x_1(t_0)=b_1,x_2(t_0)=b_2,\ldots, x_n(t_0)=b_n. A solution is a vector of functions x_1(t),\ldots, x_n(t) which satisfies the system (if no boundary conditions are given, one should find all such vectors) and the boundary conditions (then we get exactly one such vector).

Such a system can be written in a matrix form as:

    \[\left[\begin{array}{c}x_1'\\x_2'\\\ldots\\ x_n'\end{array}\right]=\left[\begin{array}{cccc}a_{1,1}& a_{1,2}&\ldots & a_{1,n}\\a_{2,1}& a_{2,2}&\ldots & a_{2,n}\\&&\ldots&\\a_{n,1}&     a_{n,2}&\ldots & a_{n,n}\end{array}\right]\cdot \left[\begin{array}{c}x_1\\x_2\\\ldots\\ x_n\end{array}\right]+ \left[\begin{array}{c}f_1\\f_2\\\ldots\\ f_n\end{array}\right] \]

If f_1(t)=f_2(t)=\ldots = f_n(t)=0, the system is called homogeneous.

    \[\begin{cases}x'(t)=2x(t)+3y(t)+t\\ y'(t)=4x(t)+3y(t)+1-t\end{cases}\]

is thus an example of a system of differential linear equations and

    \[\begin{cases}x'(t)=2x(t)+3y(t)\\ y'(t)=4x(t)+3y(t)\end{cases}\]

is homogeneous. The conditions x(0)=1, y(0)=-1 are an example of boundary conditions for this system.

Eigenvectors and solutions to a homogeneous system of differential linear equations

Notice that if (a_1,\ldots, a_n) is an eigenvector for an eigenvalue \lambda of matrix A of a homogeneous system of differential linear equations, then a_1e^{\lambda t}, \ldots, a_ne^{\lambda t} is its solution. Indeed,

    \[\left[\begin{array}{c}(a_1e^{\lambda     t})'\\(a_2e^{\lambda t})'\\\ldots\\ (a_ne^{\lambda t})'\end{array}\right]=\lambda \left[\begin{array}{c}a_1e^{\lambda t}\\a_2e^{\lambda t}\\\ldots\\ a_ne^{\lambda t}\end{array}\right]\]

by the properties of derivative, but also by the definition of an eigenvector,

    \[\left[\begin{array}{cccc}a_{1,1}& a_{1,2}&\ldots & a_{1,n}\\a_{2,1}& a_{2,2}&\ldots & a_{2,n}\\&&\ldots&\\a_{n,1}& a_{n,2}&\ldots & a_{n,n}\end{array}\right]\cdot \left[\begin{array}{c}a_1e^{\lambda     t}\\a_2e^{\lambda t}\\\ldots\\ a_ne^{\lambda t}\end{array}\right]=\left[\begin{array}{cccc}a_{1,1}& a_{1,2}&\ldots & a_{1,n}\\a_{2,1}& a_{2,2}&\ldots & a_{2,n}\\&&\ldots&\\a_{n,1}& a_{n,2}&\ldots & a_{n,n}\end{array}\right]\cdot     \left[\begin{array}{c}a_1\\a_2\\\ldots\\ a_n\end{array}\right]\cdot e^{\lambda t}=\]

    \[=\lambda \left[\begin{array}{c}a_1e^{\lambda t}\\a_2e^{\lambda t}\\\ldots\\ a_ne^{\lambda t}\end{array}\right].\]

This means that if matrix A can be diagonalized (i.e. there exists a basis consisting of eigenvectors), then every solution of such a system is a linear combination of the basic solutions obtained this way.

Solving homogeneous system of differential linear equations — an example

Let us solve

    \[\begin{cases}x'(t)=2x(t)+3y(t)\\ y'(t)=4x(t)+3y(t)\end{cases}.\]

We are looking for the eigenvalues of the matrix

    \[A=\left[\begin{array}{cc}2&3\\4&3\end{array}\right].\]
Its characteristic polynomial is (\lambda-2)(\lambda-3)-12=\lambda^2-5\lambda-6=(\lambda+1)(\lambda-6). So we get eigenvalues -1 and 6.

Next we calculate a basis consisting of eigenvectors. For -1 the eigenspace is described by the equation 3x+3y=0, so it is spanned by (-1,1). For 6 it is described by -4x+3y=0, and thus spanned by (3,4).

Thus we get the basic solutions (-e^{-t},e^{-t}) and (3e^{6t},4e^{6t}), and every solution is a combination of them:

    \[C_1(-e^{-t},e^{-t})+C_2(3e^{6t},4e^{6t})=(-C_1e^{-t}+3C_2 e^{6t}, C_1e^{-t}+4C_2 e^{6t}),\]


    \[\begin{cases} x(t)=-C_1e^{-t}+3C_2     e^{6t}\\ y(t)=C_1e^{-t}+4C_2 e^{6t} \end{cases}\]
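The eigenpair computations behind these basic solutions can be checked mechanically; a small pure-Python sketch (the matrix and eigenpairs are the ones from this example):

```python
# Coefficient matrix of the homogeneous system x' = A x from the example.
A = [[2, 3],
     [4, 3]]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# Eigenpairs found above satisfy A v = lambda v.
for lam, v in [(-1, [-1, 1]), (6, [3, 4])]:
    assert matvec(A, v) == [lam * x for x in v]

# Consequently t -> v * exp(lam * t) solves x' = A x: its derivative is
# lam * v * exp(lam * t), which equals A v * exp(lam * t) by the check above.
```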

Complex eigenvalues

The non-real eigenvalues of a real matrix come in pairs \lambda, \bar{\lambda}. Their eigenvectors are also complex conjugates of each other. This is the reason why in the general solution (after rewriting e^{a+ib} as e^a(\cos b+i\sin b)) it is possible to extract C_1+C_2 and i(C_1-C_2), where C_1,C_2 are the constants of the combination of the basic solutions related to \lambda and \bar{\lambda} respectively. Since we are looking for real solutions, we can choose C_1, C_2\in \mathbb{C} in such a way that C_1+C_2\in\mathbb{R} and i(C_1-C_2)\in \mathbb{R} (their imaginary parts are negatives of each other, and their real parts are equal). Thus, we can introduce real constants A=C_1+C_2 and B=i(C_1-C_2).

Equivalently, instead of basic solutions ve^{\lambda t} and \bar{v}e^{\bar{\lambda}t} one can take \Re(v e^{\lambda t}) and \Im(v e^{\lambda t}).
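As a sketch of this, consider the rotation matrix below (an illustrative example, not taken from the text), which has eigenvalues \pm i; the real and imaginary parts of ve^{\lambda t} are indeed real solutions:

```python
import cmath, math

# Illustrative example: the rotation matrix A has non-real eigenvalues i, -i.
A = [[0, -1],
     [1, 0]]

lam = 1j
v = [1, -1j]          # eigenvector for lambda = i

# Check A v = lambda v.
Av = [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]
assert all(abs(Av[i] - lam * v[i]) < 1e-12 for i in range(2))

# Real basic solutions: Re(v e^{lambda t}) and Im(v e^{lambda t}).
t = 0.7
sol = [vi * cmath.exp(lam * t) for vi in v]
re = [z.real for z in sol]     # (cos t, sin t), solves x' = -y, y' = x
im = [z.imag for z in sol]     # (sin t, -cos t), also solves x' = -y, y' = x
assert math.isclose(re[0], math.cos(t)) and math.isclose(re[1], math.sin(t))
assert math.isclose(im[0], math.sin(t)) and math.isclose(im[1], -math.cos(t))
```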

Non-homogeneous systems of differential linear equations

The idea is similar to the one used for solving a single non-homogeneous linear equation: we first solve the homogeneous version of the system and then change the constants into functions.

E.g. to solve

    \[\begin{cases}x'(t)=2x(t)+3y(t)+t\\ y'(t)=4x(t)+3y(t)+1-t\end{cases}\]

first we solve

    \[\begin{cases}x'(t)=2x(t)+3y(t)\\ y'(t)=4x(t)+3y(t)\end{cases},\]

which gives the solution

    \[(-C_1e^{-t}+3C_2 e^{6t},     C_1e^{-t}+4C_2 e^{6t})\]

(see the previous section).

Changing the constants to functions we get:

    \[(-C_1(t)e^{-t}+3C_2(t) e^{6t}, C_1(t)e^{-t}+4C_2(t) e^{6t})\]

so the derivatives are:

    \[(-C'_1(t)e^{-t}+3C'_2(t) e^{6t}+C_1(t)e^{-t}+18C_2(t) e^{6t}, C'_1(t)e^{-t}+4C'_2(t) e^{6t}-C_1(t)e^{-t}+24C_2(t) e^{6t}),\]

substituting into the system, we get:

    \[\begin{cases}-C'_1(t)e^{-t}+3C'_2(t) e^{6t}+C_1(t)e^{-t}+18C_2(t) e^{6t}=2(-C_1e^{-t}+3C_2 e^{6t})+3(C_1e^{-t}+4C_2 e^{6t})+t\\ C'_1(t)e^{-t}+4C'_2(t) e^{6t}-C_1(t)e^{-t}+24C_2(t) e^{6t}=4(-C_1(t)e^{-t}+3C_2(t) e^{6t})+3(C_1(t)e^{-t}+4C_2(t) e^{6t})+1-t\end{cases},\]

which gives the following system of equations for C'_1, C'_2:

    \[\begin{cases}-C'_1(t)e^{-t}+3C'_2(t) e^{6t}=t\\ C'_1(t)e^{-t}+4C'_2(t) e^{6t}=1-t\end{cases},\]

so the matrix of this system is

    \[\left[\begin{array}{cc|c}-e^{-t}&3e^{6t}&t\\e^{-t}&4e^{6t}&1-t\end{array}\right]\underrightarrow{w_2+w_1} \left[\begin{array}{cc|c}-e^{-t}&3e^{6t}&t\\0&7e^{6t}&1\end{array}\right]\underrightarrow{w_1\cdot (-e^{t}), w_2\cdot \frac{1}{7}e^{-6t}}\]

    \[\left[\begin{array}{cc|c}1&-3e^{7t}&-te^{t}\\0&1&\frac{1}{7}e^{-6t}\end{array}\right]\underrightarrow{w_1+3e^{7t}w_2} \left[\begin{array}{cc|c}1&0&-te^{t}+\frac{3}{7}e^{t}\\0&1&\frac{1}{7}e^{-6t}\end{array}\right]\]


    \[\begin{cases} C'_1(t)=-te^{t}+\frac{3}{7}e^{t}\\ C'_2(t)=\frac{1}{7}e^{-6t}\end{cases}.\]

Integrating, we get

    \[\begin{cases} C_1(t)=\frac{e^t}{7}\left(10-7t\right)+D_1\\ C_2(t)=-\frac{1}{42}e^{-6t}+D_2\end{cases}.\]

So finally:

    \[\begin{cases} x(t)=-\left(\frac{e^t}{7}\left(10-7t\right)+D_1\right)e^{-t}+3\left(-\frac{1}{42}e^{-6t}+D_2\right) e^{6t}\\ y(t)=\left(\frac{e^t}{7}\left(10-7t\right)+D_1\right)e^{-t}+4\left(-\frac{1}{42}e^{-6t}+D_2\right) e^{6t} \end{cases}.\]
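As a consistency check, taking D_1=D_2=0 and simplifying (this simplification is my computation, not stated explicitly in the text) gives the polynomial particular solution x(t)=t-\frac{3}{2}, y(t)=\frac{4}{3}-t, which can be verified against the system with exact arithmetic:

```python
from fractions import Fraction as F

# Verify that x(t) = t - 3/2, y(t) = 4/3 - t is a particular solution of
#   x' = 2x + 3y + t,   y' = 4x + 3y + 1 - t.
# Polynomials a + b*t in t are stored as coefficient pairs (a, b).
x = (F(-3, 2), F(1))      # x(t) = -3/2 + t, so x'(t) = 1
y = (F(4, 3), F(-1))      # y(t) = 4/3 - t,  so y'(t) = -1

x_prime = (x[1], F(0))
y_prime = (y[1], F(0))

rhs_x = tuple(2 * x[i] + 3 * y[i] + (F(0), F(1))[i] for i in range(2))   # 2x+3y+t
rhs_y = tuple(4 * x[i] + 3 * y[i] + (F(1), F(-1))[i] for i in range(2))  # 4x+3y+1-t

assert x_prime == rhs_x and y_prime == rhs_y
```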

What if the matrix is not diagonalizable?

Our considerations so far assume that the matrix A of the system is diagonalizable (at least over the complex numbers). What if this is not the case? How do we get the missing basic solutions?

Notice first that if v is an eigenvector for eigenvalue \lambda, and there exists v_1 such that A\cdot v_1= \lambda v_1+v, i.e. v_1 satisfies the system with augmented matrix

    \[\left[\begin{array}{c|c}A-\lambda I&v\end{array}\right],\]

then v_1e^{\lambda t}+vte^{\lambda t} is such that

    \[(v_1e^{\lambda t}+vte^{\lambda t})'=\lambda v_1e^{\lambda t}+ve^{\lambda t}+\lambda vte^{\lambda t}=A(v_1e^{\lambda t}+vte^{\lambda t}),\]

thus it is also a basic solution!

If we still need more basic solutions, we can find v_2 such that A\cdot v_2= \lambda v_2+v_1, i.e. a vector satisfying the system with augmented matrix

    \[\left[\begin{array}{c|c}A-\lambda I&v_1\end{array}\right],\]

and then v_2e^{\lambda t}+v_1te^{\lambda t}+v\frac{t^2}{2}e^{\lambda t} is such that

    \[(v_2e^{\lambda t}+v_1te^{\lambda t}+v\frac{t^2}{2}e^{\lambda t})'=\]

    \[=\lambda v_2e^{\lambda t}+v_1e^{\lambda t}+\lambda v_1te^{\lambda t}+vte^{\lambda t}+\lambda v\frac{t^2}{2}e^{\lambda t}=A(v_2e^{\lambda t}+v_1te^{\lambda t}+v\frac{t^2}{2}e^{\lambda t}),\]

thus it is also a basic solution! And so on!

It can be proved that if \lambda is a k-fold eigenvalue with only a one-dimensional eigenspace with basis v, then there exist vectors v_1,v_2, \ldots, v_{k-1} as described above, giving the missing basic solutions. The vectors v, v_1,\ldots, v_{k-1} are called Jordan vectors or a chain of generalized eigenvectors.

E.g. let

    \[\begin{cases} x'=y\\ y'=z\\ z'=8x-12y+6z \end{cases}.\]

We are looking for the eigenvalues of

    \[A=\left[\begin{array}{ccc}0&1&0\\0&0&1\\8&-12&6\end{array}\right]\]

and there is only one: 2, since the characteristic polynomial is -(\lambda-2)^3. Solving the system of equations

    \[\left[\begin{array}{ccc|c}-2&1&0&0\\0&-2&1&0\\8&-12&4&0\end{array}\right]\]

we get a one-dimensional eigenspace spanned by (1,2,4). Thus we find any Jordan vector satisfying

    \[\left[\begin{array}{ccc|c}-2&1&0&1\\0&-2&1&2\\8&-12&4&4\end{array}\right],\]

e.g. (1,3,8). We need one more:

    \[\left[\begin{array}{ccc|c}-2&1&0&1\\0&-2&1&3\\8&-12&4&8\end{array}\right],\]

e.g. (1,3,9). Thus we get the basic solutions

    \[(e^{2t}, 2e^{2t},     4e^{2t})\]

    \[(e^{2t}+te^{2t}, 3e^{2t}+2te^{2t}, 8e^{2t}+4te^{2t})\]

and

    \[(e^{2t}+te^{2t}+\frac{t^2}{2}e^{2t}, 3e^{2t}+3te^{2t}+t^2e^{2t}, 9e^{2t}+8te^{2t}+2t^2e^{2t}).\]
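The chain conditions for this example can be verified directly in a few lines:

```python
# Verify the Jordan chain from the example: for the coefficient matrix A of
# x' = y, y' = z, z' = 8x - 12y + 6z, and lambda = 2, we should have
# (A - 2I)v = 0, (A - 2I)v1 = v, (A - 2I)v2 = v1.
A = [[0, 1, 0],
     [0, 0, 1],
     [8, -12, 6]]

def matvec(M, u):
    return [sum(M[i][j] * u[j] for j in range(3)) for i in range(3)]

B = [[A[i][j] - (2 if i == j else 0) for j in range(3)] for i in range(3)]  # A - 2I

v, v1, v2 = [1, 2, 4], [1, 3, 8], [1, 3, 9]
assert matvec(B, v) == [0, 0, 0]
assert matvec(B, v1) == v
assert matvec(B, v2) == v1
```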

Linear differential equations of higher order

A linear differential equation of higher order is a linear equation involving derivatives of higher order, e.g. x^{(3)}-6x''+12x'-8x=0.

Every such equation can be transformed into a system of equations of first order by setting y=x', z=x'', etc. In our example:

    \[\begin{cases} x'=y\\ y'=z\\ z'=8x-12y+6z \end{cases}.\]

This is the system solved in the previous section. We get a solution for x from it by looking at the first coordinate, so the general solution of the considered equation is

    \[x(t)=C_1e^{2t}+C_2(e^{2t}+te^{2t})+C_3\left(e^{2t}+te^{2t}+\frac{t^2}{2}e^{2t}\right).\]

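The reason the factors t and t^2/2 appear is that 2 is a triple root of the characteristic polynomial \lambda^3-6\lambda^2+12\lambda-8=(\lambda-2)^3; a quick check that it really is a triple root (p(2)=p'(2)=p''(2)=0):

```python
# Characteristic polynomial of x''' - 6x'' + 12x' - 8x = 0.
p = [1, -6, 12, -8]                      # coefficients, highest power first

def value(coeffs, x):
    res = 0
    for c in coeffs:
        res = res * x + c                # Horner's scheme
    return res

def derivative(coeffs):
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]

# 2 is a triple root, which is exactly why e^{2t}, t e^{2t} and t^2 e^{2t}
# are all solutions of the equation.
assert value(p, 2) == 0
assert value(derivative(p), 2) == 0
assert value(derivative(derivative(p)), 2) == 0
```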
14. Polynomials and hypersurfaces

Part 1.: Problems, solutions.
Part 2.: Problems.

Polynomials and polynomial functions

A polynomial of degree n over a field K with variables x_1,\ldots, x_k is an expression of the form

    \[\sum_{1\leq i_1\leq \ldots\leq i_j\leq k, 0\leq j\leq n} a_{i_1,\ldots, i_j}x_{i_1}\cdot\ldots\cdot x_{i_j},\]

a_{i_1,\ldots, i_j}\in K with at least one a_{i_1,\ldots, i_j} for j=n not equal to zero, and the space of all polynomials over K with variables x_1,\ldots, x_k is denoted by K[x_1,\ldots, x_k]. E.g. x_1^2x_2+3x_2^3-x_1x_3+5 is a polynomial of degree 3 in K[x_1,x_2,x_3].

Given an affine k-dimensional space H and a basic system p_0;v_1,\ldots, v_k, a polynomial p\in K[x_1,\ldots, x_k] defines a function, called a polynomial function, p\colon H\to K given as

    \[p(p_0+a_1v_1+\ldots+a_kv_k)=p(a_1,\ldots, a_k)\]

which obviously abuses the notation, since p denotes both the polynomial itself and the polynomial function. But for infinite fields this is not a problem, because we have a bijection between these two sets.

Notice also that if f\colon H\to K can be written as a polynomial function in one basic system, it can be written in this form in any basic system. Moreover, the related polynomial is of the same degree.

Hypersurfaces and algebraic sets

X\subseteq H is an algebraic set, if

    \[X=\{p\in H\colon f_1(p)=0\land\ldots\land f_m(p)=0\},\]

where f_1,\ldots, f_m are polynomial functions. It is a hypersurface, if

    \[X=\{p\in H\colon f(p)=0\},\]

where f is a polynomial function.

It is easy to notice that over \mathbb{R} these are the same thing, since we can take f=f_1^2+\ldots+f_m^2.

Equivalence relation on polynomial functions and hypersurfaces

Two polynomial functions f,g\colon H\to K are equivalent, if there exist basic systems p;\mathcal{A} and q;\mathcal{B}, such that f in p;\mathcal{A} is described by the same polynomial as g in q;\mathcal{B}.

Two hypersurfaces are equivalent, if the functions describing them are equivalent. It is so if and only if the second hypersurface is an image of the first one under an affine isomorphism on H.

Canonical form of a polynomial function: hypersurfaces of second degree

Fix a hypersurface of second degree, i.e. described by a polynomial function of second degree. We want to find an equivalent polynomial function of a simplest possible form, i.e. in a canonical form. In other words, we will be looking for a basic system in which the equation describing the hypersurface is as simple as possible.

Every polynomial function of second degree is a sum of a quadratic form and an affine function; e.g. if f((x,y))=2x^2+4xy+y^2+4x+4y+4, then f((x,y))=q(x,y)+\varphi(x,y), where q(x,y)=2x^2+4xy+y^2 and \varphi(x,y)=4x+4y+4.

For every polynomial function of second degree (thus for every equation describing a hypersurface of second degree) it is possible to find a basic system in which the function takes one of the following forms:

    \[a_1x_1^2+\ldots+a_rx_r^2+c,\]

a_1,\ldots, a_r\neq 0, or

    \[a_1x_1^2+\ldots+a_rx_r^2+x_n,\]

r<n, a_1,\ldots, a_r\neq 0, where r is the rank of q.

How to transform the function to such a form? First we diagonalize the form q and describe the function \varphi in the new basis. This changes the basis but not the origin of the basic system. Now, for each variable x appearing with a non-zero coefficient b in \varphi for which x^2 appears in q with a non-zero coefficient a, we can introduce a new variable x'=x+\frac{b}{2a}, because then ax'^2=ax^2+bx+\frac{b^2}{4a}, so the terms ax^2+bx get absorbed into ax'^2 at the cost of changing the constant term. This changes the origin from p to p-\frac{b}{2a}v, where v is the basic vector related to the variable x. If there are no other variables in \varphi, we have already reached the first form. If there are other variables (for which there is no x^2 in q), then we make from them and the constant term a new variable x_n', which changes both the basis and the origin of the basic system.

E.g. for f((x,y))=2x^2+4xy+y^2+4x+4y+4 we have

    \[2x^2+4xy+y^2+4x+4y+4=2(x+y)^2-y^2+4x+4y+4=2x'^2-y'^2+4x'-4y'+4y'+4,\]

where x'=x+y, y'=y, so x=x'-y', and for the basic system we get (0,0);(1,0),(-1,1).

And so

    \[\ldots= 2x'^2-y'^2+4x'-4y'+4y'+4=2x'^2-y'^2+4x'+4=2(x'+1)^2-y'^2+2=2x''^2-y''^2+2,\]

where x''=x'+1 and y''=y', so the final basic system is (-1,0);(1,0),(-1,1).
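The chain of substitutions can be double-checked by evaluating both sides; since both sides are polynomials of degree 2, agreement on a grid of rational points larger than the degree confirms the identity:

```python
from fractions import Fraction as F
import itertools

# Check the change of variables from the example: with x'' = x + y + 1 and
# y'' = y, the polynomial 2x^2+4xy+y^2+4x+4y+4 equals 2x''^2 - y''^2 + 2.
for x, y in itertools.product([F(n) for n in range(-3, 4)], repeat=2):
    f = 2*x*x + 4*x*y + y*y + 4*x + 4*y + 4
    xpp, ypp = x + y + 1, y
    assert f == 2*xpp*xpp - ypp*ypp + 2
```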

In the second case, let f((x,y,z))=x^2+2xy+y^2+y+2z+1. Then:

    \[x^2+2xy+y^2+y+2z+1=(x+y)^2+y+2z+1=x'^2+y'+2z'+1,\]

where x'=x+y, y'=y, z'=z, and the basic system is (0,0,0);(1,0,0),(-1,1,0),(0,0,1).

    \[\ldots = x''^2+z'',\]

where x''=x', y''= y' and z''= y'+2z'+1, and since

    \[z'=\frac{1}{2}(z''-y''-1),\]

the final basic system is: (0,0,-1/2);(1,0,0),(-1,1,-1/2),(0,0,1/2).

Canonical form of a polynomial function of second degree: hypersurfaces over \mathbb{R} or \mathbb{C}

Notice that additionally an equation can always be divided by its constant term (if it is non-zero), changing it into 1, and if we are over \mathbb{R}, for every expression ax^2 the basic vector related to x can be divided by \sqrt{|a|}, which changes this expression to \pm x^2 (and over \mathbb{C} it can even be divided by \sqrt{a}, changing this expression to x^2).

Thus for every equation of second degree over \mathbb{R}, there is a basic system in which the equation takes one of the following forms:

    \[\pm x_1^2\pm\ldots\pm x_r^2+1=0,\]

1\leq r\leq n, or

    \[x_1^2\pm\ldots\pm x_r^2=0,\]

2\leq r\leq n-1, or

    \[\pm x_1^2\pm \ldots\pm x_r^2+x_n=0,\]

1\leq r<n.

Similarly, for every equation of second degree over \mathbb{C}, there is a basic system in which the equation takes one of the following forms:

    \[x_1^2+\ldots+x_r^2+1=0,\]

1\leq r\leq n, or

    \[x_1^2+\ldots+x_r^2=0,\]

2\leq r\leq n-1, or

    \[x_1^2+\ldots+x_r^2+x_n=0,\]

1\leq r<n.

E.g. transforming further the equation 2x''^2-y''^2+2=0 in basic system (-1,0);(1,0),(-1,1), we see that it is equivalent to x''^2-\frac{1}{2}y''^2+1=0, i.e. to the equation x'''^2-y'''^2+1=0, which holds in basic system (-1,0);(1,0),\sqrt{2}(-1,1) (because y'''=\frac{1}{\sqrt{2}}y''). Meanwhile, over \mathbb{C} it is equivalent to x'''^2+y'''^2+1=0, in basic system (-1,0);(1,0),\sqrt{2}(i,-i) (because this time y'''=\frac{i}{\sqrt{2}}y'').

Centres of symmetry

A point p is a centre of symmetry of a hypersurface described by an equation f(x)=0, if for every v\in T(H) we have f(p+v)=0 if and only if f(p-v)=0. It is easy to prove that p is a centre of symmetry if and only if it is a critical point of the function f, i.e. all partial derivatives of f are zero at this point.

Thus, if we consider the canonical forms of the equations, we see that a hypersurface described by

    \[a_1x_1^2+\ldots+a_rx_r^2+c=0\]

has a centre of symmetry (the origin of the basic system), and the centre lies on the hypersurface if and only if c=0. On the other hand, a hypersurface described by

    \[a_1x_1^2+\ldots+a_rx_r^2+x_n=0\]

has no centre of symmetry.

Affine types of hypersurfaces of second degree over \mathbb{R}

The above means that the type of the equation which describes a given hypersurface is an invariant. I.e. if in some basic system it takes one of the forms

    \[\pm x_1^2\pm\ldots\pm x_r^2+1=0,\]

1\leq r\leq n, or

    \[x_1^2\pm\ldots\pm x_r^2=0,\]

2\leq r\leq n-1, or

    \[\pm x_1^2\pm \ldots\pm x_r^2+x_n=0,\]

1\leq r<n, in any other basic system it cannot take any of the other forms.

Moreover, if s is the number of variables with coefficient +1 in the above equation, then if the equation is

    \[\pm x_1^2\pm\ldots\pm x_r^2+1=0,\]

then s is the same for every basic system in which this equation is in the canonical form.
If the equation is of form

    \[x_1^2\pm\ldots\pm x_r^2=0,\]

2\leq r\leq n-1, or

    \[\pm x_1^2\pm \ldots\pm x_r^2+x_n=0,\]

1\leq r<n, then we also get such a result but up to multiplication by -1, i.e. in every basic system in which the equation is in the canonical form, we get s or r-s variables with coefficient +1.

These possibilities are called the affine types of hypersurfaces.

We shall say that the hypersurface is proper, if it is not included in any hyperplane.

Affine types of proper curves of second degree in \mathbb{R}^2

Thus we have the following affine types of proper curves of second degree in \mathbb{R}^2:

    \[-x_1^2+1=0\]

two parallel lines,

    \[-x_1^2-x_2^2+1=0\]

ellipse,

    \[x_1^2-x_2^2+1=0\]

hyperbola,

    \[x_1^2-x_2^2=0\]

a pair of intersecting lines,

    \[x_1^2+x_2=0\]

parabola.

Affine types of proper surfaces of second degree in \mathbb{R}^3

Thus we have the following affine types of proper surfaces of second degree in \mathbb{R}^3:

    \[-x_1^2+1=0\]

a pair of parallel planes,

    \[x_1^2-x_2^2+1=0\]

hyperbolic cylinder,

    \[-x_1^2-x_2^2+1=0\]

elliptic cylinder,

    \[x_1^2+x_2^2-x_3^2+1=0\]

hyperboloid of two sheets,

    \[-x_1^2-x_2^2+x_3^2+1=0\]

hyperboloid of one sheet,

    \[-x_1^2-x_2^2-x_3^2+1=0\]

ellipsoid,

    \[x_1^2-x_2^2=0\]

a pair of intersecting planes,

    \[x_1^2+x_2^2-x_3^2=0\]

elliptic cone,

    \[x_1^2+x_3=0\]

parabolic cylinder,

    \[x_1^2+x_2^2+x_3=0\]

elliptic paraboloid,

    \[x_1^2-x_2^2+x_3=0\]

hyperbolic paraboloid.

6. Linear algebra, continued

Part 1.: Problems, solutions.
Part 2.: Problems, solutions.
Part 3.: Problems, solutions.
Part 4.: Problems, solutions.
Part 5.: Problems, solutions.


If you have tried to imagine a linear map of a plane, you have probably imagined that it stretches or squeezes the plane along some directions. It would be nice to know whether a given linear map \varphi\colon V\to V (a map with the same domain and codomain is called an endomorphism) is actually of this type. In other words, we would like to know whether there exist a non-zero vector v and a scalar \lambda such that \varphi simply multiplies v by \lambda (so it stretches or squeezes the space in the direction of v), i.e.:

    \[\varphi(v)=\lambda v.\]

If v and \lambda have such properties, then v is said to be an eigenvector of \varphi and \lambda an eigenvalue.

Determining eigenvalues

Notice that if \lambda is an eigenvalue of a map \varphi and v is its eigenvector, then \varphi(v)-\lambda v=0. Therefore if M is a matrix of \varphi (in standard basis), then

    \[0=Mv-\lambda v=Mv-\lambda I v=(M-\lambda I)v,\]

where I is the identity matrix.

Since multiplication of a matrix by a vector gives a linear combination of its columns and v is a non-zero vector, we see that the columns of M-\lambda I can be non-trivially combined to get the zero vector! This is possible if and only if \det (M-\lambda I)=0.

How to find the eigenvalues of a map? One simply needs to solve the equation \det (M-\lambda I)=0. E.g. let \varphi (x,y,z)=(2x,x+y,-x+z). Then:

    \[M(\varphi)_{st}^{st}=\left[\begin{array}{ccc}2&0&0\\1&1&0\\-1&0&1\end{array}\right],\]

so

    \[M(\varphi)_{st}^{st}-\lambda I=\left[\begin{array}{ccc}2-\lambda&0&0\\1&1-\lambda&0\\-1&0&1-\lambda\end{array}\right].\]

So we have to solve the following:

    \[\det (M(\varphi)_{st}^{st}-\lambda I)=(2-\lambda)(1-\lambda)^2=0\]

And therefore the eigenvalues are: 2 and 1.
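This can be confirmed by evaluating the determinant directly:

```python
# Check that 1 and 2 are roots of det(M - lambda*I) for the map
# phi(x, y, z) = (2x, x + y, -x + z).
M = [[2, 0, 0],
     [1, 1, 0],
     [-1, 0, 1]]

def det3(A):
    """Determinant of a 3x3 matrix by Sarrus' rule."""
    return (A[0][0]*A[1][1]*A[2][2] + A[0][1]*A[1][2]*A[2][0]
            + A[0][2]*A[1][0]*A[2][1] - A[0][2]*A[1][1]*A[2][0]
            - A[0][0]*A[1][2]*A[2][1] - A[0][1]*A[1][0]*A[2][2])

for lam in (1, 2):
    A = [[M[i][j] - (lam if i == j else 0) for j in range(3)] for i in range(3)]
    assert det3(A) == 0

# A value that is not an eigenvalue gives a non-zero determinant:
A = [[M[i][j] - (3 if i == j else 0) for j in range(3)] for i in range(3)]
assert det3(A) != 0
```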


Determining eigenvectors

Now let's find the eigenvectors related to subsequent eigenvalues. Notice that since \varphi is a linear map, if v, v' are eigenvectors for an eigenvalue \lambda, then for any scalar a also av and v+v' are eigenvectors for \lambda (as long as they are non-zero). Therefore, the set of all eigenvectors for \lambda, together with the zero vector, forms a linear subspace. Notice that v satisfies the equation

    \[(M-\lambda I)v=0\]

so the space of eigenvectors (i.e. the eigenspace) for \lambda, denoted V_{(\lambda)}, is the space of solutions of this homogeneous system of equations, and we can easily find its basis.

In our example, let us find a basis of V_{(1)}, so let \lambda=1. Then:

    \[M(\varphi)_{st}^{st}-\lambda I=\left[\begin{array}{ccc}1&0&0\\1&0&0\\-1&0&0\end{array}\right]\]

Therefore, we have the following system of equations:

    \[\begin{cases}x=0\\x=0\\-x=0\end{cases}\]
The space of solutions is V_{(1)}=\{(0,y,z)\colon y,z\in\mathbb{R}\}, and its basis is (0,1,0),(0,0,1). Indeed, \varphi((0,1,0))=1\cdot (0,1,0) and \varphi((0,0,1))=1\cdot (0,0,1).

Let’s find a basis of V_{(2)}, so let \lambda=2. Then:

    \[M(\varphi)_{st}^{st}-\lambda I=\left[\begin{array}{ccc}0&0&0\\1&-1&0\\-1&0&-1\end{array}\right]\]

The system of equations:

    \[\begin{cases}x-y=0\\-x-z=0\end{cases}\]

In the reduced "stair-like" (row echelon) form:

    \[\begin{cases}x+z=0\\y+z=0\end{cases}\]
The space of solutions is V_{(2)}=\{(-z,-z,z)\colon z\in\mathbb{R}\}, and its basis is (-1,-1,1). Indeed, \varphi((-1,-1,1))=(-2,-2,2)=2\cdot (-1,-1,1).

Eigenvector basis

If the sum of dimensions of spaces related to the eigenvalues of a given map equals the dimension of the whole space (as in our example: 1+2=3), then the basis of the whole space which consists of the vectors from the bases of subspaces related to the eigenvalues is called an eigenvector basis (in our case: \mathcal{A}=\{(0,1,0),(0,0,1),(-1,-1,1)\}).
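A quick check that these really are eigenvectors with the claimed eigenvalues:

```python
# Verify that A = {(0,1,0), (0,0,1), (-1,-1,1)} is an eigenvector basis of
# phi(x, y, z) = (2x, x + y, -x + z), with eigenvalues 1, 1, 2.
def phi(v):
    x, y, z = v
    return (2*x, x + y, -x + z)

pairs = [((0, 1, 0), 1), ((0, 0, 1), 1), ((-1, -1, 1), 2)]
for v, lam in pairs:
    assert phi(v) == tuple(lam * c for c in v)
```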

If a map has an eigenvector basis, then it can indeed be described by means of squeezing and stretching in the directions of the eigenvectors. Notice that the matrix of such a map in an eigenvector basis is a diagonal matrix (it has non-zero elements only on its diagonal), with the eigenvalues related to subsequent eigenvectors on the diagonal. In our example:

    \[M(\varphi)_{\mathcal{A}}^{\mathcal{A}}=\left[\begin{array}{ccc}1&0&0\\0&1&0\\0&0&2\end{array}\right].\]
It may happen that a map has no eigenvectors (e.g. a rotation of the plane) or that the subspaces of eigenvectors are too small (e.g. a 10-degree rotation of a three-dimensional space around an axis has only a one-dimensional space of eigenvectors).

Diagonalization of a matrix

A matrix M is diagonalizable, if there exists a matrix C, such that:

    \[M=C\cdot D\cdot C^{-1},\]

where D is a diagonal matrix.

How to check this and diagonalize a matrix if it is possible? Simply consider a linear map \varphi such that M is its matrix in the standard basis. Matrix M is diagonalizable if and only if \varphi has an eigenvector basis \mathcal{A}. Then:

    \[M=M(id)_{\mathcal{A}}^{st}\cdot D\cdot (M(id)_{\mathcal{A}}^{st})^{-1}\]

and D=M(\varphi)_{\mathcal{A}}^{\mathcal{A}}, since (M(id)_{\mathcal{A}}^{st})^{-1}=M(id)_{st}^{\mathcal{A}}.

E.g. we know that

    \[M=\left[\begin{array}{ccc}2&0&0\\1&1&0\\-1&0&1\end{array}\right]\]

is diagonalizable, since the map \varphi related to this matrix has an eigenvector basis \mathcal{A}=\{(0,1,0),(0,0,1),(-1,-1,1)\}. Furthermore, in this case:

    \[M=\left[\begin{array}{ccc}0&0&-1\\1&0&-1\\0&1&1\end{array}\right]\cdot \left[\begin{array}{ccc}1&0&0\\0&1&0\\0&0&2\end{array}\right]\cdot \left[\begin{array}{ccc}-1&1&0\\1&0&1\\-1&0&0\end{array}\right].\]

Calculating diagonalizable matrices

Diagonalization of a matrix has the following interesting application: we can use it to calculate a power of a matrix, if it is diagonalizable. Notice that if \mathcal{A} is a basis, then:

    \[M^{n}=M\left(\underbrace{\varphi\circ\ldots\circ\varphi}_{n}\right)_{st}^{st}=\]

    \[=M(id)_{\mathcal{A}}^{st}\cdot M\left(\underbrace{\varphi\circ\ldots\circ\varphi}_{n}\right)_{\mathcal{A}}^{\mathcal{A}}\cdot M(id)_{st}^{\mathcal{A}} = M(id)_{\mathcal{A}}^{st}\cdot \left(M\left(\varphi\right)_{\mathcal{A}}^{\mathcal{A}}\right)^{n}\cdot M(id)_{st}^{\mathcal{A}}\]

but if \mathcal{A} is a basis of eigenvectors, then M\left(\varphi\right)_{\mathcal{A}}^{\mathcal{A}} is a diagonal matrix so calculating its power is simply calculating powers of the elements on the diagonal.

Let us show it on an example. Let us calculate:

    \[\left[\begin{array}{ccc}2&0&0\\1&1&0\\-1&0&1\end{array}\right]^{5}.\]

This matrix is diagonalizable (see above): it equals C\cdot D\cdot C^{-1}, where

    \[C=\left[\begin{array}{ccc}0&0&-1\\1&0&-1\\0&1&1\end{array}\right],\quad D=\left[\begin{array}{ccc}1&0&0\\0&1&0\\0&0&2\end{array}\right],\quad C^{-1}=\left[\begin{array}{ccc}-1&1&0\\1&0&1\\-1&0&0\end{array}\right].\]

Therefore:

    \[\left[\begin{array}{ccc}2&0&0\\1&1&0\\-1&0&1\end{array}\right]^{5}=C \cdot \left[\begin{array}{ccc}1&0&0\\0&1&0\\0&0&2\end{array}\right]^{5}\cdot C^{-1}=C \cdot \left[\begin{array}{ccc}1&0&0\\0&1&0\\0&0&32\end{array}\right]\cdot C^{-1}=\left[\begin{array}{ccc}32&0&0\\31&1&0\\-31&0&1\end{array}\right].\]
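Multiplying M out five times directly confirms the result of the shortcut:

```python
# Compute the fifth power of M by repeated multiplication and compare it with
# the matrix obtained via diagonalization above.
M = [[2, 0, 0],
     [1, 1, 0],
     [-1, 0, 1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

P = M
for _ in range(4):
    P = matmul(P, M)

assert P == [[32, 0, 0], [31, 1, 0], [-31, 0, 1]]
```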

Scalar product

To study angles between vectors it will be convenient to use the scalar product. The scalar product of two vectors \alpha,\beta is denoted \left<\alpha,\beta\right>. The standard scalar product in \mathbb{R}^n is the sum of products of coordinates in corresponding places, so e.g.: \left<(1,2,-1),(2,0,1)\right>=1\cdot 2+2\cdot 0+(-1)\cdot 1=2+0-1=1.

Length of a vector and angles between vectors

By the Pythagorean theorem it is easy to see that \left<\alpha,\alpha\right> is the square of the length of a vector, e.g. for \alpha=(3,4) we get \left<(3,4),(3,4)\right>=9+16=25=|\alpha|^2. The length of a vector \alpha, also called the norm of \alpha, will be denoted by |\alpha|. We get that:

    \[|\alpha|=\sqrt{\left<\alpha,\alpha\right>}.\]
Assume now that we are given three vectors p,q and r forming a triangle, so r=p-q. Let \theta be the angle between p and q. The law of cosines states that:

    \[|p-q|^2=|p|^2+|q|^2-2|p||q|\cos\theta,\]

i.e.

    \[\left <p-q,p-q\right>=\left <p,p\right>+\left<q,q\right>-2|p||q|\cos\theta,\]

and expanding the left-hand side,

    \[\left <p,p\right>+\left<q,q\right>-2\left <p,q\right>=\left <p,p\right>+\left<q,q\right>-2|p||q|\cos\theta.\]

So the cosine of the angle between two vectors is given by the following formula:

    \[\cos\theta=\frac{\left <p,q\right>}{|p||q|}.\]

One more application of the scalar product is calculating the perpendicular projection of a vector onto the direction given by another vector. Let r be the perpendicular projection of p onto the direction given by q. It has the same direction as q and length |p|\cos \theta, where \theta is the angle between p and q. Therefore:

    \[r=|p|\cos \theta \cdot \frac{q}{|q|}=q\cdot \frac{\left <p,q\right>}{\left<q,q\right>},\]

because: \frac{q}{|q|} is the vector of length 1 in direction of q.
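A sketch of this formula (the vectors are the ones used in the scalar-product example above; the residual p-r should be perpendicular to q):

```python
from fractions import Fraction as F

# Projection of p onto the direction of q: r = q * <p, q> / <q, q>.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

p = [F(1), F(2), F(-1)]
q = [F(2), F(0), F(1)]

c = F(dot(p, q), dot(q, q))          # <p, q> / <q, q> = 1/5
r = [c * a for a in q]

# The residual p - r is perpendicular to q, as expected of a projection.
assert dot([a - b for a, b in zip(p, r)], q) == 0
```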

Perpendicularity, perpendicular spaces

We know that two vectors v,w are perpendicular if and only if the cosine of the angle between them equals zero. Therefore v\bot w if and only if \left<v,w\right>=0.

Notice that if we would like to find all vectors w perpendicular to v, then the above is the equation we have to solve. Moreover, this is a homogeneous linear equation. If we would like to find the set of vectors perpendicular to all vectors from a given list, then we get a system of homogeneous linear equations. So given a linear subspace V, the set V^\bot (called the orthogonal complement of V) of all vectors perpendicular to all the vectors from V is also a linear subspace! It is the space of solutions of some system of linear equations.

For example, let V=lin((1,1,0,-1),(-1,0,2,0)). A vector (x,y,z,t) is perpendicular to those vectors (and so also to every vector of V) if \left<(1,1,0,-1), (x,y,z,t)\right>=0 and \left<(-1,0,2,0), (x,y,z,t)\right>=0, in other words, if it satisfies the following system of equations:

    \[\begin{cases}x+y-t=0\\-x+2z=0\end{cases}\]

We solve it:

    \[\left[\begin{array}{cccc|c}1&1&0&-1&0\\-1&0&2&0&0\end{array}\right]\underrightarrow{w_2+w_1} \left[\begin{array}{cccc|c}1&1&0&-1&0\\0&1&2&-1&0\end{array}\right]\underrightarrow{w_1-w_2} \left[\begin{array}{cccc|c}1&0&-2&0&0\\0&1&2&-1&0\end{array}\right]\]

So the general solution has the following form: (2z,-2z+t,z,t), and therefore we have the following basis of V^\bot: (2,-2,1,0),(0,1,0,1).
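The computed basis can be verified by checking the defining scalar products:

```python
# Check that the computed basis of the orthogonal complement really is
# perpendicular to both spanning vectors of V = lin((1,1,0,-1), (-1,0,2,0)).
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

V_basis = [(1, 1, 0, -1), (-1, 0, 2, 0)]
V_perp_basis = [(2, -2, 1, 0), (0, 1, 0, 1)]

for v in V_basis:
    for w in V_perp_basis:
        assert dot(v, w) == 0
```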

Notice that the coefficients in a system of equations describing a given linear space are vectors which span the perpendicular space! This is a new insight into our method of finding a system of equations for a space given by its spanning vectors.

Isometries of linear euclidean spaces

A linear mapping \varphi\colon V_1\to V_2, where \langle V_1, \langle\cdot,\cdot \rangle_1\rangle, \langle V_2, \langle\cdot,\cdot \rangle_2\rangle are linear euclidean spaces, is an isometry if the following equivalent conditions hold:

  • \varphi is a linear isomorphism and preserves the inner product:

        \[\forall_{v,w\in V_1} \langle v,w\rangle_1=\langle \varphi(v),\varphi(w)\rangle_2,\]

  • \varphi is a linear isomorphism and preserves the lengths of vectors:

        \[\forall_{v\in V_1} \|v\|_1=\|\varphi(v)\|_2.\]

Projection onto a linear subspace and reflection across a linear subspace

Notice that the projection onto a linear subspace V and the reflection across V are linear mappings. Moreover, it is easy to describe their eigenvectors:

  • since projection does not change vectors in V, they are eigenvectors with eigenvalue 1. On the other hand, vectors from V^\bot are multiplied by zero, so they are eigenvectors with eigenvalue zero.
  • since reflection does not change vectors in V, they are eigenvectors with eigenvalue 1. On the other hand, vectors from V^\bot are multiplied by -1, so they are eigenvectors with eigenvalue -1.

So a basis consisting of vectors from a basis of V and vectors from a basis of V^\bot is a basis of eigenvectors of both those maps. This makes it possible to calculate their formulas.

E.g. let V=lin((1,0,1),(0,1,-1)). Then a basis of V^\bot is \{(-1,1,1)\}. So if \phi is the projection onto V, and \psi is the reflection across V, then (1,0,1),(0,1,-1) are eigenvectors with eigenvalue 1 of both maps. Also (-1,1,1) is an eigenvector with eigenvalue zero for \phi, and -1 for \psi. Therefore \mathcal{A}=\{(1,0,1),(0,1,-1),(-1,1,1)\} is a basis of eigenvectors of both maps, and:

    \[M(\phi)_{\mathcal{A}}^{\mathcal{A}}=\left[\begin{array}{ccc}1&0&0\\0&1&0\\0&0&0\end{array}\right],\quad M(\psi)_{\mathcal{A}}^{\mathcal{A}}=\left[\begin{array}{ccc}1&0&0\\0&1&0\\0&0&-1\end{array}\right].\]
Let us calculate their formulas. We have:

    \[M(\phi)_{st}^{st}=M(id)_{\mathcal{A}}^{st}\cdot M(\phi)_{\mathcal{A}}^{\mathcal{A}}\cdot M(id)_{st}^{\mathcal{A}}=\left[\begin{array}{ccc}1&0&-1\\0&1&1\\1&-1&1\end{array}\right]\cdot\left[\begin{array}{ccc}1&0&0\\0&1&0\\0&0&0\end{array}\right]\cdot\frac{1}{3}\left[\begin{array}{ccc}2&1&1\\1&2&-1\\-1&1&1\end{array}\right]=\frac{1}{3}\left[\begin{array}{ccc}2&1&1\\1&2&-1\\1&-1&2\end{array}\right],\]

so \phi((x,y,z))=\left(\frac{2x+y+z}{3},\frac{x+2y-z}{3},\frac{x-y+2z}{3}\right). Similarly,

    \[M(\psi)_{st}^{st}=M(id)_{\mathcal{A}}^{st}\cdot M(\psi)_{\mathcal{A}}^{\mathcal{A}}\cdot M(id)_{st}^{\mathcal{A}}=\frac{1}{3}\left[\begin{array}{ccc}1&2&2\\2&1&-2\\2&-2&1\end{array}\right],\]

so \psi((x,y,z))=\left(\frac{x+2y+2z}{3},\frac{2x+y-2z}{3},\frac{2x-2y+z}{3}\right).

Gram’s determinant

Given a system of vectors v_1,\ldots, v_k in a euclidean space, the matrix G(v_1,\ldots , v_k)=[\langle v_i,v_j\rangle]_{i,j\leq k}\in M_{k\times k}(\mathbb{R}) is called Gram's matrix of this system of vectors, and its determinant is called Gram's determinant W(v_1,\ldots, v_k). One can immediately notice that Gram's matrix is symmetric.

Notice in particular that if the columns of a matrix A contain the coordinates of those vectors in an orthonormal basis, then G(v_1,\ldots , v_k)=A^TA. Thus, if the number of vectors k equals the dimension of the whole space (i.e. if A is a square matrix), then W(v_1,\ldots , v_k)=(\det A)^2.

In particular, always W(v_1,\ldots , v_k)\geq 0, and it is equal to zero if and only if the system of vectors is linearly dependent.
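These two facts can be illustrated with a small example (the vectors (1,2) and (3,4) are chosen just for illustration, they are not from the text):

```python
# Gram matrix via G = A^T A (the columns of A hold the coordinates of the
# vectors in the standard, orthonormal, basis), and the identity W = (det A)^2.
def gram(vectors):
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

u, v = (1, 2), (3, 4)
G = gram([u, v])
A_det = 1 * 4 - 2 * 3                  # det of the matrix with columns u, v
assert det2(G) == A_det ** 2

# A linearly dependent system has Gram determinant zero:
assert det2(gram([(1, 2), (2, 4)])) == 0
```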

Parallelepipeds and simplexes

Let p_0\in H and let v_1,\ldots, v_k\in T(H) be linearly independent. Then the set

    \[R(p_0;v_1,\ldots, v_k)=\left\{p_0+\sum_{i=1}^k a_iv_i\colon a_i\in [0,1]\right\}\]

is called a k-parallelepiped given by this system.

Given affine independent system of points p_0,\ldots, p_k\in H, the set

    \[S(p_0,\ldots, p_k)=\left\{\sum_{i=0}^k a_ip_i\colon \sum_{i=0}^k a_i=1\land  \forall_{0\leq i\leq k} a_i\geq 0\right\}\]

is called a k-dimensional simplex with vertices p_0,\ldots, p_k. Obviously, a one-dimensional simplex is a line segment, a two-dimensional one is a triangle, and a three-dimensional one is a tetrahedron.

k-dimensional measure of parallelepipeds and simplexes

The k-dimensional measure \mu_k is a generalization of length (k=1), area (k=2) and volume (k=3).

To explain how to calculate the k-dimensional measure of a k-dimensional simplex or parallelepiped, we have to notice the following. Let v_1,\ldots, v_k be a system of linearly independent vectors in V, let v\in V, and let w be the projection of v onto (lin(v_1,\ldots, v_k))^\bot. Then w=v-w', where w'\in lin(v_1,\ldots, v_k); let w'=\sum_{i=1}^k a_i v_i. Thus,

    \[W(v_1,\ldots, v_k,v)=W(v_1,\ldots, v_k,w+w')=\]

    \[=\left|\begin{array}{cccc}\langle v_1,v_1\rangle &\ldots &\langle v_1,v_k\rangle & \langle v_1,w'\rangle\\ \langle v_2,v_1\rangle &\ldots &\langle v_2,v_k\rangle & \langle v_2,w'\rangle\\ &\ldots&&\ldots \\ \langle v_k,v_1\rangle &\ldots &\langle v_k,v_k\rangle & \langle v_k,w'\rangle\\\langle w',v_1\rangle &\ldots &\langle w',v_k\rangle & \langle w,w\rangle+\langle w',w'\rangle\end{array}\right|,\]


because

    \[\langle v_i,w+w'\rangle=\langle v_i,w\rangle+\langle v_i,w'\rangle=\langle v_i,w'\rangle,\]


and

    \[\langle w,w'\rangle=0.\]


By linearity of the determinant in the last column,

    \[W(v_1,\ldots, v_k,v)=\left|\begin{array}{cccc}\langle v_1,v_1\rangle &\ldots &\langle v_1,v_k\rangle & \sum_{i=1}^k a_i\langle v_1,v_i\rangle\\ \langle v_2,v_1\rangle &\ldots &\langle v_2,v_k\rangle & \sum_{i=1}^k a_i\langle v_2,v_i\rangle\\ &\ldots&&\ldots \\ \langle v_k,v_1\rangle &\ldots &\langle v_k,v_k\rangle & \sum_{i=1}^k a_i\langle v_k,v_i\rangle\\\langle w',v_1\rangle &\ldots &\langle w',v_k\rangle & \sum_{i=1}^k a_i\langle w',v_i\rangle\end{array}\right|+\]

    \[+\left|\begin{array}{cccc}\langle v_1,v_1\rangle &\ldots &\langle v_1,v_k\rangle & 0\\ \langle v_2,v_1\rangle &\ldots &\langle v_2,v_k\rangle & 0\\ &\ldots&&\ldots \\ \langle v_k,v_1\rangle &\ldots &\langle v_k,v_k\rangle & 0\\\langle w',v_1\rangle &\ldots &\langle w',v_k\rangle &  \langle w,w\rangle\end{array}\right|.\]

But the first determinant equals zero, because its last column is a linear combination of the other columns, and the second equals \langle w,w\rangle W(v_1,\ldots, v_k) (expansion along the last column). Thus

    \[W(v_1,\ldots, v_k,v)=\|w\|^2W(v_1,\ldots, v_k).\]

Since the k-dimensional measure of a parallelepiped should be the product of the (k-1)-dimensional measure of its base and its height (the length of the appropriate projection), by induction we easily come to the conclusion that

    \[\mu_{k}(R(p_0;v_1,\ldots,v_k))=\sqrt{W(v_1,\ldots, v_k)}\]

and, for simplexes,

    \[\mu_{k}(S(p_0,\ldots,p_k))=\frac{1}{k!}\sqrt{W(p_1-p_0,\ldots, p_k-p_0)}.\]
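These two formulas are easy to check numerically. A small sketch (the helper names are ours, and numpy is assumed) computing both measures via the Gram determinant W:

```python
import numpy as np
from math import factorial

def gram_det(vectors):
    # W(v_1, ..., v_k) = determinant of the matrix of inner products <v_i, v_j>
    V = np.array(vectors, dtype=float)
    return np.linalg.det(V @ V.T)

def parallelepiped_measure(vectors):
    # mu_k(R(p_0; v_1, ..., v_k)) = sqrt(W(v_1, ..., v_k))
    return np.sqrt(gram_det(vectors))

def simplex_measure(points):
    # mu_k(S(p_0, ..., p_k)) = sqrt(W(p_1 - p_0, ..., p_k - p_0)) / k!
    p = np.array(points, dtype=float)
    edges = p[1:] - p[0]
    return np.sqrt(gram_det(edges)) / factorial(len(edges))

# the triangle with vertices (0,0,0), (1,0,0), (0,1,0) should have area 1/2
print(simplex_measure([(0, 0, 0), (1, 0, 0), (0, 1, 0)]))
```

Note that the vectors may live in a space of any dimension: the Gram matrix is always k by k.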

Orientation of a space

We say that bases \mathcal{A}, \mathcal{B} of a vector space V are consistently oriented if \det M(id)_{\mathcal{A}}^{\mathcal{B}}>0, and inconsistently oriented if \det M(id)_{\mathcal{A}}^{\mathcal{B}}<0. Being consistently oriented is thus an equivalence relation on the set of all bases which has two equivalence classes.

An orientation of a vector space is choosing one of those two equivalence classes (e.g. by choosing a basis). Then bases from this equivalence class are said to have positive orientation and all the rest have negative orientation.

12. Quadratic forms

Quadratic form

A quadratic form is a function which assigns a number to each vector in such a way that it is a sum of products of pairs of coordinates, e.g. q(x,y,z)=x^2-2xy+4xz-5z^2. The square of the norm is also an example of a quadratic form (\|(x,y,z)\|^2=x^2+y^2+z^2).

In other words, a quadratic form can be described as q(v)=h(v,v), where h is a bilinear form. If the characteristic of the field is not equal to 2, we may always take h to be symmetric, and we are going to make such an assumption from now on.

Positive and negative definite forms

We can classify forms with respect to possible sign of results:

  • form q\colon V\to\mathbb{R} is positively definite, if for all v\in V\setminus\{0\}, we get q(v)>0,
  • form q\colon V\to\mathbb{R} is negatively definite, if for all v\in V\setminus\{0\}, we get q(v)<0,
  • form q\colon V\to\mathbb{R} is positively semidefinite, if for all v\in V, we get q(v)\geq 0,
  • form q\colon V\to\mathbb{R} is negatively semidefinite, if for all v\in V, we get q(v)\leq 0.

Obviously a form may fall into none of those categories, namely when for some v,w\in V we have q(v)>0 and q(w)<0. Such forms are called indefinite.

Matrix of a form

The matrix of a quadratic form q with respect to basis \mathcal{A} is the matrix G(h,\mathcal{A}), where h is a symmetric bilinear form such that h(v,v)=q(v). E.g., let q(x,y,z)=x^2-2xy+4xz-5z^2, then:

    \[q(x,y,z)=[x,y,z]\cdot \left[\begin{array}{ccc}1&-1&2\\-1&0&0\\2&0&-5\end{array}\right]\cdot \left[\begin{array}{c}x\\y\\z\end{array}\right]\]

Notice that off the diagonal the coefficients are divided by 2, because each such term is generated twice in the product.

Sylvester’s criterion

Sylvester’s criterion determines whether a form is positively definite or negatively definite. Notice that it tells nothing about semidefiniteness!

How does it work? We study the leading principal minors: let A_k be the k\times k matrix in the upper left corner of the matrix of the form we study. Let n\times n be the size of the matrix of this form. Sylvester’s criterion consists of the two following facts:

  • if for every k\leq n we have \det A_k > 0, the form is positively definite,
  • if for every k\leq n we have \det A_k > 0 for even k, and \det A_k<0 for odd k, then the form is negatively definite.

E.g. let q(x,y)=2x^2+y^2-2xy, so its matrix: \left[\begin{array}{cc}2&-1\\-1&1\end{array}\right], so A_1=[2] and A_2=\left[\begin{array}{cc}2&-1\\-1&1\end{array}\right], therefore \det A_1=2>0 and \det A_2=2-1=1>0, so form q is positively definite.

E.g., let q(x,y,z)=-x^2-5y^2-4xy-z^2, so its matrix: \left[\begin{array}{ccc}-1&-2&0\\-2&-5&0\\0&0&-1\end{array}\right], so A_1=[-1] and A_2=\left[\begin{array}{cc}-1&-2\\-2&-5\end{array}\right], also A_3=\left[\begin{array}{ccc}-1&-2&0\\-2&-5&0\\0&0&-1\end{array}\right], therefore \det A_1=-1<0 and \det A_2=5-4=1>0 and \det A_3=-\det A_2=-1<0, so form q is negatively definite.

Finally let q(x,y)=-2x^2+y^2-2xy, its matrix: \left[\begin{array}{cc}-2&-1\\-1&1\end{array}\right], so A_1=[-2] and A_2=\left[\begin{array}{cc}-2&-1\\-1&1\end{array}\right], therefore \det A_1=-2<0 and \det A_2=-2-1=-3<0, so form q is neither positively definite nor negatively definite.
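Sylvester’s criterion is easy to run numerically. A sketch (the function names are ours; numpy assumed) checking the three example matrices above:

```python
import numpy as np

def leading_minors(A):
    # determinants of the upper-left k-by-k submatrices A_k, k = 1..n
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

def sylvester(A):
    minors = leading_minors(A)
    if all(d > 0 for d in minors):
        return "positively definite"
    if all(d > 0 if k % 2 == 0 else d < 0
           for k, d in enumerate(minors, start=1)):
        return "negatively definite"
    return "inconclusive"  # the criterion says nothing about semidefiniteness

assert sylvester(np.array([[2., -1.], [-1., 1.]])) == "positively definite"
assert sylvester(np.array([[-1., -2., 0.],
                           [-2., -5., 0.],
                           [0., 0., -1.]])) == "negatively definite"
assert sylvester(np.array([[-2., -1.], [-1., 1.]])) == "inconclusive"
```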

Diagonalization of quadratic forms

But to check everything (including semidefiniteness), we have to diagonalize the form, i.e. find a basis in which its matrix is diagonal (a diagonal matrix congruent to the original one). Then, obviously, if:

  • it has only positive entries on the diagonal, then q is positively definite,
  • it has only negative entries on the diagonal, then q is negatively definite,
  • it has only nonnegative entries on the diagonal, then q is positively semidefinite,
  • it has only nonpositive entries on the diagonal, then q is negatively semidefinite,
  • it has both a positive and a negative entry on the diagonal, then q is indefinite.

This can be done in the three following ways.

Diagonalization of a form: completing the square

We may complete the squares in the formula of a form, making sure to use all the terms with the first variable first, then all the remaining terms with the second one, and so on.




where x'=x+y-2z, y'=y-5z and z'=z, so the form is indefinite. The basis \mathcal{A}, in which this formula is expressed, is (1,0,0),(-1,1,0),(-3,5,1).


Diagonalization of a form: orthogonal basis

We may also find an orthogonal basis with respect to the symmetric bilinear form related to the considered quadratic form. Then the entries on the diagonal are the values of the form on the vectors of this basis.

Diagonalization of a form: eigenvalues

Finally, recall that there exists a basis consisting of eigenvectors of the self-adjoint endomorphism described by the same matrix, and such a basis is orthogonal with respect to the symmetric bilinear form related to the considered quadratic form. Then the entries on the diagonal are the eigenvalues of the matrix.

E.g.: let q(x,y)=-2x^2+y^2-2xy, the matrix: \left[\begin{array}{cc}-2&-1\\-1&1\end{array}\right], so its characteristic polynomial: (-2-\lambda)(1-\lambda)-1=\lambda^2+\lambda-3 has zeroes in \frac{-1-\sqrt{13}}{2} and \frac{-1+\sqrt{13}}{2}, so it has eigenvalues of both signs, so q is indefinite.
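We can double-check this classification with numpy, whose eigvalsh routine is designed for symmetric matrices:

```python
import numpy as np

# the matrix of q(x,y) = -2x^2 + y^2 - 2xy
A = np.array([[-2.0, -1.0], [-1.0, 1.0]])
eigs = np.linalg.eigvalsh(A)  # eigenvalues in ascending order

# char. polynomial det(A - lambda*I) = lambda^2 + lambda - 3,
# with zeroes (-1 - sqrt 13)/2 and (-1 + sqrt 13)/2
assert np.allclose(eigs, [(-1 - 13**0.5) / 2, (-1 + 13**0.5) / 2])

# one eigenvalue of each sign, so the form is indefinite
indefinite = eigs[0] < 0 < eigs[-1]
print(indefinite)
```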

4. Basic linear algebra

Part 1.: Problems, solutions.
Part 2.: Problems, solutions.
Part 3.: Problems, solutions.
Part 4.: Problems, solutions.
Part 5.: Problems, solutions.
Part 6.: Problems, solutions.
Part 7.: Problems, solutions
Part 8.: Problems, solutions.
Part 9.: Problems, solutions.

A system of k linear equations with variables x_{1}, x_{2},\ldots, x_{n} is simply a set of k equations in which the variables appear without any powers or products between them:

    \[\begin{cases} a_{1,1}x_1+a_{1,2}x_2+\ldots+a_{1,n}x_n=b_1\\ a_{2,1}x_1+a_{2,2}x_2+\ldots+a_{2,n}x_n=b_2\\ \ldots\\ a_{k,1}x_1+a_{k,2}x_2+\ldots+a_{k,n}x_n=b_k \end{cases},\]

where a_{i,j},b_j are some real numbers. E.g. the following is a system of 3 linear equations with four variables:

    \[\begin{cases}x_{1}+3x_{2}+x_{3}+5x_{4}=2\\ 2x_{1}+7x_{2}+9x_{3}+2x_{4}=4\\ 4x_{1}+13x_{2}+11x_{3}+12x_{4}=8\end{cases}\]

A solution of such a system of linear equations is a tuple of four numbers (four is the number of variables in this case) which satisfies all the equations in the system (after substituting those numbers for the variables), e.g. (2,0,0,0) in our case. But a system can have more than one solution. For example, (22,-7,1,0) is also a solution of the above system. More precisely, a system of linear equations can have 0, 1, or infinitely many solutions.

A general solution of a system of linear equations is just a description of the set of all its solutions (which sometimes may mean saying that there are no solutions at all, or pointing out the only one). How to find a general solution? We will use the so-called Gaussian elimination method. Less formally, we will call it transforming our system of equations into an echelon form.

The first step is to write down the matrix of the given system of equations. A matrix is simply a rectangular table of numbers. Matrices will play a more and more important role in this course, but for now we can think of the matrix of a system of equations as an abbreviated notation for the system itself. We simply write down the coefficients, separating the column of free coefficients with a line:

    \[\left[\begin{array}{cccc|c}1 & 3 & 1 & 5 & 2\\ 2 & 7 & 9 & 2 & 4\\ 4 & 13 & 11 & 12 & 8\end{array}\right]\]

Next we make some operations on this matrix. Those operations simplify the matrix, but we shall make sure that they do not change the set of solutions of the corresponding system of equations. Therefore only three types of operations are permitted:

  • subtracting from a row another row multiplied by a number (corresponds to subtracting an equation from another one),
  • swapping two rows (corresponds to swapping two equations)
  • multiplying a row by a non-zero number (corresponds to multiplying both sides of an equation by a non-zero number)

Our aim is to achieve a staircase-like echelon form of the matrix (meaning that you can draw a staircase through the matrix, below which there are only zeros). It means that each corresponding equation will contain fewer variables than the previous one.

What should we do to achieve such a form? The best method is to generate the echelon from the left side using the first row. Under the 1 in the upper left corner we would like to have zeros. To achieve this, we subtract the first row multiplied by 2 from the second one, and the first row multiplied by 4 from the third one. So:

    \[\underrightarrow{w_{2}-2w_{1}, w_{3}-4w_{1}} \left[\begin{array}{cccc|c}1 & 3 & 1 & 5 & 2\\ 0 & 1 & 7 & -8 & 0\\ 0 & 1 & 7 & -8 & 0\end{array}\right]\]

So now we would like to have a zero under 1 in the second row. Therefore we subtract the second row from the third:

    \[\underrightarrow{w_{3}-w_{2}} \left[\begin{array}{cccc|c}1 & 3 & 1 & 5 & 2\\ 0 & 1 & 7 & -8 & 0\\ 0 & 0 & 0 & 0 & 0\end{array}\right]\]

And so we have achieved an echelon form! We have a leading coefficient in the first row on the first variable and a leading coefficient in the second row on the second variable.

The next step is to reduce the matrix (transform it into a reduced echelon form). We would like to have zeros also above the leading coefficients. In our case we only need to do something with 3 on the second place in the first row. To have a zero there we have to subtract the second row multiplied by 3 from the first one.

    \[\underrightarrow{w_{1}-3w_{2}} \left[\begin{array}{cccc|c}1 & 0 & -20 & 29 & 2\\ 0 & 1 & 7 & -8 & 0\\ 0 & 0 & 0 & 0 & 0\end{array}\right],\]

We also need to make sure that the leading coefficients are ones (sometimes we have to multiply a row by a fraction), but in our case it is already done. We have achieved a reduced echelon form. To get the general solution it suffices to write down the corresponding equations, moving the free variables to the right-hand side:

    \[\begin{cases}x_{1}=2+20x_{3}-29 x_{4}\\ x_{2}=-7x_{3}+8x_{4} \end{cases}\]

We can also write it in parametrized form, substituting into the vector (x_1,x_2,x_3,x_4) everything we know from the general solution: \{(2+20x_{3}-29 x_{4}, -7x_{3}+8x_{4}, x_{3},x_{4})\colon x_3,x_4\in\mathbb{R}\}. Notice that substituting any real numbers for x_3 and x_4 gives a tuple (a sequence of four numbers) which is a solution of our system of equations (so it has infinitely many solutions). E.g. substituting x_3=1,x_4=0 we get the solution (22,-7,1,0) mentioned above.
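The general solution above can be verified mechanically: whatever we plug in for the parameters, the resulting tuple must satisfy the original system. A quick numeric check (numpy assumed, variable names ours):

```python
import numpy as np

# coefficient matrix and right-hand side of the example system
A = np.array([[1, 3, 1, 5],
              [2, 7, 9, 2],
              [4, 13, 11, 12]], dtype=float)
b = np.array([2, 4, 8], dtype=float)

def solution(x3, x4):
    # the parametrized general solution found above
    return np.array([2 + 20*x3 - 29*x4, -7*x3 + 8*x4, x3, x4])

# every choice of parameters should satisfy A x = b
for x3, x4 in [(0, 0), (1, 0), (2.5, -3)]:
    assert np.allclose(A @ solution(x3, x4), b)
```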

Finally let us introduce some terminology. A system of linear equations is:

  • homogeneous if all constant terms are zero,
  • inconsistent if it has no solutions.

For example the following system:

    \[\begin{cases} x+y=2\\x-2y=-1\end{cases}\]

is not homogeneous, has exactly one solution, so it is not inconsistent. Meanwhile:

    \[\begin{cases} x+y+z=0\\x-y-z=0\\2x+y=0\end{cases}\]

is homogeneous and has exactly one solution, so it is not inconsistent. On the other hand, the following system of equations is inconsistent:

    \[\begin{cases} x+y+z=0\\-2x-2y-2z=-1\end{cases}\]

Note that a system of linear equations can have exactly one, infinitely many or no solutions.


In many applications we will use the notion of determinant of a matrix. The determinant of a matrix makes sense for square matrices only and is defined recursively:

  • \det[a]=a
  •     \[\det \left[\begin{array}{cccc}a_{1,1}&a_{1,2}&\ldots&a_{1,n}\\a_{2,1}&a_{2,2}&\ldots&a_{2,n}\\\ldots&\ldots&&\ldots\\ a_{n,1}&a_{n,2}&\ldots&a_{n,n}\end{array}\right]=\]

        \[=a_{1,1}\det A_{1,1}-a_{1,2}\det A_{1,2}+a_{1,3}\det A_{1,3}-\ldots\pm a_{1,n}\det A_{1,n},\]

where A_{i,j} is matrix A with i-th row and j-th column crossed out. So (the determinant is denoted by \det or by using absolute value style brackets around a matrix):





And so on. E.g.:



    \[=1\cdot 4-0+2\cdot 19-0=42\]

Laplace expansion

The above definition is only a special case of a more general fact called the Laplace expansion. Instead of the first row we can use any row or column (always choose the one with the most zeros). So:

    \[\det \left[\begin{array}{cccc}a_{1,1}&a_{1,2}&\ldots&a_{1,n}\\a_{2,1}&a_{2,2}&\ldots&a_{2,n}\\\ldots&\ldots&&\ldots\\ a_{n,1}&a_{n,2}&\ldots&a_{n,n}\end{array}\right]=\]

    \[=(-1)^{i+1}\left(a_{i,1}\det A_{i,1}-a_{i,2}\det A_{i,2}+a_{i,3}\det A_{i,3}-\ldots\pm a_{i,n}\det A_{i,n}\right),\]

for any row w_i. An analogous fact holds for any column.

E.g. for the below matrix it is easiest to use the third column:

    \[\left|\begin{array}{cccc}1&1&0&-1\\2&0&-1&-1\\3&-1&0&1\\0&1&0&-2\end{array}\right|=(-1)^{2+3}\cdot(-1)\cdot \left|\begin{array}{ccc}1&1&-1\\3&-1&1\\0&1&-2\end{array}\right|=4\]
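The recursive definition translates directly into code. A naive sketch (our names; fine for small matrices only, as this expansion takes O(n!) steps) expanding along the first row:

```python
def minor(A, i, j):
    # matrix A with the i-th row and j-th column crossed out
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    # expansion along the first row, with alternating signs
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j))
               for j in range(len(A)))

# the 4x4 example above
assert det([[1, 1, 0, -1],
            [2, 0, -1, -1],
            [3, -1, 0, 1],
            [0, 1, 0, -2]]) == 4
```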

Determinant and operations on a matrix

Notice first that from the Laplace expansion we easily get that if a matrix has a row (or column) of zeros, its determinant equals zero.

Consider now the different operations on rows of a matrix which we use to compute an echelon form. Using the Laplace expansion we can prove that swapping two rows multiplies the determinant by -1: indeed, calculating the determinant along the first column, we see that the signs in the sum may change, but also the rows in the minor matrices get swapped.

Immediately we can notice that multiplying a row by a number multiplies also the determinant by this number — you can see it easily calculating Laplace expansion using this row.

Therefore multiplying the whole matrix by a number multiplies the determinant by this number n times; precisely:

    \[\det (aA)=a^n \det A, \]

where A is a matrix of size n\times n.

Notice also, that the determinant of a matrix with two identical rows equals zero, because swapping those rows does not change the matrix but multiplies the determinant by -1, so \det A=-\det A, therefore \det A=0. So because of the row multiplication rule, if two rows in a matrix are linearly dependent, then its determinant equals 0.

The Laplace expansion also implies that if matrices A, B, C differ only in the i-th row, in such a way that the i-th row of C is the sum of the i-th rows of A and B, then the determinant of C is the sum of the determinants of A and B, e.g.:


But it can be easily seen that in general \det(A+B)\neq \det A+\det B!

Finally, consider the most important operation: adding to a row another row multiplied by a number. Here we actually deal with the situation described above. The resulting matrix is matrix C, which differs from A and B only in the row we add to. Matrix A is the original matrix, and matrix B is matrix A in which the row we add to is replaced by the other row multiplied by the number. Therefore \det C=\det A+\det B, but B has two linearly dependent rows, so \det B=0 and \det C=\det A. Therefore the operation of adding to a row another row multiplied by a number does not change the determinant of a matrix.

Calculating the determinant via triangular form of matrix

If you look closely enough you will see that the Laplace expansion also implies that the determinant of a matrix in an echelon form (usually called triangular for square matrices) equals the product of elements on the diagonal of the matrix, so e.g.:

    \[\left|\begin{array}{ccc}-1&3&-1\\0&1&2\\0&0&3\end{array}\right|=(-1)\cdot 1\cdot 3=-3.\]

Because we know how the elementary operations change the determinant, to calculate the determinant of a matrix we can compute a triangular form, calculate its determinant and recreate the determinant of the original matrix. This method is especially useful for large matrices, e.g.:



    \[\left[\begin{array}{ccccc}1&2&0&3&1\\0&1&1&-3&0\\0&3&1&-8&-4\\0&0&0&0&3\\0&0&0&-2&0\end{array}\right]\underrightarrow{w_3-3w_2, w_4\leftrightarrow w_5} \left[\begin{array}{ccccc}1&2&0&3&1\\0&1&1&-3&0\\0&0&-2&1&-4\\0&0&0&-2&0\\0&0&0&0&3\end{array}\right]\]

Therefore, the determinant of the last matrix is 1\cdot 1\cdot (-2)\cdot (-2)\cdot 3=12. On our way we have swapped rows once and we have multiplied one row by \frac{1}{2}, therefore the determinant of the first matrix equals \frac{12\cdot(-1)}{\frac{1}{2}}=-24.

The above fact also shows how to calculate the determinant of a matrix in the block form \left[\begin{array}{cc}A&C\\0&B\end{array}\right] with a zero block in the bottom left. The determinant of such a matrix equals \det A\cdot \det B, e.g.:

    \[\left|\begin{array}{ccccc}1&2&0&3&1\\2&6&2&0&2\\3&9&1&1&-1\\0&0&0&3&4\\0&0&0&1&1\end{array}\right|= \left|\begin{array}{ccc}1&2&0\\2&6&2\\3&9&1\end{array}\right|\cdot\left|\begin{array}{cc}3&4\\1&1\end{array}\right|.\]
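The block-triangular rule is easy to confirm numerically on this very example (numpy assumed):

```python
import numpy as np

# the 5x5 block matrix from the example above
M = np.array([[1, 2, 0, 3, 1],
              [2, 6, 2, 0, 2],
              [3, 9, 1, 1, -1],
              [0, 0, 0, 3, 4],
              [0, 0, 0, 1, 1]], dtype=float)

A = M[:3, :3]  # upper left 3x3 block
B = M[3:, 3:]  # lower right 2x2 block

# det of the whole matrix equals the product of the block determinants
assert np.isclose(np.linalg.det(M), np.linalg.det(A) * np.linalg.det(B))
```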

Cramer’s rule

Given a system of n equations with n variables we may try to solve it with Cramer’s rule. Let A be the matrix of this system without the column of free coefficients. Let A_{i} be the matrix A, in which instead of i-th column we put the column of free coefficients. Then:

  • if \det A\neq 0, the system has exactly one solution. The solution is given by the following formula: x_{i}=\frac{\det A_i}{\det A},
  • if \det A=0, and at least one of \det A_{i} is not equal to 0, the system has no solutions,
  • if \det A=0 and for every i, \det A_{i}=0, there can be zero or infinitely many solutions — Cramer’s method does not give any precise answer.

E.g. let us solve the following system of equations:

    \[\begin{cases} x+2y+z=1\\2x+5y+2z=-1\\-x-2y=0\end{cases}\]



Since \det A=1, this system has exactly one solution. To determine it we calculate the other determinants:

    \[\det A_1= \left|\begin{array}{ccc}1&2&1\\-1&5&2\\0&-2&0\end{array}\right|=6\]

    \[\det A_2= \left|\begin{array}{ccc}1&1&1\\2&-1&2\\-1&0&0\end{array}\right|=-3\]

    \[\det A_3= \left|\begin{array}{ccc}1&2&1\\2&5&-1\\-1&-2&0\end{array}\right|=1\]

And so x=\frac{6}{1}=6, y=\frac{-3}{1}=-3, z=\frac{1}{1}=1.
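Cramer’s rule is mechanical enough to script. A sketch (numpy assumed; names ours) reproducing the example above:

```python
import numpy as np

# matrix of the system and the column of free coefficients
A = np.array([[1, 2, 1],
              [2, 5, 2],
              [-1, -2, 0]], dtype=float)
b = np.array([1, -1, 0], dtype=float)

# det A != 0, so there is exactly one solution
assert not np.isclose(np.linalg.det(A), 0)

solution = []
for i in range(3):
    Ai = A.copy()
    Ai[:, i] = b  # replace the i-th column with the free coefficients
    solution.append(np.linalg.det(Ai) / np.linalg.det(A))

print(solution)
```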

Vector spaces

Generally speaking, a vector space is a “space” consisting of vectors, which can be:

  • added, and the addition is associative ((u+v)+w=u+(v+w)) and commutative (u+v=v+u),
  • multiplied by a number, and the multiplication is distributive with respect to the addition of vectors (a(v+w)=av+aw) and with respect to the addition of numbers ((a+b)v=av+bv), and compatible with multiplication of numbers (a(bv)=(ab)v),

in which there exists a zero vector 0 such that v+0=v, for any vector v and for each vector v there exists a vector w (inverse to v) such that v+w=0.

The plane is a classic example of a vector space. Vectors have a form of a pair of numbers (x,y), and can be added one to another (x_1,y_1)+(x_2,y_2)=(x_1+x_2,y_1+y_2) and multiplied by a number a\cdot(x,y)=(ax,ay). There exists a zero vector (0,0) and for each vector (x,y) there is an inverse vector, (-x,-y).

But obviously a space consisting of vectors which are sequences of a larger number of numbers, e.g. four: (x,y,z,t), is also a vector space. Although it is not easy to imagine this space geometrically, addition and multiplication work in the same way as in the two-dimensional case.

Vector subspaces

A vector subspace is a subset X of a vector space closed under the operations on vectors, meaning that if vectors u,v are in X, then u+v is also in X and for any number a, also au is in X.

To prove that a subset is a vector subspace we have to prove that for any two vectors in this subset their sum is also in this subset and that for any vector in the subset and any number, their multiplication is in the subset. To prove that a subset is not a vector subspace it suffices to find two vectors in the subset with the sum outside it or a vector in the subset and a number such that their multiplication does not belong to the subset.

For example, the line y=0 is a vector subspace of the plane, since if u,v belong to this line, then they are of the form (x_1,0) and (x_2,0), and their sum (x_1+x_2,0) also belongs to the line. Similarly, for every number a, a(x_1,0)=(ax_1,0) belongs to this line.

On the other hand the set \{(x,y)\colon x\geq0\} is not a vector subspace, because vector (1,0), which is in the set, multiplied by -1 gives (-1,0), which is not in the set.

Linear combinations

Given a finite set of vectors v_1,v_2,\ldots, v_n, any vector of the form a_1v_1+a_2v_2+\ldots+ a_nv_n, where a_1,a_2,\ldots, a_n are some numbers, is called their linear combination. For example, the vector (-2,1) is a linear combination of vectors (1,1) and (2,1), because (-2,1)=4(1,1)-3(2,1). How to calculate those coefficients without simply guessing them? Notice that we look for numbers a,b such that a(1,1)+b(2,1)=(-2,1). This is actually a system of two linear equations:


it suffices to solve it:

    \[\left[\begin{array}{cc|c}1&2&-2\\1&1&1\end{array}\right]\underrightarrow{w_2-w_1} \left[\begin{array}{cc|c}1&2&-2\\0&-1&3\end{array}\right] \underrightarrow{(-1)\cdot w_2}\]


so a=4,b=-3. If this system were inconsistent, it would mean that our vector is not a linear combination of those two vectors.
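The same coefficients can be found with a linear solver: the vectors become the columns of the coefficient matrix. A sketch (numpy assumed):

```python
import numpy as np

# looking for a, b with a*(1,1) + b*(2,1) = (-2,1):
# the given vectors are written as columns
M = np.array([[1, 2],
              [1, 1]], dtype=float)
target = np.array([-2, 1], dtype=float)

coeffs = np.linalg.solve(M, target)
print(coeffs)  # the coefficients a, b
```

np.linalg.solve raises an error for a singular matrix; an inconsistent system (target outside the span) would need a rank comparison instead.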

The set of all linear combinations of vectors v_1,\ldots, v_n is denoted as \text{lin}(v_1,\ldots,v_n).

Linear independence

A set of vectors is called linearly independent if none of them is a linear combination of the others. This is quite an impractical definition, because to check directly from it whether a set of four vectors is linearly independent, we would have to check that four systems of equations are inconsistent. We need a different but equivalent definition.

Notice that if a vector is a linear combination of the others, e.g. v_5=a_1 v_1+a_2 v_2+a_3 v_3+a_4 v_4, then the zero vector can be written as a_1 v_1+a_2 v_2+a_3 v_3+a_4 v_4 + (-1)v_5, so the zero vector is a linear combination of vectors from our set in which at least one coefficient is non-zero. So we arrive at another definition of linear independence: a set of vectors is linearly independent if in every linear combination of them giving the zero vector all the coefficients are zeros. In other words, the only way to get the zero vector is the trivial one: multiplying them all by zero.

How to check that a given set of vectors is linearly independent? Obviously we have to calculate the coefficients of their linear combination giving the zero vector (if the only solution is that all the coefficients are zero, then the set is linearly independent). So we need to check whether a system of linear equations has only one solution.

For example, let us check whether (1,1), (2,1) are linearly independent. We solve a system of equations:

    \[\left[\begin{array}{cc|c}1&2&0\\1&1&0\end{array}\right]\underrightarrow{w_2-w_1} \left[\begin{array}{cc|c}1&2&0\\0&-1&0\end{array}\right]\]

We already know that the system has only one solution, because there is a leading coefficient in every column. Therefore this set of vectors is linearly independent.

There is also a second method of checking whether a set of vectors is linearly independent. Notice that the operations on the rows of a matrix just create linear combinations of the vectors written in the rows. If we manage to get a row of zeros, it means we have a non-trivial linear combination of the rows giving the zero vector, and so the vectors written in the rows are not linearly independent. Let us check whether (1,2,1),(-2,0,-2),(1,3,1) are linearly independent.

    \[\left[\begin{array}{ccc}1&2&1\\-2&0&-2\\1&3&1\end{array}\right]\underrightarrow{w_2+2w_1, w_3-w_1}\left[\begin{array}{ccc}1&2&1\\0&4&0\\0&1&0\end{array}\right]\underrightarrow{w_2\leftrightarrow w_3}\]

    \[\left[\begin{array}{ccc}1&2&1\\0&1&0\\0&4&0\end{array}\right]\underrightarrow{w_3-4 w_2}\left[\begin{array}{ccc}1&2&1\\0&1&0\\0&0&0\end{array}\right]\]

we get a row of zeros, so (1,2,1),(-2,0,-2),(1,3,1) are not linearly independent.
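In practice, the elimination above amounts to a rank computation: vectors written in the rows are linearly independent exactly when the rank of the matrix equals the number of vectors. A sketch (numpy assumed; the helper name is ours):

```python
import numpy as np

def independent(vectors):
    # rank = number of non-zero rows in an echelon form;
    # matrix_rank performs the elimination for us
    M = np.array(vectors, dtype=float)
    return np.linalg.matrix_rank(M) == len(vectors)

# the two examples above
assert independent([(1, 1), (2, 1)])
assert not independent([(1, 2, 1), (-2, 0, -2), (1, 3, 1)])
```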

Describing a vector space

How to describe a given vector subspace mathematically? There are a few ways:

  1. list a set of vectors which spans it,
  2. write out its basis,
  3. give a homogeneous system of linear equations which describes it.

The above needs a few more words of explanation. We would also like to know how to move from one of the descriptions to another.

A spanning system of vectors

It is easy to notice that the set of all linear combinations of a given set of vectors is always a vector space. Let us check it in the case of three vectors: let \alpha,\beta\in\text{lin}(v_1,v_2,v_3), then \alpha=av_1+bv_2+cv_3 and \beta=dv_1+fv_2+gv_3 for some numbers a, b, c, d, f, g. Therefore \alpha+\beta= (a+d)v_1+(b+f)v_2+(c+g)v_3 and, for any number k, k\alpha=kav_1+kbv_2+kcv_3; both are again linear combinations of v_1, v_2, v_3, so they are in \text{lin}(v_1,v_2,v_3), and we are dealing with a vector space.

Therefore, \text{lin}((2,2,3),(1,1,1),(-1,-1,0)), or \text{lin}((1,0),(0,1)) are examples of vector spaces, defined by listing the sets of vectors which span them.


A system of vectors which spans a given vector space and additionally is linearly independent is called its basis. A basis of a vector space is not unique, e.g. (1,0),(0,1) (the so-called standard basis) and (-1,0),(2,1) are both bases of the plane. But every basis of a given vector space has the same number of vectors. This number is called the dimension of the vector space V and denoted by \text{dim}V.

Obviously, a basis is in particular a system of vectors which spans a given space.

Every vector in the given space is a linear combination of the vectors from the basis. Furthermore, the coefficients in this combination are uniquely determined and are called coordinates. So, e.g. the vector (1,1) has coordinates 1,1 with respect to the basis (-1,0),(2,1), and (0,-1) has coordinates -2,-1 with respect to the same basis. Obviously, we calculate the coordinates of a given vector with respect to a given basis by solving a system of equations. We have done it already while determining whether a vector is a linear combination of other vectors. This time the system will always have exactly one solution.

A system of equations describing a space

Notice that given a homogeneous system of linear equations and two of its solutions, both their sum and a solution multiplied by any number are also solutions of the system. Let us check it in the following example. Vectors (1,-1,0) and (0,2,2) are solutions of the equation x+y-z=0. Indeed, 1+(-1)-0=0 and 0+2-2=0, so also (1+0)+(-1+2)-(0+2)=(1+(-1)-0)+(0+2-2)=0 and 5\cdot 1+ 5\cdot(-1)-5\cdot 0= 5\cdot(1+(-1)-0)=0.

Therefore, a set of solutions to a homogeneous system of linear equations is always a vector space! And a homogeneous system itself is one of possible descriptions of a vector space.

Now let us learn how, given one form of description, to find another one.

Finding a basis from a set of vectors which spans the space

Given a set of vectors, we would like to find a linearly independent set which spans the same space. We have mentioned that carrying out operations on the rows of a matrix does not change the set of possible linear combinations of the vectors written in the rows. Furthermore, notice that in an echelon form of a matrix all the non-zero rows are linearly independent. Indeed, to get the zero vector as a combination of them, we need to multiply the first one by zero because of the first place, then the second one by zero because of the second place, and so on. Thus, all we need to do is to write the given vectors in the rows of a matrix and calculate its echelon form. The non-zero rows form a basis of the space.

For example, let us find a basis of the space \text{lin}((1,0,1),(1,1,1),(-2,-3,-2)).

    \[\left[\begin{array}{ccc}1&0&1\\1&1&1\\-2&-3&-2\end{array}\right]\underrightarrow{w_2-w_1, w_3+2w_1}  \left[\begin{array}{ccc}1&0&1\\0&1&0\\0&-1&0\end{array}\right]\underrightarrow{w_3+w_2}  \left[\begin{array}{ccc}1&0&1\\0&1&0\\0&0&0\end{array}\right]\]

So (1,0,1),(0,1,0) is a basis of the given space, and the space has dimension 2.

Finding a basis of a space described by a system of equations

We start by finding a general solution of the given system in parametrized form. E.g. for the system

    \[\begin{cases}x-z+w-t=0\\ x+y+w=0\end{cases}\]

we compute:
    \[\left[\begin{array}{ccccc|c}1&0&-1&1&-1&0\\1&1&0&1&0&0\end{array}\right]\underrightarrow{w_2-w_1} \left[\begin{array}{ccccc|c}1&0&-1&1&-1&0\\0&1&1&0&1&0\end{array}\right]\]

so the general solution is of the form (z-w+t,-z-t,z,w,t), i.e. every vector in the given space is of the form (z-w+t,-z-t,z,w,t) for some z,w,t. But notice that (z-w+t,-z-t,z,w,t)=z(1,-1,1,0,0)+w(-1,0,0,1,0)+t(1,-1,0,0,1). Therefore every vector in the space is a linear combination of the vectors (1,-1,1,0,0),(-1,0,0,1,0),(1,-1,0,0,1), which I have calculated by substituting 1 for one parameter and zero for the others. It is also easy to see that those three vectors are linearly independent. So (1,-1,1,0,0),(-1,0,0,1,0),(1,-1,0,0,1) is a basis of the space, and the space has dimension 3. Notice also that 3=5-2, where 5 is the number of variables and 2 is the number of independent equations in the system.
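The basis found above can be double-checked numerically: each basis vector must solve the homogeneous system, and the three vectors must be independent (numpy assumed):

```python
import numpy as np

# the coefficient matrix of the homogeneous system above
A = np.array([[1, 0, -1, 1, -1],
              [1, 1, 0, 1, 0]], dtype=float)

# the basis of the solution space, written in rows
basis = np.array([[1, -1, 1, 0, 0],
                  [-1, 0, 0, 1, 0],
                  [1, -1, 0, 0, 1]], dtype=float)

# every basis vector solves the system ...
assert np.allclose(A @ basis.T, 0)
# ... and the basis is independent; dim = 5 - rank(A) = 5 - 2 = 3
assert np.linalg.matrix_rank(basis) == 3
```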

Finding a system of equations describing a vector space given by a system of vectors which spans it (or a basis)

Assume that we study the vector space \text{lin}((1,2,-1,0),(1,1,0,1),(0,1,-1,-1)). Our system of equations will therefore have 4 variables. Assume that it contains an equation of the form ax+by+cz+dw=0. The three given vectors have to satisfy this equation, so a+2b-c=0, a+b+d=0, b-c-d=0, which actually gives a system of equations which has to be fulfilled by the coefficients of the system we are looking for. A system of equations describing the coefficients of a system of equations looks a bit confusing, but bear with me. Let us write this system down:

    \[\begin{cases}a+2b-c=0\\ a+b+d=0\\ b-c-d=0\end{cases}\]

And solve it:




So the general solution, which will tell us what are the coefficients in the system we are looking for, is the following: (-c-2d,c+d,c,d). Notice that the system we are trying to find is not unique, many systems of equations are equivalent. Actually, the set of possible coefficients of an equation which is fulfilled by any vector in the given space forms also a vector space. We write down its basis using the calculated general solution: (-1,1,1,0),(-2,1,0,1). If we take those two vectors and write down two equations with such coefficients we will be able to get any equation which is fulfilled by any vector of our given space by combining them linearly. It means the system we look for is the system with coefficients from the calculated basis, namely:

    \[\begin{cases}-x+y+z=0\\ -2x+y+w=0\end{cases}\]
And we are done!
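The whole procedure can be verified numerically. The sketch below (using numpy) checks that every spanning vector satisfies both equations with the coefficient vectors found above.

```python
import numpy as np

# Spanning vectors of the space lin((1,2,-1,0),(1,1,0,1),(0,1,-1,-1)).
V = np.array([
    [1, 2, -1, 0],
    [1, 1, 0, 1],
    [0, 1, -1, -1],
])

# Coefficient vectors of the two equations:
# -x + y + z = 0 and -2x + y + w = 0.
E = np.array([
    [-1, 1, 1, 0],
    [-2, 1, 0, 1],
])

# Every spanning vector must satisfy every equation,
# so all these dot products must vanish.
residual = V @ E.T
print(residual)  # the zero 3x2 matrix
```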

Matrix multiplication

To multiply two matrices, the first one has to have as many columns as the second has rows. The resulting matrix has as many rows as the first one and as many columns as the second one. We multiply the rows of the first matrix by the columns of the second: in the resulting matrix, the entry in the i-th row and j-th column is the product of the i-th row of the first matrix with the j-th column of the second one, where by the product of a row and a column we mean the sum of the products of the corresponding entries. E.g.:

    \[\left[\begin{array}{ccc}2&3&1\\ -1&2&3\\ 0&1&0\\ 1&0&-2\end{array}\right]\cdot \left[\begin{array}{cc}-1&4\\ -5&2\\ 0&1\end{array}\right]=\]
    \[=\left[\begin{array}{cc}2\cdot(-1)+3\cdot(-5)+1\cdot 0&2\cdot 4+3\cdot 2+1\cdot 1\\ (-1)\cdot(-1)+2\cdot(-5)+3\cdot 0&(-1)\cdot 4+2\cdot 2+3\cdot 1\\ 0\cdot(-1)+1\cdot(-5)+0\cdot 0&0\cdot 4+1\cdot 2+0\cdot 1\\ 1\cdot(-1)+0\cdot(-5)+(-2)\cdot 0&1\cdot 4+0\cdot 2+(-2)\cdot 1\end{array}\right]=\]

    \[=\left[\begin{array}{cc}-17&15\\ -9&3\\ -5&2\\ -1&2\end{array}\right]\]

Matrix multiplication is associative, which means that for any A,B,C we have A\cdot(B\cdot C)=(A\cdot B)\cdot C. But it is not commutative! Notice that if the matrices are not square, the product in the reverse order may not even be defined. Even if both products are defined, the results may be different.
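The worked example above can be reproduced in a few lines of numpy (a sketch for checking, not part of the notes):

```python
import numpy as np

# The two matrices multiplied in the example above.
A = np.array([
    [2, 3, 1],
    [-1, 2, 3],
    [0, 1, 0],
    [1, 0, -2],
])
B = np.array([
    [-1, 4],
    [-5, 2],
    [0, 1],
])

# A is 4x3 and B is 3x2, so A @ B is defined and is 4x2.
product = A @ B
print(product)
# rows: [-17, 15], [-9, 3], [-5, 2], [-1, 2]

# B @ A is NOT defined: B has 2 columns while A has 4 rows.
```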

Inverse matrix

A matrix B is inverse to a matrix A, if A\cdot B=I, where I is the identity matrix (the matrix with ones on the diagonal and zeros everywhere else). The inverse matrix is denoted by A^{-1}. Since \det (A\cdot B)=\det A\cdot \det B and \det I=1, we see that \det A^{-1}=\frac{1}{\det A}. This implies that only matrices with non-zero determinant can have inverses. Therefore we call such matrices invertible.

How to calculate the inverse of a given matrix? We have mentioned recently that the operations on rows of a matrix leading to the reduced “stair-like” form amount to multiplication by a matrix. Imagine that we transform the matrix [A|I], consisting of matrix A along with the identity matrix, into the reduced “stair-like” form. Since A is a square matrix with non-zero determinant, we will get the identity matrix on the left side: [I|B]. But notice that if C is the matrix of the row operations, then C\cdot [A|I]=[I|B]. Therefore C\cdot A=I and C\cdot I=B. The first equation implies that C=A^{-1}, the second that B=C=A^{-1}. So after those operations we get the inverse matrix on the right!

E.g. let us calculate the inverse of the following matrix:

    \[\left[\begin{array}{ccc}1&2&1\\ 2&5&2\\ -1&-2&0\end{array}\right]\]

    \[\left[\begin{array}{ccc|ccc}1&2&1&1&0&0\\2&5&2&0&1&0\\-1&-2&0&0&0&1\end{array}\right]\overrightarrow{w_2-2w_1,w_3+w_1} \left[\begin{array}{ccc|ccc}1&2&1&1&0&0\\0&1&0&-2&1&0\\0&0&1&1&0&1\end{array}\right]\overrightarrow{w_1-w_3}\]

    \[\left[\begin{array}{ccc|ccc}1&2&0&0&0&-1\\0&1&0&-2&1&0\\0&0&1&1&0&1\end{array}\right]\overrightarrow{w_1-2w_2} \left[\begin{array}{ccc|ccc}1&0&0&4&-2&-1\\0&1&0&-2&1&0\\0&0&1&1&0&1\end{array}\right]\]

And therefore:

    \[\left[\begin{array}{ccc}1&2&1\\ 2&5&2\\ -1&-2&0\end{array}\right]^{-1}=\left[\begin{array}{ccc}4&-2&-1\\ -2&1&0\\ 1&0&1\end{array}\right]\]
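We can confirm the result of the elimination with numpy's built-in inverse (a numerical check only):

```python
import numpy as np

# The matrix inverted by row operations above.
A = np.array([
    [1, 2, 1],
    [2, 5, 2],
    [-1, -2, 0],
])

A_inv = np.linalg.inv(A)
print(np.round(A_inv))
# rows: [4, -2, -1], [-2, 1, 0], [1, 0, 1]

# Sanity check: A times its inverse is the identity.
print(np.allclose(A @ A_inv, np.eye(3)))  # True
```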
Linear maps

A linear map \phi is a map which maps vectors from a given linear space V to vectors from another linear space W (\phi\colon V\to W) and satisfies the linear condition, which says that for all vectors v,v'\in V and numbers a,b we have \phi(av+bv')=a\phi(v)+b\phi(v'). E.g. a rotation around (0,0) is a linear map \mathbb{R}^2\to\mathbb{R}^2: given two vectors and two numbers, we get the same vector regardless of whether we rotate the vectors first and then multiply them by the numbers and add them, or multiply by the numbers, add, and then rotate.

Therefore, to prove that a given map is a linear map we need to prove that for any two vectors and any two numbers it satisfies the linear condition. E.g. \varphi\colon \mathbb{R}^2\to\mathbb{R} given as \varphi(x,y)=-x+2y is a linear map because if (x,y),(x',y')\in\mathbb{R}^2 and a,b\in\mathbb{R}, then \varphi(a(x,y)+b(x',y'))=\varphi((ax+bx',ay+by'))=-ax-bx'+2ay+2by'=a(-x+2y)+b(-x'+2y')=a\varphi((x,y))+b\varphi((x',y')).

Meanwhile, to disprove that a map is linear we need to find an example of two vectors along with two numbers for which the linear condition fails. E.g. \psi\colon \mathbb{R}\to\mathbb{R}^2 given as \psi(x)=(2x+1,0) is not a linear map because \psi(1+1)=\psi(2)=(5,0)\neq (6,0)=(3,0)+(3,0)=\psi(1)+\psi(1).
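Both examples can be tested numerically; the sketch below checks the linear condition for \varphi on random data and exhibits the failure for \psi:

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(v):
    # The linear map from the example: phi((x, y)) = -x + 2y.
    x, y = v
    return -x + 2 * y

def psi(x):
    # The non-linear map from the example: psi(x) = (2x + 1, 0).
    return np.array([2 * x + 1, 0])

# phi satisfies the linear condition for random vectors and numbers...
v, w = rng.normal(size=2), rng.normal(size=2)
a, b = rng.normal(size=2)
lhs = phi(a * v + b * w)
rhs = a * phi(v) + b * phi(w)
print(np.isclose(lhs, rhs))  # True

# ...while psi fails already for v = w = 1 and a = b = 1:
print(psi(1 + 1), psi(1) + psi(1))  # (5, 0) versus (6, 0)
```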

Usually, linear maps will be given by their formulas. E.g. \psi\colon\mathbb{R}^2\to\mathbb{R}^2, \psi((x,y))=(y,-x). Then to see what \psi does to a given vector, e.g. (1,2), we substitute it into the formula: \psi((1,2))=(2,-1). By the way, it is easy to see that this is simply the rotation around (0,0) by 90 degrees clockwise. A less geometrical example: let \phi\colon\mathbb{R}^4\to\mathbb{R}^2, \phi((x,y,z,t))=(x+3y-t,3y-2z+t); then \phi maps the vector (1,2,0,-1) to the vector (1+6-(-1),6-0+(-1))=(8,5).

Sometimes we can define a linear map by giving its values on the vectors of a given basis only. This suffices to determine the map. E.g. let \varphi\colon\mathbb{R}^2\to\mathbb{R}^3 be given in the following form: \varphi((1,-1))=(1,2,1), \varphi((-1,0))=(-1,0,1) (vectors (1,-1), (-1,0) constitute a basis of the plane). We can calculate the formula of this map. First we calculate the coefficients of the standard basis vectors in the given basis (by solving a system of equations or by guessing). In this case we see that (1,0)=0\cdot(1,-1)-1\cdot(-1,0) (coefficients: 0,-1) and (0,1)=-(1,-1)-(-1,0) (coefficients: -1,-1). Therefore, \varphi((x,y))=x\varphi((1,0))+y\varphi((0,1))=x(-\varphi((-1,0)))+y(-\varphi((1,-1))-\varphi((-1,0)))=x(-(-1,0,1))+y(-(1,2,1)-(-1,0,1))=x(1,0,-1)+y(0,-2,-2)=(x,-2y,-x-2y).

In the above example we can also work out \varphi((1,0)) and \varphi((0,1)) by writing down a matrix with the vectors of the given basis on the left-hand side and their values on the right-hand side. Since \varphi is a linear map, each operation on the rows of the matrix preserves the principle that the vector on the left has its value on the right. So after transforming the left-hand side of the matrix to the reduced echelon form, one finds \varphi((1,0)) and \varphi((0,1)) on the right-hand side.

    \[\left[\begin{array}{cc|ccc}1&-1&1&2&1\\-1&0&-1&0&1\end{array}\right]\underrightarrow{w_1\leftrightarrow w_2}\left[\begin{array}{cc|ccc}-1&0&-1&0&1\\1&-1&1&2&1\end{array}\right]\underrightarrow{w_2+w_1}\]

    \[\left[\begin{array}{cc|ccc}-1&0&-1&0&1\\0&-1&0&2&2\end{array}\right]\underrightarrow{w_1\cdot (-1),w_2\cdot (-1)}\left[\begin{array}{cc|ccc}1&0&1&0&-1\\0&1&0&-2&-2\end{array}\right],\]

so as before \varphi((1,0))=(1,0,-1) and \varphi((0,1))=(0,-2,-2), and hence \varphi((x,y))=(x,-2y,-x-2y).
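The same computation can be done numerically: if B is the matrix with the basis vectors as columns and V the matrix with the corresponding values as columns, then the standard matrix M of \varphi satisfies MB=V, so M=VB^{-1}. A numpy sketch:

```python
import numpy as np

# Columns: the basis vectors (1,-1) and (-1,0).
B = np.array([[1, -1],
              [-1, 0]])
# Columns: the corresponding values (1,2,1) and (-1,0,1).
V = np.array([[1, -1],
              [2, 0],
              [1, 1]])

# The standard matrix M of phi satisfies M @ B = V, so M = V @ B^{-1}.
M = V @ np.linalg.inv(B)
print(np.round(M))
# rows: [1, 0], [0, -2], [-1, -2],
# i.e. phi((x, y)) = (x, -2y, -x - 2y), as computed above.
```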
Given two linear maps \varphi,\psi\colon V\to W and a number a, their sum \varphi+\psi and the multiple a\varphi are linear maps V\to W. Obviously, we add the values and multiply them by the number coordinate-wise. So e.g. if \varphi((x,y))=(-x,-2y),\psi((x,y))=(0,2x), then (\varphi+\psi)((x,y))=(-x,2x-2y).

It means that the set of all linear maps between fixed vector spaces V and W is a vector space itself. It is denoted by L(V,W).

Kernel and image of a linear transformation

If \varphi\colon V\to W is a linear transformation, then it is easy to see that

    \[\ker\varphi=\{v\in V\colon \varphi(v)=0\}\]

called the kernel of \varphi and

    \[\text{im}\varphi=\{\varphi(v)\colon v\in V\}\]

called the image of \varphi are linear subspaces of V and W respectively. The dimension of the image of \varphi is also called the rank of \varphi and denoted by r(\varphi). Notice that if v_1,\ldots, v_n span V, then \varphi(v_1),\ldots,\varphi(v_n) span \text{im}\varphi. Moreover, if V=\ker\varphi\oplus U and v_1,\ldots, v_n is a basis of U, then \varphi(v_1),\ldots,\varphi(v_n) is a basis of \text{im}\varphi. Therefore,

    \[\dim V=\dim\ker\varphi+\dim\text{im}\varphi.\]
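For a map given by a matrix, both dimensions are easy to compute numerically: the rank of the matrix is \dim\text{im}\varphi, and the formula above then gives \dim\ker\varphi. A sketch for the map \phi((x,y,z,t))=(x+3y-t,3y-2z+t) from an earlier example:

```python
import numpy as np

# phi((x, y, z, t)) = (x + 3y - t, 3y - 2z + t) in the standard bases.
M = np.array([[1, 3, 0, -1],
              [0, 3, -2, 1]])

rank = np.linalg.matrix_rank(M)   # dim im(phi)
nullity = M.shape[1] - rank       # dim ker(phi), by rank-nullity
print(rank, nullity)  # 2 2
```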

Matrix of a linear map

Matrices will be important for us for yet another reason. It turns out that multiplying a matrix by a vector is actually the same as calculating the value of a linear map. Observe the following:

    \[\left[\begin{array}{ccc}1&0&-1\\2&3&-1\end{array}\right]\cdot \left[\begin{array}{c}x\\y\\z\end{array}\right]=\left[\begin{array}{c}x-z\\2x+3y-z\end{array}\right]\]

is exactly the same as \phi((x,y,z))=(x-z,2x+3y-z).

More precisely, given a linear map \varphi\colon V\to W, a basis \mathcal{A} of V and a basis \mathcal{B} of W, the matrix M(\varphi)_{\mathcal{A}}^{\mathcal{B}} which, multiplied from the right by the (vertical) vector of coordinates of a vector v in basis \mathcal{A}, gives the coordinates of \varphi(v) in basis \mathcal{B}, is called the matrix of \varphi in bases \mathcal{A} and \mathcal{B}.

In particular in the above example:

    \[\left[\begin{array}{ccc}1&0&-1\\2&3&-1\end{array}\right]\]

is the matrix of \phi in the standard bases — and can be easily written out of the formula defining \phi, simply by writing its coefficients in rows.

Changing bases of a matrix — first method: calculate coordinates.

Assume that we are given a formula defining \phi, as above (or its matrix in the standard basis), and bases \mathcal{A} and \mathcal{B}. E.g.: \mathcal{A}=((1,1,1),(1,0,-1),(1,0,0)) and \mathcal{B}=((2,3),(-1,-1)). We would like to calculate M(\phi)_{\mathcal{A}}^{\mathcal{B}}.

Notice that if we multiply this matrix from the right by the (vertical) vector (1,0,0), we get simply the first column of this matrix. On the other hand, the first vector of \mathcal{A}, namely (1,1,1), has coordinates 1,0,0 in this basis. Therefore the result of the multiplication is the vector of coordinates of \phi((1,1,1)) in basis \mathcal{B}, and this is the first column of M(\phi)_{\mathcal{A}}^{\mathcal{B}}.

So: \phi((1,1,1))=(1-1, 2+3-1)=(0,4) and we have to find the coordinates of this vector in \mathcal{B}: (0,4)=4(2,3)+8(-1,-1), so the coordinates are 4,8 and this is the first column of the matrix we would like to calculate.

Let us find the second column. We do the same as before but with the second vector from basis \mathcal{A}. \phi((1,0,-1))=(1-(-1),2-(-1))=(2,3) has coordinates 1,0 in \mathcal{B}.

The third column: \phi((1,0,0))=(1,2) has in \mathcal{B} coordinates (1,2)=(2,3)+(-1,-1), so 1,1. Altogether:

    \[M(\phi)_{\mathcal{A}}^{\mathcal{B}}=\left[\begin{array}{ccc}4&1&1\\8&0&1\end{array}\right]\]
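The first method is easy to automate: column j of M(\phi)_{\mathcal{A}}^{\mathcal{B}} is the solution of the linear system whose right-hand side is \phi applied to the j-th vector of \mathcal{A}. A numpy sketch:

```python
import numpy as np

# phi in the standard bases.
M_st = np.array([[1, 0, -1],
                 [2, 3, -1]])

# Basis A of R^3 (vectors as columns) and basis B of R^2 (vectors as columns).
A = np.array([[1, 1, 1],
              [1, 0, 0],
              [1, -1, 0]])
B = np.array([[2, -1],
              [3, -1]])

# Column j of M(phi)_A^B holds the B-coordinates of phi applied to the
# j-th vector of A, i.e. the solution c of B @ c = M_st @ A[:, j].
M_AB = np.linalg.solve(B, M_st @ A)
print(np.round(M_AB))
# rows: [4, 1, 1], [8, 0, 1]
```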

Composing maps

Given two linear maps \phi\colon V\to W and \psi\colon W\to U we can compose them and consider a map which transforms a vector from V first via \phi and the result via \psi getting a vector form U.

Such a map is denoted by \psi\circ \phi. Given formulas defining \phi and \psi we can easily get the formula for \psi\circ \phi. E.g. consider \phi as above and \psi\colon\mathbb{R}^2\to\mathbb{R}^4 such that \psi((a,b))=(-b,a+b,-2a-b,3a). Then (\psi\circ\phi)((x,y,z))=\psi(\phi((x,y,z)))=\psi((x-z,2x+3y-z))=(-2x-3y+z,3x+3y-2z,-4x-3y+3z,3x-3z).

Now look at the matrices of those maps. If \mathcal{A} is a basis of V, \mathcal{B} is a basis of W and \mathcal{C} is a basis of U, then given the coordinates of v in \mathcal{A}, to get the coordinates of (\psi\circ\phi)(v)=\psi(\phi(v)) in \mathcal{C} we first multiply them by the matrix M(\phi)_{\mathcal{A}}^{\mathcal{B}} (getting the coordinates of \phi(v) in \mathcal{B}) and then multiply the result by M(\psi)_{\mathcal{B}}^{\mathcal{C}}. Altogether we have multiplied the coordinates by M(\psi)_{\mathcal{B}}^{\mathcal{C}} \cdot M(\phi)_{\mathcal{A}}^{\mathcal{B}}, which means that:

    \[M(\psi\circ\phi)_{\mathcal{A}}^{\mathcal{C}} =M(\psi)_{\mathcal{B}}^{\mathcal{C}} \cdot M(\phi)_{\mathcal{A}}^{\mathcal{B}}.\]

Notice which bases have to agree in this formula!

In particular in our example:

    \[M(\psi\circ\phi)_{st}^{st} =M(\psi)_{st}^{st} \cdot M(\phi)_{st}^{st}=\left[\begin{array}{cc}0&-1\\1&1\\-2&-1\\3&0\end{array}\right]\cdot \left[\begin{array}{ccc}1&0&-1\\2&3&-1\end{array}\right] =\left[\begin{array}{ccc}-2&-3&1\\3&3&-2\\-4&-3&3\\3&0&-3\end{array}\right] \]

which is consistent with the formula we have calculated before.
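A numerical check of the same product (numpy, for verification only):

```python
import numpy as np

# phi and psi in the standard bases.
M_phi = np.array([[1, 0, -1],
                  [2, 3, -1]])
M_psi = np.array([[0, -1],
                  [1, 1],
                  [-2, -1],
                  [3, 0]])

# The matrix of the composition is the product
# (the bases must agree in the middle).
M_comp = M_psi @ M_phi
print(M_comp)
# rows: [-2, -3, 1], [3, 3, -2], [-4, -3, 3], [3, 0, -3]

# Cross-check against the formula computed directly:
x, y, z = 1.0, 2.0, 3.0
direct = np.array([-2*x - 3*y + z, 3*x + 3*y - 2*z,
                   -4*x - 3*y + 3*z, 3*x - 3*z])
print(np.allclose(M_comp @ np.array([x, y, z]), direct))  # True
```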

Change-of-coordinates matrix

There is a special linear map, called the identity, which does nothing. For example, in \mathbb{R}^3 it is id((x,y,z))=(x,y,z). Therefore, given two bases \mathcal{A} and \mathcal{B} along with the matrix M(id)_{\mathcal{A}}^{\mathcal{B}}, if we multiply this matrix from the right by the coordinates of a vector v in basis \mathcal{A}, we will get the coordinates of the same vector v (as id(v)=v), but in basis \mathcal{B}. So the matrix M(id)_{\mathcal{A}}^{\mathcal{B}} changes the basis from \mathcal{A} to \mathcal{B}.

In particular, we will need matrices changing basis from the standard basis to a given one and from a given basis to the standard one. Let’s check how to calculate them.

It is easy to calculate the change-of-coordinates matrix from a given basis to the standard basis. We will find M(id)_{\mathcal{A}}^{st} (the basis \mathcal{A} is defined in the example above). After multiplying it from the right by 1,0,0 we get its first column. On the other hand, the first vector of \mathcal{A} has coordinates 1,0,0 in \mathcal{A}, so the result of the multiplication is the vector of its coordinates in the standard basis, which is simply the vector itself. So the i-th column of this matrix is just the i-th vector of the basis. Therefore:

    \[M(id)_{\mathcal{A}}^{st}=\left[\begin{array}{ccc}1&1&1\\ 1&0&0\\ 1&-1&0\end{array}\right]\]
Now, the other case: from the standard basis to a given basis. We will calculate M(id)_{st}^{\mathcal{B}}. It can be easily seen that in columns we should put coordinates of vectors from the standard basis in the given basis. Let us calculate them: (1,0)=-(2,3)-3(-1,-1) and (0,1)=(2,3)+2(-1,-1). Therefore:

    \[M(id)_{st}^{\mathcal{B}}=\left[\begin{array}{cc}-1&1\\ -3&2\end{array}\right]\]
Changing the basis of a matrix — the second method: multiplication by a change-of-coordinates matrix

We have a new tool to change the bases of the matrix of a linear map. Because id\circ \phi\circ id =\phi, we have that:

    \[M(\phi)_{\mathcal{A}}^{\mathcal{D}}=M(id\circ\phi\circ id)_{\mathcal{A}}^{\mathcal{D}}=M(id)_{\mathcal C}^{\mathcal{D}}\cdot M(\phi)_{\mathcal{B}}^{\mathcal{C}}\cdot M(id)_{\mathcal{A}}^{\mathcal{B}}.\]

In particular:

    \[M(\phi)_{\mathcal{A}}^{\mathcal{B}}=M(id)_{st}^{\mathcal{B}}\cdot M(\phi)_{st}^{st}\cdot M(id)_{\mathcal{A}}^{st}\]

and in this way given a matrix of a linear map in standard basis we can calculate its matrix in basis from \mathcal{A} to \mathcal{B}. In our example:

    \[M(\phi)_{\mathcal{A}}^{\mathcal{B}}=M(id)_{st}^{\mathcal{B}}\cdot M(\phi)_{st}^{st}\cdot M(id)_{\mathcal{A}}^{st}=\left[\begin{array}{cc}-1&1\\ -3&2\end{array}\right]\cdot \left[\begin{array}{ccc}1&0&-1\\ 2&3&-1\end{array}\right]\cdot \left[\begin{array}{ccc}1&1&1\\ 1&0&0\\ 1&-1&0\end{array}\right]\]
    \[=\left[\begin{array}{ccc}1&3&0\\1&6&1\end{array}\right] \cdot\left[\begin{array}{ccc}1&1&1\\1&0&0\\1&-1&0\end{array}\right]=\left[\begin{array}{ccc}4&1&1\\8&0&1\end{array}\right].\]

Obviously we get the same result as calculated by the first method.
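The second method is also straightforward numerically: build the two change-of-coordinates matrices and multiply (a numpy sketch):

```python
import numpy as np

# phi in the standard bases.
M_st = np.array([[1, 0, -1],
                 [2, 3, -1]])

# M(id)_A^st: columns are the vectors of A = ((1,1,1),(1,0,-1),(1,0,0)).
A_to_st = np.array([[1, 1, 1],
                    [1, 0, 0],
                    [1, -1, 0]])

# M(id)_st^B: columns are the coordinates of (1,0) and (0,1) in
# B = ((2,3),(-1,-1)), i.e. the inverse of the matrix with B as columns.
st_to_B = np.linalg.inv(np.array([[2, -1],
                                  [3, -1]]))

M_AB = st_to_B @ M_st @ A_to_st
print(np.round(M_AB))
# rows: [4, 1, 1], [8, 0, 1], as with the first method
```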