Learn Hands-On Machine Learning with Scikit-Learn and TensorFlow-Chapter 8

np.random.seed(4)
m = 60
w1, w2 = 0.1, 0.3
noise = 0.1

angles = np.random.rand(m) * 3 * np.pi / 2 - 0.5
X = np.empty((m, 3))
X = np.empty((m, 3))
X[:, 0] = np.cos(angles) + np.sin(angles)/2 + noise * np.random.randn(m) / 2
X[:, 1] = np.sin(angles) * 0.7 + noise * np.random.randn(m) / 2
X[:, 2] = X[:, 0] * w1 + X[:, 1] * w2 + noise * np.random.randn(m)

rand(m) is not generating a random number between 0 and 60, but generating an array of 60 random numbers between 0-1. numpy.empty((m,3)) is generating a matrix with 60 rows and 3 columns. randn(m) is generating an array of 60 random numbers which are subjected to standard normal distribution. X[:,0]=xxx is using an 1-d array to assign the first column of X. Now X is a training set containing 60 instances. Every instance has 3 features.

X_centered = X - X.mean(axis=0)
U, s, V= np.linalg.svd(X_centered)

X.mean(axis=0) is averaging the data along the column,i.e,the data in all rows but at the same column are added up and divided by the total row number. After the singular value decomposition, V contains the principle components: every row of V is a principle component.  For n-dimentional feature space, V would be a n*n matrix. U is a m*m matrix. s is a 1-d array of size n. To restore X from U,s,V, you should generate an empty m*n matrix S, then set the dialog elements to the singular values s, then compute U*S*V.

To reduce the dimensions of the original features from 3 to 2, you should project the training data to the hyperplane constructed by the first two principle components,i.e, X*[p1 p1]. p1 is the column vector gotten by transposing the first row of V, p2 is the column vector gotten by transposing the first row of V. X is m*n, [p1 p2] is n*2, so the result is a matrix of m*2. The dimension is reduced from 3 to 2 now. To recover X from its projection, compute \(X_{p}[p1 p2]^T\), i.e, m*2 dot 2*n=m*n.

 

Leave a Reply