2 Overview

Figure 1: You will implement (1) a multi-layer perceptron (neural network) and (2) a convolutional neural network to recognize hand-written digits using the MNIST dataset.

The goal of this assignment is to implement neural networks to recognize hand-written digits in the MNIST data.

MNIST Data
You will use the MNIST hand-written digit dataset to perform the first task (neural network). We have reduced the image size (28 × 28 → 14 × 14) and subsampled the data. You can download the training and testing data from Canvas.

Description: The zip file includes two MAT files (mnist_train.mat and mnist_test.mat). Each file includes im_* and label_* variables:
• im_* is a 196 × n matrix storing the vectorized image data (196 = 14 × 14).
• label_* is a 1 × n vector storing the label of each image. n is the number of images.
You can visualize the i-th image, e.g., plt.imshow(mnist_train['im_train'][:, 0].reshape((14, 14), order='F'), cmap='gray').

3 Single-layer Linear Perceptron

Figure 2: You will implement a single-layer linear perceptron that produces accuracy near 30% on the testing data (random chance is 10%). (a) Single-layer linear perceptron; (b) training and testing loss over iterations; (c) confusion matrix (accuracy: 0.297905).

You will implement a single-layer linear perceptron (Figure 2(a)) trained with stochastic gradient descent. We provide main_slp_linear, where you will implement get_mini_batch and train_slp_linear.

def get_mini_batch(im_train, label_train, batch_size)
  ...
  return mini_batch_x, mini_batch_y

Input: im_train and label_train are a set of images and labels, and batch_size is the size of the mini-batch for stochastic gradient descent.
Output: mini_batch_x and mini_batch_y are cells that contain a set of batches (images and labels, respectively). Each batch of images is a 196 × batch_size matrix, and each batch of labels is a 10 × batch_size matrix (one-hot encoding). Note that the number of images in the last batch may be smaller than batch_size.
Description: You should randomly permute the order of images when building the batches, and the full set of mini_batch_* must span all training data (a sketch appears at the end of this section).

def fc(x, w, b)
  ...
  return y

Input: x ∈ R^(m×1) is the input to the fully-connected layer, and w ∈ R^(n×m) and b ∈ R^(n×1) are the weights and bias.
Output: y ∈ R^(n×1) is the output of the linear transform (fully-connected layer).
Description: FC is a linear transform of x, i.e., y = wx + b.

def fc_backward(dl_dy, x, w, b, y)
  ...
  return dl_dx, dl_dw, dl_db

Input: dl_dy ∈ R^(1×n) is the loss derivative with respect to the output y.
Output: dl_dx ∈ R^(1×m) is the loss derivative with respect to the input x, dl_dw ∈ R^(1×(n×m)) is the loss derivative with respect to the weights, and dl_db ∈ R^(1×n) is the loss derivative with respect to the bias.
Description: The partial derivatives with respect to the input, weights, and bias are computed; dl_dx will be back-propagated, while dl_dw and dl_db will be used to update the weights and bias.

def loss_euclidean(y_tilde, y)
  ...
  return l, dl_dy

Input: y_tilde ∈ R^m is the prediction, and y ∈ {0, 1}^m is the ground-truth label.
Output: l ∈ R is the loss, and dl_dy is the loss derivative with respect to the prediction.
Description: loss_euclidean measures the squared Euclidean distance L = ‖y − ỹ‖^2.
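The following is a minimal NumPy sketch of get_mini_batch under the shapes stated above; the use of np.random.permutation and the list-of-arrays representation of the cells are illustrative choices, not requirements:

import numpy as np

def get_mini_batch(im_train, label_train, batch_size):
    n = im_train.shape[1]
    labels = np.asarray(label_train).ravel().astype(int)
    perm = np.random.permutation(n)               # shuffle so batches are randomly ordered
    mini_batch_x, mini_batch_y = [], []
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]      # the last batch may be smaller
        mini_batch_x.append(im_train[:, idx])     # 196 x batch_size images
        onehot = np.zeros((10, idx.size))         # 10 x batch_size one-hot labels
        onehot[labels[idx], np.arange(idx.size)] = 1
        mini_batch_y.append(onehot)
    return mini_batch_x, mini_batch_y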
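For reference, here is one way fc, fc_backward, and loss_euclidean could be written in NumPy, assuming the column-vector inputs and row-vector gradients given in the specs above; this is a sketch, not the required implementation:

import numpy as np

def fc(x, w, b):
    # Linear transform: y = wx + b, (n x m)(m x 1) + (n x 1) -> (n x 1).
    return w @ x + b

def fc_backward(dl_dy, x, w, b, y):
    dl_dx = dl_dy @ w                        # 1 x m
    dl_dw = (dl_dy.T @ x.T).reshape(1, -1)   # outer product, flattened to 1 x (n*m)
    dl_db = dl_dy                            # 1 x n
    return dl_dx, dl_dw, dl_db

def loss_euclidean(y_tilde, y):
    # L = ||y - y_tilde||^2 and its derivative w.r.t. the prediction.
    diff = y_tilde - y
    return float(np.sum(diff ** 2)), 2 * diff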
def train_slp_linear(mini_batch_x, mini_batch_y)
  ...
  return w, b

Input: mini_batch_x and mini_batch_y are cells where each cell is a batch of images or labels.
Output: w ∈ R^(10×196) and b ∈ R^(10×1) are the trained weights and bias of the single-layer perceptron.
Description: You will use fc, fc_backward, and loss_euclidean to train a single-layer perceptron using stochastic gradient descent; pseudocode is given in Algorithm 1. Through training, you are expected to see the loss decrease as shown in Figure 2(b). As a result of training, the network should produce more than 25% accuracy on the testing data (Figure 2(c)).

Algorithm 1 Stochastic Gradient Descent based Training
1: Set the learning rate γ
2: Set the decay rate λ ∈ (0, 1]
3: Initialize the weights with Gaussian noise, w ∼ N(0, 1)
4: k = 1
5: for iIter = 1 : nIters do
6:   At every 1000th iteration, γ ← λγ
7:   ∂L/∂w ← 0 and ∂L/∂b ← 0
8:   for each image x_i in the k-th mini-batch do
9:     Predict the label of x_i
10:    Compute the loss l
11:    Compute the gradient ∂l/∂w of x_i using back-propagation
12:    ∂L/∂w ← ∂L/∂w + ∂l/∂w and ∂L/∂b ← ∂L/∂b + ∂l/∂b
13:  end for
14:  k++ (set k = 1 if k is greater than the number of mini-batches)
15:  Update the weights, w ← w − (γ/R) ∂L/∂w, and bias, b ← b − (γ/R) ∂L/∂b, where R is the number of images in the mini-batch
16: end for

4 Single-layer Perceptron

Figure 3: You will implement a single-layer perceptron with soft-max that produces accuracy near 90% on the testing data. (a) Single-layer perceptron; (b) training and testing loss over iterations; (c) confusion matrix (accuracy: 0.898720).

You will implement a single-layer perceptron with soft-max cross-entropy using stochastic gradient descent. We provide main_slp, where you will implement train_slp. Unlike the single-layer linear perceptron, it has a soft-max layer that approximates the max function by clamping the output to the [0, 1] range, as shown in Figure 3(a).

def loss_cross_entropy_softmax(x, y)
  ...
  return l, dl_dy

Input: x ∈ R^(m×1) is the input to the soft-max, and y ∈ {0, 1}^m is the ground-truth label.
Output: l ∈ R is the loss, and dl_dy is the loss derivative with respect to x.
Description: loss_cross_entropy_softmax measures the cross-entropy between the two distributions, L = −∑_{i=1}^{m} y_i log ỹ_i, where ỹ_i is the soft-max output that approximates the max operation by clamping x to the [0, 1] range: ỹ_i = e^(x_i) / ∑_j e^(x_j), where x_i is the i-th element of x.

def train_slp(mini_batch_x, mini_batch_y)
  ...
  return w, b

Output: w ∈ R^(10×196) and b ∈ R^(10×1) are the trained weights and bias of the single-layer perceptron.
Description: You will use the following functions to train a single-layer perceptron using stochastic gradient descent: fc, fc_backward, loss_cross_entropy_softmax. Through training, you are expected to see the loss decrease as shown in Figure 3(b). As a result of training, the network should produce more than 85% accuracy on the testing data (Figure 3(c)).
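A sketch of loss_cross_entropy_softmax following the formulas above, assuming x and y are both m × 1 columns; the shift by max(x) and the small constant inside the log are standard numerical-stability tricks, not part of the spec:

import numpy as np

def loss_cross_entropy_softmax(x, y):
    e = np.exp(x - np.max(x))                    # soft-max, shifted for stability
    y_tilde = e / np.sum(e)
    l = float(-np.sum(y * np.log(y_tilde + 1e-12)))
    # For soft-max followed by cross-entropy, dL/dx simplifies to y_tilde - y.
    dl_dy = y_tilde - y
    return l, dl_dy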
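A compact training loop that follows Algorithm 1, shown for train_slp using the fc, fc_backward, and loss_cross_entropy_softmax sketches above (train_slp_linear is the same loop with loss_euclidean); the values of gamma, lam, and n_iters are placeholders you will need to tune:

import numpy as np

def train_slp(mini_batch_x, mini_batch_y, gamma=0.1, lam=0.9, n_iters=5000):
    w = np.random.randn(10, 196)                  # Gaussian init (Algorithm 1, line 3)
    b = np.random.randn(10, 1)
    k = 0
    for it in range(1, n_iters + 1):
        if it % 1000 == 0:
            gamma *= lam                          # decay the learning rate
        dL_dw, dL_db = np.zeros((1, 10 * 196)), np.zeros((1, 10))
        bx, by = mini_batch_x[k], mini_batch_y[k]
        R = bx.shape[1]
        for i in range(R):                        # accumulate over the mini-batch
            x, y = bx[:, i:i + 1], by[:, i:i + 1]
            y_pred = fc(x, w, b)
            l, dl_dy = loss_cross_entropy_softmax(y_pred, y)  # l can be logged
            _, dl_dw, dl_db = fc_backward(dl_dy.reshape(1, -1), x, w, b, y_pred)
            dL_dw += dl_dw
            dL_db += dl_db
        k = (k + 1) % len(mini_batch_x)           # cycle through the mini-batches
        w -= (gamma / R) * dL_dw.reshape(10, 196)
        b -= (gamma / R) * dL_db.reshape(10, 1)
    return w, b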
5 Multi-layer Perceptron

Figure 4: You will implement a multi-layer perceptron that produces accuracy of more than 90% on the testing data. (a) Multi-layer perceptron; (b) confusion matrix (accuracy: 0.914553).

You will implement a multi-layer perceptron with a single hidden layer using stochastic gradient descent. We provide main_mlp. The hidden layer is composed of 30 units, as shown in Figure 4(a).

def relu(x)
  ...
  return y

Input: x is a general tensor, matrix, or vector.
Output: y is the output of the Rectified Linear Unit (ReLU) with the same size as the input.
Description: ReLU is an activation unit (y_i = max(0, x_i)). In some cases, it is possible to use a leaky ReLU (y_i = max(εx_i, x_i), where ε = 0.01).

def relu_backward(dl_dy, x, y)
  ...
  return dl_dx

Input: dl_dy ∈ R^(1×z) is the loss derivative with respect to the output y ∈ R^z, where z is the size of the input (it can be a tensor, matrix, or vector).
Output: dl_dx ∈ R^(1×z) is the loss derivative with respect to the input x.
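A minimal sketch of the two functions above; the reshape handles the convention that the gradient arrives as a 1 × z row even when x is a matrix or tensor:

import numpy as np

def relu(x):
    # Element-wise max(0, x); works for vectors, matrices, and tensors.
    return np.maximum(0, x)

def relu_backward(dl_dy, x, y):
    # The gradient passes through only where the input was positive.
    return dl_dy * (x > 0).reshape(dl_dy.shape)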
def train_mlp(mini_batch_x, mini_batch_y)
  ...
  return w1, b1, w2, b2

Output: w1 ∈ R^(30×196), b1 ∈ R^(30×1), w2 ∈ R^(10×30), and b2 ∈ R^(10×1) are the trained weights and biases of the multi-layer perceptron.
Description: You will use the following functions to train a multi-layer perceptron using stochastic gradient descent: fc, fc_backward, relu, relu_backward, loss_cross_entropy_softmax. As a result of training, the network should produce more than 90% accuracy on the testing data (Figure 4(b)).

6 Convolutional Neural Network

Figure 5: You will implement a convolutional neural network that produces accuracy of more than 92% on the testing data. (a) CNN: Input → Conv (3) → ReLU → Pool (2×2) → Flatten → FC → Soft-max; (b) confusion matrix (accuracy: 0.947251).

You will implement a convolutional neural network (CNN) using stochastic gradient descent. We provide main_cnn. As shown in Figure 5(a), the network is composed of: a single-channel input (14 × 14 × 1) → Conv layer (3 × 3 convolution with 3 output channels and stride 1) → ReLU layer → Max-pooling layer (2 × 2 with stride 2) → Flattening layer (147 units) → FC layer (10 units) → Soft-max. Sketches of the individual layers appear at the end of this section.

def conv(x, w_conv, b_conv)
  ...
  return y

Input: x ∈ R^(H×W×C1) is the input to the convolution operation, and w_conv ∈ R^(h×w×C1×C2) and b_conv ∈ R^(C2×1) are the weights and bias of the convolution operation.
Output: y ∈ R^(H×W×C2) is the output of the convolution operation. Note that to keep the output the same size as the input, you may pad zeros at the boundary of the input image.
Description: You can use np.pad to pad zeros at the boundary. Optionally, you may use im2col [1] to simplify the convolution operation.
[1] https://leonardoaraujosantos.gitbook.io/artificial-inteligence/machine_learning/deep_learning/convolution_layer/making_faster

def conv_backward(dl_dy, x, w_conv, b_conv, y)
  ...
  return dl_dw, dl_db

Input: dl_dy is the loss derivative with respect to y.
Output: dl_dw and dl_db are the loss derivatives with respect to the convolution weights and bias, w_conv and b_conv, respectively.
Description: Note that for the single convolutional layer, ∂L/∂x is not needed. Optionally, you may use im2col to simplify the convolution operation.

def pool2x2(x)
  ...
  return y

Input: x ∈ R^(H×W×C) is a general tensor or matrix.
Output: y ∈ R^((H/2)×(W/2)×C) is the output of the 2 × 2 max-pooling operation with stride 2.

def pool2x2_backward(dl_dy, x, y)
  ...
  return dl_dx

Input: dl_dy is the loss derivative with respect to the output y.
Output: dl_dx is the loss derivative with respect to the input x.

def flattening(x)
  ...
  return y

Input: x ∈ R^(H×W×C) is a tensor.
Output: y ∈ R^(HWC) is the vectorized tensor (column-major).

def flattening_backward(dl_dy, x, y)
  ...
  return dl_dx

Input: dl_dy is the loss derivative with respect to the output y.
Output: dl_dx is the loss derivative with respect to the input x.

def train_cnn(mini_batch_x, mini_batch_y)
  ...
  return w_conv, b_conv, w_fc, b_fc

Output: w_conv ∈ R^(3×3×1×3), b_conv ∈ R^3, w_fc ∈ R^(10×147), and b_fc ∈ R^(10×1) are the trained weights and biases of the CNN.
Description: You will use the following functions to train a convolutional neural network using stochastic gradient descent: conv, conv_backward, pool2x2, pool2x2_backward, flattening, flattening_backward, fc, fc_backward, relu, relu_backward, loss_cross_entropy_softmax. As a result of training, the network should produce more than 92% accuracy on the testing data (Figure 5(b)).
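The sketches below show one possible shape for the CNN building blocks; they are written for clarity, not speed. First, a direct loop-based conv with zero padding (an im2col version would replace the inner loops with a single matrix multiplication):

import numpy as np

def conv(x, w_conv, b_conv):
    H, W, C1 = x.shape
    h, w, _, C2 = w_conv.shape
    bias = np.asarray(b_conv).ravel()
    # Zero-pad so the output keeps the input's spatial size (stride 1).
    ph, pw = h // 2, w // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    y = np.zeros((H, W, C2))
    for c2 in range(C2):
        for i in range(H):
            for j in range(W):
                # Correlate the h x w x C1 window with the c2-th filter.
                y[i, j, c2] = np.sum(xp[i:i + h, j:j + w, :] * w_conv[:, :, :, c2]) + bias[c2]
    return y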
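Next, sketches of pool2x2, pool2x2_backward, flattening, and flattening_backward, assuming H and W are even (as in this assignment) and that the gradient dl_dy carries the same number of elements as the corresponding output:

import numpy as np

def pool2x2(x):
    H, W, C = x.shape
    # Group pixels into 2x2 blocks and take each block's max (stride 2).
    return x.reshape(H // 2, 2, W // 2, 2, C).max(axis=(1, 3))

def pool2x2_backward(dl_dy, x, y):
    H, W, C = x.shape
    dl_dx = np.zeros_like(x)
    dy = dl_dy.reshape(H // 2, W // 2, C)
    for i in range(H // 2):
        for j in range(W // 2):
            for c in range(C):
                block = x[2 * i:2 * i + 2, 2 * j:2 * j + 2, c]
                r, s = np.unravel_index(np.argmax(block), (2, 2))
                # Route the gradient only to the max element of each block.
                dl_dx[2 * i + r, 2 * j + s, c] = dy[i, j, c]
    return dl_dx

def flattening(x):
    # Column-major (Fortran-order) vectorization into an (H*W*C) x 1 column.
    return x.reshape(-1, 1, order='F')

def flattening_backward(dl_dy, x, y):
    # Undo the vectorization: reshape the gradient back to x's shape.
    return dl_dy.reshape(x.shape, order='F')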
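Finally, the core of train_cnn is the same SGD loop as Algorithm 1; the fragment below sketches a single forward/backward pass for one image x (14 × 14 × 1) with one-hot label y, chaining the functions above (the a1...a5 variable names are illustrative):

# Forward pass through the network of Figure 5(a).
a1 = conv(x, w_conv, b_conv)        # 14 x 14 x 3
a2 = relu(a1)
a3 = pool2x2(a2)                    # 7 x 7 x 3
a4 = flattening(a3)                 # 147 x 1
a5 = fc(a4, w_fc, b_fc)             # 10 x 1
l, dl_dy = loss_cross_entropy_softmax(a5, y)

# Backward pass in reverse order; accumulate the four gradients over the
# mini-batch, then update w_conv, b_conv, w_fc, b_fc as in Algorithm 1.
dl_da4, dl_dw_fc, dl_db_fc = fc_backward(dl_dy.reshape(1, -1), a4, w_fc, b_fc, a5)
dl_da3 = flattening_backward(dl_da4, a3, a4)
dl_da2 = pool2x2_backward(dl_da3, a2, a3)
dl_da1 = relu_backward(dl_da2, a1, a2)
dl_dw_conv, dl_db_conv = conv_backward(dl_da1, x, w_conv, b_conv, a1)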