Introduction to Machine Learning Homework 1


2 Problem 2 (10 points)
Regularized risk minimization: Modify the Matlab code for "polyreg.m" such that it learns a multivariate regression function $f : \mathbb{R}^{100} \to \mathbb{R}$, where the basis functions are of the form
$$f(x; \theta) = \sum_{i=1}^{k} \theta_i x_i .$$
The dataset is available in "problem2.mat". As before, the x variable contains $\{x_1, \ldots, x_N\}$ and the y variable contains their scalar labels $\{y_1, \ldots, y_N\}$.
Use an $\ell_2$ penalty on $\theta$ to penalize the complexity of the model, i.e., minimize the regularized risk
$$R_{\mathrm{reg}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \left( y_i - f(x_i; \theta) \right)^2 + \frac{\lambda}{2N} \|\theta\|^2 .$$
Use two-fold cross-validation (as in Problem 1) to find the best value for $\lambda$. Include a plot showing training and testing risk across various choices of $\lambda$. A reasonable range for this data set would be from $\lambda = 0$ to $\lambda = 1000$. Also, mark the $\lambda$ which minimizes the testing error on the data set.
What do you notice about the training and testing error?
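One way to approach this: because the basis is linear, minimizing $R_{\mathrm{reg}}$ has the closed-form ridge solution $(X^\top X + \lambda I)\,\theta = X^\top y$. Below is a minimal Matlab sketch of the two-fold cross-validation loop, assuming problem2.mat stores x as an N x 100 matrix (one example per row) and y as an N x 1 vector; the odd/even fold split and the lambda grid are illustrative choices, not prescribed by the problem.

% Sketch: two-fold cross-validation over lambda for ridge regression.
% Assumes x is N x 100 and y is N x 1; fold split and grid are illustrative.
load problem2

[N, d]  = size(x);
folds   = {1:2:N, 2:2:N};        % odd / even split into two folds
lambdas = 0:10:1000;
trainRisk = zeros(size(lambdas));
testRisk  = zeros(size(lambdas));

for j = 1:numel(lambdas)
    lambda = lambdas(j);
    for f = 1:2
        tr = folds{f};  te = folds{3 - f};
        % Closed-form minimizer of R_reg: (X'X + lambda*I) theta = X'y
        theta = (x(tr,:)' * x(tr,:) + lambda * eye(d)) \ (x(tr,:)' * y(tr));
        err_tr = mean((y(tr) - x(tr,:) * theta).^2) / 2;  % (1/N) sum (1/2)(y - f)^2
        err_te = mean((y(te) - x(te,:) * theta).^2) / 2;
        trainRisk(j) = trainRisk(j) + err_tr / 2;         % average over the two folds
        testRisk(j)  = testRisk(j)  + err_te / 2;
    end
end

[~, best] = min(testRisk);
plot(lambdas, trainRisk, lambdas, testRisk); hold on;
plot(lambdas(best), testRisk(best), 'ko');                % mark best lambda
xlabel('\lambda'); ylabel('risk'); legend('training', 'testing');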
3 Problem 3 (10 points)
Logistic Squashing Function: The logistic squashing function is given by $g(z) = 1/(1 + \exp(-z))$. Show that it satisfies the property $g(-z) = 1 - g(z)$. Also show that its inverse is given by $g^{-1}(y) = \ln(y/(1 - y))$.
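A condensed sketch of the algebra the problem asks for:
$$g(-z) = \frac{1}{1 + e^{z}} = \frac{e^{-z}}{e^{-z} + 1} = \frac{(1 + e^{-z}) - 1}{1 + e^{-z}} = 1 - \frac{1}{1 + e^{-z}} = 1 - g(z),$$
and setting $y = g(z) = 1/(1 + e^{-z})$ gives $e^{-z} = (1 - y)/y$, hence $z = g^{-1}(y) = \ln(y/(1 - y))$.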
4 Problem 4 (20 points)
Logistic Regression: Implement a linear logistic regression algorithm for binary classification in Matlab using gradient descent. Your code should accept a dataset $\{(x_1, y_1), \ldots, (x_N, y_N)\}$ where $x_i \in \mathbb{R}^d$ and $y_i \in \{0, 1\}$, and find a parameter vector $\theta \in \mathbb{R}^d$ for the classification function
$$f(x; \theta) = \left( 1 + \exp(-\theta^\top x) \right)^{-1}$$
which minimizes the empirical risk with logistic loss
$$R_{\mathrm{emp}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left[ (y_i - 1) \log(1 - f(x_i; \theta)) - y_i \log(f(x_i; \theta)) \right].$$
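For the update rule it helps to note that $g'(z) = g(z)(1 - g(z))$. Writing $f_i = f(x_i; \theta)$, a condensed form of the gradient derivation is
$$\nabla_\theta R_{\mathrm{emp}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left[ (1 - y_i)\, f_i - y_i (1 - f_i) \right] x_i = \frac{1}{N} \sum_{i=1}^{N} (f_i - y_i)\, x_i ,$$
so each batch step is $\theta \leftarrow \theta - \eta \, \nabla_\theta R_{\mathrm{emp}}(\theta)$.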
Since you are using gradient descent, you will have to specify the step size $\eta$ and the tolerance $\epsilon$. Pick reasonable values for $\eta$ and $\epsilon$, then use your code to learn a classification function for the dataset in "dataset4.mat". Type "load dataset4" and you will have the variables X (input vectors) and Y (binary labels) in your Matlab environment, which contain the dataset.
Show any derivations you need to make for this algorithm.
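A minimal gradient-descent sketch using the gradient derived above, assuming X is stored as an N x d matrix (one example per row) and Y as an N x 1 vector of 0/1 labels; the values of eta, epsilon, and maxIter are illustrative starting points, not prescribed by the problem.

% Sketch: batch gradient descent for linear logistic regression.
% Assumes X is N x d and Y is N x 1 with 0/1 labels; eta, epsilon,
% and maxIter are illustrative, not prescribed.
load dataset4

[N, d]  = size(X);
theta   = zeros(d, 1);       % initial parameter vector
eta     = 0.01;              % step size
epsilon = 1e-6;              % tolerance on the gradient norm
maxIter = 1e5;               % safeguard against non-convergence

for iter = 1:maxIter
    f = 1 ./ (1 + exp(-X * theta));   % f(x_i; theta) for every example
    grad = X' * (f - Y) / N;          % gradient derived above
    if norm(grad) < epsilon
        break;                        % converged
    end
    theta = theta - eta * grad;       % gradient-descent step
end

Remp = mean((Y - 1) .* log(1 - f) - Y .* log(f))   % final empirical risk

The iteration cap matters because on linearly separable data $\|\theta\|$ grows without bound as the empirical risk approaches zero, so the gradient-norm test alone may never trigger.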