Automatic image processing is a key component of many AI systems, including facial recognition and video compression. One basic processing method is segmentation, in which we divide an image into a fixed number of components in order to simplify its representation. For example, we can train a mixture of Gaussians to represent an image and segment it according to the simplified representation, as shown in the images below.
In this assignment, you will learn to perform image segmentation. To this end, you will implement Gaussian mixture models and iteratively improve their performance. You will perform this segmentation on the "Bird" and "Party Spock" images included with the assignment.
The tests for the assignment are provided in the notebook, so Bonnie is only for submission purposes. The tests on Bonnie will be similar to the ones provided here, but the images and the values used in the calculations will be different.
Thus, you will be allowed only 5 submissions on Bonnie. Make sure you test everything before submitting. The score of your last submission counts. The code will be allowed to run for no more than 2 hours per submission. To keep the code running quickly, make sure to vectorize it (more on this below).
Your assignment is to implement several methods of image segmentation, with increasing complexity.
The grade you receive for the assignment will be distributed as follows:
The assignment is due April 2nd, 2017 at 11:59PM UTC-12 (Anywhere on Earth time). The deliverable for this assignment is a completed mixture_models.py file.
The em.pdf chapter in the assignment folder gives a good explanation of implementing and training mixture models, particularly pages 424-427 (k-means) and 435-439 (mixture models and EM). See also the book Elements of Statistical Learning, pages 291-295.
A Gaussian mixture model is a generative model for representing the underlying probability distribution of a complex collection of data, such as the collection of pixels in a grayscale photograph.
In the context of this problem, a Gaussian mixture model defines the joint probability f(x) as
$$f(x) = \sum_{i=1}^{k} m_i N_i(x \mid \mu_i, \sigma_i^2)$$
where $x$ is a grayscale value in $[0, 1]$, $f(x)$ is the joint probability of that grayscale value, $m_i$ is the mixing coefficient of component $i$, and $N_i$ is the $i$-th Gaussian distribution underlying the value $x$, with mean $\mu_i$ and variance $\sigma_i^2$.
We will be using this model to segment photographs into different grayscale regions. The idea of segmentation is to assign a component i to each pixel x using the maximum posterior probability
$$\text{component}_x = \arg\max_i \left( m_i N_i(x \mid \mu_i, \sigma_i^2) \right)$$
Then we will replace each pixel in the image with its corresponding μi to produce a result as below (original above, segmented with three components below).
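For concreteness, here is a minimal numpy sketch of these two formulas for a single grayscale value. The mixing coefficients, means, and variances below are purely illustrative, and scipy.stats.norm is used only for brevity; the assignment itself works with log probabilities.

import numpy as np
from scipy.stats import norm

# Hypothetical parameters for a 3-component mixture (illustration only).
mixing = np.array([0.3, 0.5, 0.2])         # m_i
means = np.array([0.1, 0.5, 0.9])          # mu_i
variances = np.array([0.01, 0.02, 0.01])   # sigma_i^2

x = 0.47  # a single grayscale value in [0, 1]

# f(x) = sum_i m_i * N(x | mu_i, sigma_i^2)
weighted_densities = mixing * norm.pdf(x, loc=means, scale=np.sqrt(variances))
f_x = weighted_densities.sum()

# component_x = argmax_i m_i * N(x | mu_i, sigma_i^2)
component_x = np.argmax(weighted_densities)
segmented_value = means[component_x]  # the pixel is replaced by its component mean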
from __future__ import division
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)
import numpy as np
import scipy as sp
from matplotlib import image
import matplotlib.pyplot as plt
import matplotlib.cm as cm
"""Helper image-processing code. These have been added in a separate python file and added in to the repo. The functions below have all been imported in to your submission file""" def image_to_matrix(image_file, grays=False): """ Convert .png image to matrix of values. params: image_file = str grays = Boolean returns: img = (color) np.ndarray[np.ndarray[np.ndarray[float]]] or (grayscale) np.ndarray[np.ndarray[float]] """ img = image.imread(image_file) # in case of transparency values if(len(img.shape) == 3 and img.shape[2] > 3): height, width, depth = img.shape new_img = np.zeros([height, width, 3]) for r in range(height): for c in range(width): new_img[r,c,:] = img[r,c,0:3] img = np.copy(new_img) if(grays and len(img.shape) == 3): height, width = img.shape[0:2] new_img = np.zeros([height, width]) for r in range(height): for c in range(width): new_img[r,c] = img[r,c,0] img = new_img return img def matrix_to_image(image_matrix, image_file): """ Convert matrix of color/grayscale values to .png image and save to file. params: image_matrix = (color) numpy.ndarray[numpy.ndarray[numpy.ndarray[float]]] or (grayscale) numpy.ndarray[numpy.ndarray[float]] image_file = str """ # provide cmap to grayscale images cMap = None if(len(image_matrix.shape) < 3): cMap = cm.Greys_r image.imsave(image_file, image_matrix, cmap=cMap) def flatten_image_matrix(image_matrix): """ Flatten image matrix from Height by Width by Depth to (Height*Width) by Depth matrix. params: image_matrix = (color) numpy.ndarray[numpy.ndarray[numpy.ndarray[float]]] or (grayscale) numpy.ndarray[numpy.ndarray[float]] returns: flattened_values = (color) numpy.ndarray[numpy.ndarray[float]] or (grayscale) numpy.ndarray[float] """ if(len(image_matrix.shape) == 3): height, width, depth = image_matrix.shape else: height, width = image_matrix.shape depth = 1 return image_matrix.reshape(height*width, depth) def unflatten_image_matrix(image_matrix, width): """ Unflatten image matrix from (Height*Width) by Depth to Height by Width by Depth matrix. params: image_matrix = (color) numpy.ndarray[numpy.ndarray[float]] or (grayscale) numpy.ndarray[float] width = int returns: unflattened_values = (color) numpy.ndarray[numpy.ndarray[numpy.ndarray[float]]] or (grayscale) numpy.ndarray[numpy.ndarray[float]] """ heightWidth = image_matrix.shape[0] height = int(heightWidth / width) if(len(image_matrix.shape) > 1 and image_matrix.shape[-1] != 1): depth = image_matrix.shape[-1] return image_matrix.reshape(height, width, depth) else: return image_matrix.reshape(height, width) def image_difference(image_values_1, image_values_2): """ Calculate the total difference in values between two images. Assumes that both images have same shape. params: image_values_1 = (color) numpy.ndarray[numpy.ndarray[numpy.ndarray[float]]] or (grayscale) numpy.ndarray[numpy.ndarray[float]] image_values_2 = (color) numpy.ndarray[numpy.ndarray[numpy.ndarray[float]]] or (grayscale) numpy.ndarray[numpy.ndarray[float]] returns: dist = int """ flat_vals_1 = flatten_image_matrix(image_values_1) flat_vals_2 = flatten_image_matrix(image_values_2) N, depth = flat_vals_1.shape dist = 0. point_thresh = 0.005 for i in range(N): if(depth > 1): new_dist = sum(abs(flat_vals_1[i] - flat_vals_2[i])) if(new_dist > depth * point_thresh): dist += new_dist else: new_dist = abs(flat_vals_1[i] - flat_vals_2[i]) if(new_dist > point_thresh): dist += new_dist return dist
image_dir = 'images/'
image_file = 'party_spock.png'
values = image_to_matrix(image_dir + image_file)
print(values)
The concept of Vectorization was introduced in the last section of Assignment 4. For this assignment, please vectorize your code wherever possible using numpy arrays, instead of running for-loops over the images being processed.
For an example of how this might be useful, consider the following array:
A = [12, 34, 1234, 764, ... (a million values) ..., 91, 78]
Now you need to calculate another array B, which has the same dimensions as A above. Say each value in B is calculated as follows:
(each value in B) = square_root_of(pi * log(k) * (corresponding value in A)) / 7, where pi and log(k) are just constants
You might wish to use a for-loop to compute this. However, a loop will take a very long time to run on an array of this magnitude.
Alternatively, you can use numpy and perform the calculation in a single line: pass A as a numpy array, and the entire formula is applied to every element at once, producing B directly.
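As a rough sketch of the difference (treating the constants loosely, since the formula above is only illustrative):

import numpy as np

A = np.random.rand(10**6)  # stand-in for the million-value array
k = 42                     # an arbitrary constant for this illustration

# Loop version: touches each element in Python, so it is slow
B_loop = np.zeros_like(A)
for i in range(len(A)):
    B_loop[i] = np.sqrt(np.pi * np.log(k) * A[i]) / 7

# Vectorized version: the whole array in one line
B_vec = np.sqrt(np.pi * np.log(k) * A) / 7

assert np.allclose(B_loop, B_vec)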
20 pts
One easy method for image segmentation is to simply cluster all similar data points together and then replace their values with the mean value. Thus, we'll warm up using k-means clustering. This will also provide a baseline to compare with your segmentation. Please note that clustering will come in handy later.
Fill out k_means_cluster()
to convert the original image values matrix to its clustered counterpart. Your convergence test should be whether the assigned clusters stop changing. Note that this convergence test is rather slow. When no initial cluster means are provided, k_means_cluster()
should choose k random points from the data (without replacement) to use as initial cluster means.
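For the initialization requirement, one possible sketch (not the required implementation, and assuming the pixels have already been flattened with the provided flatten_image_matrix() helper) is to sample k distinct pixel indices:

import numpy as np

def random_initial_means(flat_values, k):
    # flat_values: (num_pixels, depth) array of pixel values
    # Pick k distinct pixels (without replacement) and use them as the initial means.
    indices = np.random.choice(flat_values.shape[0], size=k, replace=False)
    return flat_values[indices]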
For this part of the assignment, since clustering is best used on multidimensional data, we will be using the color image bird_color_24.png
.
You can test your implementation of k-means using our reference images in k_means_test()
.
from random import randint
from functools import reduce

def k_means_cluster(image_values, k=3, initial_means=None):
    """
    Separate the provided RGB values into k separate clusters using the
    k-means algorithm, then return an updated version of the image with
    the original values replaced with the corresponding cluster values.

    params:
    image_values = numpy.ndarray[numpy.ndarray[numpy.ndarray[float]]]
    k = int
    initial_means = numpy.ndarray[numpy.ndarray[float]] or None

    returns:
    updated_image_values = numpy.ndarray[numpy.ndarray[numpy.ndarray[float]]]
    """
    # TODO: finish this function
    raise NotImplementedError()
    return updated_image_values
def k_means_test(): """ Testing your implementation of k-means on the segmented bird_color_24 reference images. """ k_min = 2 k_max = 6 image_dir = 'images/' image_name = 'bird_color_24.png' image_values = image_to_matrix(image_dir + image_name) # initial mean for each k value initial_means = [ np.array([[0.90980393,0.8392157,0.65098041],[0.83137256,0.80784315,0.69411767]]), np.array([[0.90980393,0.8392157,0.65098041],[0.83137256,0.80784315,0.69411767],[0.67450982,0.52941179,0.25490198]]), np.array([[0.90980393,0.8392157,0.65098041],[0.83137256,0.80784315,0.69411767],[0.67450982,0.52941179,0.25490198],[0.86666667,0.8392157,0.70588237]]), np.array([[0.90980393,0.8392157,0.65098041],[0.83137256,0.80784315,0.69411767],[0.67450982,0.52941179,0.25490198],[0.86666667,0.8392157,0.70588237],[0,0,0]]), np.array([[0.90980393,0.8392157,0.65098041],[0.83137256,0.80784315,0.69411767],[0.67450982,0.52941179,0.25490198],[0.86666667,0.8392157,0.70588237],[0,0,0],[0.8392157,0.80392158,0.63921571]]), ] # test different k values to find best for k in range(k_min, k_max+1): updated_values = k_means_cluster(image_values, k, initial_means[k-k_min]) ref_image = image_dir + 'k%d_%s'%(k, image_name) ref_values = image_to_matrix(ref_image) dist = image_difference(updated_values, ref_values) print('Image distance = %.2f'%(dist)) if(int(dist) == 0): print('Clustering for %d clusters produced a realistic image segmentation.'%(k)) else: print('Clustering for %d clusters didn\'t produce a realistic image segmentation.'%(k))
k_means_test()
40 pts
Next, we will step beyond clustering and implement a complete Gaussian mixture model.
Complete the below implementation of GaussianMixtureModel
so that it can perform the following:
GaussianMixtureModel.default_convergence()
: if the new likelihood is within 10% of the previous likelihood for 10 consecutive iterations, the model has converged. We have provided the necessary tests for this part.
The log form of the Gaussian probability of scalar value x is:
$$\ln\left(N(x \mid \mu, \sigma)\right) = -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}$$
where $\mu$ is the mean and $\sigma$ is the standard deviation.
You can calculate the sum of log probabilities by using scipy.misc.logsumexp()
. For example, logsumexp([-2,-3])
will return the same result as numpy.log(numpy.exp(-2)+numpy.exp(-3))
.
In other words, logsumexp([a, b]) = log(e^a + e^b)
.
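A quick sketch of both ideas (the input values here are arbitrary examples; note that newer scipy versions expose logsumexp from scipy.special instead of scipy.misc):

import numpy as np
from scipy.misc import logsumexp  # scipy.special.logsumexp in newer scipy releases

def log_gaussian(x, mu, sigma_sq):
    # ln N(x | mu, sigma^2) = -0.5 * ln(2*pi*sigma^2) - (x - mu)^2 / (2*sigma^2)
    return -0.5 * np.log(2 * np.pi * sigma_sq) - (x - mu) ** 2 / (2 * sigma_sq)

# logsumexp computes log(sum(exp(...))) in a numerically stable way
print(logsumexp([-2, -3]))              # approximately -1.6867
print(np.log(np.exp(-2) + np.exp(-3)))  # same value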
numpy.array
arrays. You can instantiate them using the command
matrix = numpy.zeros([rows, columns])
where rows
is the number of rows and columns
is the number of columns in your matrix. numpy.zeros()
generates a matrix of the specified size containing 0s at each row/column cell. You can access cells with the syntax matrix[2,3]
which returns the value at row index 2 and column index 3 (indices are 0-based).
For the assignment, focus on vectorizing the following:
Remember, these are fundamental operations that will be called many times in the remainder of the assignment, so it is crucial that you optimize them.
For the synthetic-data test we provide to check whether your training works, the dataset is too small for vectorization to make a noticeable difference. For the actual image used later, however, for-loops won't cut it. Vectorized code should converge in under 30 seconds, typically taking about 15-20 iterations with the convergence function provided here. Inefficient code that loops over each pixel value sequentially will take hours to run. You don't want to do that because:
def default_convergence(prev_likelihood, new_likelihood, conv_ctr, conv_ctr_cap=10):
    """
    Default condition for increasing convergence counter:
    new likelihood deviates less than 10% from previous likelihood.

    params:
    prev_likelihood = float
    new_likelihood = float
    conv_ctr = int
    conv_ctr_cap = int

    returns:
    conv_ctr = int
    converged = boolean
    """
    increase_convergence_ctr = (abs(prev_likelihood) * 0.9 <
                                abs(new_likelihood) <
                                abs(prev_likelihood) * 1.1)

    if increase_convergence_ctr:
        conv_ctr += 1
    else:
        conv_ctr = 0

    return conv_ctr, conv_ctr > conv_ctr_cap
from random import randint
import math
from scipy.misc import logsumexp

class GaussianMixtureModel:
    """
    A Gaussian mixture model to represent a provided grayscale image.
    """

    def __init__(self, image_matrix, num_components, means=None):
        """
        Initialize a Gaussian mixture model.

        params:
        image_matrix = (grayscale) numpy.ndarray[numpy.ndarray[float]]
        num_components = int
        """
        self.image_matrix = image_matrix
        self.num_components = num_components
        if(means is None):
            self.means = [0]*num_components
        else:
            self.means = means
        self.variances = [0]*num_components
        self.mixing_coefficients = [0]*num_components

    def joint_prob(self, val):
        """Calculate the joint log probability of a grayscale value
        within the image.

        params:
        val = float

        returns:
        joint_prob = float
        """
        # TODO: finish this
        raise NotImplementedError()
        return joint_prob

    def initialize_training(self):
        """
        Initialize the training process by setting each component mean
        to a random pixel's value (without replacement), each component
        variance to 1, and each component mixing coefficient to a
        uniform value (e.g. 4 components -> [0.25, 0.25, 0.25, 0.25]).

        NOTE: this should be called before train_model() in order for
        tests to execute correctly.
        """
        # TODO: finish this
        raise NotImplementedError()

    def train_model(self, convergence_function=default_convergence):
        """
        Train the mixture model using the expectation-maximization
        algorithm. Since each Gaussian is a combination of mean and
        variance, this will fill self.means and self.variances, plus
        self.mixing_coefficients, with the values that maximize the
        overall model likelihood.

        params:
        convergence_function = function that returns True if convergence is reached
        """
        # TODO: finish this
        raise NotImplementedError()

    def segment(self):
        """
        Using the trained model, segment the image matrix into the
        pre-specified number of components. Returns the original image
        matrix with each pixel's intensity replaced with its
        max-likelihood component mean.

        returns:
        segment = numpy.ndarray[numpy.ndarray[float]]
        """
        # TODO: finish this
        raise NotImplementedError()
        return segment

    def likelihood(self):
        """Assign a log likelihood to the trained model based on the
        following formula for posterior probability:

        ln(Pr(X | mixing, mean, stdev)) =
            sum((n=1 to N), ln(sum((k=1 to K), mixing_k * N(x_n | mean_k, stdev_k))))

        returns:
        log_likelihood = float
        """
        # TODO: finish this
        raise NotImplementedError()
        return log_likelihood

    def best_segment(self, iters):
        """Determine the best segmentation of the image by repeatedly
        training the model and calculating its likelihood. Return the
        segment with the highest likelihood.

        params:
        iters = int

        returns:
        segment = numpy.ndarray[numpy.ndarray[float]]
        """
        # finish this
        raise NotImplementedError()
        return segment
def gmm_likelihood_test():
    """Testing the GMM method for calculating the overall model
    probability. Should return -364370.

    returns:
    likelihood = float
    """
    image_file = 'images/party_spock.png'
    image_matrix = image_to_matrix(image_file)
    num_components = 5
    gmm = GaussianMixtureModel(image_matrix, num_components)
    gmm.initialize_training()
    gmm.means = [0.4627451, 0.10196079, 0.027450981, 0.011764706, 0.1254902]
    likelihood = gmm.likelihood()
    return likelihood
gmm_likelihood_test()
def gmm_joint_prob_test():
    """Testing the GMM method for calculating the joint log probability
    of a given point. Should return -0.98196.

    returns:
    joint_prob = float
    """
    image_file = 'images/party_spock.png'
    image_matrix = image_to_matrix(image_file)
    num_components = 5
    gmm = GaussianMixtureModel(image_matrix, num_components)
    gmm.initialize_training()
    gmm.means = [0.4627451, 0.10196079, 0.027450981, 0.011764706, 0.1254902]
    test_val = 0.4627451
    joint_prob = gmm.joint_prob(0.4627451)
    return joint_prob
gmm_joint_prob_test()
def generate_test_mixture(data_size, means, variances, mixing_coefficients): """ Generate synthetic test data for a GMM based on fixed means, variances and mixing coefficients. params: data_size = (int) means = [float] variances = [float] mixing_coefficients = [float] returns: data = np.array[float] """ data = np.zeros(data_size).flatten() indices = np.random.choice( len(means), len(data), p=mixing_coefficients) for i in range(len(indices)): data[i] = np.random.normal(means[indices[i]], variances[indices[i]]) return np.array([data])
def gmm_train_test(): """Test the training procedure for GMM using synthetic data. returns: gmm = GaussianMixtureModel """ print( 'Synthetic example with 2 means') num_components = 2 data_range = (1,1000) actual_means = [2, 4] actual_variances = [1]*num_components actual_mixing = [.5]*num_components dataset_1 = generate_test_mixture(data_range, actual_means, actual_variances, actual_mixing) gmm = GaussianMixtureModel(dataset_1, num_components) gmm.initialize_training() # start off with faulty means gmm.means = [1,3] initial_likelihood = gmm.likelihood() gmm.train_model() final_likelihood = gmm.likelihood() likelihood_difference = final_likelihood - initial_likelihood likelihood_thresh = 250 if(likelihood_difference >= likelihood_thresh): print('Congrats! Your model\'s log likelihood improved by at least %d.'%(likelihood_thresh)) print( 'Synthetic example with 4 means:') num_components = 4 actual_means = [2,4,6,8] actual_variances = [1]*num_components actual_mixing = [.25]*num_components dataset_1 = generate_test_mixture(data_range, actual_means, actual_variances, actual_mixing) gmm = GaussianMixtureModel(dataset_1, num_components) gmm.initialize_training() # start off with faulty means gmm.means = [1,3,5,9] initial_likelihood = gmm.likelihood() gmm.train_model() final_likelihood = gmm.likelihood() # compare likelihoods likelihood_difference = final_likelihood - initial_likelihood likelihood_thresh = 200 if(likelihood_difference >= likelihood_thresh): print('Congrats! Your model\'s log likelihood improved by at least %d.'%(likelihood_thresh)) return gmm
gmm_train_test()
def gmm_segment_test(): """ Apply the trained GMM to unsegmented image and generate a segmented image. returns: segmented_matrix = numpy.ndarray[numpy.ndarray[float]] """ image_file = 'images/party_spock.png' image_matrix = image_to_matrix(image_file) num_components = 3 gmm = GaussianMixtureModel(image_matrix, num_components) gmm.initialize_training() gmm.train_model() segment = gmm.segment() segment_num_components = len(np.unique(segment)) if(segment_num_components == num_components): print('Congrats! Your segmentation produced an image '+ 'with the correct number of components.') return segment
def gmm_best_segment_test(): """ Calculate the best segment generated by the GMM and compare the subsequent likelihood of a reference segmentation. Note: this test will take a while to run. returns: best_seg = np.ndarray[np.ndarray[float]] """ image_file = 'images/party_spock.png' image_matrix = image_to_matrix(image_file) image_matrix_flat = flatten_image_matrix(image_matrix) num_components = 3 gmm = GaussianMixtureModel(image_matrix, num_components) gmm.initialize_training() iters = 10 # generate best segment from 10 iterations # and extract its likelihood best_seg = gmm.best_segment(iters) matrix_to_image(best_seg, 'images/best_segment_spock.png') best_likelihood = gmm.likelihood() # extract likelihood from reference image ref_image_file = 'images/party_spock%d_baseline.png'%(num_components) ref_image = image_to_matrix(ref_image_file, grays=True) gmm_ref = GaussianMixtureModel(ref_image, num_components) ref_vals = ref_image.flatten() ref_means = list(set(ref_vals)) ref_variances = [0]*num_components ref_mixing = [0]*num_components for i in range(num_components): relevant_vals = ref_vals[ref_vals==ref_means[i]] ref_mixing[i] = float(len(relevant_vals)) / float(len(ref_vals)) ref_variances[i] = np.mean((image_matrix_flat[ref_vals==ref_means[i]] - ref_means[i])**2) gmm_ref.means = ref_means gmm_ref.variances = ref_variances gmm_ref.mixing_coefficients = ref_mixing ref_likelihood = gmm_ref.likelihood() # compare best likelihood and reference likelihood likelihood_diff = best_likelihood - ref_likelihood likelihood_thresh = 1e4 if(likelihood_diff >= likelihood_thresh): print('Congrats! Your image segmentation is an improvement over ' + 'the baseline by at least %.2f.'%(likelihood_thresh)) return best_seg
best_segment = gmm_best_segment_test()
matrix_to_image(best_segment, 'best_segment.png')
20 points
We'll now experiment with a few methods for improving GMM performance.
12.5 points
To run EM in our baseline Gaussian mixture model, we use random initialization to determine the initial values for our component means. We can do better than this!
Fill in the below GaussianMixtureModelImproved.initialize_training()
with an improvement in component initialization. Please don't use any external packages for anything other than basic calculations (e.g. scipy.misc.logsumexp
). Note that your improvement might significantly slow down runtime, although we don't expect you to spend more than 10 minutes on initialization.
Hint: you'll probably want an unsupervised learning method to initialize your component means. Clustering is one useful example of unsupervised learning, and you may want to look at 1-dimensional methods such as Jenks natural breaks optimization.
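For intuition only, one possible shape for such an initialization is a simple 1-D k-means pass over the grayscale pixel values. This is a hypothetical sketch of one approach, not the required solution; Jenks natural breaks or another clustering method would work just as well.

import numpy as np

def kmeans_1d_centers(values, k, iters=20):
    # values: 1-D array of flattened grayscale pixel values; returns k cluster centers.
    centers = np.random.choice(values, size=k, replace=False)
    for _ in range(iters):
        # Assign each value to its nearest center, then move each center to its cluster mean.
        assignments = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            members = values[assignments == j]
            if members.size > 0:
                centers[j] = members.mean()
    return centers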
class GaussianMixtureModelImproved(GaussianMixtureModel):
    """A Gaussian mixture model for a provided grayscale image,
    with improved training performance."""

    def initialize_training(self):
        """
        Initialize the training process by setting each component mean
        using some algorithm that you think might give better means to
        start with, each component variance to 1, and each component
        mixing coefficient to a uniform value
        (e.g. 4 components -> [0.25, 0.25, 0.25, 0.25]).
        [You can feel free to modify the variance and mixing coefficient
        initializations too if that works well.]
        """
        # TODO: finish this
        raise NotImplementedError()
def gmm_improvement_test(): """ Tests whether the new mixture model is actually an improvement over the previous one: if the new model has a higher likelihood than the previous model for the provided initial means. returns: original_segment = numpy.ndarray[numpy.ndarray[float]] improved_segment = numpy.ndarray[numpy.ndarray[float]] """ image_file = 'images/party_spock.png' image_matrix = image_to_matrix(image_file) num_components = 3 initial_means = [0.4627451, 0.20392157, 0.36078432] # first train original model with fixed means gmm = GaussianMixtureModel(image_matrix, num_components) gmm.initialize_training() gmm.means = np.copy(initial_means) gmm.train_model() original_segment = gmm.segment() original_likelihood = gmm.likelihood() # then train improved model gmm_improved = GaussianMixtureModelImproved(image_matrix, num_components) gmm_improved.initialize_training() gmm_improved.train_model() improved_segment = gmm_improved.segment() improved_likelihood = gmm_improved.likelihood() # then calculate likelihood difference diff_thresh = 1e3 likelihood_diff = improved_likelihood - original_likelihood if(likelihood_diff >= diff_thresh): print('Congrats! Improved model scores a likelihood that was at ' + 'least %d higher than the original model.'%(diff_thresh)) return original_segment, improved_segment
best_segment, best_segment_improved = gmm_improvement_test()
matrix_to_image(best_segment, 'best_segment_original.png')
matrix_to_image(best_segment_improved, 'best_segment_improved.png')
7.5 points
You might be skeptical of the convergence criterion we've provided in default_convergence()
. To test out another convergence condition, implement new_convergence_condition()
to return true if all the new model parameters (means, variances, and mixing coefficients) are within 10% of the previous variables for 10 consecutive iterations. This will mean re-implementing train_model()
, which you will also do below in GaussianMixtureModelConvergence
.
You can compare the two convergence functions in convergence_condition_test()
.
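The elementwise "within 10%" check can mirror the provided default_convergence() logic. One vectorized way to express it, assuming each variable group is packed as a numpy array, is sketched below:

import numpy as np

def within_ten_percent(prev, new):
    # True when every entry of `new` lies within 10% of the corresponding entry of `prev`.
    prev, new = np.abs(np.asarray(prev)), np.abs(np.asarray(new))
    return np.all((prev * 0.9 < new) & (new < prev * 1.1))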
def new_convergence_function(previous_variables, new_variables, conv_ctr, conv_ctr_cap=10):
    """
    Convergence function based on parameters: when all variables vary by
    less than 10% from the previous iteration's variables, increase the
    convergence counter.

    params:
    previous_variables = [numpy.ndarray[float]] containing [means, variances, mixing_coefficients]
    new_variables = [numpy.ndarray[float]] containing [means, variances, mixing_coefficients]
    conv_ctr = int
    conv_ctr_cap = int

    return:
    conv_ctr = int
    converged = boolean
    """
    # TODO: finish this function
    raise NotImplementedError()
    return conv_ctr, converged
class GaussianMixtureModelConvergence(GaussianMixtureModel):
    """
    Class to test the new convergence function in the same GMM model as before.
    """

    def train_model(self, convergence_function=new_convergence_function):
        # TODO: finish this function
        raise NotImplementedError()
def convergence_condition_test(): """ Compare the performance of the default convergence function with the new convergence function. return: default_convergence_likelihood = float new_convergence_likelihood = float """ image_file = 'images/party_spock.png' image_matrix = image_to_matrix(image_file) num_components = 3 initial_means = [0.4627451, 0.10196079, 0.027450981] # first test original model gmm = GaussianMixtureModel(image_matrix, num_components) gmm.initialize_training() gmm.means = np.copy(initial_means) gmm.train_model() default_convergence_likelihood = gmm.likelihood() # now test new convergence model gmm_new = GaussianMixtureModelConvergence(image_matrix, num_components) gmm_new.initialize_training() gmm_new.means = np.copy(initial_means) gmm_new.train_model() new_convergence_likelihood = gmm_new.likelihood() # test convergence difference convergence_diff = new_convergence_likelihood - default_convergence_likelihood convergence_thresh = 200 if(convergence_diff >= convergence_thresh): print('Congrats! The likelihood difference between the original ' + 'and the new convergence models should be at least %.2f'%(convergence_thresh)) return default_convergence_likelihood, new_convergence_likelihood
20 points
In our previous solutions, our only criterion for choosing a model was whether it maximizes the posterior likelihood regardless of how many parameters this requires. As a result, the "best" model may simply be the model with the most parameters, which would be overfit to the training data.
To avoid overfitting, we can use the Bayesian information criterion (BIC), which penalizes models based on the number of parameters they use. In the case of the Gaussian mixture model, the parameter count is the number of components times the number of variables per component (mean, variance, and mixing coefficient), i.e. 3 * (number of components).
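For reference, the standard form of the criterion is BIC = ln(n) * k - 2 * ln(L), where n is the number of data points (pixels), k is the number of free parameters, and ln(L) is the model's log likelihood. A minimal sketch under that convention is shown below; your graded bayes_info_criterion() should of course match whatever the provided test expects.

import numpy as np

def bic_sketch(log_likelihood, num_components, num_points):
    # BIC = ln(n) * k - 2 * ln(L), with k = 3 parameters per component
    num_params = 3 * num_components
    return np.log(num_points) * num_params - 2 * log_likelihood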
5 points
Implement bayes_info_criterion()
to calculate the BIC of a trained GaussianMixtureModel
.
def bayes_info_criterion(gmm):
    # TODO: finish this function
    raise NotImplementedError()
    return BIC
def bayes_info_test(): """ Test for your implementation of BIC on fixed GMM values. Should be about 727045. returns: BIC = float """ image_file = 'images/party_spock.png' image_matrix = image_to_matrix(image_file) num_components = 3 initial_means = [0.4627451, 0.10196079, 0.027450981] gmm = GaussianMixtureModel(image_matrix, num_components) gmm.initialize_training() gmm.means = np.copy(initial_means) BIC = bayes_info_criterion(gmm) return BIC
15 points
Now implement BIC_likelihood_model_test()
, in which you will use the BIC and likelihood to determine the optimal number of components in the Party Spock image. Use the original GaussianMixtureModel
for your models. Iterate from k=2 to k=7 and use the provided means to train a model that minimizes its BIC and a model that maximizes its likelihood.
Then, fill out BIC_likelihood_question()
to return the number of components in both the min-BIC and the max-likelihood model.
def BIC_likelihood_model_test():
    """Test to compare the models with the lowest BIC
    and the highest likelihood.

    returns:
    min_BIC_model = GaussianMixtureModel
    max_likelihood_model = GaussianMixtureModel
    """
    # TODO: finish this method
    raise NotImplementedError()
    comp_means = [
        [0.023529412, 0.1254902],
        [0.023529412, 0.1254902, 0.20392157],
        [0.023529412, 0.1254902, 0.20392157, 0.36078432],
        [0.023529412, 0.1254902, 0.20392157, 0.36078432, 0.59215689],
        [0.023529412, 0.1254902, 0.20392157, 0.36078432, 0.59215689, 0.71372563],
        [0.023529412, 0.1254902, 0.20392157, 0.36078432, 0.59215689, 0.71372563, 0.964706]
    ]
    return min_BIC_model, max_likelihood_model
def BIC_likelihood_question():
    """
    Choose the best number of components for each metric
    (min BIC and maximum likelihood).

    returns:
    pairs = dict
    """
    # TODO: fill in bic and likelihood
    raise NotImplementedError()
    bic = 0
    likelihood = 0
    pairs = {
        'BIC': bic,
        'likelihood': likelihood
    }
    return pairs
2 points
A crucial part of machine learning is working with very large datasets. As stated before, using for-loops over these datasets can make the code take many hours, or even several days, to run. Even vectorized code can be slow if written carelessly, so there are certain tricks you can use to make your code run as fast as possible.
For this part of the assignment, you will need to implement part of a k-Means algorithm. You are given two arrays - points_array
with X n-dimensional points, and means_array
with Y n-dimensional points. You will need to return an X x Y array containing the distances from each point in points_array
to each point in means_array
.
If your implementation returns the correct answer in time comparable to Murtaza's implementation, you will receive 2 bonus points.
For reference, the data used is on the order of thousands of points and hundreds of means, and Bonnie automatically kills a grading script that uses more than 500MB of memory. Please test locally before submitting, as you may lose a submission to an inefficient solution; a memory-hungry implementation is very likely to run out of memory on the grader.
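As a generic illustration of broadcasting (not necessarily the fastest or most memory-efficient approach; the (X, Y, n) intermediate array below can itself be large, and the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b avoids materializing it):

import numpy as np

def pairwise_distances_sketch(points_array, means_array):
    # points_array: (X, n), means_array: (Y, n) -> (X, Y) Euclidean distances
    diffs = points_array[:, None, :] - means_array[None, :, :]  # (X, Y, n) via broadcasting
    return np.sqrt((diffs ** 2).sum(axis=2))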
def bonus(points_array, means_array):
    """
    Return the distance from every point in points_array
    to every point in means_array.

    returns:
    dists = numpy array of float
    """
    # TODO: fill in the bonus function
    # REMOVE THE LINE BELOW IF ATTEMPTING BONUS
    raise NotImplementedError()
    return dists
def bonus_test():
    points = np.array([[ 0.9059608,0.67550357,0.13525533],[ 0.23656114,0.63624466,0.3606615 ],[ 0.91163215,0.24431103,0.33318504],[ 0.25209736,0.24600123,0.42392935],[ 0.62799146,0.04520208,0.55232494],[ 0.5588561, 0.06397713,0.53465371],[ 0.82530045,0.62811624,0.79672349],[ 0.50048147,0.13215356,0.54517893],[ 0.84725662,0.71085917,0.61111105],[ 0.25236734,0.25951904,0.70239158]])
    means = np.array([[ 0.39874413,0.47440682,0.86140829],[ 0.05671347,0.26599323,0.33577454],[ 0.7969679, 0.44920099,0.37978416],[ 0.45428452,0.51414022,0.21209852],[ 0.7112214, 0.94906158,0.25496493]])
    expected_answer = np.array([[ 0.90829883,0.9639127, 0.35055193,0.48575144,0.35649377],[ 0.55067427,0.41237201,0.59110637,0.29048911,0.57821151],[ 0.77137409,0.8551975, 0.23937264,0.54464354,0.73685561],[ 0.51484192,0.21528078,0.58320052,0.39705222,0.85652654],[ 0.57645778,0.64961631,0.47067874,0.60483973,0.95515036],[ 0.54850426,0.57663736,0.47862222,0.56358129,0.94064631],[ 0.45799673,0.966609,0.45458971,0.70173336,0.63993928],[ 0.47695785,0.50861901,0.46451987,0.50891112,0.89217387],[ 0.56543953,0.94798437,0.35285421,0.59357932,0.4495398 ],[ 0.30477736,0.41560848,0.66079087,0.58820896,0.94138546]])
    if np.allclose(expected_answer, bonus(points, means), 1e-7):
        print('You returned the correct distances.')
    else:
        print('Your distance calculation is incorrect.')

bonus_test()
You're done with the requirements! Hope you have completed the functions in the mixture_models.py file and tested everything!