Math 564 — Predicting Happiness!

The city of Somerville, Massachusetts (a suburb of Boston) regularly asks its residents to complete a survey regarding the quality of life as it can be influenced by city policy and infrastructure. The survey also asks residents how happy they are living in Somerville. Recently, the survey has expanded to include more questions and more demographic data. We will consider a subset of the data to see whether survey question responses can be used to predict the happiness score. The full data set can be obtained from the city website [1]. The data subset we will consider is provided on the course Canvas page as the file happiness.csv. This data set contains the 3669 survey responses from the years 2019 through 2023. The data file has 10 columns (labeled A through J) with the following descriptions:

A : Ward/Neighborhood
B : Happiness
C : Beauty of Neighborhood
D : Convenience of Getting Around
E : Housing Condition
F : Street and Sidewalk Maintenance
G : Public Schools
H : Police Department
I : Community Events
J : City Services Information

The data in each column is a positive integer with the following descriptions.

column   description
A        categorical ward designation with values 1-7
B        happiness index: 1 = very unhappy, 2 = unhappy, 3 = neutral, 4 = happy, 5 = very happy
C-J      satisfaction index: 1 = very unsatisfied, 2 = unsatisfied, 3 = neutral, 4 = satisfied, 5 = very satisfied

We consider a classification problem in which the happiness (value in column B) is to be predicted from the satisfactions (values in columns C through J). The goal of this project is to construct and test one or more "feed-forward neural networks" (FFNN) that can be used to address the classification problem. A FFNN is an example of a classifier function $y_k = f(x_k; w)$, where $x_k \in \mathbb{R}^p$ is a feature vector of satisfaction values associated with person $k$, $y_k \in \mathbb{R}$ is the happiness value for person $k$, and $w \in \mathbb{R}^m$ is a parameter vector.
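Task 1 below asks for a function that reads, imputes, and normalizes this data. As one possible sketch (the column layout and the helper name `load_happiness` are assumptions, and median imputation is only one defensible choice):

```python
import numpy as np
import pandas as pd

def load_happiness(source):
    """Read the survey table, impute missing values, and normalize.

    Assumed layout (adjust to the real happiness.csv): column 0 is the
    ward (A), column 1 the happiness index (B), and columns 2-9 the
    eight satisfaction indices (C-J), all integers in 1..5.
    """
    df = source if isinstance(source, pd.DataFrame) else pd.read_csv(source)
    sat = df.iloc[:, 2:10].astype(float)
    # One simple, defensible imputation: replace a missing satisfaction
    # score with the median response for that question.
    sat = sat.fillna(sat.median())
    # Map the 1..5 scale onto [0, 1] so all features share one range,
    # comparable with the (0, 1) range of the sigmoid layers.
    X = (sat.to_numpy() - 1.0) / 4.0
    y = df.iloc[:, 1].to_numpy()
    return X, y
```

Whatever imputation rule is chosen (median, mode, per-ward mean, or dropping incomplete rows), the project asks for a justification of that choice.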
That is, we assume that a good classifier is among the family of functions specified by a particular parameter choice. We wish to solve
\[
\min_{w} \; F(w) = \frac{1}{2} \sum_{k=1}^{n} \bigl(y_k - f(x_k; w)\bigr)^2 .
\]
A FFNN can be represented as
\[
f(x) = \sigma\bigl(W_d\,\sigma(\cdots \sigma(W_2\,\sigma(W_1 x))\cdots)\bigr),
\]
where $W_j$ is an $r_j \times c_j$ matrix (with $c_1 = p$, $c_{j+1} = r_j$, and $r_d = 1$) and $\sigma$ is the (elementwise) sigmoid function
\[
\sigma(u) = \frac{1}{1 + \exp(-u)} .
\]
The parameters of the function are contained in the weight matrices $W_1, \ldots, W_d$; that is, $m = \sum_{j=1}^{d} r_j c_j$. The gradient of $f$ can be efficiently computed using the method of backpropagation. Let $G_j$ be the matrix of gradient values associated elementwise with the weights in $W_j$. Then we have the forward computation
\[
L_1 = \sigma(W_1 x), \quad L_2 = \sigma(W_2 L_1), \quad \ldots, \quad L_d = \sigma(W_d L_{d-1}), \quad F = \tfrac{1}{2}\,\|L_d - y\|^2
\]
and the gradient computation (using the multivariate chain rule):
\begin{align*}
h_d &= (L_d - y)\,L_d(1 - L_d), & G_d &= h_d L_{d-1}^{T} \\
h_{d-1} &= W_d^{T} h_d\, L_{d-1}(1 - L_{d-1}), & G_{d-1} &= h_{d-1} L_{d-2}^{T} \\
&\;\;\vdots \\
h_2 &= W_3^{T} h_3\, L_2(1 - L_2), & G_2 &= h_2 L_1^{T} \\
h_1 &= W_2^{T} h_2\, L_1(1 - L_1), & G_1 &= h_1 x^{T}
\end{align*}
In this derivation we have used the fact that $\sigma'(u) = \sigma(u)(1 - \sigma(u))$; the products $L_j(1 - L_j)$ are taken elementwise. Constructing the problem in terms of the matrices $W_j$ and $G_j$ provides convenient notation. However, it must be remembered that the parameter vector $w \in \mathbb{R}^m$ is constructed from the collective, ordered elements of $W_1, W_2, \ldots, W_d$. Similarly, the gradient vector $g \in \mathbb{R}^m$ is constructed from the identically ordered elements of $G_1, G_2, \ldots, G_d$. One interesting point is that if one employs the identity function instead of the sigmoid (or other nonlinear monotonic function), then the entire transformation is linear, with both the forward and backward computations collapsing into a single matrix product. It can be helpful to visualize the FFNN as a computational directed acyclic graph; the forward computation proceeds left to right.

[1] https://www.somervillema.gov/HappinessSurvey
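The recurrences above translate almost line for line into NumPy. A minimal sketch for a single input vector (function names are illustrative; $L_0 = x$ by convention, and all products involving $L_j(1-L_j)$ are elementwise):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(Ws, x):
    """Forward sweep: returns the activations L1, ..., Ld."""
    Ls, a = [], x
    for W in Ws:
        a = sigmoid(W @ a)
        Ls.append(a)
    return Ls

def loss_and_gradients(Ws, x, y):
    """Loss F = 0.5*||Ld - y||^2 and backprop matrices G1, ..., Gd."""
    Ls = forward(Ws, x)
    F = 0.5 * np.sum((Ls[-1] - y) ** 2)
    Gs = [None] * len(Ws)
    # h_d = (L_d - y) * L_d * (1 - L_d), elementwise
    h = (Ls[-1] - y) * Ls[-1] * (1.0 - Ls[-1])
    for j in range(len(Ws) - 1, -1, -1):
        prev = Ls[j - 1] if j > 0 else x   # L_0 is the input x
        Gs[j] = np.outer(h, prev)          # G_j = h_j L_{j-1}^T
        if j > 0:
            # h_{j-1} = W_j^T h_j * L_{j-1} * (1 - L_{j-1})
            h = (Ws[j].T @ h) * prev * (1.0 - prev)
    return F, Gs
```

A useful sanity check is to compare one entry of $G_1$ against a finite-difference approximation of $\partial F / \partial (W_1)_{ij}$ before trusting the gradients in an optimizer.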
[Figure: computational graph of an example FFNN, read left to right: input layer $x$; hidden layer 1, $L_1 = \sigma(W_1 x)$; hidden layer 2, $L_2 = \sigma(W_2 L_1)$; output layer $L_3 = \sigma(W_3 L_2)$; loss function $F$.]

In this example, the network contains two hidden layers and uses three weight matrices: $W_1$ is a $4 \times 5$ matrix, $W_2$ is a $3 \times 4$ matrix, and $W_3$ is a $1 \times 3$ matrix. The computations within each layer are the elementwise sigmoid "activation" functions. The input $x$ can be a single vector of (five, in this example) attributes, or it can be an array (five by $m$) of the attributes of $m$ individuals.

Remember that the network output lies in $(0, 1)$. So, if the output is to distinguish between $h$ ordered possibilities, then the exact-match target values can be assigned as
\[
\frac{1}{h+1}, \; \frac{2}{h+1}, \; \ldots, \; \frac{h}{h+1} .
\]
This choice leads to stability in the weight determination: the output is never driven toward the unattainable values of zero or one.

Complete the following tasks.

1. Construct a function which reads the Somerville data set, fills in missing data with justification, and normalizes the data appropriately.
2. Construct a function that computes, for a general, user-specified FFNN, the loss function for a classification problem and the (backpropagation) gradient. That is, construct a function for which the user selects the number of inputs and the number and sizes of the hidden layers.
3. Choose a method for selecting the training and test data sets.
4. Solve the classification problem using a FFNN which includes all eight input quantities (satisfaction values), two hidden layers of 12 and 10 nodes, and a single output layer. Test various optimization methods using the code you have developed.
5. Explore other choices of FFNN construction.
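The pieces above fit together roughly as follows. This sketch uses synthetic stand-in data (a real run would load happiness.csv instead), the task-4 architecture ($8 \to 12 \to 10 \to 1$), the $b/(h+1)$ target encoding with $h = 5$, a random 80/20 train/test split, and plain fixed-step gradient descent as just one of the optimization methods you might test:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def loss_and_grads(Ws, X, Y):
    """Batched loss F = 0.5*sum||Ld - Y||^2 and gradients G1, ..., Gd."""
    acts = [X]                                  # acts[0] plays the role of x
    for W in Ws:
        acts.append(sigmoid(W @ acts[-1]))
    F = 0.5 * np.sum((acts[-1] - Y) ** 2)
    Gs = [None] * len(Ws)
    H = (acts[-1] - Y) * acts[-1] * (1.0 - acts[-1])
    for j in range(len(Ws) - 1, -1, -1):
        Gs[j] = H @ acts[j].T                   # G_j = h_j L_{j-1}^T, batched
        if j > 0:
            H = (Ws[j].T @ H) * acts[j] * (1.0 - acts[j])
    return F, Gs

# Synthetic stand-in for happiness.csv: 8 satisfaction scores in 1..5
# per respondent, with a happiness label loosely tied to their mean.
n = 400
S = rng.integers(1, 6, size=(8, n)).astype(float)
b = np.clip(np.rint(S.mean(axis=0)), 1, 5)

X = (S - 1.0) / 4.0                  # features normalized to [0, 1]
Y = (b / 6.0).reshape(1, n)          # targets b/(h+1) with h = 5

idx = rng.permutation(n)             # simple random 80/20 split
tr, te = idx[:320], idx[320:]

# Task-4 architecture: 8 inputs -> 12 -> 10 -> 1 output
Ws = [0.5 * rng.standard_normal(s) for s in [(12, 8), (10, 12), (1, 10)]]

lr = 5e-4                            # fixed gradient-descent step size
F0, _ = loss_and_grads(Ws, X[:, tr], Y[:, tr])
for _ in range(300):
    _, Gs = loss_and_grads(Ws, X[:, tr], Y[:, tr])
    Ws = [W - lr * G for W, G in zip(Ws, Gs)]
F_train, _ = loss_and_grads(Ws, X[:, tr], Y[:, tr])
F_test, _ = loss_and_grads(Ws, X[:, te], Y[:, te])
```

Comparing `F_train` against the held-out `F_test` is one way to judge whether a given architecture or optimizer choice generalizes; tasks 4 and 5 ask you to repeat such comparisons across several optimization methods and network shapes.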