{"id":3261,"date":"2021-02-23T14:00:14","date_gmt":"2021-02-23T14:00:14","guid":{"rendered":"https:\/\/thenextweb.com\/?p=1340172"},"modified":"2021-02-23T14:00:14","modified_gmt":"2021-02-23T14:00:14","slug":"how-to-turn-your-dogs-nap-time-into-a-regularized-linear-model","status":"publish","type":"post","link":"https:\/\/www.londonchiropracter.com\/?p=3261","title":{"rendered":"How to turn your dog\u2019s nap time into a regularized linear model"},"content":{"rendered":"\n<p id=\"d82a\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>Looking at this nap duration model,<span>&nbsp;<\/span><em class=\"mb\">Beta 0<\/em><span>&nbsp;<\/span>is the<span>&nbsp;<\/span><strong class=\"jx gr\">intercept<\/strong>, the value the target takes when all features are equal to zero.<\/p>\n<p id=\"0dbe\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>The remaining betas are the unknown<span>&nbsp;<\/span><strong class=\"jx gr\">coefficients<\/strong><span>&nbsp;<\/span>which, along with the intercept, are the missing pieces of the model. 
You can observe the outcome of the combination of the different features, but you don\u2019t know all the details about how each feature impacts the target.<\/p>\n<p id=\"960e\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>Once you determine the value for each coefficient, you know the direction, either positive or negative, and the magnitude of the impact each feature has on the target.<\/p>\n<p id=\"7b9e\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>With a linear model, you\u2019re assuming all features are independent of each other so, for instance, the fact that you got a delivery doesn\u2019t have any impact on how many treats your dog gets in a day.<\/p>\n<p id=\"b83b\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>Additionally, you think there\u2019s a linear relationship between the features and the target.<\/p>\n<p id=\"6dd0\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>So, on the days you get to play more with your dog, they\u2019ll get more tired and will want to nap for longer. Or, on days when there are no squirrels outside, your dog won\u2019t need to nap as much, because they didn\u2019t spend as much energy staying alert and keeping an eye on the squirrels\u2019 every move.<\/p>\n<h2 class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\">For how long will your dog nap tomorrow?<\/h2>\n<p id=\"fa12\" class=\"jv jw gg jx b hp lw jz ka hs lx kc kd ke ly kg kh ki lz kk kl km ma ko kp kq fz hn\" data-selectable-paragraph>With the general idea of the model in your mind, you collected data for a few days. 
Now you have real observations of the features and the target of your model.<\/p>\n<figure class=\"nc nd ne nf ng jd fs ft paragraph-image\">\n<div class=\"nj nk am nl v nm\" tabindex=\"0\" role=\"button\">\n<div class=\"fs ft no\">\n<div class=\"jk s am jl\">\n<div class=\"np jn s\">\n<figure class=\"post-image post-mediaBleed aligncenter\"><img decoding=\"async\" loading=\"lazy\" class=\"abw abx ep fd ez jg v c lazy\" src=\"https:\/\/miro.medium.com\/max\/2788\/1*NglMC1S_uGLfDL9zKrR2JQ.jpeg\" alt=\"Image for post\" width=\"1394\" height=\"524\" data-lazy=\"true\"><figcaption>A few days\u2019 worth of observations of features and targets for your dog\u2019s nap duration.<\/figcaption><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"5512\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>But there are still a few critical pieces missing: the coefficient values and the intercept.<\/p>\n<p id=\"cc44\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>One of the most popular methods to find the coefficients of a linear model is<span>&nbsp;<\/span><a class=\"eh kr\" href=\"https:\/\/en.wikipedia.org\/wiki\/Ordinary_least_squares\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">Ordinary Least Squares<\/a>.<\/p>\n<p id=\"df66\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>The premise of Ordinary Least Squares (OLS) is that you\u2019ll pick the coefficients that minimize the<span>&nbsp;<\/span><a class=\"eh kr\" href=\"https:\/\/en.wikipedia.org\/wiki\/Residual_sum_of_squares\" rel=\"nofollow noopener noreferrer\" target=\"_blank\"><strong class=\"jx gr\">residual sum of squares<\/strong><\/a>, i.e., the total squared difference between your predictions and the observed data[1].<\/p>\n<figure class=\"nc nd ne nf ng jd fs ft paragraph-image\">\n<div class=\"fs ft nq\">\n<div class=\"jk s am jl\">\n<div class=\"nr jn s\"><img decoding=\"async\" loading=\"lazy\" class=\"abw abx ep fd ez jg v c lazy\" src=\"https:\/\/miro.medium.com\/max\/934\/1*D_NEmi62E_UYf240_rGtQg.jpeg\" sizes=\"467px\" alt=\"Image for post\" width=\"467\" height=\"211\" data-lazy=\"true\" data-srcset=\"https:\/\/miro.medium.com\/max\/552\/1*D_NEmi62E_UYf240_rGtQg.jpeg 276w, https:\/\/miro.medium.com\/max\/934\/1*D_NEmi62E_UYf240_rGtQg.jpeg 
467w\"><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"fec6\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>With the residual sum of squares, not all residuals are treated equally. You want to make an example of the times when the model generated predictions too far off from the observed values.<\/p>\n<p id=\"394b\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>It\u2019s not so much about the prediction being too far above or below the observed value, but the magnitude of the error. You square the residuals and penalize the predictions that are too far off while making sure you\u2019re only dealing with positive values.<\/p>\n<blockquote class=\"mc\" readability=\"8\">\n<p id=\"84cd\" class=\"md me gg az mf mg mh mi mj mk ml kq dt\" data-selectable-paragraph>With residual sum of squares, it\u2019s not so much about the prediction being too far above or below the observed value, but the magnitude of that error.<\/p>\n<\/blockquote>\n<p id=\"4059\" class=\"jv jw gg jx b hp mm jz ka hs mn kc kd ke mo kg kh ki mp kk kl km mq ko kp kq fz hn\" data-selectable-paragraph>This way, when RSS is zero, it really means the prediction and observed values are equal, and it\u2019s not just the by-product of arithmetic.<\/p>\n<p id=\"4945\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>In Python, you can use<span>&nbsp;<\/span><a class=\"eh kr\" href=\"https:\/\/scikit-learn.org\/stable\/modules\/linear_model.html\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">scikit-learn<\/a><span>&nbsp;<\/span>to fit a linear model to the data using Ordinary Least Squares.<\/p>\n<p id=\"7d31\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>Since you want to test the model with data it was not trained on, 
you hold out a percentage of your original dataset as a test set. In this case, the test set is made up of 20% of the original dataset, sampled at random.<\/p>\n<pre class=\"nc nd ne nf ng ns nt nu\"><figure class=\"post-image post-mediaBleed alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-1340176 lazy\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-12.56.38.png\" alt width=\"884\" height=\"996\" sizes=\"(max-width: 884px) 100vw, 884px\" data-lazy=\"true\" data-srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-12.56.38.png 884w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-12.56.38-186x210.png 186w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-12.56.38-240x270.png 240w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-12.56.38-120x135.png 120w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-12.56.38-796x897.png 796w\"><\/figure><figure class=\"post-image post-mediaBleed alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-1340177 lazy\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-12.57.14.png\" alt width=\"886\" height=\"926\" sizes=\"(max-width: 886px) 100vw, 886px\" data-lazy=\"true\" data-srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-12.57.14.png 890w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-12.57.14-201x210.png 201w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-12.57.14-258x270.png 258w, 
https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-12.57.14-129x135.png 129w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-12.57.14-796x832.png 796w\"><\/figure><\/pre>\n<p id=\"c972\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>After fitting a linear model to the training set, you can check its characteristics.<\/p>\n<figure class=\"nc nd ne nf ng jd fs ft paragraph-image\">\n<div class=\"fs ft of\">\n<div class=\"jk s am jl\">\n<div class=\"og jn s\"><img decoding=\"async\" loading=\"lazy\" class=\"abw abx ep fd ez jg v c lazy\" src=\"https:\/\/miro.medium.com\/max\/1136\/1*ioZ3MpulZ2QoPx8YOX1pTg.jpeg\" sizes=\"568px\" alt=\"Image for post\" width=\"568\" height=\"206\" data-lazy=\"true\" data-srcset=\"https:\/\/miro.medium.com\/max\/552\/1*ioZ3MpulZ2QoPx8YOX1pTg.jpeg 276w, https:\/\/miro.medium.com\/max\/1104\/1*ioZ3MpulZ2QoPx8YOX1pTg.jpeg 552w, https:\/\/miro.medium.com\/max\/1136\/1*ioZ3MpulZ2QoPx8YOX1pTg.jpeg 568w\"><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"848f\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>The coefficients and the intercept are the last pieces you needed to define your model and make predictions. 
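The fitting step can be sketched in scikit-learn as follows. This is a self-contained illustration, not the article's actual code: the dataset is synthetic, and the feature columns (play time, treats, deliveries, squirrels) are assumed stand-ins for your real observations.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the nap dataset: the feature names, values, and the
# underlying relationship are assumptions for illustration only.
rng = np.random.default_rng(42)
n_days = 60
X = np.column_stack([
    rng.integers(0, 120, n_days),  # minutes of play
    rng.integers(0, 10, n_days),   # treats received
    rng.integers(0, 2, n_days),    # delivery arrived that day (0/1)
    rng.integers(0, 6, n_days),    # squirrels spotted
]).astype(float)
y = (30 + 0.5 * X[:, 0] + 2 * X[:, 1] + 5 * X[:, 2] + 3 * X[:, 3]
     + rng.normal(0, 5, n_days))

# Hold out 20% of the observations as a test set, sampled at random.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

ols = LinearRegression().fit(X_train, y_train)
print("intercept:   ", ols.intercept_)
print("coefficients:", ols.coef_)  # one entry per feature, in column order
```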
The coefficients in the output array follow the order of the features in the dataset, so your model can be written as:<\/p>\n<figure class=\"nc nd ne nf ng jd fs ft paragraph-image\">\n<div class=\"nj nk am nl v nm\" tabindex=\"0\" role=\"button\">\n<div class=\"fs ft oh\">\n<div class=\"jk s am jl\">\n<div class=\"oi jn s\"><img decoding=\"async\" loading=\"lazy\" class=\"abw abx ep fd ez jg v c lazy\" src=\"https:\/\/miro.medium.com\/max\/2494\/1*BsKc_S0-a3ptUdeISX8UPA.jpeg\" sizes=\"700px\" alt=\"Image for post\" width=\"1247\" height=\"48\" data-lazy=\"true\" data-srcset=\"https:\/\/miro.medium.com\/max\/552\/1*BsKc_S0-a3ptUdeISX8UPA.jpeg 276w, https:\/\/miro.medium.com\/max\/1104\/1*BsKc_S0-a3ptUdeISX8UPA.jpeg 552w, https:\/\/miro.medium.com\/max\/1280\/1*BsKc_S0-a3ptUdeISX8UPA.jpeg 640w, https:\/\/miro.medium.com\/max\/1400\/1*BsKc_S0-a3ptUdeISX8UPA.jpeg 700w\"><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"adc6\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>It\u2019s also useful to compute a few metrics to evaluate the quality of the model.<\/p>\n<p id=\"bf3d\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph><a class=\"eh kr\" href=\"https:\/\/en.wikipedia.org\/wiki\/Coefficient_of_determination\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">R-squared<\/a>, also called the coefficient of determination, gives a sense of how good the model is at describing the patterns in the training data, and has values ranging from 0 to 1. 
It shows how much of the variability in the target is explained by the features[1].<\/p>\n<p id=\"b4d4\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>For instance, if you\u2019re fitting a linear model to the data but there\u2019s no linear relationship between target and features, R-squared is going to be very close to zero.<\/p>\n<p id=\"99e0\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>Bias and variance are metrics that help balance the two sources of error a model can have:<\/p>\n<ul class>\n<li id=\"c9d6\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq ms kt ku hn\" data-selectable-paragraph>Bias relates to the training error, i.e., the error from predictions on the training set.<\/li>\n<li id=\"ef05\" class=\"jv jw gg jx b hp kv jz ka hs kw kc kd ke kx kg kh ki ky kk kl km kz ko kp kq ms kt ku hn\" data-selectable-paragraph>Variance relates to the generalization error, the error from predictions on the test set.<\/li>\n<\/ul>\n<p id=\"08ff\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>This linear model has a relatively high variance. 
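Here's a minimal sketch of how these metrics can be computed with scikit-learn, using made-up stand-in data: R-squared on the training set, plus the training and test errors as rough proxies for bias and variance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Made-up stand-in data; the real nap observations are not reproduced here.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 30 + X @ np.array([0.5, 2.0, 5.0, 3.0]) + rng.normal(0, 5, size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

r2 = model.score(X_train, y_train)  # share of target variability explained
train_error = mean_squared_error(y_train, model.predict(X_train))  # bias side
test_error = mean_squared_error(y_test, model.predict(X_test))     # variance side
print(f"R-squared: {r2:.3f}, train MSE: {train_error:.1f}, test MSE: {test_error:.1f}")
```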
Let\u2019s use regularization to reduce the variance while trying to keep bias as low as possible.<\/p>\n<h2>Model regularization<\/h2>\n<p id=\"f69a\" class=\"jv jw gg jx b hp lw jz ka hs lx kc kd ke ly kg kh ki lz kk kl km ma ko kp kq fz hn\" data-selectable-paragraph>Regularization is a set of techniques that improve a linear model in terms of:<\/p>\n<ul class>\n<li id=\"0f84\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq ms kt ku hn\" data-selectable-paragraph>Prediction accuracy, by reducing the variance of the model\u2019s predictions.<\/li>\n<li id=\"08d6\" class=\"jv jw gg jx b hp kv jz ka hs kw kc kd ke kx kg kh ki ky kk kl km kz ko kp kq ms kt ku hn\" data-selectable-paragraph>Interpretability, by<span>&nbsp;<\/span><em class=\"mb\">shrinking<\/em><span>&nbsp;<\/span>or reducing to zero the coefficients that are not as relevant to the model[2].<\/li>\n<\/ul>\n<p id=\"6b92\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>With Ordinary Least Squares, you want to minimize the Residual Sum of Squares (RSS).<\/p>\n<figure class=\"nc nd ne nf ng jd fs ft paragraph-image\">\n<div class=\"fs ft oj\">\n<div class=\"jk s am jl\">\n<div class=\"ok jn s\"><img decoding=\"async\" loading=\"lazy\" class=\"abw abx ep fd ez jg v c lazy\" src=\"https:\/\/miro.medium.com\/max\/780\/1*CuBJjoqYh8bzq9xsI0_Lyg.jpeg\" sizes=\"390px\" alt=\"Image for post\" width=\"390\" height=\"60\" data-lazy=\"true\" data-srcset=\"https:\/\/miro.medium.com\/max\/552\/1*CuBJjoqYh8bzq9xsI0_Lyg.jpeg 276w, https:\/\/miro.medium.com\/max\/780\/1*CuBJjoqYh8bzq9xsI0_Lyg.jpeg 390w\"><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"a66c\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>But, in a regularized version of Ordinary Least Squares, you want to<span>&nbsp;<\/span><em class=\"mb\">shrink<span>&nbsp;<\/span><\/em>some 
of its coefficients to reduce overall model variance. You do that by applying a penalty to the Residual Sum of Squares[1].<\/p>\n<p id=\"bdcc\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>In the<span>&nbsp;<\/span><em class=\"mb\">regularized<span>&nbsp;<\/span><\/em>version of OLS, you\u2019re trying to find the coefficients that minimize:<\/p>\n<figure class=\"nc nd ne nf ng jd fs ft paragraph-image\">\n<div class=\"fs ft ol\">\n<div class=\"jk s am jl\">\n<div class=\"om jn s\"><img decoding=\"async\" loading=\"lazy\" class=\"abw abx ep fd ez jg v c lazy\" src=\"https:\/\/miro.medium.com\/max\/1166\/1*yuaINKMiHgUlHX9ZvB_sGA.jpeg\" sizes=\"583px\" alt=\"Image for post\" width=\"583\" height=\"60\" data-lazy=\"true\" data-srcset=\"https:\/\/miro.medium.com\/max\/552\/1*yuaINKMiHgUlHX9ZvB_sGA.jpeg 276w, https:\/\/miro.medium.com\/max\/1104\/1*yuaINKMiHgUlHX9ZvB_sGA.jpeg 552w, https:\/\/miro.medium.com\/max\/1166\/1*yuaINKMiHgUlHX9ZvB_sGA.jpeg 583w\"><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"e72f\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>The<span>&nbsp;<\/span><em class=\"mb\">shrinkage penalty<\/em><span>&nbsp;<\/span>is the product of a tuning parameter and regression coefficients, so it will get smaller as the regression coefficient portion of the penalty gets smaller. 
The tuning parameter controls the impact of the<span>&nbsp;<\/span><em class=\"mb\">shrinkage penalty<span>&nbsp;<\/span><\/em>on the residual sum of squares.<\/p>\n<p id=\"b19f\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>The<span>&nbsp;<\/span><em class=\"mb\">shrinkage penalty<\/em><span>&nbsp;<\/span>is never applied to Beta 0, the intercept, because you only want to control the coefficients associated with the features, and the intercept doesn\u2019t have a feature associated with it. If all features have coefficient zero, the target will be equal to the value of the intercept.<\/p>\n<p id=\"19d1\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>There are two different regularization techniques that can be applied to OLS:<\/p>\n<ul class>\n<li id=\"e663\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq ms kt ku hn\" data-selectable-paragraph>Ridge Regression,<\/li>\n<li id=\"16c4\" class=\"jv jw gg jx b hp kv jz ka hs kw kc kd ke kx kg kh ki ky kk kl km kz ko kp kq ms kt ku hn\" data-selectable-paragraph>Lasso.<\/li>\n<\/ul>\n<h2 id=\"8033\" class=\"jv jw gg jx b hp lw jz ka hs lx kc kd ke ly kg kh ki lz kk kl km ma ko kp kq fz hn\">Ridge Regression<\/h2>\n<p class=\"jv jw gg jx b hp lw jz ka hs lx kc kd ke ly kg kh ki lz kk kl km ma ko kp kq fz hn\" data-selectable-paragraph>Ridge Regression uses the sum of the squares of the coefficients as its shrinkage penalty.<\/p>\n<p id=\"ca1a\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>It\u2019s also called the L2 norm penalty because, as the tuning parameter<span>&nbsp;<\/span><em class=\"mb\">lambda<\/em><span>&nbsp;<\/span>increases, the L2 norm of the vector of least squares coefficients will always decrease.<\/p>\n<figure class=\"nc nd ne nf ng jd fs ft 
paragraph-image\">\n<div class=\"nj nk am nl v nm\" tabindex=\"0\" role=\"button\">\n<div class=\"fs ft on\">\n<div class=\"jk s am jl\">\n<div class=\"oo jn s\"><img decoding=\"async\" loading=\"lazy\" class=\"abw abx ep fd ez jg v c lazy\" src=\"https:\/\/miro.medium.com\/max\/2780\/1*153rBbaG1VodJoepATDRLw.jpeg\" sizes=\"700px\" alt=\"Image for post\" width=\"1390\" height=\"262\" data-lazy=\"true\" data-srcset=\"https:\/\/miro.medium.com\/max\/552\/1*153rBbaG1VodJoepATDRLw.jpeg 276w, https:\/\/miro.medium.com\/max\/1104\/1*153rBbaG1VodJoepATDRLw.jpeg 552w, https:\/\/miro.medium.com\/max\/1280\/1*153rBbaG1VodJoepATDRLw.jpeg 640w, https:\/\/miro.medium.com\/max\/1400\/1*153rBbaG1VodJoepATDRLw.jpeg 700w\"><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"13c1\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>Even though it<span>&nbsp;<\/span><em class=\"mb\">shrinks<span>&nbsp;<\/span><\/em>each model coefficient in the same proportion, Ridge Regression will never actually<span>&nbsp;<\/span><em class=\"mb\">shrink<span>&nbsp;<\/span><\/em>them to zero.<\/p>\n<p id=\"97bd\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>The very aspect that makes this regularization more stable is also one of its disadvantages. 
You end up reducing the model variance, but the model maintains its original level of complexity, since none of the coefficients were reduced to zero.<\/p>\n<p id=\"65dc\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>You can fit a model with Ridge Regression by running the following code.<\/p>\n<pre class=\"nc nd ne nf ng ns nt nu\"><span id=\"ade9\" class=\"hn nv lb gg nw b dj nx ny s nz\" data-selectable-paragraph>fit_model(features, targets, type='Ridge')<\/span><\/pre>\n<p id=\"08bb\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>Here&nbsp;<em class=\"mb\">lambda<\/em>, i.e., alpha in the<span>&nbsp;<\/span><a class=\"eh kr\" href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">scikit learn method<\/a>, was arbitrarily set to 0.5, but in the next section you\u2019ll go through the process of tuning this parameter.<\/p>\n<p id=\"0eff\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>Based on the output of the ridge regression, your dog\u2019s nap duration can be modeled as:<\/p>\n<figure class=\"nc nd ne nf ng jd fs ft paragraph-image\">\n<div class=\"nj nk am nl v nm\" tabindex=\"0\" role=\"button\">\n<div class=\"fs ft op\">\n<div class=\"jk s am jl\">\n<div class=\"oq jn s\"><img decoding=\"async\" loading=\"lazy\" class=\"abw abx ep fd ez jg v c lazy\" src=\"https:\/\/miro.medium.com\/max\/3512\/1*ECf-FlusGhKs6AiqiJb9ag.jpeg\" sizes=\"700px\" alt=\"Image for post\" width=\"1756\" height=\"68\" data-lazy=\"true\" data-srcset=\"https:\/\/miro.medium.com\/max\/552\/1*ECf-FlusGhKs6AiqiJb9ag.jpeg 276w, https:\/\/miro.medium.com\/max\/1104\/1*ECf-FlusGhKs6AiqiJb9ag.jpeg 552w, 
https:\/\/miro.medium.com\/max\/1280\/1*ECf-FlusGhKs6AiqiJb9ag.jpeg 640w, https:\/\/miro.medium.com\/max\/1400\/1*ECf-FlusGhKs6AiqiJb9ag.jpeg 700w\"><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"8065\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>Looking at other characteristics of the model, like R-squared, bias, and variance, you can see that all were reduced compared to the output of OLS.<\/p>\n<figure class=\"nc nd ne nf ng jd fs ft paragraph-image\">\n<div class=\"fs ft or\">\n<div class=\"jk s am jl\">\n<div class=\"os jn s\"><img decoding=\"async\" loading=\"lazy\" class=\"abw abx ep fd ez jg v c lazy\" src=\"https:\/\/miro.medium.com\/max\/1230\/1*IGsut1yffIyILAy0AULTQg.jpeg\" sizes=\"615px\" alt=\"Image for post\" width=\"615\" height=\"222\" data-lazy=\"true\" data-srcset=\"https:\/\/miro.medium.com\/max\/552\/1*IGsut1yffIyILAy0AULTQg.jpeg 276w, https:\/\/miro.medium.com\/max\/1104\/1*IGsut1yffIyILAy0AULTQg.jpeg 552w, https:\/\/miro.medium.com\/max\/1230\/1*IGsut1yffIyILAy0AULTQg.jpeg 615w\"><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"aee7\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>Ridge regression was very effective at<span>&nbsp;<\/span><em class=\"mb\">shrinking<\/em><span>&nbsp;<\/span>the value of the coefficients and, as a consequence, the variance of the model was significantly reduced.<\/p>\n<p id=\"4a74\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>However, the complexity and interpretability of the model remained the same. 
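You can see this behavior in a short sketch (again with assumed toy data, not the nap observations): ridge shrinks the norm of the coefficient vector relative to OLS, but every coefficient stays nonzero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Toy stand-in data; the four columns mimic the four nap features.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = 30 + X @ np.array([0.5, 2.0, 5.0, 3.0]) + rng.normal(0, 5, size=60)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)  # alpha plays the role of lambda

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
# The ridge coefficient vector has a smaller L2 norm than the OLS one,
# but none of its entries is exactly zero: all four features remain.
```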
You still have four features that impact the duration of your dog\u2019s daily nap.<\/p>\n<p id=\"f35a\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>Let\u2019s turn to Lasso and see how it performs.<\/p>\n<h2 class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\">Lasso<\/h2>\n<p id=\"5012\" class=\"jv jw gg jx b hp lw jz ka hs lx kc kd ke ly kg kh ki lz kk kl km ma ko kp kq fz hn\" data-selectable-paragraph>Lasso is short for<span>&nbsp;<\/span><em class=\"mb\">Least Absolute Shrinkage and Selection Operator<\/em><span>&nbsp;<\/span>[2], and it minimizes the sum of the absolute values of the coefficients.<\/p>\n<figure class=\"nc nd ne nf ng jd fs ft paragraph-image\">\n<div class=\"nj nk am nl v nm\" tabindex=\"0\" role=\"button\">\n<div class=\"fs ft on\">\n<div class=\"jk s am jl\">\n<div class=\"oo jn s\"><img decoding=\"async\" loading=\"lazy\" class=\"abw abx ep fd ez jg v c lazy\" src=\"https:\/\/miro.medium.com\/max\/2780\/1*L9RGtkRK-ID_ih6X2uOCxA.jpeg\" sizes=\"700px\" alt=\"Image for post\" width=\"1390\" height=\"262\" data-lazy=\"true\" data-srcset=\"https:\/\/miro.medium.com\/max\/552\/1*L9RGtkRK-ID_ih6X2uOCxA.jpeg 276w, https:\/\/miro.medium.com\/max\/1104\/1*L9RGtkRK-ID_ih6X2uOCxA.jpeg 552w, https:\/\/miro.medium.com\/max\/1280\/1*L9RGtkRK-ID_ih6X2uOCxA.jpeg 640w, https:\/\/miro.medium.com\/max\/1400\/1*L9RGtkRK-ID_ih6X2uOCxA.jpeg 700w\"><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"df08\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>It\u2019s very similar to Ridge regression but, instead of the L2 norm, it uses the L1 norm as part of the<span>&nbsp;<\/span><em class=\"mb\">shrinkage penalty<\/em>. 
That\u2019s why Lasso is also referred to as L1 regularization.<\/p>\n<p id=\"6207\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>What\u2019s powerful about Lasso is that it will actually<span>&nbsp;<\/span><em class=\"mb\">shrink<\/em><span>&nbsp;<\/span>some of the coefficients to zero, thus reducing both variance and model complexity.<\/p>\n<p id=\"ed69\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>Lasso uses a technique called soft-thresholding[1]. It<span>&nbsp;<\/span><em class=\"mb\">shrinks<span>&nbsp;<\/span><\/em>each coefficient by a constant amount such that, when the coefficient value is lower than the<span>&nbsp;<\/span><em class=\"mb\">shrinkage constant<\/em>,<span>&nbsp;<\/span>it\u2019s reduced to zero.<\/p>\n<p id=\"e798\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>Again, with an arbitrary<span>&nbsp;<\/span><em class=\"mb\">lambda<\/em><span>&nbsp;<\/span>of 0.5, you can fit Lasso to the data.<\/p>\n<pre class=\"nc nd ne nf ng ns nt nu\"><figure class=\"post-image post-mediaBleed alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-1340181 lazy\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-13.01.39.png\" alt width=\"838\" height=\"60\" sizes=\"(max-width: 838px) 100vw, 838px\" data-lazy=\"true\" data-srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-13.01.39.png 894w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-13.01.39-280x20.png 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-13.01.39-540x39.png 540w, 
https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-13.01.39-270x19.png 270w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/02\/Screenshot-2021-02-23-at-13.01.39-796x57.png 796w\"><\/figure><\/pre>\n<p id=\"3ea4\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>In this case, you can see the feature&nbsp;<em class=\"mb\">squirrels was<span>&nbsp;<\/span><\/em>dropped from the model, because its coefficient is zero.<\/p>\n<figure class=\"nc nd ne nf ng jd fs ft paragraph-image\">\n<div class=\"nj nk am nl v nm\" tabindex=\"0\" role=\"button\">\n<div class=\"fs ft ot\">\n<div class=\"jk s am jl\">\n<div class=\"ou jn s\"><img decoding=\"async\" loading=\"lazy\" class=\"abw abx ep fd ez jg v c lazy\" src=\"https:\/\/miro.medium.com\/max\/2242\/1*ozq0VpMGPB3rp0L2Gl6X8w.jpeg\" sizes=\"700px\" alt=\"Image for post\" width=\"1121\" height=\"406\" data-lazy=\"true\" data-srcset=\"https:\/\/miro.medium.com\/max\/552\/1*ozq0VpMGPB3rp0L2Gl6X8w.jpeg 276w, https:\/\/miro.medium.com\/max\/1104\/1*ozq0VpMGPB3rp0L2Gl6X8w.jpeg 552w, https:\/\/miro.medium.com\/max\/1280\/1*ozq0VpMGPB3rp0L2Gl6X8w.jpeg 640w, https:\/\/miro.medium.com\/max\/1400\/1*ozq0VpMGPB3rp0L2Gl6X8w.jpeg 700w\"><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"f474\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>With Lasso, your dog\u2019s nap duration can be described as a model with three features:<\/p>\n<figure class=\"nc nd ne nf ng jd fs ft paragraph-image\">\n<div class=\"nj nk am nl v nm\" tabindex=\"0\" role=\"button\">\n<div class=\"fs ft ov\">\n<div class=\"jk s am jl\">\n<div class=\"ow jn s\"><img decoding=\"async\" loading=\"lazy\" class=\"abw abx ep fd ez jg v c lazy\" src=\"https:\/\/miro.medium.com\/max\/2204\/1*Z9QtEku5E8rPysZPndV6tg.jpeg\" sizes=\"700px\" alt=\"Image for post\" 
width=\"1102\" height=\"38\" data-lazy=\"true\" data-srcset=\"https:\/\/miro.medium.com\/max\/552\/1*Z9QtEku5E8rPysZPndV6tg.jpeg 276w, https:\/\/miro.medium.com\/max\/1104\/1*Z9QtEku5E8rPysZPndV6tg.jpeg 552w, https:\/\/miro.medium.com\/max\/1280\/1*Z9QtEku5E8rPysZPndV6tg.jpeg 640w, https:\/\/miro.medium.com\/max\/1400\/1*Z9QtEku5E8rPysZPndV6tg.jpeg 700w\"><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"bc36\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>Here the advantage over Ridge regression is that you ended up with a model that is more interpretable, because it has fewer features.<\/p>\n<p id=\"3eac\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>Going from four to three features is not a big deal in terms of interpretability, but you can see how this could be extremely useful in datasets that have hundreds of features!<\/p>\n<h2 class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\">Finding your optimal lambda<\/h2>\n<p id=\"06f5\" class=\"jv jw gg jx b hp lw jz ka hs lx kc kd ke ly kg kh ki lz kk kl km ma ko kp kq fz hn\" data-selectable-paragraph>So far the<span>&nbsp;<\/span><em class=\"mb\">lambda<\/em><span>&nbsp;<\/span>you used to see Ridge Regression and Lasso in action was completely arbitrary. 
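Sweeping a handful of candidate values makes that arbitrariness visible. The sketch below, with assumed toy data standing in for your observations, fits a Lasso for several alphas and records the test RMSE for each:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumed toy data; in practice you would reuse your own nap observations.
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 4))
y = 30 + X @ np.array([0.5, 2.0, 5.0, 0.0]) + rng.normal(0, 5, size=80)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit one model per candidate lambda (alpha) and record the test RMSE.
rmse_by_alpha = {}
for alpha in [0.01, 0.1, 0.5, 1.0, 2.0, 5.0]:
    model = Lasso(alpha=alpha).fit(X_train, y_train)
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    rmse_by_alpha[alpha] = rmse

for alpha, rmse in rmse_by_alpha.items():
    print(f"alpha={alpha:<5} test RMSE={rmse:.2f}")
```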
But there\u2019s a way you can fine-tune the value of<span>&nbsp;<\/span><em class=\"mb\">lambda<\/em><span>&nbsp;<\/span>to help ensure you reduce the overall model variance.<\/p>\n<p id=\"7960\" class=\"jv jw gg jx b hp jy jz ka hs kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq fz hn\" data-selectable-paragraph>If you plot the<span>&nbsp;<\/span><a class=\"eh kr\" href=\"https:\/\/en.wikipedia.org\/wiki\/Root-mean-square_deviation\" rel=\"nofollow noopener noreferrer\" target=\"_blank\">root mean squared error<\/a>&nbsp;against a continuous set of<span>&nbsp;<\/span><em class=\"mb\">lambda<span>&nbsp;<\/span><\/em>values, you can use the<span>&nbsp;<\/span><a class=\"eh kr\" href=\"https:\/\/en.wikipedia.org\/wiki\/Elbow_method_(clustering)\" rel=\"nofollow noopener noreferrer\" target=\"_blank\"><em class=\"mb\">elbow technique<\/em><\/a><span>&nbsp;<\/span>to find the optimal value.<\/p>\n<p> <a href=\"https:\/\/thenextweb.com\/neural\/2021\/02\/23\/turn-your-dog-nap-time-into-regularized-linear-model-syndication\/\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Looking at this nap duration model,&nbsp;Beta 0&nbsp;is the&nbsp;intercept, the value the target takes when all features are equal to zero. 
The remaining betas are the unknown&nbsp;coefficients&nbsp;which, along with the intercept, are the&#8230;<\/p>\n","protected":false},"author":1,"featured_media":3262,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts\/3261"}],"collection":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3261"}],"version-history":[{"count":0,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts\/3261\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/media\/3262"}],"wp:attachment":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3261"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3261"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3261"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}