Notes on Andrew Ng: Machine Learning Week 2

I. Linear Regression with Multiple Variables

Welcome to week 2! I hope everyone has been enjoying the course and learning a lot! This week we’re covering linear regression with multiple variables. We’ll show how linear regression can be extended to accommodate multiple input features. We also discuss best practices for implementing linear regression.
We’re also going to go over how to use Octave. You’ll work on programming assignments designed to help you understand how to implement the learning algorithms in practice. To complete the programming assignments, you will need to use Octave or MATLAB.
As always, if you get stuck on the quiz or the programming assignment, you should post on the Discussions to ask for help. (And if you finish early, I hope you’ll go there to help your fellow classmates as well.)

(1) Multivariate Linear Regression

Multiple Features

[Figure: multiple features]

Gradient Descent for Multiple Variables

Taking the derivative of the cost function J(θ) gives:
[Figure: gradient descent for multiple variables]
(simultaneously update θj for j=0,1,…,n)
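
The simultaneous update can be sketched in vectorized NumPy. This is only an illustrative sketch, not the course's reference code: the name `gradient_descent`, the 1-D shapes of `theta` and `y`, and the toy data are assumptions, and X is assumed to already include a leading column of ones for θ0.

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent for linear regression (illustrative sketch).

    X: (m, n+1) design matrix whose first column is all ones.
    y: (m,) targets. theta: (n+1,) initial parameters.
    """
    m = len(y)
    theta = theta.astype(float).copy()
    for _ in range(num_iters):
        error = X @ theta - y             # h_theta(x^(i)) - y^(i) for every example
        gradient = (X.T @ error) / m      # partial derivatives of J for all j at once
        theta = theta - alpha * gradient  # simultaneous update of every theta_j
    return theta

# Toy data generated from y = 1 + 2*x (hypothetical example values)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = gradient_descent(X, y, np.zeros(2), alpha=0.1, num_iters=5000)
```

Computing the whole gradient before assigning to `theta` is what makes the update simultaneous: no θj is changed until every partial derivative has been evaluated with the old parameters.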

Python: Compute the Cost Function

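import numpy as np

def computeCost(X, y, theta):
    # X: (m, n) design matrix; y: (m, 1) targets; theta: (1, n) row vector
    # Use @ (matrix product): with plain ndarrays, * would multiply element-wise
    inner = np.power((X @ theta.T) - y, 2)
    return np.sum(inner) / (2 * len(X))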

Gradient Descent in Practice 1 - Feature Scaling

We can speed up gradient descent by having each of our input values in roughly the same range.
This is because θ will descend quickly on small ranges and slowly on large ranges,
and so will oscillate inefficiently down to the optimum when the variables are very uneven.

The way to prevent this is to modify the ranges of our input variables so that they are all roughly the same.
Ideally: -1<=x(i)<=1 or -0.5<=x(i)<=0.5
These aren’t exact requirements; we are only trying to speed things up.
The goal is to get all input variables into roughly one of these ranges, give or take a few.

Two techniques to help with this are feature scaling and mean normalization.

two techniques

feature scaling

Feature scaling involves dividing the input values by the range
(i.e. the maximum value minus the minimum value)
of the input variable, resulting in a new range of just 1.

mean normalization

Mean normalization involves subtracting the average value for an input variable from the values
for that input variable resulting in a new average value for the input variable of just zero.
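
Combining the two techniques gives x := (x - mean) / range for each feature. A minimal sketch, assuming the features are stored column-wise in a NumPy array; the function name `mean_normalize` and the sample values are made up for illustration.

```python
import numpy as np

def mean_normalize(X):
    """Subtract each column's mean and divide by its range (max - min)."""
    mu = X.mean(axis=0)
    value_range = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / value_range, mu, value_range

# Hypothetical features: house size in square feet and number of bedrooms
X = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0], [1416.0, 2.0]])
X_norm, mu, value_range = mean_normalize(X)
```

The function returns `mu` and `value_range` so that the same transformation can be applied to new inputs later. Note that this sketch would divide by zero for a constant column, which it does not guard against.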

Gradient Descent in Practice 2 - Learning Rate

Debugging gradient descent

Make a plot with number of iterations on the x-axis.
Now plot the cost function, J(θ) over the number of iterations of gradient descent.
If J(θ) ever increases, then you probably need to decrease α.

Automatic convergence test

Declare convergence if J(θ) decreases by less than E in one iteration,
where E is some small value such as 10^-3.
However, in practice it’s difficult to choose this threshold value.
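
The convergence test can be sketched as a stopping condition inside the descent loop. This is an assumption-laden illustration: the function name `descend_until_converged`, the `max_iters` safety cap, and the toy data are not from the course. The recorded `cost_history` is also the series you would plot against the iteration number when debugging.

```python
import numpy as np

def descend_until_converged(X, y, alpha, epsilon=1e-3, max_iters=10000):
    """Run gradient descent until J(theta) drops by less than epsilon in one step."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    cost_history = [np.sum((X @ theta - y) ** 2) / (2 * m)]
    for _ in range(max_iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / m
        cost = np.sum((X @ theta - y) ** 2) / (2 * m)
        decrease = cost_history[-1] - cost
        cost_history.append(cost)
        if 0 <= decrease < epsilon:  # J fell by less than epsilon: declare convergence
            break
    return theta, cost_history

# Toy data generated from y = 1 + 2*x (hypothetical example values)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta, cost_history = descend_until_converged(X, y, alpha=0.1)
```

Stopping at a fixed threshold can leave θ some distance from the optimum, which is one reason the threshold is hard to choose.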

It has been proven that if learning rate α is sufficiently small,
then J(θ) will decrease on every iteration.

Summary

If α is too small: slow convergence.
If α is too large: may not decrease on every iteration and thus may not converge.
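
Both failure modes can be seen on a tiny problem. The sketch below is illustrative only: the helper `cost_after`, the three α values, and the data are assumptions chosen to make the effect visible, not recommended settings.

```python
import numpy as np

def cost_after(X, y, alpha, num_iters):
    """Return J(theta) after num_iters gradient descent steps from theta = 0."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / m
    return np.sum((X @ theta - y) ** 2) / (2 * m)

# Toy data generated from y = 1 + 2*x (hypothetical example values)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
initial_cost = cost_after(X, y, alpha=0.1, num_iters=0)

cost_small = cost_after(X, y, alpha=0.0001, num_iters=50)  # barely moves
cost_good = cost_after(X, y, alpha=0.1, num_iters=50)      # steady progress
cost_large = cost_after(X, y, alpha=1.0, num_iters=50)     # overshoots and diverges
```

On this data the smallest α leaves the cost close to its starting value, while the largest α makes the cost grow instead of shrink.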

Features and Polynomial Regression

We can improve our features and the form of our hypothesis function in a couple of different ways.
We can combine multiple features into one. For example, we can combine x1 and x2 into a new feature x3 by taking x1*x2.

Polynomial Regression

Our hypothesis function need not be linear (a straight line) if that does not fit the data well.
We can change the behavior or curve of our hypothesis function by making it a quadratic, cubic
or square root function (or any other form).
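
Although the fitted curve is no longer a straight line, the hypothesis is still linear in θ, so the powers of x can be treated as extra features. A minimal sketch under that view; the helper `polynomial_features` and the data are hypothetical, and `np.linalg.lstsq` is used here only as a convenient least-squares solver.

```python
import numpy as np

def polynomial_features(x, degree):
    """Stack [1, x, x^2, ..., x^degree] as columns of a design matrix."""
    return np.column_stack([x ** p for p in range(degree + 1)])

# Hypothetical data that follow a quadratic curve exactly: y = 2 + 0.5 * x^2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 0.5 * x ** 2

X_poly = polynomial_features(x, degree=2)
theta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)  # least-squares fit of theta
```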

One important thing: feature scaling

One important thing to keep in mind is that if you choose your features this way,
feature scaling becomes very important.
E.g., if x1 has range 1-1000, then the range of x1^2 becomes 1-1000000 and that of x1^3 becomes 1-1000000000.
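
The growth in range is easy to check numerically. The sketch below uses a made-up feature spanning 1 to 1000 and applies mean normalization (subtract the mean, divide by the range) to bring the three columns back to comparable scales; the variable names are assumptions.

```python
import numpy as np

x1 = np.linspace(1.0, 1000.0, 50)  # hypothetical feature with range 1-1000
features = np.column_stack([x1, x1 ** 2, x1 ** 3])

raw_ranges = features.max(axis=0) - features.min(axis=0)

# Mean normalization: subtract the column mean, divide by the column range
scaled = (features - features.mean(axis=0)) / raw_ranges
scaled_ranges = scaled.max(axis=0) - scaled.min(axis=0)
```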

(2) Computing Parameters Analytically

Normal Equation

The normal equation formula:

θ = (X^T X)^(-1) X^T y

Formula derivation process

[Figure: derivation of the normal equation (normal equation.jpg)]
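
One common way to derive the formula, given here as a sketch that may differ from the steps in the original figure, is to write the cost function in matrix form and set its gradient to zero. It assumes X is the design matrix with one training example per row, y is the vector of targets, and X^T X is invertible.

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m}\,(X\theta - y)^{T}(X\theta - y) \\
\nabla_{\theta} J(\theta) &= \frac{1}{m}\,X^{T}(X\theta - y) = 0 \\
X^{T}X\,\theta &= X^{T}y \\
\theta &= (X^{T}X)^{-1}X^{T}y
\end{aligned}
```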

A comparison of gradient descent and the normal equation

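[Figure: comparison of gradient descent and the normal equation, part 1]
There is also no need to do feature scaling with the normal equation.
[Figure: comparison of gradient descent and the normal equation, part 2]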

Normal Equation Noninvertibility

[Figure: normal equation noninvertibility]

Python: Implement the Normal Equation

# Using Python to implement the normal equation
import numpy as np

def normalEqn(X, y):
    # theta = (X^T X)^(-1) X^T y; X.T @ X is equivalent to X.T.dot(X)
    theta = np.linalg.inv(X.T @ X) @ X.T @ y
    return theta
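
When X^T X is noninvertible, for example because two features are linearly dependent, `np.linalg.inv` is not reliable. One possible workaround, sketched here as an assumption rather than as the author's method, is to use the pseudoinverse `np.linalg.pinv` instead. The function name `normal_eqn_pinv`, the explicit `rcond` cutoff, and the data are hypothetical.

```python
import numpy as np

def normal_eqn_pinv(X, y):
    """Normal equation using the Moore-Penrose pseudoinverse instead of inv."""
    # rcond is set explicitly so near-zero singular values are treated as zero
    return np.linalg.pinv(X.T @ X, rcond=1e-10) @ X.T @ y

# Hypothetical data with a redundant feature: the third column is 2x the second,
# so X.T @ X is singular.
X = np.array([[1.0, 1.0, 2.0], [1.0, 2.0, 4.0], [1.0, 3.0, 6.0], [1.0, 4.0, 8.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])  # y = 1 + 2 * (second column)

theta = normal_eqn_pinv(X, y)
predictions = X @ theta
```

With redundant features θ is not unique; the pseudoinverse picks one solution among many, so the predictions are meaningful even though the individual coefficients are not.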

Article link: https://blog.csdn.net/weixin_45004761/article/details/105698019
