Saturday, May 18, 2013

Linear Regression with Multiple Variables


In the previous blog (Linear Regression with One Variable), the linear regression we developed had a single feature x, the size of the house, and we wanted to use that to predict y, the price of the house; that was the form of our hypothesis. But now imagine that we had not only the size of the house as a feature, or as a variable with which to try to predict the price, but that we also knew the number of bedrooms, the number of floors, and the age of the home in years.
It seems like this would give us a lot more information with which to predict the price.
Let's start with an example with multiple variables:

Here the input features are not only House Size, as in one-variable regression, but also Number of Bedrooms, Number of Floors, and Age of House.

The hypothesis function changes from h(x) = Θ0 + Θ1x to h(x) = Θ0 + Θ1x1 + Θ2x2 + Θ3x3 + ... + Θnxn



For convenience of notation, we define x0 = 1, and the hypothesis function h(x) becomes:
h(x) = Θ0x0 + Θ1x1 + Θ2x2 + Θ3x3 + ... + Θnxn

Representing the x values and Θ values as vectors X and Θ, the hypothesis can be written compactly as:

h(x) = ΘᵀX

This hypothesis function h(x) is called multivariate linear regression.
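As a quick illustration, here is a minimal NumPy sketch of this vectorized hypothesis. The parameter and feature values are made up for the example:

import numpy as np

# Hypothetical values, just for illustration: theta holds theta_0..theta_4;
# x holds x0 = 1, house size, bedrooms, floors, and age.
theta = np.array([80.0, 0.1, 25.0, 10.0, -2.0])
x = np.array([1.0, 1400.0, 3.0, 2.0, 30.0])

h = theta @ x   # h(x) = theta-transpose times x (an inner product)
print(h)        # predicted price, in made-up units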

The cost function for multiple variables is:

J(Θ) = (1/2m) Σ ( h(x(i)) - y(i) )², where the sum runs over the m training examples.
Gradient Descent

Previously, in Linear Regression with one variable (n = 1):

Repeat
{
    Θ0 := Θ0 - α (1/m) Σ ( h(x(i)) - y(i) )
    Θ1 := Θ1 - α (1/m) Σ ( h(x(i)) - y(i) ) x(i)
}

In Linear Regression with multiple variables (n > 1):

Repeat
{
    Θj := Θj - α (1/m) Σ ( h(x(i)) - y(i) ) xj(i)     (simultaneously update Θj for j = 0, ..., n)
}
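To make this update rule concrete, here is a minimal batch gradient descent sketch in Python with NumPy. The data, learning rate, and iteration count are invented for illustration; this is a sketch of the technique, not a tuned implementation:

import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=5000):
    """Batch gradient descent for multivariate linear regression.

    X is the (m, n+1) matrix of training examples, with x0 = 1 in the
    first column; y is the (m,) vector of target values.
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(num_iters):
        error = X @ theta - y             # h(x(i)) - y(i) for every example at once
        gradient = (X.T @ error) / m      # (1/m) * sum over i of error_i * xj(i)
        theta = theta - alpha * gradient  # simultaneous update of every theta_j
    return theta

# Toy data, made up for illustration: two features plus the x0 = 1 column.
X = np.array([[1.0, 2.1, 3.0],
              [1.0, 1.6, 3.0],
              [1.0, 2.4, 4.0],
              [1.0, 1.4, 2.0]])
y = np.array([400.0, 330.0, 369.0, 232.0])

print(gradient_descent(X, y))

Note that all components of theta are updated together from the same error vector, which is exactly the simultaneous update described above.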


Friday, May 17, 2013

Gradient Descent Algorithm for minimizing Cost Function


The Gradient Descent algorithm is not only used to minimize the cost function of linear regression; it can be used to minimize other functions as well.

Algorithm Outline Steps:

Step 1: Start with some Θ0, Θ1.

Step 2: Keep changing Θ0, Θ1 to reduce J(Θ0, Θ1) until the algorithm ends up at a minimum.


Algorithm:

Repeat until convergence
{
    Θj := Θj - α ∂/∂Θj J(Θ0, Θ1)     (for j = 0 and j = 1, updated simultaneously)
}

Here α is the learning rate.

How Θ0, Θ1 are updated is shown below.

The correct way to simultaneously update is:

temp0 := Θ0 - α ∂/∂Θ0 J(Θ0, Θ1)
temp1 := Θ1 - α ∂/∂Θ1 J(Θ0, Θ1)
Θ0 := temp0
Θ1 := temp1
The incorrect way to update (not simultaneous) is:

temp0 := Θ0 - α ∂/∂Θ0 J(Θ0, Θ1)
Θ0 := temp0
temp1 := Θ1 - α ∂/∂Θ1 J(Θ0, Θ1)     (uses the already-updated Θ0)
Θ1 := temp1
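The same contrast as a small Python sketch; d_theta0 and d_theta1 are hypothetical placeholders for the partial derivatives of J:

def step_correct(theta0, theta1, alpha, d_theta0, d_theta1):
    # Correct: both derivatives are evaluated at the OLD (theta0, theta1),
    # and only then are the parameters overwritten.
    temp0 = theta0 - alpha * d_theta0(theta0, theta1)
    temp1 = theta1 - alpha * d_theta1(theta0, theta1)
    return temp0, temp1

def step_incorrect(theta0, theta1, alpha, d_theta0, d_theta1):
    # Incorrect: theta0 is overwritten first, so d_theta1 gets evaluated at
    # the new theta0 -- the two parameters are no longer updated simultaneously.
    theta0 = theta0 - alpha * d_theta0(theta0, theta1)
    theta1 = theta1 - alpha * d_theta1(theta0, theta1)
    return theta0, theta1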

Pictorial representation of how Θ1 updates with a +ve/-ve slope:

[Figure: J(Θ1) with a positive slope at the current Θ1]
Θ1 := Θ1 - α (+ve number), so Θ1 decreases, moving toward the minimum.

[Figure: J(Θ1) with a negative slope at the current Θ1]
Θ1 := Θ1 - α (-ve number), so Θ1 increases, again moving toward the minimum.


The effect of the learning rate α on minimizing the cost function is illustrated as:

[Figure: gradient descent steps on the plot of J(Θ1), for small and large values of α]

Keep two things in mind when choosing the learning rate α:

1) If α is too small, gradient descent can be very slow.
2) If α is too large, gradient descent can overshoot the minimum. It may fail to converge or even diverge.

The Gradient Descent algorithm is also called the Batch Gradient Descent algorithm, because at each step it uses all the training examples.
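To see both behaviors concretely, here is a toy experiment (my own example, not from the lecture) that minimizes J(Θ) = Θ², whose derivative is 2Θ, by gradient descent:

def run(alpha, steps=10, theta=1.0):
    # Repeatedly apply the gradient descent step theta := theta - alpha * dJ/dtheta.
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(run(alpha=0.01))   # too small: theta is still about 0.82 after 10 steps
print(run(alpha=0.4))    # suitable: theta shrinks rapidly toward 0
print(run(alpha=1.1))    # too large: |theta| grows every step, so it diverges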




Minimizing the Cost Function of the Regression Method

The more we minimize the cost function, the more accurate the regression line's predictions will be.

Let's start with a simplified hypothesis where Θ0 = 0, so h(x) = Θ0 + Θ1x = Θ1x.

Minimize this cost function:

J(Θ1) = (1/2m) Σ ( Θ1 x(i) - y(i) )², where the sum runs over the m training examples.

h(x): for a fixed value of Θ1, this is a function of x.
J(Θ1): a function of the parameter Θ1.

We minimize the cost function to reduce the gap between the actual and predicted values, as visualized in the figure below:

[Figure: training points and the line h(x), with the vertical gaps as the errors]
For different values of Θ1, plot the hypothesis h(x) and the cost function J(Θ1).

The plot of J(Θ1) is shown as:

[Figure: plot of J(Θ1) versus Θ1]

Find the minimum of this cost-function plot; the value of Θ1 at that point minimizes the cost function. In the figure above, the minimizing value of Θ1 lies between 0 and 1.
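Here is a small Python sketch of this idea, using a made-up training set where y = 0.5x exactly, so the cost is minimized at Θ1 = 0.5 (consistent with a minimum between 0 and 1):

import numpy as np

# Made-up training set: y = 0.5 * x exactly.
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, 1.0, 1.5])
m = len(x)

def J(theta1):
    """Cost of the simplified hypothesis h(x) = theta1 * x."""
    return np.sum((theta1 * x - y) ** 2) / (2 * m)

for theta1 in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(theta1, J(theta1))   # the smallest cost appears at theta1 = 0.5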



Linear Regression with One Variable

As I mentioned in "What is Machine Learning?", the regression method is used for predicting continuous variables.
Let's start with an example: predicting house price according to size. We're going to use a data set of housing prices from Delhi. Here I'm going to plot a data set of houses of different sizes that were sold for a range of different prices.




Let's say somebody wants to sell a house of 1350 square feet, and we want to tell them how much they might be able to sell the house for.
One thing we could do is fit a model, maybe a straight line, to this data. Based on that line, we might tell them that they can sell the house for around 2,500,000.


It's a regression problem, where the term regression refers to the fact that we are predicting a real-valued output, namely the price.




Let's use (x(i), y(i)) to refer to the ith row of the table below.


So, for example, x(1) refers to the input value of the first training example, used to predict the output value y(1).

From this training data we form a hypothesis to predict house price from house size.

The flow of the process is shown in the figure below:


Here h represents the hypothesis, which maps from X (Size of House) to Y (Estimated Price).

h(x) = Θ0 + Θ1x

The regression line is shown in the figure below:





The figures below show the regression line (hypothesis) for different values of Θ0 and Θ1.





Goal: for each regression line, choose Θ0, Θ1 so that h(x) is close to y (the output value).
The goal of each training iteration is to minimize the squared error. The cost function of this regression line is:

J(Θ0, Θ1) = (1/2m) Σ ( h(x(i)) - y(i) )²

m: number of training examples (the sum runs over i = 1 to m)
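As a rough sketch, here is how this cost function can be evaluated for candidate regression lines in Python; the house sizes and prices below are invented for illustration:

import numpy as np

# Invented house data: size in square feet vs. price.
size  = np.array([1000.0, 1350.0, 1600.0, 2000.0])
price = np.array([1900000.0, 2500000.0, 2950000.0, 3700000.0])
m = len(size)

def h(theta0, theta1, x):
    return theta0 + theta1 * x   # hypothesis h(x) = theta0 + theta1 * x

def J(theta0, theta1):
    return np.sum((h(theta0, theta1, size) - price) ** 2) / (2 * m)

# Compare two candidate regression lines: the lower J, the better the fit.
print(J(100000.0, 1800.0))
print(J(0.0, 1850.0))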

This completes the post on Linear Regression with One Variable.


Variables in Statistics



In statistics, variables fall into two categories:
1) Quantitative: quantitative variables are numeric; they represent a measurable quantity. Ex. the number of records in a database, the number of people working in the IT sector, etc.
2) Qualitative: qualitative variables take on values that are labels. Ex. the behavior of a user, the color of an animal, the category of an employee, such as Manager/Director/Software Engineer, etc.


Quantitative variables are further classified into two categories:
   1) Discrete Variable: a variable that can take only specific values between its max and min values, rather than all values in between.
Ex. the number of times tails comes up in 100 tosses of a coin;
the number of even numbers between 1 and 1000.

   2) Continuous Variable: a variable that can take any value between its max and min values.
Ex. the weight of a person can take any value, such as 56.7/56.73/56.734/56.7342, etc.;
the number of rational numbers between 0 and 1.


Statistical data are often classified according to the number of variables:
Univariate data: the experiment is done using one variable. Ex. predicting a person's fitness from his/her weight.

Bivariate data: the experiment is done using two variables. Ex. predicting a person's fitness from his/her weight and height.

Multivariate data: the experiment is done using more than two variables. Ex. predicting a person's fitness from his/her weight, height, age, looks, etc.




Thursday, May 16, 2013

Statistics in Machine Learning


Machine learning merges statistics with the computational sciences: computer science, systems science, and optimization. Much of the agenda in machine learning is driven by applied problems in science and technology, where data streams are increasingly large-scale, dynamical, and heterogeneous, and where mathematical and algorithmic creativity are required to bring statistical methodology to bear.

Statistics is a mathematical body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data. Statistics also provides tools for prediction and forecasting through the use of data and statistical models.

Parts of statistics that are extensively used in Machine Learning:

1) Variables
2) Central Tendency
3) Variability
4) Probability
5) Probability Distribution
           a) Discrete Probability Distribution
           b) Continuous Probability Distribution
6) Hypothesis Testing

What is Machine Learning?


There is no well-accepted definition of what is and what isn't machine learning. Arthur Samuel defined machine learning as the field of study that gives computers the ability to learn without being explicitly programmed.

A more recent definition by Tom Mitchell frames machine learning as a well-posed learning problem: a computer program is said to learn from experience E, with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Let's say your email program watches which emails you do or do not flag as spam. In an email client like this, you might click the spam button to report some emails as spam but not others. Based on which emails you mark as spam, your email program learns to filter spam better.

There are several different types of learning algorithms. The main two types are what we call supervised learning and unsupervised learning.

Supervised Learning: here we're going to teach the computer how to do something. In supervised learning, the right answer is already given at training time.

Regression and Classification are supervised learning methods.

Regression: predict a continuous-valued output.
Ex. a) An inventory store predicts how many of an item will sell over the next month.
b) Predict the price of a house from attributes such as house area and locality.

Classification: predict a discrete-valued output.
Ex. a) Examine whether an email is spam or not.
b) Examine whether software has been hacked or compromised.

Unsupervised Learning: here we're going to let the computer learn by itself. In unsupervised learning, the right answer is not given at training time.

Unsupervised Learning includes:
a) Clustering: k-means, hierarchical, etc.
b) Blind Signal Separation: principal component analysis, singular value decomposition, etc.

Ex.: dividing news data into multiple categories, such as Political News, Cricket News, Bollywood News, etc.
Market Segmentation
Social Network Analysis