Saturday, May 18, 2013

Linear Regression with Multiple Variables


In the previous blog (Linear Regression with One Variable), the linear regression we developed had a single feature x, the size of the house, and we wanted to use that to predict y, the price of the house; that was the form of our hypothesis. But now imagine that we had not only the size of the house as a feature, or as a variable with which to try to predict the price, but that we also knew the number of bedrooms, the number of floors, and the age of the home in years.
It seems like this would give us a lot more information with which to predict the price.
Let's start with an example with multiple variables:

Here the input features are not only House Size, as in one-variable regression, but also Number of Bedrooms, Number of Floors, and Age of House.

The hypothesis function changes from h(x) = Θ0 + Θ1x to h(x) = Θ0 + Θ1x1 + Θ2x2 + Θ3x3 + ... + Θnxn



For convenience of notation, we define x0 = 1, and the hypothesis function h(x) becomes:
h(x) = Θ0x0 + Θ1x1 + Θ2x2 + Θ3x3 + ... + Θnxn

Representing the x values and Θ values as vectors X and Θ, the hypothesis can be written compactly as:

h(x) = ΘᵀX

This hypothesis function h(x) is called multivariate linear regression.
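As a quick illustration, here is a minimal NumPy sketch of this vectorized hypothesis. The parameter and feature values are made up for the example:

import numpy as np

# Hypothetical values, just for illustration: theta holds theta_0..theta_4;
# x holds x0 = 1, house size, bedrooms, floors, and age.
theta = np.array([80.0, 0.1, 25.0, 10.0, -2.0])
x = np.array([1.0, 1400.0, 3.0, 2.0, 30.0])

h = theta @ x   # h(x) = theta-transpose times x (an inner product)
print(h)        # predicted price, in made-up units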

The cost function for multiple variables is:

J(Θ) = (1/2m) Σ ( h(x(i)) - y(i) )², where the sum runs over the m training examples.
Gradient Descent

Previously, in Linear Regression with one variable (n = 1):

Repeat
{
    Θ0 := Θ0 - α (1/m) Σ ( h(x(i)) - y(i) )
    Θ1 := Θ1 - α (1/m) Σ ( h(x(i)) - y(i) ) x(i)
}

In Linear Regression with multiple variables (n > 1):

Repeat
{
    Θj := Θj - α (1/m) Σ ( h(x(i)) - y(i) ) xj(i)     (simultaneously update Θj for j = 0, ..., n)
}
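To make this update rule concrete, here is a minimal batch gradient descent sketch in Python with NumPy. The data, learning rate, and iteration count are invented for illustration; this is a sketch of the technique, not a tuned implementation:

import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=5000):
    """Batch gradient descent for multivariate linear regression.

    X is the (m, n+1) matrix of training examples, with x0 = 1 in the
    first column; y is the (m,) vector of target values.
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(num_iters):
        error = X @ theta - y             # h(x(i)) - y(i) for every example at once
        gradient = (X.T @ error) / m      # (1/m) * sum over i of error_i * xj(i)
        theta = theta - alpha * gradient  # simultaneous update of every theta_j
    return theta

# Toy data, made up for illustration: two features plus the x0 = 1 column.
X = np.array([[1.0, 2.1, 3.0],
              [1.0, 1.6, 3.0],
              [1.0, 2.4, 4.0],
              [1.0, 1.4, 2.0]])
y = np.array([400.0, 330.0, 369.0, 232.0])

print(gradient_descent(X, y))

Note that all components of theta are updated together from the same error vector, which is exactly the simultaneous update described above.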


Friday, May 17, 2013

Gradient Descent Algorithm for minimizing Cost Function


The Gradient Descent algorithm is not only used to minimize the cost function of linear regression; it can be used to minimize other functions as well.

Algorithm Outline Steps:

Step 1: Start with some Θ0, Θ1.

Step 2: Keep changing Θ0, Θ1 to reduce J(Θ0, Θ1) until the algorithm ends up at a minimum.


Algorithm:

Repeat until convergence
{
    Θj := Θj - α ∂/∂Θj J(Θ0, Θ1)     (for j = 0 and j = 1, updated simultaneously)
}

Here α is the learning rate.

How Θ0, Θ1 are updated is shown below.

The correct way to simultaneously update is:

temp0 := Θ0 - α ∂/∂Θ0 J(Θ0, Θ1)
temp1 := Θ1 - α ∂/∂Θ1 J(Θ0, Θ1)
Θ0 := temp0
Θ1 := temp1
The incorrect way to update (not simultaneous) is:

temp0 := Θ0 - α ∂/∂Θ0 J(Θ0, Θ1)
Θ0 := temp0
temp1 := Θ1 - α ∂/∂Θ1 J(Θ0, Θ1)     (uses the already-updated Θ0)
Θ1 := temp1
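The same contrast as a small Python sketch; d_theta0 and d_theta1 are hypothetical placeholders for the partial derivatives of J:

def step_correct(theta0, theta1, alpha, d_theta0, d_theta1):
    # Correct: both derivatives are evaluated at the OLD (theta0, theta1),
    # and only then are the parameters overwritten.
    temp0 = theta0 - alpha * d_theta0(theta0, theta1)
    temp1 = theta1 - alpha * d_theta1(theta0, theta1)
    return temp0, temp1

def step_incorrect(theta0, theta1, alpha, d_theta0, d_theta1):
    # Incorrect: theta0 is overwritten first, so d_theta1 gets evaluated at
    # the new theta0 -- the two parameters are no longer updated simultaneously.
    theta0 = theta0 - alpha * d_theta0(theta0, theta1)
    theta1 = theta1 - alpha * d_theta1(theta0, theta1)
    return theta0, theta1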

Pictorial representation of how Θ1 updates with a +ve/-ve slope:

[Figure: J(Θ1) with a positive slope at the current Θ1]
Θ1 := Θ1 - α (+ve number), so Θ1 decreases, moving toward the minimum.

[Figure: J(Θ1) with a negative slope at the current Θ1]
Θ1 := Θ1 - α (-ve number), so Θ1 increases, again moving toward the minimum.


The effect of the learning rate α on minimizing the cost function is illustrated as:

[Figure: gradient descent steps on the plot of J(Θ1), for small and large values of α]

Keep two things in mind when choosing the learning rate α:

1) If α is too small, gradient descent can be very slow.
2) If α is too large, gradient descent can overshoot the minimum. It may fail to converge or even diverge.

The Gradient Descent algorithm is also called the Batch Gradient Descent algorithm, because at each step it uses all the training examples.
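To see both behaviors concretely, here is a toy experiment (my own example, not from the lecture) that minimizes J(Θ) = Θ², whose derivative is 2Θ, by gradient descent:

def run(alpha, steps=10, theta=1.0):
    # Repeatedly apply the gradient descent step theta := theta - alpha * dJ/dtheta.
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(run(alpha=0.01))   # too small: theta is still about 0.82 after 10 steps
print(run(alpha=0.4))    # suitable: theta shrinks rapidly toward 0
print(run(alpha=1.1))    # too large: |theta| grows every step, so it diverges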




Minimizing the Cost Function of the Regression Method

The more we minimize the cost function, the more accurate the regression line's predictions will be.

Let's start with a simplified hypothesis where Θ0 = 0, so h(x) = Θ0 + Θ1x = Θ1x.

Minimize this cost function:

J(Θ1) = (1/2m) Σ ( Θ1 x(i) - y(i) )², where the sum runs over the m training examples.

h(x): for a fixed value of Θ1, this is a function of x.
J(Θ1): a function of the parameter Θ1.

We minimize the cost function to reduce the gap between the actual and predicted values, as visualized in the figure below:

[Figure: training points and the line h(x), with the vertical gaps as the errors]
For different values of Θ1, plot the hypothesis h(x) and the cost function J(Θ1).

The plot of J(Θ1) is shown as:

[Figure: plot of J(Θ1) versus Θ1]

Find the minimum of this cost-function plot; the value of Θ1 at that point minimizes the cost function. In the figure above, the minimizing value of Θ1 lies between 0 and 1.
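Here is a small Python sketch of this idea, using a made-up training set where y = 0.5x exactly, so the cost is minimized at Θ1 = 0.5 (consistent with a minimum between 0 and 1):

import numpy as np

# Made-up training set: y = 0.5 * x exactly.
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, 1.0, 1.5])
m = len(x)

def J(theta1):
    """Cost of the simplified hypothesis h(x) = theta1 * x."""
    return np.sum((theta1 * x - y) ** 2) / (2 * m)

for theta1 in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(theta1, J(theta1))   # the smallest cost appears at theta1 = 0.5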



Linear Regression with One Variable

As I mentioned in "What is Machine Learning?", the regression method is used for predicting continuous variables.
Let's start with an example: predicting house price according to size. We're going to use a data set of housing prices from Delhi. Here I'm going to plot a data set of houses of different sizes that were sold for a range of different prices.




Let's say somebody wants to sell a house of 1350 square feet, and we want to tell them how much they might be able to sell the house for.
One thing we could do is fit a model, maybe a straight line, to this data. Based on that line, we might tell them that they can sell the house for around 2,500,000.


It's a regression problem, where the term regression refers to the fact that we are predicting a real-valued output, namely the price.




Let's use (x(i), y(i)) to refer to the ith row of the table below.


So, for example, x(1) refers to the input value of the first training example, used to predict the output value y(1).

From this training data we form a hypothesis to predict house price from house size.

The flow of the process is shown in the figure below:


Here h represents the hypothesis, which maps from X (Size of House) to Y (Estimated Price).

h(x) = Θ0 + Θ1x

The regression line is shown in the figure below:





The figures below show the regression line (hypothesis) for different values of Θ0 and Θ1.





Goal: for each regression line, choose Θ0, Θ1 so that h(x) is close to y (the output value).
The goal of each training iteration is to minimize the squared error. The cost function of this regression line is:

J(Θ0, Θ1) = (1/2m) Σ ( h(x(i)) - y(i) )²

m: number of training examples (the sum runs over i = 1 to m)
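As a rough sketch, here is how this cost function can be evaluated for candidate regression lines in Python; the house sizes and prices below are invented for illustration:

import numpy as np

# Invented house data: size in square feet vs. price.
size  = np.array([1000.0, 1350.0, 1600.0, 2000.0])
price = np.array([1900000.0, 2500000.0, 2950000.0, 3700000.0])
m = len(size)

def h(theta0, theta1, x):
    return theta0 + theta1 * x   # hypothesis h(x) = theta0 + theta1 * x

def J(theta0, theta1):
    return np.sum((h(theta0, theta1, size) - price) ** 2) / (2 * m)

# Compare two candidate regression lines: the lower J, the better the fit.
print(J(100000.0, 1800.0))
print(J(0.0, 1850.0))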

This completes the post on Linear Regression with One Variable.


Variables in Statistics



In statistics, variables fall into two categories:
1) Quantitative: quantitative variables are numeric; they represent a measurable quantity. Ex. the number of records in a database, the number of people working in the IT sector, etc.
2) Qualitative: qualitative variables take on values that are labels. Ex. the behavior of a user, the color of an animal, the category of an employee, such as Manager/Director/Software Engineer, etc.


Quantitative variables are further classified into two categories:
   1) Discrete Variable: a variable that can take only specific values between its max and min values, rather than all values in between.
Ex. the number of times tails comes up in 100 tosses of a coin;
the number of even numbers between 1 and 1000.

   2) Continuous Variable: a variable that can take any value between its max and min values.
Ex. the weight of a person can take any value, such as 56.7/56.73/56.734/56.7342, etc.;
the number of rational numbers between 0 and 1.


Statistical data are often classified according to the number of variables:
Univariate data: the experiment is done using one variable. Ex. predicting a person's fitness from his/her weight.

Bivariate data: the experiment is done using two variables. Ex. predicting a person's fitness from his/her weight and height.

Multivariate data: the experiment is done using more than two variables. Ex. predicting a person's fitness from his/her weight, height, age, looks, etc.




Thursday, May 16, 2013

Statistics in Machine Learning


Machine learning merges statistics with the computational sciences: computer science, systems science, and optimization. Much of the agenda in machine learning is driven by applied problems in science and technology, where data streams are increasingly large-scale, dynamical, and heterogeneous, and where mathematical and algorithmic creativity are required to bring statistical methodology to bear.

Statistics is a mathematical body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data. Statistics also provides tools for prediction and forecasting through the use of data and statistical models.

Parts of statistics that are extensively used in Machine Learning:

1) Variables
2) Central Tendency
3) Variability
4) Probability
5) Probability Distribution
           a) Discrete Probability Distribution
           b) Continuous Probability Distribution
6) Hypothesis Testing

What is Machine Learning?


There is no well-accepted definition of what is and what isn't machine learning. Arthur Samuel defined machine learning as the field of study that gives computers the ability to learn without being explicitly programmed.

A more recent definition by Tom Mitchell frames machine learning as a well-posed learning problem: a computer program is said to learn from experience E, with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Let's say your email program watches which emails you do or do not flag as spam. In an email client like this, you might click the spam button to report some emails as spam but not others. Based on which emails you mark as spam, your email program learns to filter spam better.

There are several different types of learning algorithms. The main two types are what we call supervised learning and unsupervised learning.

Supervised Learning: here we're going to teach the computer how to do something. In supervised learning, the right answer is already given at training time.

Regression and Classification are supervised learning methods.

Regression: predict a continuous-valued output.
Ex. a) An inventory store predicts how many of an item will sell over the next month.
b) Predict the price of a house from attributes such as house area and locality.

Classification: predict a discrete-valued output.
Ex. a) Examine whether an email is spam or not.
b) Examine whether software has been hacked or compromised.

Unsupervised Learning: here we're going to let the computer learn by itself. In unsupervised learning, the right answer is not given at training time.

Unsupervised Learning includes:
a) Clustering: k-means, hierarchical, etc.
b) Blind Signal Separation: principal component analysis, singular value decomposition, etc.

Ex.: dividing news data into multiple categories, such as Political News, Cricket News, Bollywood News, etc.
Market Segmentation
Social Network Analysis