Sunday, May 26, 2013

Classification

Classification is the task of assigning an input to one of a fixed set of classes, such as "spam" vs. "non-spam", based on labeled training examples. Many algorithms are used for classification problems. Some of these algorithms are:
  a) Naive Bayes
  b) Balanced Winnow
  c) Winnow
  d) Maximum Entropy
  e) Decision Tree
  f) Support Vector Machine (SVM)

Balanced Winnow Algorithm :

Balanced Winnow keeps two weights for each feature, w+ and w-. The overall weight of a feature is the difference (w+ - w-).

For a given document with feature vector x = (s_p1, s_p2, …, s_pd), the algorithm deems the document relevant iff:

    ∑_{j=1}^{d} (w+_pj - w-_pj) · s_pj > τ

where:

w+_pj, w-_pj : weights corresponding to active feature p_j
s_pj : strength of feature p_j
d : number of active features in the document
τ : threshold

In the Balanced Winnow algorithm, the two weights of each feature are initialized as:

    w+_pj = 2τ/m    and    w-_pj = τ/m

where m is the average number of active features per document in the collection.
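
As a small illustration, here is a minimal sketch of this initialization and of the decision rule above, assuming boolean strengths (s_pj = 1 for active features); the function names and parameters are illustrative, not from the original paper:

    def init_weights(vocab, tau, m):
        """Initialize w+ to 2*tau/m and w- to tau/m for every feature."""
        w_pos = {f: 2.0 * tau / m for f in vocab}
        w_neg = {f: tau / m for f in vocab}
        return w_pos, w_neg

    def predict(w_pos, w_neg, active, tau):
        """Deem a document relevant iff the sum of (w+ - w-) over its active features exceeds tau."""
        score = sum(w_pos[f] - w_neg[f] for f in active)
        return 1 if score > tau else 0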
The weights of active features are updated only when a mistake is made, and they are updated with a promotion step or a demotion step.

Positive Example : Following a mistake on a positive example (the algorithm predicts 0 and the label is 1), the positive part of the weight of each active feature is promoted and the negative part is demoted:

    w+_pj ← α · w+_pj
    w-_pj ← β · w-_pj
Negative Example : Following a mistake on a negative example (the algorithm predicts 1 and the label is 0), the positive part of the weight of each active feature is demoted and the negative part is promoted:

    w+_pj ← β · w+_pj
    w-_pj ← α · w-_pj
α : Promotion parameter (α > 1).
β : Demotion parameter (0 < β < 1).
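
Putting the pieces together, here is a minimal mistake-driven training sketch under the same boolean-strength assumption, reusing init_weights and predict from the sketch above (the training-data format, α, β, τ, and the epoch count are illustrative):

    def train(docs, labels, vocab, alpha=1.5, beta=0.5, tau=1.0, epochs=10):
        """docs: list of token lists; labels: 0/1 per document."""
        m = sum(len(set(d)) for d in docs) / len(docs)  # avg. active features per document
        w_pos, w_neg = init_weights(vocab, tau, m)
        for _ in range(epochs):
            for doc, label in zip(docs, labels):
                active = set(doc) & set(vocab)
                pred = predict(w_pos, w_neg, active, tau)
                if pred == label:
                    continue                  # weights change only on mistakes
                if label == 1:                # positive example: promote w+, demote w-
                    for f in active:
                        w_pos[f] *= alpha
                        w_neg[f] *= beta
                else:                         # negative example: demote w+, promote w-
                    for f in active:
                        w_pos[f] *= beta
                        w_neg[f] *= alpha
        return w_pos, w_neg

Because only the weights of active features are touched, each update costs time proportional to the number of active features in the document, not to the size of the vocabulary.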

There are three ways of adjusting the value of s(i, d) (the strength of feature i in document d) according to the feature's frequency in the document, as sketched in the code below:
1) Boolean: if the feature is present (active) in the document then s(i, d) = 1, otherwise s(i, d) = 0.
2) Count: the strength depends on the number of occurrences n(i, d) in the document: s(i, d) = n(i, d).
3) Dampened: the strength is reduced by normalizing the value of n(i, d): s(i, d) = sqrt(n(i, d)).
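
A sketch of the three variants, assuming the document is given as a list of tokens (the function name and signature are illustrative):

    import math

    def strength(doc_tokens, feature, variant=1):
        """Strength s(i, d) of a feature in a document, per the three variants above."""
        n = doc_tokens.count(feature)        # n(i, d): occurrences of the feature
        if variant == 1:
            return 1.0 if n > 0 else 0.0     # 1) boolean: present or not
        if variant == 2:
            return float(n)                  # 2) raw occurrence count
        return math.sqrt(n)                  # 3) dampened: sqrt of the count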

Example :
Let document d contain the terms {information, retrieval, retrieval, retrieval, retrieval, information, retrieval, retrieval}.
Let the weight vector for the two features (information, retrieval) be W_d = (0.5, 0.1) and the threshold τ = 1.
The score under each strength function is calculated as:
1) x1 = (1, 1) (boolean: feature present or not); score = 0.5*1 + 0.1*1 = 0.6.
2) x2 = (2, 6) (number of times each feature occurs); score = 0.5*2 + 0.1*6 = 1.6.
3) x3 = (1.41, 2.45) (square roots of the occurrence counts); score = 0.5*1.41 + 0.1*2.45 = 0.95.


In the above example, Balanced Winnow predicts 0 when using the first and third strength functions (score ≤ τ = 1) and 1 when using the second (score > τ).
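
The three scores can be verified with a few lines of Python, using the weights and threshold from the example:

    import math

    doc = ["information", "retrieval", "retrieval", "retrieval", "retrieval",
           "information", "retrieval", "retrieval"]
    weights = {"information": 0.5, "retrieval": 0.1}
    tau = 1.0

    counts = {f: doc.count(f) for f in weights}   # {'information': 2, 'retrieval': 6}
    for name, s in [("boolean", lambda n: 1.0 if n > 0 else 0.0),
                    ("count", float),
                    ("sqrt", math.sqrt)]:
        score = sum(w * s(counts[f]) for f, w in weights.items())
        print(name, round(score, 2), "->", 1 if score > tau else 0)
    # boolean 0.6 -> 0, count 1.6 -> 1, sqrt 0.95 -> 0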


Balanced Winnow has a computational advantage in sparse, high-dimensional spaces, because only the weights of active features are updated.
This also makes the algorithm more sensitive to relationships among features, relationships that may go unnoticed by an algorithm based purely on counts of features.


Positive Winnow : Positive Winnow keeps only one weight w_pj per feature p_j, instead of the two weights w+ and w- kept by Balanced Winnow.
For a given document with feature vector x = (s_p1, s_p2, …, s_pd), the algorithm deems the document relevant iff:

    ∑_{j=1}^{d} w_pj · s_pj > τ

When a mistake is made, the weight vector is updated. If the algorithm predicts 0 and the label is 1 (positive example), the weights of all active features are promoted: each w_pj is multiplied by α.
Otherwise, if the algorithm predicts 1 and the label is 0 (negative example), the weights of all active features are demoted: each w_pj is multiplied by β.
Whichever mistake is made, the weights of inactive features remain unchanged, as in the sketch below.
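
A minimal sketch of one Positive Winnow prediction/update step, again assuming boolean strengths (the function name and default parameter values are illustrative):

    def positive_winnow_step(w, active, label, alpha=1.5, beta=0.5, tau=1.0):
        """One prediction/update step: a single weight per feature, updated on mistakes."""
        score = sum(w[f] for f in active)        # boolean strengths: s = 1
        pred = 1 if score > tau else 0
        if pred != label:
            # promote on a missed positive, demote on a false positive
            factor = alpha if label == 1 else beta
            for f in active:
                w[f] *= factor                   # inactive features are untouched
        return pred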
