Its all about Data: Classifier/Model Analysis

After building a classifier/model, we analyze how well it performs. Two terms, precision and recall, describe the strength of a classifier/model.
Precision (positive predictive value): the fraction of retrieved instances that are relevant. Example: a collection holds 20 pages, of which 9 exactly match a query. A search engine returns 10 of the 20 pages for that query, and only 6 of those 10 retrieved pages actually match it.

Precision = (number of retrieved documents that match the query) / (total number of retrieved documents) = 6/10
Precision = (True Positive) / (True Positive + False Positive) = 6 / (6 + 4)

Precision can be seen as a measure of exactness or quality.

Recall (sensitivity): the fraction of relevant instances that are retrieved.

Recall = (number of retrieved documents that match the query) / (total number of documents that match the query) = 6/9

Recall = (True Positive) / (True Positive + False Negative) = 6 / (6 + 3)

Recall is a measure of completeness or quantity.

In statistics, if the null hypothesis is that all and only the relevant items are retrieved, absence of type I and type II errors corresponds respectively to maximum precision (no false positives) and maximum recall (no false negatives).

Type I error: false positives: 10 - 6 = 4 in the example above.
Type II error: false negatives: 9 - 6 = 3 in the example above.
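
The arithmetic in the example can be checked with a few set operations. This is only a sketch: the page IDs below are hypothetical, chosen so that the counts match the example (20 pages, 9 relevant, 10 retrieved, 6 in common).

```python
# Hypothetical page IDs, picked only to reproduce the counts in the example.
all_pages = set(range(1, 21))   # 20 pages in the collection
relevant = set(range(1, 10))    # 9 pages that really match the query (1..9)
retrieved = set(range(4, 14))   # 10 pages the search engine returned (4..13)

true_positives = retrieved & relevant    # retrieved and relevant
false_positives = retrieved - relevant   # retrieved but not relevant (type I errors)
false_negatives = relevant - retrieved   # relevant but missed (type II errors)

precision = len(true_positives) / len(retrieved)
recall = len(true_positives) / len(relevant)

print("TP:", len(true_positives), "FP:", len(false_positives), "FN:", len(false_negatives))
print("precision:", precision)
print("recall:", round(recall, 3))
```

Note that the 20-page total never enters either formula; precision and recall depend only on the retrieved set and the relevant set.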

Often, there is an inverse relationship between precision and recall, where it is possible to increase one at the cost of reducing the other.
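
One way to see this trade-off is to move the decision threshold of a scoring model. The sketch below uses made-up scores and labels, and the helper name precision_recall_at is an assumption for illustration, not part of any library.

```python
def precision_recall_at(scores, labels, threshold):
    """Treat every record scoring at or above the threshold as retrieved."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    # Convention assumed here: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (invented): model scores and whether each record is truly relevant.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30]
labels = [True, True, False, True, False, True, False, False]

strict = precision_recall_at(scores, labels, 0.85)  # retrieve only the top scores
loose = precision_recall_at(scores, labels, 0.45)   # retrieve many more records

print("strict threshold:", strict)
print("loose threshold:", loose)
```

With the strict threshold everything retrieved is relevant but half of the relevant records are missed; with the loose threshold every relevant record is found, at the cost of some irrelevant ones.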

Usually, precision and recall scores are not measured in isolation. Instead, either values for one measure are compared at a fixed level of the other measure, or both are combined into a single measure such as the F-measure, which is the weighted harmonic mean of precision and recall. When both are weighted equally it is called the balanced F-score.
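
As a minimal sketch, the F-measure can be computed directly from precision and recall. The function name f_measure and its beta parameter are chosen here for illustration; beta = 1 gives the balanced F-score, and beta > 1 weights recall more heavily.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F-beta score)."""
    if precision == 0 and recall == 0:
        return 0.0  # convention assumed here to avoid dividing by zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values from the search engine example: precision = 6/10, recall = 6/9.
balanced = f_measure(6 / 10, 6 / 9)
recall_heavy = f_measure(6 / 10, 6 / 9, beta=2.0)

print("balanced F-score:", round(balanced, 4))
print("F-score with beta=2:", round(recall_heavy, 4))
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot get a high F-score by maximizing only one of precision or recall.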

Another way to check the effectiveness of a model is decile analysis.
A decile analysis tests the model's ability to predict the intended outcome. Each column of the decile analysis chart (x-axis) represents a group of records that have been scored using the model. The height of each column (y-axis) represents the average actual behavior of those records.

The steps for a decile analysis are:

Step 1 : The records are sorted by their predicted scores in descending order and divided into ten equal-sized bins or deciles. The top decile contains the 10% of the population most likely to respond and the bottom decile contains the 10% of the population least likely to respond, based on the model scores.

Step 2 : The deciles and their actual response rates are graphed on the x and y axes, respectively.
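
The two steps above can be sketched in plain Python. The data here is synthetic and the function name decile_response_rates is hypothetical; a real analysis would use the model's scores and the observed responses, and would usually draw a bar chart rather than print text bars.

```python
def decile_response_rates(scores, responses, bins=10):
    """Sort records by score (descending), split into equal bins,
    and return the actual response rate of each bin, top bin first."""
    if len(scores) != len(responses):
        raise ValueError("scores and responses must have the same length")
    if len(scores) < bins:
        raise ValueError("need at least one record per bin")
    # Step 1: rank by predicted score, highest first, then cut into bins.
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    rates = []
    for i in range(bins):
        chunk = ranked[i * n // bins:(i + 1) * n // bins]
        rates.append(sum(r for _, r in chunk) / len(chunk))
    return rates


# Synthetic example: 100 records whose response rate rises with the score.
scores = [i / 100 for i in range(100)]
responses = [1 if (i % 10) < (i // 10) else 0 for i in range(100)]

rates = decile_response_rates(scores, responses)

# Step 2: deciles on the x-axis, actual response rate on the y-axis (text bars).
for decile, rate in enumerate(rates, start=1):
    print(f"decile {decile:2d} | {'#' * round(rate * 20)} {rate:.1f}")
```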

When we're looking at a decile analysis, we want to see a staircase effect; that is, we want the bars to descend in order from left to right.

In contrast, if the bars are out of order or flat, the decile analysis is telling us that the model is not doing a very good job of predicting actual responses.
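
The staircase check can also be done programmatically instead of by eye. This is a rough sketch with invented response rates; the function name is_staircase and its exact rule (no bar higher than the one before it, and the first bar strictly higher than the last) are assumptions, and a real check might tolerate small reversals caused by noise.

```python
def is_staircase(rates):
    """True if the bars never rise from left to right and are not flat overall."""
    never_rises = all(a >= b for a, b in zip(rates, rates[1:]))
    not_flat = rates[0] > rates[-1]
    return never_rises and not_flat


# Invented response rates per decile, top decile first.
good_model = [0.30, 0.24, 0.19, 0.15, 0.12, 0.10, 0.08, 0.06, 0.05, 0.03]
out_of_order = [0.12, 0.20, 0.09, 0.15, 0.11, 0.14, 0.10, 0.13, 0.12, 0.11]
flat = [0.12] * 10

print("good model:", is_staircase(good_model))
print("out of order:", is_staircase(out_of_order))
print("flat:", is_staircase(flat))
```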

Friday, May 24, 2013
