# "classify" in Matlab

Posted 2011.10.07 13:59# classify

Discriminant analysis

## Syntax

```
class = classify(sample,training,group)
class = classify(sample,training,group,'type')
class = classify(sample,training,group,'type',prior)
[class,err] = classify(...)
[class,err,POSTERIOR] = classify(...)
[class,err,POSTERIOR,logp] = classify(...)
[class,err,POSTERIOR,logp,coeff] = classify(...)
```

## Description

`class = classify(sample,training,group)` classifies
each row of the data in `sample` into one of the
groups in `training`. `sample` and `training` must
be matrices with the same number of columns. `group` is
a grouping variable for `training`. Its unique values
define groups; each element defines the group to which the corresponding
row of `training` belongs. `group` can
be a categorical variable, a numeric vector, a string array, or a
cell array of strings. `training` and `group` must
have the same number of rows. (See Grouped Data.) `classify` treats `NaN`s
or empty strings in `group` as missing values, and
ignores the corresponding rows of `training`. The
output `class` indicates the group to which each
row of `sample` has been assigned, and is of the
same type as `group`.
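A minimal sketch of this basic form, using the `fisheriris` data set that ships with the Statistics Toolbox (the odd/even row split here is purely illustrative):

```
load fisheriris                 % meas: 150-by-4 numeric, species: 150-by-1 cell
training = meas(1:2:end,:);     % odd rows as training data
group    = species(1:2:end);    % labels for the training rows
sample   = meas(2:2:end,:);     % even rows to classify
class = classify(sample,training,group);
% class is a cell array of strings, the same type as group
```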

`class = classify(sample,training,group,'type')` allows
you to specify the type of discriminant function. Specify `type` inside
single quotes. `type` is one of:

- `linear` — Fits a multivariate normal density to each group, with a pooled estimate of covariance. This is the default.
- `diaglinear` — Similar to `linear`, but with a diagonal covariance matrix estimate (naive Bayes classifiers).
- `quadratic` — Fits multivariate normal densities with covariance estimates stratified by group.
- `diagquadratic` — Similar to `quadratic`, but with a diagonal covariance matrix estimate (naive Bayes classifiers).
- `mahalanobis` — Uses Mahalanobis distances with stratified covariance estimates.
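Each of these types plugs into the same call; a sketch contrasting the pooled-covariance default with the stratified quadratic variant (the `fisheriris` split below is illustrative):

```
load fisheriris
training = meas(1:2:end,:);
group    = species(1:2:end);
sample   = meas(2:2:end,:);
classLin  = classify(sample,training,group,'linear');     % pooled covariance
classQuad = classify(sample,training,group,'quadratic');  % per-group covariance
```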

`class = classify(sample,training,group,'type',prior)` allows
you to specify prior probabilities for the groups. `prior` is one of:

- A numeric vector the same length as the number of unique values in `group` (or the number of levels defined for `group`, if `group` is categorical). If `group` is numeric or categorical, the order of `prior` must correspond to the ordered values in `group`, or, if `group` contains strings, to the order of first occurrence of the values in `group`.
- A 1-by-1 structure with fields:
  - `prob` — A numeric vector.
  - `group` — Of the same type as `group`, containing unique values indicating the groups to which the elements of `prob` correspond.

  As a structure, `prior` can contain groups that do not appear in `group`. This can be useful if `training` is a subset of a larger training set. `classify` ignores any groups that appear in the structure but not in the `group` array.
- The string `'empirical'`, indicating that group prior probabilities should be estimated from the group relative frequencies in `training`.

`prior` defaults to a numeric vector of equal probabilities, i.e., a uniform distribution.

`prior` is not used for discrimination by Mahalanobis distance, except for error rate calculation.
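As a structure, the prior can be built like this (a sketch with illustrative probabilities, restricting `fisheriris` to two groups):

```
load fisheriris
training = meas(51:end,:);            % versicolor and virginica rows
group    = species(51:end);
% illustrative prior: weight versicolor 9:1 over virginica
prior.prob  = [0.9 0.1];
prior.group = {'versicolor';'virginica'};
class = classify(meas(51:55,:),training,group,'linear',prior);
```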

`[class,err] = classify(...)` also
returns an estimate `err` of the misclassification
error rate based on the `training` data. `classify` returns
the apparent error rate, i.e., the percentage of observations in `training` that
are misclassified, weighted by the prior probabilities for the groups.
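A sketch of requesting the error-rate output (again with an illustrative `fisheriris` split):

```
load fisheriris
training = meas(1:2:end,:);
group    = species(1:2:end);
[class,err] = classify(meas(2:2:end,:),training,group);
% err is the apparent (resubstitution) error rate on the training data
```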

`[class,err,POSTERIOR] = classify(...)` also
returns a matrix `POSTERIOR` of estimates of the
posterior probabilities that the *j*th training group
was the source of the *i*th sample observation, i.e., *Pr*(*group
j*|*obs i*). `POSTERIOR` is
not computed for Mahalanobis discrimination.
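A sketch of inspecting the posterior matrix for a two-group problem:

```
load fisheriris
training = meas(51:end,:);            % two groups: versicolor, virginica
group    = species(51:end);
[class,err,P] = classify(meas(51:60,:),training,group);
% P is 10-by-2; each row sums to 1 across the two groups
```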

`[class,err,POSTERIOR,logp] = classify(...)` also
returns a vector `logp` containing estimates of the
logarithms of the unconditional predictive probability density of
the sample observations, *p*(*obs i*)
= ∑*p*(*obs i*|*group
j*)*Pr*(*group j*) over
all groups. `logp` is not computed for Mahalanobis
discrimination.

`[class,err,POSTERIOR,logp,coeff] = classify(...)` also
returns a structure array `coeff` containing coefficients
of the boundary curves between pairs of groups. Each element `coeff(I,J)`
contains information for comparing group `I` to group `J` in
the following fields:

- `type` — Type of discriminant function, from the `type` input.
- `name1` — Name of the first group.
- `name2` — Name of the second group.
- `const` — Constant term of the boundary equation (K).
- `linear` — Linear coefficients of the boundary equation (L).
- `quadratic` — Quadratic coefficient matrix of the boundary equation (Q).

For the `linear` and `diaglinear` types,
the `quadratic` field is absent, and a row `x` from
the `sample` array is classified into group `I` rather
than group `J` if `0 < K+x*L`.
For the other types, `x` is classified into group `I` if `0
< K+x*L+x*Q*x'`.
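The boundary score can be checked by hand from `coeff`; a sketch assuming a two-column feature matrix and a `'quadratic'` fit:

```
load fisheriris
SL = meas(51:end,1); SW = meas(51:end,2);
group = species(51:end);
[~,~,~,~,coeff] = classify([SL SW],[SL SW],group,'quadratic');
K = coeff(1,2).const;
L = coeff(1,2).linear;
Q = coeff(1,2).quadratic;
x = [SL(1) SW(1)];               % one observation (row vector)
score = K + x*L + x*Q*x';        % > 0: classify into group 1 rather than group 2
```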

## Examples

For training data, use Fisher's sepal measurements for iris versicolor and virginica:

```
load fisheriris
SL = meas(51:end,1);
SW = meas(51:end,2);
group = species(51:end);
h1 = gscatter(SL,SW,group,'rb','v^',[],'off');
set(h1,'LineWidth',2)
legend('Fisher versicolor','Fisher virginica',...
    'Location','NW')
```

Classify a grid of measurements on the same scale:

```
[X,Y] = meshgrid(linspace(4.5,8),linspace(2,4));
X = X(:); Y = Y(:);
[C,err,P,logp,coeff] = classify([X Y],[SL SW],...
    group,'quadratic');
```

Visualize the classification:

```
hold on;
gscatter(X,Y,C,'rb','.',1,'off');
K = coeff(1,2).const;
L = coeff(1,2).linear;
Q = coeff(1,2).quadratic;
% Function to compute K + L*v + v'*Q*v for multiple vectors
% v=[x;y]. Accepts x and y as scalars or column vectors.
f = @(x,y) K + [x y]*L + sum(([x y]*Q) .* [x y], 2);
h2 = ezplot(f,[4.5 8 2 4]);
set(h2,'Color','m','LineWidth',2)
axis([4.5 8 2 4])
xlabel('Sepal Length')
ylabel('Sepal Width')
title('{\bf Classification with Fisher Training Data}')
```

## References

[1] Krzanowski, W. J. *Principles
of Multivariate Analysis: A User's Perspective*. New York:
Oxford University Press, 1988.

[2] Seber, G. A. F. *Multivariate
Observations*. Hoboken, NJ: John Wiley & Sons, Inc.,
1984.
