SmartSVM¶
SmartSVM is a Python package that implements the methods from Fast MetaLearning for Adaptive Hierarchical Classifier Design by Gerrit J.J. van den Burg and Alfred O. Hero. The package contains functions for estimating the Bayes error rate (BER) using the HenzePenrose divergence and a hierarchical classifier called SmartSVM. See the Usage documentation below for more details.
Usage¶
In the paper the main focus is on the accurate Bayes error estimates and the hierarchical classifier SmartSVM. These will therefore be of most interest to users of the SmartSVM package. Below we briefly explain how to use these functions.
Citing¶
If you use this package in your research, please cite the paper using the following BibTex entry:
@article{van2017fast,
title={Fast MetaLearning for Adaptive Hierarchical Classifier Design},
author={Gerrit J.J. van den Burg and Alfred O. Hero},
journal={arXiv preprint arXiv:1711.03512},
archiveprefix={arXiv},
year={2017},
eprint={1711.03512},
url={https://arxiv.org/abs/1711.03512},
primaryclass={cs.LG}
}
Bayes error estimates¶
Error estimation is implemented three functions:
hp_estimate
for the HenzePenrose estimator of the Bayes error rate. This can be used as:>>> import numpy as np >>> from smartsvm import hp_estimate >>> X1 = np.random.multivariate_normal([1, 0], [[1, 0], [0, 1]], 100) >>> X2 = np.random.multivariate_normal([1, 0], [[1, 0], [0, 1]], 100) >>> hp_estimate(X1, X2) # with normalization >>> hp_estimate(X1, X2, normalize=False) # without normalization
compute_error_graph
andcompute_ovr_error
respectively compute the complete weighted graph of pairwise BER estimates or the OnevsRest BER for each class. They have a similar interface:>>> import numpy as np >>> from smartsvm import compute_error_graph, compute_ovr_error >>> from sklearn.datasets import load_digits >>> digits = load_digits(5) >>> n_samples = len(digits.images) >>> X = digits.images.reshape((n_samples, 1)) >>> y = digits.target >>> G = compute_error_graph(X, y, n_jobs=2, normalize=True) >>> d = compute_ovr_error(X, y, normalize=True)
SmartSVM Classifier¶
SmartSVM is an adaptive hierarchical classifier which constructs a classification hierarchy based on the HenzePenrose estimates of the Bayes error between each pair of classes. The classifier is build on top of ScikitLearn and can be used in the exact same way as other sklearn classifiers:
>>> import numpy as np
>>> from smartsvm import SmartSVM
>>> from sklearn.datasets import load_digits
>>> digits = load_digits(10)
>>> n_samples = len(digits.images)
>>> X = digits.images.reshape((n_samples, 1))
>>> y = digits.target
>>> clf = SmartSVM()
>>> clf.fit(X, y)
>>> clf.predict(X)
By default, the SmartSVM classifier uses the Linear Support Vector Machine
(LinearSVC
) as the underlying binary classifier for each binary subproblem
in the hierarchy. This can easily be changed with the binary_clf
parameter to the class constructor, for instance:
>>> from sklearn.tree import DecisionTreeClassifier
>>> clf = SmartSVM(binary_clf=DecisionTreeClassifier)
>>> clf.fit(X, y)
>>> clf._get_binary()
DecisionTreeClassifier(class_weight=None, criterion='gini',
max_depth=None, max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort=False, random_state=None,
splitter='best')
You may optionally add parameters for the classifier through the
clf_params
parameter. This should be a dict with the parameters to the
binary classifier, as follows:
>>> clf = SmartSVM(binary_clf=DecisionTreeClassifier, clf_params={'criterion': 'entropy'})
>>> clf.fit(X, y)
>>> clf._get_binary()
DecisionTreeClassifier(class_weight=None, criterion='entropy',
max_depth=None, max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort=False, random_state=None,
splitter='best')
Finally, it’s possible to retrieve probability estimates for the classes if
the underlying classifier supports the predict_proba
method:
>>> from sklearn.svm import SVC
>>> clf = SmartSVM(binary_clf=SVC, clf_params={"probabilities": True})
>>> clf.fit(X, y)
>>> prob = clf.predict_proba(X)
>>> import pandas as pd
>>> df = pd.DataFrame(prob)
>>> df
0 1 2 ...
0 9.999997e01 1.716831e18 2.677824e13 ...
1 1.000000e07 9.956408e01 1.035589e09 ...
2 2.595652e05 1.452011e02 9.722321e01 ...
For more information about parameters to SmartSVM, see the API documentation here.
Known Limitations¶
The HenzePenrose estimator of the Bayes error rate is based on construction of the Euclidean minimal spanning tree. The current algorithm for this in the SmartSVM package uses an adaptation of Whitney’s algorithm. This is not the fastest way to construct a minimal spanning tree. The Fast Euclidean Minimal Spanning Tree algorithm by March et al., would be a faster option but this makes it more difficult to construct orthogonal MSTs. Incorporating this algorithm into the SmartSVM package is considered a topic for future work.
References¶
The main reference for this package is:
The theory of the HenzePenrose estimator is developed in:
API documentation¶
The documentation below is automatically generated from the docstrings in the
code. It is therefore also available through the builtin Python help()
function.
smartsvm.base module¶
Base object for hierarchical classifiers
This module contains the definition of a general hierarchical classifier, which
forms the basis for the SmartSVM
classifier. The
HierarchicalClassifier
class is an abstract base class, which leaves
the fit
and _split
methods to be defined in the subclass.

class
smartsvm.base.
HierarchicalClassifier
(binary_clf=<class 'sklearn.svm.classes.LinearSVC'>, clf_params=None, n_jobs=1)¶ Bases:
sklearn.base.BaseEstimator
,sklearn.base.ClassifierMixin
Base class for a hierarchical classifier
This class is a base class for a hierarchical classifier which contains a hierarchy of binary classification problems. It forms the basis of the
SmartSVM
class and can be used to implement other hierarchical classifiers. Note that this class is also the node class for the binary tree: it has children which are in turn alsoHierarchicalClassifier
instances.Parameters:  binary_clf (classifier) – The type of the binary classifier to be used for each binary subproblem. This will typically be a scikitlearn classifier such as LinearSVC, DecisionTreeClassifier, etc.
 clf_params (dict) – Parameters to pass on to the constructor of the
binary_clf
type classifier. It must be a dict with a mapping from parameter to value.  n_jobs (int) – Number of parallel jobs to use where applicable.

classifier_
¶ The binary classifier for this node in the tree, if it exists. This can also be obtained with the
_get_binary()
method.Type: classifier

negative_child_
¶ The “left” child of the binary tree below this node.
Type: HierarchicalClassifier

positive_child_
¶ The “right” child of the binary tree below this node.
Type: HierarchicalClassifier

negative_
¶ Set of labels of the binary subproblem that are on the “left” part of the tree.
Type: set

positive_
¶ Set of labels on the binary subproblem that are on the “right” part of the tree.
Type: set

predict
(X)¶ Perform classification on dataset X

elements
¶ Elements in the graph or list of nodes

fit
(X, y)¶ Fit model.

is_splitable
¶ Check if the elements of this node can be split further

is_splitted
¶ Check if this node is already split

predict
(X) Predict the class labels using the hierarchical classifier

predict_proba
(X)¶ Predict the class probabilities using the hierarchical classifier.
This is only available if the underlying binary classifier has the “predict_proba” method. Note that if the probability model is created using cross validation, the results can be slightly different than those obtained with the predict method.

print_tree
(prefix='', is_tail=None)¶ Print the tree
smartsvm.cross_connections module¶
Code for computing the number of cross connections in an MST.
The functions in this module are used to compute the number of crossconnections between data points of different classes in the Euclidean Minimal Spanning Tree. It is the interface between the Cython code talking to the C implementations, and the higher level functions that compute the HenzePenrose divergence.

smartsvm.cross_connections.
binary_cross_connections
(X, y, nTrees=3)¶ Compute the number of cross connections for two classes
This function computes the average number of nontrivial crossconnections in the orthogonal MSTs between points from different classes. For each MST the number of times a point from one class is connected to a point from a different class is recorded. This is reduced by 1 to adjust for the trivial connection that will always exist, even in the case of large separation. The result is averaged for each of the orthogonal MSTs.
A warning can occur when the requested number of orthogonal MSTs can’t be computed on the data. This happens when there are either too few datapoints to construct this many MSTs, or in the extreme case where each edge to a data point is used in previous MSTs.
Parameters:  X (numpy.ndarray) – Data matrix
 y (np.ndarray) – Class labels, assumed to be only +1 and 1
 nTrees (int) – The number of orthogonal MSTs to compute
Returns: C – The (average) number of nontrivial connections between instances of different classes in the MSTs
Return type: float

smartsvm.cross_connections.
ovr_cross_connections
(X, y, nClass, nTrees=3)¶ Compute the OnevsRest crossconnection counts
This function computes the crossconnection counts in the OnevsRest setting by constructing orthogonal MSTs on the entire dataset, and collecting crossconnection counts for each class.
Parameters:  X (numpy.ndarray) – Numpy array of the data matrix
 y (numpy.ndarray) – Numpy array of the class labels, assumed to be in 0..nClass1
 nClass (int) – The number of classes in y for the full problem. This needs to be supplied in case y is a subset of the full dataset where not all classes are present. This ensures that the outcome matrix has the appropriate size.
 nTrees (int) – The number of orthogonal MSTs to compute
Returns: Cs – Vector of class crossconnection counts, ordered by class label (nClass x nClass)
Return type: numpy.ndarray
smartsvm.cut module¶
Functions for graph cutting
This module contains functions for cutting the weighted error graph. The default algorithm for cutting the graph is the StoerWagner algorithm, but this module also contains code for experimenting with the Spectral Clustering algorithm to create graph cuts.

smartsvm.cut.
graph_cut
(G, algorithm='stoer_wagner')¶ Cut a connected graph into two disjoint graphs
This is a wrapper function to make it easy to use different graph cut algorithms.
Parameters:  G (networkx.Graph) – The graph that needs to be cut
 algorithm (str) – The graph cut algorithm to use. Available values are:
'stoer_wagner'
,'spectral_clustering'
, and'normalized_cut'
.
Returns:  G1 (networkx.Graph) – One of the subgraphs
 G2 (networkx.Graph) – The other subgraph
Raises: ValueError
– When an unknown graph cut algorithm is supplied.

smartsvm.cut.
graph_cut_sc
(G, assign_labels=None)¶ Apply the Spectral Clustering algorithm to cut the graph
Create two subgraphs using a Spectral Clustering of the adjacency matrix of the graph.
Important: Since the matrix of weights is a dissimilarity matrix (high numbers correspond to difficult to separate classes, we turn it into a similarity matrix for the Spectral Clustering algorithm by using the normalized exponent of the weight matrix. This is also done in the examples of ScikitLearn for Spectral Clustering. Weights that were zero in the weight matrix are set to zero in the dissimilarity matrix as well.
Parameters:  G (networkx.Graph) – The graph that needs to be cut
 assign_labels (str) – Parameter for the ScikitLearn
spectral_clustering
function. Available values are:kmeans
anddiscretize
.
Returns:  G1 (networkx.Graph) – One of the subgraphs
 G2 (networkx.Graph) – The other subgraph

smartsvm.cut.
graph_cut_sw
(G)¶ Apply the StoerWagner graph cut
Use the StoerWagner algorithm for cutting the graph.
Parameters: G (networkx.Graph) – The graph that needs to be cut Returns:  G1 (networkx.Graph) – One of the subgraphs
 G2 (networkx.Graph) – The other subgraph
smartsvm.divergence module¶
Functions for computing the HenzePenrose divergence
This module contains functions for computing the HenzePenrose divergence
between data from two classes (divergence()
) or between one class and
the collection of all other classes (ovr_divergence()
).

smartsvm.divergence.
divergence
(X1, X2, nTrees=3, bias_correction=True)¶ Compute the HenzePenrose divergence
Compute the HenzePenrose divergence between data from two classes using the FriedmanRafsky statistic. This is based on the Euclidean minimal spanning tree between two classes. To reduce variance in the estimate, a number of orthogonal MSTs are used that can be set with a function parameter. A bias correction to the divergence is applied, unless this is disabled by the user (if disabled, the estimate is still corrected to ensure nonnegativity).
Parameters:  X1 (numpy.ndarray) – Data matrix for one of the classes
 X2 (numpy.ndarray) – Data matrix for the other class
 nTrees (int) – Number of orthogonal minimal spanning trees to use
 bias_correction (bool) – Whether or not to apply bias correction to the estimator
Returns: divergence – The HenzePenrose divergence between the classes
Return type: float

smartsvm.divergence.
merge_and_label
(X1, X2)¶ Merge two datasets
Merge the datasets into one array and create a vector of labels to preserve class membership.
Parameters:  X1 (numpy.ndarray) – Data matrix for one of the classes
 X2 (numpy.ndarray) – Data matrix for the other class
Returns:  data (numpy.ndarray) – Data matrix which vertically stacks X1 and X2
 labels (numpy.ndarray) – Labels vector of 1 and 1 for X1 and X2 data respectively

smartsvm.divergence.
ovr_divergence
(X, labels, nTrees=3, bias_correction=True)¶ Compute the OnevsRest HenzePenrose Divergence for all classes
This function is similar to
divergence()
with the difference that it computes the OnevsRest divergence. This is the divergence between one class and the collection of all the other classes together. This divergence is computed for each class simultaneously.Parameters:  X (numpy.ndarray) – Data matrix
 labels (numpy.ndarray) – Class labels for the instances
 nTrees (int) – The number of orthogonal MSTs to construct
 bias_correction (bool) – Whether or not to apply bias correction
Returns: ovr_divergence – Dictionary with a mapping from the class label to the OvR divergence estimate
Return type: dict
smartsvm.error_estimate module¶
Functions for computing the HenzePenrose estimate of the Bayes Error Rate
This module contains functions for computing the HenzePenrose estimate of the Bayes Error Rate (BER).

smartsvm.error_estimate.
compute_error_graph
(X, y, n_jobs=1, normalize=True)¶ Compute the complete weighted graph of pairwise BER estimates
Computes the estimate of the BER for each pair of classes and returns this in a complete weighted graph. If desired, computing the error can be done in parallel.
Parameters:  X (numpy.ndarray) – Numpy array of the data matrix
 y (numpy.ndarray) – Numpy array with the class labels
 n_jobs (int) – Number of parallel jobs to use
 normalize (bool) – Whether or not to use normalized BER estimation
Returns: G – Weighted complete graph where the class labels are the nodes and the edge weights are given by the BER estimate
Return type: networkx.Graph

smartsvm.error_estimate.
compute_ovr_error
(X, y, normalize=True)¶ Compute OvRBER for each class
The OnevsRest Bayes error rate is the error rate for a single class compared to all other classes combined. This function computes this error rate for each class in the dataset, using the
divergence.ovr_divergence()
function. By default the OvRBER is normalized with the estimates of the class prior probabilities.Parameters:  X (numpy.ndarray) – The data matrix
 y (numpy.ndarray) – A vector of class labels
 normalize (bool) – Whether or not to normalize the OvRBER
Returns: estimates – Dictionary with a mapping from the class label to a float representing the OvRBER estimate.
Return type: dict

smartsvm.error_estimate.
hp_binary
(X, y, normalize=True)¶ HenzePenrose estimation of the BER for a binary dataset
This is a wrapper around the
hp_estimate()
function for binary datasets.Parameters:  X (numpy.ndarray) – Data matrix
 y (numpy.ndarray) – Label array consisting of two distinct labels
 normalize (bool) – Whether or not to normalize the error using empirical estimates of prior probabilities
Returns: estimate – The HenzePenrose estimate of the Bayes error rate
Return type: float

smartsvm.error_estimate.
hp_estimate
(X1, X2, normalize=True)¶ HenzePenrose estimation of the Bayes Error Rate
Estimate the (normalized) Bayes error rate using the HenzePenrose estimator. The estimate is formed by averaging the upper and lower bounds on the Bayes error. By default, the estimate is normalized with the empirical estimates of the class prior probability.
Parameters:  X1 (numpy.ndarray) – Data matrix for one class
 X2 (numpy.ndarray) – Data matrix for the other class
 normalize (bool) – Whether or not to normalize the error using empirical estimates of prior probabilities
Returns: estimate – The HenzePenrose estimate of the Bayes error rate
Return type: float
smartsvm.smartsvm module¶

class
smartsvm.smartsvm.
SmartSVM
(binary_clf=<class 'sklearn.svm.classes.LinearSVC'>, clf_params=None, n_jobs=1, graph=None, cut_algorithm='stoer_wagner', normalize_error=True)¶ Bases:
smartsvm.base.HierarchicalClassifier
SmartSVM classifier for multiclass classification
This is the SmartSVM classifier. This classifier splits the classes into a hierarchy based on the weighted complete graph of pairwise estimates of the Bayes Error Rate between classes.
Note that this class is used in a binary tree. It therefore has children which are also elements of the
SmartSVM
class.This class inherits from the
HierarchicalClassifier
class, more documentation on the basic methods of the class can be found there.Parameters:  binary_clf (classifier) – The type of the binary classifier to be used for each binary subproblem. This will typically be a scikitlearn classifier such as LinearSVC, DecisionTreeClassifier, etc.
 clf_params (dict) – Parameters to pass on to the constructor of the
binary_clf
type classifier. It must be a dict with a mapping from parameter to value.  n_jobs (int) – Number of parallel jobs to use where applicable.
 graph (networkx.Graph) – Complete weighted graph with the pairwise Bayes error estimates. If it
is not supplied, it will be computed when the
fit()
method is called. This parameter exists for situations when the graph is precomputed.  cut_algorithm (str) – The algorithm to use for the graph cut. Several algorithms are
currently implemented, see
cut.graph_cut()
for more details.  normalize_error (bool) – Whether or not to normalize the Bayes error estimates with the empirical estimate of the prior probability.

elements
¶ The labels of this node in the hierarchy

fit
(X, y)¶ Fit the SmartSVM classifier
This method fits the SmartSVM classifier on the given data. If the
graph
attribute of the class is not defined, it will be computed witherror_estimate.compute_error_graph()
. If this node in the hierarchy can not be split further, no classifier will be fit. Otherwise, the node will be split to form a binary classification problem. Next, the binary classifier will be trained on this problem. Finally, the left and right children of the classifier will be trained as a recursive step.Important: This method constructs the
graph
attribute of the class if it does not exist yet. However, if the graph does exist, it is not recomputed. This means that a problem can occur if you construct an instance of theSmartSVM
classifier, fit it, then fit it again with different data. Namely, the graph for the first data may not be appropriate for the second dataset. The solution is however simple: when fitting with a different dataset, reset the graph usingclf.graph = None
.Parameters:  X (numpy.ndarray, accept csr sparse) – Training data, feature matrix
 y (numpy.ndarray) – Training data, label vector
Returns: self – Returns self.
Return type: object
smartsvm.utils module¶
Useful numerical utilities.

smartsvm.utils.
indices_from_classes
(classes, y)¶ Create index vector for all labels in classes
Create an index vector with the same dimensions as the vector y, with indices equal to True if the label is in the list of classes.
Parameters:  classes (list or numpy array) – Classes to check membership of
 y (numpy.ndarray) – Label vector
Returns: indices – Boolean vector with True for the elements of
y
which occur inclasses
and False otherwise.Return type: numpy.ndarray (bool)