# Scikit-learn Interface (alpha)

This module supports an interface between hyperspectral algorithms and scikit-learn.

Utility functions.

See the example file nbex_skl_snow for a use of HyperEstimatorCrossVal and HyperSVC, and test_sklearn for a further example.

Note

This is an alpha version. The module will certainly grow with time, and anything can change: class names, class interfaces and so on.

## Cross Validation

class pysptools.skl.HyperEstimatorCrossVal(estimator, param_grid)

Run a cross validation on a hypercube or a concatenation of hypercubes. Uses scikit-learn KFold and GridSearchCV.

fit(X, y)

Run the cross validation.

Parameters:

• X – numpy array. An array of shape (n_samples, n_features) where each row is a spectrum.
• y – numpy array. Target values of shape (n_samples,). A zero value is the background. A value of one or more is a class value.

fit_cube(M, mask)

Run a cross validation on a hypercube.

Parameters:

• M – numpy array. An HSI cube (m x n x p).
• mask – numpy array. A class map mask.

get_best_params()

Returns: dict. A dictionary of the best-matching parameters.

print(label='No title')

Print a summary of the cross validation results.

Parameters:

• label – string. The test title.
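The description above names KFold and GridSearchCV, so the following minimal sketch shows that combination with scikit-learn called directly on a flattened cube. It is not the HyperEstimatorCrossVal implementation: the synthetic cube, the mask, the KNeighborsClassifier estimator and the parameter grid are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical 20 x 20 cube with 8 bands; the right half is brighter.
m, n, p = 20, 20, 8
cube = rng.normal(0.2, 0.05, size=(m, n, p))
cube[:, 10:, :] += 0.5

# Class map mask: 0 is the background, 1 is the first class.
mask = np.zeros((m, n), dtype=int)
mask[:, 10:] = 1

# One row per pixel (spectrum), one column per band.
X = cube.reshape(m * n, p)
y = mask.reshape(m * n)

# Grid search over an assumed parameter grid, scored with 3-fold CV.
param_grid = {"n_neighbors": [1, 3, 5]}
cv = KFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=cv)
search.fit(X, y)

best_params = search.best_params_
print(best_params, round(search.best_score_, 3))
```

The `best_params` dictionary plays the role that get_best_params() is documented to fill.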

## HyperAdaBoostClassifier

class pysptools.skl.HyperAdaBoostClassifier(base_estimator=None, n_estimators=50, learning_rate=1.0, algorithm='SAMME.R', random_state=None)

Apply scikit-learn AdaBoostClassifier on a hypercube.

For the __init__ class constructor parameters, see the sklearn.ensemble.AdaBoostClassifier class parameters.

The class is instrumented for use with the scikit-learn cross validation. It uses the plot and display methods from the class Output.

classify(M)

Classify a hyperspectral cube.

Parameters:

• M – numpy array. An HSI cube (m x n x p).

Returns: numpy array. A class map (m x n x 1).

display_feature_importances(n_labels='all', height=0.2, sort=False, suffix=None)

Display the feature importances. The output can be split in n graphs.

Parameters:

• n_labels – string or integer. The number of labels to output per graph. If the value is ‘all’, only one graph is generated.
• height – float [default 0.2]. The bar height (in fact the width).
• sort – boolean [default False]. If true, the feature importances are sorted.
• suffix – string [default None]. Add a suffix to the file name.

fit(X, y, sample_weight=None)

Same as the sklearn.ensemble.AdaBoostClassifier fit call.

Parameters:

• X – numpy array. An array of shape (n_samples, n_features) where each row is a spectrum.
• y – numpy array. Target values of shape (n_samples,). A zero value is the background. A value of one or more is a class value.
• sample_weight – array-like of shape = [n_samples], optional. Sample weights. If None, the sample weights are initialized to 1 / n_samples.

fit_rois(M, ROIs)

Fit the HS cube M with the use of ROIs.

Parameters:

• M – numpy array. An HSI cube (m x n x p).
• ROIs – ROIs type. Regions of interest instance.

plot_feature_importances(path, n_labels='all', height=0.2, sort=False, suffix=None)

Plot the feature importances. The output can be split in n graphs.

Parameters:

• path – string. The path where to save the plot.
• n_labels – string or integer. The number of labels to output per graph. If the value is ‘all’, only one graph is generated.
• height – float [default 0.2]. The bar height (in fact the width).
• sort – boolean [default False]. If true, the feature importances are sorted.
• suffix – string [default None]. Add a suffix to the file name.
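To illustrate the fit and classify conventions (spectra as rows of X, zero as the background label, a class map with the cube's spatial layout), here is a rough sketch that uses sklearn.ensemble.AdaBoostClassifier directly. The synthetic cube and labels are assumptions, and the reshape steps are a guess at what a wrapper has to do, not the pysptools code. The class map is kept 2-D here for simplicity.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(1)

# Hypothetical 12 x 10 cube with 6 bands; the top half is class 1.
m, n, p = 12, 10, 6
cube = rng.normal(0.3, 0.02, size=(m, n, p))
cube[:6, :, :] += 0.4

labels = np.zeros((m, n), dtype=int)  # 0 is the background
labels[:6, :] = 1                     # 1 is the first class

# Flatten to (n_samples, n_features): one spectrum per row.
X = cube.reshape(-1, p)
y = labels.reshape(-1)

clf = AdaBoostClassifier(n_estimators=20, random_state=0)
clf.fit(X, y)

# Classify every pixel, then restore the spatial layout.
class_map = clf.predict(cube.reshape(-1, p)).reshape(m, n)
print(class_map.shape)
```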

## HyperBaggingClassifier

class pysptools.skl.HyperBaggingClassifier(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=1, random_state=None, verbose=0)

Apply scikit-learn BaggingClassifier on a hypercube.

For the __init__ class constructor parameters, see the sklearn.ensemble.BaggingClassifier class parameters.

The class is instrumented for use with the scikit-learn cross validation. It uses the plot and display methods from the class Output.

classify(M)

Classify a hyperspectral cube.

Parameters:

• M – numpy array. An HSI cube (m x n x p).

Returns: numpy array. A class map (m x n x 1).

fit(X, y, sample_weight=None)

Same as the sklearn.ensemble.BaggingClassifier fit call.

Parameters:

• X – numpy array. An array of shape (n_samples, n_features) where each row is a spectrum.
• y – numpy array. Target values of shape (n_samples,). A zero value is the background. A value of one or more is a class value.
• sample_weight – array-like, shape = [n_samples] or None. Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

fit_rois(M, ROIs)

Fit the HS cube M with the use of ROIs.

Parameters:

• M – numpy array. An HSI cube (m x n x p).
• ROIs – ROIs type. Regions of interest instance.

## HyperExtraTreesClassifier

class pysptools.skl.HyperExtraTreesClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_split=1e-07, bootstrap=False, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)

Apply scikit-learn ExtraTreesClassifier on a hypercube.

For the __init__ class constructor parameters, see the sklearn.ensemble.ExtraTreesClassifier class parameters.

The class is instrumented for use with the scikit-learn cross validation. It uses the plot and display methods from the class Output.

classify(M)

Classify a hyperspectral cube.

Parameters:

• M – numpy array. An HSI cube (m x n x p).

Returns: numpy array. A class map (m x n x 1).

display_feature_importances(n_labels='all', height=0.2, sort=False, suffix=None)

Display the feature importances. The output can be split in n graphs.

Parameters:

• n_labels – string or integer. The number of labels to output per graph. If the value is ‘all’, only one graph is generated.
• height – float [default 0.2]. The bar height (in fact the width).
• sort – boolean [default False]. If true, the feature importances are sorted.
• suffix – string [default None]. Add a suffix to the file name.

fit(X, y, sample_weight=None)

Same as the sklearn.ensemble.ExtraTreesClassifier fit call.

Parameters:

• X – numpy array. An array of shape (n_samples, n_features) where each row is a spectrum.
• y – numpy array. Target values of shape (n_samples,). A zero value is the background. A value of one or more is a class value.
• sample_weight – array-like, shape = [n_samples] or None. Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. In the case of classification, splits are also ignored if they would result in any single class carrying a negative weight in either child node.

fit_rois(M, ROIs)

Fit the HS cube M with the use of ROIs.

Parameters:

• M – numpy array. An HSI cube (m x n x p).
• ROIs – ROIs type. Regions of interest instance.

plot_feature_importances(path, n_labels='all', height=0.2, sort=False, suffix=None)

Plot the feature importances. The output can be split in n graphs.

Parameters:

• path – string. The path where to save the plot.
• n_labels – string or integer. The number of labels to output per graph. If the value is ‘all’, only one graph is generated.
• height – float [default 0.2]. The bar height (in fact the width).
• sort – boolean [default False]. If true, the feature importances are sorted.
• suffix – string [default None]. Add a suffix to the file name.
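The feature importance methods sort the importances and split the labels over several graphs. A possible reading of the sort and n_labels options is sketched below with scikit-learn's ExtraTreesClassifier and plain NumPy. The grouping logic, the synthetic data and the choice of max_features=None are assumptions for illustration; the actual plotting code in pysptools may differ.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(2)

# Hypothetical spectra: 200 samples, 10 bands; only band 3 carries the class signal.
n_samples, n_bands = 200, 10
X = rng.normal(size=(n_samples, n_bands))
y = (X[:, 3] > 0).astype(int)

clf = ExtraTreesClassifier(n_estimators=50, max_features=None, random_state=0)
clf.fit(X, y)
importances = clf.feature_importances_  # one value per band, sums to 1

# Assumed behaviour of sort=True: most important band first.
order = np.argsort(importances)[::-1]

# Assumed behaviour of n_labels=4: at most 4 bands per graph.
n_labels = 4
groups = [order[i:i + n_labels] for i in range(0, n_bands, n_labels)]

for graph_index, bands in enumerate(groups):
    rows = [(int(b), round(float(importances[b]), 3)) for b in bands]
    print("graph", graph_index, rows)
```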

## HyperGaussianNB

class pysptools.skl.HyperGaussianNB(priors=None)

Apply scikit-learn GaussianNB on a hypercube.

For the __init__ class constructor parameters, see the sklearn.naive_bayes.GaussianNB class parameters.

The class is instrumented for use with the scikit-learn cross validation. It uses the plot and display methods from the class Output.

classify(M)

Classify a hyperspectral cube.

Parameters:

• M – numpy array. An HSI cube (m x n x p).

Returns: numpy array. A class map (m x n x 1).

fit(X, y, sample_weight=None)

Same as the sklearn.naive_bayes.GaussianNB fit call.

Parameters:

• X – numpy array. An array of shape (n_samples, n_features) where each row is a spectrum.
• y – numpy array. Target values of shape (n_samples,). A zero value is the background. A value of one or more is a class value.

fit_rois(M, ROIs)

Fit the HS cube M with the use of ROIs.

Parameters:

• M – numpy array. An HSI cube (m x n x p).
• ROIs – ROIs type. Regions of interest instance.

## HyperGradientBoostingClassifier

class pysptools.skl.HyperGradientBoostingClassifier(loss='deviance', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_split=1e-07, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, presort='auto')

Apply scikit-learn GradientBoostingClassifier on a hypercube.

For the __init__ class constructor parameters, see the sklearn.ensemble.GradientBoostingClassifier class parameters.

The class is instrumented for use with the scikit-learn cross validation. It uses the plot and display methods from the class Output.

classify(M)

Classify a hyperspectral cube.

Parameters:

• M – numpy array. An HSI cube (m x n x p).

Returns: numpy array. A class map (m x n x 1).

display_feature_importances(n_labels='all', height=0.2, sort=False, suffix=None)

Display the feature importances. The output can be split in n graphs.

Parameters:

• n_labels – string or integer. The number of labels to output per graph. If the value is ‘all’, only one graph is generated.
• height – float [default 0.2]. The bar height (in fact the width).
• sort – boolean [default False]. If true, the feature importances are sorted.
• suffix – string [default None]. Add a suffix to the file name.

fit(X, y)

Same as the sklearn.ensemble.GradientBoostingClassifier fit call.

Parameters:

• X – numpy array. An array of shape (n_samples, n_features) where each row is a spectrum.
• y – numpy array. Target values of shape (n_samples,). A zero value is the background. A value of one or more is a class value.

fit_rois(M, ROIs)

Fit the HS cube M with the use of ROIs.

Parameters:

• M – numpy array. An HSI cube (m x n x p).
• ROIs – ROIs type. Regions of interest instance.

plot_feature_importances(path, n_labels='all', height=0.2, sort=False, suffix=None)

Plot the feature importances. The output can be split in n graphs.

Parameters:

• path – string. The path where to save the plot.
• n_labels – string or integer. The number of labels to output per graph. If the value is ‘all’, only one graph is generated.
• height – float [default 0.2]. The bar height (in fact the width).
• sort – boolean [default False]. If true, the feature importances are sorted.
• suffix – string [default None]. Add a suffix to the file name.

## HyperKNeighborsClassifier

class pysptools.skl.HyperKNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=1, **kwargs)

Apply scikit-learn KNeighborsClassifier on a hypercube.

For the __init__ class constructor parameters, see the sklearn.neighbors.KNeighborsClassifier class parameters.

The class is instrumented for use with the scikit-learn cross validation. It uses the plot and display methods from the class Output.

classify(M)

Classify a hyperspectral cube.

Parameters:

• M – numpy array. An HSI cube (m x n x p).

Returns: numpy array. A class map (m x n x 1).

fit(X, y)

Same as the sklearn.neighbors.KNeighborsClassifier fit call.

Parameters:

• X – numpy array. An array of shape (n_samples, n_features) where each row is a spectrum.
• y – numpy array. Target values of shape (n_samples,). A zero value is the background. A value of one or more is a class value.

fit_rois(M, ROIs)

Fit the HS cube M with the use of ROIs.

Parameters:

• M – numpy array. An HSI cube (m x n x p).
• ROIs – ROIs type. Regions of interest instance.

## HyperLogisticRegression

class pysptools.skl.HyperLogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='liblinear', max_iter=100, multi_class='ovr', verbose=0, warm_start=False, n_jobs=1)

Apply scikit-learn LogisticRegression on a hypercube.

For the __init__ class constructor parameters, see the sklearn.linear_model.LogisticRegression class parameters.

The class is instrumented for use with the scikit-learn cross validation. It uses the plot and display methods from the class Output.

classify(M)

Classify a hyperspectral cube.

Parameters:

• M – numpy array. An HSI cube (m x n x p).

Returns: numpy array. A class map (m x n x 1).

fit(X, y)

Same as the sklearn.linear_model.LogisticRegression fit call.

Parameters:

• X – numpy array. An array of shape (n_samples, n_features) where each row is a spectrum.
• y – numpy array. Target values of shape (n_samples,). A zero value is the background. A value of one or more is a class value.

fit_rois(M, ROIs)

Fit the HS cube M with the use of ROIs.

Parameters:

• M – numpy array. An HSI cube (m x n x p).
• ROIs – ROIs type. Regions of interest instance.

## HyperRandomForestClassifier

class pysptools.skl.HyperRandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)

Apply scikit-learn RandomForestClassifier on a hypercube.

For the __init__ class constructor parameters, see the sklearn.ensemble.RandomForestClassifier class parameters.

The class is instrumented for use with the scikit-learn cross validation. It uses the plot and display methods from the class Output.

classify(M)

Classify a hyperspectral cube.

Parameters:

• M – numpy array. An HSI cube (m x n x p).

Returns: numpy array. A class map (m x n x 1).

display_feature_importances(n_labels='all', height=0.2, sort=False, suffix=None)

Display the feature importances. The output can be split in n graphs.

Parameters:

• n_labels – string or integer. The number of labels to output per graph. If the value is ‘all’, only one graph is generated.
• height – float [default 0.2]. The bar height (in fact the width).
• sort – boolean [default False]. If true, the feature importances are sorted.
• suffix – string [default None]. Add a suffix to the file name.

fit(X, y)

Same as the sklearn.ensemble.RandomForestClassifier fit call.

Parameters:

• X – numpy array. An array of shape (n_samples, n_features) where each row is a spectrum.
• y – numpy array. Target values of shape (n_samples,). A zero value is the background. A value of one or more is a class value.

fit_rois(M, ROIs)

Fit the HS cube M with the use of ROIs.

Parameters:

• M – numpy array. An HSI cube (m x n x p).
• ROIs – ROIs type. Regions of interest instance.

plot_feature_importances(path, n_labels='all', height=0.2, sort=False, suffix=None)

Plot the feature importances. The output can be split in n graphs.

Parameters:

• path – string. The path where to save the plot.
• n_labels – string or integer. The number of labels to output per graph. If the value is ‘all’, only one graph is generated.
• height – float [default 0.2]. The bar height (in fact the width).
• sort – boolean [default False]. If true, the feature importances are sorted.
• suffix – string [default None]. Add a suffix to the file name.

## Support Vector Supervised Classification (HyperSVC)

See test_HyperSVC.py for an example.

class pysptools.skl.HyperSVC(C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=None, random_state=None)

Apply scikit-learn SVC on a hypercube.

For the __init__ class constructor parameters, see the sklearn.svm.SVC class parameters.

The class is instrumented for use with the scikit-learn cross validation. It uses the plot and display methods from the class Output.

Note: the class always applies preprocessing.scale before any processing.

Note: the C parameter is set to 1. As a result, class_weight is relative to C, and the first value of class_weight is the background. For example, to fit two classes “1” and “2” with the help of one ROI each, declare class_weight like this:

• class_weight={0:1,1:10,2:10}
• 0 is always the background and is set to 1.
• 1 is the first class.
• 2 is the second class.

A value of 10 for both classes gives good results to start with.

classify(M)

Classify a hyperspectral cube. Applies preprocessing.scale first.

Parameters:

• M – numpy array. An HSI cube (m x n x p).

Returns: numpy array. A class map (m x n x 1).

fit(X, y)

Same as the sklearn.svm.SVC fit call, but with a call to preprocessing.scale first.

Parameters:

• X – numpy array. An array of shape (n_samples, n_features) where each row is a spectrum.
• y – numpy array. Target values of shape (n_samples,). A zero value is the background. A value of one or more is a class value.

fit_rois(M, ROIs)

Fit the HS cube M with the use of ROIs.

Parameters:

• M – numpy array. An HSI cube (m x n x p).
• ROIs – ROIs type. Regions of interest instance.

predict(X)

Same as the sklearn.svm.SVC predict call, but with a call to preprocessing.scale first.

Parameters:

• X – numpy array. An array where each row is a spectrum.
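The class_weight convention described for HyperSVC (background weighted 1, each class weighted 10, data scaled first) can be tried with scikit-learn alone. The sketch below assumes synthetic spectra with a large background and two small classes; it calls sklearn.svm.SVC and preprocessing.scale directly and is only an approximation of what the wrapper does.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.svm import SVC

rng = np.random.default_rng(3)

# Hypothetical spectra with 5 bands: a large background and two small classes.
p = 5
background = rng.normal(0.1, 0.02, size=(300, p))
class_1 = rng.normal(0.5, 0.02, size=(30, p))
class_2 = rng.normal(0.9, 0.02, size=(30, p))

X = np.vstack([background, class_1, class_2])
y = np.concatenate([np.zeros(300, dtype=int),
                    np.ones(30, dtype=int),
                    np.full(30, 2)])

# Scale each band to zero mean and unit variance before fitting.
X_scaled = preprocessing.scale(X)

# 0 is the background (weight 1); classes 1 and 2 get weight 10.
clf = SVC(C=1.0, kernel="rbf", class_weight={0: 1, 1: 10, 2: 10})
clf.fit(X_scaled, y)

pred = clf.predict(X_scaled)
print("training accuracy:", (pred == y).mean())
```

Weighting the small classes more heavily than the background compensates for the ROI pixels being far fewer than the background pixels.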

## Unsupervised clustering using KMeans

See the file test_kmeans.py for an example.

class pysptools.skl.KMeans

KMeans clustering algorithm adapted to hyperspectral imaging.

display(interpolation='none', colorMap='Accent', suffix=None)

Display the cluster map.

Parameters:

• interpolation – string [default none]. A matplotlib interpolation method.
• colorMap – string [default ‘Accent’]. A color map, one of [‘Accent’, ‘Dark2’, ‘Paired’, ‘Pastel1’, ‘Pastel2’, ‘Set1’, ‘Set2’, ‘Set3’]. “Accent” is the default and it falls back on “Jet”.
• suffix – string [default None]. Add a suffix to the title.

plot(path, interpolation='none', colorMap='Accent', suffix=None)

Plot the cluster map.

Parameters:

• path – string. The path where to put the plot.
• interpolation – string [default none]. A matplotlib interpolation method.
• colorMap – string [default ‘Accent’]. A color map, one of [‘Accent’, ‘Dark2’, ‘Paired’, ‘Pastel1’, ‘Pastel2’, ‘Set1’, ‘Set2’, ‘Set3’]. “Accent” is the default and it falls back on “Jet”.
• suffix – string [default None]. Add a suffix to the file name.

predict(M, n_clusters=5, n_jobs=1, init='k-means++')

KMeans clustering algorithm adapted to hyperspectral imaging. It is a simple wrapper around the scikit-learn version.

Parameters:

• M – numpy array. An HSI cube (m x n x p).
• n_clusters – int [default 5]. The number of clusters to generate.
• n_jobs – int [default 1]. Taken from the scikit-learn doc: the number of jobs to use for the computation. This works by breaking down the pairwise matrix into n_jobs even slices and computing them in parallel. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
• init – string or array [default ‘k-means++’]. Taken from the scikit-learn doc: method for initialization, defaults to k-means++. ‘k-means++’ selects initial cluster centers for k-means clustering in a smart way to speed up convergence (see the Notes section in k_init for more details). ‘random’ chooses k observations (rows) at random from the data for the initial centroids. If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.

Returns: numpy array. A cluster map (m x n x c), where c is the number of clusters.
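Since predict is described as a simple wrapper around the scikit-learn version, the underlying idea can be sketched with sklearn.cluster.KMeans on a flattened cube. The synthetic cube is an assumption, and the result is reshaped to a 2-D label map (m x n) for simplicity; this is not the pysptools code. The n_jobs argument is left out because recent scikit-learn releases no longer accept it for KMeans.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)

# Hypothetical 10 x 12 cube with 4 bands and three vertical stripes.
m, n, p = 10, 12, 4
cube = rng.normal(0.0, 0.01, size=(m, n, p))
cube[:, 4:8, :] += 1.0
cube[:, 8:, :] += 2.0

# One spectrum per row, as scikit-learn expects.
X = cube.reshape(m * n, p)

km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
cluster_map = km.fit_predict(X).reshape(m, n)

print(cluster_map.shape, np.unique(cluster_map))
```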

## hyper_scale

pysptools.skl.hyper_scale(M)

Center a hyperspectral image to the mean and scale it component-wise to unit variance.

Calls scikit-learn preprocessing.scale().
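As a rough illustration of this scaling, the sketch below flattens a cube, applies sklearn.preprocessing.scale per band, and restores the cube shape. Whether hyper_scale reshapes the cube in exactly this way is an assumption; the synthetic cube is hypothetical.

```python
import numpy as np
from sklearn import preprocessing

rng = np.random.default_rng(5)

# Hypothetical 8 x 9 cube with 3 bands on very different value ranges.
m, n, p = 8, 9, 3
cube = rng.uniform(0.0, 1.0, size=(m, n, p)) * np.array([1.0, 10.0, 100.0])

# scale() works column-wise, so put one band per column first.
flat = cube.reshape(m * n, p)
scaled = preprocessing.scale(flat).reshape(m, n, p)

# After scaling, every band has zero mean and unit variance.
band_means = scaled.reshape(-1, p).mean(axis=0)
band_stds = scaled.reshape(-1, p).std(axis=0)
print(np.round(band_means, 6), np.round(band_stds, 6))
```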

## shape_to_XY

pysptools.skl.shape_to_XY(M_list, cmap_list)

Receive as input a list of hypercubes and the corresponding list of masks. The function reshapes and concatenates both to create the X and Y arrays.

Parameters:

• M_list – numpy array list. A list of HSI cubes (m x n x p).
• cmap_list – numpy array list. A list of class maps (m x n). As usual, the classes are numbered: 0 for the background, 1 for the first class, and so on.
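The reshape-and-concatenate step can be sketched in plain NumPy. The helper name `cubes_to_xy` is hypothetical and the code is an assumption about the behaviour described above, not the shape_to_XY source; it requires all cubes to share the same number of bands.

```python
import numpy as np

def cubes_to_xy(cubes, class_maps):
    """Stack several (m x n x p) cubes and (m x n) class maps into X and Y."""
    xs, ys = [], []
    for cube, cmap in zip(cubes, class_maps):
        m, n, p = cube.shape
        if cmap.shape != (m, n):
            raise ValueError("class map must match the cube's spatial size")
        xs.append(cube.reshape(m * n, p))  # one spectrum per row
        ys.append(cmap.reshape(m * n))     # one label per spectrum
    return np.concatenate(xs, axis=0), np.concatenate(ys, axis=0)

# Two hypothetical cubes of different spatial sizes, both with 4 bands.
cube_a = np.zeros((2, 3, 4))
cube_b = np.ones((5, 2, 4))
map_a = np.zeros((2, 3), dtype=int)  # all background
map_b = np.ones((5, 2), dtype=int)   # all first class

X, Y = cubes_to_xy([cube_a, cube_b], [map_a, map_b])
print(X.shape, Y.shape)  # (16, 4) (16,)
```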