| Title: | Topological k-NN Classifier Based on Self-Organising Maps |
|---|---|
| Description: | A topological version of k-NN: an abstract model is built as a 2-dimensional self-organising map. Samples of unknown class are predicted by mapping them onto the SOM and analysing the class membership of neurons in the neighbourhood. |
| Authors: | Andreas Dominik |
| Maintainer: | Andreas Dominik <[email protected]> |
| License: | GPL-3 |
| Version: | 1.4.4 |
| Built: | 2026-05-19 08:19:52 UTC |
| Source: | https://github.com/cran/som.nn |
The package som.nn provides tools to train self-organising maps
and predict class memberships by means of a k-NN-like classifier.
The functions som.nn.train and som.nn.continue are used to
train and re-train self-organising maps. The training can be performed with functions
of the packages
kohonen, som or class, or with pure-R implementations with
distance function bubble (kernel internal) or
gaussian (kernel gaussian).
(Remark: the pure-R implementations are actually faster than the external calls to
C implementations in the above-mentioned packages!)
In contrast to a normal SOM training, class labels are required for all training samples. These class labels are used to assign classes to the codebook vectors (i.e. the neurons of the map) after the training and to build the set of reference vectors. This reference is used for nearest-neighbour classification.
The nearest-neighbour classifier is implemented as a predict method. It is controlled by the following parameters:
dist.fun: the distance function used to weight the distance between reference
vectors and the sample to be predicted.
max.dist: the maximum distance to be considered.
Some distance functions are provided in the package (linear, bubble, inverse and tricubic), but any custom function can be defined as well.
The prediction differs significantly from a standard nearest-neighbour classifier, because the neighbourhood is not defined by the distance between reference vectors and the unknown sample vector. Instead, the neighbourhood of the neurons on the self-organising map is used.
Because the SOM has been generated by unsupervised training, the classifier is robust against overtraining.
In addition, the abstract model can be visualised as a 2-dimensional map using the plot method.
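The topological voting described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper topo.vote, its arguments and the way weights and class frequencies are combined are hypothetical. Only the roles of dist.fun, max.dist and strict are taken from this manual.

```r
# Hypothetical sketch of k-NN-like voting on the map (not the som.nn code).
# winner:      index of the neuron the sample was mapped to
# pts:         matrix of neuron coordinates on the map, one neuron per row
# class.freqs: matrix of class frequencies, one neuron per row
# dist.fun:    weight function with arguments (x, sigma)
topo.vote <- function(winner, pts, class.freqs, dist.fun, max.dist, strict = 0.8) {
  # distances are measured on the map, not in the data space
  d <- sqrt(rowSums(sweep(pts, 2, pts[winner, ])^2))
  w <- dist.fun(d, max.dist)
  votes <- colSums(class.freqs * w)   # row i of class.freqs is scaled by w[i]
  votes <- votes / sum(votes)
  # report "unknown" if the winning class does not reach the strict threshold
  label <- if (max(votes) < strict) "unknown" else names(which.max(votes))
  list(votes = votes, label = label)
}

# toy map: three neurons in a row, two classes
pts <- cbind(x = 1:3, y = 1)
freqs <- rbind(c(a = 1, b = 0), c(a = 0.5, b = 0.5), c(a = 0, b = 1))
bubble <- function(x, sigma) ifelse(x <= sigma, 1, 0)  # assumed weight function

res <- topo.vote(winner = 1, pts = pts, class.freqs = freqs,
                 dist.fun = bubble, max.dist = 1.1)
```

In this toy case only the winner and its direct neighbour vote, so class a reaches a vote of 0.75. With strict = 0.8 the sample is therefore reported as "unknown"; lowering strict to 0.7 would yield "a".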
The function is used as distance-dependent weight for k-NN voting.

    dist.fun.bubble(x, sigma = 1.1)

x |
Distance or |
sigma |
Maximum distance to be considered. Default is 1.1. |
The function returns 1.0 for distances within sigma and 0.0 for distances beyond sigma.
Distance-dependent weight.
The function is used as distance-dependent weight for k-NN voting.

    dist.fun.inverse(x, sigma = 1.1)

x |
Distance or |
sigma |
Maximum distance to be considered. Default is 1.1. |
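The function returns 1.0 for a distance of zero, 0.0 for distances beyond sigma,
and a weight that decreases with the inverse of the distance in between.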
Distance-dependent weight.
The function is used as distance-dependent weight for k-NN voting.

    dist.fun.linear(x, sigma = 1.1)

x |
Distance or |
sigma |
Maximum distance to be considered. Default is 1.1. |
The function returns 1.0 for a distance of zero, 0.0 for distances beyond sigma,
and a linearly decreasing weight in between.
Distance-dependent weight.
The tricubic function is used as distance-dependent weight for
k-NN voting.

    dist.fun.tricubic(x, sigma = 1)

x |
Distance or |
sigma |
Maximum distance to be considered. |
The function returns 1.0 for a distance of zero, 0.0 for distances beyond sigma,
and a tricubic weight between 1.0 and 0.0 in between.
Distance-dependent weight.
Calculates the distance matrix of points on the surface of a torus.

    dist.torus(coors)

coors |
|
A rectangular plane is considered as a torus (i.e. an endless plane that continues on the left when leaving at the right side, and in the same way connects the top and bottom borders). Distances between two points on the plane are calculated as the shortest distance between the points on the torus surface.
Complete distance matrix with diagonal and upper triangle values.
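The wrap-around distance can be illustrated with a short base-R sketch. It is not the code of dist.torus: the function name torus.dist and the explicit width and height arguments are assumptions made for this example, whereas the package function takes only coors.

```r
# Hypothetical sketch: shortest distances on a torus of given width and height.
# coors: matrix with x coordinates in column 1 and y coordinates in column 2
torus.dist <- function(coors, width, height) {
  dx <- abs(outer(coors[, 1], coors[, 1], "-"))
  dy <- abs(outer(coors[, 2], coors[, 2], "-"))
  # a point can be reached directly or across the border; take the shorter way
  dx <- pmin(dx, width - dx)
  dy <- pmin(dy, height - dy)
  sqrt(dx^2 + dy^2)   # complete symmetric matrix with zero diagonal
}

coors <- cbind(x = c(0, 4, 0), y = c(0, 0, 2))
d <- torus.dist(coors, width = 5, height = 3)
```

On a plane of width 5, the points at x = 0 and x = 4 are only one unit apart, because the path across the right border is shorter than the direct path of length 4.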
The constructor creates a new object of type SOMnn.

    ## S4 method for signature 'SOMnn'
    initialize(.Object, name, codes, qerror, class.idx, classes,
               class.counts, class.freqs, confusion, measures, accuracy,
               xdim, ydim, len.total, toroidal, norm, norm.center,
               norm.scale, dist.fun, max.dist, strict)

.Object |
SOMnn object |
name |
optional name of the model. |
codes |
|
qerror |
sum of the mapping errors of the training data. |
class.idx |
|
classes |
|
class.counts |
|
class.freqs |
|
confusion |
|
measures |
|
accuracy |
Overall accuracy. |
xdim |
number of neurons in x-direction of the som. |
ydim |
number of neurons in y-direction of the som. |
len.total |
total number of training steps, performed to create the model. |
toroidal |
|
norm |
|
norm.center |
vector of centers for each column of training data. |
norm.scale |
vector of scale factors for each column of training data. |
dist.fun |
|
max.dist |
maximum distance |
strict |
Minimum vote for the winner (if the winner's vote is smaller than strict,
"unknown" is reported as class label ( |
The constructor does not need to be called directly, because the normal
way to create a SOMnn object is to use som.nn.train.

    ## Not run:
    new.som <- new("SOMnn", name = name,
                   codes = codes,
                   qerror = qerror,
                   classes = classes,
                   class.idx = class.idx,
                   class.counts = class.counts,
                   class.freqs = class.freqs,
                   confusion = confusion,
                   measures = measures,
                   accuracy = accuracy,
                   xdim = xdim,
                   ydim = ydim,
                   len.total = len.total,
                   toroidal = toroidal,
                   norm = norm,
                   norm.center = norm.center,
                   norm.scale = norm.scale,
                   dist.fun = dist.fun,
                   max.dist = max.dist,
                   strict = strict)
    ## End(Not run)

Calculates a linear normalisation for the class frequencies.

    norm.linear(x)

x |
vector of votes for classes |
The function is applied to a vector to squeeze the values in a way that they sum up to 1.0:
som.nn.linnorm(x) = x / sum(x)
Linear normalisation is used to normalise the class distribution during
prediction. Results often seem more reasonable compared to softmax. The
S4 predict function for class SOMnn allows the normalisation function
to be specified as a parameter.
Vector of normalised values.
Calculates a softmax-like normalisation for the class frequencies.

    norm.softmax(x, t = 0.2)

x |
vector of votes for classes |
t |
temperature parameter. |
Softmax function is applied to a vector to squeeze the values in a way that they sum up to 1.0:
som.nn.softmax(x) = exp(x/T) / sum(exp(x/T))
Low values for T result in a
strong separation of output values. High values for T
make output values more equal.
Vector of softmax normalised values.
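The two normalisation formulas (linear and softmax) can be compared with a few lines of base R. The functions linear.norm and softmax.norm below are hypothetical re-implementations of the formulas stated in this manual, written only for illustration; they are not the package functions.

```r
# Hypothetical re-implementations of the two documented formulas.
linear.norm <- function(x) x / sum(x)

softmax.norm <- function(x, t = 0.2) {
  e <- exp(x / t)   # t is the temperature parameter
  e / sum(e)
}

votes <- c(3, 1)
lin  <- linear.norm(votes)            # keeps the ratio of the votes
cold <- softmax.norm(votes, t = 0.2)  # low temperature: strong separation
warm <- softmax.norm(votes, t = 5)    # high temperature: values more equal
```

Both functions return values that sum to 1.0. For the same votes, the low temperature pushes the winning class close to 1, while the high temperature moves both classes towards 0.5.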
SOMnn
Creates a plot of the hexagonal som in the model of type SOMnn.

    ## S4 method for signature 'SOMnn,ANY'
    plot(x, title = TRUE, col = NA, onlyDefCols = FALSE, edit.cols = FALSE,
         show.legend = TRUE, legend.loc = "bottomright", legend.width = 4,
         window.width = NA, window.height = NA, show.box = TRUE,
         show.counter.border = 0.98, predict = NULL, add = FALSE,
         pch.col = "black", pch = 19, ...)

x |
trained som of type |
title |
|
col |
defines colours for the classes of the dataset. Possible values include:
|
onlyDefCols |
|
edit.cols |
|
show.legend |
|
legend.loc |
Legend position as specified for |
legend.width |
size of the legend. |
window.width |
Manual setting of window width. Default is NA. |
window.height |
Manual setting of window height. Default is NA. |
show.box |
Show frame around the plot. Default is TRUE. |
show.counter.border |
Percentile as limit for the display of labels in the pie charts. Default is 0.98. Higher counts are displayed as numbers in the neuron. |
predict |
|
add |
|
pch.col |
Colour of the markers for predicted samples. |
pch |
Symbol of the markers for predicted samples. |
... |
More parameters as well as general
plot parameters are allowed; see |
In addition to the required parameters, many options can be specified to plot predicted samples and to modify colours, legend and scaling.

    ## get example data and add class labels:
    data(iris)
    species <- iris$Species

    ## train with default radius = diagonal / 2:
    rlen <- 500
    som <- som.nn.train(iris, class.col = "Species", kernel = "internal",
                        xdim = 15, ydim = 9, alpha = 0.2, len = rlen,
                        norm = TRUE, toroidal = FALSE)

    ## continue training with different alpha and radius:
    som <- som.nn.continue(som, iris, alpha = 0.02, len = 500, radius = 5)
    som <- som.nn.continue(som, iris, alpha = 0.02, len = 500, radius = 2)

    ## predict some samples:
    unk <- iris[, !(names(iris) %in% "Species")]
    setosa <- unk[species == "setosa", ]
    setosa <- setosa[sample(nrow(setosa), 20), ]
    versicolor <- unk[species == "versicolor", ]
    versicolor <- versicolor[sample(nrow(versicolor), 20), ]
    virginica <- unk[species == "virginica", ]
    virginica <- virginica[sample(nrow(virginica), 20), ]

    p <- predict(som, unk)
    head(p)

    ## plot:
    plot(som)
    dev.off()
    plot(som, predict = predict(som, setosa))
    plot(som, predict = predict(som, versicolor),
         add = TRUE, pch.col = "magenta", pch = 17)
    plot(som, predict = predict(som, virginica),
         add = TRUE, pch.col = "white", pch = 8)

SOMnn
Predicts categories for a table of data, based on the hexagonal som in the model.
This S4 method is a wrapper for the predict method stored in the slot predict
of a model of type SOMnn.

    ## S4 method for signature 'SOMnn'
    predict(object, x)

object |
object of type |
x |
|
The function returns the winner neuron in codes for
each test vector in x.
x is organised as one vector per row and must have
the same number of columns (i.e. dimensions) and the identical column names
as stored in the SOMnn object.
If data have been normalised during training, the same normalisation is applied to the unknown data to be predicted.
Probabilities are softmax normalised by default.
data.frame with columns:
winner, x, y, the predicted probabilities
for all categories and the prediction
as category index (column name prediction) and
class label (column name pred.class).
Rounds a vector of probabilities preserving their sum.

    ## S3 method for class 'probabilities'
    round(x, digits = 2)

x |
|
digits |
demanded precision |
In general, if a vector of floating point values is rounded,
the sum is not preserved.
For a vector of probabilities (which sum up to 1.0), this may lead to
strange results.
This function rounds all values of the vector and takes care that
the sum is not changed (with a precision given in digits).
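One common way to round while preserving the sum is the largest-remainder method, sketched below in base R. This is an assumption chosen for illustration; the manual does not state which algorithm the package uses, and the function name round.keep.sum is hypothetical.

```r
# Hypothetical sketch (largest-remainder method), not the package algorithm.
round.keep.sum <- function(x, digits = 2) {
  scale <- 10^digits
  scaled <- x * scale
  low <- floor(scaled)                      # round everything down first
  missing <- round(sum(scaled)) - sum(low)  # units lost by rounding down
  if (missing > 0) {
    # give the lost units to the values with the largest remainders
    idx <- order(scaled - low, decreasing = TRUE)[seq_len(missing)]
    low[idx] <- low[idx] + 1
  }
  low / scale
}

p <- c(1, 1, 1) / 3
plain <- round(p, 2)          # each value becomes 0.33
kept  <- round.keep.sum(p, 2) # one value is raised to 0.34
```

Plain rounding of three equal probabilities gives a sum of 0.99, whereas the sketch returns values that still sum to 1.00.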
Calculates the sensitivity, specificity and overall accuracy for a prediction result if the corresponding vector of true class labels is provided.

    som.nn.accuracy(x, class.labels)

x |
|
class.labels |
|
Sensitivity is the classifier's ability to correctly identify samples of a specific class A. It is defined as
sens = TP / (TP + FN)
with TP = true positives and FN = false negatives. This is equivalent to the ratio of (correctly identified samples of class A) / (total number of samples of class A).
Specificity is the classifier's ability to correctly identify samples not of a specific class A. It is defined as
spec = TN / (TN + FP)
with TN = true negatives and FP = false positives. This is equivalent to the ratio of (correctly identified samples not in class A) / (total number of samples not in class A).
Accuracy is the classifier's ability to correctly classify samples of a specific class A. It is defined as
acc = (TP + TN) / total
with TP = true positives, TN = true negatives and total = total number of samples. This is equivalent to the ratio of (correctly classified samples) / (total number of samples).
data.frame containing sensitivity, specificity and accuracy for all
class labels in the data set.
Calculates the accuracy over all class labels for a prediction result if the corresponding vector of true class labels is provided.

    som.nn.all.accuracy(x, class.labels)

x |
|
class.labels |
|
It is defined as
acc = (TP + TN) / total
with TP = true positives, TN = true negatives and total = total number of samples. This is equivalent to the ratio of (correctly classified samples) / (total number of samples).
A single value: the overall accuracy.
Calculates the confusion matrix for a prediction result if the corresponding vector of true class labels is provided.

    som.nn.confusion(x, class.labels)

x |
|
class.labels |
|
The confusion matrix (also called table of confusion) displays the number of predicted class labels for each actual class. Example:
| | pred. cat | pred. dog | pred. rabbit | unknown |
|---|---|---|---|---|
| actual cat | 5 | 3 | 0 | 0 |
| actual dog | 2 | 3 | 1 | 0 |
| actual rabbit | 0 | 2 | 9 | 2 |
The confusion matrix includes a column unknown displaying the samples
for which no unambiguous prediction is possible.
data.frame containing the confusion matrix.
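The quality measures defined for som.nn.accuracy can be derived from such a confusion matrix with base R. The sketch below uses the cat/dog/rabbit example from the table and is only an illustration: it does not call the package functions, and counting "unknown" predictions as false negatives is an assumption of this example.

```r
# Confusion matrix from the example table (rows: actual, columns: predicted).
cmat <- matrix(c(5, 3, 0, 0,
                 2, 3, 1, 0,
                 0, 2, 9, 2),
               nrow = 3, byrow = TRUE,
               dimnames = list(actual = c("cat", "dog", "rabbit"),
                               predicted = c("cat", "dog", "rabbit", "unknown")))

classes <- rownames(cmat)
total <- sum(cmat)

# per-class measures; "unknown" predictions count as misses (assumption)
stats <- t(sapply(classes, function(cl) {
  tp <- cmat[cl, cl]
  fn <- sum(cmat[cl, ]) - tp
  fp <- sum(cmat[, cl]) - tp
  tn <- total - tp - fn - fp
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    accuracy    = (tp + tn) / total)
}))

# overall accuracy: correctly classified samples / all samples
overall <- sum(cmat[cbind(classes, classes)]) / total
```

For class cat this gives a sensitivity of 5/8, because 5 of the 8 actual cats are predicted as cats, and the overall accuracy is 17/27.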
An existing self-organising map with hexagonal topology is further trained and a model is created for the prediction of unknown samples. In contrast to a "normal" som, class labels for all samples of the training set are required to build the model.

    som.nn.continue(model, x, kernel = "internal", len = 0,
                    alpha = 0.2, radius = 0)

model |
model of type |
x |
data.frame with training data. Samples are expected as rows and are drawn randomly for the
training steps. All
columns except the class labels are considered to be attributes and parts of
the training vector.
|
kernel |
Kernel for som training. One of the predefined kernels
|
len |
number of steps to be trained (steps - not epochs!). |
alpha |
initial training rate; default 0.2. |
radius |
initial radius for SOM training. If Gaussian distance function is used, radius corresponds to sigma. |
Any specified custom kernel function can be used for som training. The function must match the
signature kernel(data, grid, rlen, alpha, radius, init, toroidal), with
arguments:
data: numeric matrix of training data; one sample per row
classes: optional character vector of classes for training data
grid: somgrid, generated with somgrid
rlen: number of training steps
alpha: training rate
radius: training radius
init: numeric matrix of initial codebook vectors; one code per row
toroidal: logical; TRUE, if the topology of grid is toroidal
The returned value must be a list with at minimum one element
codes: numeric matrix of result codebook vectors; one code per row
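A custom kernel that satisfies this contract might look like the following sketch. Only the signature and the codes element of the returned list are taken from this manual. The training loop itself, the name my.kernel and the assumption that grid carries the neuron coordinates in grid$pts are illustrative choices, and the toroidal flag is ignored here.

```r
# Hypothetical custom kernel. Only the signature and the 'codes' element of
# the result are taken from the manual; the training loop is an assumption.
my.kernel <- function(data, grid, rlen, alpha, radius, init, toroidal) {
  codes <- init
  pts <- grid$pts  # assumed: matrix of neuron coordinates, one row per neuron
  for (step in seq_len(rlen)) {
    frac <- (step - 1) / max(rlen - 1, 1)
    a <- alpha * (1 - frac)            # learning rate: linear decay to 0
    r <- radius + (1 - radius) * frac  # radius: linear change to 1
    x <- data[sample(nrow(data), 1), ] # one randomly drawn training sample
    # winner = codebook vector closest to the sample
    win <- which.min(rowSums(sweep(codes, 2, x)^2))
    # neighbours of the winner on the map (toroidal wrap-around is ignored here)
    nb <- sqrt(rowSums(sweep(pts, 2, pts[win, ])^2)) <= r
    # move the neighbours towards the sample
    codes[nb, ] <- codes[nb, , drop = FALSE] +
      a * sweep(-codes[nb, , drop = FALSE], 2, x, "+")
  }
  list(codes = codes)
}

# try the kernel on its own with a stand-in for a somgrid object
set.seed(1)
train <- as.matrix(iris[, 1:4])
grid <- list(pts = as.matrix(expand.grid(x = 1:3, y = 1:2)))
init <- train[sample(nrow(train), nrow(grid$pts)), ]
res <- my.kernel(train, grid, rlen = 100, alpha = 0.2, radius = 2,
                 init = init, toroidal = FALSE)
```

Such a function would presumably be passed through the kernel argument of the training functions instead of one of the predefined kernel names.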
S4 object of type SOMnn with the trained model.

    ## get example data and add class labels:
    data(iris)
    species <- iris$Species

    ## train with default radius = diagonal / 2:
    rlen <- 500
    som <- som.nn.train(iris, class.col = "Species", kernel = "internal",
                        xdim = 15, ydim = 9, alpha = 0.2, len = rlen,
                        norm = TRUE, toroidal = FALSE)

    ## continue training with different alpha and radius:
    som <- som.nn.continue(som, iris, alpha = 0.02, len = 500, radius = 5)
    som <- som.nn.continue(som, iris, alpha = 0.02, len = 500, radius = 2)

    ## predict some samples:
    unk <- iris[, !(names(iris) %in% "Species")]
    setosa <- unk[species == "setosa", ]
    setosa <- setosa[sample(nrow(setosa), 20), ]
    versicolor <- unk[species == "versicolor", ]
    versicolor <- versicolor[sample(nrow(versicolor), 20), ]
    virginica <- unk[species == "virginica", ]
    virginica <- virginica[sample(nrow(virginica), 20), ]

    p <- predict(som, unk)
    head(p)

    ## plot:
    plot(som)
    dev.off()
    plot(som, predict = predict(som, setosa))
    plot(som, predict = predict(som, versicolor),
         add = TRUE, pch.col = "magenta", pch = 17)
    plot(som, predict = predict(som, virginica),
         add = TRUE, pch.col = "white", pch = 8)

kohonen
An existing model of type SOMnn is exported as
object of type kohonen for use with the tools of the
package kohonen.

    som.nn.export.kohonen(model, train)

model |
model of type |
train |
training data |
Training data is necessary to generate the kohonen object.
List of type kohonen with the trained som.
See som in package kohonen for details.
SOM
An existing model of type SOMnn is exported as
object of type SOM for use with the tools of the
package class.

    som.nn.export.som(model)

model |
model of type |
List of type SOM with the trained som.
See SOM in package class for details.
A self-organising map with hexagonal topology is trained in several steps and a model of type SOMnn is created for the prediction of unknown samples. In contrast to a "normal" som, class labels for all samples of the training set are required to build the topological model after SOM training.

    som.nn.multitrain(x, class.col = 1, kernel = "internal",
                      xdim = 7, ydim = 5, toroidal = FALSE,
                      len = c(0), alpha = c(0.2), radius = c(0),
                      focus = 1, norm = TRUE,
                      dist.fun = dist.fun.inverse, max.dist = 1.1,
                      name = "som.nn job")

x |
data.frame with training data. Samples are expected as rows and are drawn randomly for the
training steps. All
columns except the class labels are considered to be attributes and parts of
the training vector.
One column is needed as class labels. The column with class
labels is selected by the argument |
class.col |
single string or number. If class.col is a string, it is considered to be the name of the column with class labels. If class.col is a number, the respective column will be used as class labels (after being coerced to character). Default is 1. |
kernel |
kernel for som training. One of the predefined kernels
|
xdim |
dimension in x-direction. |
ydim |
dimension in y-direction. |
toroidal |
|
len |
|
alpha |
initial training rate; the learning rate is decreased linearly to 0.0 for the last training step.
Default: 0.2.
If length( |
radius |
initial radius for SOM training.
If Gaussian distance function is used, radius corresponds to sigma.
The distance is decreased linearly to 1.0 for the last training step.
If |
focus |
Enhancement factor for focussing of training of "dirty" samples. |
norm |
logical; if TRUE, input data is normalised by |
dist.fun |
parameter for k-NN prediction: Function used to calculate
distance-dependent weights. Any distance function must accept the two parameters
|
max.dist |
parameter for k-NN prediction: Parameter |
name |
optional name for the model. Name will be stored as slot |
Besides the predefined kernels
"bubble", "gaussian", "SOM", "kohonen" or "som",
any specified custom kernel function can be used for som training. The function must match the
signature kernel(data, grid, rlen, alpha, radius, init, toroidal), with
arguments:
data: numeric matrix of training data; one sample per row
classes: optional character vector of classes for training data
grid: somgrid, generated with somgrid
rlen: number of training steps
alpha: training rate
radius: training radius
init: numeric matrix of initial codebook vectors; one code per row
toroidal: logical; TRUE, if the topology of grid is toroidal
The returned value must be a list with at minimum one element
codes: numeric matrix of result codebook vectors; one code per row
If focus > 1, enhancement of dirty samples is activated:
training samples that are mapped to neurons with more than one class are preferred in the next training step.
S4 object of type SOMnn with the trained model.

    ## get example data and add class labels:
    data(iris)
    species <- iris$Species

    ## train with default radius = diagonal / 2:
    rlen <- 500
    som <- som.nn.train(iris, class.col = "Species", kernel = "internal",
                        xdim = 15, ydim = 9, alpha = 0.2, len = rlen,
                        norm = TRUE, toroidal = FALSE)

    ## continue training with different alpha and radius:
    som <- som.nn.continue(som, iris, alpha = 0.02, len = 500, radius = 5)
    som <- som.nn.continue(som, iris, alpha = 0.02, len = 500, radius = 2)

    ## predict some samples:
    unk <- iris[, !(names(iris) %in% "Species")]
    setosa <- unk[species == "setosa", ]
    setosa <- setosa[sample(nrow(setosa), 20), ]
    versicolor <- unk[species == "versicolor", ]
    versicolor <- versicolor[sample(nrow(versicolor), 20), ]
    virginica <- unk[species == "virginica", ]
    virginica <- virginica[sample(nrow(virginica), 20), ]

    p <- predict(som, unk)
    head(p)

    ## plot:
    plot(som)
    dev.off()
    plot(som, predict = predict(som, setosa))
    plot(som, predict = predict(som, versicolor),
         add = TRUE, pch.col = "magenta", pch = 17)
    plot(som, predict = predict(som, virginica),
         add = TRUE, pch.col = "white", pch = 8)

Parameters for the k-NN-like classification can be set for an existing model of type SOMnn after training.

    som.nn.set(model, x, dist.fun = NULL, max.dist = NULL,
               strict = NULL, name = NULL)

model |
model of type |
x |
data.frame with training data. Samples are expected as rows and are drawn randomly for the
training steps. All
columns except the class labels are considered to be attributes and parts of
the training vector.
|
dist.fun |
distance function for weighting distances between codebook vectors on the som (kernel for k-NN classifier). |
max.dist |
maximum distance to be considered by the nearest-neighbour counting. |
strict |
strictness for class label assignment. Default = 0.8. |
name |
new name of the model. |
The distance function defines the behaviour of the k-nearest-neighbour algorithm.
Choices for the distance function include dist.fun.inverse or dist.fun.tricubic,
as defined in this package, or any other function that accepts exactly two arguments x
(the distance) and sigma (a parameter defined by max.dist).
A data set must be presented to calculate the accuracy statistics of the modified predictor.
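A custom weight function only has to follow the two-argument contract described above. The Gaussian-shaped function below is a hypothetical example written for this manual, not one of the functions provided by the package; its name and formula are assumptions.

```r
# Hypothetical custom weight function with the required arguments x and sigma.
# Weights fall off smoothly with the distance and are cut to 0 beyond sigma.
dist.fun.gauss <- function(x, sigma) {
  ifelse(x > sigma, 0.0, exp(-(x / sigma)^2))
}

w <- dist.fun.gauss(c(0, 0.5, 1, 2), sigma = 1.1)

# Assumed usage with an existing model and data set (not run here):
# model <- som.nn.set(model, x, dist.fun = dist.fun.gauss, max.dist = 1.1)
```

The function is vectorised, returns 1 for a distance of zero and 0 for distances larger than sigma, which matches the behaviour described for the predefined weight functions.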
S4 object of type SOMnn with the updated model.
dist.fun.bubble, dist.fun.linear,
dist.fun.inverse, dist.fun.tricubic.
A self-organising map with hexagonal topology is trained and a model of type SOMnn is created for the prediction of unknown samples. In contrast to a "normal" som, class labels for all samples of the training set are required to build the topological model after SOM training.

    som.nn.train(x, class.col = 1, kernel = "internal",
                 xdim = 7, ydim = 5, toroidal = FALSE,
                 len = 0, alpha = 0.2, radius = 0, norm = TRUE,
                 dist.fun = dist.fun.inverse, max.dist = 1.1,
                 strict = 0.8, name = "som.nn job")

x |
data.frame with training data. Samples are expected as rows and are drawn randomly for the
training steps. All
columns except the class labels are considered to be attributes and parts of
the training vector.
One column is needed as class labels. The column with class
labels is selected by the argument |
class.col |
single string or number. If class.col is a string, it is considered to be the name of the column with class labels. If class.col is a number, the respective column will be used as class labels (after being coerced to character). Default is 1. |
kernel |
kernel for som training. One of the predefined kernels
|
xdim |
dimension in x-direction. |
ydim |
dimension in y-direction. |
toroidal |
|
len |
number of steps to be trained (steps - not epochs!). |
alpha |
initial training rate; the learning rate is decreased linearly to 0.0 for the last training step. Default: 0.2. |
radius |
initial radius for SOM training.
If Gaussian distance function is used, radius corresponds to sigma.
The distance is decreased linearly to 1.0 for the last training step.
If |
norm |
logical; if TRUE, input data is normalised by |
dist.fun |
parameter for k-NN prediction: Function used to calculate
distance-dependent weights. Any distance function must accept the two parameters
|
max.dist |
parameter for k-NN prediction: Parameter |
strict |
Minimum vote for the winner (if the winner's vote is smaller than strict,
"unknown" is reported as class label ( |
name |
optional name for the model. Name will be stored as slot |
Besides the predefined kernels
"internal", "gaussian", "SOM", "kohonen" or "som",
any specified custom kernel function can be used for som training. The function must match the
signature kernel(data, grid, rlen, alpha, radius, init, toroidal), with
arguments:
data: numeric matrix of training data; one sample per row
classes: optional character vector of classes for training data
grid: somgrid, generated with somgrid
rlen: number of training steps
alpha: training rate
radius: training radius
init: numeric matrix of initial codebook vectors; one code per row
toroidal: logical; TRUE, if the topology of grid is toroidal
The returned value must be a list with at minimum one element
codes: numeric matrix of result codebook vectors; one code per row
S4 object of type SOMnn with the trained model.

    ## get example data and add class labels:
    data(iris)
    species <- iris$Species

    ## train with default radius = diagonal / 2:
    rlen <- 500
    som <- som.nn.train(iris, class.col = "Species", kernel = "internal",
                        xdim = 15, ydim = 9, alpha = 0.2, len = rlen,
                        norm = TRUE, toroidal = FALSE)

    ## continue training with different alpha and radius:
    som <- som.nn.continue(som, iris, alpha = 0.02, len = 500, radius = 5)
    som <- som.nn.continue(som, iris, alpha = 0.02, len = 500, radius = 2)

    ## predict some samples:
    unk <- iris[, !(names(iris) %in% "Species")]
    setosa <- unk[species == "setosa", ]
    setosa <- setosa[sample(nrow(setosa), 20), ]
    versicolor <- unk[species == "versicolor", ]
    versicolor <- versicolor[sample(nrow(versicolor), 20), ]
    virginica <- unk[species == "virginica", ]
    virginica <- virginica[sample(nrow(virginica), 20), ]

    p <- predict(som, unk)
    head(p)

    ## plot:
    plot(som)
    dev.off()
    plot(som, predict = predict(som, setosa))
    plot(som, predict = predict(som, versicolor),
         add = TRUE, pch.col = "magenta", pch = 17)
    plot(som, predict = predict(som, virginica),
         add = TRUE, pch.col = "white", pch = 8)

A model of type SOMnn is tested with a validation dataset. The dataset must
include a column with correct class labels.
The model is used to predict class labels. Confusion table,
specificity, sensitivity and accuracy for each class are calculated.

    som.nn.validate(model, x)

model |
model of type |
x |
data.frame with validation data. Samples are expected as rows.
|
Parameters stored in the model are applied for k-NN-like prediction. If necessary
the parameters can be changed by som.nn.set before testing.
The function is only a wrapper and actually calls som.nn.continue with the test data and
without training (i.e. len = 0).
S4 object of type SOMnn with the unchanged model and the
test statistics for the test data.

    ## get example data and add class labels:
    data(iris)
    species <- iris$Species

    ## train with default radius = diagonal / 2:
    rlen <- 500
    som <- som.nn.train(iris, class.col = "Species", kernel = "internal",
                        xdim = 15, ydim = 9, alpha = 0.2, len = rlen,
                        norm = TRUE, toroidal = FALSE)

    ## continue training with different alpha and radius:
    som <- som.nn.continue(som, iris, alpha = 0.02, len = 500, radius = 5)
    som <- som.nn.continue(som, iris, alpha = 0.02, len = 500, radius = 2)

    ## predict some samples:
    unk <- iris[, !(names(iris) %in% "Species")]
    setosa <- unk[species == "setosa", ]
    setosa <- setosa[sample(nrow(setosa), 20), ]
    versicolor <- unk[species == "versicolor", ]
    versicolor <- versicolor[sample(nrow(versicolor), 20), ]
    virginica <- unk[species == "virginica", ]
    virginica <- virginica[sample(nrow(virginica), 20), ]

    p <- predict(som, unk)
    head(p)

    ## plot:
    plot(som)
    dev.off()
    plot(som, predict = predict(som, setosa))
    plot(som, predict = predict(som, versicolor),
         add = TRUE, pch.col = "magenta", pch = 17)
    plot(som, predict = predict(som, virginica),
         add = TRUE, pch.col = "white", pch = 8)

Maps a sample of unknown category to a self-organising map (SOM) stored in an object of type SOMnn.
som.nn.visual(codes, data)
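| codes | data.frame with codebook vectors, one vector per row. |
|---|---|
| data | data.frame with test vectors to be mapped, one vector per row; column names must be identical to those of codes. |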
The function returns the winner neuron in codes for
each test vector in data.
codes and data hold one vector per row and must have
the same number of columns (i.e. dimensions) and identical column names.
som.nn.visual is the workhorse of the k-NN-like classifier and is normally called
from predict.
data.frame with 2 columns:
- Index of the winner neuron for each row (index starting at 1).
- Distance between winner and row.
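The mapping described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name winner_sketch is hypothetical, and Euclidean distance is an assumption, since the text does not state which metric som.nn.visual uses.

```r
# Hypothetical helper (not part of som.nn): find the winner neuron for each
# row of `data`, ASSUMING Euclidean distance between sample and codebook vector.
winner_sketch <- function(codes, data) {
  # codes and data must describe the same dimensions under the same names
  stopifnot(identical(colnames(codes), colnames(data)))
  codes <- as.matrix(codes)
  data <- as.matrix(data)

  res <- vapply(seq_len(nrow(data)), function(i) {
    # distance of sample i to every codebook vector (one per row of codes)
    d <- sqrt(rowSums(sweep(codes, 2, data[i, ])^2))
    c(unname(which.min(d)), min(d))
  }, numeric(2))

  # two columns: 1-based index of the winner and its distance to the sample
  data.frame(winner = as.integer(res[1, ]), distance = res[2, ])
}

# toy codebook with three neurons and two samples to map
codes <- data.frame(a = c(0, 1, 5), b = c(0, 1, 5))
data <- data.frame(a = c(0.9, 4), b = c(1.2, 4.5))
mapped <- winner_sketch(codes, data)
mapped
```

The result has one row per sample, mirroring the two-column return value described above.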
Objects of type SOMnn can be created by training a self-organising map
with som.nn.train.
name: optional name of the model.
date: time and date of creation.
codes: data.frame with codebook vectors of the som.
qerror: sum of the mapping errors of the training data.
class.idx: column index of column with class labels in input data.
classes: character vector with names of categories.
class.counts: data.frame with class hits for each neuron.
class.freqs: data.frame with class frequencies for each neuron
(freqs sum up to 1).
norm: logical; if TRUE, data is normalised before training and mapping.
Parameters for normalisation of training data are stored in the model and
applied before mapping of test data.
norm.center: vector of centers for each column of training data.
norm.scale: vector of scale factors for each column of training data.
confusion: data.frame with confusion matrix for training data.
measures: data.frame with classes as rows and the
columns sensitivity, specificity and accuracy for each class.
accuracy: the overall accuracy calculated based on the confusion matrix cmat:
acc = sum(diag(cmat)) / sum(cmat).
xdim: number of neurons in x-direction of the som.
ydim: number of neurons in y-direction of the som.
len.total: total number of training steps performed to create the model.
toroidal: logical; if TRUE, the map is toroidal (i.e. borderless).
dist.fun: function; kernel for the k-NN classifier.
max.dist: maximum distance for the k-NN classifier.
strict: minimum vote for the winner; if the winner's vote is smaller than strict,
"unknown" is reported as class label (default = 0.8).
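Three of the slots above are derived quantities, and the following base-R sketch shows how such values could be computed. All numbers are made up for illustration, the helper apply_strict is hypothetical, and the sketch assumes that votes are fractions summing to 1 and that the overall accuracy is the share of correctly classified samples; it is not taken from the package source.

```r
# Hypothetical class hits per neuron (rows = neurons, columns = classes).
class.counts <- data.frame(setosa = c(4, 0, 1), versicolor = c(0, 3, 1))

# Class frequencies: divide each row by its total so that every row sums to 1.
class.freqs <- prop.table(as.matrix(class.counts), margin = 1)

# Hypothetical confusion matrix (rows = actual class, columns = predicted class).
cmat <- matrix(c(48, 2,
                  3, 47),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("setosa", "versicolor"),
                               predicted = c("setosa", "versicolor")))

# Overall accuracy: correctly classified samples divided by all samples.
acc <- sum(diag(cmat)) / sum(cmat)

# Hypothetical helper: report the winner only if its vote reaches `strict`.
# ASSUMPTION: `votes` is a named vector of fractions that sum to 1.
apply_strict <- function(votes, strict = 0.8) {
  if (max(votes) < strict) "unknown" else names(votes)[which.max(votes)]
}

clear_case <- apply_strict(c(setosa = 0.9, versicolor = 0.1))
close_case <- apply_strict(c(setosa = 0.6, versicolor = 0.4))
```

With the default threshold, a sample whose strongest class receives only 60 % of the vote is labelled "unknown" rather than assigned to the leading class.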