Title: | SOM Algorithm for the Analysis of Multivariate Environmental Data |
---|---|
Description: | Analysis of multivariate, high-frequency environmental data with Self-Organizing Map (SOM) and k-means clustering algorithms. A graphical user interface provides a convenient way to process fairly large datasets (txt files up to 100 MB) from high-frequency environmental monitoring by sensors/instruments with the SOM algorithm. The package functions build on the 'kohonen' and 'openair' packages and add functions that embed the Vesanto et al. (2001) <http://www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf> heuristic rules for map initialization parameters, the k-means clustering algorithm and map feature visualization. Cluster profile visualizations and graphs dedicated to time-dependent variables, Licen et al. (2020) <doi:10.4209/aaqr.2019.08.0414>, are also provided. |
Authors: | Sabina Licen [aut, cre], Marco Franzon [aut], Tommaso Rodani [aut], Pierluigi Barbieri [aut] |
Maintainer: | Sabina Licen <[email protected]> |
License: | GPL-2 |
Version: | 0.1.1 |
Built: | 2024-11-12 04:56:29 UTC |
Source: | https://github.com/somenv/somenv |
The function finds the Best Matching Units of the cluster centroids
BmusCentr(centroids, som_model, k)
centroids |
Centroids array (output of kmeans_clustersR function) |
som_model |
An object of class kohonen |
k |
Number of clusters |
An array containing the BMU for each centroid
Sabina Licen
Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
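As an illustration of what "Best Matching Unit of a centroid" means, the following base R sketch picks, for each centroid, the prototype with the smallest Euclidean distance. It is not the package's code: the helper name nearest_unit and the toy matrices are assumptions, and a plain matrix stands in for the codebook that BmusCentr reads from the kohonen object.

```r
# Hypothetical helper (not part of SOMEnv): index of the nearest
# codebook row for each centroid, using squared Euclidean distance.
nearest_unit <- function(centroids, codebook) {
  centroids <- as.matrix(centroids)
  codebook <- as.matrix(codebook)
  apply(centroids, 1, function(ctr) {
    d2 <- rowSums(sweep(codebook, 2, ctr)^2)  # distance to every prototype
    which.min(d2)
  })
}

# Toy data, made up for illustration
codebook <- rbind(c(0, 0), c(1, 0), c(0, 1), c(5, 5))
centroids <- rbind(c(0.2, 0.1), c(4.6, 5.2))
bmu_centr <- nearest_unit(centroids, codebook)
print(bmu_centr)
```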
Generate a vector containing the cluster assignment to experimental data
BmusClus(Bmus, Cluster)
Bmus |
Best Matching Unit assignment to the experimental data |
Cluster |
Vector containing cluster number assignment for prototypes |
A vector containing the cluster assignment to experimental data
Sabina Licen
Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
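The mapping from prototypes to experimental data can be pictured as a simple lookup: each sample inherits the cluster of its Best Matching Unit. The sketch below uses made-up toy vectors and plain indexing; whether BmusClus is implemented this way is an assumption.

```r
# Toy inputs (made up for illustration)
Cluster <- c(1, 1, 2, 3, 3, 2)   # cluster of each of 6 prototypes
Bmus <- c(4, 1, 6, 6, 2, 5)      # BMU index of each of 6 samples

# Each sample takes the cluster number of its Best Matching Unit
TrainClus <- Cluster[Bmus]
print(TrainClus)
```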
Boxplot function is used, box whiskers are omitted
BoxClus(Dms, codebook, Cluster)
Dms |
A vector of length 2, where the first argument specifies the number of rows and the second the number of columns of plots (see mfrow in par) |
codebook |
De-normalized prototype codebook |
Cluster |
Vector containing cluster number assignment for prototypes |
Boxplot of prototype variables split by cluster
Sabina Licen
Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
boxplot, par
Boxplot function is used, box whiskers are omitted
BoxUnits(codebook, Cluster, Ylim = NA, pitch = NA, xdim = 0.75)
codebook |
Prototype codebook normalized by variable |
Cluster |
Vector containing cluster number assignment for prototypes |
Ylim |
Vector of length 2 for y-axis limits |
pitch |
Vector containing the position of horizontal grid lines |
xdim |
x axes label dimensions |
Boxplot of prototype variables split by cluster
Sabina Licen
Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
boxplot
Generate the sequence of colors to plot the SOM map according to clusters
ClusCol(Centroids, Cluster, colSeq = rainbow(nrow(data.frame(Centroids))))
Centroids |
Centroids matrix |
Cluster |
Vector containing cluster number assignment for prototypes |
colSeq |
Color sequence for the clusters |
A vector of colors with length equal to Cluster
Sabina Licen
Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
Generate X and Y coordinates for plotting a SOM map laid out in the Vesanto visualization style
CodeCoord(Row, Col)
Row |
Number of SOM map rows |
Col |
Number of SOM map columns |
This function returns a data.frame with columns X and Y
Sabina Licen, Pierluigi Barbieri
J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf; Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
Coord<-CodeCoord(10,5)
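For readers unfamiliar with hexagonal SOM layouts, a common way to place unit centers is to shift every other row by half a cell and to space rows by sqrt(0.75). The sketch below follows that convention; the helper name hex_grid_coords, the unit ordering, the choice of which rows are shifted and the direction of the Y axis are all assumptions and may differ from what CodeCoord returns.

```r
# Hypothetical helper (not part of SOMEnv): centers of a hexagonal grid
# with unit spacing between horizontal neighbours.
hex_grid_coords <- function(Row, Col) {
  g <- expand.grid(col = seq_len(Col), row = seq_len(Row))
  data.frame(
    X = g$col + ifelse(g$row %% 2 == 0, 0.5, 0),  # shift even rows by half a cell
    Y = g$row * sqrt(0.75)                         # row spacing for unit-width hexagons
  )
}

coords <- hex_grid_coords(10, 5)
head(coords)
```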
The function produces a plot representing the daily percentage for each cluster
DailyBar( experimental, TrainClus, colSeq = rainbow(length(levels(as.factor(TrainClus)))), Total = 1440, xdim = 0.7, ydim = 0.8 )
experimental |
Experimental data (must contain variable "date") |
TrainClus |
Vector containing cluster number assignment for experimental data |
colSeq |
Color sequence for the clusters |
Total |
Number of observations per day |
xdim |
x axes label dimensions |
ydim |
y axes label dimensions |
Plot of daily percentages for each cluster; the last element in the legend represents the percentage of undetermined data
Sabina Licen
Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
The function was ported to R from the db_index.m script in the SOM Toolbox for Matlab by Vesanto and adapted for use in the shiny app
db_indexR(codebook, k_best, c_best)
codebook |
SOM codebook |
k_best |
Vector with cluster number assignment for each sample |
c_best |
Matrix with cluster centroids |
The mean DB-index for the clustering
Sabina Licen, Pierluigi Barbieri
J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf
som_mdistR, kmeans_clustersRProg
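To make the Davies-Bouldin index concrete, here is a minimal base R sketch of its common textbook form: for each cluster, take the worst ratio of summed within-cluster scatter to centroid separation, then average over clusters. The helper name db_index and the toy data are assumptions, clusters are assumed to be numbered 1..k, and the toolbox script may use different norms for scatter and separation.

```r
# Hypothetical helper (not part of SOMEnv): textbook Davies-Bouldin index.
db_index <- function(data, cl, centroids) {
  data <- as.matrix(data)
  centroids <- as.matrix(centroids)
  k <- nrow(centroids)
  # Within-cluster scatter: mean Euclidean distance of members to their centroid
  S <- sapply(seq_len(k), function(i) {
    m <- data[cl == i, , drop = FALSE]
    mean(sqrt(rowSums(sweep(m, 2, centroids[i, ])^2)))
  })
  M <- as.matrix(dist(centroids))   # centroid-to-centroid distances
  R <- outer(S, S, "+") / M
  diag(R) <- -Inf                   # ignore the i == j pairs
  mean(apply(R, 1, max))
}

# Toy data: two tight, well separated clusters
data <- rbind(c(0, 0), c(0, 2), c(10, 0), c(10, 2))
cl <- c(1, 1, 2, 2)
centroids <- rbind(c(0, 1), c(10, 1))
db <- db_index(data, cl, centroids)
print(db)
```

Lower values indicate more compact, better separated clusters.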
Percentage frequency for each cluster
Freq(Cluster)
Cluster |
Vector containing cluster number assignment for experimental data |
A data frame containing the percentage frequency of each cluster
Sabina Licen
Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
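The percentage frequency can be sketched with table() in base R. The toy vector and the column names of the result are assumptions; the layout of the data frame returned by Freq may differ.

```r
# Toy cluster assignment for 10 samples (made up for illustration)
TrainClus <- c(1, 1, 2, 3, 3, 3, 2, 1, 1, 3)

counts <- table(TrainClus)
freq <- data.frame(
  Cluster = names(counts),
  Percent = as.numeric(counts) / length(TrainClus) * 100
)
print(freq)
```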
Daily percentage frequency for each cluster
FreqD(Date, Cluster, Total = 1440)
Date |
Vector containing date/time variable for experimental data |
Cluster |
Vector containing cluster number assignment for experimental data |
Total |
Number of observations per day |
A data frame containing the daily percentage frequency of each cluster
Sabina Licen
Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
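The role of Total is easiest to see in a small example: counts per day are divided by the expected number of observations per day, so missing records show up as a shortfall from 100%. The sketch below uses made-up data with Total = 4 instead of the default 1440 (one observation per minute); the "ND" column for the undetermined share is an assumption and not necessarily part of the FreqD output.

```r
# Toy data: 4 observations expected per day, the second day is incomplete
Date <- as.POSIXct(c("2020-01-01 00:00", "2020-01-01 06:00",
                     "2020-01-01 12:00", "2020-01-01 18:00",
                     "2020-01-02 00:00", "2020-01-02 06:00"), tz = "UTC")
Cluster <- c(1, 1, 2, 2, 1, 2)
Total <- 4

day <- as.Date(Date, tz = "UTC")
pct <- table(day, Cluster) / Total * 100     # daily percentage per cluster
daily <- cbind(unclass(pct), ND = 100 - rowSums(pct))
print(daily)
```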
Monthly percentage frequency for each cluster
FreqM(Date, Cluster)
Date |
Vector containing date/time variable for experimental data |
Cluster |
Vector containing cluster number assignment for experimental data |
A data frame containing the monthly percentage frequency of each cluster
Sabina Licen
Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
Draws a hexagon around a point of x and y coordinates
Hexa(x, y, color = NA, border = "gray", unitcell = 1)
x |
X-coordinate of the hexagon center |
y |
Y-coordinate of the hexagon center |
color |
Filling color of the hexagon (default NA) |
border |
Border color of the hexagon (default "gray") |
unitcell |
The distance between two parallel sides of the hexagon (default 1) |
This function draws a hexagon on a plot
Sabina Licen
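The geometry behind the unitcell argument can be sketched as follows: for a regular hexagon whose parallel sides are unitcell apart, the circumradius is unitcell / sqrt(3). The helper name hex_vertices and the pointy-top orientation are assumptions, not taken from the package source.

```r
# Hypothetical helper (not part of SOMEnv): vertices of a pointy-top hexagon.
hex_vertices <- function(x, y, unitcell = 1) {
  r <- unitcell / sqrt(3)                    # circumradius from side-to-side distance
  ang <- seq(30, 330, by = 60) * pi / 180    # vertex angles in radians
  data.frame(x = x + r * cos(ang), y = y + r * sin(ang))
}

v <- hex_vertices(0, 0)
print(v)
# To draw it on an open plot: polygon(v$x, v$y, col = NA, border = "gray")
```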
Draws a hexagonal SOM map using x, y coordinates for the hexagon centers
Hexagons(Coords, Row, Col, color = NA, border = "gray", unitcell = 1)
Coords |
matrix containing the x and y coordinates of the hexagon centers |
Row |
Number of SOM map rows |
Col |
Number of SOM map columns |
color |
Filling color of the hexagons (default NA) |
border |
Border color of the hexagons (default "gray") |
unitcell |
The distance between two parallel sides of the hexagon (default 1) |
A hexagonal SOM map
Sabina Licen
Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
Coord <- CodeCoord(10, 5)
Hexagons(Coord, 10, 5)
Generates a SOM map colored according to cluster splitting
HexagonsClus( Centroids, Cluster, BCentr, Coord, Row, Col, colSeq = rainbow(nrow(Centroids)) )
Centroids |
Centroids matrix |
Cluster |
Vector containing cluster number assignment for prototypes |
BCentr |
Best Matching Unit of the cluster centroids |
Coord |
Prototype coordinates for plotting the map |
Row |
Number of SOM map rows |
Col |
Number of SOM map columns |
colSeq |
Color sequence for the clusters |
A SOM map colored according to cluster splitting
Sabina Licen
Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
Multiple plots that show the distribution of the modeled variables on the SOM map
HexagonsVar(Dms, codebook, Coords, Row, Col)
Dms |
A vector of length 2, where the first argument specifies the number of rows and the second the number of columns of plots (see mfrow in par) |
codebook |
SOM codebook |
Coords |
Prototype coordinates for plotting the map |
Row |
Number of SOM map rows |
Col |
Number of SOM map columns |
The function plots a SOM map for the values of each modeled variable using a grayscale according to quartiles, from white (lower outliers), followed by grayscale (quartiles) and black (upper outliers). The outliers and quartiles are evaluated by the boxplot function applying default parameters.
SOM map plots for the values of each modeled variable using a grayscale according to quartiles
Sabina Licen
Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
boxplot, par
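The quartile-based shading described above can be sketched with boxplot.stats(), which returns the same whisker and hinge values that boxplot() uses with default parameters. The helper name quartile_shade and the specific gray levels are assumptions; only the white / grays / black ordering comes from the description.

```r
# Hypothetical helper (not part of SOMEnv): map values to shades by
# boxplot classes (lower outlier, four quartile bands, upper outlier).
quartile_shade <- function(v) {
  s <- boxplot.stats(v)$stats   # lower whisker, Q1 hinge, median, Q3 hinge, upper whisker
  shade <- character(length(v))
  shade[v < s[1]] <- "white"                    # lower outliers
  shade[v >= s[1] & v <= s[2]] <- "gray90"
  shade[v > s[2] & v <= s[3]] <- "gray70"
  shade[v > s[3] & v <= s[4]] <- "gray50"
  shade[v > s[4] & v <= s[5]] <- "gray30"
  shade[v > s[5]] <- "black"                    # upper outliers
  shade
}

v <- c(1:9, 100)                # 100 is an upper outlier
shade <- quartile_shade(v)
print(shade)
```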
Plot a SOM map with filled hexagons according to the number of hits
HexaHits(hits, Coord, Row, Col, color = "black")
hits |
Vector with number of hits for each prototype |
Coord |
Prototype coordinates for plotting the map |
Row |
Number of SOM map rows |
Col |
Number of SOM map columns |
color |
color filling of the hexagons |
Plot a SOM map with filled hexagons according to the number of hits
Sabina Licen
Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
Plot a SOM map with hits plotted as grayscale according to quartiles
HexaHitsQuant(hits, Coord, Row, Col)
hits |
Vector with number of hits for each prototype |
Coord |
Prototype coordinates for plotting the map |
Row |
Number of SOM map rows |
Col |
Number of SOM map columns |
The function plots a SOM map with hits represented as grayscale according to quartiles, from white (lower outliers) followed by grayscale (quartiles) and black (upper outliers). The prototype with the maximum number of hits is represented by a red hexagon. The outliers and quartiles are evaluated by the boxplot function applying default parameters.
Plot a SOM map with hits represented as grayscale according to quartiles
Sabina Licen
Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
boxplot
Plot a SOM map with relative quantization error plotted as grayscale according to quartiles
HexaQerrs(bmus, qerrs, Coord, Row, Col, color = "black")
bmus |
Vector with Best Matching Unit for each experimental sample |
qerrs |
Vector with quantization error for each experimental sample |
Coord |
Prototype coordinates for plotting the map |
Row |
Number of SOM map rows |
Col |
Number of SOM map columns |
color |
color filling of the hexagons |
The function evaluates the relative quantization error for each prototype by dividing the sum of quantization errors of the experimental samples represented by that prototype by its number of hits, then plots a SOM map with filled hexagons according to the relative quantization error
Plot a SOM map with filled hexagons according to the relative quantization error
Sabina Licen
Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
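The relative quantization error described above (sum of sample errors per prototype divided by that prototype's hits) can be sketched in base R as follows. The toy vectors and the choice to return NA for prototypes without hits are assumptions.

```r
# Toy inputs (made up for illustration): 6 samples on a 4-unit map
bmus <- c(1, 1, 2, 4, 4, 4)
qerrs <- c(0.2, 0.4, 0.5, 0.3, 0.6, 0.9)
n_units <- 4

hits <- tabulate(bmus, nbins = n_units)      # samples per prototype
s <- tapply(qerrs, bmus, sum)                # summed error per hit prototype
qsum <- numeric(n_units)
qsum[as.integer(names(s))] <- s
rel_qerr <- ifelse(hits > 0, qsum / hits, NA_real_)
print(rel_qerr)
```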
Plot a SOM map with relative quantization error plotted as grayscale according to quartiles
HexaQerrsQuant(bmus, qerrs, Coord, Row, Col)
bmus |
Vector with Best Matching Unit for each experimental sample |
qerrs |
Vector with quantization error for each experimental sample |
Coord |
Prototype coordinates for plotting the map |
Row |
Number of SOM map rows |
Col |
Number of SOM map columns |
The function evaluates the relative quantization error for each prototype by dividing the sum of quantization errors of the experimental samples represented by that prototype by its number of hits, then plots a SOM map with the relative quantization error represented as grayscale according to quartiles, from white (lower outliers) followed by grayscale (quartiles) and black (upper outliers). The outliers and quartiles are evaluated by the boxplot function applying default parameters.
Plot a SOM map with relative quantization error represented as grayscale according to quartiles
Sabina Licen
Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
boxplot
The som_kmeansR function, with 100 training epochs, is run a custom number of times for each number of clusters k, and the best run is selected based on the sum of squared errors (err). The Davies-Bouldin index is calculated for each k-clustering. The function was ported to R from the kmeans_clusters.m script in the SOM Toolbox for Matlab by Vesanto and adapted to show a progress bar when embedded in the shiny app.
kmeans_clustersRProg(codebook, k = 5, times = 5, seed = NULL)
codebook |
SOM codebook |
k |
Maximum number of clusters (the function will be run from 2 to k clusters) |
times |
Number of times the som_kmeansR function is iterated |
seed |
Number for set.seed function |
This function returns a list containing the cluster number assignment for each sample, the cluster centroids, the total quantization error, the DB-index for each number of clusters, and the random seed number used
Sabina Licen, Pierluigi Barbieri
J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf
som_mdistR, som_kmeansRProg, db_indexR
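The "several restarts per k, keep the best by sum of squared errors" scheme can be sketched with base R. Note the substitution: this sketch uses stats::kmeans in place of the package's som_kmeansR port, so results will not match the package exactly; the toy codebook and variable names are assumptions.

```r
# Toy codebook: two well separated groups of 20 prototypes (made up)
set.seed(1)
codebook <- rbind(matrix(rnorm(40, 0, 0.3), ncol = 2),
                  matrix(rnorm(40, 3, 0.3), ncol = 2))
k_max <- 4    # clusterings are computed for k = 2..k_max
times <- 5    # restarts per k

best <- lapply(2:k_max, function(k) {
  runs <- lapply(seq_len(times), function(i) {
    kmeans(codebook, centers = k, iter.max = 100)
  })
  err <- sapply(runs, function(r) r$tot.withinss)  # sum of squared errors
  runs[[which.min(err)]]                           # keep the best restart
})
names(best) <- 2:k_max

err_by_k <- sapply(best, function(r) r$tot.withinss)
print(err_by_k)
```

A validity index such as Davies-Bouldin can then be computed for each element of best to compare the different values of k.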
Changes the input vector according to the custom number sequence for clusters
NClusChange(Vector, NCh)
Vector |
Vector containing cluster number assignment for prototypes or experimental data |
NCh |
Vector with custom sequence of numbers for clusters |
A vector of the same length as the input vector with cluster numbers changed according to the custom input
Sabina Licen
Generate basic statistics for the input vector
paramQuant(param)
param |
Numeric vector |
The outliers and quartiles are evaluated by the boxplot function applying default parameters.
A table which contains basic statistics for the input vector
Sabina Licen
boxplot
library(datasets)
paramQuant(iris[,1])
Generate SOM map dimensions according to Vesanto heuristic rules based on the first two eigenvalues of the experimental data and their related eigenvectors. The function was ported to R from the som_dim.m script in the SOM Toolbox for Matlab by Vesanto and adapted for use in the shiny app
som_dimR(dataset, type = "regular")
dataset |
Experimental data |
type |
Either "regular", "small" or "big" map (default = "regular") |
This function returns a list containing the number of rows, columns and overall map units
Sabina Licen, Pierluigi Barbieri
J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf
eigen, cor
library(datasets)
som_dimR(iris[,1:4], type="small")
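The heuristic can be sketched roughly as follows, based on the rules as they are commonly described for the SOM Toolbox: about 5 * sqrt(n) map units, and a side ratio equal to the square root of the ratio of the two largest eigenvalues. The helper name som_dim_sketch, the use of cov() (the package may use cor()), the factor of 4 for "small" and "big" maps and the hexagonal correction sqrt(0.75) are assumptions, so its output need not match som_dimR.

```r
# Hypothetical helper (not part of SOMEnv): heuristic SOM map size.
som_dim_sketch <- function(dataset, type = "regular") {
  x <- as.matrix(dataset)
  munits <- ceiling(5 * sqrt(nrow(x)))          # target number of map units
  munits <- switch(type,
                   regular = munits,
                   small = max(9, ceiling(munits / 4)),
                   big = munits * 4)
  ev <- eigen(cov(x), symmetric = TRUE, only.values = TRUE)$values
  ratio <- sqrt(ev[1] / ev[2])                  # side ratio from the two largest eigenvalues
  Col <- min(munits, round(sqrt(munits / ratio * sqrt(0.75))))
  Row <- round(munits / Col)
  list(Row = Row, Col = Col, munits = Row * Col)
}

res <- som_dim_sketch(iris[, 1:4], type = "regular")
str(res)
```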
Generate SOM map initialization matrix according to Vesanto heuristic rules related to map dimensions, the first two eigenvalues of the experimental data and their related eigenvectors. The function was ported to R from the som_init.m script in the SOM Toolbox for Matlab by Vesanto and adapted for use in the shiny app
som_initR(dataset, Row, Col, munits)
dataset |
Experimental data |
Row |
Number of SOM map rows |
Col |
Number of SOM map columns |
munits |
Number of SOM map units (Row*Col) |
This function returns an initialization matrix for SOM training
Sabina Licen, Pierluigi Barbieri
J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf
SOMdim <- som_dimR(iris[,1:4], type="small")
SOMinit <- som_initR(iris[,1:4], SOMdim$Row, SOMdim$Col, SOMdim$munits)
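Eigenvector-based (linear) initialization can be pictured as spreading the map units over the plane spanned by the first two principal directions of the data, centered on the data mean. The sketch below is one plausible reading of that idea; the helper name lin_init, the use of cov(), the scaling of grid coordinates to [-1, 1] and the assignment of rows to the first direction are assumptions and may differ from som_initR.

```r
# Hypothetical helper (not part of SOMEnv): linear initialization sketch.
lin_init <- function(dataset, Row, Col) {
  x <- as.matrix(dataset)
  me <- colMeans(x)
  e <- eigen(cov(x), symmetric = TRUE)
  # First two eigenvectors, scaled by the square root of their eigenvalues
  V <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))
  g <- expand.grid(r = seq_len(Row), c = seq_len(Col))
  rescale <- function(v) {
    if (length(unique(v)) > 1) 2 * (v - min(v)) / (max(v) - min(v)) - 1 else 0 * v
  }
  coords <- cbind(rescale(g$r), rescale(g$c))   # grid positions in [-1, 1]
  sweep(coords %*% t(V), 2, me, "+")            # one row per map unit
}

init <- lin_init(iris[, 1:4], 6, 4)
dim(init)
```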
The k-means training is run for a custom number of epochs for a given number of clusters k
som_kmeansRProg(codebook, k, epochs, seed = NULL)
codebook |
SOM codebook |
k |
Number of clusters |
epochs |
Number of training epochs |
seed |
Number for set.seed function |
The function was ported to R from the som_kmeans.m script in the SOM Toolbox for Matlab by Vesanto and adapted to show a progress bar when embedded in the shiny app.
This function returns a list containing the cluster number assignment for each sample, the cluster centroids, the total quantization error, and the random seed number used
Sabina Licen, Pierluigi Barbieri
J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf
set.seed
The function was ported to R from the som_mdist.m script in the SOM Toolbox for Matlab by Vesanto and adapted for use in the shiny app
som_mdistR(codebook)
codebook |
SOM codebook |
Distance matrix
Sabina Licen, Pierluigi Barbieri
J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf
db_indexR
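For a complete numeric codebook, a pairwise distance matrix between prototypes can be obtained in base R with dist(). This is only an illustration of the kind of matrix involved; the toy codebook is made up, and the ported script may handle masks or missing values differently.

```r
# Toy codebook with 3 prototypes (made up for illustration)
codebook <- rbind(c(0, 0), c(3, 4), c(6, 8))

D <- as.matrix(dist(codebook))   # Euclidean distances by default
print(D)
```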
The function was ported to R from the som_umat.m script in the SOM Toolbox for Matlab by Vesanto and adapted for use in the shiny app
som_umatR(codebook, Row, Col)
codebook |
SOM codebook |
Row |
Number of SOM map rows |
Col |
Number of SOM map columns |
The unified distance matrix for the SOM map
Sabina Licen, Pierluigi Barbieri
J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf; A. Ultsch, H.P. Siemon, Proceedings of International Neural Network Conference (INNC'90), Kluwer Academic Publishers, Dordrecht, 1990, pp. 305-308.
The function starts the SOMEnv GUI
SomEnvGUI()
This function starts the graphical user interface in the default system browser. The main usage hints for the tool are embedded in the GUI
Sabina Licen, Marco Franzon, Tommaso Rodani
Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie and Jonathan McPherson (2019). shiny: Web Application Framework for R. R package version 1.4.0. https://CRAN.R-project.org/package=shiny
shiny
## Not run:
SomEnvGUI()
## End(Not run)
Calculate topographical error for the SOM map
SOMtopol(dataset, codebook, grid)
dataset |
Experimental data used for training the map |
codebook |
SOM codebook |
grid |
SOM grid expressed as a matrix of x and y coordinates of the map units |
This function returns the topographical error
Sabina Licen
Clark, S., Sisson, S.A., Sharma, A. (2020) Adv. Water Resour. 143, art. no. 103676 DOI: 10.1016/j.advwatres.2020.103676
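In its usual definition, the topographic error is the share of samples whose first and second Best Matching Units are not neighbours on the map grid. The base R sketch below follows that definition; the helper name topo_error, the toy data and the adjacency threshold tol (which assumes neighbouring units are one grid step apart) are assumptions and need not match SOMtopol.

```r
# Hypothetical helper (not part of SOMEnv): topographic error sketch.
topo_error <- function(dataset, codebook, grid, tol = 1.01) {
  dataset <- as.matrix(dataset)
  codebook <- as.matrix(codebook)
  grid <- as.matrix(grid)
  not_adjacent <- apply(dataset, 1, function(x) {
    d2 <- rowSums(sweep(codebook, 2, x)^2)
    two <- order(d2)[1:2]                          # first and second BMU
    gap <- sqrt(sum((grid[two[1], ] - grid[two[2], ])^2))
    gap > tol                                      # TRUE when the two BMUs are not neighbours
  })
  mean(not_adjacent)
}

# Toy 3-unit map laid out on a line (made up for illustration)
codebook <- rbind(c(0, 0), c(10, 10), c(1, 1))
grid <- rbind(c(1, 0), c(2, 0), c(3, 0))
dataset <- rbind(c(0.4, 0.4), c(9, 9))
te <- topo_error(dataset, codebook, grid)
print(te)
```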
Plot of Unified Distance Matrix using a colored scale according to quartiles
UmatGraph(umat, Row, Col, colorscale = c("bw", "gs"))
umat |
Unified Distance Matrix |
Row |
Number of SOM map rows |
Col |
Number of SOM map columns |
colorscale |
Either "bw" for grayscale or "gs" for green-white scale |
The function plots a U-matrix map for the values of each modeled variable using a grayscale according to quartiles, from darker color (lower distances) to lighter color (higher distances). The quartiles are evaluated by boxplot function applying default parameters.
Plot of the Unified Distance Matrix using a grayscale (or a green-white scale) according to quartiles
Sabina Licen
J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf; Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
boxplot, som_umatR