Package 'SOMEnv'

Title: SOM Algorithm for the Analysis of Multivariate Environmental Data
Description: Analysis of multivariate environmental high-frequency data by Self-Organizing Map (SOM) and k-means clustering algorithms. A graphical user interface provides a convenient way to process rather large datasets (txt files up to 100 MB), obtained from environmental high-frequency monitoring by sensors/instruments, with the self-organizing map algorithm. The functions in the package are based on the 'kohonen' and 'openair' packages, extended with functions embedding the Vesanto et al. (2001) <http://www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf> heuristic rules for map initialization parameters, the k-means clustering algorithm, and map feature visualization. Cluster profile visualization and graphs dedicated to time-dependent variables, Licen et al. (2020) <doi:10.4209/aaqr.2019.08.0414>, are also provided.
Authors: Sabina Licen [aut, cre], Marco Franzon [aut], Tommaso Rodani [aut], Pierluigi Barbieri [aut]
Maintainer: Sabina Licen <[email protected]>
License: GPL-2
Version: 0.1.1
Built: 2024-11-12 04:56:29 UTC
Source: https://github.com/somenv/somenv

Help Index


BMUs of the cluster centroids

Description

The function finds the Best Matching Units of the cluster centroids

Usage

BmusCentr(centroids, som_model, k)

Arguments

centroids

Centroids array (output of kmeans_clustersR function)

som_model

An object of class kohonen

k

Number of clusters

Value

An array containing the BMU for each centroid

Author(s)

Sabina Licen

References

Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
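
As a rough illustration of what "Best Matching Unit of a centroid" means, the sketch below picks, for each centroid, the codebook row at the smallest Euclidean distance. It uses base R and made-up data only; bmu_of_centroids, codebook and centroids are hypothetical names, and BmusCentr itself (which takes a kohonen object) may differ in its details.

```r
# Hypothetical helper (not part of SOMEnv): nearest codebook row for each centroid
bmu_of_centroids <- function(centroids, codebook) {
  centroids <- as.matrix(centroids)
  codebook <- as.matrix(codebook)
  apply(centroids, 1, function(ctr) {
    # Squared Euclidean distance from this centroid to every prototype
    d2 <- rowSums(sweep(codebook, 2, ctr)^2)
    which.min(d2)
  })
}

# Toy codebook with 4 prototypes and 2 variables (made-up values)
codebook <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1))
centroids <- rbind(c(0.1, 0.1), c(0.9, 0.8))
bmu <- bmu_of_centroids(centroids, codebook)
print(bmu)
```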


Cluster assignment for the experimental data

Description

Generate a vector containing the cluster assignment to experimental data

Usage

BmusClus(Bmus, Cluster)

Arguments

Bmus

Best Matching Unit assignment to the experimental data

Cluster

Vector containing cluster number assignment for prototypes

Value

A vector containing the cluster assignment to experimental data

Author(s)

Sabina Licen

References

Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
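
The idea behind this assignment can be sketched in base R: each experimental sample inherits the cluster of its Best Matching Unit. The vectors below are made up, and the sketch assumes that prototypes are numbered 1..n in the same order in both vectors; it is not taken from the BmusClus source.

```r
# Cluster label of each prototype (6 prototypes, 2 clusters); made-up values
cluster_of_prototype <- c(1, 1, 2, 2, 2, 1)
# BMU index of each experimental sample; made-up values
bmus <- c(3, 1, 6, 4, 4)
# Indexing the prototype labels by BMU gives one cluster label per sample
sample_cluster <- cluster_of_prototype[bmus]
print(sample_cluster)
```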


Boxplot of prototype variables split by cluster and variable

Description

Boxplot function is used, box whiskers are omitted

Usage

BoxClus(Dms, codebook, Cluster)

Arguments

Dms

A vector of length 2, where the first argument specifies the number of rows and the second the number of columns of plots (see mfrow in par)

codebook

De-normalized prototype codebook

Cluster

Vector containing cluster number assignment for prototypes

Value

Boxplot of prototype variables split by cluster

Author(s)

Sabina Licen

References

Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414

See Also

boxplot, par


Boxplot of prototype variables split by cluster

Description

Boxplot function is used, box whiskers are omitted

Usage

BoxUnits(codebook, Cluster, Ylim = NA, pitch = NA, xdim = 0.75)

Arguments

codebook

Prototype codebook normalized by variable

Cluster

Vector containing cluster number assignment for prototypes

Ylim

Vector of length 2 for y-axis limits

pitch

Vector containing the position of horizontal grid lines

xdim

x axes label dimensions

Value

Boxplot of prototype variables split by cluster

Author(s)

Sabina Licen

References

Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414

See Also

boxplot


Custom color sequence for clusters

Description

Generate the sequence of colors to plot the SOM map according to clusters

Usage

ClusCol(Centroids, Cluster, colSeq = rainbow(nrow(data.frame(Centroids))))

Arguments

Centroids

Centroids matrix

Cluster

Vector containing cluster number assignment for prototypes

colSeq

Color sequence for the clusters

Value

A vector of colors with length equal to Cluster

Author(s)

Sabina Licen

References

Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414


Prototype coordinates for graph

Description

Generate X and Y coordinates for plotting a SOM map shaped according to Vesanto visualization fashion

Usage

CodeCoord(Row, Col)

Arguments

Row

Number of SOM map rows

Col

Number of SOM map columns

Value

This function returns a data.frame including columns:

  • X

  • Y

Author(s)

Sabina Licen, Pierluigi Barbieri

References

J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf; Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414

Examples

Coord<-CodeCoord(10,5)

Plot of daily percentages for each cluster

Description

The function produces a plot representing the daily percentage for each cluster

Usage

DailyBar(
  experimental,
  TrainClus,
  colSeq = rainbow(length(levels(as.factor(TrainClus)))),
  Total = 1440,
  xdim = 0.7,
  ydim = 0.8
)

Arguments

experimental

Experimental data (must contain variable "date")

TrainClus

Vector containing cluster number assignment for experimental data

colSeq

Color sequence for the clusters

Total

Number of observations per day

xdim

x axes label dimensions

ydim

y axes label dimensions

Value

Plot of daily percentages for each cluster; the last element in the legend represents the percentage of undetermined data

Author(s)

Sabina Licen

References

Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414


Evaluate Davies-Bouldin index for the cluster split of data input

Description

The function was ported to R from the db_index.m script in the SOM Toolbox for Matlab by Vesanto and adapted for use in the shiny app

Usage

db_indexR(codebook, k_best, c_best)

Arguments

codebook

SOM codebook

k_best

Vector with cluster number assignment for each sample

c_best

Matrix with cluster centroids

Value

The mean DB-index for the clustering

Author(s)

Sabina Licen, Pierluigi Barbieri

References

J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf

See Also

som_mdistR, kmeans_clustersRProg
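
For orientation, a minimal base-R sketch of the textbook Davies-Bouldin index follows. It assumes Euclidean distances, cluster labels numbered 1..k, and no empty clusters; db_index_sketch is a hypothetical name, and db_indexR may differ in options and edge-case handling.

```r
# Hypothetical sketch of the Davies-Bouldin index (not the SOMEnv code)
db_index_sketch <- function(x, labels, centroids) {
  x <- as.matrix(x)
  centroids <- as.matrix(centroids)
  k <- nrow(centroids)
  # Within-cluster scatter: mean Euclidean distance of members to their centroid
  scatter <- sapply(seq_len(k), function(i) {
    members <- x[labels == i, , drop = FALSE]
    mean(sqrt(rowSums(sweep(members, 2, centroids[i, ])^2)))
  })
  # Distances between centroids
  sep <- as.matrix(dist(centroids))
  # For each cluster, the largest (worst) scatter-to-separation ratio
  worst <- sapply(seq_len(k), function(i) {
    others <- setdiff(seq_len(k), i)
    max((scatter[i] + scatter[others]) / sep[i, others])
  })
  mean(worst)
}

# Two tight, well-separated clusters (made-up data)
x <- rbind(c(0, 0), c(0, 2), c(10, 0), c(10, 2))
labels <- c(1, 1, 2, 2)
centroids <- rbind(c(0, 1), c(10, 1))
db <- db_index_sketch(x, labels, centroids)
print(db)
```

Lower values indicate more compact and better separated clusters.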


Percentage frequency for each cluster

Description

Percentage frequency for each cluster

Usage

Freq(Cluster)

Arguments

Cluster

Vector containing cluster number assignment for experimental data

Value

A data frame containing the percentage frequency of each cluster

Author(s)

Sabina Licen

References

Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
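
A percentage frequency table of this kind can be sketched in base R as follows. The input vector is made up and the column names Cluster and Percent are assumptions; the data frame returned by Freq may be laid out differently.

```r
# Made-up cluster assignment for 10 samples
cluster <- c(1, 1, 2, 3, 3, 3, 2, 1, 1, 1)
counts <- table(cluster)
# Share of samples falling in each cluster, as a percentage
freq <- data.frame(
  Cluster = names(counts),
  Percent = as.numeric(100 * counts / length(cluster))
)
print(freq)
```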


Daily percentage frequency for each cluster

Description

Daily percentage frequency for each cluster

Usage

FreqD(Date, Cluster, Total = 1440)

Arguments

Date

Vector containing date/time variable for experimental data

Cluster

Vector containing cluster number assignment for experimental data

Total

Number of observations per day

Value

A data frame containing the daily percentage frequency of each cluster

Author(s)

Sabina Licen

References

Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
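
The role of Total can be illustrated with a base-R sketch: counts per day and cluster are divided by the expected number of observations per day, so days with missing data sum to less than 100%. The data, the variable names and the extra ND (not determined) column are assumptions made for illustration, not the FreqD output format.

```r
# Made-up 6-hourly data, so 4 observations are expected per day
date <- as.POSIXct(c("2020-01-01 00:00", "2020-01-01 06:00",
                     "2020-01-01 12:00", "2020-01-01 18:00",
                     "2020-01-02 00:00", "2020-01-02 06:00"), tz = "UTC")
cluster <- c(1, 1, 2, 2, 1, 2)
total <- 4

day <- format(date, "%Y-%m-%d")
# Rows are days, columns are clusters, entries are percentages of `total`
daily <- 100 * unclass(table(day, cluster)) / total
# Share of each day without any cluster assignment (missing observations)
daily <- cbind(daily, ND = 100 - rowSums(daily))
print(daily)
```

With the default Total = 1440 the same logic corresponds to one observation per minute.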


Monthly percentage frequency for each cluster

Description

Monthly percentage frequency for each cluster

Usage

FreqM(Date, Cluster)

Arguments

Date

Vector containing date/time variable for experimental data

Cluster

Vector containing cluster number assignment for experimental data

Value

A data frame containing the monthly percentage frequency of each cluster

Author(s)

Sabina Licen

References

Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414


Function to draw a hexagon around a point

Description

Draws a hexagon around a point of x and y coordinates

Usage

Hexa(x, y, color = NA, border = "gray", unitcell = 1)

Arguments

x

X-coordinate of the hexagon center

y

Y-coordinate of the hexagon center

color

Filling color of the hexagon (default NA)

border

Border color of the hexagon (default "gray")

unitcell

The distance side to side between two parallel sides of the hexagon (default 1)

Value

This function draws a hexagon on a plot

Author(s)

Sabina Licen
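
The geometry can be sketched in base R. Since unitcell is the distance between two parallel sides, the centre-to-vertex distance is unitcell / sqrt(3). The helper name hex_vertices and the "pointy-top" orientation are assumptions; Hexa may orient or draw the hexagon differently.

```r
# Hypothetical helper: vertices of a pointy-top regular hexagon
hex_vertices <- function(x, y, unitcell = 1) {
  radius <- unitcell / sqrt(3)               # centre-to-vertex distance
  angle <- seq(30, 330, by = 60) * pi / 180  # six vertices, 60 degrees apart
  data.frame(x = x + radius * cos(angle), y = y + radius * sin(angle))
}

v <- hex_vertices(0, 0, unitcell = 1)
# Width across the two vertical sides equals unitcell
print(diff(range(v$x)))

# Draw it with base graphics when running interactively
if (interactive()) {
  plot(v, type = "n", asp = 1)
  polygon(v$x, v$y, col = NA, border = "gray")
}
```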


Function to draw a hexagonal SOM map

Description

Draws a hexagonal SOM map using x, y coordinates for the hexagon centers

Usage

Hexagons(Coords, Row, Col, color = NA, border = "gray", unitcell = 1)

Arguments

Coords

Matrix containing the x and y coordinates of the hexagon centers

Row

Number of SOM map rows

Col

Number of SOM map columns

color

Filling color of the hexagons (default NA)

border

Border color of the hexagons (default "gray")

unitcell

The distance side to side between two parallel sides of the hexagon (default 1)

Value

A hexagonal SOM map

Author(s)

Sabina Licen

References

Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414

Examples

Coord<-CodeCoord(10,5)
Hexagons(Coord,10,5)

SOM map with clusters

Description

Generates a SOM map colored according to cluster splitting

Usage

HexagonsClus(
  Centroids,
  Cluster,
  BCentr,
  Coord,
  Row,
  Col,
  colSeq = rainbow(nrow(Centroids))
)

Arguments

Centroids

Centroids matrix

Cluster

Vector containing cluster number assignment for prototypes

BCentr

Best Matching Unit of the cluster centroids

Coord

Prototype coordinates for plotting the map

Row

Number of SOM map rows

Col

Number of SOM map columns

colSeq

Color sequence for the clusters

Value

A SOM map colored according to cluster splitting

Author(s)

Sabina Licen

References

Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414


Heatmaps

Description

Multiple plots that show the distribution of the modeled variables on the SOM map

Usage

HexagonsVar(Dms, codebook, Coords, Row, Col)

Arguments

Dms

A vector of length 2, where the first argument specifies the number of rows and the second the number of columns of plots (see mfrow in par)

codebook

SOM codebook

Coords

Prototype coordinates for plotting the map

Row

Number of SOM map rows

Col

Number of SOM map columns

Details

The function plots a SOM map for the values of each modeled variable using a grayscale according to quartiles: white for lower outliers, shades of gray for the quartiles, and black for upper outliers. The outliers and quartiles are evaluated by the boxplot function with default parameters.

Value

SOM map plots for the values of each modeled variable using a grayscale according to quartiles

Author(s)

Sabina Licen

References

Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414

See Also

boxplot, par
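
The quartile-based shading can be sketched in base R with boxplot.stats, which returns the same statistics boxplot computes by default. The values, the six-class scheme, and the specific gray levels below are assumptions for illustration only; the sketch also assumes the five statistics are all distinct.

```r
# Made-up values of one variable across 12 prototypes
values <- c(-50, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 60)
# Lower whisker, lower hinge, median, upper hinge, upper whisker
bs <- boxplot.stats(values)$stats

# Classes 2..5 are the four quartile bands between the whiskers
cls <- cut(values, breaks = bs, include.lowest = TRUE, labels = FALSE) + 1
cls[values < bs[1]] <- 1   # lower outliers
cls[values > bs[5]] <- 6   # upper outliers

# One fill color per class: white, four grays, black (assumed levels)
shades <- c("white", gray(c(0.8, 0.6, 0.4, 0.2)), "black")
fill <- shades[cls]
print(data.frame(values, cls, fill))
```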


Hits distribution on the SOM map

Description

Plot a SOM map with filled hexagons according to the number of hits

Usage

HexaHits(hits, Coord, Row, Col, color = "black")

Arguments

hits

Vector with number of hits for each prototype

Coord

Prototype coordinates for plotting the map

Row

Number of SOM map rows

Col

Number of SOM map columns

color

Fill color of the hexagons

Value

Plot a SOM map with filled hexagons according to the number of hits

Author(s)

Sabina Licen

References

Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414


Hits distribution on the SOM map

Description

Plot a SOM map with hits plotted as grayscale according to quartiles

Usage

HexaHitsQuant(hits, Coord, Row, Col)

Arguments

hits

Vector with number of hits for each prototype

Coord

Prototype coordinates for plotting the map

Row

Number of SOM map rows

Col

Number of SOM map columns

Details

The function plots a SOM map with hits represented as grayscale according to quartiles: white for lower outliers, shades of gray for the quartiles, and black for upper outliers. The prototype with the maximum number of hits is represented by a red hexagon. The outliers and quartiles are evaluated by the boxplot function with default parameters.

Value

Plot a SOM map with hits represented as grayscale according to quartiles

Author(s)

Sabina Licen

References

Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414

See Also

boxplot


Relative quantization error distribution on the SOM map

Description

Plot a SOM map with filled hexagons according to the relative quantization error

Usage

HexaQerrs(bmus, qerrs, Coord, Row, Col, color = "black")

Arguments

bmus

Vector with Best Matching Unit for each experimental sample

qerrs

Vector with quantization error for each experimental sample

Coord

Prototype coordinates for plotting the map

Row

Number of SOM map rows

Col

Number of SOM map columns

color

Fill color of the hexagons

Details

The function evaluates the relative quantization error for each prototype by dividing the sum of the quantization errors of the experimental samples represented by that prototype by its number of hits, then plots a SOM map with filled hexagons according to the relative quantization error.

Value

Plot a SOM map with filled hexagons according to the relative quantization error

Author(s)

Sabina Licen

References

Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414
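
The relative quantization error described above (sum of errors per prototype divided by its hits) can be sketched in base R. The vectors and the number of map units are made up, and returning NA for prototypes without hits is an assumption; HexaQerrs may treat empty units differently.

```r
# Made-up BMU index and quantization error for 6 samples
bmus <- c(1, 1, 2, 3, 3, 3)
qerrs <- c(0.2, 0.4, 0.5, 0.1, 0.2, 0.3)
n_units <- 4   # prototype 4 receives no hits in this toy example

# Number of samples mapped to each prototype
hits <- tabulate(bmus, nbins = n_units)
# Sum of quantization errors per prototype (NA where there are no samples)
sum_qerr <- as.numeric(tapply(qerrs, factor(bmus, levels = seq_len(n_units)), sum))
sum_qerr[is.na(sum_qerr)] <- 0
# Relative quantization error; left undefined for prototypes without hits
rel_qerr <- ifelse(hits > 0, sum_qerr / hits, NA)
print(rel_qerr)
```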


Relative quantization error distribution on the SOM map

Description

Plot a SOM map with relative quantization error plotted as grayscale according to quartiles

Usage

HexaQerrsQuant(bmus, qerrs, Coord, Row, Col)

Arguments

bmus

Vector with Best Matching Unit for each experimental sample

qerrs

Vector with quantization error for each experimental sample

Coord

Prototype coordinates for plotting the map

Row

Number of SOM map rows

Col

Number of SOM map columns

Details

The function evaluates the relative quantization error for each prototype by dividing the sum of the quantization errors of the experimental samples represented by that prototype by its number of hits, then plots a SOM map with the relative quantization error represented as grayscale according to quartiles: white for lower outliers, shades of gray for the quartiles, and black for upper outliers. The outliers and quartiles are evaluated by the boxplot function with default parameters.

Value

Plot a SOM map with relative quantization error represented as grayscale according to quartiles

Author(s)

Sabina Licen

References

Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414

See Also

boxplot


K-means algorithm applied for different values of clusters

Description

For each number of clusters k, the som_kmeansR function (100 training epochs) is run a custom number of times and the best run is selected based on the sum of squared errors (err). The Davies-Bouldin index is calculated for each k-clustering. The function was ported to R from the kmeans_clusters.m script in the SOM Toolbox for Matlab by Vesanto and adapted to show a progress bar when embedded in the shiny app.

Usage

kmeans_clustersRProg(codebook, k = 5, times = 5, seed = NULL)

Arguments

codebook

SOM codebook

k

Maximum number of clusters (the function will be run from 2 to k clusters)

times

Number of times the som_kmeansR function is iterated

seed

Number for set.seed function

Value

This function returns a list containing the cluster number assignment for each sample, the cluster centroids, the total quantization error, the DB-index for each number of clusters, and the random seed number used

Author(s)

Sabina Licen, Pierluigi Barbieri

References

J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf

See Also

som_mdistR, som_kmeansRProg, db_indexR
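
The "repeat and keep the best run for each k" strategy can be sketched with stats::kmeans, which is swapped in here for som_kmeansR; this is not the package's own implementation. The random codebook and the names k_max, times, fits and err are assumptions made for illustration.

```r
set.seed(1)
# Made-up "codebook": 30 prototypes, 2 variables, three loose groups
codebook <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 6, sd = 0.3), ncol = 2)
)
k_max <- 5
times <- 5

fits <- lapply(2:k_max, function(k) {
  # nstart repeats the run `times` times and keeps the lowest total error
  kmeans(codebook, centers = k, iter.max = 100, nstart = times)
})
# Sum of squared errors of the best run for each k
err <- sapply(fits, function(f) f$tot.withinss)
names(err) <- 2:k_max
print(err)
```

A cluster validity measure such as the Davies-Bouldin index can then be computed for each element of fits to compare the different values of k.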


Custom number sequence for clusters

Description

Changes the input vector according to the custom number sequence for clusters

Usage

NClusChange(Vector, NCh)

Arguments

Vector

Vector containing cluster number assignment for prototypes or experimental data

NCh

Vector with custom sequence of numbers for clusters

Value

A vector of the same length as the input vector with cluster numbers changed according to the custom input

Author(s)

Sabina Licen


Basic statistics of values present in the input vector

Description

Generate basic statistics for the input vector

Usage

paramQuant(param)

Arguments

param

Numeric vector

Details

The outliers and quartiles are evaluated by the boxplot function with default parameters.

Value

A table which contains basic statistics for the input vector

Author(s)

Sabina Licen

See Also

boxplot

Examples

library(datasets)
paramQuant(iris[,1])
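
The statistics that boxplot computes with default parameters are available from boxplot.stats in base R. The sketch below shows them on a made-up vector; it does not reproduce the layout of the table returned by paramQuant.

```r
# Made-up numeric vector with one clear upper outlier
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
bs <- boxplot.stats(x)
# Lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$stats)
# Points lying more than 1.5 times the hinge spread beyond the hinges
print(bs$out)
```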

Calculate map dimensions

Description

Generate SOM map dimensions according to Vesanto heuristic rules based on the first two eigenvalues of the experimental data and their related eigenvectors. The function was ported to R from the som_dim.m script in the SOM Toolbox for Matlab by Vesanto and adapted for use in the shiny app.

Usage

som_dimR(dataset, type = "regular")

Arguments

dataset

Experimental data

type

Either "regular", "small" or "big" map (default ="regular")

Value

This function returns a list containing the number of rows, columns and overall map units

Author(s)

Sabina Licen, Pierluigi Barbieri

References

J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf

See Also

eigen, cor

Examples

library(datasets)
som_dimR(iris[,1:4], type="small")

Calculate initialization matrix for SOM training

Description

Generate SOM map initialization matrix according to Vesanto heuristic rules related to map dimensions, the first two eigenvalues of the experimental data and their related eigenvectors. The function was ported to R from the som_init.m script in the SOM Toolbox for Matlab by Vesanto and adapted for use in the shiny app.

Usage

som_initR(dataset, Row, Col, munits)

Arguments

dataset

Experimental data

Row

Number of SOM map rows

Col

Number of SOM map columns

munits

Number of SOM map units (Row*Col)

Value

This function returns an initialization matrix for SOM training

Author(s)

Sabina Licen, Pierluigi Barbieri

References

J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf

Examples

SOMdim<-som_dimR(iris[,1:4], type="small")
SOMinit<-som_initR(iris[,1:4],SOMdim$Row,SOMdim$Col,SOMdim$munits)

K-means algorithm applied for a specific number of clusters

Description

The training is run for a custom number of epochs for k number of clusters

Usage

som_kmeansRProg(codebook, k, epochs, seed = NULL)

Arguments

codebook

SOM codebook

k

Number of clusters

epochs

Number of training epochs

seed

Number for set.seed function

Details

The function was ported to R from the som_kmeans.m script in the SOM Toolbox for Matlab by Vesanto and adapted to show a progress bar when embedded in the shiny app.

Value

This function returns a list containing the cluster number assignment for each sample, the cluster centroids, the total quantization error, and the random seed number used

Author(s)

Sabina Licen, Pierluigi Barbieri

References

J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf

See Also

set.seed


Evaluate pairwise distance matrix for the given codebook

Description

The function was ported to R from the som_mdist.m script in the SOM Toolbox for Matlab by Vesanto and adapted for use in the shiny app

Usage

som_mdistR(codebook)

Arguments

codebook

SOM codebook

Value

Distance matrix

Author(s)

Sabina Licen, Pierluigi Barbieri

References

J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf

See Also

db_indexR
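
A pairwise distance matrix between prototypes can be sketched with base R's dist. The Euclidean metric and the toy codebook are assumptions; som_mdistR may support other options or handle missing values differently.

```r
# Toy codebook: 3 prototypes, 2 variables (made-up values)
codebook <- rbind(c(0, 0), c(3, 4), c(6, 8))
# Symmetric matrix of Euclidean distances between all pairs of prototypes
mdist <- as.matrix(dist(codebook, method = "euclidean"))
print(mdist)
```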


Unified distance matrix for the SOM map

Description

The function was ported to R from the som_umat.m script in the SOM Toolbox for Matlab by Vesanto and adapted for use in the shiny app

Usage

som_umatR(codebook, Row, Col)

Arguments

codebook

SOM codebook

Row

Number of SOM map rows

Col

Number of SOM map columns

Value

The unified distance matrix for the SOM map

Author(s)

Sabina Licen, Pierluigi Barbieri

References

J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf; A. Ultsch, H.P. Siemon, Proceedings of International Neural Network Conference (INNC-90), Kluwer Academic Publishers, Dordrecht, 1990, pp. 305-308.


The function starts the SOMEnv GUI

Description

The function starts the SOMEnv GUI

Usage

SomEnvGUI()

Value

This function starts the graphical user interface in the default system browser. The main help suggestions for using the tool are embedded in the GUI.

Author(s)

Sabina Licen, Marco Franzon, Tommaso Rodani

References

Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie and Jonathan McPherson (2019). shiny: Web Application Framework for R. R package version 1.4.0. https://CRAN.R-project.org/package=shiny

See Also

shiny

Examples

## Not run: 
SomEnvGUI()

## End(Not run)

Topographical error for the SOM map

Description

Calculate topographical error for the SOM map

Usage

SOMtopol(dataset, codebook, grid)

Arguments

dataset

Experimental data used for training the map

codebook

SOM codebook

grid

SOM grid expressed as a matrix of x and y coordinates of the map units

Value

This function returns the topographical error

Author(s)

Sabina Licen

References

Clark, S., Sisson, S.A., Sharma, A. (2020) Adv. Water Resour. 143, art. no. 103676 DOI: 10.1016/j.advwatres.2020.103676
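
A common definition of the topographic error is the share of samples whose best and second-best matching units are not neighbours on the map. The base-R sketch below follows that definition under stated assumptions: Euclidean distances, and neighbouring units placed exactly 1 apart in grid. The function name topo_error_sketch and the toy data are hypothetical, and SOMtopol may use a different neighbourhood rule.

```r
# Hypothetical sketch of the topographic error (not the SOMEnv code)
topo_error_sketch <- function(dataset, codebook, grid, tol = 1e-8) {
  dataset <- as.matrix(dataset)
  codebook <- as.matrix(codebook)
  grid <- as.matrix(grid)
  not_adjacent <- apply(dataset, 1, function(s) {
    # Squared distance from the sample to every prototype
    d2 <- rowSums(sweep(codebook, 2, s)^2)
    best <- order(d2)[1:2]   # BMU and second BMU
    gap <- sqrt(sum((grid[best[1], ] - grid[best[2], ])^2))
    gap > 1 + tol            # assumed: neighbouring units sit at distance 1
  })
  mean(not_adjacent)
}

# Three units on a line; prototypes deliberately out of topological order
grid <- cbind(x = c(1, 2, 3), y = c(1, 1, 1))
codebook <- rbind(c(0, 0), c(10, 10), c(1, 1))
dataset <- rbind(c(0.2, 0.2), c(9, 9))
te <- topo_error_sketch(dataset, codebook, grid)
print(te)
```

In the toy data the first sample has units 1 and 3 as its two closest prototypes, which are not neighbours on the grid, while the second sample maps to the adjacent units 2 and 3.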


U-matrix plot

Description

Plot of Unified Distance Matrix using a colored scale according to quartiles

Usage

UmatGraph(umat, Row, Col, colorscale = c("bw", "gs"))

Arguments

umat

Unified Distance Matrix

Row

Number of SOM map rows

Col

Number of SOM map columns

colorscale

Either "bw" for grayscale or "gs" for green-white scale

Details

The function plots a U-matrix map for the values of each modeled variable using a grayscale according to quartiles, from darker color (lower distances) to lighter color (higher distances). The quartiles are evaluated by boxplot function applying default parameters.

Value

Plot of Unified Distance Matrix using a grayscale (or green-white scale) according to quartiles

Author(s)

Sabina Licen

References

J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Report A57, 2000, Available at: www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf; Licen, S., Cozzutto, S., Barbieri, P. (2020) Aerosol Air Qual. Res., 20 (4), pp. 800-809. DOI: 10.4209/aaqr.2019.08.0414

See Also

boxplot, som_umatR