Title: | Self-Organising Maps Coupled with Hierarchical Cluster Analysis |
---|---|
Description: | Implements self-organising maps combined with hierarchical cluster analysis (SOM-HCA) for clustering and visualization of high-dimensional data. The package includes functions to estimate the optimal map size based on various quality measures and subsequently generates a model with the selected dimensions. It also performs hierarchical clustering on the map nodes to group similar units. Documentation about the SOM-HCA method is provided in Pastorelli et al. (2024) <doi:10.1002/xrs.3388>. |
Authors: | Gianluca Pastorelli [aut, cre] |
Maintainer: | Gianluca Pastorelli <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.3 |
Built: | 2025-01-24 06:32:24 UTC |
Source: | https://github.com/gianluca-pastorelli/somhca |
Groups similar nodes of the SOM using hierarchical clustering and the KGS penalty function to determine the optimal number of clusters.
clusterSOM(model, plot_result = TRUE, file_path = NULL)
model |
The trained SOM model object. |
plot_result |
A logical value indicating whether to plot the clustering result. Default is 'TRUE'. |
file_path |
An optional string specifying the path to a CSV file. If provided, clusters are assigned to the observations in the original dataset, and the updated data is stored in a package environment as 'DataAndClusters'. |
A plot of the clusters on the SOM grid (if 'plot_result = TRUE'). If 'file_path' is specified, the clustered dataset is stored in a package environment for retrieval.
# Create a toy matrix with 9 columns and 100 rows
data <- matrix(rnorm(900), ncol = 9, nrow = 100) # 900 random numbers, 100 rows, 9 columns

# Run the finalSOM function with the mock data
model <- finalSOM(data, dimension = 6, iterations = 700)

# Perform clustering using the mock model
clusterSOM(model, plot_result = TRUE)

# Load the toy data from the package's inst/extdata/ directory, perform
# clustering and retrieve the clustered dataset
file_path <- system.file("extdata", "toy_data.csv", package = "somhca")
clusterSOM(model, plot_result = FALSE, file_path)
getClusterData()
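The node-grouping step described for clusterSOM can be illustrated with base R alone. The following is a minimal sketch under stated assumptions, not the package's implementation: the codebook matrix, the linkage method, the fixed number of clusters 'k', and the 'winning_node' vector are all hypothetical, and the KGS-based selection of the number of clusters is replaced by a hand-picked value.

```r
# Hypothetical stand-in for the codebook vectors of a 6x6 SOM:
# 36 nodes (rows) described by 9 variables (columns).
set.seed(42)
codebook <- matrix(rnorm(36 * 9), nrow = 36, ncol = 9)

# Hierarchical clustering of the nodes on Euclidean distances.
# Assumption: complete linkage (the hclust default) is used for illustration.
node_dist <- dist(codebook)
tree <- hclust(node_dist, method = "complete")

# Assumption: k is fixed by hand here; clusterSOM selects it with the
# KGS penalty function instead.
k <- 4
node_cluster <- cutree(tree, k = k)

# Map observations to clusters through their winning node.
# 'winning_node' is a hypothetical vector of node indices, one per observation.
winning_node <- sample(seq_len(36), size = 100, replace = TRUE)
obs_cluster <- node_cluster[winning_node]

# Number of observations per cluster
table(obs_cluster)
```

Each observation inherits the cluster of the node it maps to, which is how cluster labels defined on the grid can be attached to rows of an original dataset.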
Re-trains the SOM using a specified optimal grid size and number of iterations.
finalSOM(data, dimension, iterations)
data |
The preprocessed data matrix containing the input data for SOM training. |
dimension |
An integer specifying the dimension of the square SOM grid (e.g., 5 results in a 5x5 grid). |
iterations |
An integer defining the number of iterations for training the SOM model. Use a large value, e.g., 500 or higher, for improved training. If training stops with an error, reducing the number of iterations may be necessary. |
A trained SOM model object.
# Create a toy matrix with 9 columns and 100 rows
data <- matrix(rnorm(900), ncol = 9, nrow = 100) # 900 random numbers, 100 rows, 9 columns

# Run the finalSOM function with the mock data
myFinalSOM <- finalSOM(data, dimension = 6, iterations = 700)
Creates various types of plots to visualize and evaluate the trained SOM model.
generatePlot(model, plot_type, data = NULL)
model |
The trained SOM model object. |
plot_type |
An integer specifying the type of plot to generate. Options are:
|
data |
The preprocessed data matrix containing the input data. Required only for 'plot_type = 5'. |
A plot or a series of plots is generated and displayed based on the specified type.
# Create a toy matrix with 9 columns and 100 rows
data <- matrix(rnorm(900), ncol = 9, nrow = 100) # 900 random numbers, 100 rows, 9 columns

# Assign column names to the data matrix
colnames(data) <- paste("Var", 1:ncol(data), sep = "_")

# Run the finalSOM function with the mock data
model <- finalSOM(data, dimension = 6, iterations = 700)

# Generate plots using the mock model
generatePlot(model, plot_type = 2)
generatePlot(model, plot_type = 5, data)
Access the dataset with cluster assignments stored by 'clusterSOM'.
getClusterData()
A data frame with the clustered dataset.
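The store-then-retrieve pattern behind 'clusterSOM' and 'getClusterData' can be sketched with a private environment. This is only an illustration of the general technique: the environment '.store' and the helpers 'save_clusters' and 'get_clusters' are hypothetical names, and only the object name 'DataAndClusters' comes from the documentation above.

```r
# A private environment, as a package might create at load time.
# '.store' and both helper names are hypothetical.
.store <- new.env(parent = emptyenv())

# Store a clustered data frame under the name 'DataAndClusters'.
save_clusters <- function(df) {
  assign("DataAndClusters", df, envir = .store)
  invisible(NULL)
}

# Retrieve the stored data frame, failing clearly if nothing was stored.
get_clusters <- function() {
  if (!exists("DataAndClusters", envir = .store, inherits = FALSE)) {
    stop("No clustered dataset stored yet; run the clustering step with a file path first.")
  }
  get("DataAndClusters", envir = .store, inherits = FALSE)
}

save_clusters(data.frame(x = c(0.1, 0.9), Cluster = c(1L, 2L)))
result <- get_clusters()
```

Keeping the result in an environment rather than the global workspace avoids overwriting user objects, at the cost of requiring an explicit getter call.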
Computes the optimal grid size for training a SOM using various quality measures and heuristic approaches.
optimalSOM(data, method = "A", increments, iterations)
data |
The preprocessed data matrix containing the input data for SOM training. |
method |
A character string indicating the method for estimating the maximum grid dimension. Options are:
|
increments |
An integer specifying the step size for increasing grid dimensions. For example, set increments to 2 or 5 to increment the grid size by 2 or 5 rows/columns at each step. Smaller increments lead to more granular searches but may increase computation time; larger increments risk errors if they exceed the estimated maximum SOM grid dimensions. |
iterations |
An integer defining the number of iterations for SOM training. A lower value (e.g., below 500) reduces computation time. If the process takes too long or an error occurs, try reducing the number of iterations. |
A data frame summarizing quality measures and their associated optimal grid dimensions. Use these results to select the most suitable grid size for your SOM.
# Create a toy matrix with 9 columns and 100 rows
data <- matrix(rnorm(900), ncol = 9, nrow = 100) # 900 random numbers, 100 rows, 9 columns

# Run the optimalSOM function with the mock data
myOptimalSOM <- optimalSOM(data, method = "A", increments = 2, iterations = 300)
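The effect of 'increments' on the search can be pictured as a sequence of candidate grid dimensions. The sketch below is an assumption-laden illustration in base R: the starting dimension 'min_dim' and the maximum 'max_dim' are hypothetical values, not quantities taken from the package, and the actual search inside optimalSOM may differ.

```r
# Hypothetical values: neither the starting dimension nor the maximum
# is taken from the package.
min_dim <- 2
max_dim <- 11
increments <- 2

# Candidate square grid dimensions visited with a step of 2
candidate_dims <- seq(from = min_dim, to = max_dim, by = increments)

# An increment larger than the available range leaves only the starting size,
# so there is effectively nothing to compare.
too_coarse <- seq(from = min_dim, to = max_dim, by = 15)
```

With these values the search would cover five grid sizes, while the oversized step collapses it to a single candidate, which is one plausible reason why very large increments can cause problems.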
Reads data from a CSV file, optionally removes row headings, and applies the specified normalization method before converting the data to a matrix. In the original dataset, rows represent observations (e.g., samples), columns represent variables (e.g., features), and all cells (except for column headers and, if present, row headers) contain only numeric values.
readMatrix(file_path, remove_row_headings = FALSE, scaling = "no")
file_path |
A string specifying the path to the CSV file. |
remove_row_headings |
A logical value. If 'TRUE', removes the first column of the dataset. This is useful when the first column contains non-numeric identifiers (e.g., sample names) that should be excluded from the analysis. Default is 'FALSE'. |
scaling |
A string specifying the scaling method. Options are:
|
A matrix with the processed data.
# Load the toy data from the package's inst/extdata/ directory
file_path <- system.file("extdata", "toy_data.csv", package = "somhca")

# Run the readMatrix function with the mock data
myMatrix <- readMatrix(file_path, TRUE, "MinMax")
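The read, drop-headings, scale, and convert steps can be reproduced in base R for a small file. This is a minimal sketch, assuming that "MinMax" refers to the usual per-column rescaling to the range [0, 1]; the CSV contents and the helper 'min_max' are hypothetical, and readMatrix itself may handle details differently.

```r
# Write a small hypothetical CSV: one ID column plus two numeric variables.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(Sample = c("s1", "s2", "s3"),
             Var_1 = c(1, 2, 3),
             Var_2 = c(10, 30, 20)),
  csv_path, row.names = FALSE
)

# Read the file and drop the first column (the row headings).
raw <- read.csv(csv_path)
raw <- raw[, -1, drop = FALSE]

# Per-column min-max scaling (assumed meaning of "MinMax").
# Note: a constant column would give a zero denominator.
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
scaled <- as.matrix(as.data.frame(lapply(raw, min_max)))
```

After scaling, every column spans 0 to 1, so variables measured on different scales contribute comparably to the distances used in SOM training.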