Description
Hello, I am trying to run UMAP with a pre-computed distance matrix based on a custom metric, namely Pearson distance. I know that a built-in "pearson" metric is available, but I wanted to check whether the results match when I pass a pre-computed Pearson distance matrix to the umap() function. Even with the same random_state in both cases, I got different results.
Case 1: (Using the built-in "pearson" metric)
library(umap)
inp_n_neighbors <- 200
inp_min_dist <- 0.001
inp_spread <- 0.2
n_comp <- 2
custom.config <- umap.defaults
custom.config$random_state <- 123
custom.config$n_neighbors <- inp_n_neighbors
custom.config$min_dist <- inp_min_dist
custom.config$spread <- inp_spread
custom.config$metric <- "pearson"
custom.config$n_components <- n_comp
res.umap <- umap(data, config=custom.config, preserve.seed=TRUE)
Case 2: (Using a pre-computed Pearson distance matrix as input, metric left at its default)
inp_n_neighbors <- 200
inp_min_dist <- 0.001
inp_spread <- 0.2
n_comp <- 2
custom.config <- umap.defaults
custom.config$random_state <- 123
custom.config$input <- "dist"
custom.config$n_neighbors <- inp_n_neighbors
custom.config$min_dist <- inp_min_dist
custom.config$spread <- inp_spread
custom.config$n_components <- n_comp
# Pearson correlation between the rows of data, rescaled to a distance in [0, 1]
data_corr <- cor(t(data), method="pearson")
data_dist <- (1 - data_corr)/2
res.umap2 <- umap(data_dist, config=custom.config, preserve.seed=TRUE)
The results of res.umap and res.umap2 are different.
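For reference, the pre-computed matrix from Case 2 can be inspected on its own, independently of umap(). The following is a minimal base-R sketch, assuming a small simulated matrix `toy` in place of `data` and a hypothetical helper `knn_from_dist`. It does not reproduce the package's internal neighbor search, and it does not establish whether the built-in "pearson" metric uses the same (1 - r)/2 formula.

```r
# Simulated stand-in for `data` (rows = observations); an assumption for illustration
set.seed(123)
toy <- matrix(rnorm(20 * 6), nrow = 20, ncol = 6)

# Same transformation as in Case 2: correlation between rows, mapped to [0, 1]
toy_corr <- cor(t(toy), method = "pearson")
toy_dist <- (1 - toy_corr) / 2

# Hypothetical helper: indices of the k nearest neighbors of each row,
# read directly from a distance matrix (each point is its own first neighbor)
knn_from_dist <- function(d, k) {
  t(apply(d, 1, function(row) order(row)[seq_len(k)]))
}

toy_knn <- knn_from_dist(toy_dist, k = 5)

# Sanity checks on the distance matrix
isSymmetric(toy_dist)        # distances should be symmetric
max(abs(diag(toy_dist)))     # self-distances should be (numerically) zero
range(toy_dist)              # values should lie within [0, 1]

# Halving the distances does not change which points are nearest
identical(knn_from_dist(1 - toy_corr, k = 5), toy_knn)
```

If a check like this passes on the real `data_dist`, the matrix itself is a well-formed distance matrix, and any remaining difference from Case 1 would have to come from how the two code paths define or use the distances.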
I was curious about what was happening, so I experimented further and found that even with a pre-computed distance matrix as input, the value assigned to the "custom.config$metric" parameter changes the results. For example, see Case 3.
Case 3: (Using a pre-computed Pearson distance matrix as input, with metric also set to "pearson")
inp_n_neighbors <- 200
inp_min_dist <- 0.001
inp_spread <- 0.2
n_comp <- 2
custom.config <- umap.defaults
custom.config$random_state <- 123
custom.config$input <- "dist"
custom.config$n_neighbors <- inp_n_neighbors
custom.config$min_dist <- inp_min_dist
custom.config$spread <- inp_spread
custom.config$n_components <- n_comp
custom.config$metric <- "pearson"  # the default is "euclidean"; changed to "pearson" here
data_corr <- cor(t(data), method="pearson")
data_dist <- (1 - data_corr)/2
res.umap3 <- umap(data_dist, config=custom.config, preserve.seed=TRUE)
The results of res.umap2 and res.umap3 are different.
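One way to narrow down where two runs diverge is to compare their nearest-neighbor tables before comparing the layouts. Below is a base-R sketch with a hypothetical helper `knn_overlap`. It assumes each fitted object exposes a neighbor index matrix; in the umap package this is expected to be something like `res.umap2$knn$indexes`, but that is an assumption to verify with `str(res.umap2)`. The toy index matrices stand in for real results.

```r
# Hypothetical helper: for each row, the fraction of neighbor indices
# shared between two n x k index matrices (order within a row is ignored)
knn_overlap <- function(idx_a, idx_b) {
  stopifnot(identical(dim(idx_a), dim(idx_b)))
  vapply(seq_len(nrow(idx_a)), function(i) {
    length(intersect(idx_a[i, ], idx_b[i, ])) / ncol(idx_a)
  }, numeric(1))
}

# Toy index matrices standing in for the neighbor tables of two runs
idx_a <- rbind(c(1, 2, 3), c(2, 1, 3), c(3, 1, 2))
idx_b <- rbind(c(1, 3, 2), c(2, 3, 4), c(3, 4, 5))

overlap <- knn_overlap(idx_a, idx_b)
summary(overlap)   # values near 1 mean both runs used nearly the same neighbors
```

If the overlap between two runs is 1 for every row, the neighbor sets agree and the difference arises later (for example in the neighbor distances or in the layout initialization); if it is well below 1, the two runs are already using different neighbor sets.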
When I use a pre-computed distance matrix as input, why does the value assigned to "custom.config$metric" change the results? What am I misunderstanding?
Thanks