Problem with using custom metric #18

@Jayesh-Kumar-Sundaram

Hello, I am trying to run UMAP with a pre-computed distance matrix as input. My distance is the Pearson distance. I know that a built-in "pearson" metric is available, but I wanted to check whether the results match when I pass a pre-computed Pearson distance matrix to the umap() function instead. Even with random_state set to the same value in both cases, I got different results.

Case 1: (Using the built-in Pearson metric)
inp_n_neighbors <- 200
inp_min_dist <- 0.001
inp_spread <- 0.2
n_comp <- 2
custom.config <- umap.defaults
custom.config$random_state <- 123
custom.config$n_neighbors <- inp_n_neighbors
custom.config$min_dist <- inp_min_dist
custom.config$spread <- inp_spread
custom.config$metric <- "pearson"
custom.config$n_components <- n_comp
res.umap <- umap(data, config=custom.config, preserve.seed=TRUE)

Case 2: (Using the custom Pearson metric as input distance matrix)
inp_n_neighbors <- 200
inp_min_dist <- 0.001
inp_spread <- 0.2
n_comp <- 2
custom.config <- umap.defaults
custom.config$random_state <- 123
custom.config$input <- "dist"
custom.config$n_neighbors <- inp_n_neighbors
custom.config$min_dist <- inp_min_dist
custom.config$spread <- inp_spread
custom.config$n_components <- n_comp
data_corr <- cor(t(data), method="pearson")
data_dist <- (1 - data_corr)/2
res.umap2 <- umap(data_dist, config=custom.config, preserve.seed=TRUE)

The results of res.umap and res.umap2 are different.
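Before comparing the embeddings, it may help to confirm that the matrix passed in Case 2 is a well-formed distance matrix, and to see what the division by 2 does and does not change. The sketch below uses only base R and a hypothetical toy matrix `toy` in place of `data`. It makes no claim about how the package's built-in "pearson" metric is scaled; whether that metric uses 1 - r, (1 - r)/2, or something else would need to be checked in the package documentation or source.

```r
# Toy stand-in for `data` (hypothetical): 50 observations, 8 features.
set.seed(1)
toy <- matrix(rnorm(50 * 8), nrow = 50)

# Same construction as in Case 2.
toy_corr <- cor(t(toy), method = "pearson")
toy_dist <- (1 - toy_corr) / 2

# Basic sanity checks on the pre-computed matrix.
checks <- c(
  square        = nrow(toy_dist) == ncol(toy_dist),
  symmetric     = isSymmetric(toy_dist),
  zero_diagonal = all(abs(diag(toy_dist)) < 1e-12),
  in_unit_range = all(toy_dist >= -1e-12 & toy_dist <= 1 + 1e-12)
)
print(checks)

# Halving the distances does not change the neighbor ordering of any row:
# ranking by 1 - r and by (1 - r)/2 gives the same order.
nn_half <- t(apply(toy_dist, 1, order))
nn_full <- t(apply(1 - toy_corr, 1, order))
same_order <- identical(nn_half, nn_full)
print(same_order)
```

If the checks pass on the real `data_dist`, the input matrix itself is unlikely to be the source of the difference, and the neighbor search or the handling of the metric setting would be the next thing to look at.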

I was curious about what is happening, so I experimented and realized that, even with a pre-computed distance matrix as input, the value assigned to the custom.config$metric parameter changes the results. For example, see Case 3.

Case 3: (Using the custom Pearson metric as input distance matrix)
inp_n_neighbors <- 200
inp_min_dist <- 0.001
inp_spread <- 0.2
n_comp <- 2
custom.config <- umap.defaults
custom.config$random_state <- 123
custom.config$input <- "dist"
custom.config$n_neighbors <- inp_n_neighbors
custom.config$min_dist <- inp_min_dist
custom.config$spread <- inp_spread
custom.config$n_components <- n_comp
custom.config$metric <- "pearson"  # the default is "euclidean"; changed to "pearson" here
data_corr <- cor(t(data), method="pearson")
data_dist <- (1 - data_corr)/2
res.umap3 <- umap(data_dist, config=custom.config, preserve.seed=TRUE)

The results of res.umap2 and res.umap3 are different.
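To make "different" concrete, the two embeddings can be compared numerically rather than by eye. The helper below is a minimal sketch; `compare_layouts` is a hypothetical name, and the toy matrices stand in for real embeddings. It assumes the coordinates are stored as a numeric matrix in the `layout` element of the object returned by umap().

```r
# Hypothetical helper: summarise how far apart two embeddings are.
compare_layouts <- function(a, b) {
  stopifnot(is.matrix(a), is.matrix(b), all(dim(a) == dim(b)))
  list(
    # Largest coordinate-wise difference between the two layouts.
    max_abs_diff = max(abs(a - b)),
    # TRUE when the layouts agree up to numerical tolerance.
    all_equal = isTRUE(all.equal(a, b, check.attributes = FALSE))
  )
}

# Toy stand-ins for two layouts (not real UMAP output).
layout_a <- matrix(c(0, 1, 2, 3, 4, 5), ncol = 2)
layout_b <- layout_a
layout_c <- layout_a + 0.5

same    <- compare_layouts(layout_a, layout_b)
shifted <- compare_layouts(layout_a, layout_c)
print(same)
print(shifted)
```

With the real results, the call would be compare_layouts(res.umap2$layout, res.umap3$layout), again assuming the `layout` element holds the coordinates.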

When I use a pre-computed distance matrix as input, why does the value assigned to custom.config$metric change the results? Where is the problem with my understanding?

Thanks
