Add support for mixed kernels #21

@kjohnsen

I explain the idea in more detail here, but in short: in practice one often wants to analyze data with a mix of continuous and discrete variables. Every other package I've looked at that accommodates such a mix is limited to discrete X and continuous Y or vice versa, or, in CMI, treats Z as one or the other. Also, even among continuous variables, one may want to treat them with different kernels: for example, a Gaussian kernel for linear variables but a von Mises kernel for circular variables, or kernels of the same family with different bandwidths.

The maximally flexible approach would be to let the user specify kernel_groups=..., which takes a series of kernel definitions, each paired with the indices of the variables it should be applied to. The final kernel would then be the product of them all.

The API might look like this:

# the one-kernel-fits-all default:
im.cmi(x, y, cond=z, approach='kernel', kernel='box', bandwidth=1.5)

# the separate kernel groups idea
im.cmi(x_space1, x_space2, x_circular, y, cond=(z1, z2), approach="kernel", kernel_groups=[
    ('gaussian', 0.7, [0,1]), 
    ('von mises', 'scott', [2]),
    ('gaussian', 0.4, [3]),
    # for discrete Z, though X's could be discrete or Z's could be continuous too
    ('AA', 0.1, [4]),
    ('dirac', None, [5]),
])

This assumes each kernel definition takes care of the distance metric as well as the kernel function itself. Maybe it would be better to separate those.
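To make the "separate those" option concrete, here is a rough sketch of how a product over kernel groups could be evaluated when the metric and the kernel profile are specified independently. Everything in it is hypothetical and not part of the package: the names `METRICS`, `KERNELS`, and `product_kernel`, the 4-tuple group format, and the choice to combine per-variable distances within a group by a Euclidean norm are all assumptions made for illustration. The weights are unnormalized, and only numeric bandwidths are handled (no 'scott' rule).

```python
import math

# --- Distance metrics between two scalar values (hypothetical names) ---
def euclidean(a, b):
    return abs(a - b)

def circular(a, b):
    # Shortest angular distance in radians, in [0, pi].
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def discrete(a, b):
    # 0 if the values match, 1 otherwise.
    return 0.0 if a == b else 1.0

# --- Kernel profiles: unnormalized weight as a function of distance / bandwidth ---
def gaussian(u):
    return math.exp(-0.5 * u * u)

def box(u):
    return 1.0 if u <= 1.0 else 0.0

METRICS = {"euclidean": euclidean, "circular": circular, "discrete": discrete}
KERNELS = {"gaussian": gaussian, "box": box}

def product_kernel(p, q, kernel_groups):
    """Unnormalized product-kernel weight between points p and q.

    kernel_groups is a list of (kernel_name, metric_name, bandwidth, indices).
    Assumption: within a group, per-variable distances are combined with a
    Euclidean norm before being passed to the kernel profile.
    """
    weight = 1.0
    for kernel_name, metric_name, bandwidth, indices in kernel_groups:
        kernel = KERNELS[kernel_name]
        metric = METRICS[metric_name]
        dist = math.sqrt(sum(metric(p[i], q[i]) ** 2 for i in indices))
        weight *= kernel(dist / bandwidth)
    return weight

groups = [
    ("gaussian", "euclidean", 0.7, [0, 1]),  # two linear variables
    ("gaussian", "circular", 0.5, [2]),      # one angle in radians
    ("box", "discrete", 0.5, [3]),           # exact match on a category
]
same = product_kernel((0.0, 0.0, 0.1, "a"), (0.0, 0.0, 0.1, "a"), groups)
other_category = product_kernel((0.0, 0.0, 0.1, "a"), (0.0, 0.0, 0.1, "b"), groups)
```

With this split, a box kernel on the discrete metric with bandwidth below 1 behaves like an exact-match kernel, and a von Mises kernel could be added as another profile on top of the circular metric.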

For reference, this paper describes treating unordered discrete variables as having distance 1 or 0 when values are different or the same, respectively. I guess you'd then use those distances in whatever metric you choose. https://arxiv.org/pdf/1912.03387#page=6.61
Another approach is the Aitchison-Aitken kernel, which (skipping the distance) assigns $\lambda$ and $1-\lambda$ for values that are different or the same, respectively. In the general form for $c$ categories, the weight for differing values is $\lambda/(c-1)$.
Looks like statsmodels does something to combine kernels: https://www.statsmodels.org/v0.13.5/nonparametric.html
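The two treatments of unordered discrete variables mentioned above can be sketched in a few lines. The function names are made up for illustration. The kernel uses the general form with $c$ categories, which reduces to $\lambda$ and $1-\lambda$ when $c = 2$.

```python
def indicator_distance(a, b):
    """Distance 0 if the two category values match, 1 otherwise."""
    return 0 if a == b else 1

def aitchison_aitken(a, b, lam, n_categories):
    """Aitchison-Aitken kernel weight for unordered categories.

    Matching values get 1 - lam; each differing value gets
    lam / (n_categories - 1), so weights over all categories sum to 1.
    lam = 0 reduces to an exact-match (Dirac-like) kernel.
    """
    if n_categories < 2:
        raise ValueError("need at least two categories")
    if not 0.0 <= lam <= (n_categories - 1) / n_categories:
        raise ValueError("lam must lie in [0, (c - 1) / c]")
    return 1.0 - lam if a == b else lam / (n_categories - 1)

categories = ["a", "b", "c"]
weights = [aitchison_aitken("a", v, 0.1, len(categories)) for v in categories]
```

Seen this way, the 'dirac' entry in the API sketch could be treated as the $\lambda = 0$ special case of the 'AA' entry rather than a separate kernel, though that is a design choice and not something the proposal requires.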

I don't have time to implement this myself anytime soon, unfortunately!
