
⭐️ Entity embedder interface is here #286

Merged: 13 commits, Jan 20, 2025

Conversation

EssamWisam
Collaborator

Add an interface for EntityEmbedder that can wrap any basic deep learning model in an unsupervised model.

using MLJFlux
using MLJ
using CategoricalArrays

N = 200
X = (;
    Column1 = repeat(Float32[1.0, 2.0, 3.0, 4.0, 5.0], Int(N / 5)),
    Column2 = categorical(repeat(['a', 'b', 'c', 'd', 'e'], Int(N / 5))),
    Column3 = categorical(repeat(["b", "c", "d", "f", "f"], Int(N / 5)), ordered = true),
    Column4 = repeat(Float32[1.0, 2.0, 3.0, 4.0, 5.0], Int(N / 5)),
    Column5 = randn(Float32, N),
    Column6 = categorical(
        repeat(["group1", "group1", "group2", "group2", "group3"], Int(N / 5)),
    ),
)

y = categorical(repeat([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], Int(N / 10)))           # Classification target with N labels

NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux

clf = NeuralNetworkClassifier(embedding_dims=Dict(:Column2 => 2, :Column3 => 2))

emb = EntityEmbedder(clf)

mach = machine(emb, X, y)

fit!(mach)

Xnew = transform(mach, X)
Xnew

PR Checklist

  • Tests are added
  • Documentation, if applicable

@ablaom
Collaborator

ablaom commented Dec 28, 2024

Thanks @EssamWisam for this valuable contribution.

Do we know why buildkite tests are failing?

@EssamWisam
Collaborator Author

Everything runs fine locally. Here are the buildkite logs:
[screenshot of buildkite logs]

Potential change that triggered this is adding the rng argument here:

        clf = models[1](
            builder = MLJFlux.Short(n_hidden = 5, dropout = 0.2),
            optimiser = Optimisers.Adam(0.01),
            batch_size = 8,
            epochs = 100,
            acceleration = CUDALibs(),
            optimiser_changes_trigger_retraining = true,
            embedding_dims = embedding_dims[3],
            rng = 42,
        )

It's needed to check that the output is the same with or without the wrapper.

@ablaom
Collaborator

ablaom commented Dec 28, 2024

Thanks @EssamWisam for the diagnostics. That's very helpful. The good news is that a warning we added previously has correctly flagged the issue.

The buildkite tests include GPU tests, which is why you are not seeing the issue locally, I expect. Reproducibility using RNGs on a GPU is a can of worms, and we need to dodge that. The layers are initialised on the CPU and moved across; it's just the dropout that causes the problems.

Can you please try this:

  • get rid of the dropout (set to 0)
  • in a preliminary step, define stable_rng = StableRNG(123), and in your classifier set rng = stable_rng instead of seeding it.
  • make a second classifier clf2 = deepcopy(clf). Use clf2 instead of clf in your second machine (line 204). (At present, when you fit using clf, its rng hyperparameter is mutated. By making a deep copy, we ensure clf2 has an rng beginning in the same state. Probably, we should change fit to make a deep copy of the rng before use, to ensure the model is never mutated, but that's a discussion for another day.)
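The deep-copy step matters because fitting consumes, and thereby mutates, the model's rng in place. A minimal stdlib-only sketch of the idea (MersenneTwister stands in for the model's RNG; no MLJFlux code involved):

```julia
using Random

rng = MersenneTwister(123)
rng2 = deepcopy(rng)    # snapshot the RNG before it is ever used

# Drawing from rng stands in for fit! consuming (and mutating) the
# model's rng; rng2 still starts from the identical state.
a = rand(rng, 3)
b = rand(rng2, 3)

a == b    # identical draws, so the two fits are comparable
```

Seeding both models with the same integer would not help here, because a single shared RNG object advances its state during the first fit.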

Unrelated comment: It looks like models[1] was (at some time) meant to be models[i]. Is the more restrictive test intentional? Probably testing one of the models is sufficient, so I don't have a problem with this. I just noticed it.

@EssamWisam
Collaborator Author

I did the deep copy and removed the dropout as you suggested. When I use the stable_rng variable, the equality check fails, so I left the seed as 42.

@EssamWisam
Collaborator Author

It looks like models[1] was (at some time) meant to be models[i].

Indeed, I can look into this later.

@ablaom
Collaborator

ablaom commented Dec 30, 2024

Okay, sorry, we need something without a dropout layer at all. How about something like builder = MLP(hidden=(10, 10))?

@EssamWisam
Collaborator Author

Done.

@ablaom ablaom left a comment

Thanks for the great work. Please see what you can do with my suggestions.

@EssamWisam
Collaborator Author

@ablaom I have addressed all the points. Please check whether it's ready now, and thank you.

@ablaom ablaom left a comment

This is great work, thanks.

One final thing I noticed is that NeuralNetworkBinaryClassifier is supported by the wrapper, but this is missing from the docs.

Otherwise good to go

@EssamWisam
Collaborator Author

This is great work, thanks.

One final thing I noticed is that NeuralNetworkBinaryClassifier is supported by the wrapper, but this is missing from the docs.

Otherwise good to go

I used to think that NeuralNetworkBinaryClassifier was redundant (and did not know it's exposed in the docs), because NeuralNetworkClassifier exists and the sigmoid function is a special case of softmax when there are two classes.

I do indeed wonder why both exist, now that I realize the binary classifier is exposed in the MLJFlux docs.

In any case, I added it in the docs here. Feel free to merge.

@ablaom
Collaborator

ablaom commented Jan 20, 2025

I used to think that NeuralNetworkBinaryClassifier was redundant (and did not know it's exposed in the docs), because NeuralNetworkClassifier exists and the sigmoid function is a special case of softmax when there are two classes.

This classifier was added by @tiemvanderdeure. My understanding is that it provides some binary-specific optimisations, which was worth doing as it is such a common use case.

@EssamWisam
Collaborator Author

I used to think that NeuralNetworkBinaryClassifier was redundant (and did not know it's exposed in the docs), because NeuralNetworkClassifier exists and the sigmoid function is a special case of softmax when there are two classes.

This classifier was added by @tiemvanderdeure. My understanding is that it provides some binary-specific optimisations, which was worth doing as it is such a common use case.

If a sigmoid is used, I can imagine the final matrix multiplication does become faster, because the corresponding weight matrix has one fewer column. That said, if it's true that they are mathematically equivalent, in the way I understand, then it may be better to have NeuralNetworkClassifier automatically switch to a sigmoid when there are two classes, and expose only that model.
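The equivalence claimed above is easy to check numerically: for binary logits [0, z], softmax assigns the second class probability e^z / (1 + e^z), which is exactly the sigmoid of z. A quick stdlib-only sketch (illustrative, not MLJFlux code):

```julia
# For binary logits [0, z], the softmax probability of class 2
# collapses to the sigmoid of z.
softmax2(z) = exp(z) / (exp(0.0) + exp(z))
sigmoid(z)  = 1 / (1 + exp(-z))

all(isapprox(softmax2(z), sigmoid(z); atol = 1e-12) for z in -5.0:0.5:5.0)  # true
```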

Just sharing thoughts.

@ablaom ablaom merged commit 111b308 into dev Jan 20, 2025
6 checks passed
@ablaom ablaom deleted the entity-embedder branch January 20, 2025 02:48
@tiemvanderdeure
Contributor

Yes, I added the binary classifier because it wasn't possible to use a sigmoid finalizer or binary cross-entropy loss with the existing classifier.

That said, if it's true that they are mathematically equivalent, in the way I understand, then it may be better to automatically switch the NeuralNetworkClassifier to Sigmoid when there are two classes and only expose it.

I'm not against this at all. I wasn't aware that they are mathematically equivalent. I think I did some testing and got better results in the binary case (or it was just faster; I don't remember exactly).
