@markdouthwaite

Xanthus 0.1.0

This is the first release of Xanthus, a Neural Recommendation Model package implemented in Python on top of TensorFlow and utilising the high-level Keras API. Xanthus came into existence as an exercise in implementing and replicating the results of a relatively current ML paper and to try out some of the new features of TensorFlow 2.0 (and changes to the Keras API over the last couple of years!).

Release notes

Here's what's in the box:

Models

Three neural recommender models implemented with the Keras Model API:

  • GeneralizedMatrixFactorization (GMF) - This model generalizes 'classic' matrix factorization (MF) as a neural model. By using the pointwise negative sampling approach outlined in the literature, this model can outperform some 'classic' MF approaches on some datasets.
  • MultiLayerPerceptron (MLP) - A model with two input embedding blocks feeding into a 'classic' Multi-Layer Perceptron (MLP) block. As demonstrated in the literature, this architecture benefits from the depth of the model over 'shallower' models such as the GMF model in some cases.
  • NeuralMatrixFactorization (NMF) - This model combines the GMF and MLP models into a single model. Theoretically with the benefits of both!
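
To make the link between GMF and 'classic' MF concrete, here's a minimal pure-Python sketch of a GMF-style forward pass for a single user-item pair. This is not the Xanthus implementation (the models in this package are Keras models); `gmf_score` and its parameters are hypothetical names used purely for illustration.

```python
import math


def gmf_score(user_vec, item_vec, weights, bias=0.0):
    """Score one user-item pair with a GMF-style forward pass (illustrative only)."""
    # Element-wise product of the user and item embeddings.
    interaction = [u * v for u, v in zip(user_vec, item_vec)]
    # One learned weight per latent dimension, followed by a sigmoid output.
    logit = sum(w * x for w, x in zip(weights, interaction)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


user, item = [0.5, -1.0, 2.0], [1.0, 0.5, 0.25]

# With unit weights and no bias, the logit is the plain dot product used by
# 'classic' MF. Letting the model learn the weights is what generalizes it.
dot = sum(u * v for u, v in zip(user, item))
mf_like = gmf_score(user, item, weights=[1.0, 1.0, 1.0])
weighted = gmf_score(user, item, weights=[2.0, 0.0, 1.0])
```

In the real models the embeddings and weights are learned during training; here they are hard-coded so the arithmetic is easy to follow.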

Bonus features

  • Metadata support - The models above (+ supporting utils) are built to make it easy to quickly introduce metadata into your recommendation models. This means Xanthus natively supports 'hybrid' recommendations (interaction data + user/item metadata). This is mentioned in He et al.'s work, but not implemented or assessed. Here's an example if you're interested.
  • TensorBoard support - By using the Keras Model API, Xanthus natively supports TensorBoard for model training and monitoring -- plus custom callbacks too. Why not Slack yourself after each training epoch? What could possibly go wrong?

Data Utilities

Getting your data encoded neatly and quickly, generating useful training and evaluation datasets, and getting that data into a format your models can use can be a fiddly and time-consuming process. To alleviate some of these issues and to help you get stuck into tuning your models, Xanthus provides the following utilities:

  • xanthus.datasets.Dataset - A utility class for quickly and (relatively) efficiently building recommendation-friendly datasets, with a bunch of bundled utilities for manipulating these datasets too.
  • xanthus.datasets.DatasetEncoder - Another utility class for encoding and decoding datasets, and to aid in preserving consistency across split datasets (i.e. train/test datasets).
  • xanthus.evaluate.split - An implementation of the 'Recommender Split' found in Azure ML Studio. This gives you the option of sampling hold-out interactions, selecting subsets of interactions, and ensuring consistency between the resulting train and test sets.
  • xanthus.evaluate.create_rankings - An implementation of the common ranking evaluation protocol used for recommendation models, where n 'positive' items (items a user has interacted with, but that were held out of the training set) are appended to m 'negative' items (items a user hasn't interacted with). The model can then be queried to generate a ranking for these items, with the hope that 'positive' items will appear higher in the query results.
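
As a rough illustration of the ranking protocol described in the last bullet, the sketch below builds a candidate list per user from held-out 'positive' items plus randomly sampled 'negative' items. It's a pure-Python sketch under simplifying assumptions, not the code behind `create_rankings`; `build_candidates` and its arguments are hypothetical names.

```python
import random


def build_candidates(held_out, interacted, all_items, n_negatives, seed=0):
    """Return {user: held-out positives + sampled negatives} (illustrative only)."""
    rng = random.Random(seed)
    candidates = {}
    for user, positives in held_out.items():
        seen = interacted.get(user, set()) | set(positives)
        # Negatives are items the user has never interacted with.
        pool = sorted(set(all_items) - seen)
        negatives = rng.sample(pool, min(n_negatives, len(pool)))
        candidates[user] = list(positives) + negatives
    return candidates


all_items = range(10)
interacted = {"alice": {0, 1, 2}, "bob": {3, 4}}  # interactions already seen
held_out = {"alice": [5], "bob": [6]}             # held-out 'positives'

candidates = build_candidates(held_out, interacted, all_items, n_negatives=4)
```

Each candidate list would then be scored by the model and sorted by predicted score, so you can check where the held-out item lands.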

Bonus Features

But wait, there's more! Xanthus implements some common recommendation model metric functions including:

  • xanthus.evaluate.metrics.ndcg - An implementation of the Normalized Discounted Cumulative Gain (NDCG) metric. Yes, that is a reference to Wikipedia.
  • xanthus.evaluate.metrics.hit_ratio - An implementation of the common 'hit ratio' metric used in many recommendation model evaluation activities (see also xanthus.evaluate.metrics.precision_at_k).
  • xanthus.evaluate.metrics.truncated_ndcg - A special-case NDCG implementation with some performance optimizations for cases where the target set consists of a single 'positive' item, as opposed to the more general case addressed above.
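
For anyone who hasn't met these metrics before, here's a small pure-Python sketch of hit ratio and binary-relevance NDCG at k. These are illustrative definitions only; the function signatures are assumptions and won't necessarily match those in `xanthus.evaluate.metrics`.

```python
import math


def hit_ratio(ranked, positives, k):
    """1.0 if any positive item appears in the top k of the ranking, else 0.0."""
    return float(any(item in positives for item in ranked[:k]))


def ndcg(ranked, positives, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in positives
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(positives))))
    return dcg / ideal if ideal > 0 else 0.0


ranked = ["b", "a", "d", "c"]  # a model's ranking, best first
positives = {"a"}              # the single held-out item

hr = hit_ratio(ranked, positives, k=3)
score = ndcg(ranked, positives, k=3)
```

With a single positive item the ideal DCG is 1, so NDCG reduces to 1 / log2(position + 1), where position is the 1-based rank of that item. That's the sort of special case a truncated variant can take advantage of.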

Additionally, to make using these functions easier, you can use:

  • xanthus.evaluate.metrics.score - A utility function for executing a map operation over a set of recommendations, applying a provided metric function, and then returning these scores as a NumPy array. This function provides support for parallel processing too!
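
The idea of mapping a metric over a set of recommendations can be sketched roughly as below. This is a standard-library-only illustration rather than the `score` function itself: `score_all`, `top_k_hit` and the `workers` argument are hypothetical, it returns a plain list rather than a NumPy array, and it uses threads for simplicity.

```python
from concurrent.futures import ThreadPoolExecutor


def top_k_hit(ranked, positives, k=2):
    """Example metric: 1.0 if a positive item is in the top k, else 0.0."""
    return float(any(item in positives for item in ranked[:k]))


def score_all(metric, rankings, targets, workers=1, **kwargs):
    """Apply `metric` to each (ranking, target) pair and collect the scores."""

    def apply(pair):
        ranked, positives = pair
        return metric(ranked, positives, **kwargs)

    pairs = list(zip(rankings, targets))
    if workers > 1:
        # Executor.map preserves input order, so scores line up with users.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(apply, pairs))
    return [apply(pair) for pair in pairs]


rankings = [["a", "b", "c"], ["c", "b", "a"], ["b", "c", "a"]]
targets = [{"a"}, {"a"}, {"c"}]

serial = score_all(top_k_hit, rankings, targets, k=2)
parallel = score_all(top_k_hit, rankings, targets, workers=2, k=2)
```

For CPU-bound metrics over large evaluation sets, a process pool would usually be a better fit than threads.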

Finally, if you're interested in 'coverage' metrics, there's:

  • xanthus.evaluate.metrics.coverage_at_k - Coverage metrics can be handy for understanding how diverse your model's recommendations are -- exploring product catalogues is often a major motivation for recommenders in the first place, so 'pure' accuracy and ranking metrics (as above) might not give you the full picture.
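
One common way to define catalogue coverage at k is the fraction of items that appear in at least one user's top-k recommendations. The sketch below uses that definition; it's an assumption about what 'coverage' means here, and this hypothetical `coverage_at_k` isn't necessarily how the Xanthus function is implemented or called.

```python
def coverage_at_k(recommendations, catalogue, k):
    """Fraction of the catalogue that appears in at least one user's top k."""
    recommended = set()
    for ranked in recommendations:
        recommended.update(ranked[:k])
    catalogue = set(catalogue)
    return len(recommended & catalogue) / len(catalogue)


catalogue = ["a", "b", "c", "d", "e"]
recommendations = [["a", "b", "c"], ["a", "b", "d"], ["b", "a", "e"]]

narrow = coverage_at_k(recommendations, catalogue, k=2)  # only 'a' and 'b' surface
wide = coverage_at_k(recommendations, catalogue, k=3)    # every item surfaces
```

A model can score well on hit ratio while recommending the same handful of popular items to everyone, which is exactly what a low coverage score would flag.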

Notes

Xanthus has been implemented with the aim of helping new users get a decent neural recommender model working as quickly as possible. From this point of view, it could be a good starting place for folk who are new to neural recommendation models.

That said, while neural models sound exciting and might attract attention, you might find that 'classic' recommendation models fit your use case better: 'lightweight' matrix factorization approaches are often simpler, faster and easier to use, so you might do well to look at those first. If you're interested, you should check out:

Disclaimer

The neural architectures implemented in this package are (currently) based directly upon He et al.'s work on Neural Collaborative Filtering. This team has their own repository with the code they used in their paper. It's a good paper, and I encourage you to check it out!

douthwaite-io and others added 15 commits July 16, 2020 08:10
* ADDED - New Keras-compliant GMF, MLP and NMF models. These now use the Keras model API directly.
* ADDED - `BatchedDataset` class for streaming datasets as generators (limited use until `negative_sample` function is converted to a generator).
* ADDED - `reshape_recommended` utility function for reshaping outputs of a `Model.predict` call into nice neat recommendations.
* ADDED - `xanthus.models.utils.InputEmbeddingBlock` as custom layer to build each of the input blocks for the NMF architectures.
* UPDATED - `he_sampling` has been renamed `create_rankings`, and has the `unravel` parameter for passing rankings to `Model.predict` methods.
* UPDATED - The previous model API defined in the original prerelease is available under the `xanthus.models.legacy` subpackage. This will be removed in a later version.
* UPDATED - `Dataset` class now has `user_dim` and `item_dim` utility methods.
* UPDATED - `Dataset` class now has the `batched` method for generating a `BatchedDataset` version of itself.
* UPDATED - Improved type annotations throughout.
* UPDATED - Various error and warning messages clarified slightly.
* UPDATED - The `xanthus.models.utils.batched` function has moved to `xanthus.datasets.utils.batched`.
* UPDATED - Bumped TensorFlow version to `tensorflow==2.3.0`.
* ADDED - `movielens` module to `xanthus.datasets`, moved `download` to this module. You can now download and load MovieLens with `xanthus.datasets.movielens.download` and `xanthus.datasets.movielens.load` respectively.
* UPDATED - Refactored `create_datasets` to be `build` under `xanthus.datasets.build.build`.
* UPDATED - `examples/advanced_training.py` to use the latest versions of the Xanthus models.
* UPDATED - `setup.py` to bump TensorFlow to `tensorflow==2.3.0`.
…-model

Updates neural models to implement the Keras Model API.
* UPDATED - `examples/metadata.py` now uses `fire`, and utilizes the 'new' Keras API.
* UPDATED - Fixed some type annotation errors in `xanthus.models.neural` and `xanthus.models.utils`.
* UPDATED - Bumped version!
…-model

Update examples and type annotations.
@markdouthwaite markdouthwaite added 0.1.0 Related to 0.1.0 release umbrella This issue relates to a collection of possible issues and acts as the parent issue for these. labels Aug 15, 2020
@markdouthwaite markdouthwaite added this to the 0.1.0 milestone Aug 15, 2020
@markdouthwaite markdouthwaite self-assigned this Aug 15, 2020
douthwaite-io and others added 11 commits August 15, 2020 14:52
* UPDATED - `README.md` to include new links and drop superfluous info.
* ADDED - New 'benchmarking' tool (`ModelManager`, `benchmark`, `save`).
* UPDATED - Applied black.
* UPDATED - Refactored various files to point at the correct import paths.
Adds new benchmarking utils, fixes import path issues.
…2.3.1

Bump tensorflow from 2.3.0 to 2.3.1
…2.5.0

Bump tensorflow from 2.3.1 to 2.5.0