Consider an admixture model, where each mutation $Y_{ng}\in \{0, 1\}$ is generated from a "topic" $Z_{ng}\in \{H, 1, \ldots, K\}$, where $H$ is a "healthy" topic, with $P(Y_{ng}=1\mid Z_{ng}=H) \ll 1$.
Then, we can use an LDA-like model where instead of word positions we have enumerated genes, and the vocabulary at each position is $\{0, 1\}$, sampled from a Bernoulli distribution. Hence, the mixing matrix is again $\eta_{kg} = P(Y_g=1\mid Z_g=k)$ and is interpretable (as it can be made sparse using, e.g., a $\mathrm{Beta}(0.1, 0.1)$ prior).
Inference in LDA and the closely related ProdLDA can be implemented, e.g., in NumPyro.
This task should be split into several smaller tasks, for example:
Simulate data sets according to LDA and ProdLDA models.
Experiment with the implementation provided. See whether simulations match the results.
If the results are satisfactory, incorporate LDA and ProdLDA into the codebase.
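For the first task, the generative process above can be simulated directly; a sketch in plain NumPy, where all parameter values (`N`, `G`, `K`, the hyperparameters, and the near-zero healthy rates) are placeholder assumptions:

```python
import numpy as np

def simulate_bernoulli_lda(N=100, G=50, K=3, seed=0):
    """Simulate a binary mutation matrix from the Bernoulli-LDA process."""
    rng = np.random.default_rng(seed)
    # Topic 0 plays the role of the "healthy" topic H; the remaining K rows
    # are mutational topics with sparse Beta(0.1, 0.1) rates.
    eta = rng.beta(0.1, 0.1, size=(K + 1, G))
    eta[0] = rng.uniform(0.0, 0.01, size=G)  # P(Y_g = 1 | Z_g = H) << 1
    # Per-patient topic proportions theta_n ~ Dirichlet.
    theta = rng.dirichlet(np.full(K + 1, 0.5), size=N)
    # For each patient and gene, draw a topic, then a Bernoulli mutation.
    Z = np.array([rng.choice(K + 1, size=G, p=theta[n]) for n in range(N)])
    Y = rng.binomial(1, eta[Z, np.arange(G)])  # Y_{ng} ~ Bern(eta_{Z_{ng}, g})
    return Y, Z, theta, eta
```

Returning `Z`, `theta`, and `eta` alongside `Y` lets the second task compare inferred quantities against the simulation's ground truth.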