Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a simple NaiveBayes model #80

Open
pommedeterresautee opened this issue Jul 31, 2015 · 4 comments
Open

Add a simple NaiveBayes model #80

pommedeterresautee opened this issue Jul 31, 2015 · 4 comments

Comments

@pommedeterresautee
Copy link
Contributor

@wush978 , I understand that this package has a very specific purpose : hashing of a matrix.
It is not a general ML package at all, and should not try to mimick carret or scikit.
However it will be mainly used in ML context.

I was thinking to add a simple NaiveBayes function to the package.

The idea is to provide a very intuitive model, with very few parameter which can be used by many as a base line for their tests.

I have noticed that none of the existing package support dgCMatrix matrix and even if you convert it to spam format which is sometimes supported, the matrix is converted to a dense data.frame before computation!!!! So you almost always finish with a memory allocation error.

I have already written a basic code for sparse Matrix working well on dgCMatrix matrix.

Moreover this basic model may be used in the example instead of more advanced one like XGBoost.

Can you tell me if you want me to make a PR?

Kind regards,
Michael

@pommedeterresautee pommedeterresautee changed the title Add a simple NaiveBaye model Add a simple NaiveBaye models Jul 31, 2015
@pommedeterresautee pommedeterresautee changed the title Add a simple NaiveBaye models Add a simple NaiveBayes models Jul 31, 2015
@pommedeterresautee pommedeterresautee changed the title Add a simple NaiveBayes models Add a simple NaiveBayes model Jul 31, 2015
@wush978
Copy link
Owner

wush978 commented Aug 1, 2015

@pommedeterresautee ,

I still do not want to include the functions which does not related to hashing of a matrix.

However, there is a compromising way to put the NaiveBayes into the package. I did include a ML algorithm FTPRL into this package, but it is not an official function for this package. It is for the vignette.

So, I suggest you to implement the NaiveBayes like what I did for FTPRL and write a vignette for showing how to do analysis with both FeatureHashing and NaiveBayes. How do you think?

Best,
Wush

@pommedeterresautee
Copy link
Contributor Author

My concern is that Vignette content is not well referenced by most R documentation search motor.
And results from these search motors are well referenced on Google.
So in some way few people would be aware.

Another solution would be to put the two algorithms in a kind of FeatureHashing "accessory" package like many packages do for datasets.

IMO Naive Bayes is too simple to deserve its own package but with FTPRL it would make sense...

However it means maintaining two packages, may be not that good idea.
Let me know what you think.

Btw I have no strong feeling against putting NB in a Vignette text too :-)

Kind regards,
Michael

@wush978
Copy link
Owner

wush978 commented Aug 3, 2015

I agree your point that the naivebayse is too simple to deserve its own package. However, writing them in vignette makes sense to me. For advance users, they can write their own implementation. For beginners, they need vignette to understand why and when they should use the naivebayse with FeatureHashing.

I think we could put the implementation under inst/ first. Once we collect many of them, we could write a package for them.

Wush

@pommedeterresautee
Copy link
Contributor Author

It sounds ok to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants