Google Summer of Code (GSoC) 2016
This is the page for coordination of the GSoC for scikit-learn.
Scikit-learn is a machine learning module in Python. See http://scikit-learn.org for more details.
Scikit-learn is taking part in GSoC through the Python Software Foundation: http://wiki.python.org/moin/SummerOfCode
Difficulty: Scikit-learn is a technical project. Contributing via GSoC requires considerable expertise in Python coding as well as in numerical and machine learning algorithms.
Important: Read: Expectations for prospective students
Application template: https://wiki.python.org/moin/SummerOfCode/ApplicationTemplate2015 Please follow this template.
Also important: A letter from Gaël to former applicants. His suggestions are just as relevant this year.
Hi folks,
The deadline for applications is nearing. I'd like to stress that scikit-learn will only be accepting high-quality applications: it is a challenging, though rewarding, project to work with. To maximize the quality of your application, here are a few pieces of advice:
1. First, discuss a pre-proposal on the mailing list. Make sure that both the scikit-learn team and you are enthusiastic about the idea. Try to have one or two possible mentors who hold a dialog with you.
2. Satisfy the PSF requirements (http://wiki.python.org/moin/SummerOfCode/Expectations). Briefly:
   - Demonstrate to your prospective mentor(s) that you are able to complete the project you've proposed.
   - Blog for your GSoC project.
   - Contribute at least one patch to the project.
   I'd add that the patch should be somewhat substantial, not just fixing typos.
   To contribute a patch, please have a look at the [contribution guide](http://scikit-learn.org/dev/developers/index.html#contributing-code) and the Easy issues in the tracker.
3. In parallel with 2, start an online document (a Google Doc, for instance) to elaborate your final proposal; if you manage to convince mentors, you can get feedback on it.
As a final note, I want to stress that GSoC projects are ambitious: we are talking about a few months of full-time work. Thus the proposed ideas are challenging, and the students are supposed to draw up a battle plan, with difficult variants and less difficult variants. A GSoC is a major set of contributions, not a single pull request.
Good luck, I am looking forward to seeing the proposals. You'll see, the scikit is a big, friendly, and enthusiastic community,
Gaël
Disclaimer: This list of topics is currently being updated from last year's, and some information (like the names of possible mentors) is not definitive. Please e-mail the list with any questions.
Possible mentors: Manoj Kumar, Raghav RV, Joel Nothman
Possible candidate:
Application Link:
We have removed the Cython-generated files from the repo since https://github.com/scikit-learn/scikit-learn/pull/5492 and regenerate them for every build. This provides a good opportunity to avoid blowing up memory usage by refactoring the functions in the ".pyx" files to use Cython fused types. This will allow `float32` and `int32` dtypes where data is currently being explicitly cast into `float64` and `int64`. This project obviously affects the codebase extensively, so the student must provide a detailed proposal stating which specific parts to touch (SGD, coordinate descent, etc.) and the proposed line of attack for the summer.
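As a rough illustration of the memory cost described above, the following NumPy sketch compares the size of a `float32` array with the `float64` copy produced by an explicit cast. It is not scikit-learn code: `upcast_overhead` is a hypothetical helper introduced only for this example, and the array shape is arbitrary.

```python
import numpy as np


def upcast_overhead(X):
    """Return (bytes of X, bytes of X after an explicit float64 cast).

    Hypothetical helper for illustration only; it mimics a routine that
    forces its input to float64 before doing any work.
    """
    # np.asarray copies the data whenever X is not already float64.
    X64 = np.asarray(X, dtype=np.float64)
    return X.nbytes, X64.nbytes


# A float32 input: 1000 * 50 values at 4 bytes each.
X32 = np.ones((1000, 50), dtype=np.float32)
orig_bytes, cast_bytes = upcast_overhead(X32)
print(orig_bytes, cast_bytes)  # 200000 400000
```

The cast allocates a second array twice the size of the input, so the caller's `float32` data and the `float64` copy are both held in memory at once. Routines that accept either precision directly would not need this extra copy.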
Related issues:
https://github.com/scikit-learn/scikit-learn/issues/5776
https://github.com/scikit-learn/scikit-learn/issues/5464
A good starting point would be to review the existing pull request for KMeans (https://github.com/scikit-learn/scikit-learn/pull/6430) and to implement fused types for sparse functions (https://github.com/scikit-learn/scikit-learn/pull/5932).