Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Review]: Introduction to deep learning #25

Open
5 tasks done
svenvanderburg opened this issue Sep 1, 2023 · 79 comments
Open
5 tasks done

[Review]: Introduction to deep learning #25

svenvanderburg opened this issue Sep 1, 2023 · 79 comments
Assignees
Labels
6/approved Lesson has been approved and accepted jose lesson to be submitted to JOSE on approval

Comments

@svenvanderburg
Copy link

Lesson Title

Introduction to deep learning

Lesson Repository URL

https://github.com/carpentries-incubator/deep-learning-intro

Lesson Website URL

https://carpentries-incubator.github.io/deep-learning-intro/

Lesson Description

This is a hands-on introduction to the first steps in Deep Learning, intended for researchers who are familiar with (non-deep) Machine Learning.

The use of Deep Learning has seen a sharp increase of popularity and applicability over the last decade. While Deep Learning can be a useful tool for researchers from a wide range of domains, taking the first steps in the world of Deep Learning can be somewhat intimidating. This introduction aims to cover the basics of Deep Learning in a practical and hands-on manner, so that upon completion, you will be able to train your first neural network and understand what next steps to take to improve the model.

We start with explaining the basic concepts of neural networks, and then go through the different steps of a Deep Learning workflow. Learners will learn how to prepare data for deep learning, how to implement a basic Deep Learning model in Python with Keras, how to monitor and troubleshoot the training process and how to implement different layer types such as convolutional layers.

Author Usernames

@dsmits @psteinb @cpranav93 @colinsauze @CunliangGeng

Zenodo DOI

10.5281/zenodo.8308392

Differences From Existing Lessons

No response

Confirmation of Lesson Requirements

JOSE Submission Requirements

Potential Reviewers

No response

@svenvanderburg
Copy link
Author

We're still running a final round of comments for the paper (see https://github.com/carpentries-incubator/deep-learning-intro/issues/364), the plan is to submit the paper on the 15th of September through: https://openjournals.readthedocs.io/en/jose/submitting.html#submitting-your-paper . I'm not sure how that relates to this review, are the 2 independent or should we wait for this review process to finish before submitting to JOSE?

@tobyhodges
Copy link
Member

Great to see this submission, @svenvanderburg. As a listed contributor, I have a conflict of interest acting as editor for this one. I am going to find another community member who can fulfill the role for this review, and will post back here when it's ready.

To answer your question about review order: we have been asking lesson developers to submit the lesson for review here, before the JOSE review. See #11 and the related review in JOSE for an example.

@svenvanderburg
Copy link
Author

@tobyhodges any update on finding an editor?

@svenvanderburg
Copy link
Author

@tobyhodges ? 😇

@tobyhodges
Copy link
Member

I have reached out to a potential guest editor for this review and am waiting for confirmation. Hoping to be able to follow up very soon!

@svenvanderburg
Copy link
Author

I have reached out to a potential guest editor for this review and am waiting for confirmation. Hoping to be able to follow up very soon!

Perfect, thank you for the update 🙏

@tobyhodges
Copy link
Member

Good news: @brownsarahm has kindly agreed to act as Guest Editor for this review. I am extremely grateful to her for being willing to take this on.

@brownsarahm
Copy link
Collaborator

brownsarahm commented Oct 31, 2023

I'll work on this in small bits, but this way it's all in one place and the authors could work on the (very minor) accessibility issues and one small note on setup that I have checked so far.

Editor Checklist - Intro to Deep Learning

Accessibility

  • All figures are also described in image alternative text or elsewhere in the lesson body.

  • The lesson uses appropriate heading levels:

    • h2 is used for sections within a page.
    • no “jumps” are present between heading levels e.g. h2->h4.
    • no page contains more than one h1 element i.e. none of the source files include first-level headings.
  • The contrast ratio of text in all figures is at least 4.5:1.

  • the check boxes in the prereqs render poorly in the workbench + are an accessibility flag bc they're unlabeled "form elements" these should be removed

  • listed browser support is not matched to jupyter browser support notably, Windows' current default browser is a chromium browser and therefore supported (and on general web compatibility, better than safari by a nontrivial margin) this should be updated to include more windows users without having to install

  • most table headers are empty, but this might not be a problem

  • in ep 2 there is a link to the sklearn docs with the text "here" this is rated "suspcicous" in accessibility terms, recommend "explained in the scikit-learn docs" instead of explained "here"

  • minor syntax issue (: instead of =) causing "missing alt text" on an image in :

Content

  • The lesson teaches data and/or computational skills that could promote efficient, open, and reproducible research.
  • All exercises have solutions.
  • Opportunities for formative assessments are included and distributed throughout the lesson sufficiently to track learner progress. (We aim for at least one formative assessment every 10-15 minutes.)
  • Any data sets used in the lesson are published under a permissive open license i.e. CC0 or equivalent.

Datasets and licenses

  • penguin in CC0
  • weather is CC-BY
  • CIFAR10 license is unclear/not easy to find/ may not exist, but it is public data

other content notes:

  • deep learning workflow exercise in ep 1 solution is not a solution; there can be many answers here but maybe some details of what to check for in an answer would help
  • "just" used dropna in clean missing values is not Carpentries language and can be read as dismissive. "ruin the training data" is also not very precise about the impact and does not acknowledge that there are different ways to handle missing data for reasons that can impact analyses. In line with teachign best practices, a slightly broader coverage/more intentional choice of using dropna, which is fine to do, would improve the lesson here
  • ex reflecting on our resutls is oddly formatted and has an exercise in the solution
  • solution to varying the dropout rate is a term that was introduced in a different form in a parenthetical in the previous episode, but not defined as it is used.

Design

  • Learning objectives are defined for the lesson and every episode.
  • The target audience of the lesson is identified specifically and in sufficient detail.

to fix:

  • question, objectives, and keypoints all display with "" around them

Repository

The lesson repository includes:

  • a CC-BY or CC0 license.

  • a CODE_OF_CONDUCT.md file that links to The Carpentries Code of Conduct.

  • a list of lesson maintainers.

  • tabs to display Issues and Pull Requests for the project.

  • replace this with any further comments relating to the lesson repository.

Structure

  • Estimated times are included in every episode for teaching and completing exercises.
  • Episodes lengths are appropriate for the management of cognitive load throughout the lesson.

comment:

  • all episodes are 55min + episode 3 is over 3 hours long?

Supporting information

The lesson includes:

  • a list of required prior skills and/or knowledge.
  • setup and installation instructions.
  • a glossary of key terms or links out to definitions in an external glossary e.g. Glosario.

other setup note:

  • it says "open a terminal" but that is insufficient information for a Windows user (they have multiple terminals and to launch jupyter only the "anaconda prompt" by default will work; CMD or powershell typically will not)

General

  • replace this with any other comments that do not fit into any of the previous sections.

@svenvanderburg
Copy link
Author

@brownsarahm any update on the progress?

@svenvanderburg
Copy link
Author

@brownsarahm any update? Can you give us an indication when we can expect this to be done?

@brownsarahm
Copy link
Collaborator

Hi! Sorry, a bunch of unexpected things happened last fall, and then when you sent the first check-in I was off of work for the holidays. And the second came while I was in a deadline crunch,

This is now back in my active queue. I should finish the pre-reviewer stuff within a week and I'm looking for reviewers starting now.

@svenvanderburg
Copy link
Author

Hi @brownsarahm. Cool, thanks for the update 🤗

@brownsarahm
Copy link
Collaborator

editorial checks are done and @tobyhodges and I are working on finding reviewers next.

The comment above has a few things for you all to look at now, but thanks for resolving all of the previously identified ones already!

@tobyhodges
Copy link
Member

Thanks @brownsarahm. I have suggested a few reviewers to Sarah but if any of my fellow authors can also suggest anyone they think would be suitable, I am sure it would be helpful. (Please do not tag anyone here by their GitHub handle.)

@brownsarahm brownsarahm added the 1/editor-checks Editor is conducting initial checks on the lesson before seeking reviewers label Jan 31, 2024
@tobyhodges tobyhodges assigned brownsarahm and unassigned tobyhodges Feb 15, 2024
@brownsarahm
Copy link
Collaborator

@svenvanderburg Do you have any updates in response to my final editorial check?

In particular, do you have responses to the concerns about:

  • episode lengths
  • dataset licenses

and ideally before we assign reviewers, it would be nice to resolve, but these are minor:

  • " rendering on episode metadata
  • typos breaking alt-text

@svenvanderburg
Copy link
Author

svenvanderburg commented Feb 18, 2024

Hey @brownsarahm.

Sorry, I totally missed that it was final! Thanks for extra pinging me @brownsarahm :)

Some answers:

Episode lengths

Indeed episode 3 takes a bit longer than the other ones, but not twice longer. In fact, the timing for the other episode is too optimistic: Here is a PR with more realistic timing. In comparison to other lessons the episodes are a bit longer, this is because we want to finish the full deep learning workflow in each episode. When teaching, this is not really a problem though, the workshop is actually pretty balanced in terms of cognitive load because in every episode we go through this deep learning workflow once, and conceptually it makes sense to have the cuts between episodes at these points. What do you think? We could maybe write this explanation in the introductory instructor notes?

Dataset licenses

If I understand correctly, the only problem is with the CIFAR-10 dataset. It is so widely used, but I never realized it doesn't actually have a license judging from the official website.

I found a paper that extracts the statement about citation as their license:
Running example. In this paper, we download the CIFAR-10 dataset from its official website. Also on the CIFAR-10 website, we find the following request from the dataset creators: “Please cite (Krizhevsky et al., 2009) if you intend to use this dataset," alongside a link to the paper. We extract this as the dataset’s license.

It is actually a crawled dataset, and in the paper I don't read anything about the crawled images being under open-source license....

Anyway: do you think it is a problem that we use this dataset? It is such a central dataset in the field and so widely used. We would have to change the entire episode if we use a different dataset. We can write a comment about in the instructor notes.

Small issues

I resolved the two small issues you referred to, will be reviewed soon by one of my colleagues. We will soon pick up any other remaining issues.

Would be good to enter the next phase of reviewing. Let us know what we have to do to help this progress.

@brownsarahm
Copy link
Collaborator

Timing

Thanks for the explanation. I think at one level your strategy for putting breaks in the content makes sense. I am not certain if your claim about cognitive load is true, but also not certain that it is false.

However 3.5 hours without a break is a long time and most instructors will not want to give a break in the middle of an episode.

Maybe an instructor note reminding about breaks? (as context, I'm a maintainer on instructor training and we get a lot of complaints about not enough breaks there and we have them ~every 90 minutes).

Dataset License

In my understanding of carpentries policy a permissive license is required. Since the dataset was crawled, it likely has zero consent to be using the images from the original owners of the images. I think it probably does not have the same risks as imagenet, but should be checked. On the other hand, this is clearly, to me, an intended use by the people who curated the images into a dataset, despite them not putting a license on it and them possibly not having appropriate rights to the images either.

Whether it is okay or not is going to be up to Carpentries policy about the situation where there is no license. @tobyhodges can you help navigate that or let us know who else in the carpentries should be looped in?

@tobyhodges
Copy link
Member

Thanks for tagging me @brownsarahm. I need to do a bit more reading and thinking about this, and will come back soon with a full response.

@tobyhodges
Copy link
Member

Thanks for your patience while I took some time to read through the relevant pages and documents, and to reflect on the most appropriate course of action. I am sorry to say that I think we should replace the dataset in the lesson.

The lack of a license file in the dataset is somewhat problematic, even though the authors clearly intend for the dataset to be re-used and usage in the lesson is within the terms stated on their website. But my biggest concerns are with the unethical way in which the data was "collected." Images were scraped and modified for the dataset without any attempt at seeking permission from the copyright owners or giving them attribution, which feels unethical to me regardless of any arguments over its legality. (I am not a lawyer but it seems like the use may fall under "fair dealing" in Canadian copyright law, where the researchers who published the dataset are based.)

In Collaborative Lesson Development Training, alongside considerations of licensing, size, and complexity, we ask lesson developers to consider the ethics of the example datasets they include in their lessons. I would like to apply the same standard to lesson reviews in The Carpentries Lab.

I acknowledge that replacing the dataset will require significant new work on the part of the authors, and perhaps I should have noticed sooner and avoided some of this inconvenience. @svenvanderburg for my part, I would like to devote some time in the coming weeks to try to make the necessary changes (as I am already one of the authors). I hope I will be able to propose some alternative datasets soon, and of course it would help to have input from others with more DL experience than I have. However, please also be aware that you can withdraw the lesson from review here if you prefer.

Finally, many thanks to @brownsarahm for catching this and looping me into the discussion.

@svenvanderburg
Copy link
Author

svenvanderburg commented Feb 28, 2024

Timing

Ha, no 3,5 hour teaching without breaks is absurd! At the Netherlands eScience Center we usually teach in a schedule like [for this recently taught workshop](this https://esciencecenter-digital-skills.github.io/2024-02-05-ds-dl-intro/#schedule). Never more than 90 minutes of teaching! And I think beta pilots copied that schedule in rough lines. It doesn't matter that it is in the middle of an episode.

In addition, we swap instructors halfway the episodes which makes teaching load lighter as well.

See https://github.com/carpentries-incubator/deep-learning-intro/issues/446 for addressing this, do you agree @brownsarahm ? And thanks for bringing this up, this is a great outcome of the review. Since we always use the same schedule no matter what lesson material we use we had a blind spot here.

License

Thanks @tobyhodges for digging into the CIFAR10 license. I agree we should change it, indeed it goes against everything the Carpentries stands for...

So, the remaining issues to fix before the review are:

(and some more small comments from Sarah that we will definitely pick up the coming period but are not essential to do before the review)

Can you confirm this @brownsarahm ?

@brownsarahm
Copy link
Collaborator

Yes, this is correct, these two issues would get it to a point where it is ready for review.

@tobyhodges
Copy link
Member

Indeed. This is an issue with varnish, the component of the Workbench that applies styling etc to the lesson website. Specifically, it is something to do with how the MathJax JavaScript library is rendering the equation in Chrome (broken on when viewed in that browser on my system too). I am investigating further and will open an issue and/or PR on https://github.com/carpentries/varnish in due course. There is nothing to be done on the DL lesson repository to resolve this in the meantime.

@brownsarahm
Copy link
Collaborator

that it worked in firefox made me think it's not a workbench issue, so i did a quick search and found a solution that attributes it to macos but is a quick change to make:

  1. Right-click on some of the misrendered MathJax.
  2. Click on "Math Settings".
  3. Click on "Math Renderer".
  4. Click on "Common HTML".

Maybe it is worth putting a tip box somewhere in the setup or instructor notes with this solution since readers of the lesson can fix it for themselves?

@carschno
Copy link

Maybe it is worth putting a tip box somewhere in the setup or instructor notes with this solution since readers of the lesson can fix it for themselves?

I have added a PR: https://github.com/carpentries-incubator/deep-learning-intro/pull/539

@tobyhodges
Copy link
Member

Great catch @brownsarahm, thanks 🙌

@carschno
Copy link

With https://github.com/carpentries-incubator/deep-learning-intro/pull/530, the final pending PR to address the review comments has now been merged.
Are we ready to finalize the review then @brownsarahm?

@brownsarahm brownsarahm added 4/review(s)-in-awaiting-changes One or more reviewers has submitted their review; awaiting response and/or changes from author(s) 5/awaiting-reviewer(s)-response Author(s) has addressed reviewer comments; awaiting reviewer response and removed 3/reviewer(s)-assigned Reviewers have been assigned; review in progress 4/review(s)-in-awaiting-changes One or more reviewers has submitted their review; awaiting response and/or changes from author(s) labels Jan 17, 2025
@brownsarahm
Copy link
Collaborator

I had forgotten to update this to step 4 when the second review came in but we have now jumped to step 5! 🎉

@likeajumprope and @mike-ivs please take some time to look through the authors' responses above and the updated version of the lesson. When you are ready, post back here to ask follow-up questions, let the authors know about any additional changes you would like them to make, or recommend that we accept the lesson to The Carpentries Lab. You may find it helpful to refer to the Reviewer checklist again if you would like to use that as a guide.

We are so close team!

@mike-ivs
Copy link

Sorry for the slight delay, i've just moved country! I've been following along with the updates/improvements that the team have been doing to the repo and i'm happy to say I'd recommend accepting the lesson to The Carpentries Lab!

@svenvanderburg
Copy link
Author

Thank you @mike-ivs and wish you all the best settling in your new home 🙏

@likeajumprope
Copy link

Hi all, sorry I did not get the ping - I have had a look and it looks great! I am also very happy to accept the lesson to the Carpentries Lab!

@brownsarahm brownsarahm added 6/approved Lesson has been approved and accepted and removed 5/awaiting-reviewer(s)-response Author(s) has addressed reviewer comments; awaiting reviewer response labels Feb 6, 2025
@brownsarahm
Copy link
Collaborator

brownsarahm commented Feb 6, 2025

Congrats! 🎉 @svenvanderburg @dsmits @psteinb @cpranav93 @colinsauze @CunliangGeng 🎉

and thanks so much to @mike-ivs and @likeajumprope for your detailed reviews

and to @tobyhodges for contributing to the lesson and helping me so much with navigating serving as an editor.

I am so happy to see this ready for acceptance to the Carpentries lab!

the post acceptance steps:

  1. Can @tobyhodges help with the JOSE connection?
  2. it seems like life_cylce is already updated? source
  3. labels updated! 🎉
  4. default branch is already main 🎉
  5. cloud hosting seems like a Toby (core team) task?
  6. Zenodo is also a Toby (core team)

if I am supposed to be able to help with 1,5, or 6 i just need more instructions

@tobyhodges
Copy link
Member

This is fantastic news, thanks so much @brownsarahm, @mike-ivs and @likeajumprope 🙌

I can take over from here, as we will need to transfer the lesson repository into the @carpentries-lab organisation before we make the JOSE submission. I am traveling this week but have scheduled the tasks for Monday.

@svenvanderburg
Copy link
Author

Awesome! Thank you so much @likeajumprope 🙏

The lesson improved a lot in the review process. We taught the updated lesson 2 weeks ago and it worked really well.

Looking forward to the next steps!

@tobyhodges I think it's good to do some edits to the JOSE paper, see https://github.com/carpentries-incubator/deep-learning-intro/issues/505 Not sure if we should wait with submitting before that, we will do the update in the next 2 weeks.

@psteinb
Copy link

psteinb commented Feb 10, 2025

This is wonderful news. I was lurking this issue since it was conceived. I congratulate all of those who were involved. Thanks so much to the current development team of the deep-learning-intro and to all that contributed to this wonderful review!

🙏

@svenvanderburg
Copy link
Author

Here is the final update to the paper: carpentries-lab/deep-learning-intro#554. This is about to be checked by the authors. @brownsarahm, @mike-ivs, and @likeajumprope you can find yourselves in the acknowledgements :) Let me know if there are other ways we can acknowledge the hard work that went into this review process.

@tobyhodges
Copy link
Member

tobyhodges commented Feb 11, 2025

@svenvanderburg @carschno @dsmits @psteinb @cpranav93 @colinsauze @CunliangGeng the lesson repository has been moved into the @carpentries-lab organisation, and I have created a placeholder repository in the old location to redirect visitors to the new lesson website URL.

You will need to update the remote URL on any local clones of the lesson that you are maintaining. Assuming you call this remote origin and are conecting via SSH, you should run this in the repository:

git remote set-url origin [email protected]:carpentries-lab/deep-learning-intro.git

@likeajumprope
Copy link

Here is the final update to the paper: carpentries-lab/deep-learning-intro#554. This is about to be checked by the authors. @brownsarahm, @mike-ivs, and @likeajumprope you can find yourselves in the acknowledgements :) Let me know if there are other ways we can acknowledge the hard work that went into this review process.

Perfect, thank you <3

@tobyhodges tobyhodges added the jose lesson to be submitted to JOSE on approval label Feb 11, 2025
@colinsauze
Copy link

@svenvanderburg @carschno @dsmits @psteinb @cpranav93 @colinsauze @CunliangGeng the lesson repository has been moved into the @carpentries-lab organisation, and I have created a placeholder repository in the old location to redirect visitors to the new lesson website URL.

It's great to see this lesson graduate from the incubator into the lab.

Could we make sure that there is a Github pages site for the old repository (https://carpentries-incubator.github.io/deep-learning-intro/) containing a link to the new lesson. Learners from past workshops might have that bookmarked instead of the Github source page and might just think it has disappeared if they are trying to find the notes.

@bpmweel
Copy link

bpmweel commented Feb 12, 2025

Not sure if its inappropriate for me to comment on a repository I haven't been involved in for over 3 years, but...

I just want to congratulate everyone involved in successfully navigating and developing this lesson to its inclusion in the carpentries-lab!

Special shout-out to @svenvanderburg, you have shown some real tenacity the past few years to get the lesson to this point. Congratulations!

@tobyhodges
Copy link
Member

Could we make sure that there is a Github pages site for the old repository (https://carpentries-incubator.github.io/deep-learning-intro/) containing a link to the new lesson.

Thanks for raising this @colinsauze. That was my intention, and your comment helped me notice that I had forgotten to turn on GitHub Pages! I have done that now and the redirect is working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6/approved Lesson has been approved and accepted jose lesson to be submitted to JOSE on approval
Projects
None yet
Development

No branches or pull requests

9 participants