Paper: NLP-in-the-real-world: Tool for building NLP solutions #921
Conversation
Hi! I'm excited to be reviewing in this new format, but I admit that I am a little lost... Where can I find the code that accompanies the paper? When I click the link on this page to open Jupyter notebooks I get the error "error - Error: repo is required for github provider - Failed to connect to binderhub at https://xhrtcvh6l53u.curvenote.dev/services/binder/". Because the paper relies on the content of the notebooks, it is important that I be able to read them together to effectively assess the contribution. Thanks! Jane
Hi @janeadams
Hi @janeadams and @jsingh811 - just a quick comment on the technical side regarding the "open Jupyter notebooks" link. This was briefly enabled last week, but we have since disabled it. It is something we are excited to pilot, and hopefully it will be available more generally for the 2025 proceedings.
After reviewing the notebooks, I have some concerns about this submission. My primary concern is that the submission here is only a small supplement to the primary contribution, the code, which is itself a supplement to an existing published work. I am not sure that this submission meets the basic criteria for SciPy proceedings, as it is not a novel contribution. Evaluating only the paper that has been submitted here:
Additionally, I have several concerns about the code itself. While the code is not included in this submission, and I therefore can't comment on it directly in this repo, I noticed several major problems:
I am not sure that these changes are within the scope of the review cycle, and I would recommend forgoing inclusion of this submission in this year's SciPy proceedings. I think the notebooks might make a helpful foundation for a workshop (which is far more ephemeral and therefore not beholden to the longevity concerns that archival proceedings are), and I would encourage the authors to consider building a wiki that references package documentation, so that the maintainers of the referenced packages are always the go-to source for the latest implementation. I would have liked to see more content about why a user should choose one method over another, as I believe this guidance would have been a more valuable and novel contribution.
Hi, thank you for your comments. I observe that most of the feedback is heavily centered on the notebooks, and I want to highlight that the notebooks are not the main contribution of this paper. I feel the tool selection (toolkit) part of the submission might have been missed. The repo contains 1) notebooks and 2) a toolkit component. The notebooks are an augmentation of a book; the toolkit is not, though it is certainly inspired by those previous contributions. The toolkit also contains a README with instructions and a requirements.txt. The notebooks are not meant to be the star of the submission; I mention them only because this submission describes the repo as a whole. The notebooks are designed to serve as standalone guides for users, independent of other files, which is typically how people in this field start coding and experimenting. Please find my responses to some of your other comments below.
This information is shared in the paper. Kindly see the Tool Selection section. Example:
If you recommend adding more such details, I can do that.
The toolkit component is not reliant on the book. Any reader without access to any other materials will be able to follow.
There is a mix of opinions on word clouds. Word clouds can be engaging and easy to digest; however, it is best not to expect in-depth analysis or comparisons from them. Word clouds are still quite heavily used and in some cases even preferred. That said, nothing is a universal solution, and the user has to apply it with caution and understanding. There are plans to add more visualization tools in a future version of the toolkit. I have not addressed some of the other comments since they are focused on the notebooks rather than the tool selection toolkit. Here is what I plan to do to address your comments:
My intent is to showcase the tool, with a focus on the toolkit. I gave the Tool Selection section the most content in the paper as well, in an attempt to make it the bulk of the submission. The toolkit, in my opinion, is the start of a very useful decision-making assistant for developers in this space, and represents novelty. When getting started on a new NLP task, data scientists need to try various tools, experiment with solutions, and learn from manual data observations, all to gauge a suitable approach for solving the problem. The toolkit brings together experience and knowledge about the data underlying different models to aid in this process. The alternative is to spend time experimenting and reading a lot of material to shortlist tools depending on the data and problem. According to this report (https://businessoverbroadway.com/2019/02/19/how-do-data-professionals-spend-their-time-on-data-science-projects), data scientists spend 20% of their time building and selecting models, which leaves a lot of room for reduction and optimization. This share is even larger for individuals working on a problem they haven't tackled before. I hope the above clarification helps address your comments. Either way, I appreciate your review and thank you for taking the time.
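To make "decision-making assistant" concrete, here is a rough sketch of the kind of interface I have in mind; the function recommend_tools, its parameters, and its rules are hypothetical illustrations, not the toolkit's actual API:

```python
# Hypothetical sketch only: recommend_tools and its rules are
# illustrative and are NOT the actual API of the toolkit.
def recommend_tools(task, data_source, domain_specific=False):
    """Suggest candidate NLP tools given the task and data traits."""
    if task == "sentiment":
        # Lexicons tuned on social media (e.g., VADER) handle slang and
        # emoticons; general-purpose models suit longer review text.
        return ["VADER"] if data_source == "social_media" else ["TextBlob"]
    if task == "ner":
        # Pretrained pipelines such as spaCy's work well on general text;
        # specialized corpora usually call for a fine-tuned model.
        return ["spaCy"] if not domain_specific else ["fine-tuned model"]
    return []

print(recommend_tools("sentiment", data_source="social_media"))  # ['VADER']
```

The point is that the user answers a few questions about their task and data, and the assistant narrows the candidate tools accordingly.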
My apologies for not fully understanding what is and isn't part of the submission... This method of reviewing is new to me. I think a diagram would be a great addition to the paper. I see that the contributions are buried in the toolkit, and for the paper to be a contribution on its own, I think the best approach would be to pull out this information into text and tables or figures. For example, these two excerpts from the toolkit give examples of conditions under which one model would be better suited than another:
and
I see that these choices are briefly alluded to in the paper, but I think for the paper to stand on its own, those choices should be pulled out and explained. Why is VADER better for social media sources than for review comments? Why is spaCy better in cases where the corpus is not domain specific? As a reader, I would like to be able to come with my own data set and think about these criteria in the context of my use case. Importantly, a reader should come away feeling more confident that there is a statistical reason for choosing one model or method over another. This way, readers can apply their own reasoning too -- perhaps they are more concerned about false positives than false negatives, or care less about style and more about typos... By allowing a user to read the affordances and trade-offs of various models side by side in a table, or even just by reading through the paper, they can make informed decisions without having to use function calls like a form. In its current state, the paper relies too heavily on code outside of the submission. Rather than referencing that code (about which I still have concerns that I think are out of scope here), the paper would be most beneficial as an archival submission if the focus were more on the heuristics and reasoning about which criteria make a model or method more or less suited to a task, rather than reading, as it currently does, as supplemental to code outside this repo.
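As one concrete illustration of the kind of side-by-side reasoning I mean (this sketch is mine, not an excerpt from the toolkit): VADER's lexicon was tuned on social media text, so it scores slang, capitalization, punctuation emphasis, and emoticons that models trained on formal text tend to miss:

```python
# pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# VADER boosts intensity for all-caps words and exclamation marks,
# and it understands emoticons -- signals common in social media.
print(analyzer.polarity_scores("OMG this is AWESOME!!! :D"))
print(analyzer.polarity_scores("The product arrived on time and works as described."))
```

Explaining criteria like these in the paper itself would let readers judge whether they apply to their own data.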
The code repository itself is a supplement to an already published book (https://www.taylorfrancis.com/books/mono/10.1201/9781003264774/natural-language-processing-real-world-jyotika-singh), and my main concern is that the submission made here is really a minor addition to the main effort. Because it is not a novel contribution, I'm not sure if this proposal satisfies the requirements for SciPy proceedings.
Thank you for your revised comments @janeadams. I have now made revisions to address them. I have cleaned up the paper quite a bit, and I have removed content pointing to any portions of the tool other than the toolkit, to avoid confusion and to make the content more pointed and useful standalone. Additionally, I have added a lot more detail on the tool selection portion itself based on your comment. Thanks. @aparoha thanks for your comment. I am not certain at this point whether there are two reviewers on this. I also received your comment while I was in the process of making updates based on the previous review. Your comment overlaps with @janeadams's early comments, which I have now addressed. Thanks.
Hi @jsingh811, my name is Hongsup Shin, and I am a Proceedings co-chair and your paper's editor. Thank you for submitting the manuscript and making revisions based on the reviewers' comments. First of all, to be clear, we assign two reviewers per SciPy proceedings PR, and your reviewers are @janeadams and @aparoha, as noted in the first comment of the PR. While I appreciate your revisions, I am afraid that the main criticisms from both reviewers are still not sufficiently addressed. I very much agree with the comments from both reviewers @janeadams and @aparoha. The repo is still supporting material for the book authored by you, and it lacks the elements needed to be considered a decision-making tool. I have raised my and the reviewers' concerns with the other chairs of the Proceedings and Program committees, and members of both committees already favor rejection. At this point, we think the paper requires rewriting rather than mere revision in order to proceed. You have until Aug 7 to address the reviews. Sincerely,
Hi @hongsupshin It looks like the major point of contention is the toolkit repo's association with the book and its associated software. To address it, I can move the toolkit into its own repo and create a Python library installable via pip, thereby isolating it completely from the book, and I can ensure the paper reflects this as well. Either way, I would like to thank you and the reviewers for taking the time to provide feedback.
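To make the plan above concrete, the packaging step could be as small as a minimal setup.py like the following; the package name and dependency list here are illustrative placeholders, not the final metadata:

```python
# Hypothetical minimal setup.py for publishing the toolkit as a
# standalone pip-installable package. The name and dependencies
# below are placeholders, not the actual package metadata.
from setuptools import setup, find_packages

setup(
    name="nlp-tool-selection",           # illustrative package name
    version="0.1.0",
    packages=find_packages(),
    install_requires=["nltk", "spacy"],  # illustrative dependencies
)
```

Once published, users could install it with "pip install nlp-tool-selection" without ever touching the book's repo.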
@jsingh811 Hi, unfortunately I think you are still missing the main criticism. For instance, see some of the major comments from @janeadams below; I don't think you've addressed these yet.
@jsingh811 Hi, we noticed that your last commit message said "remove paper" and we just wanted to confirm whether this was your final decision. Would you be so kind as to verify this?
I confirm. Thanks. |
Thanks @janeadams for reviewing the paper! |
If you are creating this PR in order to submit a draft of your paper, please name your PR with "Paper: <title>". An editor will then add a "paper" label and GitHub Actions will be run to check and build your paper. See the project readme for more information.
Editor: Hongsup Shin @hongsupshin
Reviewers: