Feedback #1

Open
samthebest opened this issue Oct 18, 2015 · 3 comments

@samthebest

Hey, great that you like Uncle Bob. Some Data Scientists in London are also thinking about a manifesto for Data Science, with a special focus on Agile methodologies, which, to be frank, are completely lacking in most Data Science setups.

Anyway, I have some feedback on your v0.0.1. My first and most important comment is that, as it currently stands, your manifesto is very long, especially compared with agilemanifesto.org. The first step, I think, is to put some effort into the form, the language, the vocabulary, etc. to condense it into the absolute minimum text. Consider the Agile principle "Simplicity--the art of maximizing the amount of work not done--is essential." The manifesto itself should be as simple and terse as possible.

Let's go one by one. I've listed them in roughly the order I like them, so higher up means I like them more.

  1. "Data Science Ultimately Aims to Automate Micro-Decisions": brilliant, concise, to the point, true. The explanatory paragraph is unnecessary waffle, in my opinion, and best saved for a blog or something.

  2. This has some nice points

"This means that all results can be reproduced and validated in a transparent manner" great.

But do you need "Not in a sense, that the results are used or are even useful for science. It is science because the way of work is identical to the way of work in science."? This lacks significant content.

"There is no room for magic or ad-hoc methods that lack solid foundation. " stating the obvious?

"but also reasonable and comprehensible coherences in the given context" not sure coherences is the correct word here, you might want to try to find an English student to help you.

  1. "Data Science is About Statistics, Not About Algorithms or Tools"

Not really statistics either. Data Science includes many branches of mathematics that are not taught under statistics, like Probability, Information Theory, Complexity Theory, etc. In fact in many cases Data Science replaces traditional statistical methods.

Nevertheless I agree with the message I think you are trying to communicate, but I would recommend using, say "mathematics" or "uncertain reasoning" (or "nonmonotonic reasoning" if you want to be academic about it). You need something more general than "statistics".

  1. "Data Science is Part of the Production Process"

Totally agree, but the following paragraph doesn't capture what that means to me. To be honest, I don't understand it. It feels like a lot of words and very little meaning.

  1. To be honest I don't understand what this principle is trying to say.

"Measurable value implies that the impact of the data science effort is immediate" not really ... no measurable value is technically immediate due to the speed of light ... of course it's usually much slower than the speed of light. "Measurable value" is not defined by speed of impact.

---- Data Scientist ----

  1. Content is totally spot on especially "The Data Scientist is a software developer and an operator. He stays in steady contact with all departments. As he writes code which is part of the production system and therefore need to fulfill the same quality criteria as all other software in the organization he actively works together with the software developers. "

But you need to work on getting it down from 4 long sentences, to 1 long sentence or 2 short sentences.

  1. This one could be joint favourite with 4, particularly if the focus is shifted toward the "maintainability and operational complexity". Worrying about computational resources is actually quite silly given how cheap these are. If a model costs $400 more a month to run on cloud computing but saves a day or two of maintenance time, then it's a good idea, because it probably costs the company more to pay and support a Data Scientist for 2 days. Furthermore, that Data Scientist can then add value somewhere else. In business, human resources are quite illiquid, but computers on the cloud are completely liquid these days and getting cheaper by the day.

  2. Don't use the word "Honest" - stick with "Transparent". Don't imply people lie, imply people need to be transparent.

  3. Although true, I'm not sure it deserves to be in the manifesto. Rather, I think your manifesto lacks any principle to do with evaluating models in terms of business value. This is currently a big problem with Data Science: Data Scientists "sell" their model by reporting things like AUC, which to a business stakeholder has absolutely no meaning and may not have any business value. Data Scientists need to evaluate their models using business-focussed quantities. E.g. in advertising we wish to maximise CTR or minimise CPM. In trading it's all about expected profit. In recommenders and online gaming it's about keeping people online and on the same website.

  4. Pointless - this is ipso facto.
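
To make two of the points above concrete (compute cost versus maintenance time, and reporting business quantities instead of AUC), here is a rough Python sketch. The function names, the 500 per day figure for a Data Scientist, and the campaign numbers are all made up for illustration; only the $400 a month and the 2 days come from the text above.

```python
def extra_compute_is_worth_it(extra_cloud_cost_per_month,
                              maintenance_days_saved_per_month,
                              data_scientist_cost_per_day):
    """True when the maintenance time saved is worth more than the extra compute."""
    saved = maintenance_days_saved_per_month * data_scientist_cost_per_day
    return saved > extra_cloud_cost_per_month


def ctr(clicks, impressions):
    """Click-through rate: clicks per impression."""
    return clicks / impressions


def cpm(spend, impressions):
    """Cost per mille: spend per 1000 impressions."""
    return 1000 * spend / impressions


# $400 more per month for compute vs. 2 days of maintenance saved.
# The 500 per day is a hypothetical fully loaded cost, not a real figure.
print(extra_compute_is_worth_it(400, 2, 500))  # True, since 2 * 500 > 400

# Hypothetical campaign: 50 clicks and 20.0 spent over 10,000 impressions.
print(ctr(50, 10_000))    # 0.005
print(cpm(20.0, 10_000))  # 2.0
```

A stakeholder can act on numbers like these directly, which is the point of reporting them alongside (or instead of) something like AUC.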

@samthebest
Author

Oh, one more thing. When you read my comments they probably sound quite negative. Please don't interpret this as me disagreeing in general or thinking it's not a good idea. I do think you have many good ideas; that's the whole reason I spent this time giving you feedback :) Hope you understand :)

@StephanErb

Great feedback 👍

@sebastianneubauer
Owner

No worries, in the agile spirit it is a "minimum viable document" and I don't see this as "my" document ;-)
Thank you very much for this great feedback, I hope I find time in the near future to incorporate your suggestions...
