Hey, great that you like Uncle Bob. Some Data Scientists in London are also thinking about a manifesto for Data Science - with special focus on Agile methodologies, which, to be frank, are completely lacking in most Data Science setups.
Anyway, so I have some feedback on your v0.0.1. My first and most important comment is that, as your manifesto currently stands, it's very long, especially if you compare it with agilemanifesto.org. The first step, I think, is to put some effort into the form, the language, the vocabulary, etc. to condense it into the absolute minimum text. Consider the Agile principle "Simplicity--the art of maximizing the amount of work not done--is essential." The manifesto itself should be as simple and terse as possible.
Let's go one by one. I've done them in roughly the order I like them, so the ones listed first are the ones I like more.
"Data Science Ultimately Aims to Automate Micro-Decisions" brilliant, concise, to the point, true. The explanatory paragraph is unnecessary waffle, in my opinion, best saved for a blog or something.
This has some nice points
"This means that all results can be reproduced and validated in a transparent manner" great.
But do you need "Not in a sense, that the results are used or are even useful for science. It is science because the way of work is identical to the way of work in science."? This lacks significant content.
"There is no room for magic or ad-hoc methods that lack solid foundation. " stating the obvious?
"but also reasonable and comprehensible coherences in the given context" not sure "coherences" is the correct word here, you might want to try to find an English student to help you.
"Data Science is About Statistics, Not About Algorithms or Tools"
Not really statistics either. Data Science includes many branches of mathematics that are not taught under statistics, like Probability, Information Theory, Complexity Theory, etc. In fact in many cases Data Science replaces traditional statistical methods.
Nevertheless I agree with the message I think you are trying to communicate, but I would recommend using, say "mathematics" or "uncertain reasoning" (or "nonmonotonic reasoning" if you want to be academic about it). You need something more general than "statistics".
"Data Science is Part of the Production Process"
Totally agree, but the following paragraph doesn't capture what that means to me. To be honest I don't understand it. It feels like a lot of words and very little meaning.
To be honest I don't understand what this principle is trying to say.
"Measurable value implies that the impact of the data science effort is immediate" not really ... no measurable value is technically immediate due to the speed of light ... of course it's usually much slower than the speed of light. "Measurable value" is not defined by speed of impact.
---- Data Scientist ----
Content is totally spot on, especially "The Data Scientist is a software developer and an operator. He stays in steady contact with all departments. As he writes code which is part of the production system and therefore need to fulfill the same quality criteria as all other software in the organization he actively works together with the software developers. "
But you need to work on getting it down from 4 long sentences to 1 long sentence or 2 short sentences.
This one could be joint favourite with 4, particularly if the focus is shifted toward the "maintainability and operational complexity". Worrying about computational resources is actually quite silly given how cheap these are. If a model costs $400 more a month to run on cloud computing, but saves a day or two of maintenance time, then it's a good idea because it probably costs the company more to pay and support a Data Scientist for 2 days. Furthermore, that Data Scientist can then add value somewhere else. In business, human resources are quite illiquid, but computers on the cloud are completely liquid these days and getting cheaper by the day.
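As a rough sketch of that arithmetic in Python: the $400 figure is the one from the comment above, while the $500 day rate and the assumption that the time saving recurs every month are made up purely for illustration.

```python
def monthly_net_saving(extra_compute_cost, days_saved, daily_staff_cost):
    """Net monthly saving from choosing the easier-to-maintain model.

    A positive result means the staff time saved is worth more than
    the extra compute bill.
    """
    return days_saved * daily_staff_cost - extra_compute_cost


# Hypothetical numbers: only the 400 comes from the discussion above.
EXTRA_COMPUTE_COST = 400   # extra cloud cost per month, in dollars
DAYS_SAVED = 2             # maintenance days saved per month (assumed)
DAILY_STAFF_COST = 500     # fully loaded cost of one Data Scientist day (assumed)

saving = monthly_net_saving(EXTRA_COMPUTE_COST, DAYS_SAVED, DAILY_STAFF_COST)
print(f"Net monthly saving: ${saving}")
```

With any day rate above $200, two saved days already outweigh the extra compute cost.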
Don't use the word "Honest" - stick with "Transparent". Don't imply people lie, imply people need to be transparent.
Although true, I'm not sure it deserves to be in the manifesto. Rather, I think your manifesto is lacking any principle to do with evaluating models in terms of business value. This is currently a big problem with Data Science - Data Scientists "sell" their model by reporting things like AUC, which to a business stakeholder has absolutely no meaning and may not have any business value. Data Scientists need to evaluate their models using business-focussed quantities. E.g. in advertising we wish to maximize CTR, or minimise CPM. In trading it's all about expected profit. In recommenders and online gaming it's about keeping people online and on the same website.
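To illustrate the kind of evaluation meant here, a minimal Python sketch that scores a model by the profit of the decisions it drives rather than by AUC. The function name, the decision threshold and the payoff values are all hypothetical; a real setup would take them from the business case.

```python
def total_profit(y_true, y_score, threshold, gain_tp, cost_fp):
    """Profit from acting on every case scored at or above the threshold.

    Each acted-on true positive earns gain_tp; each acted-on false
    positive costs cost_fp. Cases below the threshold are ignored.
    """
    profit = 0.0
    for label, score in zip(y_true, y_score):
        if score >= threshold:
            profit += gain_tp if label == 1 else -cost_fp
    return profit


# Toy data, invented for illustration only.
y_true = [1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.1]

profit = total_profit(y_true, y_score, threshold=0.5, gain_tp=10.0, cost_fp=2.0)
print(f"Profit at threshold 0.5: {profit}")
```

A number like this can be reported to a stakeholder directly, and it changes with the threshold and the payoffs, which AUC does not capture.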
Pointless - this is ipso facto.
Oh, one more thing. When you read my comments they probably sound quite negative. Please don't take that to mean that I disagree in general or that I don't think it's a good idea. I do think you have many good ideas, that's the whole reason why I spent this time to give you feedback :) Hope you understand :)
No worries, in the agile spirit it is a "minimum viable document" and I don't see this as "my" document ;-)
Thank you very much for this great feedback, I hope to find time in the near future to incorporate your suggestions...