diff --git a/01_science_technology_and_epistemology.Rmd b/01_science_technology_and_epistemology.Rmd index ac40bd2c..e5486914 100644 --- a/01_science_technology_and_epistemology.Rmd +++ b/01_science_technology_and_epistemology.Rmd @@ -46,8 +46,138 @@ And this happened because, in the process of trying to generalize these heuristi That is, in the process of finding the laws that rule those dynamic systems, lot of cases are ignored, something that doesn't happen to experienced traders. What cements the gap between theory and practice, is that finance PhDs then fail to understand how traders an correctly assess prices of financial derivatives without being familiar with a corpus of theorems that, to them, are indispensable to understand market dynamics. -In this book we will try to take you, the reader, through a journey that is more similar to the real way in which knowledge is built: an iterative, hands-on process of problem-solving that gradually builds intuitions about how things work and why, that we can later formalize. - +So, what role does science play in all this and why is it useful for technology? As we already said, it is a tool that technology has to formalize all that chaotic based knowledge discovery. +Formalization is very useful to teach and communicate some base knowledge and to build a solid foundation in order to keep expanding it. +But it is also the way we found to eliminate subjectivities regarding the understanding of our world. +Thanks to science, it is no longer important what we think about reality, but rather the experimentation and the consequent testing of the hypotheses we made. + +## What is Science? + +As a first approach, science is a method that was conceived to help humans define the principles -invariant laws- that describe the world. +And note that we said "method", because this is all it really is: a methodology which ensures that we are as objective and data-driven as possible. 
+ +It is common to think that one does this by observation. +We observe all around us, and we look for theories to best explain the mass of facts. +But if we think about it for a moment, this way of defining scientific theories as conjectures (or models) that can make predictions is not enough on its own. +The problem is that some bodies of knowledge more properly named pseudosciences would be considered scientific if the “Observe & Deduce” operating definition were left alone. + +A more correct and general criterion for what makes a theory truly scientific is not its ability to generate predictions or the number of cases positively confirming it, but the possibility of falsifying it. +And in this way, every ‘good’ scientific theory is a prohibition: it forbids certain things from happening. +The more a theory forbids, the better it is. And this is because, if the theory forbids something from happening, you can design an experiment and try to make that happen. +From a truly scientific theory it is easy to derive a statement of the form "If x happens, it would show demonstrably that theory y is not true". +It’s the opposite of looking for verification; you must try to show the theory is incorrect, and if you fail to do so, you thereby strengthen it. + +So, science is a method we have built to conclude that the hypotheses we propose have not been shown to be false. +This doesn't mean they are true, and this is one of the core concepts of the scientific method. There are no dogmas in science. +Scientific propositions, claims, hypotheses or theories must be testable, so that we can try to falsify them. +If there isn't a way to test their validity, then they aren't scientific propositions, and it is not a matter of science to discuss such statements. +Rather, we can debate them in the realms of philosophy and other disciplines.
+Science happens in the context of a community: it makes no sense to talk about a hypothesis being false on its own; there must be a community that can validate and replicate the evidence. +Moreover, the descriptions of nature we make with science are not absolute, but just the best we can say. As the famous physicist Niels Bohr once said, when answering questions about quantum physics: "There is no quantum world. +There is only an abstract quantum physical description. It is wrong to think that the task of physics is to find out how nature is. Physics concerns what we can say about Nature." + +But nowadays things are a little bit different. +Experimentation has ceased to occupy its central role in science and, in some cases, it has been left behind entirely. +The pressure to produce academic "knowledge" and our multiple biases and beliefs (almost religious in some cases) are interfering with the progress of science. + +Take, for example, the field of physics, which has undoubtedly contributed enormously to the development of humanity. +But now, it seems to have started a new romance. +Some modern theoretical physicists seem to have ended their relationship with experimentation because of a crazy love for 'mathematical beauty'. +Tons and tons of papers, and zero experiments. It is somewhat worrying. And anyway, why should the laws of our complex reality be "elegant"? + +An interesting concept related to this is that of low-hanging fruit. +As the name suggests, it refers to the scientific laws with the simplest explanations, the simplest equations and the most elegant conclusions. +Their simplicity doesn't mean they didn't require a stroke of genius to come up with; don't get us wrong. +But naturally, these were the first discoveries made, under a lot of assumptions and various constraints. +In this way, it is quite natural to think of beautiful equations being derived that express laws of nature.
+But this concept of mathematical beauty can sometimes be taken too far, by seeking and trying to impose it on all scientific discovery when in reality it is a human construct. + +And there are fields of science apart from physics, like biology or chemistry, that have few of these "beautiful" and extremely precise equations. +In the rest of science, we have what are called "emergent laws" that arise from very complicated systems where the individual details don't matter, but the system as a whole behaves in a very specific and recognizable way. + +So, there is a clear relationship between science, technology and math. But why is the last one so important? + +### The importance of Math + +It is very difficult to imagine science without mathematics, especially the so-called natural sciences. +When there is a need for quantitative results, we know mathematics is the way to go. But, at least for some time in the past, math and science didn't have the relationship they have today. + +The first steps humanity took into the mathematical world were taken to communicate ideas more efficiently. +If we ask you what the simplest form of mathematics that comes to your mind is, you are probably going to think about counting. And essentially that is where everything started. +To our minds, which are now accustomed to very complex ideas of all kinds, such as the internet or the economy, maybe counting doesn't sound like a very phenomenal idea. +But think about the conceptual leap our ancestors had to make in order to arrive at an abstract construct such as a number. +After all, what is a number? What does it look like? What comes to your mind when we speak about the number two? What do two cats and two roses have in common? Numbers appeared as a way to communicate more efficiently. +Steven Strogatz refers, in his book The Joy of x, to a particular episode of the Sesame Street show, '123 Count with Me'.
+Although it may sound silly and childish, it makes a great metaphor for the usefulness of numbers. +In the show, one of the characters, Humphrey, is working as a waiter in a restaurant and he takes an order from some penguins. +When he calls out the order to the kitchen, he says: "Fish, fish, fish, fish, fish, fish". +At this point, another character, Ernie, teaches Humphrey about the concept of numbers. +As we make this first leap of abstraction, new rules start appearing. +When using numbers to characterize a collection of objects of the same kind, operations such as addition and subtraction emerge naturally once the concept is created, almost as if they had a life of their own. + +As human culture and curiosity developed, new challenges appeared, and the relation between things and how they vary became a field of interest in mathematics and the physical sciences. +In particular, physics cares about the relationships between different magnitudes of the real, or physical, world, and the tools to formalize these relations are variables and functions. +These are the next step in mathematical abstraction. +With variables, the idea of magnitudes that change their value during a process was made explicit: although the value changed, the magnitude itself remained the same. +Moreover, functions were the key abstraction tool to represent how variables depend on other variables, and with them the concept of independent and dependent variables appeared. + +For many years, the desire to understand the world and postulate physical laws motivated mathematical discovery. +This is easy to see in the creation of calculus by Leibniz and Newton, one of the most important mathematical tools we have in our arsenal, which is today used widely across science, engineering, economics and many more fields. +Alongside calculus, another important and fundamental mathematical field started being conceptualized and developed.
+Stimulated by games of chance and gambling, the mathematical theory of probability was born in the work of Pascal and Fermat. +Almost in parallel, statistics started as an applied field dealing with data from states, such as population demographics and the economy, but it slowly grew and extended to the collection of any kind of data, its analysis, its interpretation and the extraction of conclusions from it. +The evolution of statistics was intimately related to the development of probability and, together with the theory of errors, ultimately all three of these mathematical disciplines, calculus, probability and statistics, played a fundamental role in hypothesis testing and scientific research. + +As systems under study started to grow in number of components and the interactions between them were taken into account, the modelling techniques that were most frequently used started to fall short in describing these systems. +A new conceptual approach revolutionized physics: statistical mechanics. +Instead of having to know the exact state of a system made up of a large number of elements that interact in complicated ways, statistical mechanics relies on probability and statistical methods to characterize the behaviour of the system. +This field established a framework to think about systems in a different way. +The impossibility of knowing everything is a fact that is embraced in this line of thought and properly handled. +Many more of these fresh ideas kept developing over the years, such as complexity theory, a field that focuses on the emergent properties of systems made up of many parts.
+ +## The emergence of Data Science + +But it was not until recent years, when progress in computer technology made it possible to acquire, store and process large quantities of data, that the modelling techniques we have been discussing started to be applied in conjunction with statistical tools in areas outside of the overstudied academic scope. +This boosted the application of a lot of computational techniques and the creation of new ones at an exponential rate. +As a consequence of the enormous quantities of data available and the variety and diversity of the underlying generating mechanisms, a field of its own started growing. This field would deal with data in a general way, in principle, without caring too much about its nature. +This is what we call Data Science nowadays, and it aims to extract information and insights from any type of data. + +We are living in the so-called "Petabyte Age". +At this scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. +And this is so important because it is expanding our possibilities to do science. +In the complex, messy domains that we already mentioned, particularly game-theoretic domains involving unpredictable agents such as human beings, there are no general theories that can be expressed in simple equations like $F = ma$ or $E = m c^2$. +But now we have massive amounts of available data describing those complex systems, so employing non-parametric density approximation models such as nearest-neighbors or kernel methods rather than parametric models such as low-dimensional linear regression may be the solution to gain useful insights. Theory is expanding into new forms: forms in which, if it allows us to solve problems, correlation between variables is enough. + +The exponential progress that we have been seeing in computer technology is a key aspect of data science.
+So much so that it is common to hear that Data Science is the conjunction of statistics and computer science and, undoubtedly, there is a lot of truth in this. +But what is the real difference between data science and statistics? + +Well, the key difference is the (almost) purely practical point of view. +In the data science field the ultimate goal is to solve a huge diversity of problems and also to create replicable solutions. +In contrast to statistics, where you can define the data that you are going to use, here the data itself becomes the object of study. +And, as the world is complicated, so is the data we encounter. +In data science we make use of data collection, management, and presentation, to focus more on predicting future outcomes and less on merely inferring relationships. +So the motivation will always be to take action on the insights that you gain. + +Apart from the particular application of programming in this field, we believe that programming and, in general, the informatics field help in the democratization of knowledge and power, making it possible for anyone with a computer, an internet connection and a text editor to start a journey. +Unlike in academia, nobody in the informatics world cares if you have a PhD. What really matters is the quality of your code and your methods. +The same is true here. +Here we will not show polished and definitive solutions, but we will attack a wide variety of topics, showing several heuristics that will help anyone build solid foundations in this field that we love so much. +Here we don't care about academic degrees, only the ingenuity to engineer and solve the most challenging problems. + +## Our philosophy: The right balance between practice and theory + +Having talked about all this, it is easier for us to explain the purpose of our book. +And it is to propose interesting and complex problems, to think about how to solve them, and to introduce theory only as the resolution of the problem requires it.
+Our goal is to show the hardcore academic that it doesn't hurt to take a few blind steps to attack real problems and to show the purely practical person the benefits of having a deeper understanding of modelling, probability and statistics beyond just being able to use a few libraries. + +We will be tackling a lot of different practical problems, emphasizing the interdisciplinary approach of the subject. +Physics, economics and sports are just some of the topics discussed in these chapters. Data science can't be done without the help of a high-level programming language. +This is an essential skill, just like math and a scientific mindset. + +In this book, the programming tool we are going to use is Julia, a language engineered from its roots for scientific applications, in particular, Data Science. + +We will try to take you, the reader, through a journey that is more similar to the real way in which knowledge is built: an iterative, hands-on process of problem-solving that gradually builds intuitions about how things work and why, that we can later formalize. ## References diff --git a/01_science_technology_and_epistemology/01_science_technology_and_epistemology.jl b/01_science_technology_and_epistemology/01_science_technology_and_epistemology.jl index 9b4ca60e..507d613a 100644 --- a/01_science_technology_and_epistemology/01_science_technology_and_epistemology.jl +++ b/01_science_technology_and_epistemology/01_science_technology_and_epistemology.jl @@ -1,5 +1,5 @@ ### A Pluto.jl notebook ### -# v0.12.21 +# v0.15.1 using Markdown using InteractiveUtils @@ -8,31 +8,31 @@ using InteractiveUtils md""" ## The difference between Science and Technology -Anyone would agree that science and technology have a lot in common, but what are the essential differences between them? And how do they interact with each other? -These are fundamental questions for us, and ones that most people don't have a clear answer for.
-This leads to misconceptions about key aspects on, for example, how knowledge is created and, more generally, how the world operates. +Anyone would agree that science and technology have a lot in common, but what are the essential differences between them? And how do they interact with each other? +These are fundamental questions for us, and ones that most people don't have a clear answer for. +This leads to misconceptions about key aspects on, for example, how knowledge is created and, more generally, how the world operates. -One fundamental difference between the two is that while science tries to understand the physical world in which we live, technology aims to transform it. -So, technology has a strong practical goal and science a more intellectual one. -A very frequent misunderstanding that is nowadays very much encouraged by academia is to think that first, one needs to acquire all of the theoretical and formal knowledge about a subject matter before being able to apply it and transform reality. -But in reality this is not the case. The process occurs, in most cases, in the opposite way: one lives in reality, interacts with it, takes action, makes mistakes, tries again, and in that way slowly discovers how the world works. +One fundamental difference between the two is that while science tries to understand the physical world in which we live, technology aims to transform it. +So, technology has a strong practical goal and science a more intellectual one. +A very frequent misunderstanding that is nowadays very much encouraged by academia is to think that first, one needs to acquire all of the theoretical and formal knowledge about a subject matter before being able to apply it and transform reality. +But in reality this is not the case. The process occurs, in most cases, in the opposite way: one lives in reality, interacts with it, takes action, makes mistakes, tries again, and in that way slowly discovers how the world works. 
And then, of course, it is useful to formalize our knowledge in order to be able to transmit that learning, lay solid foundations and to continue advancing in our understanding. ## What is technology? -Technology is the practice that allows us humans to transform reality. -We have been doing it since the very beginning of the human era, and it is really what makes us different from other animals. +Technology is the practice that allows us humans to transform reality. +We have been doing it since the very beginning of the human era, and it is really what makes us different from other animals. Technology enables us to expand our capabilities. -From taking a long, pointy stick to reach fruit high up in a tree, to transforming an entire landscape by building a huge electric dam, to the creation of artificial intelligence that helps us solve our deepest puzzles. +From taking a long, pointy stick to reach fruit high up in a tree, to transforming an entire landscape by building a huge electric dam, to the creation of artificial intelligence that helps us solve our deepest puzzles. With technology we transform our experience in the world. -It is evident that the more we know about the reality in which we live, the more we will be able to modify it. -But from this idea comes a very important question: what is the order in which innovation takes place? -Is it necessary first to have an absolute understanding of the process we are trying to modify? +It is evident that the more we know about the reality in which we live, the more we will be able to modify it. +But from this idea comes a very important question: what is the order in which innovation takes place? +Is it necessary first to have an absolute understanding of the process we are trying to modify? Or is technology created in a more chaotic process, through trial and error? -Does a child need to know the gear mechanisms of a bicycle to learn to ride it? 
Does Lionel Messi need to know about fluid dynamics to make the ball take a curved trajectory? Did the Romans need to know the Navier-Stokes equation in order to build their huge aqueducts? +Does a child need to know the gear mechanisms of a bicycle to learn to ride it? Does Lionel Messi need to know about fluid dynamics to make the ball take a curved trajectory? Did the Romans need to know the Navier-Stokes equation in order to build their huge aqueducts? Knowledge is often acquired through experimentation, implementation and heuristics, in a process that involves more trial and error and less theoretical knowledge than many believe. And technological innovation tends to follow the same mechanism. Many breakthroughs were made by people who were taking risks, exploring, and stumbling in the dark with a destination in mind but no clear directions to get there, with theories only taking their definitive form once they arrived to a solution and were able to look back on what they had discovered. @@ -41,39 +41,172 @@ Many people think of the innovation process as a more or less linear path that b This type of thinking is counterproductive because, being so widespread, it causes many people to be more concerned with constantly acquiring theoretical knowledge, rather than taking action and immersing themselves in practice. Fear of failure, of not getting it exactly right on the first try, also plays a role, perhaps encouraged by the way we tell stories about innovation. -By looking at the history of technology and innovation, and who writes it, we see that the people who made the discoveries are rarely the ones who write the books about them. -As [Nassim Nicholas Taleb](https://en.wikipedia.org/wiki/Nassim_Nicholas_Taleb) said in "The History Written by the Losers", the people that are doing stuff don't have time for writing. 
+By looking at the history of technology and innovation, and who writes it, we see that the people who made the discoveries are rarely the ones who write the books about them. +As [Nassim Nicholas Taleb](https://en.wikipedia.org/wiki/Nassim_Nicholas_Taleb) said in "The History Written by the Losers", the people that are doing stuff don't have time for writing. And perhaps because the non-practitioners are the ones who write about the findings of others, as time goes by, society ends up being convinced that there was indeed an arduous intellectual and academic work first, and then came its implementation. That –apparently– common sense in which knowledge is built from a purely intellectual work that can be done in the armchair at home, and that only after acquiring this sacred theoretical knowledge it is possible to come up with technology or innovation, is the one we need to question. -This confusion in the order in which technological advances occur is seen constantly and in the most varied areas of the history of innovation. +This confusion in the order in which technological advances occur is seen constantly and in the most varied areas of the history of innovation. Take, for example, the development of the jet engine. It really had nothing to do with the discoveries of physicists researching, but with the cleverness and practical heuristics based on trial and error that engineers had developed, although in academic books it is stated the other way around. Or in a really practical field, finances. For years, senior traders that have been deploying several heuristics to make their trades, build portfolios that are much more complex and better performing than the ones generated by the pricing formulas that academics came up with and often didn't stand the test of time. -And this happened because, in the process of trying to generalize these heuristics into formal equations, the academics are constantly introducing fragility. 
-That is, in the process of finding the laws that rule those dynamic systems, lot of cases are ignored, something that doesn't happen to experienced traders. +And this happened because, in the process of trying to generalize these heuristics into formal equations, the academics are constantly introducing fragility. +That is, in the process of finding the laws that rule those dynamic systems, lot of cases are ignored, something that doesn't happen to experienced traders. What cements the gap between theory and practice, is that finance PhDs then fail to understand how traders can correctly assess prices of financial derivatives without being familiar with a corpus of theorems that, to them, are indispensable to understand market dynamics. -In this book we will try to take you, the reader, through a journey that is more similar to the real way in which knowledge is built: an iterative, hands-on process of problem-solving that gradually builds intuitions about how things work and why, that we can later formalize. +So, what role does science play in all this and why is it useful for technology? As we already said, it is a tool that technology has to formalize all that chaotic based knowledge discovery. +Formalization is very useful to teach and communicate some base knowledge and to build a solid foundation in order to keep expanding it. +But it is also the way we found to eliminate subjectivities regarding the understanding of our world. +Thanks to science, it is no longer important what we think about reality, but rather the experimentation and the consequent testing of the hypotheses we made. + +## What is Science? + +As a first approach, science is a method that was conceived to help humans define the principles -invariant laws- that describe the world. +And note that we said "method", because this is all it really is: a methodology which ensures that we are as objective and data-driven as possible. + +It is common to think that one does this by observation. 
+We observe all around us, and we look for theories to best explain the mass of facts. +But if we think about it for a moment, this way of defining scientific theories as conjectures (or models) that can make predictions is not enough on its own. +The problem is that some bodies of knowledge more properly named pseudosciences would be considered scientific if the “Observe & Deduce” operating definition were left alone. + +A more correct and general criterion for what makes a theory truly scientific is not its ability to generate predictions or the number of cases positively confirming it, but the possibility of falsifying it. +And in this way, every ‘good’ scientific theory is a prohibition: it forbids certain things from happening. +The more a theory forbids, the better it is. And this is because, if the theory forbids something from happening, you can design an experiment and try to make that happen. +From a truly scientific theory it is easy to derive a statement of the form "If x happens, it would show demonstrably that theory y is not true". +It’s the opposite of looking for verification; you must try to show the theory is incorrect, and if you fail to do so, you thereby strengthen it. + +So, science is a method we have built to conclude that the hypotheses we propose have not been shown to be false. +This doesn't mean they are true, and this is one of the core concepts of the scientific method. There are no dogmas in science. +Scientific propositions, claims, hypotheses or theories must be testable, so that we can try to falsify them. +If there isn't a way to test their validity, then they aren't scientific propositions, and it is not a matter of science to discuss such statements. +Rather, we can debate them in the realms of philosophy and other disciplines. +Science happens in the context of a community: it makes no sense to talk about a hypothesis being false on its own; there must be a community that can validate and replicate the evidence.
+Moreover, the descriptions of nature we make with science are not absolute, but just the best we can say. As the famous physicist Niels Bohr once said, when answering questions about quantum physics: "There is no quantum world. +There is only an abstract quantum physical description. It is wrong to think that the task of physics is to find out how nature is. Physics concerns what we can say about Nature." + +But nowadays things are a little bit different. +Experimentation has ceased to occupy its central role in science and, in some cases, it has been left behind entirely. +The pressure to produce academic "knowledge" and our multiple biases and beliefs (almost religious in some cases) are interfering with the progress of science. + +Take, for example, the field of physics, which has undoubtedly contributed enormously to the development of humanity. +But now, it seems to have started a new romance. +Some modern theoretical physicists seem to have ended their relationship with experimentation because of a crazy love for 'mathematical beauty'. +Tons and tons of papers, and zero experiments. It is somewhat worrying. And anyway, why should the laws of our complex reality be "elegant"? + +An interesting concept related to this is that of low-hanging fruit. +As the name suggests, it refers to the scientific laws with the simplest explanations, the simplest equations and the most elegant conclusions. +Their simplicity doesn't mean they didn't require a stroke of genius to come up with; don't get us wrong. +But naturally, these were the first discoveries made, under a lot of assumptions and various constraints. +In this way, it is quite natural to think of beautiful equations being derived that express laws of nature. +But this concept of mathematical beauty can sometimes be taken too far, by seeking and trying to impose it on all scientific discovery when in reality it is a human construct.
+ +And there are fields of science apart from physics, like biology or chemistry, that have few of these "beautiful" and extremely precise equations. +In the rest of science, we have what are called "emergent laws" that arise from very complicated systems where the individual details don't matter, but the system as a whole behaves in a very specific and recognizable way. + +So, there is a clear relationship between science, technology and math. But why is the last one so important? + +### The importance of Math + +It is very difficult to imagine science without mathematics, especially the so-called natural sciences. +When there is a need for quantitative results, we know mathematics is the way to go. But, at least for some time in the past, math and science didn't have the relationship they have today. + +The first steps humanity took into the mathematical world were taken to communicate ideas more efficiently. +If we ask you what the simplest form of mathematics that comes to your mind is, you are probably going to think about counting. And essentially that is where everything started. +To our minds, which are now accustomed to very complex ideas of all kinds, such as the internet or the economy, maybe counting doesn't sound like a very phenomenal idea. +But think about the conceptual leap our ancestors had to make in order to arrive at an abstract construct such as a number. +After all, what is a number? What does it look like? What comes to your mind when we speak about the number two? What do two cats and two roses have in common? Numbers appeared as a way to communicate more efficiently. +Steven Strogatz refers, in his book The Joy of x, to a particular episode of the Sesame Street show, '123 Count with Me'. +Although it may sound silly and childish, it makes a great metaphor for the usefulness of numbers.
+In the show, one of the characters, Humphrey, is working as a waiter in a restaurant and he takes an order from some penguins. +When he calls out the order to the kitchen, he says: "Fish, fish, fish, fish, fish, fish". +At this point, another character, Ernie, teaches Humphrey about the concept of numbers. +As we make this first abstraction leap, new rules start appearing. +When using numbers to characterize a collection of objects of the same kind, operations such as addition and subtraction emerge naturally once the concept is created, almost as if they had a life of their own. + +As human culture and curiosity developed, new challenges appeared, and the relation between things and how they vary became a field of interest in mathematics and the physical sciences. +In particular, physics cares about the relationships between different magnitudes of the real –or physical– world, and the tools to formalize these relations are variables and functions. +These are the next step in mathematical abstraction. +With variables, the conceptualization of magnitudes that change their value during a process was made evident, and although this value changed, the magnitude itself remained the same. +Moreover, functions were the key abstraction tool to represent how variables depend on other variables, and the concept of independent and dependent variables appeared. + +For many years, the desire to understand the world and postulate physical laws motivated mathematical discovery. +This is really easy to understand with the creation of calculus by Leibniz and Newton, one of the most important mathematical tools we have in our arsenal, which today is used extensively across science, engineering, economics and many more fields. +Alongside calculus, another important and fundamental mathematical field started being conceptualized and developed. +With the stimulus of games of chance and gambling, the mathematical theory of probability was born from the work of Pascal and Fermat.
+Almost in parallel, statistics started as an applied field dealing with data about states, such as demographics and the economy, but it slowly grew and extended to the collection of any kind of data, its analysis, interpretation and the extraction of conclusions from it. +The evolution of statistics was intimately related to the development of probability and the theory of errors, and ultimately all three of these mathematical disciplines, calculus, probability and statistics, came to play a fundamental role in hypothesis testing and scientific research. + +As systems under study started to grow in number of components and the interactions between them were taken into account, the modelling techniques that were most frequently used started to fall short in describing these systems. +A new conceptual approach revolutionized physics: statistical mechanics. +Instead of having to know the exact state of a system made up of a large number of elements that interact in complicated ways, statistical mechanics relies on probability and statistical methods to characterize the behaviour of the system. +This field established a framework to think about systems in a different way. +The impossibility of knowing everything is a fact that is embraced in this line of thought and properly handled. +Many more of these fresh ideas kept developing over the years, such as complexity theory, a field that focuses on emergent properties of systems made up of many parts. + +## The emergence of Data Science + +But it was not until recent years, when progress in computer technology made it possible to acquire, store and process large quantities of data, that the modelling techniques we have been discussing started to be applied in conjunction with statistical tools in areas outside the overstudied academic scope. +This boosted the application of a lot of computational techniques and the creation of new ones at an exponential rate.
+As a consequence of the enormous quantities of data available and the variety and diversity of the underlying generating mechanisms, a field of its own started growing. This field would deal with data in a general way, in principle, without caring too much about its nature. +This is what we call Data Science nowadays, and it aims to extract information and insights from any type of data. + +We are living in the so-called "Petabyte Age". +At this scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. +And this is so important because it is expanding our possibilities to do science. +In the complex, messy domains that we already mentioned, particularly game-theoretic domains involving unpredictable agents such as human beings, there are no general theories that can be expressed in simple equations like $F = ma$ or $E = m c^2$. +But now we have massive amounts of available data describing those complex systems, so employing non-parametric density approximation models such as nearest-neighbors or kernel methods rather than parametric models such as low-dimensional linear regression may be the solution to gain useful insights. Theory is expanding into new forms, in which correlation between variables is enough if it allows us to solve problems. + +The exponential progress that we have been seeing in computer technology is a key aspect of data science. +So much so that it is common to hear that Data Science is the conjunction between statistics and computer science and, undoubtedly, there is a lot of truth in this. +But, what is the real difference between data science and statistics? + +Well, the key difference is the (almost) purely practical point of view. +In the data science field the ultimate goal is to solve a huge diversity of problems and to create replicable solutions.
+In contrast to statistics, where you can define the data that you are going to use, here the data itself becomes the object of study. +And, as the world is complicated, so is the data we encounter. +In data science we make use of data collection, management, and presentation, to focus more on predicting future outcomes and less on merely inferring relationships. +So the motivation will always be to take action on the insights that you learn. + +Apart from the particular application of programming in this field, we believe that programming, and in general, the informatics field, helps in the democratization of knowledge and power, making it possible for anyone with a computer, internet connection and a text editor to start a journey. +Unlike academia, nobody cares in the informatics world if you have a PhD. What really matters is the quality of your code and your methods. +The same is true here. +Here we will not show polished and definitive solutions, but we will attack a wide variety of topics, showing several heuristics that will help anyone build solid foundations in this field that we love so much. +Here we don't care about academic degrees, only the ingenuity to engineer and solve the most challenging problems. + +## Our philosophy: The right balance between practice and theory + +Having talked about all this, it is easier for us to explain the purpose of our book. +And it is to propose interesting and complex problems, to think about how to solve them, and only to introduce theory as the resolution of the problem requires it. +Our goal is to show the hardcore academic that it doesn't hurt to take a few blind steps to attack real problems and to show the purely practical person the benefits of having a deeper understanding of modelling, probability and statistics beyond just being able to use a few libraries. + +We will be tackling a lot of different practical problems, emphasizing the interdisciplinary approach of the subject.
+Physics, Economics and Sports are just some of the topics discussed in these chapters. Data science can't be done without the help of a high-level programming language. +This is an essential skill, just like math and a scientific mindset. +In this book, the programming tool we are going to use is Julia, a language engineered from its roots for scientific applications, in particular, Data Science. + +We will try to take you, the reader, through a journey that is more similar to the real way in which knowledge is built: an iterative, hands-on process of problem-solving that gradually builds intuitions about how things work and why, that we can later formalize. + +But enough talk, let's get our hands dirty. + """ # ╔═╡ 1c60486a-5c2d-11eb-385e-417e8c4b7ddb md"""### References -- Antifragile: Things That Gain from Disorder, ch 15 - Nassim Nicholas Taleb -- Infinite Powers: How Calculus Reveals the Secrets of the Universe - Steven Strogatz -- Lost in Math: How Beauty Leads Physics Astray - Sabine Hossenfelder +- [Antifragile: Things That Gain from Disorder, ch 15 - Nassim Nicholas Taleb]() +- [Infinite Powers: How Calculus Reveals the Secrets of the Universe - Steven Strogatz]() +- [Lost in Math: How Beauty Leads Physics Astray - Sabine Hossenfelder]() - [The Joy of X](https://www.amazon.com/Joy-Guided-Tour-Math-Infinity/dp/0544105850) - [Kolmogorov - Mathematics: It's contents, method and meaning](https://www.amazon.com/Mathematics-Content-Methods-Meaning-Volumes/dp/0486409163) - [Freeman Dyson - Where Do the Laws of Nature Come From?](https://youtu.be/wxRpa-PqUfw) - [Roger Penrose - Is Mathematics Invented or Discovered?](https://youtu.be/ujvS2K06dg4) - [How to tell science from pseudoscience](https://youtu.be/o9ylQC5bPpU) - [Sabine Hossenfelder - Why the ‘Unreasonable Effectiveness’ of Mathematics?](https://youtu.be/QUWbe5KGaQY) -- http://www.paulgraham.com/hp.html -- https://en.wikipedia.org/wiki/Apophatic_theology -- https://en.wikipedia.org/wiki/Falsifiability --
https://norvig.com/fact-check.html -- https://www.wired.com/2008/06/pb-theory/ -- https://fs.blog/2016/01/karl-popper-on-science-pseudoscience/ +- [http://www.paulgraham.com/hp.html]() +- [https://en.wikipedia.org/wiki/Apophatic_theology]() +- [https://en.wikipedia.org/wiki/Falsifiability]() +- [https://norvig.com/fact-check.html]() +- [https://www.wired.com/2008/06/pb-theory/]() +- [https://fs.blog/2016/01/karl-popper-on-science-pseudoscience/]() """ diff --git a/docs/science-technology-and-epistemology.html b/docs/science-technology-and-epistemology.html index 43c924e6..799309f2 100644 --- a/docs/science-technology-and-epistemology.html +++ b/docs/science-technology-and-epistemology.html @@ -6,35 +6,32 @@ Chapter 1 Science technology and epistemology | Data Science in Julia for Hackers - + - - + + - + - + - - - + - - + + - - + @@ -51,72 +48,12 @@ + + + + - @@ -131,157 +68,140 @@ @@ -299,9 +219,9 @@

-
+

Chapter 1 Science technology and epistemology

-
+

1.1 The difference between Science and Technology

Anyone would agree that science and technology have a lot in common, but what are the essential differences between them? And how do they interact with each other? These are fundamental questions for us, and ones that most people don’t have a clear answer for. @@ -312,7 +232,7 @@

1.1 The difference between Scienc But in reality this is not the case. The process occurs, in most cases, in the opposite way: one lives in reality, interacts with it, takes action, makes mistakes, tries again, and in that way slowly discovers how the world works. And then, of course, it is useful to formalize our knowledge in order to be able to transmit that learning, lay solid foundations and to continue advancing in our understanding.

-
+

1.2 What is technology?

Technology is the practice that allows us humans to transform reality. We have been doing it since the very beginning of the human era, and it is really what makes us different from other animals.

@@ -331,7 +251,7 @@

1.2 What is technology?

This type of thinking is counterproductive because, being so widespread, it causes many people to be more concerned with constantly acquiring theoretical knowledge, rather than taking action and immersing themselves in practice. Fear of failure, of not getting it exactly right on the first try, also plays a role, perhaps encouraged by the way we tell stories about innovation.

By looking at the history of technology and innovation, and who writes it, we see that the people who made the discoveries are rarely the ones who write the books about them. -As Nassim Nicholas Taleb said in “The History Written by the Losers,” the people that are doing stuff don’t have time for writing. +As Nassim Nicholas Taleb said in “The History Written by the Losers”, the people that are doing stuff don’t have time for writing. And perhaps because the non-practitioners are the ones who write about the findings of others, as time goes by, society ends up being convinced that there was indeed an arduous intellectual and academic work first, and then came its implementation. That –apparently– common sense in which knowledge is built from a purely intellectual work that can be done in the armchair at home, and that only after acquiring this sacred theoretical knowledge it is possible to come up with technology or innovation, is the one we need to question.

This confusion in the order in which technological advances occur is seen constantly and in the most varied areas of the history of innovation. @@ -339,10 +259,122 @@

1.2 What is technology?

And this happened because, in the process of trying to generalize these heuristics into formal equations, the academics are constantly introducing fragility. That is, in the process of finding the laws that rule those dynamic systems, a lot of cases are ignored, something that doesn’t happen to experienced traders. What cements the gap between theory and practice is that finance PhDs then fail to understand how traders can correctly assess prices of financial derivatives without being familiar with a corpus of theorems that, to them, are indispensable to understand market dynamics.

-

In this book we will try to take you, the reader, through a journey that is more similar to the real way in which knowledge is built: an iterative, hands-on process of problem-solving that gradually builds intuitions about how things work and why, that we can later formalize.

+

So, what role does science play in all this and why is it useful for technology? As we already said, it is a tool that technology has to formalize all that chaotic, practice-based knowledge discovery. +Formalization is very useful for teaching and communicating base knowledge and for building a solid foundation on which to keep expanding it. +But it is also the way we found to eliminate subjectivities regarding the understanding of our world. +Thanks to science, what matters is no longer what we think about reality, but experimentation and the consequent testing of the hypotheses we make.

+
+
+

1.3 What is Science?

+

As a first approach, science is a method that was conceived to help humans define the principles -invariant laws- that describe the world. +And note that we said “method”, because this is all it really is: a methodology which ensures that we are as objective and data-driven as possible.

+

It is common to think that one does this by observation. +We observe all around us, and we look for theories to best explain the mass of facts. +But if we think about it for a moment, this way of defining scientific theories as conjectures (or models) that can make predictions, is not enough on its own. +The problem is that some bodies of knowledge more properly named pseudosciences would be considered scientific if the “Observe & Deduce” operating definition were left alone.

+

A more accurate and general criterion for what makes a theory truly scientific is not its ability to generate predictions or the number of cases positively confirming it, but the possibility of falsifying it. +In this sense, every ‘good’ scientific theory is a prohibition: it forbids certain things from happening. +The more a theory forbids, the better it is, because if the theory forbids something from happening, you can design an experiment and try to make it happen. +From a truly scientific theory it is easy to derive a statement of the form “If x happens, it would show demonstrably that theory y is not true”. +It’s the opposite of looking for verification; you must try to show the theory is incorrect, and if you fail to do so, you thereby strengthen it.

+

So, science is a method we have built to check that the hypotheses we propose have not been shown to be false. +This doesn’t mean they are true, and this is one of the core concepts of the scientific method. There are no dogmas in science. +Scientific propositions, claims, hypotheses or theories must be testable, so that we can try to falsify them. +If there isn’t a way to test their validity, then they aren’t scientific propositions, and it is not a matter of science to discuss such statements. +Rather, we can debate them in the realms of philosophy and other disciplines. +Science happens in the context of a community; it makes no sense to talk about a hypothesis being false on its own, there must be a community that can validate and replicate the evidence. +Moreover, the descriptions we make of nature through science are not absolute, but just the best we can say. As the famous physicist Niels Bohr once said, when answering questions about quantum physics: “There is no quantum world. +There is only an abstract quantum physical description. It is wrong to think that the task of physics is to find out how nature is. Physics concerns what we can say about Nature.”

+

But nowadays things are a little bit different. +Experimentation has ceased to occupy its central role in science and, in some cases, it’s totally left behind. +The pressure to produce academic “knowledge” and our multiple biases and beliefs (almost religious in some cases) are interfering with the progress of science.

+

Take, for example, the field of physics, which has undoubtedly contributed enormously to the development of humanity. +But now, it seems to have started a new romance. +Some modern theoretical physicists seem to have ended their relationship with experimentation because of a crazy love for ‘mathematical beauty’. +Tons and tons of papers, and zero experiments. It is somewhat worrying. And anyway, why should the laws of our complex reality be “elegant”?

+

An interesting concept related to this is that of the low-hanging fruit. +As the name suggests, it refers to the scientific laws with the simplest explanations, the simplest equations and the most elegant conclusions. +Being simple doesn’t mean they didn’t require a stroke of genius to come up with, don’t get us wrong. +But naturally, these were the first discoveries to be made, relying on many assumptions and imposing various constraints. +In this way, it is quite natural to think of beautiful equations being derived that express laws of nature. +But this concept of mathematical beauty can sometimes be taken too far, seeking and trying to impose it on all scientific discovery when in reality it is a human construct.

+

And there are fields of science apart from physics, like biology or chemistry, that don’t have any of these “beautiful” and extremely precise equations. +In the rest of science, we have what are called “emergent laws” that arise from very complicated systems where the individual details don’t matter, but the system as a whole behaves in a very specific and recognizable way.

+

So, there is a clear relationship between Science, technology and Math. But why is the last one so important?

+
+

1.3.1 The importance of Math

+

It is very difficult to imagine science without mathematics, especially the so-called natural sciences. +When there is a need for quantitative results, we know mathematics is the way to go. But, at least for some time in the past, math and science didn’t have the relationship they have today.

+

The first steps humanity made into the mathematical world were taken to communicate ideas more efficiently. +If we ask you what the simplest form of mathematics that comes to your mind is, you are probably going to think about counting. And essentially that is where everything started. +To our minds, which are now accustomed to very complex ideas of all kinds, such as the internet or the economy, maybe counting doesn’t sound like a very phenomenal idea. +But think about the conceptual leap our ancestors had to make in order to arrive at an abstract construct such as a number. +After all, what is a number? What does it look like? What comes to your mind when we speak about the number two? What do two cats and two roses have in common? Numbers appeared as a way to communicate more efficiently. +Steven Strogatz refers, in his book The Joy of x, to a particular episode of the Sesame Street show, ‘123 Count with Me’. +Although it may sound silly and childish, it makes a great metaphor for the usefulness of numbers. +In the show, one of the characters, Humphrey, is working as a waiter in a restaurant and he takes an order from some penguins. +When he calls out the order to the kitchen, he says: “Fish, fish, fish, fish, fish, fish”. +At this point, another character, Ernie, teaches Humphrey about the concept of numbers. +As we make this first abstraction leap, new rules start appearing. +When using numbers to characterize a collection of objects of the same kind, operations such as addition and subtraction emerge naturally once the concept is created, almost as if they had a life of their own.

+

As human culture and curiosity developed, new challenges appeared, and the relation between things and how they vary became a field of interest in mathematics and the physical sciences. +In particular, physics cares about the relationships between different magnitudes of the real –or physical– world, and the tools to formalize these relations are variables and functions. +These are the next step in mathematical abstraction. +With variables, the conceptualization of magnitudes that change their value during a process was made evident, and although this value changed, the magnitude itself remained the same. +Moreover, functions were the key abstraction tool to represent how variables depend on other variables, and the concept of independent and dependent variables appeared.

+

For many years, the desire to understand the world and postulate physical laws motivated mathematical discovery. +This is really easy to understand with the creation of calculus by Leibniz and Newton, one of the most important mathematical tools we have in our arsenal, which today is used extensively across science, engineering, economics and many more fields. +Alongside calculus, another important and fundamental mathematical field started being conceptualized and developed. +With the stimulus of games of chance and gambling, the mathematical theory of probability was born from the work of Pascal and Fermat. +Almost in parallel, statistics started as an applied field dealing with data about states, such as demographics and the economy, but it slowly grew and extended to the collection of any kind of data, its analysis, interpretation and the extraction of conclusions from it. +The evolution of statistics was intimately related to the development of probability and the theory of errors, and ultimately all three of these mathematical disciplines, calculus, probability and statistics, came to play a fundamental role in hypothesis testing and scientific research.

+

As systems under study started to grow in number of components and the interactions between them were taken into account, the modelling techniques that were most frequently used started to fall short in describing these systems. +A new conceptual approach revolutionized physics: statistical mechanics. +Instead of having to know the exact state of a system made up of a large number of elements that interact in complicated ways, statistical mechanics relies on probability and statistical methods to characterize the behaviour of the system. +This field established a framework to think about systems in a different way. +The impossibility of knowing everything is a fact that is embraced in this line of thought and properly handled. +Many more of these fresh ideas kept developing over the years, such as complexity theory, a field that focuses on emergent properties of systems made up of many parts.

-
-

1.3 References

+
+
+

1.4 The emergence of Data Science

+

But it was not until recent years, when progress in computer technology made it possible to acquire, store and process large quantities of data, that the modelling techniques we have been discussing started to be applied in conjunction with statistical tools in areas outside the overstudied academic scope. +This boosted the application of a lot of computational techniques and the creation of new ones at an exponential rate. +As a consequence of the enormous quantities of data available and the variety and diversity of the underlying generating mechanisms, a field of its own started growing. This field would deal with data in a general way, in principle, without caring too much about its nature. +This is what we call Data Science nowadays, and it aims to extract information and insights from any type of data.

+

We are living in the so-called “Petabyte Age”. +At this scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. +And this is so important because it is expanding our possibilities to do science. +In the complex, messy domains that we already mentioned, particularly game-theoretic domains involving unpredictable agents such as human beings, there are no general theories that can be expressed in simple equations like \(F = ma\) or \(E = m c^2\). +But now we have massive amounts of available data describing those complex systems, so employing non-parametric density approximation models such as nearest-neighbors or kernel methods rather than parametric models such as low-dimensional linear regression may be the solution to gain useful insights. Theory is expanding into new forms, in which correlation between variables is enough if it allows us to solve problems.
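
The contrast drawn above between parametric and non-parametric models can be made concrete with a small sketch. It is only an illustration under our own assumptions: the toy data (a noisy sine wave), the helper names `linear_predict` and `knn_predict`, and the choice of `k = 10` neighbours are all hypothetical and do not come from any particular dataset or library. A straight line fitted by least squares is compared with a plain nearest-neighbors average on data that no straight line can describe.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical toy data: a nonlinear signal plus a little noise.
rng = MersenneTwister(42)
x = sort(2π .* rand(rng, 200))
y = sin.(x) .+ 0.1 .* randn(rng, 200)

# Parametric model: a straight line y ≈ a + b*x fitted by least squares.
X = hcat(ones(length(x)), x)
coef = X \ y
linear_predict(x0) = coef[1] + coef[2] * x0

# Non-parametric model: average the targets of the k nearest training points.
function knn_predict(x0, x, y; k = 10)
    idx = partialsortperm(abs.(x .- x0), 1:k)
    return mean(y[idx])
end

# Mean squared error of each model against the noiseless signal on a test grid.
grid = range(0.2, 2π - 0.2; length = 50)
mse_linear = mean((linear_predict.(grid) .- sin.(grid)) .^ 2)
mse_knn = mean((knn_predict.(grid, Ref(x), Ref(y)) .- sin.(grid)) .^ 2)

println("linear MSE: ", round(mse_linear; digits = 4))
println("k-NN MSE:   ", round(mse_knn; digits = 4))
```

The nearest-neighbors predictor assumes nothing about the shape of the relationship and simply lets nearby data speak, which is why it tracks the curve far better than the line here. The price is that it needs enough data around every query point, which is exactly what large datasets provide.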

+

The exponential progress that we have been seeing in computer technology is a key aspect of data science. +So much so that it is common to hear that Data Science is the conjunction between statistics and computer science and, undoubtedly, there is a lot of truth in this. +But, what is the real difference between data science and statistics?

+

Well, the key difference is the (almost) purely practical point of view. +In the data science field the ultimate goal is to solve a huge diversity of problems and to create replicable solutions. +In contrast to statistics, where you can define the data that you are going to use, here the data itself becomes the object of study. +And, as the world is complicated, so is the data we encounter. +In data science we make use of data collection, management, and presentation, to focus more on predicting future outcomes and less on merely inferring relationships. +So the motivation will always be to take action on the insights that you learn.

+

Apart from the particular application of programming in this field, we believe that programming, and in general, the informatics field, helps in the democratization of knowledge and power, making it possible for anyone with a computer, internet connection and a text editor to start a journey. +Unlike academia, nobody cares in the informatics world if you have a PhD. What really matters is the quality of your code and your methods. +The same is true here. +Here we will not show polished and definitive solutions, but we will attack a wide variety of topics, showing several heuristics that will help anyone build solid foundations in this field that we love so much. +Here we don’t care about academic degrees, only the ingenuity to engineer and solve the most challenging problems.

+
+
+

1.5 Our philosophy: The right balance between practice and theory

+

Having talked about all this, it is easier for us to explain the purpose of our book. +And it is to propose interesting and complex problems, to think about how to solve them, and only to introduce theory as the resolution of the problem requires it. +Our goal is to show the hardcore academic that it doesn’t hurt to take a few blind steps to attack real problems and to show the purely practical person the benefits of having a deeper understanding of modelling, probability and statistics beyond just being able to use a few libraries.

+

We will be tackling a lot of different practical problems, emphasizing the interdisciplinary approach of the subject. +Physics, Economics and Sports are just some of the topics discussed in these chapters. Data science can’t be done without the help of a high-level programming language. +This is an essential skill, just like math and a scientific mindset.

+

In this book, the programming tool we are going to use is Julia, a language engineered from its roots for scientific applications, in particular, Data Science.

+

We will try to take you, the reader, through a journey that is more similar to the real way in which knowledge is built: an iterative, hands-on process of problem-solving that gradually builds intuitions about how things work and why, that we can later formalize.

+
+
+

1.6 References