Chapter 1 second part #154

134 changes: 132 additions & 2 deletions 01_science_technology_and_epistemology.Rmd
@@ -46,8 +46,138 @@ And this happened because, in the process of trying to generalize these heuristi
That is, in the process of finding the laws that rule those dynamic systems, a lot of cases are ignored, something that doesn't happen with experienced traders.
What cements the gap between theory and practice is that finance PhDs then fail to understand how traders can correctly assess the prices of financial derivatives without being familiar with a corpus of theorems that, to them, is indispensable for understanding market dynamics.

So, what role does science play in all this, and why is it useful for technology? As we already said, it is the tool technology uses to formalize all that chaotic knowledge discovery.
Formalization is very useful for teaching and communicating a base of knowledge, and for building a solid foundation from which to keep expanding it.
But it is also the way we have found to remove subjectivity from our understanding of the world.
Thanks to science, what matters is no longer what we think about reality, but experimentation and the testing of the hypotheses we make.

## What is Science?

As a first approach, science is a method that was conceived to help humans define the principles (invariant laws) that describe the world.
And note that we said "method", because this is all it really is: a methodology that helps us be as objective and data-driven as possible.

It is common to think that one does this by observation.
We observe everything around us, and we look for the theories that best explain the mass of facts.
But if we think about it for a moment, defining scientific theories as conjectures (or models) that can make predictions is not enough on its own.
The problem is that some bodies of knowledge more properly named pseudosciences would be considered scientific if the “Observe & Deduce” working definition were the only criterion.

A more correct and general way to define what makes a theory truly scientific is not its ability to generate predictions or the number of cases positively confirming it, but the possibility of falsifying it.
In this way, every ‘good’ scientific theory is a prohibition: it forbids certain things from happening.
The more a theory forbids, the better it is. This is because, if the theory forbids something from happening, you can design an experiment and try to make that thing happen.
From a truly scientific theory it is easy to make a statement of the form "If x happens, it would show demonstrably that theory y is not true".
It’s the opposite of looking for verification: you must try to show the theory is incorrect, and if you fail to do so, you thereby strengthen it.
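
To make the idea of a theory as a prohibition concrete, here is a minimal sketch, written in Python purely for illustration. It takes the hypothesis "this coin is fair" and lists the head counts that the hypothesis forbids in a run of flips. The function names, the choice of a coin, and the 5% threshold are our own assumptions for the example; a statistical prohibition of this kind is also softer than a strict logical one, since a forbidden outcome is extremely unlikely rather than impossible.

```python
from math import comb


def forbidden_head_counts(n_flips, alpha=0.05):
    """Head counts that the hypothesis 'the coin is fair' forbids.

    The forbidden set is made of the most extreme outcomes on both sides
    (very few or very many heads) whose total probability under the
    hypothesis does not exceed alpha. The threshold alpha is an
    illustrative convention, not something the theory itself dictates.
    """
    tail = 0.0   # probability of the lower tail accepted so far
    cutoff = -1  # largest head count in the forbidden lower tail
    for k in range((n_flips + 1) // 2):
        p_k = comb(n_flips, k) / 2 ** n_flips  # P(exactly k heads | fair coin)
        if 2 * (tail + p_k) > alpha:           # both tails together would exceed alpha
            break
        tail += p_k
        cutoff = k
    low = set(range(0, cutoff + 1))
    high = set(range(n_flips - cutoff, n_flips + 1))
    return low | high


def is_falsified(heads, n_flips, alpha=0.05):
    """True if the observed number of heads is one the hypothesis forbids."""
    return heads in forbidden_head_counts(n_flips, alpha)


print(sorted(forbidden_head_counts(20)))
print(is_falsified(18, 20), is_falsified(11, 20))
```

The experiment is designed before looking at the data: we first state which outcomes would count against the hypothesis, then flip the coin and see whether one of them occurs. Observing 11 heads does not prove the coin is fair; it only means this attempt to refute the hypothesis failed.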

So, science is a method we have built to conclude that the hypotheses we propose have not been shown to be false.
This doesn't mean they are true, and this is one of the core concepts of the scientific method. There are no dogmas in science.
Scientific propositions, claims, hypotheses or theories must be testable, so that we can try to falsify them.
If there isn't a way to test their validity, then they aren't scientific propositions, and it is not a matter of science to discuss such statements.
Rather, we can debate them in the realms of philosophy and other disciplines.
Science happens in the context of a community; it makes no sense to talk about a hypothesis being false on its own, since there must be a community that can validate and replicate the evidence.
Moreover, the descriptions of nature we make with science are not absolute, but just the best we can say. As the famous physicist Niels Bohr once said, when answering questions about quantum physics: "There is no quantum world.
There is only an abstract quantum physical description. It is wrong to think that the task of physics is to find out how nature is. Physics concerns what we can say about Nature."

But nowadays things are a little bit different.
Experimentation has ceased to occupy its central role in science and, in some cases, has been left behind entirely.
The pressure to produce academic "knowledge" and our multiple biases and beliefs (almost religious in some cases) are interfering with the progress of science.

Take physics, for example, a field that has undoubtedly contributed enormously to the development of humanity.
But now it seems to have started a new romance.
Some modern theoretical physicists seem to have ended their relationship with experimentation because of a crazy love for 'mathematical beauty'.
Tons and tons of papers, and zero experiments. It is somewhat worrying. And anyway, why should the laws of our complex reality be "elegant"?

An interesting concept related to this is that of low-hanging fruit.
As the name suggests, it refers to the scientific laws with the simplest explanations, the simplest equations and the most elegant conclusions.
Simple as they are, they still required a stroke of genius to come up with; don't get us wrong.
But naturally, these were the first discoveries to be made, relying on many assumptions and imposing various constraints.
In this way, it is quite natural to think of beautiful equations being derived that express laws of nature.
But this concept of mathematical beauty can sometimes be taken too far, by seeking it and trying to impose it on all scientific discovery when in reality it is a human construct.

And there are fields of science apart from physics, like biology or chemistry, that don't have any of these "beautiful" and extremely precise equations.
In the rest of science, we have what are called "emergent laws", which arise from very complicated systems where the individual details don't matter, but the system as a whole behaves in a very specific and recognizable way.

So, there is a clear relationship between science, technology and math. But why is the last one so important?

### The importance of Math

It is very difficult to imagine science without mathematics, especially the so-called natural sciences.
When there is a need for quantitative results, we know mathematics is the way to go. But, at least for some time in the past, math and science didn't have the relationship they have today.

The first steps humanity took into the mathematical world were taken to communicate ideas more efficiently.
If we ask you what the simplest form of mathematics that comes to your mind is, you are probably going to think about counting. And essentially that is where everything started.
To our minds, which are now accustomed to very complex ideas of all kinds, such as the internet or the economy, maybe counting doesn't sound like a remarkable idea.
But think about the conceptual leap our ancestors had to make in order to arrive at an abstract construct such as a number.
After all, what is a number? What does it look like? What comes to your mind when we speak about the number two? What do two cats and two roses have in common? Numbers appeared as a way to communicate more efficiently.
Steven Strogatz refers, in his book *The Joy of x*, to a particular episode of the Sesame Street show, '123 Count with Me'.
Although it may sound silly and childish, it makes a great metaphor for the usefulness of numbers.
In the show, one of the characters, Humphrey, is working as a waiter in a restaurant and he takes an order from some penguins.
When he calls out the order to the kitchen, he says: "Fish, fish, fish, fish, fish, fish".
At this point, another character, Ernie, teaches Humphrey about the concept of numbers.
As we make this first abstraction leap, new rules start appearing.
When using numbers to characterize a collection of objects of the same kind, operations such as addition and subtraction emerge naturally once the concept is created, almost as if they had a life of their own.

As human culture and curiosity developed, new challenges appeared, and the relation between things and how they vary became a field of interest in mathematics and the physical sciences.
In particular, physics cares about the relationships between different magnitudes of the real (or physical) world, and the tools to formalize these relations are variables and functions.
These are the next step in mathematical abstraction.
Variables made it possible to conceptualize magnitudes whose value changes during a process while the magnitude itself remains the same.
Functions, in turn, were the key abstraction for representing how variables depend on other variables, and with them the concepts of independent and dependent variables appeared.

For many years, the desire to understand the world and postulate physical laws motivated mathematical discovery.
This is really easy to see in the creation of calculus by Leibniz and Newton, one of the most important mathematical tools we have in our arsenal, which today is widely used across science, engineering, economics and many more fields.
Alongside calculus, another important and fundamental mathematical field started being conceptualized and developed.
With the stimulus of games of chance and gambling, the mathematical theory of probability was born from the work of Pascal and Fermat.
Almost in parallel, statistics started as an applied field dealing with data from states, such as population demographics and the economy, but it slowly grew and extended to the collection of any kind of data, its analysis, its interpretation and the extraction of conclusions from it.
The evolution of statistics was intimately related to the development of probability and of the theory of errors, and ultimately all three of these mathematical disciplines (calculus, probability and statistics) came to play a fundamental role in hypothesis testing and scientific research.

As the systems under study grew in number of components and the interactions between those components were taken into account, the most frequently used modelling techniques began to show their limitations in describing these systems.
A new conceptual approach revolutionized physics: statistical mechanics.
Instead of requiring the exact state of a system made up of a large number of elements that interact in complicated ways, statistical mechanics relies on probability and statistical methods to characterize the behaviour of the system.
This field established a framework for thinking about systems in a different way.
The impossibility of knowing everything is a fact that this line of thought embraces and handles properly.
Many more of these fresh ideas kept developing over the years, such as complexity theory, a field that focuses on the emergent properties of systems made up of many parts.
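
A toy sketch can illustrate why giving up on the exact state of every element is not a loss. The Python snippet below (an illustrative example of ours, not a model of any particular physical system) builds a "system" of independent particles that are each "up" or "down" with probability 1/2, and then looks only at the fraction that are up. The function names, the seed and the system sizes are assumptions chosen for the example.

```python
import random
from statistics import mean, pstdev


def fraction_up(n_particles, rng):
    """Fraction of particles in the 'up' state; each is up with probability 1/2."""
    return sum(rng.random() < 0.5 for _ in range(n_particles)) / n_particles


def sample_fractions(n_particles, n_samples=200, seed=0):
    """Prepare the same kind of system n_samples times and record the fraction up."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [fraction_up(n_particles, rng) for _ in range(n_samples)]


# Compare small and large systems: average fraction and its spread across samples.
for n in (10, 1_000, 10_000):
    fractions = sample_fractions(n)
    print(n, round(mean(fractions), 3), round(pstdev(fractions), 4))
```

We never track which individual particle is up, yet the aggregate quantity becomes more and more predictable as the system grows: for independent particles the spread of the fraction shrinks roughly like one over the square root of the number of particles. Real systems have interactions that this sketch ignores.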

## The emergence of Data Science

But it was not until recent years, when progress in computer technology made it possible to acquire, store and process large quantities of data, that the modelling techniques we have been discussing started to be applied in conjunction with statistical tools in areas outside the traditional academic scope.
This boosted the application of a lot of computational techniques and the creation of new ones at an accelerating rate.
As a consequence of the enormous quantities of data available and the variety and diversity of the underlying generating mechanisms, a field of its own started growing. This field would deal with data in a general way, in principle, without caring too much about its nature.
This is what we call Data Science nowadays, and it aims to extract information and insights from any type of data.

We are living in the so-called "Petabyte Age".
At this scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics.
And this is so important because it is expanding our possibilities to do science.
In the complex, messy domains that we already mentioned, particularly game-theoretic domains involving unpredictable agents such as human beings, there are no general theories that can be expressed in simple equations like $F = ma$ or $E = m c^2$.
But now we have massive amounts of available data describing those complex systems, so employing non-parametric density approximation models such as nearest-neighbors or kernel methods rather than parametric models such as low-dimensional linear regression may be the solution to gain useful insights. Theory is expanding into new forms, in which correlation between variables is enough as long as it allows us to solve problems.
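
The contrast between a parametric and a non-parametric model can be sketched in a few lines. The Python example below is only an illustration under our own assumptions: the data are synthetic (a noisy sine curve), and the helper names `fit_line`, `fit_knn` and `mse` are hypothetical. It fits a straight line by least squares and a k-nearest-neighbors regressor to the same data, then compares their errors on fresh points.

```python
import math
import random


def fit_line(xs, ys):
    """Parametric model: ordinary least-squares fit of y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=5):
    """Non-parametric model: predict the average y of the k nearest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on the points (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(42)  # fixed seed so the sketch is reproducible


def make_data(n):
    """Synthetic data (an assumption of this example): y = sin(x) plus Gaussian noise."""
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, 0.1) for x in xs]
    return xs, ys


train_x, train_y = make_data(300)
test_x, test_y = make_data(100)

linear_error = mse(fit_line(train_x, train_y), test_x, test_y)
knn_error = mse(fit_knn(train_x, train_y), test_x, test_y)
print("linear:", round(linear_error, 3), "kNN:", round(knn_error, 3))
```

The nearest-neighbors model never assumes a functional form; it lets the data speak, which is why it can follow the curve where the straight line cannot. The price is that it needs enough data near every point where we want a prediction, and it offers no compact equation to interpret.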

The exponential progress that we have been seeing in computer technology is a key aspect of data science.
So much so that it is common to hear that Data Science is the conjunction between statistics and computer science and, undoubtedly, there is a lot of truth in this.
But what is the real difference between data science and statistics?

Well, the key difference is the (almost) purely practical point of view.
In the data science field, the ultimate goal is to solve a huge diversity of problems and to create replicable solutions.
In contrast to statistics, where you can define the data that you are going to use, here the data itself becomes the object of study.
And since the world is complicated, so is the data we encounter.
In data science we make use of data collection, management and presentation, and we focus more on predicting future outcomes and less on merely inferring relationships.
So the motivation will always be to act on the insights you gain.

Apart from the particular application of programming in this field, we believe that programming, and the informatics field in general, helps democratize knowledge and power, making it possible for anyone with a computer, an internet connection and a text editor to start a journey.
Unlike in academia, nobody in the informatics world cares whether you have a PhD. What really matters is the quality of your code and your methods.
The same is true here.
Here we will not show polished and definitive solutions, but we will tackle a wide variety of topics, showing several heuristics that will help anyone build solid foundations in this field that we love so much.
Here we don't care about academic degrees, only the ingenuity to engineer and solve the most challenging problems.

## Our philosophy: The right balance between practice and theory

Having talked about all this, it is easier for us to explain the purpose of our book.
That purpose is to propose interesting and complex problems, to think about how to solve them, and to introduce theory only as the resolution of each problem requires it.
Our goal is to show the hardcore academic that it doesn't hurt to take a few blind steps to attack real problems and to show the purely practical person the benefits of having a deeper understanding of modelling, probability and statistics beyond just being able to use a few libraries.

We will be tackling a lot of different practical problems, emphasizing the interdisciplinary approach of the subject.
Physics, economics and sports are just some of the topics discussed in these chapters. Data science can't be done without the help of a high-level programming language.
This is an essential skill, just like math and a scientific mindset.

In this book, the programming tool we are going to use is Julia, a language engineered from its roots for scientific applications, in particular, Data Science.

We will try to take you, the reader, through a journey that is more similar to the real way in which knowledge is built: an iterative, hands-on process of problem-solving that gradually builds intuitions about how things work and why, that we can later formalize.

## References
