Merge branch 'Superalgos:develop' into develop

merolaika · Sep 16, 2022 · 9c167da
2 parents cd0f3b0 + 8ca2f54

Showing 12 changed files with 1,536 additions and 575 deletions.
diff --git a/Bitcoin-Factory/Forecast-Client/notebooks/Bitcoin_Factory_RL.py b/Bitcoin-Factory/Forecast-Client/notebooks/Bitcoin_Factory_RL.py
diff --git a/Bitcoin-Factory/ReadMeReinforcementLearning.md b/Bitcoin-Factory/ReadMeReinforcementLearning.md
@@ -0,0 +1,126 @@
+# Reinforcement Learning
+## 💫 1. Introduction
+[Reinforcement Learning](https://en.wikipedia.org/wiki/Reinforcement_learning) (RL) is an area of machine learning in which an agent learns by trial and error which actions to take. The typical framing of an RL scenario: an agent takes actions in an environment; the outcome is interpreted into a reward and a representation of the state, which are fed back into the agent.
+
+![Learning framework](https://upload.wikimedia.org/wikipedia/commons/1/1b/Reinforcement_learning_diagram.svg "RL Framework")
+
+In our use case, the environment is a trading environment and the possible actions are buy, sell, or hold. The reward is the resulting gain or loss. Based on this reward, the agent learns to trade better. The learning itself is done with an algorithm called [Proximal Policy Optimization (PPO)](https://en.wikipedia.org/wiki/Proximal_Policy_Optimization).
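For intuition, the core of PPO is its clipped surrogate objective. With the probability ratio r = π_new(a|s) / π_old(a|s) and an advantage estimate A, each sample contributes min(r·A, clip(r, 1 − ε, 1 + ε)·A), which keeps a single update from moving the policy too far from the previous one. The minimal per-sample sketch below shows that formula only; the function names and the ε value are illustrative and are not taken from `Bitcoin_Factory_RL.py`.

```js
// Per-sample PPO clipped surrogate objective (to be maximized).
// ratio: pi_new(a|s) / pi_old(a|s); advantage: estimated advantage A.
// epsilon: clip range; 0.2 is an illustrative default, not the project's value.
function clip(x, lo, hi) {
  return Math.min(Math.max(x, lo), hi);
}

function ppoClippedObjective(ratio, advantage, epsilon = 0.2) {
  const unclipped = ratio * advantage;
  const clipped = clip(ratio, 1 - epsilon, 1 + epsilon) * advantage;
  // The minimum is a pessimistic bound: pushing the ratio outside
  // [1 - epsilon, 1 + epsilon] earns the policy no extra objective.
  return Math.min(unclipped, clipped);
}

// A good action (positive advantage) whose probability grew by 50%:
console.log(ppoClippedObjective(1.5, 1.0)); // 1.2 (capped at 1 + epsilon)
// A bad action (negative advantage) whose probability halved:
console.log(ppoClippedObjective(0.5, -1.0)); // -0.8
```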
+
+Once trained, the agent provides an action for the current candle. The currently possible actions are:
+* 0 -> buy long
+* 1 -> sell
+* 2 -> hold
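To make "the reward is the gain or loss" concrete, here is a hypothetical toy step function using the action codes listed above. It is not the project's `SimpleTrading` environment. It assumes that every buy or sell trades the whole balance and that the fee is a fraction of the traded amount (0.01 = 1%); the real semantics of `TRADING_FEE` and of the buy/sell percentage may differ.

```js
// Hypothetical toy environment step, NOT the project's SimpleTrading env.
// Action codes as listed above: 0 = buy, 1 = sell, 2 = hold.
// Assumptions: trades use the whole balance; fee is a fraction (0.01 = 1%).
const BUY = 0, SELL = 1, HOLD = 2;

function portfolioValue(state, price) {
  return state.quote + state.base * price;
}

function step(state, action, price, nextPrice, fee) {
  let { quote, base } = state;
  if (action === BUY && quote > 0) {
    base += (quote * (1 - fee)) / price; // fee is deducted from the quote spent
    quote = 0;
  } else if (action === SELL && base > 0) {
    quote += base * price * (1 - fee); // fee is deducted from the proceeds
    base = 0;
  } // HOLD, or nothing to trade, leaves the balances unchanged
  const next = { quote, base };
  // Reward = change in portfolio value, measured in the quote asset.
  const reward = portfolioValue(next, nextPrice) - portfolioValue(state, price);
  return { state: next, reward };
}

const start = { quote: 1000, base: 0 }; // like INITIAL_QUOTE_ASSET / INITIAL_BASE_ASSET
const { state, reward } = step(start, BUY, 100, 110, 0.01);
console.log(state.base.toFixed(2), reward.toFixed(2)); // 9.90 89.00
```

In this sketch the fee shows up immediately as a lower reward, so an agent that trades too often is penalized even when prices do not move.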
+
+For buy and sell signals, an additional percentage is provided.
+
+## 📒 2. Configuration
+The basic configuration is the same as described in the [Bitcoin Factory ReadMe](./README.md). The differences for RL are described below.
+### 2.1 Testserver config
+To run a Test Server for RL, you need to change the configuration of the Test Server node in Superalgos (SA). First, set `pythonScriptName` to the Python script that the clients should run in their Docker sessions.
+Second, define the ranges of parameters to be tested, for example the learning rate (`LEARNING_RATE`):
+```js
+{
+    ...
+    "pythonScriptName": "Bitcoin_Factory_RL.py",
+    ...
+    "parametersRanges": {
+        "LIST_OF_ASSETS": [
+            [
+                "BTC"
+            ]
+        ],
+        "LIST_OF_TIMEFRAMES": [
+            [
+                "01-hs"
+            ],
+            [
+                "02-hs"
+            ]
+        ],
+        "NUMBER_OF_LAG_TIMESTEPS": [
+            10
+        ],
+        "PERCENTAGE_OF_DATASET_FOR_TRAINING": [
+            80
+        ],
+        "NUMBER_OF_EPOCHS": [
+            750
+        ],
+        "NUMBER_OF_LSTM_NEURONS": [
+            50
+        ],
+        "TIMESTEPS_TO_TRAIN": [
+            1e7
+        ],
+        "OBSERVATION_WINDOW_SIZE": [
+            24,
+            48
+        ],
+        "INITIAL_QUOTE_ASSET": [
+            1000
+        ],
+        "INITIAL_BASE_ASSET": [
+            0
+        ],
+        "TRADING_FEE": [
+            0.01
+        ],
+        "ENV_NAME": [
+            "SimpleTrading"
+        ],
+        "ENV_VERSION": [
+            1
+        ],
+        "REWARD_FUNCTION": [
+            "unused"
+        ],
+        "EXPLORE_ON_EVAL": [
+            "unused"
+        ],
+        "ALGORITHM": [
+            "PPO"
+        ],
+        "ROLLOUT_FRAGMENT_LENGTH": [
+            200
+        ],
+        "TRAIN_BATCH_SIZE": [
+            2048
+        ],
+        "SGD_MINIBATCH_SIZE": [
+            64
+        ],
+        "BATCH_MODE": [
+            "complete_episodes"
+        ],
+        "FC_SIZE": [
+            256
+        ],
+        "LEARNING_RATE": [
+            0.00001
+        ],
+        "GAMMA": [
+            0.95
+        ]        
+    }
+}
+```
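Each key under `parametersRanges` holds a list of candidate values. Assuming the Test Server builds one test case per combination of those values, the configuration above yields 2 timeframes × 2 observation window sizes = 4 test cases, because every other key has a single value. The sketch below illustrates that expansion; `expandParameterRanges` is a hypothetical helper, not part of Superalgos.

```js
// Sketch: expand a parametersRanges object into individual test cases,
// ASSUMING every combination of the listed values becomes one test case.
function expandParameterRanges(ranges) {
  let cases = [{}];
  for (const [name, values] of Object.entries(ranges)) {
    const next = [];
    for (const partial of cases) {
      for (const value of values) {
        next.push({ ...partial, [name]: value });
      }
    }
    cases = next;
  }
  return cases;
}

// Abbreviated version of the config above (most single-valued keys omitted).
const ranges = {
  LIST_OF_ASSETS: [["BTC"]],
  LIST_OF_TIMEFRAMES: [["01-hs"], ["02-hs"]],
  OBSERVATION_WINDOW_SIZE: [24, 48],
  LEARNING_RATE: [0.00001],
};

const testCases = expandParameterRanges(ranges);
console.log(testCases.length); // 4
```

Under this assumption, every additional value in a range multiplies the number of test cases, which matters given how long a single RL test case takes to process.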
+### 2.2 Testclient config
+No special configuration is needed.
+However, run only one client per machine, since the Python script handles parallel execution on its own.
+
+## 💡 3. Results
+> __Note__
+> Processing a single test case on the client takes roughly 2 to 6 hours on a recent system.
+
+The provided time series is divided into 3 parts (train, test, validate). The first part (train) is used to train the network. The second part (test) is used by the PPO agent to evaluate the current network during the learning process. The third part (validate) is never seen by the agent; it is used to check whether the trained model can trade profitably on unseen data.
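A common way to split a time series is chronologically, so that the test and validate parts lie strictly after the data the agent learned from; shuffling would leak later prices into training. The sketch below shows such a split. The percentages (70/20/10) are illustrative assumptions; how `Bitcoin_Factory_RL.py` actually sizes the three parts is not documented here.

```js
// Sketch: chronological train / test / validate split of a time series.
// ASSUMPTION: the percentages are illustrative, not the script's real values.
function splitTimeSeries(series, trainPct, testPct) {
  const trainEnd = Math.floor((series.length * trainPct) / 100);
  const testEnd = trainEnd + Math.floor((series.length * testPct) / 100);
  return {
    train: series.slice(0, trainEnd), // used to train the network
    test: series.slice(trainEnd, testEnd), // evaluated during learning
    validate: series.slice(testEnd), // never seen by the agent
  };
}

const candles = Array.from({ length: 100 }, (_, i) => i); // stand-in for candle data
const parts = splitTimeSeries(candles, 70, 20);
console.log(parts.train.length, parts.test.length, parts.validate.length); // 70 20 10
```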
+
+The Python script produces 3 charts to visualize the results. The following 3 examples are preliminary: they were produced by an agent that is not yet well trained.
+![Example Train Results](docs/BTC_train.png "BTC train")
+![Example Test Results](docs/BTC_test.png "BTC test")
+![Example Validate Results](docs/BTC_validate.png "BTC validate")
+
+## 🤝 4. Support
+
+Contributions, issues, and feature requests are welcome!
+
+Give a ⭐️ if you like this project or even better become a part of the Superalgos community!