One thing that would make tut more useful as a testing tool is support for "golden testing": alongside the code in a tut block, we write the exact output we expect it to produce. If the actual output matches, nothing happens; if it does not, tut reports a test failure. For example:
```tut:golden
2 + 2
---
4
```
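
To make the proposed semantics concrete, here is a minimal sketch of how such a block could be split and checked. Everything in it is hypothetical: `GoldenCheck`, `GoldenBlock`, `parse`, and `check` are illustrative names and not part of tut, and the sketch assumes the code and the expected output are separated by a line containing only `---`, with whitespace trimmed before comparison.

```scala
// Hypothetical sketch: none of these names exist in tut itself.
object GoldenCheck {
  // The code to evaluate and the output the author expects from it.
  final case class GoldenBlock(code: String, expected: String)

  // Split a block body at the first line that is exactly "---".
  // Returns None when the separator is missing.
  def parse(body: String): Option[GoldenBlock] = {
    val lines = body.split("\n", -1).toList
    val (code, rest) = lines.span(_.trim != "---")
    rest match {
      case _ :: expected =>
        Some(GoldenBlock(code.mkString("\n").trim, expected.mkString("\n").trim))
      case Nil => None
    }
  }

  // Right(()) when the actual output matches the golden output,
  // Left(message) describing the mismatch otherwise.
  def check(block: GoldenBlock, actual: String): Either[String, Unit] =
    if (actual.trim == block.expected) Right(())
    else Left(s"golden mismatch: expected '${block.expected}' but got '${actual.trim}'")
}
```

With this sketch, the block above would parse into the code `2 + 2` and the expected output `4`. How the actual output is obtained from the evaluated code (and whether it includes REPL decoration) is left open here.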