Web Workflow Access Causes Program Pause And Board Freeze #9171

Open
safferyj opened this issue Apr 12, 2024 · 4 comments

@safferyj

CircuitPython version

9.0.3 and 9.1.0-beta.0 on Unexpected Maker FeatherS2 Neo and Adafruit Feather ESP32-S2 Reverse TFT

Code/REPL

import board
import digitalio
import microcontroller
import time
import watchdog 
import wifi

#print(f"Start ({microcontroller.cpu.reset_reason})")
print(f"IP address: {wifi.radio.ipv4_address}")

# LED
led = digitalio.DigitalInOut(board.LED)
led.direction = digitalio.Direction.OUTPUT

# Watchdog
dog = microcontroller.watchdog
dog.timeout = 60
dog.mode = watchdog.WatchDogMode.RAISE

while True:

    for i in range(15):

        # Feed the watchdog
        dog.feed()

        led.value = True
        print(".", end="")
        time.sleep(1)

        led.value = False
        print(".", end="")
        time.sleep(1)

    print()

Behavior

Accessing the Welcome page of the web workflow can cause the executing program to pause. With the above code, the LED stops flashing. Clicking on the Full Code Editor link causes the program to resume.

[image: welcome-pause]

Description

This issue does not seem to happen in 8.2.10.

Additional information

With the above code, if the pause happens and you wait more than a minute, then resuming by entering the Full Code Editor leads to an immediate watchdog exception.

@safferyj safferyj added the bug label Apr 12, 2024
@tannewt tannewt added this to the 9.x.x milestone Apr 12, 2024
@Neradoc

Neradoc commented Feb 26, 2025

I see a similar situation, where code is paused when fetching a URL from the web workflow, to the point of causing a complete freeze of the board, losing USB and everything. When the board's code is particularly busy, this happens easily.

The freeze seems to be triggered by some access through the web workflow, including the single scan done by the web workflow home page of another board or the recurrent scanning by discotool manager (both of which retrieve some information on the board after detecting it over mDNS).

When connected over USB, the frozen board does not respond to ctrl-C. The USB drive usually still works, but sometimes it also errors out and unmounts after a little while, without coming back.

The web workflow might remain working when the code is frozen, and apparently using the web workflow might make the code run again for a bit.

On a board with more complex code, including a neopixel strip, a dotstar strip and a webserver, the freezing happens easily during normal web workflow use, making it quite difficult to use. I usually don't see it recover after the board freezes.

Repro on:

  • Adafruit QTPY ESP32-S2
  • Adafruit Feather ESP32-S2
  • Unexpected Maker FeatherS2
  • Adafruit QT PY ESP32-S3

Repro: latest, 9.2.4, 9.0.0
No repro: 8.2.10

Here is some simple code that helps visualize the board freezing:

import board
import time
import neopixel

status = neopixel.NeoPixel(board.NEOPIXEL, 1)

while True:
    for color in [0x200020, 0x002020]:
        status.fill(color)
        time.sleep(0.25)

Here is some Python code that connects to the board's web workflow in a loop to force the freeze:

import requests, sys, time
from datetime import datetime as d

ADDRESS = "192.168.1.38"
if len(sys.argv) > 1: ADDRESS = sys.argv[1]
url = f"http://{ADDRESS}/cp/version.json"

was_ok = None
t0 = d.now()
try:
    while True:
        is_ok = True
        try:
            with requests.get(url, timeout=1) as response:
                is_ok &= True
        except (requests.exceptions.ReadTimeout, requests.exceptions.ConnectionError):
            is_ok &= False
        print((f"{str(d.now()-t0)[:7]} " + ("ERROR","ok")[is_ok]).ljust(60), "\033[1G\033[1A")
        if is_ok != was_ok:
            print()
        was_ok = is_ok
        time.sleep(0.1)
except KeyboardInterrupt:
    print()

On a QTPY S2, this usually triggers the issue after approximately 30 seconds.
Sometimes the board ends up recovering for a little while.
When that happens I get output like this. Each time marks the moment the result of a connection attempt changes.
(So, for example, from 0:35 to 2:10 it retries with a 1s timeout and errors every time.)

0:00:00 ok   
0:00:32 ERROR
0:00:33 ok   
0:00:35 ERROR
0:02:10 ok   
0:02:42 ERROR
0:05:52 ok   
0:06:22 ERROR
0:07:57 ok   
0:08:29 ERROR
0:08:30 ok   
0:08:32 ERROR
0:11:42 ok   
0:12:16 ERROR

@Neradoc Neradoc changed the title from "Web Workflow Welcome Page Causes Program Pause" to "Web Workflow Access Causes Program Pause And Board Freeze" Feb 26, 2025
@tannewt
Member

tannewt commented Feb 27, 2025

Web workflow responses are currently blocking. So, if they take a while, then everything else will be starved. I think the easiest way to fix this will be switching to Zephyr (or another RTOS). That way the web workflow can run in a separate thread and yield as it waits for sockets.
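
As a desktop analogy for the starvation described above (not CircuitPython's actual implementation), the sketch below runs a stand-in for user code next to a request handler on a single thread. The handler either blocks while it waits or yields while it waits. All names here (`user_code`, `blocking_handler`, `yielding_handler`, `ticks_during`) are invented for the illustration, and asyncio stands in for the "separate thread that yields" idea.

```python
import asyncio
import time


async def user_code(counter, interval=0.01):
    """Stand-in for code.py: bump a counter every `interval` seconds."""
    while True:
        counter["ticks"] += 1
        await asyncio.sleep(interval)


async def blocking_handler(duration):
    """Handler that waits without yielding: nothing else can run meanwhile."""
    time.sleep(duration)


async def yielding_handler(duration):
    """Handler that hands control back to the loop while it waits."""
    await asyncio.sleep(duration)


async def ticks_during(handler, duration=0.2):
    """Count how often user_code runs while `handler` is being served."""
    counter = {"ticks": 0}
    task = asyncio.create_task(user_code(counter))
    await asyncio.sleep(0)  # let user_code run its first tick
    before = counter["ticks"]
    await handler(duration)
    ticks = counter["ticks"] - before
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return ticks


blocked = asyncio.run(ticks_during(blocking_handler))
yielded = asyncio.run(ticks_during(yielding_handler))
print("ticks while handler blocks:", blocked)
print("ticks while handler yields:", yielded)
```

With the blocking handler the user code gets no turns at all for the duration of the response, which is the same shape as the LED loop stalling while a web workflow request is being served.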

@Neradoc

Neradoc commented Feb 28, 2025

Is blocking new to CP9?
The test code above with 8.2.10 on the same QTPY S2 goes without error for over 10 minutes.
So I hope there is some way to mitigate the issue and return to the previous behavior. I can live with the board stuttering when a file is uploaded or when it's scanning mDNS, but the problem is the complete freeze for minutes, or never recovering, plus drive corruption when USB fails.

On 9.x latest, the test starts failing within 30 seconds. In fact it's quite regular, the first error since reset happens after 140 to 150 requests, regardless of the sleep duration in the test script (tested 10ms, 50ms, 100ms).
With code like the following driving 2 strands of LEDs, it takes far fewer requests to freeze, around 10-20.
With CircuitPython 8, this code just keeps doing its LED thing.

import board
import time
import neopixel
import adafruit_dotstar

pixel = neopixel.NeoPixel(board.NEOPIXEL, 90)
pidots = adafruit_dotstar.DotStar(board.SCK, board.MISO, 90)

while True:
    for color in [0x200020, 0x002020]:
        pixel.fill(color)
        pidots.fill(color)
        time.sleep(0.5)
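
The request count mentioned above (first error after 140 to 150 requests) can be measured directly rather than inferred from timestamps. The following is a minimal sketch using only the standard library; `fetch_ok` and `requests_until_first_failure` are hypothetical helper names, and the probe is passed in as a function so the counting logic can be tried without a board.

```python
import http.client
import time
import urllib.request


def fetch_ok(url, timeout=1.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, timeouts and connection resets are all OSError subclasses.
        return False


def requests_until_first_failure(probe, delay=0.1, limit=1000):
    """Call probe() until it fails; return how many calls succeeded first.

    Returns None if no failure is seen within `limit` requests.
    """
    for count in range(1, limit + 1):
        if not probe():
            return count - 1
        time.sleep(delay)
    return None


# Dry run with a fake probe that starts failing on its 146th call.
calls = {"n": 0}


def fake_probe():
    calls["n"] += 1
    return calls["n"] < 146


successes = requests_until_first_failure(fake_probe, delay=0)
print("successful requests before first failure:", successes)
```

Against a real board, the probe would be something like `lambda: fetch_ok("http://192.168.1.38/cp/version.json")`, with the address replaced by the board's own, started right after a reset.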

@tannewt
Member

tannewt commented Feb 28, 2025

No, it isn't new to CP9. CP9 did upgrade to IDF 5 though. It was a big step and the "wake circuitpython up from socket activity" is complicated.

static void socket_select_task(void *arg) {

The easiest way to hunt this down may be a git bisect. It'll be time consuming but also enlightening.
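
For a sense of why a bisect is slow per step but needs few steps, the sketch below does the same binary search that `git bisect` automates over an ordered list of revisions. The intermediate commit names and the position of the culprit are invented for the example; only 8.2.10 (good) and 9.0.0 (bad) come from this thread. In practice `is_bad` would mean building that revision, flashing it, and running the request loop posted earlier.

```python
def first_bad(revisions, is_bad):
    """Binary search for the first bad revision.

    Assumes revisions[0] is good, revisions[-1] is bad, and every revision
    after the first bad one is also bad.
    """
    good, bad = 0, len(revisions) - 1
    tested = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(revisions[mid])
        if is_bad(revisions[mid]):
            bad = mid
        else:
            good = mid
    return revisions[bad], tested


# Hypothetical history: 14 made-up commits between the two known releases.
revisions = ["8.2.10"] + [f"commit-{n}" for n in range(1, 15)] + ["9.0.0"]
CULPRIT_INDEX = 9  # made up for the example


def is_bad(revision):
    """Placeholder for 'flash this build and see whether the board freezes'."""
    return revisions.index(revision) >= CULPRIT_INDEX


culprit, tested = first_bad(revisions, is_bad)
print("first bad revision:", culprit)
print("builds tested:", tested)
```

The number of builds to test grows with the logarithm of the number of commits, so even a long history between the two releases needs a modest number of flash-and-test rounds.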

@dhalbert dhalbert modified the milestones: from 9.x.x to 10.0.0 Mar 17, 2025