Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TCP/IP subtly broken on Opta (and possibly other platforms) #978

Closed
fogzot opened this issue Oct 24, 2024 · 14 comments · Fixed by #983
Closed

TCP/IP subtly broken on Opta (and possibly other platforms) #978

fogzot opened this issue Oct 24, 2024 · 14 comments · Fixed by #983

Comments

@fogzot
Copy link

fogzot commented Oct 24, 2024

Some changes done after 4.1.3 introduced a very subtle bug that garbles the data sent over TCP/IP, for example using HttpClient. When using the following code:

    httpClient->beginRequest();
    httpClient->get("/opta/002E003D3532511134313632/cfg");
    httpClient->sendHeader("X-DNDG-Stamp", String(84814));
    httpClient->sendHeader("X-DNDG-SerialNumber", "002E003D3532511134313632");
    httpClient->sendHeader("X-DNDG-Applicationname", "ParkinGreen");
    httpClient->sendHeader("X-DNDG-FirmwareVersion", String(2));
    httpClient->sendHeader("X-DNDG-PublicKey", "2nNuFMzeh+ixOWo8bD4qUTpIN4KRf1O+zcFXJaSTKkcShr/lJdgVTxugFiH1SBEAe0caSUb3jw1Cs1o1PhmtgA==");
    httpClient->sendHeader("X-DNDG-Signature", "vKFYrGgFdOP9KRQa7eYHEEa0DFl1tnIfIsYV/eFnCAh2U/JUjYDkUQspE6dUTkiPx+yM//sb5WV2ybp7kGlVzA==");
    httpClient->endRequest();

    Serial.print("Result: ");
    Serial.println(httpClient->responseStatusCode());

    httpClient->stop();

sometimes on the server side we receive partially corrupted content:

X-DNDG-Stamp: 84814
X-DNDG-SerialNumber: 002E003D3532511134313632
X-DNDG-FirmwareVersion2: 
X-DNDG-ApplicationName: ParkinGreen
X-DNDG-PublicKey: 2nNuFMzeh+ixOWo8bD4qUTpIN4KRf1O+zcFXJaSTKkcShr/lJdgVTxugFiH1SBEAe0caSUb3jw1Cs1o1PhmtgA==
X-DNDG-Signature: vKFYrGgFdOP9KRQa7eYHEEa0DFl1tnIfIsYV/eFnCAh2U/JUjYDkUQspE6dUTkiPx+yM//sb5WV2ybp7kGlVzA==

Note the X-DNDG-FirmwareVersion2: line, that should have been X-DNDG-FirmwareVersion: 2. This happens more frequently when using an encrypted connection but also happens when using a plain EthernetClient (so it is not caused by HttpClient). Also, this happens more frequently when sending "large" payloads (> 100 bytes) in a short period of time.

We think this broke when the sockes writes were switched from blocking to non-blocking to address the "ethernet disconnected" problem without explicitly checking for the EAGAIN result to correctly retry and serialize the writes.

@JAndrassy
Copy link
Contributor

could you try this changes https://github.com/arduino/ArduinoCore-mbed/pull/912/files ?

@pennam
Copy link
Contributor

pennam commented Oct 24, 2024

looks strange to me that f7d9db9 can cause this results. From your log all data reaches the server but is scrambled in a weird way....

Did you try to revert f7d9db9 ? Do you have a sketch to reproduce the issue?

@fogzot
Copy link
Author

fogzot commented Oct 24, 2024

looks strange to me that f7d9db9 can cause this results. From your log all data reaches the server but is scrambled in a weird way....

Sometime, other times some parts are just missing.

The code in the change you linked still has a sock->set_blocking(true); but the code that has the problem is missing that line:

size_t arduino::MbedClient::write(const uint8_t *buf, size_t size) {
if (sock == nullptr)
return 0;
sock->set_timeout(_timeout);
int ret = sock->send(buf, size);
sock->set_blocking(false);
return ret >= 0 ? ret : 0;
}

I don't know if that's related to the problem but not checking for specific non-blocking error codes seems suspect to me.

Did you try to revert f7d9db9 ?

No, I just tried older and older cores until I found one that didn't show the problem.

Do you have a sketch to reproduce the issue?

Sure. You will have to setup some kind of server (HTTPS is better because the problem is more evident) and dump everything received (I use Nginx to terminate the SSL and forward the connection to a netcat):

#include <Arduino.h>
#include <Ethernet.h>
#include <EthernetSSLClient.h>
#include <ArduinoHttpClient.h>

EthernetSSLClient eth;
HttpClient http = HttpClient(eth, "iot.dndg.io", 443);

void setup()
{
    Ethernet.begin();
}

void loop()
{
    delay(2000);
    getConfiguration(&http);
}

void getConfiguration(HttpClient *httpClient)
{
    httpClient->beginRequest();
    httpClient->get("/opta/002E003D3532511134313632/cfg");
    httpClient->sendHeader("X-DNDG-Stamp", String(84814));
    httpClient->sendHeader("X-DNDG-SerialNumber", "002E003D3532511134313632");
    httpClient->sendHeader("X-DNDG-Applicationname", "ParkinGreen");
    httpClient->sendHeader("X-DNDG-FirmwareVersion", String(2));
    httpClient->sendHeader("X-DNDG-PublicKey", "2nNuFMzeh+ixOWo8bD4qUTpIN4KRf1O+zcFXJaSTKkcShr/lJdgVTxugFiH1SBEAe0caSUb3jw1Cs1o1PhmtgA==");
    httpClient->sendHeader("X-DNDG-Signature", "vKFYrGgFdOP9KRQa7eYHEEa0DFl1tnIfIsYV/eFnCAh2U/JUjYDkUQspE6dUTkiPx+yM//sb5WV2ybp7kGlVzA==");
    httpClient->endRequest();

    Serial.print("Result: ");
    Serial.println(httpClient->responseStatusCode());

    httpClient->stop();
}

@fogzot
Copy link
Author

fogzot commented Oct 24, 2024

could you try this changes https://github.com/arduino/ArduinoCore-mbed/pull/912/files ?

Sorry but I am in no condition to setup an mbed development environment and build the core from scratch right now.

@JAndrassy
Copy link
Contributor

those changes are in the Arduino library

@pennam
Copy link
Contributor

pennam commented Oct 24, 2024

The code in the change you linked still has a sock->set_blocking(true); but the code that has the problem is missing that line:

got it. I've removed the set_blocking(true); call because immediatly after sock->set_timeout(SOCKET_TIMEOUT); is called vanishing the effect of set_blocking(true);

https://github.com/arduino/mbed-os/blob/e959ae057b31849f779904ce97bfabc68658f53e/connectivity/netsocket/source/InternetSocket.cpp#L189-L206

I will try to setup a test and reproduce the issue

@pennam
Copy link
Contributor

pennam commented Oct 24, 2024

Is /opta/002E003D3532511134313632/cfg a file? which is its size? Are you reading it each time or just making the request? If you don't read it the issue cloud be related to this #937 (comment)

@fogzot
Copy link
Author

fogzot commented Oct 25, 2024

Is /opta/002E003D3532511134313632/cfg a file? which is its size? Are you reading it each time or just making the request? If you don't read it the issue cloud be related to this #937 (comment)

The result of the get is a 200 bytes JSON file (generated on the fly by the server). When we received malformed headers the server just send back a "400 malformed request" error and in both cases (OK or error) we always consume the full body of the response. I don't think the problem is related to how we use the HTTP library because it sometimes happens on the first request after resetting the Opta. Also, we did some tests using a plain EthernetClient and we get the same garbled headers on the server. Here is the code:

    EthernetSSLClient eth;

    eth.connectSSL("iot.dndg.io", 443);

    size_t w = eth.println("GET /opta/002E003D3532511134313632/cfg HTTP/1.1");
    w += eth.println("User-Agent: Finder Opta");
    w += eth.println("Host: iot.dndg.io");
    w += eth.println("Connection: close");
    w += eth.println("X-DNDG-Stamp: 84814");
    w += eth.println("X-DNDG-SerialNumber: 002E003D3532511134313632");
    w += eth.println("X-DNDG-Applicationname: ParkinGreen");
    w += eth.println("X-DNDG-FirmwareVersion: 2");
    w += eth.println("X-DNDG-PublicKey: 2nNuFMzeh+ixOWo8bD4qUTpIN4KRf1O+zcFXJaSTKkcShr/lJdgVTxugFiH1SBEAe0caSUb3jw1Cs1o1PhmtgA==");
    w += eth.println("X-DNDG-Signature: vKFYrGgFdOP9KRQa7eYHEEa0DFl1tnIfIsYV/eFnCAh2U/JUjYDkUQspE6dUTkiPx+yM//sb5WV2ybp7kGlVzA==");
    w += eth.println();
    eth.flush();
    Serial.print("Write size is ");
    Serial.println(w);
    // Sometimes it writes 356, sometimes 462.

@pennam
Copy link
Contributor

pennam commented Oct 25, 2024

Yesterday I've made some test with the EthernetClient but I was not able to reproduce it. I will try again today doing multiple resets and adding the check on the write size.

@pennam
Copy link
Contributor

pennam commented Oct 28, 2024

@fogzot can you give a spin to this PR #983 ?

It includes an mbed-os patch but i've also included the libmbed.a rebuild commit

Thanks

@jbfraser1
Copy link

Pretty sure I was experiencing this as well. Don't have easy access to the server to see what was actually happening, but five out of five opta's updated to 4.1.6 last week started failing about 50% of https POST calls, with server returning an error indicating the headers or data payload were not in-spec. (http 415 in my case.)

One device connected to a different network started experiencing >90% failure, POSTing the same payload. Success or failure seems random.

Downgrading to 4.1.3 gets at least one Opta back to 100% success. I'll try others in the next 24 hours.

@pennam pennam reopened this Oct 29, 2024
@fogzot
Copy link
Author

fogzot commented Oct 29, 2024

@fogzot can you give a spin to this PR #983 ?

It includes an mbed-os patch but i've also included the libmbed.a rebuild commit

Can you tell me how am I supposed to test the PR? I don't have a clear idea about how to replace the current core as downloaded via arduino-cli. It is possible to just clone the repository and use it to build a test program from a sketch?

@pennam
Copy link
Contributor

pennam commented Oct 29, 2024

Yes, you can do it following the instructions in the README https://github.com/arduino/ArduinoCore-mbed?tab=readme-ov-file#installation

After you should see an additional core listed in the IDE or arduino cli

$ arduino-cli core list
ID                       Installato Ultimo                   Nome
arduino-git:mbed         9.9.9      9.9.9                    Arduino Mbed OS Boards

You can use this commit 944572f since I've force pushed the PR to get the patch merged

@pennam
Copy link
Contributor

pennam commented Nov 7, 2024

@fogzot @jbfraser1 release 4.2.1 includes the fix for this issue. I'm closing this.

@pennam pennam closed this as completed Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants