Commit 6db377e

fix: sync WebSocket max message size fix from v0.1.12 (#1678)
1 parent 91aa7f9 commit 6db377e

7 files changed, 99 additions and 97 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@ aider_stdlib_map.md
 
 # Local tools and notes
 .local-tools/
+*.local/
+*.local
 
 # Release artifacts (downloaded binaries)
 release-artifacts/

CLAUDE.local.md

Lines changed: 63 additions & 84 deletions
@@ -1,95 +1,74 @@
 - contract states are commutative monoids, they can be "merged" in any order to arrive at the same result. This may reduce some potential race conditions.
 
-## Transport Layer Key Management Issues (2025-01-06)
+## Important Testing Notes
 
-### Problem
-Integration test `test_put_contract` was failing with "Failed to decrypt packet" errors after v0.1.5 deployment to production. The same decryption failures were affecting the River app in production.
+### Always Use Network Mode for Testing
+- **NEVER use local mode for testing** - it uses very different code paths
+- Local mode bypasses critical networking components that need to be tested
+- Always test with `freenet network` to ensure realistic behavior
 
-### Root Cause
-The transport layer was incorrectly handling symmetric key establishment for gateway connections:
+## Quick Reference - Essential Commands
 
-1. **Gateway connection key misuse**: Gateway was using different keys for inbound/outbound when it should use the same client key for both directions
-2. **Client ACK encryption error**: Client was encrypting its final ACK response with the gateway's key instead of its own key
-3. **Packet routing overflow**: When existing connection channels became full, packets were misrouted to new gateway connection handlers instead of waiting
+### River Development
+```bash
+# Publish River (use this, not custom scripts)
+cd ~/code/freenet/river && RUST_MIN_STACK=16777216 cargo make publish-river-debug
 
-### Key Protocol Rules
-- **Gateway connections**: Use the same symmetric key (client's key) for both inbound and outbound communication
-- **Peer-to-peer connections**: Use different symmetric keys for each direction (each peer's own inbound key)
-- **Connection establishment**: Only initial gateway connections and explicit connect operations should create new connections
-- **PUT/GET/SUBSCRIBE/UPDATE operations**: Should only use existing active connections, never create new ones
-
-### Fixes Applied
-
-#### 1. Gateway Connection Key Fix (`crates/core/src/transport/connection_handler.rs:578-584`)
-```rust
-// For gateway connections, use the same key for both directions
-let inbound_key = outbound_key.clone();
-let outbound_ack_packet = SymmetricMessage::ack_ok(
-    &outbound_key,
-    outbound_key_bytes.try_into().unwrap(),
-    remote_addr,
-)?;
-```
-
-#### 2. Client ACK Response Fix (`crates/core/src/transport/connection_handler.rs:798-811`)
-```rust
-// Use our own key to encrypt the ACK response (same key for both directions with gateway)
-outbound_packets
-    .send((
-        remote_addr,
-        SymmetricMessage::ack_ok(
-            &inbound_sym_key, // Use our own key, not the gateway's
-            inbound_sym_key_bytes,
-            remote_addr,
-        )?
-        .prepared_send(),
-    ))
-    .await
-    .map_err(|_| TransportError::ChannelClosed)?;
-```
-
-#### 3. Packet Sending Consistency (`crates/core/src/transport/connection_handler.rs:740-747`)
-```rust
-let packet_to_send = our_inbound.prepared_send();
-outbound_packets
-    .send((remote_addr, packet_to_send.clone()))
-    .await
-    .map_err(|_| TransportError::ChannelClosed)?;
-sent_tracker
-    .report_sent_packet(SymmetricMessage::FIRST_PACKET_ID, packet_to_send);
+# Verify River build time (CRITICAL - only way to confirm new version is served)
+curl -s http://127.0.0.1:50509/v1/contract/web/BcfxyjCH4snaknrBoCiqhYc9UFvmiJvhsp5d4L5DuvRa/ | grep -o 'Built: [^<]*' | head -1
 ```
 
-### Testing
-- Created specialized transport tests in `crates/core/src/transport/test_gateway_handshake.rs`
-  - `test_gateway_handshake_symmetric_key_usage()`: Verifies gateway connections use same key for both directions
-  - `test_peer_to_peer_different_keys()`: Verifies peer-to-peer connections use different keys
-- Both specialized tests pass, confirming the transport layer fixes work correctly
+### Freenet Management
+```bash
+# Start Freenet
+./target/release/freenet network > freenet-debug.log 2>&1 &
 
-### Root Cause Analysis Complete
+# Check status
+ps aux | grep freenet | grep -v grep | grep -v tail | grep -v journalctl
 
-#### PUT Operation Connection Creation Issue
-**Location**: `crates/core/src/node/network_bridge/p2p_protoc.rs:242-291`
-
-**Problem**: PUT/GET/SUBSCRIBE/UPDATE operations create new connections when no existing connection is found, violating the protocol rule that these operations should only use existing active connections.
-
-**Behavior**: When `NetworkBridge.send()` is called and no existing connection exists:
-1. System logs warning: "No existing outbound connection, establishing connection first"
-2. Creates new connection via `NodeEvent::ConnectPeer`
-3. Waits up to 5 seconds for connection establishment
-4. Attempts to send message on newly created connection
-
-**Channel Overflow Root Cause**: Channels fill up due to throughput mismatch:
-- **Fast UDP ingress**: Socket receives packets quickly
-- **Slow application processing**: `peer_connection_listener` processes one message at a time sequentially
-- **Limited buffering**: 100-packet channel buffer insufficient for high-throughput scenarios
-- **No flow control**: System creates new connections instead of implementing proper backpressure
-
-**Cascade Effect**: Channel overflow → packet misrouting → wrong connection handlers → decryption failures → new connection creation
-
-#### Required Fix
-The network bridge should fail gracefully or retry with existing connections instead of creating new ones for PUT/GET/SUBSCRIBE/UPDATE operations. Only initial gateway connections and explicit CONNECT operations should establish new connections.
+# Monitor logs
+tail -f freenet-debug.log
+```
 
-### Next Steps
-1. Modify PUT/GET operation handling to use only existing connections
-2. Implement proper backpressure handling for full channels instead of creating new connections
-3. Test that integration test `test_put_contract` passes after the fix
+## Detailed Documentation Files
+
+### Current Active Debugging
+- **Directory**: `freenet-invitation-bug.local/` (consolidated debugging)
+  - `README.md` - Overview and quick commands
+  - `river-notes/` - River-specific debugging documentation
+  - `contract-test/` - Minimal Rust test to reproduce PUT/GET issue
+
+### River Invitation Bug (2025-01-18)
+- **Status**: CONFIRMED - Contract operations hang on live network, work in integration tests
+- **Root Cause**: Freenet node receives WebSocket requests but never responds
+- **Test Directory**: `freenet-invitation-bug.local/live-network-test/`
+- **Confirmed Findings**:
+  - River correctly sends PUT/GET requests via WebSocket
+  - Raw WebSocket test: Receives binary error response from server
+  - freenet-stdlib test: GET request sent but never receives response (2min timeout)
+  - Integration test `test_put_contract` passes when run in isolation
+  - Issue affects both PUT and GET operations
+- **Current Investigation**: Systematically debugging why Freenet node doesn't respond to contract operations
+- **See**: `freenet-invitation-bug.local/river-notes/invitation-bug-analysis-update.md`
+
+### Historical Analysis (Reference Only)
+- **Transport Layer Issues**: See lines 3-145 in previous version of this file (archived)
+- **River Testing Procedures**: See lines 97-145 in previous version of this file (archived)
+
+### CI Tools
+- **GitHub CI Monitoring**: `~/code/agent.scripts/wait-for-ci.sh [PR_NUMBER]`
+
+### Testing Tools
+- **Puppeteer Testing Guide**: `puppeteer-testing-guide.local.md` - Essential patterns for testing Dioxus apps with MCP Puppeteer tools
+
+## Key Code Locations
+- **River Room Creation**: `/home/ian/code/freenet/river/ui/src/components/room_list/create_room_modal.rs`
+- **River Room Synchronizer**: `/home/ian/code/freenet/river/ui/src/components/app/freenet_api/room_synchronizer.rs`
+- **River Room Data**: `/home/ian/code/freenet/river/ui/src/room_data.rs`
+
+## Organization Rules
+1. **Check this file first** for command reference and active debugging directories
+2. **Use standard commands** instead of creating custom scripts
+3. **Verify River build timestamps** after publishing
+4. **Create timestamped .local directories** for complex debugging sessions
+5. **Update this index** when adding new debugging directories or tools
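
Note on the unchanged first line of `CLAUDE.local.md`: it states that contract states are commutative monoids and can be merged in any order. A minimal sketch of that property follows, assuming a hypothetical state that is just a set of message IDs merged by set union. `RoomState`, `from_ids`, and `merge` are illustrative names, not Freenet's or River's actual types.

```rust
use std::collections::BTreeSet;

/// Hypothetical contract state: a set of message IDs.
/// This type is an illustration, not Freenet's real state representation.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct RoomState {
    pub message_ids: BTreeSet<u64>,
}

impl RoomState {
    /// Builds a state from a slice of IDs.
    pub fn from_ids(ids: &[u64]) -> Self {
        RoomState {
            message_ids: ids.iter().copied().collect(),
        }
    }

    /// Merges two states by set union. Union is associative and commutative,
    /// and the empty state (`RoomState::default()`) is the identity element,
    /// so these states form a commutative monoid under `merge`.
    pub fn merge(&self, other: &RoomState) -> RoomState {
        RoomState {
            message_ids: self
                .message_ids
                .union(&other.message_ids)
                .copied()
                .collect(),
        }
    }
}
```

With this shape, `a.merge(&b)` and `b.merge(&a)` produce equal states, so two nodes that receive the same updates in different orders still converge.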

apps/freenet-ping/Cargo.lock

Lines changed: 4 additions & 4 deletions
Generated file; diff not rendered by default.

ci_test_log.txt

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+{
+  "message": "Not Found",
+  "documentation_url": "https://docs.github.com/rest",
+  "status": "404"
+}

crates/core/src/client_events/websocket.rs

Lines changed: 4 additions & 1 deletion
@@ -325,7 +325,10 @@ async fn websocket_commands(
         }
     };
 
-    ws.on_upgrade(on_upgrade)
+    // Increase max message size to 100MB to handle contract uploads
+    // Default is ~64KB which is too small for WASM contracts
+    ws.max_message_size(100 * 1024 * 1024)
+        .on_upgrade(on_upgrade)
 }
 
 async fn websocket_interface(
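
Note on the limit above: `100 * 1024 * 1024` is 104,857,600 bytes (100 MiB). The sketch below shows that arithmetic and the kind of length check such a limit implies. The real enforcement happens inside the WebSocket library; `MAX_MESSAGE_SIZE`, `MessageTooLarge`, and `check_message_size` are hypothetical names used only for illustration.

```rust
/// The limit passed to `max_message_size` in the diff: 100 MiB in bytes.
pub const MAX_MESSAGE_SIZE: usize = 100 * 1024 * 1024;

/// Error returned when a payload exceeds the configured limit
/// (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub struct MessageTooLarge {
    pub size: usize,
    pub limit: usize,
}

/// Accepts a payload only if its length does not exceed `limit`.
pub fn check_message_size(payload: &[u8], limit: usize) -> Result<(), MessageTooLarge> {
    if payload.len() > limit {
        Err(MessageTooLarge {
            size: payload.len(),
            limit,
        })
    } else {
        Ok(())
    }
}
```

A payload exactly at the limit is accepted in this sketch; whether the library treats the boundary the same way is not shown in the diff.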

crates/core/tests/operations.rs

Lines changed: 14 additions & 4 deletions
@@ -630,11 +630,13 @@ async fn test_multiple_clients_subscription() -> TestResult {
     }
     .boxed_local();
 
-    let test = tokio::time::timeout(Duration::from_secs(180), async {
-        // Wait for nodes to start up
-        tokio::time::sleep(Duration::from_secs(20)).await;
+    let test = tokio::time::timeout(Duration::from_secs(600), async {
+        // Wait for nodes to start up - CI environments need more time
+        tokio::time::sleep(Duration::from_secs(40)).await;
 
         // Connect first client to node A's websocket API
+        tracing::info!("Starting WebSocket connections after 40s startup wait");
+        let start_time = std::time::Instant::now();
         let uri_a = format!(
             "ws://127.0.0.1:{}/v1/contract/command?encodingProtocol=native",
             ws_api_port_a
@@ -655,6 +657,10 @@ async fn test_multiple_clients_subscription() -> TestResult {
         let mut client_api_node_b = WebApi::start(stream3);
 
         // First client puts contract with initial state (without subscribing)
+        tracing::info!(
+            "Client 1: Starting PUT operation (elapsed: {:?})",
+            start_time.elapsed()
+        );
         make_put(
             &mut client_api1_node_a,
             wrapped_state.clone(),
@@ -666,10 +672,14 @@ async fn test_multiple_clients_subscription() -> TestResult {
         // Wait for put response
         loop {
             let resp =
-                tokio::time::timeout(Duration::from_secs(60), client_api1_node_a.recv()).await;
+                tokio::time::timeout(Duration::from_secs(120), client_api1_node_a.recv()).await;
             match resp {
                 Ok(Ok(HostResponse::ContractResponse(ContractResponse::PutResponse { key }))) => {
                     assert_eq!(key, contract_key, "Contract key mismatch in PUT response");
+                    tracing::info!(
+                        "Client 1: PUT completed successfully (elapsed: {:?})",
+                        start_time.elapsed()
+                    );
                     break;
                 }
                 Ok(Ok(other)) => {
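
Note on the pattern in this test: each `recv()` is wrapped in a timeout, and progress is logged as time elapsed since `start_time`. The sketch below reproduces the same shape with the standard library only, substituting `std::sync::mpsc::Receiver::recv_timeout` for `tokio::time::timeout`. `WaitError` and `wait_for_response` are hypothetical names, and skipping unrelated messages is an assumption, since the diff truncates the `Ok(Ok(other))` branch.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Outcome of a failed wait (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum WaitError {
    /// No message arrived within the per-receive timeout.
    TimedOut,
    /// The sending side was dropped before a matching message arrived.
    Disconnected,
}

/// Receives messages until one satisfies `is_expected`, applying
/// `per_recv_timeout` to each individual receive. On success, returns the
/// matching message together with the time elapsed since `start`.
pub fn wait_for_response<T>(
    rx: &Receiver<T>,
    per_recv_timeout: Duration,
    start: Instant,
    is_expected: impl Fn(&T) -> bool,
) -> Result<(T, Duration), WaitError> {
    loop {
        match rx.recv_timeout(per_recv_timeout) {
            Ok(msg) if is_expected(&msg) => return Ok((msg, start.elapsed())),
            // Unrelated message: keep waiting (an assumption of this sketch).
            Ok(_) => continue,
            Err(RecvTimeoutError::Timeout) => return Err(WaitError::TimedOut),
            Err(RecvTimeoutError::Disconnected) => return Err(WaitError::Disconnected),
        }
    }
}
```

The timeout applies per receive, not to the whole loop, which mirrors how the test pairs a 120-second inner timeout with a separate 600-second outer timeout.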

scripts/deploy-to-gateways.sh

Lines changed: 7 additions & 4 deletions
@@ -175,21 +175,24 @@ compile_for_target() {
     # Create cross-compiled directory
     mkdir -p "$CROSS_BINARIES_DIR"
 
-    local compile_output
-    local compile_result
+    local compile_output=""
+    local compile_result=0
 
     # Download from GitHub workflow artifacts for both architectures
     show_progress "Downloading binary from GitHub workflow" "start"
 
-    # Get the latest successful workflow run for main branch with timestamp
-    local run_info=$(gh run list --repo freenet/freenet-core --workflow cross-compile.yml --branch main --status success --limit 1 --json databaseId,createdAt --jq '.[0]')
+    # Get the latest successful workflow run (any branch) with timestamp
+    local run_info=$(gh run list --repo freenet/freenet-core --workflow cross-compile.yml --status success --limit 1 --json databaseId,createdAt,headBranch --jq '.[0]')
 
     if [ -z "$run_info" ]; then
         compile_output="Failed to find successful workflow run"
         compile_result=1
     else
         local run_id=$(echo "$run_info" | jq -r '.databaseId')
         local created_at=$(echo "$run_info" | jq -r '.createdAt')
+        local branch=$(echo "$run_info" | jq -r '.headBranch')
+
+        log_verbose "Using workflow run $run_id from branch $branch"
 
         # Check if artifact is older than 12 hours
         local current_time=$(date +%s)
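
Note on the staleness check that begins at the end of this hunk: the script compares the run's `createdAt` against the current epoch time and a 12-hour threshold, but the diff cuts off before the comparison. The sketch below shows one way to express it, assuming both timestamps are already epoch seconds and that "older than" means strictly greater than the threshold. `MAX_ARTIFACT_AGE_SECS` and `is_artifact_stale` are hypothetical names, not part of the script.

```rust
/// Maximum artifact age from the script's comment: 12 hours, in seconds.
pub const MAX_ARTIFACT_AGE_SECS: u64 = 12 * 60 * 60;

/// Returns true if the artifact is strictly older than `max_age_secs`.
/// A creation time in the future (for example, from clock skew) is treated
/// as age zero rather than underflowing.
pub fn is_artifact_stale(created_at_secs: u64, now_secs: u64, max_age_secs: u64) -> bool {
    now_secs.saturating_sub(created_at_secs) > max_age_secs
}
```

Because the run is no longer restricted to `main`, a check like this only bounds the artifact's age; it says nothing about which branch produced it, which is why the new `log_verbose` line reports the branch.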
