Description
The LinearCache clears the watches registered under the names of the changed resources in `notifyAll` calls (L147):
go-control-plane/pkg/cache/v3/linear.go, lines 140 to 148 at 996a28b:

```go
func (cache *LinearCache) notifyAll(modified map[string]struct{}) {
	// de-duplicate watches that need to be responded
	notifyList := make(map[chan Response][]string)
	for name := range modified {
		for watch := range cache.watches[name] {
			notifyList[watch] = append(notifyList[watch], name)
		}
		delete(cache.watches, name)
	}
```
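For context, the registry iterated here is keyed by resource name, roughly like the sketch below (the type and field names are illustrative, not the actual fields in linear.go). The key point is that `delete(cache.watches, name)` only drops a channel's registration under that one name; the same channel stays registered under every other name it requested.

```go
package sketch

// Response stands in for cache.Response, defined here only so the sketch
// compiles on its own.
type Response interface{}

// registrySketch shows the assumed shape of the watch registry iterated in
// notifyAll: resource name -> set of channels watching that name. Deleting
// watches[name] removes a channel's entry under that single name only; the
// same channel remains registered under the other names it is watching.
type registrySketch struct {
	watches map[string]map[chan Response]struct{}
}
```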
However, cleaning only the watches registered under the modified names is not enough: all watches pointing at the same `chan Response` need to be cleaned as well, because the sotw v3 server creates the `chan Response` with a buffer of only 1 (L369):
go-control-plane/pkg/server/sotw/v3/server.go, lines 362 to 371 at 7e211bd:

```go
// cancel existing watches to (re-)request a newer version
switch {
case req.TypeUrl == resource.EndpointType:
	if values.endpointNonce == "" || values.endpointNonce == nonce {
		if values.endpointCancel != nil {
			values.endpointCancel()
		}
		values.endpoints = make(chan cache.Response, 1)
		values.endpointCancel = s.cache.CreateWatch(req, streamState, values.endpoints)
	}
```
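To see why the 1-slot buffer matters: if the cache pushes a second response into the same channel while nobody is receiving (the sotw goroutine is busy elsewhere), the send blocks. A minimal stand-alone illustration, not using go-control-plane's actual types:

```go
package main

import "fmt"

func main() {
	ch := make(chan string, 1)

	ch <- "first response" // fills the single buffer slot
	fmt.Println("first send succeeded")

	select {
	case ch <- "second response":
		fmt.Println("second send succeeded")
	default:
		// With no receiver draining the channel, a plain second send would
		// block forever; the select/default only makes that visible here.
		fmt.Println("second send would block: the buffer of 1 is already full")
	}
}
```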
Consider the following sequence:
- The sotw server receives a `DiscoveryRequest` with 2 resource names and calls `cache.CreateWatch` on the LinearCache.
- The LinearCache registers the `chan Response` provided by the sotw server with 2 watch entries, one for each requested resource.
- The LinearCache's `UpdateResource` is called with the first resource name.
- The LinearCache's `UpdateResource` is called with the second resource name and the send to the `chan Response` blocks, with the cache lock still held.
- The sotw server receives another `DiscoveryRequest`, but the LinearCache is still locked, so the two are deadlocked (the sketch below models this sequence).
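A self-contained sketch of that sequence, modelling the locking and channel usage described above with plain Go types rather than go-control-plane's actual API (`miniCache`, `createWatch`, and `update` are illustrative names only):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type response string

// miniCache models LinearCache just enough to show the deadlock: a mutex
// guarding a per-resource-name watch registry.
type miniCache struct {
	mu      sync.Mutex
	watches map[string]map[chan response]struct{}
}

// createWatch registers one channel under every requested resource name,
// mimicking what CreateWatch does for a sotw request.
func (c *miniCache) createWatch(names []string, ch chan response) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, name := range names {
		if c.watches[name] == nil {
			c.watches[name] = map[chan response]struct{}{}
		}
		c.watches[name][ch] = struct{}{}
	}
}

// update mimics UpdateResource + notifyAll as described in this issue: while
// holding the lock it sends a response to every channel watching the modified
// name, then deletes only the entries under that name.
func (c *miniCache) update(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for ch := range c.watches[name] {
		ch <- response(name) // blocks if the 1-slot buffer is already full
	}
	delete(c.watches, name)
}

func main() {
	c := &miniCache{watches: map[string]map[chan response]struct{}{}}

	// The sotw goroutine: one request for two resources, one channel with a
	// buffer of 1, exactly like the server snippet above.
	ch := make(chan response, 1)
	c.createWatch([]string{"a", "b"}, ch)

	go func() {
		c.update("a") // fills the buffer, removes the watch under "a" only
		c.update("b") // blocks on the full channel while holding c.mu
	}()

	time.Sleep(100 * time.Millisecond)

	// Meanwhile the sotw goroutine handles the next DiscoveryRequest and
	// needs the cache again, but c.mu is held by the blocked update("b"),
	// and nobody is left to drain ch. The Go runtime typically aborts with
	// "all goroutines are asleep - deadlock!".
	fmt.Println("second DiscoveryRequest arrives, waiting for the cache lock...")
	c.mu.Lock()
	fmt.Println("never reached")
}
```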
The LinearCache could simply maintain an additional `chan Response` -> resource names map for fast cleanup in the `notifyAll` call (see the sketch below). But I think the root cause is that the sotw server uses a single goroutine to handle both directions of the bidirectional gRPC stream.
If it handled them in separate goroutines, there would be no such deadlock, and there might even be no need to recreate a new `chan Response` on each `DiscoveryRequest` or to unregister watches on each `notifyAll` call in the LinearCache.
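For reference, a rough sketch of the reverse-map idea (all names are illustrative and this is not how linear.go is actually structured):

```go
package sketch

// Response stands in for cache.Response so the sketch compiles on its own.
type Response interface{}

// cacheSketch keeps, next to the name-keyed registry, a reverse index from
// channel to the names it is registered under, so notifyAll can drop every
// entry for a channel it is about to respond to.
type cacheSketch struct {
	watches      map[string]map[chan Response]struct{} // name -> channels
	watchedNames map[chan Response][]string            // channel -> names (reverse index)
}

func (c *cacheSketch) notifyAll(modified map[string]struct{}) map[chan Response][]string {
	notifyList := make(map[chan Response][]string)
	for name := range modified {
		for watch := range c.watches[name] {
			notifyList[watch] = append(notifyList[watch], name)
		}
	}
	// Remove every registration of each notified channel, not just the ones
	// under the modified names, so a later update cannot block on a channel
	// whose single buffer slot is already in use.
	for watch := range notifyList {
		for _, name := range c.watchedNames[watch] {
			delete(c.watches[name], watch)
		}
		delete(c.watchedNames, watch)
	}
	return notifyList
}
```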