
Kmesh Logs Errors and Crashes After Deploying 165 ServiceEntries #1023

Open
tmodak27 opened this issue Nov 7, 2024 · 4 comments
Labels
area/kernel-native kind/bug Something isn't working

Comments

@tmodak27

tmodak27 commented Nov 7, 2024

Motivation:

A limit of 165 ServiceEntries seems lower than expected. Our production use case requires support for a very large number of services, ServiceEntries, and pods.

Environment Details:

Kubernetes: 1.28
OS: openEuler 23.03
Istio: 1.19
Kmesh version: release 0.5
CPU: 8
Memory: 16 GiB

Steps To Reproduce

  • Step 1: Make sure you have the below service-entry.yaml file at the root of your repo. This config defines 1 endpoint per service entry.
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: foo-service
  namespace: default
spec:
  hosts:
  - foo-service.somedomain # not used
  addresses:
  - 192.192.192.192/24 # VIPs
  ports:
  - number: 27018
    name: foo-service
    protocol: HTTP
  location: MESH_INTERNAL
  resolution: STATIC
  endpoints: # 1 endpoint per service entry. Adjust depending on your test.
  - address: 2.2.2.2

  • Step 2: Run the below command.
$ for i in $(seq 1 165); do sed "s/foo-service/foo-service-0-$(date +%s-%N)/g" service-entry.yaml | kubectl apply -f -; done
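To watch the failure as it happens, the Kmesh daemon logs can be tailed in a second terminal while the loop runs. The kmesh-system namespace and app=kmesh label below are assumptions for a default install; adjust them to match your deployment.

$ kubectl -n kmesh-system logs -f -l app=kmesh --tail=50   # namespace and label are assumed defaults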

What was observed

After the number of ServiceEntries hit 165, Kmesh started logging the below error (see attachment) and crashed.

service-entry-error.txt

Note: Across multiple attempts, the error message sometimes differed: malloc(): invalid next size

@tmodak27 tmodak27 added the kind/bug Something isn't working label Nov 7, 2024
@hzxuzhonghu
Member

cc @nlgwcy @lec-bit

@nlgwcy
Contributor

nlgwcy commented Nov 8, 2024

There may be other model limitations. We'll check.

@lec-bit
Contributor

lec-bit commented Nov 8, 2024

This is the same issue as #941.
The maximum value size of the inner_map is 1300 bytes, which is why this occurs: when 163 virtualHosts are created in one routeConfig, the pointer array requires 163 * sizeof(ptr) > 1300 bytes.
This problem can be avoided by manually adjusting the maximum value size of the inner_map.
kmesh.json
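For a rough sense of the limit described above (assuming 8-byte pointers on a 64-bit host; 1300 bytes is the inner_map maximum quoted in this thread, not a value checked against the source tree):

echo $(( 162 * 8 ))   # 1296 bytes of virtualHost pointers, fits within a 1300-byte inner_map value
echo $(( 163 * 8 ))   # 1304 bytes, exceeds 1300, so the map element update fails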

@tmodak27
Author

tmodak27 commented Nov 8, 2024

Maximum Endpoints and Services Supported by Kmesh

We modified the command to deploy every ServiceEntry on a separate port, so that each RouteConfig contains only one virtual host. Below are the two scenarios we tested.

Scenario 1: 1 endpoint (minimum possible) per ServiceEntry

Steps

  • Config file: Save the below config as service-entry-1.yaml; it deploys 1 endpoint per ServiceEntry.
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: foo-service
  namespace: default
spec:
  hosts:
  - foo-service.somedomain # not used
  addresses:
  - 192.192.192.192/24 # VIPs
  ports:
  - number: 27018
    name: foo-service
    protocol: HTTP
  location: MESH_INTERNAL
  resolution: STATIC
  endpoints: # 1 endpoint per service entry. Adjust depending on your test.
  - address: 2.2.2.2
  • Command: Run the below command to deploy 1100 services, each with one endpoint.
for i in $(seq 1 1100); do sed  "s/foo-service/foo-service-0-$(date +%s-%N)/g;s/27018/$i/g" service-entry-1.yaml | kubectl apply -f -; done
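Once the loop finishes, a quick way to confirm how many ServiceEntries were actually created (assuming they all land in the default namespace, as in the config above):

kubectl get serviceentries -n default --no-headers | wc -l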

Results

The below errors are observed when slightly more than 1000 ServiceEntries are deployed.

time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_943 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_48 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_117 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_138 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_383 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_603 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_739 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_786 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_79 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_354 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_591 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_675 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_729 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2
time="2024-11-08T18:41:39Z" level=error msg="listener 0.0.0.0_816 NONE flush failed: ListenerUpdate deserial_update_elem failed" subsys=cache/v2

Why is this an issue?

Our use case needs to support a higher number of services and endpoints, and roughly 1,000 ServiceEntries is far lower than the theoretical limits of 100,000 endpoints and 5,000 services.

Scenario 2: 150 endpoints (maximum possible) per ServiceEntry

Steps

  • Config file: Use the same config as before, but increase the number of endpoint addresses to 150 (the endpoints list at the end of the config); see the sketch after the command below.

  • Command: Run the below command to deploy 600 services, each with 150 endpoints.

for i in $(seq  1 600); do sed  "s/foo-service/foo-service-0-$(date +%s-%N)/g;s/27018/$i/g" service-entry-1.yaml | kubectl apply -f -; done
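As referenced in the config-file step above, one way to expand the endpoints list of service-entry-1.yaml to 150 addresses before running the loop (the 2.2.0.x addresses are placeholders, not the ones used in the original test):

sed -i '/- address: 2.2.2.2/d' service-entry-1.yaml   # drop the single sample endpoint
for j in $(seq 1 150); do echo "  - address: 2.2.0.$j" >> service-entry-1.yaml; done   # append 150 endpoint addresses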

Results

The below errors are observed at approximately 500 services (75,000 endpoints in total).

error-logs-max-pods.txt

Why is this an issue?

Our use case needs to support a higher number of endpoints, and 75,000 endpoints is lower than the theoretical maximums of 100,000 endpoints and 5,000 services.
