[BUG] BlobContainerClient.listBlobsByHierarchy throws StackOverflowError #20523
Comments
Thank you for submitting this issue @zhoujia1974. This issue is related to #12453 and should be resolved by #15646 and #15929, which are available in later versions of the SDK.
@alzimmermsft Thank you for the fix. I'll test the new version. Quick question: what do you think could cause the same error for our second test after we removed all blobs from the container? Does Azure Data Lake Storage Gen2 cache the deleted files for some time? We checked the configuration and verified soft delete is not enabled. So why does it still throw StackOverflowError when there are only 10 blobs in the container?
@zhoujia1974, for the 10 blob scenario that is a good question. Does your project include a direct dependency on Jackson? Historically there have been a few cases where Jackson version skews allowed compilation due to API compatibility but had runtime changes that led to runaway states.
@alzimmermsft This is our project's Jackson dependency.
| | +--- com.fasterxml.jackson.core:jackson-annotations:2.10.5 -> 2.11.4
Thank you for the dependency list @zhoujia1974. I'll take a look into what happens when Jackson 2.10 is used with the SDKs. Depending on which version of the SDKs is being used, the Jackson dependency we use is either 2.11 or 2.12. Also, please let me know if upgrading to a later version of the library resolves this issue.
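One way to confirm which Jackson artifacts actually end up on the runtime classpath, sketched here as an illustration rather than anything from the original thread, is to print Jackson's own `PackageVersion` constants:

```java
// A sketch for verifying the Jackson version actually resolved at runtime.
// These PackageVersion classes ship with jackson-core and jackson-databind.
import com.fasterxml.jackson.core.json.PackageVersion;

public class JacksonVersionCheck {
    public static void main(String[] args) {
        // Version of jackson-core on the runtime classpath (e.g. 2.10.5 vs. 2.11.4).
        System.out.println("jackson-core: " + PackageVersion.VERSION);
        // jackson-databind reports its version from its own PackageVersion class.
        System.out.println("jackson-databind: "
                + com.fasterxml.jackson.databind.cfg.PackageVersion.VERSION);
    }
}
```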
@alzimmermsft We are having some trouble using azure-storage-blob 12.9.0. This version causes a JVM crash, potentially from the call to upload a blob. I'm having trouble finding exactly where the crash comes from. I'll try to upgrade to 12.10.2 to see if it helps. A fatal error has been detected by the Java Runtime Environment: SIGSEGV (0xb) at pc=0x00007f8919af0c70, pid=6, tid=452 JRE version: OpenJDK Runtime Environment (Zulu11.31+11-CA) (11.0.3+7) (build 11.0.3+7-LTS)
@zhoujia1974, for the issues you are seeing with upload, are you using …
@alzimmermsft The code is like below. We are using a ByteArrayInputStream to serve the content to upload. We are currently stuck: with the new SDK version we get the JVM crash, and after going back to 12.6.1 we get the list blobs StackOverflowError. But the same 12.6.1 SDK used in the production branch doesn't have the listBlobs issue even when the container has 13,000 blobs. The development branch has some minor library upgrades, but those changes don't look relevant. Is the StackOverflowError linked to any specific condition or configuration of the blob storage account?
containerClient = blobClient.getBlobContainerClient(containerName)
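For reference, a minimal sketch of the upload path described above; the connection string, container, and blob names are placeholder assumptions rather than values from the reporter's project:

```java
import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobServiceClient;
import com.azure.storage.blob.BlobServiceClientBuilder;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class UploadSketch {
    public static void main(String[] args) {
        // Placeholder values; substitute your own connection string, container, and blob name.
        String connectionString = System.getenv("AZURE_STORAGE_CONNECTION_STRING");
        String containerName = "my-container";
        String blobName = "folder/example.txt";

        BlobServiceClient serviceClient = new BlobServiceClientBuilder()
                .connectionString(connectionString)
                .buildClient();
        BlobContainerClient containerClient = serviceClient.getBlobContainerClient(containerName);
        BlobClient blobClient = containerClient.getBlobClient(blobName);

        // Serve the content from a ByteArrayInputStream, as described in the comment above.
        byte[] content = "example content".getBytes(StandardCharsets.UTF_8);
        blobClient.upload(new ByteArrayInputStream(content), content.length, true);
    }
}
```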
@alzimmermsft Finally figured out what the issue with the listBlobsByHierarchy API is. We hit the bug reported in #9465 and #18881. Thanks,
Great to hear @zhoujia1974. I will go ahead and close this issue as it looks like it has been resolved. Feel free to reopen if you hit the issue again.
Describe the bug
BlobContainerClient.listBlobsByHierarchy runs into a deep recursive call that eventually overflows the stack.
Exception or Stack Trace
Only a partial stack trace is included. The remainder is just a repeat of the ConcatArraySubscriber.onNext calls.
r.netty.channel.ChannelOperationsHandler - [id: 0xa74beec8, L:/10.241.108.174:44766 - R:xxxx.blob.core.windows.net/20.150.43.196:443] Error was received while reading the incoming data. The connection will be closed.
! java.lang.StackOverflowError: null
! at reactor.core.publisher.Operators.reportThrowInSubscribe(Operators.java:204)
! at reactor.core.publisher.Flux.subscribe(Flux.java:8328)
! at reactor.core.publisher.FluxFlatMap$FlatMapMain.onNext(FluxFlatMap.java:418)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onNext$0(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:63)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onNext(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onNext$0(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:63)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onNext(TracingSubscriber.java:42)
! at reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber.onNext(FluxConcatArray.java:176)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onNext$0(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:63)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onNext(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onNext$0(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:63)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onNext(TracingSubscriber.java:42)
! at reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber.onNext(FluxConcatArray.java:176)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onNext$0(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:63)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onNext(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onNext$0(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:63)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onNext(TracingSubscriber.java:42)
To Reproduce
We have 12,843 blobs in the container under different parent folders (Azure Data Lake Storage Gen2). We list the blobs with a parent folder path, and the parent folder only contains 10 blobs.
We suspect this issue is related to the total blob count in the container. listBlobsByHierarchy probably traverses all of the blobs recursively to find the ones that match the path prefix, and when there are too many files to traverse this way, it blows up the call stack.
To reproduce, create 4 folders in a container, upload 5,000 blobs to each of the first three folders (15,000 blobs in total), and upload 10 blobs to the last folder. Then use the Java SDK to call listBlobsByHierarchy for the last folder.
We cleared all blobs from the container and ran another test with 10 blobs in a folder. It still runs into the same error.
Code Snippet
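(The original report left this section blank; below is a minimal sketch of the hierarchical listing call described in the reproduction steps, with the container client construction and folder name assumed for illustration.)

```java
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobContainerClientBuilder;
import com.azure.storage.blob.models.BlobItem;

public class ListByHierarchySketch {
    public static void main(String[] args) {
        // Placeholder values; substitute your own connection string, container, and folder.
        BlobContainerClient containerClient = new BlobContainerClientBuilder()
                .connectionString(System.getenv("AZURE_STORAGE_CONNECTION_STRING"))
                .containerName("my-container")
                .buildClient();

        // List only the entries directly under the given folder prefix.
        for (BlobItem item : containerClient.listBlobsByHierarchy("last-folder/")) {
            System.out.println(item.getName()
                    + (Boolean.TRUE.equals(item.isPrefix()) ? " (virtual directory)" : ""));
        }
    }
}
```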
Expected behavior
The method should return the list of blobs without error. Or please advise a better way to list all blobs in a folder.
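One possible alternative, sketched here as an assumption rather than guidance from the maintainers, is a flat listing with a prefix, which lists everything under a folder without walking the virtual-directory hierarchy:

```java
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.models.BlobItem;
import com.azure.storage.blob.models.ListBlobsOptions;

import java.time.Duration;

public class FlatListingSketch {
    // Lists every blob whose name starts with the given folder prefix (flat listing, no hierarchy).
    static void listFolder(BlobContainerClient containerClient, String folderPrefix) {
        ListBlobsOptions options = new ListBlobsOptions().setPrefix(folderPrefix);
        for (BlobItem item : containerClient.listBlobs(options, Duration.ofSeconds(30))) {
            System.out.println(item.getName());
        }
    }
}
```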
Screenshots
Setup (please complete the following information):
Additional context
Information Checklist
Kindly make sure that you have added all the following information above and checked off the required fields, otherwise we will treat the issue as an incomplete report.