
[BUG] BlobContainerClient.listBlobsByHierarchy throws StackOverflowError #20523


Closed
3 tasks done
zhoujia1974 opened this issue Apr 9, 2021 · 10 comments
Labels
  • Azure.Core (azure-core)
  • Client: This issue points to a problem in the data-plane of the library.
  • customer-reported: Issues that are reported by GitHub users external to the Azure organization.
  • dependency-issue-jackson: Issue caused by dependency version mismatch with one of the Jackson libraries.
  • question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that.

Comments

@zhoujia1974

zhoujia1974 commented Apr 9, 2021

Describe the bug
BlobContainerClient.listBlobsByHierarchy runs into a deep recursive call that eventually overflows the stack.

Exception or Stack Trace
Only a partial stack trace is included; the remainder is a repetition of the ConcatArraySubscriber.onNext calls.

r.netty.channel.ChannelOperationsHandler - [id: 0xa74beec8, L:/10.241.108.174:44766 - R:xxxx.blob.core.windows.net/20.150.43.196:443] Error was received while reading the incoming data. The connection will be closed.
! java.lang.StackOverflowError: null
! at reactor.core.publisher.Operators.reportThrowInSubscribe(Operators.java:204)
! at reactor.core.publisher.Flux.subscribe(Flux.java:8328)
! at reactor.core.publisher.FluxFlatMap$FlatMapMain.onNext(FluxFlatMap.java:418)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onNext$0(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:63)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onNext(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onNext$0(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:63)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onNext(TracingSubscriber.java:42)
! at reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber.onNext(FluxConcatArray.java:176)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onNext$0(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:63)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onNext(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onNext$0(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:63)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onNext(TracingSubscriber.java:42)
! at reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber.onNext(FluxConcatArray.java:176)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onNext$0(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:63)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onNext(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onNext$0(TracingSubscriber.java:42)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:63)
! at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onNext(TracingSubscriber.java:42)

To Reproduce
We have 12,843 blobs in the container under different parent folders (Azure Data Lake Storage V2). We list the blobs with a parent folder path, and that parent folder contains only 10 blobs.

We suspect this issue is related to the total blob count in the container. listBlobsByHierarchy probably traverses all blob files recursively to find the blobs that match the path prefix, and when there are too many files to traverse, the recursion blows up the call stack.

To reproduce: create 4 folders in a container, upload 5,000 blobs to each of the first three folders (15,000 blobs total), and upload 10 blobs to the last folder. Then use the Java SDK to call listBlobsByHierarchy on the last folder.
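
A rough sketch of that setup, assuming a BlobContainerClient named blobContainerClient already pointing at the test container (the folder and blob names are made up for illustration):

import com.azure.core.http.rest.PagedIterable;
import com.azure.storage.blob.models.BlobItem;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

byte[] payload = "test".getBytes(StandardCharsets.UTF_8);

// Three large folders with 5,000 blobs each (15,000 blobs total).
for (int folder = 1; folder <= 3; folder++) {
    for (int i = 0; i < 5000; i++) {
        blobContainerClient.getBlobClient("folder" + folder + "/blob-" + i)
            .upload(new ByteArrayInputStream(payload), payload.length);
    }
}

// One small folder with 10 blobs.
for (int i = 0; i < 10; i++) {
    blobContainerClient.getBlobClient("folder4/blob-" + i)
        .upload(new ByteArrayInputStream(payload), payload.length);
}

// Listing only the small folder still triggers the StackOverflowError.
PagedIterable<BlobItem> blobs = blobContainerClient.listBlobsByHierarchy("folder4/");
blobs.forEach(blobItem -> System.out.println(blobItem.getName()));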

We cleared all blobs from the container and ran another test with 10 blobs in a folder. It still ran into the same error.

Code Snippet

PagedIterable<BlobItem> blobs = blobContainerClient.listBlobsByHierarchy(path);
return blobs.stream()
    .filter(blobItem -> blobItem.isPrefix() == null || !blobItem.isPrefix())
    .map(blobItem -> blobItem.getName().replaceFirst(path, ""))
    .collect(Collectors.toList());

Expected behavior
The method should return the list of blobs without error. Or please advise a better way to list all blobs in a folder.

Screenshots

Setup (please complete the following information):

  • OS: CentOS
  • IDE : [e.g. IntelliJ]
  • Version of the Library used: 12.6.1

Additional context

Information Checklist
Kindly make sure that you have added all of the following information above and check off the required fields; otherwise we will treat the issue as an incomplete report.

  • Bug Description Added
  • Repro Steps Added
  • Setup information Added
@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Apr 9, 2021
@alzimmermsft alzimmermsft added the Azure.Core azure-core label Apr 12, 2021
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Apr 12, 2021
@alzimmermsft alzimmermsft added Client This issue points to a problem in the data-plane of the library. needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. labels Apr 12, 2021
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Apr 12, 2021
@alzimmermsft
Member

Thank you for submitting this issue @zhoujia1974.

This issue is related to #12453 and should be resolved by #15646 and #15929, which are available in azure-storage-blob versions 12.9.0 and above.
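
If it helps while validating the upgrade, one way to confirm which azure-storage-blob version actually ends up on the runtime classpath is to read the jar's manifest metadata (a diagnostic sketch; getImplementationVersion() may return null depending on how the application is packaged):

import com.azure.storage.blob.BlobContainerClient;

// Prints the Implementation-Version recorded in the azure-storage-blob jar manifest, if present.
Package storageBlobPackage = BlobContainerClient.class.getPackage();
System.out.println("azure-storage-blob: " + storageBlobPackage.getImplementationVersion());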

@zhoujia1974
Author

@alzimmermsft Thank you for the fix. I'll test the new version. Quick question: what do you think could cause the same error in our second test, after we removed all blobs from the container? Does Azure Data Lake V2 cache the deleted files for some time? We checked the configuration and verified that soft delete is not turned on. So why does it still throw StackOverflowError when there are only 10 blobs in the container?

@alzimmermsft
Member

@zhoujia1974, for the 10-blob scenario, that is a good question: does your project include a direct dependency on Jackson? Historically there have been a few cases where Jackson version skew allowed compilation due to API compatibility but had runtime changes that led to runaway states.
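
One quick way to see which Jackson versions are actually resolved at runtime is to print each module's reported version (a diagnostic sketch; these PackageVersion classes ship with jackson-core, jackson-databind, and jackson-dataformat-xml, the last of which azure-core pulls in):

import com.fasterxml.jackson.core.Version;

// Each Jackson module reports the version it was built as; mismatched output points to classpath skew.
Version core = com.fasterxml.jackson.core.json.PackageVersion.VERSION;
Version databind = com.fasterxml.jackson.databind.cfg.PackageVersion.VERSION;
Version xml = com.fasterxml.jackson.dataformat.xml.PackageVersion.VERSION;
System.out.println("jackson-core: " + core + ", jackson-databind: " + databind + ", jackson-dataformat-xml: " + xml);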

@zhoujia1974
Author

@alzimmermsft These are our project's Jackson dependencies.

|    +--- com.fasterxml.jackson.core:jackson-core:2.10.5 -> 2.11.4
| |  +--- com.fasterxml.jackson.core:jackson-annotations:2.10.5 -> 2.11.4
| |  +--- com.fasterxml.jackson.core:jackson-databind:2.10.5 -> 2.11.4
| |  |    +--- com.fasterxml.jackson.core:jackson-annotations:2.11.4
| |  |    \--- com.fasterxml.jackson.core:jackson-core:2.11.4
| |  +--- com.fasterxml.jackson.datatype:jackson-datatype-guava:2.10.5
| |  |    +--- com.google.guava:guava:20.0 -> 30.0-jre (*)
| |  |    +--- com.fasterxml.jackson.core:jackson-core:2.10.5 -> 2.11.4
| |  |    \--- com.fasterxml.jackson.core:jackson-databind:2.10.5 -> 2.11.4 (*)
| |  +--- com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.10.5 -> 2.11.2
| |  |    +--- com.fasterxml.jackson.core:jackson-annotations:2.11.2 -> 2.11.4
| |  |    +--- com.fasterxml.jackson.core:jackson-core:2.11.2 -> 2.11.4
| |  |    \--- com.fasterxml.jackson.core:jackson-databind:2.11.2 -> 2.11.4 (*)
| |  +--- com.fasterxml.jackson.datatype:jackson-datatype-jdk8:2.10.5
| |  |    +--- com.fasterxml.jackson.core:jackson-core:2.10.5 -> 2.11.4
| |  |    \--- com.fasterxml.jackson.core:jackson-databind:2.10.5 -> 2.11.4 (*)
| |  +--- com.fasterxml.jackson.module:jackson-module-parameter-names:2.10.5
| |  |    +--- com.fasterxml.jackson.core:jackson-core:2.10.5 -> 2.11.4
| |  |    \--- com.fasterxml.jackson.core:jackson-databind:2.10.5 -> 2.11.4 (*)
| |  +--- com.fasterxml.jackson.module:jackson-module-afterburner:2.10.5
| |  |    +--- com.fasterxml.jackson.core:jackson-core:2.10.5 -> 2.11.4
| |  |    \--- com.fasterxml.jackson.core:jackson-databind:2.10.5 -> 2.11.4 (*)
| |  +--- com.fasterxml.jackson.datatype:jackson-datatype-joda:2.10.5
| |  |    +--- com.fasterxml.jackson.core:jackson-annotations:2.10.5 -> 2.11.4
| |  |    +--- com.fasterxml.jackson.core:jackson-core:2.10.5 -> 2.11.4
| |  |    +--- com.fasterxml.jackson.core:jackson-databind:2.10.5 -> 2.11.4 (*)
| |  |    \--- joda-time:joda-time:2.9.9 -> 2.10.6

@alzimmermsft
Member

Thank you for the dependency list @zhoujia1974. I'll take a look at what happens when Jackson 2.10 is used with the SDKs. Depending on which version of the SDKs is being used, the Jackson dependency we use is either 2.11 or 2.12.

Also, please let me know if upgrading to a later version of the library resolves this issue.

@zhoujia1974
Author

zhoujia1974 commented Apr 16, 2021

@alzimmermsft We are having some trouble with azure-storage-blob 12.9.0. This version causes a JVM crash, potentially from the call that uploads a blob. I'm having trouble finding exactly where the crash comes from. I'll try upgrading to 12.10.2 to see if it helps.

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00007f8919af0c70, pid=6, tid=452

JRE version: OpenJDK Runtime Environment (Zulu11.31+11-CA) (11.0.3+7) (build 11.0.3+7-LTS)
Java VM: OpenJDK 64-Bit Server VM (11.0.3+7-LTS, mixed mode, tiered, compressed oops, serial gc, linux-amd64)
Problematic frame:
C  0x00007f8919af0c70

@alzimmermsft
Member

@zhoujia1974, for the issues you are seeing with upload, are you using BlobOutputStream? If so, you may be seeing this issue: #20358

@zhoujia1974
Author

@alzimmermsft The code is like the snippet below. We are using a ByteArrayInputStream to serve the content to upload. We are currently stuck: with the new SDK version we get the JVM crash, and after going back to 12.6.1 we get the list-blob StackOverflowError. But the same 12.6.1 SDK used in our production branch doesn't have the listBlobs issue, even when the container has 13,000 blobs. The development branch has some minor library upgrades, but those changes don't look relevant. Is the StackOverflowError linked to any specific condition or configuration in a blob storage account?

// blobServiceClient below stands in for the service client (the pasted snippet reused the name blobClient for both clients).
containerClient = blobServiceClient.getBlobContainerClient(containerName);
if (!containerClient.exists()) {
    containerClient.create();
}
blobClient = containerClient.getBlobClient(blobName);
blobClient.upload(inputStream, length);

@zhoujia1974
Author

@alzimmermsft I finally figured out the issue with the listBlobsByHierarchy API. We hit the bug reported in #9465 and #18881.
I still don't understand why we hit this bug, since we are using com.fasterxml.jackson.dataformat:jackson-dataformat-xml:2.11.3 and it is supposedly only broken with 2.12.0. In any case, I added an additional check on the continuationToken after getting each page of blobItems and break out of the loop when it is either null or an empty string. This solved the issue.
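
For reference, a minimal sketch of that guard, assuming the same blobContainerClient and path as in the original snippet:

import com.azure.core.http.rest.PagedResponse;
import com.azure.storage.blob.models.BlobItem;

import java.util.ArrayList;
import java.util.List;

List<String> names = new ArrayList<>();
for (PagedResponse<BlobItem> page : blobContainerClient.listBlobsByHierarchy(path).iterableByPage()) {
    for (BlobItem blobItem : page.getValue()) {
        if (blobItem.isPrefix() == null || !blobItem.isPrefix()) {
            names.add(blobItem.getName());
        }
    }
    // Work around the continuation-token bug: stop when the token is null or empty
    // instead of letting the paging loop request the first page again.
    String token = page.getContinuationToken();
    if (token == null || token.isEmpty()) {
        break;
    }
}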

Thanks,

@gapra-msft
Member

Great to hear, @zhoujia1974. I will go ahead and close this issue as it looks like it has been resolved. Feel free to reopen if you hit the issue again.

@alzimmermsft alzimmermsft added the dependency-issue-jackson Issue caused by dependency version mismatch with one of the Jackson libraries label Oct 6, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023