[Bug] Caused by: java.io.IOException: java.nio.channels.ClosedChannelException #4958
Comments
Hi @skdfeitian, thanks for your issue! Could you provide more information about this problem, such as your code, environment, and so on?
To add more information: after setting the parameter fs.hdfs.impl.disable.cache, the error Caused by: java.io.IOException: Filesystem closed seems to have disappeared, so that parameter appears to be effective. However, the occasional java.nio.channels.ClosedChannelException error has not been resolved. I have nearly 200 Flink-related Paimon tasks, and 4 different types of tasks report this error. Some of them fail only after running for dozens of days, so it is not very frequent. It feels like there might be an issue with the HDFS configuration, but we also have many real-time Flink tasks writing to HDFS, and we haven't encountered similar issues with them. Examples of the errors: java.io.IOException: Could not perform checkpoint 14784 for operator Writer : ods_analyser_realtime (1/20), and org.apache.flink.runtime.JobException: Recovery is suppressed by FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=5, backoffTimeMS=60000)
Hi @xuzifu666, could I have some of your time to help look at this? Thanks! Related issue.
This may be due to an HDFS configuration issue when reading the snapshot file. You could try this workaround: set fs.hdfs.impl.disable.cache=true
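For reference, one way to apply this setting is through Flink's configuration: Flink forwards any option prefixed with flink.hadoop. from flink-conf.yaml into the Hadoop Configuration used by its filesystems (the key name fs.hdfs.impl.disable.cache itself is a standard Hadoop client option that disables the shared FileSystem instance cache). A minimal sketch, assuming you manage the cluster via flink-conf.yaml rather than, say, core-site.xml:

```yaml
# flink-conf.yaml
# The `flink.hadoop.` prefix forwards the option to the Hadoop Configuration.
# Disabling the FileSystem cache gives each caller its own HDFS client,
# so one component closing its client cannot invalidate another's streams.
flink.hadoop.fs.hdfs.impl.disable.cache: true
```

Setting the same key directly in core-site.xml (value true) should have the equivalent effect for anything reading the Hadoop configuration files.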
@skdfeitian Please try it again. |
Hi @xuzifu666,
It is not related to Paimon; maybe you can work with an HDFS engineer in your company to investigate it. @skdfeitian
@xuzifu666 @yangjf2019 thanks |
Search before asking
Paimon version
Paimon 0.8.2
Compute Engine
Flink 1.17.1
Minimal reproduce step
java.lang.Exception: Could not perform checkpoint 18073 for operator Source: bg_action_source[1] -> Calc[2] -> Map -> Writer : dwd_new_user_detail_realtime (1/20)#5.
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointAsyncInMailbox(StreamTask.java:1184)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$triggerCheckpointAsync$13(StreamTask.java:1131)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMail(MailboxProcessor.java:398)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:367)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:352)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:229)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:839)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:788)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.nio.channels.ClosedChannelException
at org.apache.paimon.flink.sink.StoreSinkWriteImpl.prepareCommit(StoreSinkWriteImpl.java:220)
at org.apache.paimon.flink.sink.TableWriteOperator.prepareCommit(TableWriteOperator.java:121)
at org.apache.paimon.flink.sink.RowDataStoreWriteOperator.prepareCommit(RowDataStoreWriteOperator.java:189)
at org.apache.paimon.flink.sink.PrepareCommitOperator.emitCommittables(PrepareCommitOperator.java:100)
at org.apache.paimon.flink.sink.PrepareCommitOperator.prepareSnapshotPreBarrier(PrepareCommitOperator.java:80)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.prepareSnapshotPreBarrier(RegularOperatorChain.java:89)
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:321)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$14(StreamTask.java:1299)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1287)
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointAsyncInMailbox(StreamTask.java:1172)
... 14 more
Caused by: java.nio.channels.ClosedChannelException
at org.apache.hadoop.hdfs.ExceptionLastSeen.throwException4Close(ExceptionLastSeen.java:73)
at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:158)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:106)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:62)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.paimon.fs.hadoop.HadoopFileIO$HadoopPositionOutputStream.write(HadoopFileIO.java:300)
at org.apache.paimon.format.parquet.writer.PositionOutputStreamAdapter.write(PositionOutputStreamAdapter.java:54)
at java.io.OutputStream.write(OutputStream.java:75)
at org.apache.paimon.shade.org.apache.parquet.bytes.ConcatenatingByteArrayCollector.writeAllTo(ConcatenatingByteArrayCollector.java:46)
at org.apache.paimon.shade.org.apache.parquet.hadoop.ParquetFileWriter.writeColumnChunk(ParquetFileWriter.java:903)
at org.apache.paimon.shade.org.apache.parquet.hadoop.ParquetFileWriter.writeColumnChunk(ParquetFileWriter.java:848)
at org.apache.paimon.shade.org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writeToFileWriter(ColumnChunkPageWriteStore.java:310)
at org.apache.paimon.shade.org.apache.parquet.hadoop.ColumnChunkPageWriteStore.flushToFileWriter(ColumnChunkPageWriteStore.java:458)
at org.apache.paimon.shade.org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:186)
at org.apache.paimon.shade.org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:124)
at org.apache.paimon.shade.org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:319)
at org.apache.paimon.format.parquet.writer.ParquetBulkWriter.finish(ParquetBulkWriter.java:57)
at org.apache.paimon.io.SingleFileWriter.close(SingleFileWriter.java:144)
at org.apache.paimon.io.RowDataFileWriter.close(RowDataFileWriter.java:95)
at org.apache.paimon.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:107)
at org.apache.paimon.io.RollingFileWriter.close(RollingFileWriter.java:144)
at org.apache.paimon.append.AppendOnlyWriter$DirectSinkWriter.flush(AppendOnlyWriter.java:365)
at org.apache.paimon.append.AppendOnlyWriter.flush(AppendOnlyWriter.java:195)
at org.apache.paimon.append.AppendOnlyWriter.prepareCommit(AppendOnlyWriter.java:183)
at org.apache.paimon.operation.AbstractFileStoreWrite.prepareCommit(AbstractFileStoreWrite.java:198)
at org.apache.paimon.table.sink.TableWriteImpl.prepareCommit(TableWriteImpl.java:207)
at org.apache.paimon.flink.sink.StoreSinkWriteImpl.prepareCommit(StoreSinkWriteImpl.java:215)
... 24 more
What doesn't meet your expectations?
I referred to #3678 and set fs.hdfs.impl.disable.cache=true, but I still occasionally encounter the error above. How should I resolve it?
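For illustration of the underlying failure mode: ClosedChannelException is thrown by java.nio whenever a write is attempted on a channel that has already been closed, which is what the stack trace shows happening inside the HDFS output stream when Paimon flushes a Parquet file at checkpoint time. The hazard with a shared, cached HDFS client is that one component closing the client invalidates streams still held by others. A minimal, self-contained sketch of the exception mechanics using plain NIO (a hypothetical demo class, not Paimon or HDFS code):

```java
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ClosedChannelDemo {

    /**
     * Opens a channel, closes it (standing in for "some other component
     * closed the shared handle"), then attempts a write.
     * Returns true if the write fails with ClosedChannelException.
     */
    public static boolean writeAfterCloseFails() throws Exception {
        Path tmp = Files.createTempFile("closed-channel-demo", ".dat");
        FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE);
        ch.close(); // simulate the handle being closed out from under the writer
        try {
            ch.write(ByteBuffer.wrap(new byte[] {1}));
            return false; // unreachable: writes to a closed channel must fail
        } catch (ClosedChannelException e) {
            return true; // the exact exception class seen in the stack trace
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeAfterCloseFails());
    }
}
```

The sketch only demonstrates the exception contract; in the real failure the stream is a DFSOutputStream whose client was closed elsewhere (e.g. by another task sharing the cached FileSystem), which is why disabling the cache helps.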
Anything else?
No response
Are you willing to submit a PR?