Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kafka Connect: Add kerberos authentication option #10173

Closed
wants to merge 0 commits into from

Conversation

Dawnpool
Copy link

Hello,
I am making the same PR as the one in the original repository because I was told that it is being moved to this core repository.
I was going to use this connector for hdfs, but I have noticed it does not support Kerberos authentication.
To address this, I have implemented Kerberos authentication functionality along with related options.
I have used the existing hdfs sink connector code as a reference, which already does support Kerberos authentication. (document link)

An example config would look like:

iceberg.hdfs.authentication.kerberos: true,
iceberg.connect.hdfs.principal: "[email protected]",
iceberg.connect.hdfs.keytab: "/tmp/user.keytab",
kerberos.ticket.renew.period.ms: 3600000

It seems that the code hasn't been fully migrated to this core repository yet and I am aware of there should be further tasks such as updating README and adding test code, but I want to share the core idea with you first for initial feedback.
Please take a look. Thank you.

@@ -43,6 +43,7 @@ project(":iceberg-kafka-connect:iceberg-kafka-connect") {
implementation "com.fasterxml.jackson.core:jackson-core"
implementation "com.fasterxml.jackson.core:jackson-databind"
implementation libs.avro.avro
implementation libs.hadoop3.client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not introduce a Hadoop dependency to the core library.

@@ -28,6 +28,8 @@
import java.util.Set;
import java.util.UUID;
import java.util.stream.Collectors;
import org.apache.hadoop.conf.Configuration;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to avoid dependencies on Hadoop. Take a look at how we initialize the Hadoop config in Utilities using reflection.

@Dawnpool
Copy link
Author

@bryanck
Hi, I have removed Hadoop dependencies and applied reflection to dynamically load UserGroupInformation class and methods, as you guided. Please take a look at the changes.
I also want to test the built code in Kafka Connect standalone mode, but it seems this repository doesn't provide the zip archive like the original repository does.
Do you have any guides that I can follow to test built code?

@Dawnpool
Copy link
Author

@bryanck
Hi, I am mentioning you in case you have forgot this PR. Please take a look and let me know if there is anything more to fix.

@bryanck
Copy link
Contributor

bryanck commented Jun 16, 2024

Thanks for the submission @Dawnpool , one question I have is if we can use the existing facility for setting Hadoop properties, i.e. via the iceberg.hadoop.* config?

@Dawnpool
Copy link
Author

Dawnpool commented Jun 21, 2024

@bryanck
If I got it right, you’re asking if we can add those configs for Kerberos authentication via Hadoop config. I found there are Kerberos options for service principals like dfs.namenode.kerberos.principal in hdfs config, but these don’t apply to user principals. So, as far as I know, we can't use iceberg.hadoop.* settings for user's Kerberos authentication.

You can see the HDFS sink connector also does accept configs for Kerberos authentication separately, even though it has hadoop.conf.dir config.

Feel free to ask if you have any more questions :)

@Dawnpool
Copy link
Author

Hi @bryanck ,
Just a gentle reminder to take a look at my comment when you get a chance. Thank you!

@bryanck
Copy link
Contributor

bryanck commented Jul 17, 2024

@Dawnpool I'd prefer to keep Hadoop/HDFS specific code out of the sink, beyond loading generic properties as we're doing now. IIRC you can set the user principal via an environment variable. I have this branch I'm working on to add in the runtime distribution if you want to try that to test. You might want to investigate how the Flink sink handles this without special properties. If it turns out we do need it, I feel like it might belong in HadoopFileIO, rather than here.

@Dawnpool
Copy link
Author

Dawnpool commented Aug 6, 2024

Hi @bryanck
I did some research and I couldn't really find a way to set the user principal via an environment variable. Could you please provide more detailed information about that?
Also, while Flink does have its own options for Kerberos authentication, Kafka Connect does not. That's why I think the connector itself should have options for Kerberos authentication.

@bryanck
Copy link
Contributor

bryanck commented Oct 16, 2024

@Dawnpool sorry for the delayed reply. With the Flink Iceberg sink, I don't believe there are kerberos-specific options, if you could point those out that would be helpful.

@Dawnpool
Copy link
Author

@bryanck
Flink itself has options for Kerberos authentication, not the Flink Iceberg sink. Here is the reference link to the official documentation you can check.
As far as I know, Kafka Connect itself does not have those kinds of options since it is a plugin-based architecture where you plug connectors to it.
I believe this is why the connector itself should handle these options as the HDFS sink connector exactly does.
I would be glad if you could reconsider adding this new feature because I really need it for a HDFS system that requires Kerberos authentication.
Feel free to ask if you have further questions or concerns.

Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Nov 17, 2024
@Dawnpool
Copy link
Author

@bryanck
Hi, just a gentle reminder in case you missed. Please take a look when you get a chance.

@github-actions github-actions bot removed the stale label Nov 19, 2024
@juanluhidalgo
Copy link

Hi @Dawnpool I think that the branch has conflicts, anyway @bryanck could you take a look at this PR?

Thanks in advance.

@Dawnpool
Copy link
Author

Hi @juanluhidalgo
Sorry for the late reply, and appreciate your interest in this functionality. (I have also checked your email.)
It's been a while since I last made changes, so let me have some time to figure out what's changed in the main branch and I'll resolve the conflict ASAP. Thank you.

@Dawnpool
Copy link
Author

@bryanck @juanluhidalgo
The PR has been closed for some reason as I was trying to update my main branch I forked.
I will post a new PR.

@juanluhidalgo
Copy link

@Dawnpool if you can share the new PR link once it's created it would be great.
Thanks in advance!

@Dawnpool
Copy link
Author

@juanluhidalgo
Just created a new PR here.
#12119
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants