Kafka Connect: Add kerberos authentication option #12119

Open
wants to merge 1 commit into
base: main

Conversation

Dawnpool

Hi, I previously submitted a PR for the same feature, but I’m reposting it since the previous one was closed for some reason. You can find the full history in the original PR.
To explain the background again: I wanted to use the Iceberg connector in an HDFS environment that requires Kerberos authentication, but I couldn’t because the connector doesn’t currently support it. I’ve changed the Kafka Connect module to add an option for Kerberos authentication, modeled on the HDFS sink connector, which already supports it.

An example config would look like:

iceberg.hdfs.authentication.kerberos: true,
iceberg.connect.hdfs.principal: "[email protected]",
iceberg.connect.hdfs.keytab: "/tmp/user.keytab",
kerberos.ticket.renew.period.ms: 3600000
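
To make the intent of these four properties concrete, here is a minimal sketch of how a connector could read and validate them. It is not the code from this PR: the class `KerberosSettings`, its method names, the one-hour default, and the principal used in the usage note are all hypothetical. Only the property keys are taken from the example above.

```java
import java.util.Map;

/**
 * Hypothetical holder for the Kerberos properties in the example config.
 * Illustrative only; not the implementation proposed in this PR.
 */
final class KerberosSettings {
    // Property keys copied verbatim from the example config.
    static final String ENABLED = "iceberg.hdfs.authentication.kerberos";
    static final String PRINCIPAL = "iceberg.connect.hdfs.principal";
    static final String KEYTAB = "iceberg.connect.hdfs.keytab";
    static final String RENEW_PERIOD_MS = "kerberos.ticket.renew.period.ms";

    // Assumed default of one hour, mirroring the example value.
    static final long DEFAULT_RENEW_PERIOD_MS = 3_600_000L;

    final boolean enabled;
    final String principal;
    final String keytab;
    final long renewPeriodMs;

    private KerberosSettings(boolean enabled, String principal, String keytab, long renewPeriodMs) {
        this.enabled = enabled;
        this.principal = principal;
        this.keytab = keytab;
        this.renewPeriodMs = renewPeriodMs;
    }

    /** Reads the settings; principal and keytab are required only when Kerberos is enabled. */
    static KerberosSettings fromProps(Map<String, String> props) {
        boolean enabled = Boolean.parseBoolean(props.getOrDefault(ENABLED, "false"));
        if (!enabled) {
            return new KerberosSettings(false, null, null, DEFAULT_RENEW_PERIOD_MS);
        }
        String principal = required(props, PRINCIPAL);
        String keytab = required(props, KEYTAB);
        long renewPeriodMs = Long.parseLong(
            props.getOrDefault(RENEW_PERIOD_MS, Long.toString(DEFAULT_RENEW_PERIOD_MS)).trim());
        if (renewPeriodMs <= 0) {
            throw new IllegalArgumentException(RENEW_PERIOD_MS + " must be positive");
        }
        return new KerberosSettings(true, principal, keytab, renewPeriodMs);
    }

    // Fails fast with a message naming the missing key.
    private static String required(Map<String, String> props, String key) {
        String value = props.get(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException(key + " must be set when " + ENABLED + " is true");
        }
        return value.trim();
    }
}
```

With settings like these in hand, a connector would typically pass the principal and keytab to Hadoop's `UserGroupInformation.loginUserFromKeytab(principal, keytab)` and re-login on the configured period. For example, `KerberosSettings.fromProps(props)` with a placeholder principal such as `user@EXAMPLE.COM` and the keytab path from the example yields an enabled configuration, while omitting the keytab raises an `IllegalArgumentException`.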

One more thing: as you can see in the original PR, it went unaccepted for quite a long time. I think I gave pretty convincing reasons why the connector itself should have this option, since there's no way to inject it through the Hadoop configuration.

If you still think something is missing or have any concerns about this feature, please feel free to share. Thank you.

@juanluhidalgo

@bryanck could you take a look at this?

Thanks in advance.

Member

@jbonofre jbonofre left a comment

It looks good to me. Do you think it would make sense to add a unit test (using a mock) or an integration test (with Docker)?

@Dawnpool
Author

Hi @jbonofre
Absolutely! I'll try to add test code once the maintainers agree on this feature.

@bryanck
Contributor

bryanck commented Jan 29, 2025

If possible, I would like to avoid adding Hadoop/HDFS specific configuration parameters. Could you research how this is avoided in the Iceberg Flink sink? We already have an option to set the Hadoop config directory and load the config from that. Also, if this really is needed, we should consider adding this someplace where it can be reused by other integrations.

@Dawnpool
Author

Hi @bryanck,
As mentioned in the original PR, Flink has its own security configuration for Kerberos authentication. Here is the reference link to the official documentation you can check.

AFAIK, you can't set the user principal through Hadoop/HDFS-specific configuration parameters alone. This seems to be why Flink and the HDFS sink connector each have separate Kerberos configuration.

@bryanck
Contributor

bryanck commented Jan 30, 2025

> As mentioned in the original PR, Flink has its own security configuration for Kerberos authentication

I believe what you are referring to is not related to the Iceberg Flink sink, which uses Iceberg's FileIO. My question was more centered around Iceberg, i.e. is this a feature that is needed by HadoopFileIO so all engines can benefit, and if so, we should add the configuration there.

@Dawnpool
Author

Dawnpool commented Jan 31, 2025

@bryanck
I believe what I'm referring to is related to the Iceberg Flink sink, which uses Kerberos-authenticated HDFS as its FileIO.
I'm actually using Flink with Kerberos-authenticated HDFS, and that's how I set Kerberos configuration.

I understand your point that this feature should be part of HadoopFileIO so all engines can benefit from it, but I'm not sure that is really necessary, since Flink and other engines like Spark already have their own ways of setting the Kerberos configuration, independent of Iceberg.

In my opinion, these kinds of settings belong to each engine, because each engine has its own way of configuring them.

@nastra
Hi, I saw your previous comment about Kerberos settings for Iceberg Flink. Could you please share your thoughts on this feature if you have a chance?

@bryanck
Contributor

bryanck commented Jan 31, 2025

Thanks @Dawnpool for looking into it. I'm interested to hear what others think.

4 participants