An Apache Spark extension filter that enables OAuth2/OpenID Connect based authentication for the Spark UI and Spark History UI.
The project consists of two main components:
- Authentication filter: Authenticates the user against an OAuth2/OIDC provider by implementing the Authorization Code grant flow. The filter supports all providers compliant with the OAuth2 and OpenID Connect standards.
- Authorization provider: An optional additional layer on top of the authentication filter. It authorizes Spark UI/History UI access by comparing the user email, groups, and/or roles returned by the OAuth2/OIDC provider during the authentication phase against the configured Spark ACLs.
| Authorization Grant | Support | Description | 
|---|---|---|
| Authorization Code | ✔️ | Confidential clients (server-side apps/trusted environments) - Authorization Code standard flow. |
| Authorization Code + PKCE extension | ✔️ | Confidential clients (server-side apps/trusted environments) - Authorization Code flow with the PKCE extension. OAuth 2.1 requirement. |
| Authorization Code + PKCE extension | ✔️ | Public clients (SPAs/native apps/untrusted environments) - Authorization Code flow with the PKCE extension for public clients. |
The following authorization grants are not suitable for this use case and are therefore not supported:
| Authorization Grant | Support | Description | 
|---|---|---|
| Implicit | NA | Deprecated (replaced by Authorization Code + PKCE extension for public clients). | 
| Resource Owner Password Credentials | NA | Not suitable. | 
| Client Credentials | NA | Not suitable. | 
Releases are published to the Maven Central Repository.
Please check the latest release notes and download the latest version from Maven Central.
- Using Docker

Spark 3.x (default jar):

```dockerfile
ADD https://repo1.maven.org/maven2/io/okdp/okdp-spark-auth-filter/1.4.2/okdp-spark-auth-filter-1.4.2.jar ${SPARK_HOME}/jars
```

Spark 4+ (jakarta jar):

```dockerfile
ADD https://repo1.maven.org/maven2/io/okdp/okdp-spark-auth-filter/1.4.2/okdp-spark-auth-filter-1.4.2-jakarta.jar ${SPARK_HOME}/jars
```

- Using Maven
```xml
<dependency>
  <groupId>io.okdp</groupId>
  <artifactId>okdp-spark-auth-filter</artifactId>
  <version>1.4.2</version>
  <!-- Spark 4+: use the jakarta classifier -->
  <!-- <classifier>jakarta</classifier> -->
</dependency>
```

- Spark on Yarn/Standalone mode
Spark 3.x (default jar): copy the jar https://repo1.maven.org/maven2/io/okdp/okdp-spark-auth-filter/1.4.2/okdp-spark-auth-filter-1.4.2.jar into ${SPARK_HOME}/jars/ on the different Spark nodes.

Spark 4+ (jakarta jar): copy the jar https://repo1.maven.org/maven2/io/okdp/okdp-spark-auth-filter/1.4.2/okdp-spark-auth-filter-1.4.2-jakarta.jar into ${SPARK_HOME}/jars/ on the different Spark nodes.
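Equivalently, the jar can be fetched from Maven Central with curl. A minimal sketch, assuming curl is installed and `SPARK_HOME` is set (pick the `-jakarta` artifact for Spark 4+):

```bash
# Download the filter jar (Spark 3.x default artifact) into the Spark jars directory
curl -fsSL -o "${SPARK_HOME}/jars/okdp-spark-auth-filter-1.4.2.jar" \
  https://repo1.maven.org/maven2/io/okdp/okdp-spark-auth-filter/1.4.2/okdp-spark-auth-filter-1.4.2.jar
```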
Create an OAuth2/OIDC client with the Authorization Code grant flow (confidential client).
Set the redirect URL to a valid Spark UI or Spark History UI home page.
For Keycloak:
- Confidential clients:
  - Access Type: Confidential
  - Standard Flow Enabled: Enabled
  - Implicit Flow Enabled: Disabled
  - Direct Access Grants Enabled: Disabled
- Public clients:
  - Access Type: Public
  - Standard Flow Enabled: Enabled
  - Implicit Flow Enabled: Disabled
  - Direct Access Grants Enabled: Disabled
Once done, save the client_id and the client_secret (confidential clients only) in your secret management vault.
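For Keycloak, the same client can also be created with the Keycloak admin CLI. A minimal sketch, assuming `kcadm.sh` is on the path, an `admin` account exists, and using the hypothetical client id `spark-ui` and redirect URI below:

```bash
# Authenticate the Keycloak admin CLI against the master realm (prompts for the password)
kcadm.sh config credentials --server https://keycloak.example.com/auth \
  --realm master --user admin

# Create a confidential client with only the standard (Authorization Code) flow enabled
kcadm.sh create clients -r master \
  -s clientId=spark-ui \
  -s publicClient=false \
  -s standardFlowEnabled=true \
  -s implicitFlowEnabled=false \
  -s directAccessGrantsEnabled=false \
  -s 'redirectUris=["https://spark-history.example.com/home"]'
```

Set `publicClient=true` and omit the client secret for a public client.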
The filter relies on Spark's `spark.ui.filters` configuration property.
| Property | Equivalent env variable | Default | Description | 
|---|---|---|---|
| issuer-uri | AUTH_ISSUER_URI | - | The OIDC provider issuer URL, used to discover the OIDC endpoints. |
| client-id | AUTH_CLIENT_ID | - | The OAuth2/OIDC client id. |
| client-secret | AUTH_CLIENT_SECRET | - | The OAuth2/OIDC client secret. Mandatory for confidential clients, optional for public clients. |
| redirect-uri | AUTH_REDIRECT_URI | - | The Spark UI/History UI home page, e.g. https://spark-history.example.com/home |
| scope | AUTH_SCOPE | - | The scope(s) requested in the authorization request, e.g. openid+profile+email+roles+offline_access |
| use-pkce | AUTH_USE_PKCE | auto | * true: force the usage of PKCE (the OIDC provider must support it). * false: disable PKCE for confidential clients. * auto: use PKCE if the OIDC provider supports it, otherwise use the standard Authorization Code flow. |
| use-id-token | AUTH_USE_IDTOKEN | false | * false: consume claims from the accessToken. * true: consume claims from the idToken. |
| cookie-max-age-minutes | AUTH_COOKE_MAX_AGE_MINUTES | 12 * 60 | The maximum duration, in minutes, of the authentication cookie. |
| cookie-cipher-secret-key | AUTH_COOKIE_ENCRYPTION_KEY | - | The cookie encryption key. Can be generated with: openssl enc -aes-128-cbc -k <PASS PHRASE> -P -md sha1 -pbkdf2 |
| cookie-is-secure | AUTH_COOKE_IS_SECURE | true | When enabled, the cookie is transmitted over a secure connection (HTTPS) only. Disable the option if you run over a non-secure connection (HTTP). |
| user-id | AUTH_USER_ID | email | * email: set the id seen by Spark ACLs to the email claim in the access token. * sub: set the id seen by Spark ACLs to the sub claim in the access token. * google: set the id to the sub sent by Google, with the 'account.google.com:' prefix removed. |
| jwt-header | JWT_HEADER | jwt_token | Header that may contain a JWT token to use for authentication. If absent, the filter falls back to the default authentication workflow with a redirection to the login page. |
| jwt-header-signing-alg | JWT_HEADER_SIGNING_ALG | RS256, ES256 | The signature algorithm(s) used to verify the provided JWT token. |
| jwt-header-issuer | JWT_HEADER_ISSUER | issuer-uri from the well-known configuration | The issuer, if different from the default issuer URI retrieved from the well-known configuration fetched with the 'issuer-uri' parameter. |
| jwt-header-jwks-uri | JWT_HEADER_JWKS_URI | jwks uri from the well-known configuration | The JWKS URI used to retrieve the key needed to verify the JWT token signature. Defaults to the JWKS URI from the well-known configuration fetched with the 'issuer-uri' parameter. |
| jwt-extra-group-claim | JWT_EXTRA_GROUP_CLAIM | - | If not empty, the groups are extended with this additional claim. The claim must be a string array. |
| ignore-refresh-token | IGNORE_REFRESH_TOKEN | false | * true: do not store the refresh token in the cookie (prevents exceeding the cookie size limit). * false: store the refresh token in the cookie. |
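Each property in the table above can equally be supplied through its environment-variable equivalent. A minimal sketch with hypothetical values:

```bash
export AUTH_ISSUER_URI="https://keycloak.example.com/auth/realms/master"
export AUTH_CLIENT_ID="spark-ui"                     # hypothetical client id
export AUTH_CLIENT_SECRET="<client-secret>"          # confidential clients only
export AUTH_REDIRECT_URI="https://spark-history.example.com/home"
export AUTH_SCOPE="openid+profile+email+roles+offline_access"
export AUTH_COOKIE_ENCRYPTION_KEY="<cookie-cipher-secret-key>"
```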
Note

- `issuer-uri` property or `AUTH_ISSUER_URI` env variable: try to access the endpoint `<issuer-uri>/.well-known/openid-configuration` (public access) to check that the `issuer-uri` is valid. This should return the different authentication endpoints (authorization, access token, and user info endpoints, supported scopes, etc.); see the curl sketch after these notes. For Keycloak, the default `issuer-uri` is `https://<keycloak.example.com>/auth/realms/master/` and `https://<keycloak.example.com>/auth/realms/master/.well-known/openid-configuration` is the well-known configuration endpoint.
- `cookie-cipher-secret-key` property or `AUTH_COOKIE_ENCRYPTION_KEY` env variable: generate the cookie encryption key by issuing the command: `openssl enc -aes-128-cbc -k <YOUR_PASS_PHRASE> -P -md sha1 -pbkdf2`
- `scope` property or `AUTH_SCOPE` env variable: the minimum required scope to turn on authentication is `openid+profile+email`. Add the offline scope `offline_access` to enable the refresh token. Add the `roles` and/or `groups` scope to enable role/group based authorization. It is not necessary to add the `groups` and/or `roles` scope if you only need authentication without authorization. N.B.: the `groups` scope is not supported by most OIDC providers; you can check the supported scopes at the `<issuer-uri>/.well-known/openid-configuration` URL. N.B.: Keycloak supports returning the groups of a user by adding a `Group Membership` mapper to your client.
- `use-pkce` property or `AUTH_USE_PKCE` env variable: the default value is `auto`, which automatically detects whether the OIDC provider supports PKCE. When the OIDC provider supports PKCE, the filter uses the PKCE flow automatically. Other values: `true` or `false` to enforce the property manually. N.B.: the OIDC provider must support the PKCE extension in order to use public clients.
- `cookie-is-secure` property or `AUTH_COOKE_IS_SECURE` env variable: it's recommended to secure the connection to your Spark UIs by enabling HTTPS. Although the Spark cookie is encrypted, it's recommended to send it over an encrypted connection. The property is enabled by default; disable it if your connection is not secure, otherwise the cookie will not be sent.
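To verify the `issuer-uri` from the first note, the discovery document can be fetched directly. A minimal sketch, assuming `curl` and `jq` are installed and a Keycloak issuer:

```bash
# Fetch the OIDC discovery document and print the endpoints and capabilities the filter relies on
curl -fsSL "https://keycloak.example.com/auth/realms/master/.well-known/openid-configuration" \
  | jq '{authorization_endpoint, token_endpoint, scopes_supported, code_challenge_methods_supported}'
```

A non-empty `code_challenge_methods_supported` field indicates PKCE support, which the `use-pkce=auto` mode depends on.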
The filter can be enabled either by setting the properties globally in spark-defaults.conf:

```properties
spark.ui.filters=io.okdp.spark.authc.OidcAuthFilter
spark.io.okdp.spark.authc.OidcAuthFilter.param.issuer-uri=<issuer-uri>
spark.io.okdp.spark.authc.OidcAuthFilter.param.client-id=<client-id>
# Comment this line out if your client-id is public and your OIDC provider supports PKCE
spark.io.okdp.spark.authc.OidcAuthFilter.param.client-secret=<client-secret>
spark.io.okdp.spark.authc.OidcAuthFilter.param.redirect-uri=<redirect-uri>
spark.io.okdp.spark.authc.OidcAuthFilter.param.scope=<scope>
# Keep the default value 'auto'
# spark.io.okdp.spark.authc.OidcAuthFilter.param.use-pkce=<true|false|auto>
spark.io.okdp.spark.authc.OidcAuthFilter.param.cookie-max-age-minutes=480
spark.io.okdp.spark.authc.OidcAuthFilter.param.cookie-cipher-secret-key=<cookie-cipher-secret-key>
spark.io.okdp.spark.authc.OidcAuthFilter.param.cookie-is-secure=<true|false>
spark.io.okdp.spark.authc.OidcAuthFilter.param.user-id=<sub|email>
```

Or during job submission, like the following:
```bash
spark-submit --conf spark.ui.filters=io.okdp.spark.authc.OidcAuthFilter \
  --conf spark.io.okdp.spark.authc.OidcAuthFilter.param.issuer-uri=<issuer-uri> \
  --conf spark.io.okdp.spark.authc.OidcAuthFilter.param.client-id=<client-id> \
  --conf spark.io.okdp.spark.authc.OidcAuthFilter.param.client-secret=<client-secret> \
  --conf spark.io.okdp.spark.authc.OidcAuthFilter.param.redirect-uri=<redirect-uri> \
  --conf spark.io.okdp.spark.authc.OidcAuthFilter.param.scope=<scope> \
  --conf spark.io.okdp.spark.authc.OidcAuthFilter.param.cookie-max-age-minutes=480 \
  --conf spark.io.okdp.spark.authc.OidcAuthFilter.param.cookie-cipher-secret-key=<cookie-cipher-secret-key> \
  --conf spark.io.okdp.spark.authc.OidcAuthFilter.param.cookie-is-secure=<true|false> \
  --conf spark.io.okdp.spark.authc.OidcAuthFilter.param.user-id=<sub|email> \
  --class ...
```

Remove the following configuration if your client id is public and your OIDC provider supports PKCE:

```
--conf spark.io.okdp.spark.authc.OidcAuthFilter.param.client-secret=<client-secret>
```

The properties can also be passed through their equivalent env variables.
You can save the client_id, the client secret, and the cookie encryption key in a Kubernetes secret and reference them as env variables, as shown in the sketch and the manifest below.
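A minimal sketch of creating such a secret, assuming kubectl access and the hypothetical secret name `spark-oidc`:

```bash
# Store the OIDC client credentials and the cookie encryption key in a Kubernetes secret
kubectl create secret generic spark-oidc \
  --from-literal=client-id=<client-id> \
  --from-literal=client-secret=<client-secret> \
  --from-literal=cookie-encryption-key=<cookie-cipher-secret-key>
```

The env entries in the following manifest can then reference the secret and its keys through `secretKeyRef`: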
```yaml
env:
- name: AUTH_ISSUER_URI
  value: <issuer-uri>
- name: AUTH_CLIENT_ID
  valueFrom:
    secretKeyRef:
      name: <secret-name>
      key: <client-id-key>
- name: AUTH_REDIRECT_URI
  value: <redirect-uri>
- name: AUTH_SCOPE
  value: openid+profile+email+roles+offline_access
# Keep the default value of AUTH_USE_PKCE as auto, other values: true, false
#- name: AUTH_USE_PKCE
#  value: auto
# Remove AUTH_CLIENT_SECRET if your client id is public and your OIDC provider supports PKCE
- name: AUTH_CLIENT_SECRET
  valueFrom:
    secretKeyRef:
      name: <secret-name>
      key: <client-secret-key>
- name: AUTH_COOKIE_ENCRYPTION_KEY
  valueFrom:
    secretKeyRef:
      name: <secret-name>
      key: <cookie-encryption-key>
```

An example of a raw access token returned by the OAuth2/OIDC provider during a successful authentication looks like the following:
```json
{
   "access_token": "eyJhbGciOiJI6-auxZsE6...",
   "token_type": "bearer",
   "expires_in": 86399,
   "refresh_token": "ChlvaWJmNXBuaG1rdWN0e...",
   "id_token": "eyJhbGciOiJSUzI1NiIsImtpZCI6IjBkZWEw..."
}
```

The token payload after decoding the access_token is as follows:
```json
{
   "iss": "<issuer-uri>",
   "sub": "CgNib2ISBGxkYXA",
   "aud": "<client-id>",
   "exp": 1708476719,
   "iat": 1708390319,
   "at_hash": "x_kKHrjGfnSfkjDwIGPPbg",
   "email": "bob@example.com",
   "email_verified": true,
   "groups": ["admins", "team1", "/team2"],
   "roles": ["role-team1", "admin-role"],
   "name": "bob"
}
```

The "email", "groups" and/or "roles" claims can be mapped in Spark ACLs to grant or deny access accordingly.
Basic configuration properties to enable the provider globally, in spark-defaults.conf, are:

```properties
spark.user.groups.mapping=io.okdp.spark.authz.OidcGroupMappingServiceProvider
spark.acls.enable=true
spark.history.ui.acls.enable=true
# Comma-separated list of admin groups (view all applications)
spark.history.ui.admin.acls.groups=admins,team1
```

These properties should be set before the Spark History server starts.
You can also decide globally which users, roles, and/or groups are granted access to your applications individually (in the Spark History UI or your Spark UI) by adding the properties to spark-defaults.conf. Select the properties to enable the authorization for:
```properties
# Comma-separated list of groups
spark.admin.acls.groups=admins,admin-role
spark.modify.acls.groups=team1
spark.ui.view.acls.groups=/team2,role-team1
# Comma-separated list of users (example addresses)
spark.admin.acls=bob@example.com
spark.modify.acls=alice@example.com
spark.ui.view.acls=alice@example.com,bob@example.com
```

Or at spark job submission time (select the properties to enable):
```bash
spark-submit --conf spark.admin.acls.groups=admins,admin-role \
  --conf spark.modify.acls.groups=team1 \
  --conf spark.ui.view.acls.groups=/team2,role-team1 \
  --conf spark.admin.acls=bob@example.com \
  --conf spark.modify.acls=alice@example.com \
  --conf spark.ui.view.acls=alice@example.com,bob@example.com \
  ...
```

Add the following entry to your /etc/hosts:
```
127.0.0.1       keycloak
```
Spark 3.x (default jar):

```bash
mvn clean package
docker-compose up --build
```

Spark 4+ (jakarta jar):

```bash
mvn clean package
PROFILE=Jakarta docker-compose up --build
```

Browse to: http://localhost:18080/
| User | Password | Group | 
|---|---|---|
| dev1 | user | developers | 
| dev2 | user | developers | 
| view1 | user | viewers | 
| adm1 | user | admins | 
The Spark Auth Filter relies on a local cookie for authentication. Remove both the OKDP_AUTH_SPARK_UI cookie and the local Keycloak cookie from your local browser at each re-login.
For more details, check docker-compose.yml, env and local setup
```bash
docker-compose rm -f
```

The filter is designed to address basic use cases where you don't need to deploy extra components in order to secure your Spark History UIs.
The filter can also be used to secure Spark UIs, but note that, in a real-world Kubernetes integration, each Spark application submission creates its own ingress endpoint. With hundreds of running Spark applications, it becomes very difficult to track and configure all the endpoints.
Another limitation is that, depending on the OIDC provider, the number of redirect URIs per OIDC client can be limited, and the usage of redirect URI patterns is prohibited by OAuth 2.1.
A new central portal UI is under development to simplify dynamic discovery and log/monitoring tracking, and to provide shortcuts to easily navigate and filter the Spark applications.
