Format-preserving encryption for Databricks — Spark UDF powered by Cyphera.
Built on io.cyphera:cyphera from Maven Central.
This integration requires a Databricks workspace. See below for deployment instructions.
mvn package -DskipTests
Produces target/cyphera-databricks-0.1.0.jar (fat JAR with all dependencies).
docker build -t cyphera-databricks .
Upload target/cyphera-databricks-0.1.0.jar as a cluster library in the Databricks workspace.
PUT 'target/cyphera-databricks-0.1.0.jar' INTO '/Volumes/catalog/schema/jars/';
In a Databricks notebook:
spark._jvm.io.cyphera.databricks.CypheraRegistrar.registerAll(spark._jsparkSession)
Or in Scala:
io.cyphera.databricks.CypheraRegistrar.registerAll(spark)
Place cyphera.json at /etc/cyphera/cyphera.json on the cluster, or set the CYPHERA_CONFIGURATION_FILE environment variable in the cluster configuration.
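The lookup described above can be sketched in Python, for example to check from a notebook cell which file a cluster would pick up. This is a minimal sketch, not the SDK's actual loader: `resolve_config_path` is a hypothetical helper, and giving the environment variable precedence over the default path is an assumption, since the text only says that either location works.

```python
import os

DEFAULT_CONFIG_PATH = "/etc/cyphera/cyphera.json"
CONFIG_ENV_VAR = "CYPHERA_CONFIGURATION_FILE"


def resolve_config_path(env=None):
    """Return the path a loader would plausibly read (hypothetical helper).

    Assumption: a non-empty environment variable wins over the default path.
    """
    env = os.environ if env is None else env
    override = env.get(CONFIG_ENV_VAR, "").strip()
    return override if override else DEFAULT_CONFIG_PATH


if __name__ == "__main__":
    path = resolve_config_path()
    # Report the resolved path and whether a file is actually present there.
    print(f"{path} (exists: {os.path.isfile(path)})")
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real process environment.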
-- Protect with a named configuration
SELECT cyphera_protect('ssn', '123-45-6789');
-- → 'T01i6J-xF-07pX' (header-prefixed, dashes preserved)
-- Access — the embedded header tells Cyphera which configuration to use
SELECT cyphera_access(cyphera_protect('ssn', '123-45-6789'));
-- → '123-45-6789'
-- Bulk protect
SELECT name, cyphera_protect('ssn', ssn) AS protected_ssn
FROM customers;
- Configuration file: /etc/cyphera/cyphera.json or CYPHERA_CONFIGURATION_FILE env var
- Set env var in Databricks cluster Spark configuration
- Configuration loaded on first UDF call — restart cluster to reload
- UDF errors surface as Spark task failures
- Check cluster driver logs for CypheraLoader entries
- Build a new JAR with the updated SDK version
- Replace the cluster library or volume JAR
- Restart the cluster
- UDF not found — CypheraRegistrar.registerAll(spark) not called, or JAR not attached to cluster
- "Unknown configuration" — check that cyphera.json is accessible from the cluster
- ClassNotFoundException — JAR not on the classpath, re-upload and restart
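For the "Unknown configuration" case, a quick check run from a notebook on the cluster can confirm that the file is readable and that the name passed to cyphera_protect is defined in it. The sketch below is an illustration only: `check_config` is a hypothetical helper, not part of the Cyphera SDK, and the extra checks it performs (every key_ref resolves to an entry under keys, header values are unique) are assumptions inferred from the sample configuration in this README rather than documented validation rules.

```python
import json


def check_config(path, name):
    """Return a list of problems found for configuration `name` (hypothetical helper)."""
    try:
        with open(path, encoding="utf-8") as fh:
            doc = json.load(fh)
    except OSError as exc:
        return [f"cannot read {path}: {exc}"]
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]

    problems = []
    configs = doc.get("configurations", {})
    keys = doc.get("keys", {})

    if name not in configs:
        problems.append(f"unknown configuration: {name!r}")

    # Assumption: each key_ref must name an entry under "keys".
    for cfg_name, cfg in configs.items():
        if cfg.get("key_ref") not in keys:
            problems.append(f"{cfg_name!r} references missing key {cfg.get('key_ref')!r}")

    # Assumption: headers must be unique so a protected value maps to one configuration.
    headers = [cfg.get("header") for cfg in configs.values()]
    if len(headers) != len(set(headers)):
        problems.append("duplicate header values")

    return problems
```

Calling `check_config("/etc/cyphera/cyphera.json", "ssn")` returns an empty list when none of these checks flags anything. A non-empty list points at the file, the configuration name, or a key reference as the likely cause.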
{
"configurations": {
"ssn": { "engine": "ff1", "key_ref": "demo-key", "header": "T01" },
"credit_card": { "engine": "ff1", "key_ref": "demo-key", "header": "T02" }
},
"keys": {
"demo-key": { "material": "2B7E151628AED2A6ABF7158809CF4F3C" }
}
}
- Unity Catalog function registration (SQL-based, no notebook needed)
- Delta Lake integration (encrypt on write, access on read)
- Init script for automatic UDF registration on cluster start
Apache 2.0 — Copyright 2026 Horizon Digital Engineering LLC