This repository was archived by the owner on Sep 8, 2023. It is now read-only.
Talend Cloud lets you install and host the Talend Data Preparation application on premises. This setup lets you keep sensitive data behind your firewall while still managing your users and the rest of your platform from Talend Cloud.

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_hybrid_mode | Hybrid mode (yes or no) | Default value: no |
| tdp_hybrid_region | The region to use (us, eu, ap, au, us-west or at) | Default value: us |

For hybrid mode, you must also set the following two variables:

| Parameter | Description |
| --- | --- |
| tdp_security_oauth2_client_clientId | Client ID for your account (retrieved from Talend Management Console) |
| tdp_security_oauth2_client_clientSecret | Client Secret for your account (retrieved from Talend Management Console) |

Note: in non-hybrid mode, these two variables should be set to the Talend IAM OIDC client identifier and client secret.
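
As a minimal sketch, the hybrid-mode settings above could be grouped in a variables file as follows. The variable names come from the tables above; the region choice and the `vault_*` variables are hypothetical placeholders, and quoting `"yes"` is an assumption made so that YAML does not read the value as a boolean.

```yaml
# Hypothetical group_vars fragment for a hybrid installation.
tdp_hybrid_mode: "yes"     # quoted so YAML keeps it as a string (assumption)
tdp_hybrid_region: "eu"    # one of: us, eu, ap, au, us-west, at

# Credentials retrieved from Talend Management Console.
# The vault_* names are placeholders for values kept in an encrypted vars file.
tdp_security_oauth2_client_clientId: "{{ vault_tdp_oauth2_client_id }}"
tdp_security_oauth2_client_clientSecret: "{{ vault_tdp_oauth2_client_secret }}"
```

Keeping the client secret in an encrypted file rather than in the playbook itself avoids committing it to version control.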
Language
Talend Data Preparation supports the following languages: English (en-US), French (fr-FR), Japanese (jp-JP) and Chinese (zh-CN). This setting is ignored in hybrid configuration, because in that case the language setting is stored in the cloud account configuration.

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_language | Language (en-US, fr-FR, jp-JP or zh-CN) | Default value: en-US |

IAM configuration

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_iam_ip | Host IP address of IAM server | Default value: localhost |

Event propagation

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_dataprep_event_listener | Mechanism used for event propagation | Possible values: spring or kafka |

Live datasets
The live dataset feature allows creating a job in Talend Studio, executing it on demand in the Talend Administration Center, and retrieving a dataset with the sample data directly in Talend Data Preparation.

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_live_dataset_location | Location of source | Possible values: tac or tic |
| tdp_live_dataset_url | URL to the Talend Administration Center instance used to list execution tasks as dataset sources | |
| tdp_live_dataset_task_prefix | Prefix used to list Talend Administration Center tasks in the Talend Data Preparation interface and create live datasets. Only the tasks with this prefix will be listed when importing data with the Talend Job option. | Default value: dataprep_ |

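
For illustration, a live dataset setup that points at a Talend Administration Center instance might look like the sketch below. The URL is a hypothetical placeholder, not a documented default; only the variable names and the `tac` and `dataprep_` values come from the table above.

```yaml
# Hypothetical values for listing Talend Administration Center tasks as live datasets.
tdp_live_dataset_location: "tac"
# Placeholder URL: replace with the address of your own instance.
tdp_live_dataset_url: "http://tac.example.internal:8080/org.talend.administrator"
# Only tasks whose name starts with this prefix are listed in the interface.
tdp_live_dataset_task_prefix: "dataprep_"
```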
MongoDB connection settings

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_mongodb_host | Host of MongoDB | Default value: localhost |
| tdp_mongodb_port | Port of MongoDB | Default value: 27017 |
| tdp_mongodb_database | MongoDB database | Default value: dataprep |
| tdp_mongodb_user | User name of MongoDB user | Default value: dataprep-user |
| tdp_mongodb_password | Password of MongoDB user | |
| tdp_multi_tenancy_mongodb_active | Whether MongoDB is set up in multi-tenancy mode | Possible values: false (default) or true |

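
When an external MongoDB server is used instead of the Talend MongoDB package, the connection settings could be overridden along these lines. This is a sketch only: the host name and the `vault_*` variable are hypothetical, and the remaining values repeat the defaults listed above.

```yaml
# Hypothetical overrides for an external MongoDB server.
tdp_mongodb_host: "mongo.example.internal"   # placeholder host name
tdp_mongodb_port: 27017
tdp_mongodb_database: "dataprep"
tdp_mongodb_user: "dataprep-user"
# No default is documented for the password; the vault_* name is a placeholder.
tdp_mongodb_password: "{{ vault_tdp_mongodb_password }}"
tdp_multi_tenancy_mongodb_active: false
```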
Authentication parameters

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_security_provider | TDP security provider | Only possible value: oauth2 |
| tdp_security_token_secret | Secret used to sign tokens | |
| tdp_security_token_renew_after | Do not modify | Only possible value: 30 |
| tdp_security_token_invalid_after | Session timeout (in seconds) | Default value: 3600 |

Spring framework settings

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_spring_profiles_active | Active Spring profile | Default value: server-standalone. Do not modify unless instructed by Talend. |
| tdp_spring_http_multipart_maxFileSize | Maximum file size (in bytes) to transfer via HTTP | Default value: 200000000 (200 MB) |
| tdp_spring_http_multipart_maxRequestSize | Maximum request size (in bytes) to transfer via HTTP | Default value: 200000000 (200 MB) |

Dataset limits

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_dataset_records_limit | Maximum number of records in datasets. Additional records are truncated. | Default value: 10000 |
| tdp_dataset_local_file_size_limit | Maximum file size of a locally stored dataset | Default value: 2000000000 |
| tdp_dataset_imports | List of datasources available for the "dataset import" action | |

Cache management (location for cache and content storage)

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_content_service_store | Content service store | Only possible value: local |
| tdp_content_service_store_local_path | Path to store cache content | Default value: data/ |

Preparation service configuration

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_preparation_store_remove_hours | Time (in hours) to keep preparation in storage | Default value: 24 |

Lock on preparations (see documentation for details)

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_lock_preparation_store | Lock store for preparations | Possible values: none or mongodb |
| tdp_lock_preparation_delay | Delay in seconds | Default value: 600 (10 minutes) |

Lucene index configuration

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_luceneIndexStrategy | Lucene index strategy | Possible values: singleton (default) or basic. Do not modify unless instructed by Talend. |

Parameters for asynchronous full run and sampling operations

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_execution_store | Storage for async operations | Possible values: mongodb (default), in-memory or remote. Do not modify unless instructed by Talend. |
| tdp_async_operation_concurrent_run | Maximum number of concurrent full runs. If more full runs than this value are requested in parallel, the remaining operations are queued and resumed when a slot becomes available. This value can be increased according to the host capacity. | Default value: 5 |

Components Catalog configuration properties

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_tcomp_server_url | URL of the server hosting the Components Catalog, used to configure self service connectors | |
| | Logging level for org.talend.dataprep.configuration | Default value: INFO |
| tdp_logging_level_org_talend_dataquality_semantic | Logging level for org.talend.dataquality.semantic | Default value: INFO |

Audit logging configuration parameters

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_audit_log_enabled | Whether to enable audit log | Possible values: true (default) or false |
| tdp_talend_logging_audit_config | Configuration file of audit log | Default value: config/audit.properties |

CSV export settings

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_default_text_enclosure | Default enclosure character | Default value: " |
| tdp_default_text_escape | Default escape character | Default value: " |
| tdp_default_text_encoding | Default encoding | Default value: UTF-8 |

CSV import settings

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_default_import_text_enclosure | Default enclosure character | Default value: " |
| tdp_default_import_text_escape | Default escape character | Default value: empty value |

Dataset service provider setting

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_dataset_service_provider | Service provider of datasets. Data Preparation version 7.2 only supports legacy (embedded dataset service provider). In higher versions, catalog is also supported, in which case "data catalog" or "TMC" is used as the dataset service provider. | Only possible value: legacy |

Extra variables

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_async_runtime_contextPath | Runtime context path for async operations | Default value: /api |
| tdp_spring_mvc_async_request_timeout | Timeout (in milliseconds) for async executions | Default value: 600000. This value may need to be increased for large datasets. |

Logging properties

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_root_logger | Logger name (SLF4J) prefix added to the event category | Default value: audit |
| tdp_backend | Logging backend. If set to auto, the audit library will try to detect and use the logging library that is present. | Possible values: auto, logback, log4j1 |
| tdp_encoding | Encoding to use when writing events using appenders | Default value: UTF-8 |
| tdp_application_name | Name of the application that logs audit events. This value will be put into MDC for each logged event. | Default value: Data Preparation |
| tdp_instance_name | Name of the instance of the service. This value will be put into MDC for each logged event. | Default value: DefaultInstance |
| tdp_log_appender | Comma-separated list of log appender types | Possible values: file, http |

Logging - File appender properties
The file appender puts log entries into a JSON file. In most cases, there should be a FileBeat instance that picks up new messages and sends them to Logstash.

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_appender_file_path | Path to the log file | Default value: data/logs/audit.log |
| tdp_appender_file_maxsize | Maximum file size (in bytes) | Default value: 52428800 |
| tdp_appender_file_maxbackup | Maximum number of backup log files | Default value: 20 |

Logging - HTTP appender properties

| Parameter | Description | Value |
| --- | --- | --- |
| tdp_appender_http_url | URL of target where logging data will be sent | Default value: http://localhost:8057/ |
| tdp_appender_http_async | Whether to use asynchronous mode | Possible values: true (default) or false |

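
As a sketch of how the appender settings fit together, the fragment below enables both appender types at once. The HTTP target host is a hypothetical placeholder; the other values repeat the documented defaults.

```yaml
# Hypothetical audit logging setup using both the file and HTTP appenders.
tdp_log_appender: "file,http"            # comma-separated list of appender types

# File appender: JSON log file, typically picked up by a FileBeat instance.
tdp_appender_file_path: "data/logs/audit.log"
tdp_appender_file_maxsize: 52428800      # 50 * 1024 * 1024 bytes
tdp_appender_file_maxbackup: 20

# HTTP appender: the host name is a placeholder for your own log collector.
tdp_appender_http_url: "http://logs.example.internal:8057/"
tdp_appender_http_async: true
```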
Connection to Minio / AWS S3
Talend Data Preparation requires a connection to a Minio server (or AWS S3) to share the semantic dictionary with TSD.
If you do not have an existing Minio / AWS S3 account, an embedded Minio server (minio role) can be used instead.
The following variables control the connection to Minio / AWS S3. By default, they are set to use the embedded minio role:

| Parameter | Description | Default value |
| --- | --- | --- |
| tdp_s3endpoint | S3 endpoint URL | http://localhost:9000 |
| tdp_s3bucket | Bucket name | default-bucket |
| tdp_s3region | AWS S3 region to use | us-east-1 (do not change it if using the embedded minio role) |
| tdp_s3user | AWS S3 access key | usr7xJ0agsFq |
| tdp_s3pass | AWS S3 secret key | pwd9jYF26Van |
| tdp_basepath | The base path | (empty value) |

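
To connect to AWS S3 instead of the embedded minio role, the defaults could be overridden roughly as follows. The bucket name, the region choice and the `vault_*` variables are hypothetical placeholders; only the variable names come from the table above.

```yaml
# Hypothetical overrides for an existing AWS S3 account.
tdp_s3endpoint: "https://s3.eu-west-1.amazonaws.com"
tdp_s3bucket: "example-tdp-dictionary"   # placeholder bucket name
tdp_s3region: "eu-west-1"                # must match the endpoint's region
# Placeholders for credentials kept in an encrypted vars file.
tdp_s3user: "{{ vault_tdp_s3_access_key }}"
tdp_s3pass: "{{ vault_tdp_s3_secret_key }}"
```

Overriding the published default access key and secret key is advisable for any deployment that is reachable by others.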
Dependencies
The following roles must be used to successfully install and deploy Talend Data Preparation:
java
talend-repo
kafka (to use the Talend Kafka package; otherwise, an external Kafka server must be used)
mongodb (to use the Talend MongoDB package; otherwise, an external MongoDB server must be used)
Example Playbook
The dependency roles listed above must be defined before the tdp role in the playbook.
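
A minimal sketch of such a playbook is shown below. The inventory group name and the use of `become` are assumptions; the role names and their order follow the dependency list above.

```yaml
# Hypothetical playbook: dependency roles first, tdp last.
- hosts: tdp_servers      # placeholder inventory group
  become: true            # assumption: installation needs elevated privileges
  roles:
    - java
    - talend-repo
    - kafka               # omit when using an external Kafka server
    - mongodb             # omit when using an external MongoDB server
    - tdp
```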