# SQL Runner
Run playbooks of SQL scripts in series and parallel on Amazon Redshift and PostgreSQL.
Used with [Snowplow][snowplow] for scheduled SQL-based transformations of event stream data.
Assuming you are running on 64-bit Linux:

```bash
> wget http://dl.bintray.com/snowplow/snowplow-generic/sql_runner_0.1.0_linux_amd64.zip
> unzip sql_runner_0.1.0_linux_amd64.zip
> ./sql-runner -usage
```
See the User Guide section below for more.
## User guide

There are several command-line arguments that can be used:
- `-playbook` : Required. The path to the playbook you wish to run.
- `-fromStep` : Optional. Starts sql-runner from the named step in your playbook.
- `-sqlroot` : Optional. Changes where we look for the SQL statements to run; defaults to the directory of your playbook.
- `-var` : Optional. Passes a dictionary of key-value pairs used to flesh out your templates.
- `-consul` : Optional. Fetches playbooks and SQL files from a Consul server.
- `-dryRun` : Optional. Runs through your playbook without executing any SQL against your target(s).
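For example, a dry run starting from a named step might look like this (the playbook path and step name here are hypothetical):

```bash
# Validate the playbook end-to-end without executing any SQL against the targets
./sql-runner -playbook ./playbooks/web-model.yml -fromStep "Load events" -dryRun
```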
Using the `-consul` argument results in the following changes:

- The `-playbook` argument becomes the key used to look up the playbook in Consul.
- The `-sqlroot` argument also becomes a key argument for Consul.
If you pass in the default:

```bash
./sql-runner -consul "localhost:8500" -playbook "sql-runner/playbook/1"
```

This results in:

- Looking for your playbook file at the key `sql-runner/playbook/1`
- Expecting all your SQL file keys to begin with `sql-runner/playbook/<SQL path from playbook>`
However, as a Consul key can be used as both a data node and a folder node, we have added a new `-sqlroot` option:
```bash
./sql-runner -consul "localhost:8500" -playbook "sql-runner/playbook/1" -sqlroot PLAYBOOK_CHILD
```

This results in:

- Looking for your playbook file at the key `sql-runner/playbook/1`
- Expecting all your SQL file keys to begin with `sql-runner/playbook/1/<SQL path from playbook>`
- The data node is used as a folder node as well.
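To illustrate, the keys for the `PLAYBOOK_CHILD` layout could be seeded through Consul's HTTP KV API like this (the file names are hypothetical):

```bash
# Store the playbook at the data node...
curl -X PUT --data-binary @playbook.yml \
  "http://localhost:8500/v1/kv/sql-runner/playbook/1"

# ...and the SQL files as children of that same key
curl -X PUT --data-binary @a.sql \
  "http://localhost:8500/v1/kv/sql-runner/playbook/1/a.sql"
```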
### Playbooks

A playbook consists of one or more steps, each of which consists of one or more queries. Steps are run in series; queries within a step are run in parallel.

Each query contains the path to a query file. See Query files for details.

All steps are applied against all targets, and all targets are processed in parallel.
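Targets themselves are defined in the playbook alongside the steps. As a rough sketch of what a target entry might look like (the field names here are assumptions; see the playbook template linked below for the authoritative format):

```yaml
:targets:
  - :name: "My Postgres database"   # field names are assumptions; check the sample playbook
    :type: postgres
    :host: localhost
    :database: analytics
    :port: 5432
    :username: sql_runner
    :password: changeme
```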
In the following example, `a.sql`, `b.sql` and `c.sql` are run in parallel:
```yaml
:steps:
  - :name: "Run a,b and c in parallel"
    :queries:
      - :name: a
        :file: a.sql
      - :name: b
        :file: b.sql
      - :name: c
        :file: c.sql
```
By contrast, in the example below, the three SQL files are executed in sequence.
```yaml
:steps:
  - :name: "Run a..."
    :queries:
      - :name: a
        :file: a.sql
  - :name: "...then run b..."
    :queries:
      - :name: b
        :file: b.sql
  - :name: "...then run c..."
    :queries:
      - :name: c
        :file: c.sql
```
For the playbook template, see [config/config.yml.sample][example-config].
### Query files

A query file contains one or more SQL statements. These are executed "raw" (i.e. not in a transaction) in series by SQL Runner.

If the query file is flagged as a template in the playbook, then the file is pre-processed as a template before being executed. See Templates for details.
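For instance, a minimal query file might look like the following (the table names are illustrative):

```sql
-- Two statements, run in series and not wrapped in a transaction
DROP TABLE IF EXISTS derived.page_views;

CREATE TABLE derived.page_views AS
  SELECT page_urlpath, COUNT(*) AS views
  FROM atomic.events
  GROUP BY 1;
```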
### Templates

Templates are run through Golang's [text template processor][go-text-template]. The template processor can access all variables defined in the playbook.

The following custom functions are also supported:
- `nowWithFormat [timeFormat]` : where `timeFormat` is a valid Golang [time format][go-time-format]
- `systemEnv "ENV_VAR"` : where `ENV_VAR` is the key of a valid environment variable
- `awsEnvCredentials` : supports passing credentials through environment variables, such as `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
- `awsProfileCredentials` : supports getting credentials from a credentials file, as also used by boto/awscli
- `awsEC2RoleCredentials` : supports getting role-based credentials, i.e. the automatically generated credentials on EC2 instances
- `awsChainCredentials` : tries to get credentials from each of the three methods above in order, using the first one returned
Note: all AWS functions output strings in the Redshift credentials format (`CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s'`).
For an example query file using templating, see [integration-tests/postgres-sql/good/3.sql][example-query].
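As a rough sketch, a templated query file could combine a playbook variable with the functions above (the `schema` variable and the S3 path are assumptions):

```sql
-- Hypothetical templated query file: assumes the playbook defines a
-- variable named "schema" and flags this file as a template
CREATE TABLE {{.schema}}.events_{{nowWithFormat "20060102"}} (LIKE atomic.events);

COPY {{.schema}}.events_{{nowWithFormat "20060102"}}
FROM 's3://my-bucket/enriched/'  -- hypothetical S3 path
{{awsEnvCredentials}};
```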
### Failure modes

If a statement fails in a query file, the query will terminate and report failure.
If a query fails, its sibling queries will continue running, but no further steps will run.
Failures in one target do not affect other targets in any way.