
CDAP 5.1.0

@sreevatsanraman released this 12 Oct 16:23
· 63 commits to release/5.1 since this release
a089619

Summary

This release introduces a number of new features, improvements, and bug fixes to CDAP. The main highlights of the release are:

  1. Date and Time Support

    • Support for Date, Time, and Timestamp data types in the CDAP schema. This support is also available in pipeline plugins and Data Preparation directives.
  2. Plugin Requirements

    • A way for plugins to specify certain runtime requirements, and the ability to filter available plugins based on those requirements.
  3. Bootstrapping

    • A method to automatically bootstrap CDAP with a given state, such as a set of deployed apps, artifacts, namespaces, and preferences.
  4. UI Customization

    • A way to customize the display of the CDAP UI by enabling or disabling certain features.
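
The new date and time types surface in the CDAP schema as Avro-style logical types. A minimal sketch of a record schema using them (the field names are illustrative, and the exact logical-type names may vary by type width, so check the CDAP schema documentation):

```json
{
  "type": "record",
  "name": "event",
  "fields": [
    { "name": "event_date", "type": { "type": "int", "logicalType": "date" } },
    { "name": "event_time", "type": { "type": "long", "logicalType": "time-micros" } },
    { "name": "created_at", "type": { "type": "long", "logicalType": "timestamp-micros" } }
  ]
}
```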

New Features

  • Added support for Date/Time in Preparation, along with a new directive, parse-timestamp, that converts Unix timestamps (given as longs or strings) into Timestamp objects. (CDAP-14244)
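
This is not CDAP code, but the conversion that parse-timestamp performs can be sketched in Python (assuming the input is epoch seconds; the directive itself accepts both long and string forms):

```python
from datetime import datetime, timezone

def parse_timestamp(value):
    """Convert a Unix timestamp given as a long or a string into an
    aware datetime, mirroring what the parse-timestamp directive does."""
    seconds = int(value)  # accepts both long and string forms
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

print(parse_timestamp(1539361380))    # 2018-10-12 16:23:00+00:00
print(parse_timestamp("1539361380"))  # same result from the string form
```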

  • Added Date, Time, and Timestamp support in plugins (Wrangler, Google Cloud BigQuery, Google Cloud Spanner, Database). (CDAP-14245)

  • Added Date, Time, and Timestamp support in CDAP Schema. (CDAP-14021)

  • Added Date, Time, and Timestamp support in UI. (CDAP-14028)

  • Added Google Cloud Spanner source and sink plugins in Pipeline and Google Cloud Spanner connection in Preparation. (CDAP-14053)

  • Added Google Cloud PubSub realtime source. (CDAP-14185)

  • Added a new user onboarding tour to CDAP. (CDAP-14088)

  • Added the ability to customize UI through theme. (CDAP-13990)

  • Added a framework that can be used to bootstrap a CDAP instance. (CDAP-14022)

  • Added the ability to configure system-wide provisioner properties that can be set by admins but not by users. (CDAP-13746)

  • Added the capability for plugins to specify their requirements, and to filter available plugins based on those requirements. (CDAP-13924)

  • Added REST endpoints to query the run counts of a program. (CDAP-13975)

  • Added a REST endpoint to get the latest run record of multiple programs in a single call. (CDAP-14260)
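
A sketch of addressing the new run-count endpoint. The path below follows the usual /v3 REST conventions, but both the route and the default Sandbox port are assumptions, so consult the CDAP 5.1 REST API reference for the exact details; the batch latest-run-record endpoint is similarly expected to take a POST with a JSON list of programs.

```python
from urllib.parse import quote

# Default Sandbox router address -- an assumption; adjust for your deployment.
BASE = "http://localhost:11015/v3"

def run_count_url(namespace, app, program_type, program):
    """Build the URL for the run count of a single program
    (path layout is an assumption based on CDAP's /v3 conventions)."""
    return (f"{BASE}/namespaces/{quote(namespace)}/apps/{quote(app)}"
            f"/{quote(program_type)}/{quote(program)}/runcount")

# An HTTP GET against this URL would return the run count.
print(run_count_url("default", "MyPipeline", "workflows", "DataPipelineWorkflow"))
```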

  • Added support for Apache Spark 2.3. (CDAP-13653)

Improvements

  • Improved runtime monitoring (which fetches program states, metadata, and logs) of remotely launched programs from the CDAP Master by using dynamic port forwarding instead of HTTPS for communication. (CDAP-13566)

  • Removed duplicate classes to reduce the size of the sandbox by a couple hundred megabytes. (CDAP-13977)

  • Added cdap-env.sh to allow configuring JVM options when launching the Sandbox. (CDAP-14461)
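
A hedged sketch of what cdap-env.sh enables; the variable name and flags here are assumptions drawn from common JVM conventions, so check the comments shipped in the file itself for the variables it actually honors:

```shell
# cdap-env.sh -- sourced by the Sandbox launch scripts.
# JAVA_OPTS is an assumption; consult the file's own comments.
export JAVA_OPTS="-Xmx4096m -XX:+ExitOnOutOfMemoryError"
```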

  • Added support for bidirectional Field Level Lineage. (CDAP-14003)

  • Added capability for external datasets to record their schemas. (CDAP-14013)

  • The Dataproc provisioner will try to pick up the project id and credentials from the environment if they are not specified. (CDAP-14091)

  • The Dataproc provisioner will use internal IP addresses when CDAP is in the same network as the Dataproc cluster. (CDAP-14104)

  • Added capability to always display the current dataset schema in Field Level Lineage. (CDAP-14168)

  • Improved error handling in Preparation. (CDAP-13886)

  • Added a FileSink batch sink, FileMove action, and FileDelete action to replace their HDFS counterparts. (CDAP-14023)

  • Added a configurable JVM option to kill the CDAP process immediately on the Sandbox when an OutOfMemory error occurs. (CDAP-14097)

  • Added better trace logging for dataset service. (CDAP-14135)

  • Made the Google Cloud Storage, Google Cloud BigQuery, and Google Cloud Spanner connection properties (project id, service account keyfile path, temporary GCS bucket) optional. (CDAP-14386)

  • The Google Cloud PubSub sink now tries to create the topic while preparing for the run if the topic does not exist. (CDAP-14401)

  • Added csv, tsv, delimited, json, and blob as formats to the S3 source and sink. (CDAP-14475)

  • Added csv, tsv, delimited, json, and blob as formats to the File source. (CDAP-14321)

  • Added a button on external sources and sinks to jump to the dataset detail page. (CDAP-9048)

  • Added format and suppress query params to the program logs endpoint to match the program run logs endpoint. (CDAP-14040)

  • Made all CDAP examples compatible with Spark 2. (CDAP-14132)

  • Added worker and master disk size properties to the Dataproc provisioner. (CDAP-14220)

  • Improved operational behavior of the dataset service. (CDAP-14298)

  • Fixed the Wrangler transform to make directives optional; if none are given, the transform is a no-op. (CDAP-14372)

  • Fixed Preparation to treat files without extensions as text files. (CDAP-14397)

  • Limited the number of files shown in the S3 and Google Cloud Storage browsers to 1000. (CDAP-14398)

  • Enhanced the Google Cloud BigQuery sink to create the dataset if the specified dataset does not exist. (CDAP-14482)

  • Raised log levels for the CDAP Sandbox so that only CDAP classes log at debug level. (CDAP-14489)

Bug Fixes

  • Fixed the 'distinct' plugin to use a drop down for the list of fields and to have a button to get the output schema. (CDAP-14468)

  • Ensured that destroy() is always called for MapReduce, even if initialize() fails. (CDAP-7444)

  • Fixed a bug where the Alert Publisher would not work if there was a space in the label. (CDAP-13008)

  • Fixed a bug that caused Preparation to fail while parsing Avro files. (CDAP-13230)

  • Fixed a misleading error message about HBase classes in cloud runtimes. (CDAP-13878)

  • Fixed a bug where the metric for failed profile program runs was not getting incremented when the run failed due to provisioning errors. (CDAP-13887)

  • Fixed a bug where querying metrics by time series would return incorrect results after a certain amount of time. (CDAP-13894)

  • Fixed a bug where profile metrics were incorrect after an app was deleted. (CDAP-13959)

  • Fixed a deprovisioning bug that occurred when cluster creation failed. (CDAP-13965)

  • Fixed an error where TMS publishing was retried indefinitely if the first attempt failed. (CDAP-13988)

  • Fixed a race condition in MapReduce that can cause a deadlock. (CDAP-14076)

  • Fixed a resource leak in the preview feature. (CDAP-14098)

  • Fixed a bug that would cause the RDD versions of the dynamic Scala Spark plugins to fail. (CDAP-14107)

  • Fixed a bug where profiles were getting applied to all program types instead of only workflows. (CDAP-14154)

  • Fixed a race condition by ensuring that a program is started before starting runtime monitoring for it. (CDAP-14203)

  • Fixed the run count for pipelines in the UI to show the correct number instead of capping at 100. (CDAP-14211)

  • Fixed an issue where the Dataproc client was not being closed, resulting in verbose error logs. (CDAP-14223)

  • Fixed a bug that could cause the provisioning state of stopped program runs to be corrupted. (CDAP-14261)

  • Fixed a bug that caused Preparation to be unable to list buckets in a Google Cloud Storage connection in certain environments. (CDAP-14271)

  • Fixed a bug where the Dataproc provisioner was unable to provision a single-node cluster. (CDAP-14303)

  • Fixed a bug where Preparation could not read json or xml files on Google Cloud Storage. (CDAP-14390)

  • Fixed the Dataproc provisioner to use full API access scopes so that Google Cloud Spanner and Google Cloud PubSub are accessible by default. (CDAP-14395)

  • Fixed a bug where profile metrics were not deleted when a profile was deleted. (CDAP-14435)

Deprecated and Removed Features

  • Removed the old, buggy dynamic Spark plugins. (CDAP-14108)

  • Dropped support for MapR 4.1. (CDAP-14456)