Commit 3c4e2e3

Merge pull request #236 from hubmapconsortium/test-release
v2.0.18 release
2 parents e439d8e + 3573002 commit 3c4e2e3

File tree

8 files changed: +447 additions, -36 deletions


VERSION

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-2.0.17
+2.0.18

entity-api-spec.yaml

Lines changed: 77 additions & 1 deletion

@@ -1423,4 +1423,80 @@ paths:
       '404':
         description: The target dataset could not be found
       '500':
-        description: Internal error
+        description: Internal error
+  '/datasets/{id}/revisions':
+    get:
+      summary: 'From a given ID of a versioned dataset, retrieve a list of every dataset in the chain, ordered from most recent to oldest. The revision number and the dataset uuid are included. An optional parameter ?include_dataset=true will include the full dataset for each revision as well. Public/Consortium access rules apply: if the request is for a non-public dataset and no token, or a token without membership in the HuBMAP-Read group, is sent with the request, then a 403 response is returned. If the given id is published but later revisions are not, and the user is not in the HuBMAP-Read group, only published revisions will be returned. The field next_revision_uuid will not be returned if the next revision is unpublished.'
+      parameters:
+        - name: id
+          in: path
+          description: The unique identifier of the entity. This identifier can be either a HuBMAP ID (e.g. HBM123.ABCD.456) or a UUID
+          required: true
+          schema:
+            type: string
+        - name: include_dataset
+          in: query
+          description: A case-insensitive string. Any value besides 'true' has no effect. If the string is 'true', the full dataset for each revision will be included in the response
+          required: false
+          schema:
+            type: string
+            enum: ['true', 'false']
+      responses:
+        '200':
+          description: The list of revised datasets that the referenced dataset is a member of, including the index number of each revision, where 1 is the oldest version of any revision chain
+          content:
+            application/json:
+              schema:
+                type: object
+                properties:
+                  dataset_uuid:
+                    type: string
+                    description: The uuid of a dataset
+                  revision_number:
+                    type: integer
+                    description: The number in the revision chain of this dataset, where 1 is the oldest revision
+                  dataset:
+                    $ref: '#/components/schemas/Dataset'
+        '400':
+          description: Invalid or misformatted entity identifier, or the given entity is not a Dataset
+        '401':
+          description: The user's token has expired or the user did not supply a valid token
+        '403':
+          description: The user is not authorized to query the revision number of the given dataset
+        '404':
+          description: The target dataset could not be found
+        '500':
+          description: Internal error
+  '/datasets/{id}/retract':
+    put:
+      summary: 'Retracts a dataset after it has been published. Requires a json body with a single field {retraction_reason: string}. The dataset for the given id is modified to include this new retraction_reason field and its sub_status property is set to Retracted. The complete modified dataset is returned. Requires that the dataset being retracted has already been published (dataset.status == Published). Requires a user token with membership in the HuBMAP-Data-Admin group, otherwise a 403 will be returned.'
+      parameters:
+        - name: id
+          in: path
+          description: The unique identifier of the entity. This identifier can be either a HuBMAP ID (e.g. HBM123.ABCD.456) or a UUID
+          required: true
+          schema:
+            type: string
+      requestBody:
+        description: A json body with a single, required retraction_reason parameter containing the reason why the dataset is being retracted.
+        content:
+          application/json:
+            schema:
+              type: object
+              properties:
+                retraction_reason:
+                  type: string
+                  description: Free text describing why the dataset was retracted
+      responses:
+        '200':
+          description: The complete dataset with modified sub_status and retraction_reason
+        '400':
+          description: Invalid or misformatted entity identifier, the given entity is not a Dataset, is not published, or the required retraction_reason was not included in a json body
+        '401':
+          description: The user's token has expired or the user did not supply a valid token
+        '403':
+          description: The user is not authorized to retract the given dataset. The user must be a member of the HuBMAP-Data-Admin group
+        '404':
+          description: The target dataset could not be found
+        '500':
+          description: Internal error
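To make the 200 response shape of the revisions endpoint concrete, here is a hypothetical payload (uuids are invented for illustration) matching the schema above, with the ordering rule it documents: newest revision first, revision_number 1 is the oldest.

```python
# Hypothetical example payload for GET /datasets/{id}/revisions?include_dataset=true
# (uuids are made up; only the shape follows the spec above)
sample_response = [
    {"revision_number": 2, "dataset_uuid": "bbbb-2222",
     "dataset": {"uuid": "bbbb-2222", "status": "Published"}},
    {"revision_number": 1, "dataset_uuid": "aaaa-1111",
     "dataset": {"uuid": "aaaa-1111", "status": "Published"}},
]

# Revisions are ordered newest first; the highest revision_number equals the
# list length, and revision_number 1 is the oldest revision in the chain
assert sample_response[0]["revision_number"] == len(sample_response)
assert sample_response[-1]["revision_number"] == 1
```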

src/app.py

Lines changed: 179 additions & 6 deletions

@@ -102,10 +102,9 @@ def http_internal_server_error(e):
 # This neo4j_driver_instance will be used for application-specific neo4j queries
 # as well as being passed to the schema_manager
 try:
-    neo4j_driver_instance = neo4j_driver.instance(app.config['NEO4J_URI'],
-                                                  app.config['NEO4J_USERNAME'],
+    neo4j_driver_instance = neo4j_driver.instance(app.config['NEO4J_URI'],
+                                                  app.config['NEO4J_USERNAME'],
                                                   app.config['NEO4J_PASSWORD'])
-
     logger.info("Initialized neo4j_driver module successfully :)")
 except Exception:
     msg = "Failed to initialize the neo4j_driver module"
@@ -794,7 +793,7 @@ def create_entity(entity_type):
     # Execute entity level validator defined in schema yaml before entity creation
     # Currently only Dataset and Upload creation require the application header
     try:
-        schema_manager.execute_entity_level_validator('before_entity_create_validator', normalized_entity_type, request.headers)
+        schema_manager.execute_entity_level_validator('before_entity_create_validator', normalized_entity_type, request)
     except schema_errors.MissingApplicationHeaderException as e:
         bad_request_error(e)
     except schema_errors.InvalidApplicationHeaderException as e:
@@ -867,6 +866,11 @@ def create_entity(entity_type):
         if next_revisions_list:
             bad_request_error(f"The previous_revision_uuid specified for this dataset has already had a next revision")

+        # Only published datasets can have revisions made of them. Verify that the status of the Dataset
+        # specified by previous_revision_uuid is published. Otherwise, return a bad request error.
+        if previous_version_dict['status'].lower() != DATASET_STATUS_PUBLISHED:
+            bad_request_error(f"The previous_revision_uuid specified for this dataset must be 'Published' in order to create a new revision from it")
+
         # Generate 'before_create_trigger' data and create the entity details in Neo4j
         merged_dict = create_entity_details(request, normalized_entity_type, user_token, json_data_dict)
     else:
@@ -997,7 +1001,7 @@ def update_entity(id):

     # Execute property level validators defined in schema yaml before entity property update
     try:
-        schema_manager.execute_property_level_validators('before_property_update_validators', normalized_entity_type, request.headers, entity_dict, json_data_dict)
+        schema_manager.execute_property_level_validators('before_property_update_validators', normalized_entity_type, request, entity_dict, json_data_dict)
     except (schema_errors.MissingApplicationHeaderException,
             schema_errors.InvalidApplicationHeaderException,
             KeyError,
@@ -1454,7 +1458,7 @@ def get_previous_revisions(id):
     # Get user token from Authorization header
     user_token = get_user_token(request)

-    # Make sure the id exists in uuid-api and
+    # Make sure the id exists in uuid-api and
     # the corresponding entity also exists in neo4j
     entity_dict = query_target_entity(id, user_token)
     uuid = entity_dict['uuid']
@@ -1954,6 +1958,175 @@ def get_dataset_revision_number(id):
     return jsonify(revision_number)


+"""
+Retract a published dataset with a retraction reason and sub status
+
+Takes as input a json body with the required fields "retraction_reason" and "sub_status".
+Authorization is handled by the gateway. Only a token of the HuBMAP-Data-Admin group can use this call.
+
+Technically, the same can be achieved by making a PUT call to the generic entity update endpoint
+using a HuBMAP-Data-Admin group token. But doing this is strongly discouraged because we'd
+need to add more validators to ensure that when "retraction_reason" is provided, there must be a
+"sub_status" field and vice versa. So consider this call a special use case of entity update.
+
+Parameters
+----------
+id : str
+    The HuBMAP ID (e.g. HBM123.ABCD.456) or UUID of the target dataset
+
+Returns
+-------
+dict
+    The updated dataset details
+"""
+@app.route('/datasets/<id>/retract', methods=['PUT'])
+def retract_dataset(id):
+    # Always expect a json body
+    require_json(request)
+
+    # Parse the incoming json string into json data (a python dict object)
+    json_data_dict = request.get_json()
+
+    # Use the application-level validations below to avoid complicating schema validators.
+    # 'retraction_reason' and 'sub_status' are the only required/allowed fields; no other fields are allowed.
+    # We must enforce this rule, otherwise we would need to run after-update triggers if any other fields
+    # get passed in (which should be done using the generic entity update call)
+    if 'retraction_reason' not in json_data_dict:
+        bad_request_error("Missing required field: retraction_reason")
+
+    if 'sub_status' not in json_data_dict:
+        bad_request_error("Missing required field: sub_status")
+
+    if len(json_data_dict) > 2:
+        bad_request_error("Only retraction_reason and sub_status are allowed fields")
+
+    # Must be a HuBMAP-Data-Admin group token
+    token = get_user_token(request)
+
+    # Retrieve the neo4j data for the given entity based on the id supplied.
+    # The normalized entity type of this entity is checked to be a Dataset.
+    # If the entity is not a dataset, or the dataset is not published, it cannot be retracted
+    entity_dict = query_target_entity(id, token)
+    normalized_entity_type = entity_dict['entity_type']
+
+    # A bit more application-level validation
+    if normalized_entity_type != 'Dataset':
+        bad_request_error("The entity of given id is not a Dataset")
+
+    # Validate the request json against the yaml schema
+    # The given value of `sub_status` is validated at this step
+    try:
+        schema_manager.validate_json_data_against_schema(json_data_dict, normalized_entity_type, existing_entity_dict = entity_dict)
+    except schema_errors.SchemaValidationException as e:
+        # No need to log the validation errors
+        bad_request_error(str(e))
+
+    # Execute property level validators defined in schema yaml before entity property update
+    try:
+        schema_manager.execute_property_level_validators('before_property_update_validators', normalized_entity_type, request, entity_dict, json_data_dict)
+    except (schema_errors.MissingApplicationHeaderException,
+            schema_errors.InvalidApplicationHeaderException,
+            KeyError,
+            ValueError) as e:
+        bad_request_error(e)
+
+    # No need to call after_update() afterwards because retraction doesn't call any after_update_trigger methods
+    merged_updated_dict = update_entity_details(request, normalized_entity_type, token, json_data_dict, entity_dict)
+
+    complete_dict = schema_manager.get_complete_entity_result(token, merged_updated_dict)
+
+    # Will also filter the result based on the schema
+    normalized_complete_dict = schema_manager.normalize_entity_result_for_response(complete_dict)
+
+    # Also reindex the updated entity node in elasticsearch via search-api
+    reindex_entity(entity_dict['uuid'], token)
+
+    return jsonify(normalized_complete_dict)
+
+"""
+Retrieve a list of all revisions of a dataset from the id of any dataset in the chain.
+E.g.: if there are 5 revisions and the id for revision 4 is given, a list of revisions
+1-5 will be returned in reverse order (newest first). Non-public access is only required to
+retrieve information on non-published datasets. The output is a list of dictionaries. Each dictionary
+contains the dataset revision number and its uuid. Optionally, the full dataset can be included for each.
+By default, only the revision number and uuid are included. To include the full dataset, the query
+parameter "include_dataset" can be given with the value "true". If this parameter is not included or
+is set to false, the datasets will not be included. For example, to include the full dataset for each revision,
+use '/datasets/<id>/revisions?include_dataset=true'. To omit the datasets, either set include_dataset=false or
+simply do not include this parameter.
+"""
+@app.route('/datasets/<id>/revisions', methods=['GET'])
+def get_revisions_list(id):
+    # By default, do not return the full datasets. Only return them if include_dataset is true
+    show_dataset = False
+    if bool(request.args):
+        include_dataset = request.args.get('include_dataset')
+        if (include_dataset is not None) and (include_dataset.lower() == 'true'):
+            show_dataset = True
+
+    # A token is not required, but if an invalid token is provided,
+    # we need to tell the client with a 401 error
+    validate_token_if_auth_header_exists(request)
+
+    # Use the internal token to query the target entity
+    # since public entities don't require a user token
+    token = get_internal_token()
+
+    # Query the target entity against uuid-api and neo4j and return it as a dict if it exists
+    entity_dict = query_target_entity(id, token)
+    normalized_entity_type = entity_dict['entity_type']
+
+    # Only for Datasets
+    if normalized_entity_type != 'Dataset':
+        bad_request_error("The entity of given id is not a Dataset")
+
+    # Only published/public datasets don't require a token
+    if entity_dict['status'].lower() != DATASET_STATUS_PUBLISHED:
+        # A token is required and the user must belong to the HuBMAP-Read group
+        token = get_user_token(request, non_public_access_required=True)
+
+    # By now, either the entity is publicly accessible or
+    # the user token has the correct access level.
+    # Get all the revisions, sorted DESC based on creation timestamp
+    sorted_revisions_list = app_neo4j_queries.get_sorted_revisions(neo4j_driver_instance, entity_dict['uuid'])
+
+    # Skip some of the properties that are time-consuming to generate via triggers:
+    # direct_ancestors, collections, and upload for Dataset
+    properties_to_skip = [
+        'direct_ancestors',
+        'collections',
+        'upload'
+    ]
+    complete_revisions_list = schema_manager.get_complete_entities_list(token, sorted_revisions_list, properties_to_skip)
+    normalized_revisions_list = schema_manager.normalize_entities_list_for_response(complete_revisions_list)
+
+    # Only check the very latest revision (the first dict, since normalized_revisions_list is already sorted DESC)
+    # to determine whether to send it back or not
+    if not user_in_hubmap_read_group(request):
+        latest_revision = normalized_revisions_list[0]
+
+        if latest_revision['status'].lower() != DATASET_STATUS_PUBLISHED:
+            normalized_revisions_list.pop(0)
+
+            # Also hide the 'next_revision_uuid' of the now-latest revision from the response,
+            # since it points at the unpublished revision that was just removed
+            if 'next_revision_uuid' in normalized_revisions_list[0]:
+                normalized_revisions_list[0].pop('next_revision_uuid')
+
+    # Now all we need to do is compose the result list
+    results = []
+    revision_number = len(normalized_revisions_list)
+    for revision in normalized_revisions_list:
+        result = {
+            'revision_number': revision_number,
+            'dataset_uuid': revision['uuid']
+        }
+        if show_dataset:
+            result['dataset'] = revision
+        results.append(result)
+        revision_number -= 1
+
+    return jsonify(results)
+
+
 ####################################################################################################
 ## Internal Functions
 ####################################################################################################
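The numbering loop at the end of get_revisions_list() can be exercised in isolation. This sketch copies that logic into a standalone helper (the helper name is ours, not part of the API): given a list already sorted newest first, the newest revision gets the highest number and the oldest gets 1.

```python
def compose_revisions(sorted_revisions_desc, show_dataset=False):
    """Standalone sketch of the result-composition loop in get_revisions_list().
    Input must already be sorted DESC by creation time (newest first)."""
    results = []
    revision_number = len(sorted_revisions_desc)
    for revision in sorted_revisions_desc:
        result = {
            'revision_number': revision_number,
            'dataset_uuid': revision['uuid'],
        }
        if show_dataset:
            # Optionally attach the full dataset dict, as ?include_dataset=true does
            result['dataset'] = revision
        results.append(result)
        revision_number -= 1
    return results

# Three fake revisions, newest first: numbers come out as 3, 2, 1
revs = [{'uuid': 'c'}, {'uuid': 'b'}, {'uuid': 'a'}]
out = compose_revisions(revs)
assert [r['revision_number'] for r in out] == [3, 2, 1]
assert out[-1]['dataset_uuid'] == 'a'
```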

src/app_neo4j_queries.py

Lines changed: 41 additions & 0 deletions

@@ -592,6 +592,47 @@ def get_children(neo4j_driver, uuid, property_key = None):
     return results


+
+"""
+Get all revisions for a given dataset uuid and sort them in descending order based on their creation time
+
+Parameters
+----------
+neo4j_driver : neo4j.Driver object
+    The neo4j database connection pool
+uuid : str
+    The uuid of the target entity
+
+Returns
+-------
+list
+    A list of all the unique revision datasets in DESC order
+"""
+def get_sorted_revisions(neo4j_driver, uuid):
+    results = []
+
+    query = (f"MATCH (prev:Dataset)<-[:REVISION_OF *0..]-(e:Dataset)<-[:REVISION_OF *0..]-(next:Dataset) "
+             f"WHERE e.uuid='{uuid}' "
+             # COLLECT() returns a list
+             # apoc.coll.toSet() returns a set containing unique nodes
+             f"WITH apoc.coll.toSet(COLLECT(next) + COLLECT(e) + COLLECT(prev)) AS collection "
+             f"UNWIND collection as node "
+             f"WITH node ORDER BY node.created_timestamp DESC "
+             f"RETURN COLLECT(node) AS {record_field_name}")
+
+    logger.debug("======get_sorted_revisions() query======")
+    logger.debug(query)
+
+    with neo4j_driver.session() as session:
+        record = session.read_transaction(_execute_readonly_tx, query)
+
+        if record and record[record_field_name]:
+            # Convert the list of nodes to a list of dicts
+            results = _nodes_to_dicts(record[record_field_name])
+
+    return results
+
+
 """
 Get all previous revisions of the target entity by uuid

src/schema/provenance_schema.yaml

Lines changed: 16 additions & 1 deletion

@@ -372,7 +372,22 @@ ENTITIES:
       # The updated_peripherally tag is a temporary measure to correctly handle any attributes
       # which are potentially updated by multiple triggers
       updated_peripherally: true
-
+    retraction_reason:
+      type: string
+      before_property_update_validators:
+        - validate_if_retraction_permitted
+        - validate_sub_status_provided
+      description: 'Information recorded about why the dataset was retracted.'
+    sub_status:
+      type: string
+      before_property_update_validators:
+        - validate_if_retraction_permitted
+        - validate_retraction_reason_provided
+        - validate_retracted_dataset_sub_status_value
+      description: 'A sub-status provided to further define the status. The only currently allowable value is "Retracted"'
+    provider_info:
+      type: string
+      description: 'Information recorded about the data provider before an analysis pipeline is run on the data.'

 ############################################# Donor #############################################
 Donor:
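The paired validators above (validate_sub_status_provided on retraction_reason, validate_retraction_reason_provided on sub_status) enforce a mutual-requirement rule. A minimal sketch of that rule as a standalone check (the function name is ours; the real checks live in the schema validator module):

```python
def validate_retraction_fields(payload):
    """Sketch of the mutual-requirement rule the schema validators enforce:
    retraction_reason and sub_status must appear together, and the only
    allowed sub_status value for a retraction is 'Retracted'."""
    has_reason = 'retraction_reason' in payload
    has_sub_status = 'sub_status' in payload
    if has_reason != has_sub_status:
        raise ValueError("retraction_reason and sub_status must be provided together")
    if has_sub_status and payload['sub_status'] != 'Retracted':
        raise ValueError("The only allowable sub_status value is 'Retracted'")

# A well-formed retraction body passes silently
validate_retraction_fields({'retraction_reason': 'bad data', 'sub_status': 'Retracted'})
```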
