Query string
- Introduction
- Global full text search
- Field-specific keyword search
- Spatial search
- Combining operators
- Further reading
Introduction
This document describes the query string, the element of the query object that defines the query terms, and explains how to use it. It applies to both the search and download API methods.
A query string (also called a search query) is a string of at most 2000 Unicode characters. There are several ways to refine a search to find exactly what you are looking for.
Global full text search
This is the simplest search option: a basic keyword search that looks for matching text anywhere in a record. The following search for "noturus placidus" is an example of a global full text search. It retrieves records that contain both terms, "Noturus" and "placidus", in any field:
{"q": "noturus placidus"}
Search terms are case insensitive (so "noturus placidus" retrieves the same results as "Noturus placidus" or "NOTURUS PLACIDUS"). As a slightly more complex example, to search for all records that contain "mvz" (the abbreviation for the Museum of Vertebrate Zoology), "gymnogyps" (the genus of the rare California condor), and "california" anywhere in the record, you could use this query object:
{"q": "mvz gymnogyps california"}
Searching for quoted content (such as an exact sequence of terms) or punctuated values (such as a URN) is a little trickier. You have to enclose the string you are looking for in escaped quotes (\" at the beginning and end of the term). For example, the following query object looks for any records that contain the exact string urn:occurrence:Arctos:CUMV:Amph:14908:2243803:
{"q":"\"urn:occurrence:Arctos:CUMV:Amph:14908:2243803\""}
while this one looks for records that contain "postcranial skeleton" exactly as shown here:
{"q":"\"postcranial skeleton\""}
Field-specific keyword search
You can also limit keyword searches to match only specific values of Darwin Core terms. To do so, provide the name of the Darwin Core term in lower case immediately before the search text, separated from it by a colon (":"). For example, the earlier query for Noturus placidus retrieves records with these terms in the scientific name field, but also records with them in comment fields or any other field. To get only those records with genus Noturus and specific epithet placidus, we could use this query:
{"q": "genus:noturus specificepithet:placidus"}
Or, if we already know the globally unique identifier of an occurrence record (iptrecordid), we could use this query:
{"q": "iptrecordid:7108667e-1483-4d04-b204-6a44a73a5219"}
The following Darwin Core terms are indexed and available for searching:
- basisofrecord
- bed
- catalognumber
- class
- collectioncode
- continent
- coordinateuncertaintyinmeters
- country
- county
- day
- eventdate
- family
- fieldnumber
- formation
- genus
- geodeticdatum
- georeferencedby
- georeferenceverificationstatus
- group
- infraspecificepithet
- institutioncode
- island
- islandgroup
- kingdom
- lifestage
- locality
- member
- month
- municipality
- order
- phylum
- preparations
- recordedby
- recordnumber
- reproductivecondition
- scientificname
- sex
- specificepithet
- stateprovince
- typestatus
- vernacularname
- waterbody
- year
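The quoting and field-prefix rules described above can be sketched in Python. This is a minimal illustration, not part of the API: the helper names `exact_phrase`, `field_terms`, and `query_object` are hypothetical, and the sketch relies only on the standard `json` module, which adds the backslash escapes when the query object is serialized.

```python
import json


def exact_phrase(text):
    """Wrap text in double quotes so it is matched as an exact string."""
    return '"' + text + '"'


def field_terms(fields):
    """Build space-separated field:value terms from a dict.

    Field names are lower-cased, as the query syntax requires.
    """
    return " ".join(f"{name.lower()}:{value}" for name, value in fields.items())


def query_object(query_string):
    """Serialize a query string as a JSON query object with a "q" element."""
    return json.dumps({"q": query_string})


# json.dumps escapes the inner quotes, giving \" in the serialized object.
print(query_object(exact_phrase("postcranial skeleton")))
print(query_object(field_terms({"genus": "noturus", "specificepithet": "placidus"})))
```

Passing the fields as a dict rather than as keyword arguments keeps the helper usable for terms such as class, which is a reserved word in Python.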
Other searchable terms do not map exactly onto standard Darwin Core terms but provide flexible search options:
- dctype: dcterms:type
- fossil: true if dwc:basisOfRecord is FossilSpecimen or collection is a paleo collection
- gbifdatasetid: dwc:datasetID
- gbifpublisherid: dwc:institutionID
- hashid: a value between 1 and 9999 (a way to distribute records in 10k bins)
- haslicense: true if dcterms:license or eml:intellectualRights has a license designated
- hastypestatus: true if dwc:typeStatus is populated
- iptlicense: eml:intellectualRights
- iptrecordid: dwc:occurrenceID
- lastindexed: dcterms:modified
- license: dcterms:license
- location: a Google GeoField of the dwc:decimalLatitude, dwc:decimalLongitude
- mappable: true if the record has valid dwc:decimalLatitude, dwc:decimalLongitude
- media: true if the record has dwc:associatedMedia
- migrator: last processed date
- networks: one or more of MaNIS, ORNIS, HerpNET, FishNet, VertNet, Arctos or Paleo
- rank: a value between 1 and 12 (higher values signify more complete records, which appear first in lists)
- tissue: true if the record has a dwc:preparations value that suggests tissue is available
- type: either specimen or observation
- verbatim_record: the whole record as published
- wascaptive: true if dwc:establishmentMeans or occurrenceRemarks suggests it was captive
NOTE: The code that ranks records can be found at https://github.com/VertNet/dwc-indexer/blob/master/utils.py#L163; basically, the most complete records with respect to georeferences, scientific name, and year are ranked before all others and appear first in any list.
NOTE: Because year is a number field, it can be searched using less than/greater than comparison operators ("<", "<=", ">", ">=") in addition to the colon (which is equivalent to "=").
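As a small illustration of the comparison operators, the sketch below builds a closed year range in Python. The function name `year_between` is hypothetical. It places two comparisons on the same field side by side and relies on AND being the default between terms; that the API accepts this combination is an assumption, not something this document confirms.

```python
def year_between(first, last):
    """Return query terms for records whose year lies in [first, last].

    Assumes two comparisons on the same field may be combined, with the
    default AND applied between them.
    """
    if first > last:
        raise ValueError("first year must not be after last year")
    return f"year>={first} year<={last}"


# Terms for 20th-century records.
print(year_between(1900, 1999))
```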
Spatial search
The API can search for records within a specified distance (in meters) of a given spatial point, represented by a pair of coordinates. This is done with the distance operator, a function that returns the distance in meters between two points passed as arguments. One of the points should be the location field of the record and the other the point to use as the center. Then we just need to state that the distance between the two must be less than a certain value.
Example: search for all records within 2 kilometers of the point 33.529, -105.694:
{"q":"distance(location,geopoint(33.529,-105.694))<2000"}
This first builds the geopoint spatial feature from the given coordinates, then calculates the distance between that geopoint and the location field of each record, and returns only the records for which distance<2000.
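The distance expression above is easy to mistype, so a small builder can help. The following Python sketch is an illustration under stated assumptions: the function name `within_distance` and its range checks are hypothetical conveniences, not something the API provides or requires.

```python
import json


def within_distance(latitude, longitude, meters):
    """Build a query string for records within `meters` of a point."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    if meters <= 0:
        raise ValueError("distance must be positive")
    # Same shape as the example: distance(location,geopoint(lat,lon))<meters
    return f"distance(location,geopoint({latitude},{longitude}))<{meters}"


# Reproduces the 2 km search around 33.529, -105.694.
print(json.dumps({"q": within_distance(33.529, -105.694, 2000)}))
```

Checking the coordinate ranges before building the string catches swapped latitude and longitude values in many cases, since a longitude beyond 90 degrees is rejected when passed as a latitude.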
Combining operators
Query string terms can be combined by using the Boolean operators AND, OR, and NOT. If used, they must be written in upper case. NOT must always appear before the value it modifies, while AND and OR should be used between values. If multiple search keywords are provided but no Boolean operators are specified, AND is used by default.
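These placement rules can be captured in a few Python helpers. The names `all_of`, `any_of`, `negate`, and `group` are hypothetical and exist only for this sketch; they simply join terms with the upper-case operators described above.

```python
def all_of(*terms):
    """Join terms with AND, which goes between values."""
    return " AND ".join(terms)


def any_of(*terms):
    """Join terms with OR, which goes between values."""
    return " OR ".join(terms)


def negate(term):
    """Put NOT before the value it modifies."""
    return f"NOT {term}"


def group(expression):
    """Wrap an expression in parentheses so it is treated as one unit."""
    return f"({expression})"


print(all_of("noturus", negate("placidus")))
print("stateprovince:" + group(any_of("colorado", "kansas")))
```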
Here are some examples:
Search for records with all three terms "mvz", "gymnogyps" and "california":
{"q": "mvz AND gymnogyps AND california"}
Search for records with the term "Noturus" but not "placidus":
{"q": "noturus AND NOT placidus"}
Search for records from the years 1990 or 1991:
{"q": "year:1990 OR year:1991"}
Search for georeferenced, 20th-21st century records of the black-footed ferret from either Colorado or Kansas (note the use of parentheses to group together the two possible values for the "stateprovince" field):
{"q":"genus:Mustela specificepithet:nigripes stateprovince:(colorado OR kansas) year>=1900 mappable:1"}
Further reading
If you would like to learn more about query strings, see the official documentation from Google.