Feature Request
Describe the feature request
The Source Google Drive connector currently does not surface Google Drive label metadata when listing or syncing files. Google Drive supports Labels -- structured metadata (classification tags, custom fields) that organizations apply to files for governance, search, and policy enforcement. Users need the ability to access and filter files based on label metadata during ingestion.
Today, the connector requests basic file fields from the Drive API but does not include labelInfo, so there is no way to filter or expose label values in synced records.
Describe the solution you'd like
- Surface label metadata in the Drive API request: Add labelInfo to the fields parameter in the files.list call so that label data is returned alongside each file. The Drive API supports the includeLabels query parameter and the labelInfo file field for this purpose (reference).
- Expose label metadata in synced records: Include label names and field values as part of the file metadata record so downstream systems can access and use them.
- Enable filtering by label metadata: Allow users to configure label-based filters (e.g., only sync files with a specific label or label value) in the connector spec. The Drive API supports label-based filter syntax in the q parameter.
This could be implemented at the file-based CDK level (benefiting all file-based connectors) or at the Google Drive source level specifically.
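As a rough illustration of the three points above, here is a minimal sketch in plain Python, not the connector's actual code. The helper names (`build_list_params`, `label_filter_clause`, `flatten_label_info`) and the `BASE_FIELDS` value are hypothetical. The `q` clause syntax and the `labelInfo` response shape follow my reading of the Drive API v3 documentation and should be verified before use. Note that `labelInfo` appears to carry label and field IDs only, so exposing human-readable label names would likely need a separate lookup.

```python
from typing import Any, Dict, List, Optional

# Placeholder field list (assumption); the connector's real list may differ.
BASE_FIELDS = "id, name, mimeType, modifiedTime"


def build_list_params(label_ids: List[str], query: Optional[str] = None) -> Dict[str, str]:
    """Build files.list query parameters that also request label data."""
    params = {"fields": f"nextPageToken, files({BASE_FIELDS}, labelInfo)"}
    if label_ids:
        # includeLabels takes a comma-separated list of label IDs.
        params["includeLabels"] = ",".join(label_ids)
    if query:
        params["q"] = query
    return params


def label_filter_clause(label_id: str, field_id: Optional[str] = None,
                        value: Optional[str] = None) -> str:
    """Build a q clause for a label, or for one field value of a label.

    Assumed syntax: 'labels/<id>' in labels, and labels/<id>.<field> = '<value>'.
    """
    if field_id is None:
        return f"'labels/{label_id}' in labels"
    # Escape backslashes and single quotes inside the quoted value.
    escaped = (value or "").replace("\\", "\\\\").replace("'", "\\'")
    return f"labels/{label_id}.{field_id} = '{escaped}'"


def flatten_label_info(file_resource: Dict[str, Any]) -> Dict[str, Dict[str, List[Any]]]:
    """Map label ID -> field ID -> list of values for one file resource.

    Assumes each field stores its values under the key named by valueType.
    """
    flattened: Dict[str, Dict[str, List[Any]]] = {}
    for label in file_resource.get("labelInfo", {}).get("labels", []):
        fields: Dict[str, List[Any]] = {}
        for field_id, field in label.get("fields", {}).items():
            value_type = field.get("valueType")
            fields[field_id] = field.get(value_type, []) if value_type else []
        flattened[label["id"]] = fields
    return flattened


if __name__ == "__main__":
    clause = label_filter_clause("LABEL_ID", "FIELD_ID", "confidential")
    print(build_list_params(["LABEL_ID"], query=clause))
```

Keeping the parameter construction separate from the record flattening would let the filter logic live in the connector spec handling while the flattening sits wherever file metadata records are assembled, whether that is the file-based CDK or the Google Drive source.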
Describe alternatives you've considered
- Post-load filtering in dbt: Users could filter records after they land in the warehouse, but this requires syncing all files first and does not reduce ingestion volume.
- Google Drive API search queries: Users could manually construct Drive API queries outside Airbyte, but this defeats the purpose of a managed connector.
- Folder-based organization: Users could organize files into folders by label category, but this is fragile and does not scale for multi-dimensional metadata.
Additional context
Use case
Airbyte is used as the connector to Google Drive for AI-ingestible data sources. Label metadata filtering is needed to selectively ingest files based on organizational classification and governance labels applied in Google Drive.
Category
- Type: New Feature
- Importance: Blocker
- Connector: source-google-drive
Devin session
Internal Tracking: https://github.com/airbytehq/oncall/issues/11819
Additional context
- The includeLabels parameter specifies which label IDs to return, and labelInfo in the fields parameter includes label data in responses.
- stream_reader.py line 151 -- fields parameter in the files.list call