[source-google-drive] Surface Google Drive label metadata for file filtering #75591

@devin-ai-integration

Feature Request

Describe the feature request

The Source Google Drive connector currently does not surface Google Drive label metadata when listing or syncing files. Google Drive supports Labels -- structured metadata (classification tags, custom fields) that organizations apply to files for governance, search, and policy enforcement. Users need the ability to access and filter files based on label metadata during ingestion.

Today, the connector requests basic file fields from the Drive API but does not request labelInfo, so label values cannot be used for filtering and never appear in synced records.

Describe the solution you'd like

  1. Surface label metadata in the Drive API request: Add labelInfo to the fields parameter in the files.list call so that label data is returned alongside each file. The Drive API provides the includeLabels query parameter (a comma-separated list of label IDs) and the labelInfo file field for this purpose (reference).

  2. Expose label metadata in synced records: Include label names and field values as part of the file metadata record so downstream systems can access and use them.

  3. Enable filtering by label metadata: Allow users to configure label-based filters (e.g., only sync files with a specific label or label value) in the connector spec. The Drive API's q parameter supports label-based search terms, such as 'labels/LABEL_ID' in labels.
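
As a rough illustration of items 1 and 3, the sketch below builds the files.list parameters and a label-based q term as plain Python values. The helper names (build_label_query, build_list_params) and the baseline field list are hypothetical and are not taken from the connector's code. The query syntax follows the Drive API search documentation as I understand it.

```python
from typing import Dict, List, Optional

# Hypothetical baseline field list; the connector's real list may differ.
FIELDS_WITH_LABELS = "nextPageToken, files(id, name, mimeType, modifiedTime, labelInfo)"


def _quote(value: str) -> str:
    """Escape backslashes and single quotes for a Drive query string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_label_query(label_id: str, field_id: Optional[str] = None,
                      value: Optional[str] = None) -> str:
    """Build a q term matching files that carry a label, or a label field value."""
    if field_id is None:
        return f"'labels/{label_id}' in labels"
    if value is None:
        raise ValueError("value is required when field_id is given")
    return f"labels/{label_id}.{field_id} = '{_quote(value)}'"


def build_list_params(label_ids: List[str], q: Optional[str] = None) -> Dict[str, str]:
    """Build keyword arguments for a files.list call that returns labelInfo."""
    params = {
        "fields": FIELDS_WITH_LABELS,
        # includeLabels takes a comma-separated list of label IDs.
        "includeLabels": ",".join(label_ids),
    }
    if q:
        params["q"] = q
    return params
```

With the Google API Python client, these parameters would presumably be passed as `service.files().list(**params)`. How label IDs and filter values are exposed in the connector spec is left open here.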

This could be implemented at the file-based CDK level (benefiting all file-based connectors) or at the Google Drive source level specifically.
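
For either placement, exposing labels in synced records (item 2) could look roughly like the sketch below, which flattens a file's labelInfo into a simple list. The function name flatten_label_info and the output keys are hypothetical. The input shape (labels with a fields map keyed by field ID, each field carrying a valueType and a matching value list) is my reading of the Drive API file resource and should be checked against the API reference. The response carries label and field IDs, so human-readable names would likely need a separate lookup.

```python
from typing import Any, Dict, List

# Value list keys a label field may carry, assumed to match its valueType.
VALUE_KEYS = ("text", "integer", "dateString", "selection", "user")


def flatten_label_info(file_resource: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a file's labelInfo into a list of {label_id, revision_id, fields}."""
    rows = []
    labels = file_resource.get("labelInfo", {}).get("labels", [])
    for label in labels:
        fields = {}
        for field_id, field in label.get("fields", {}).items():
            value_type = field.get("valueType")
            # Read the value list named by valueType; unknown types yield [].
            fields[field_id] = field.get(value_type, []) if value_type in VALUE_KEYS else []
        rows.append({
            "label_id": label.get("id"),
            "revision_id": label.get("revisionId"),
            "fields": fields,
        })
    return rows
```

Files without labels produce an empty list, so the extra record field can be added without affecting existing syncs.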

Describe alternatives you've considered

  • Post-load filtering in dbt: Users could filter records after they land in the warehouse, but this requires syncing all files first and does not reduce ingestion volume.
  • Google Drive API search queries: Users could manually construct Drive API queries outside Airbyte, but this defeats the purpose of a managed connector.
  • Folder-based organization: Users could organize files into folders by label category, but this is fragile and does not scale for multi-dimensional metadata.

Additional context

Use case

Airbyte is used as a connector to Google Drive for AI-ingestible data sources. Label metadata filtering is needed to selectively ingest files based on the organizational classification and governance labels applied in Google Drive.

Category

  • Type: New Feature
  • Importance: Blocker
  • Connector: source-google-drive

Internal Tracking: https://github.com/airbytehq/oncall/issues/11819
