
[Draft] Labeled Training Data Support #56290

Draft
changjian-wang wants to merge 9 commits into cu_sdk/ga from cu_sdk/ga-sample16-labels



@changjian-wang
Member

This pull request adds support for creating custom analyzers with labeled training data in Azure Blob Storage, which helps build more accurate field extraction models. It introduces a new sample demonstrating this workflow, updates the documentation to guide users through the process, and exposes a new constructor on the LabeledDataKnowledgeSource class to simplify usage. It also makes extraction of the operation ID from the Operation-Location header more robust.

Labeled Training Data Support

  • Added a new sample (Sample16_CreateAnalyzerWithLabels.md) that demonstrates how to create a custom analyzer using labeled training data from Azure Blob Storage, including setup instructions, code snippets, and helper methods for uploading and accessing training data.
  • Updated the README (Azure.AI.ContentUnderstanding/README.md) to document the new labeled training data capability and reference the new sample.

API and SDK Enhancements

  • Added a new constructor to LabeledDataKnowledgeSource that accepts only a container URL, making it easier to instantiate when a file list path is not needed. This is implemented across all supported target frameworks and in a new partial class for customizations.

Other Changes

  • Updated the assets.json tag to reflect the new build.
  • Improved the extraction of the operation ID from the Operation-Location header to be more robust in OperationWithId.cs.
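
As a rough illustration of the operation-ID extraction mentioned above, here is a minimal, language-neutral sketch in Python. It is not the code in OperationWithId.cs: the function name `extract_operation_id` and the example URL shape are assumptions made for illustration. The idea is to treat the header value as an absolute URL, take its last non-empty path segment, and return nothing when the value cannot be parsed.

```python
from typing import Optional
from urllib.parse import urlsplit


def extract_operation_id(operation_location: Optional[str]) -> Optional[str]:
    """Return the last non-empty path segment of an absolute URL, or None.

    Hypothetical helper: sketches the idea of reading an operation ID from
    an Operation-Location header value, not the SDK's actual logic.
    """
    if not operation_location:
        return None
    try:
        parts = urlsplit(operation_location.strip())
    except ValueError:
        # Malformed input (for example a broken IPv6 host) yields no ID.
        return None
    # Single guard clause: only absolute URLs (scheme and host) qualify.
    if not parts.scheme or not parts.netloc:
        return None
    # The query string is already separated from the path by urlsplit.
    segments = [segment for segment in parts.path.split("/") if segment]
    return segments[-1] if segments else None
```

A caller would pass the raw header value and fall back to some other behavior (such as raising when the ID is accessed) if the result is `None`.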

Changjian Wang added 9 commits February 13, 2026 13:53
- Add LabeledDataKnowledgeSource customization with single-param (Uri) constructor
- Add Sample16_CreateAnalyzerWithLabels.cs aligned with Java SDK pattern
- Add Sample16_CreateAnalyzerWithLabels.md documentation
- Add receipt label files (receipt1/receipt2 with .labels.json and .result.json)
- Rename SampleFiles to sample_files for consistency
- Align environment variables with Java SDK (CONTENTUNDERSTANDING_* prefix)
- Update test-resources.bicep and test-resources-post.ps1 output names
- Update all appsettings.json files with new env var names
- Update API listing files with new constructor
- Update README.md with Sample16 references
- Add Azure.Storage.Blobs dependency to test project
- Add CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT and CONTAINER env vars
- Auto-generate User Delegation SAS URL when SAS URL not set but account/container provided
- Update Sample16 .cs with fallback SAS generation logic
- Update Sample16 .md documentation with Option A/B pattern
- Extract BuildReceiptFieldSchema() and wrap in Snippet region
- Shorten SNIPPET SAS block by calling GenerateUserDelegationSasUrlAsync
- Add Snippet region around GenerateUserDelegationSasUrlAsync
- Add Assertion region for test assertions (consistent with other samples)
- Add DeleteAnalyzerWithLabels snippet with #if SNIPPET/#else pattern
- Consolidate test infrastructure with clear section separator
- Update .md to reference 4 separate snippets for better docs structure
- Add UploadTrainingDataAsync helper: uploads local receipt_labels/ files to container
- Option B now auto-uploads before generating SAS URL (no manual upload needed)
- Add upload snippet to .md documentation
- Update XML doc comments to reflect new auto-upload behavior
…rocess with labeled training data, update variable names, and enhance instructions for Azure Blob Storage setup.

Add unit tests to achieve >=80% coverage on all custom code files:

- ContentFieldExtensionsTest.cs: 22 tests for Value property switch branches
  covering all ContentField subtypes (String, Number, Integer, Date, Time,
  Boolean, Object, Array, Json, Unknown/default)

- AudioVisualContentDeserializationTest.cs: 16 tests for custom
  DeserializeAudioVisualContent covering KeyFrameTimesMs casing variants,
  null values, round-trip unknown properties, empty/multiple items

- ArrayFieldExtensionsTest.cs: 12 tests for Count property, indexer happy
  paths, ArgumentOutOfRangeException paths, and nested ObjectField arrays

- ContentUnderstandingClientTest.cs: 6 protocol method tests with MockTransport
  covering OperationWithId wrapping for sync/async Analyze/AnalyzeBinary
  including WaitUntil.Completed branch

Coverage results (all custom code files):
  ContentField.Extensions.cs:            100% (was 53.8%)
  ArrayField.Extensions.cs:              100% (was 71.4%)
  AudioVisualContent.Customizations.cs:  98.7% (was 77.9%)
  ContentUnderstandingClient.Customizations.cs: 83.3% (was 78.8%)
  OperationWithId.cs:                    90.6% (was 81.3%)
  All others:                            100% or 83.3% (unchanged)

Added 9 new unit tests and simplified OperationWithId to achieve 100%
line coverage across all 10 custom code files:

Tests added to AudioVisualContentDeserializationTest.cs:
- Deserialize_NullTopLevelElement_ReturnsNull: covers null JSON guard

Tests added to ContentUnderstandingClientTest.cs:
- AnalyzeAsync_Protocol_WaitUntilCompleted: async Analyze WaitForCompletionAsync
- AnalyzeBinaryAsync_Protocol_WaitUntilCompleted: async AnalyzeBinary WaitForCompletionAsync
- Analyze_Protocol_ThrowsOnTransportError: sync Analyze catch block
- AnalyzeBinary_Protocol_ThrowsOnTransportError: sync AnalyzeBinary catch block
- AnalyzeAsync_Protocol_ThrowsOnTransportError: async Analyze catch block
- AnalyzeBinaryAsync_Protocol_ThrowsOnTransportError: async AnalyzeBinary catch block
- Analyze_Protocol_InvalidOperationLocation_ThrowsOnIdAccess: OperationWithId fallback path
- GetAnalyzer_Protocol_WithRequestContext_CoversRequestContextParse: non-null RequestContext

Source changes:
- OperationWithId.cs: merged nested if conditions into single guard
  clause to eliminate dead code (segments.Length is never 0 for
  valid absolute URIs)
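
The Option A/B fallback described in the commits above (use a SAS URL when one is set, otherwise generate one from the storage account and container) can be sketched roughly as follows. This is a Python illustration of the decision only, not the sample's C# code. The variable names `CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL` and `CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER` are assumed spellings, the mapping of "Option A" to a pre-supplied SAS URL is inferred, and the upload and SAS generation steps are left out.

```python
from typing import Mapping, NamedTuple, Optional

# Assumed names: only the STORAGE_ACCOUNT variable is spelled out in the
# commit messages; the other two are hypothetical.
SAS_URL_VAR = "CONTENTUNDERSTANDING_TRAINING_DATA_SAS_URL"
ACCOUNT_VAR = "CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT"
CONTAINER_VAR = "CONTENTUNDERSTANDING_TRAINING_DATA_CONTAINER"


class TrainingDataPlan(NamedTuple):
    option: str                # "A" (SAS URL supplied) or "B" (generate one)
    sas_url: Optional[str]
    account: Optional[str]
    container: Optional[str]


def resolve_training_data(env: Mapping[str, str]) -> TrainingDataPlan:
    """Decide which option applies, given environment-style settings."""
    sas_url = env.get(SAS_URL_VAR, "").strip()
    if sas_url:
        # Option A: a ready-made SAS URL takes precedence.
        return TrainingDataPlan("A", sas_url, None, None)
    account = env.get(ACCOUNT_VAR, "").strip()
    container = env.get(CONTAINER_VAR, "").strip()
    if account and container:
        # Option B: the caller would upload the label files, then
        # generate a User Delegation SAS URL for this container.
        return TrainingDataPlan("B", None, account, container)
    raise ValueError("Set a SAS URL, or both a storage account and a container.")
```

Passing `os.environ` as `env` would apply the same decision to real environment variables.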

@changjian-wang
Member Author

@copilot how to trigger CI check?

Contributor

Copilot AI commented Feb 14, 2026

@changjian-wang I've opened a new pull request, #56291, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI mentioned this pull request Feb 14, 2026
9 tasks
@changjian-wang changjian-wang marked this pull request as draft February 14, 2026 08:02
@changjian-wang changjian-wang changed the title from "Labeled Training Data Support" to "[Draft]Labeled Training Data Support" Feb 14, 2026
2 participants