[Draft]Labeled Training Data Support by changjian-wang · Pull Request #56290 · Azure/azure-sdk-for-net

changjian-wang · 2026-02-14T06:11:57Z

This pull request adds support for creating custom analyzers from labeled training data stored in Azure Blob Storage, which can help build more accurate field extraction models. It introduces a new sample demonstrating this workflow, updates the documentation to guide users through the process, and adds a new constructor to the LabeledDataKnowledgeSource class to simplify usage. It also makes parsing of the operation ID from the Operation-Location header more robust.

Labeled Training Data Support

Added a new sample (Sample16_CreateAnalyzerWithLabels.md) that demonstrates how to create a custom analyzer using labeled training data from Azure Blob Storage, including setup instructions, code snippets, and helper methods for uploading and accessing training data.
Updated the README (Azure.AI.ContentUnderstanding/README.md) to document the new labeled training data capability and reference the new sample. [1] [2] [3]

API and SDK Enhancements

Added a new constructor to LabeledDataKnowledgeSource that accepts only a container URL, making it easier to instantiate when a file list path is not needed. This is implemented across all supported target frameworks and in a new partial class for customizations. [1] [2] [3]
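In C#, the new single-argument constructor is an overload next to the existing one, which presumably takes both a container URL and a file list path. As a rough illustration only (a Python analog, not the SDK's API), the same idea can be expressed with an optional parameter: the container URL is required and the file list path defaults to absent. The class name `LabeledDataSourceSketch`, its fields, and the example URLs are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabeledDataSourceSketch:
    """Hypothetical stand-in for a labeled-data knowledge source (not the SDK type)."""

    # Required: URL of the blob container that holds the labeled training data.
    container_url: str
    # Optional: path to a file listing which blobs to use; omitted when not needed.
    file_list_path: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject an empty container URL early, the way a constructor guard would.
        if not self.container_url:
            raise ValueError("container_url must be a non-empty URL")


if __name__ == "__main__":
    # Analog of the new single-argument form (container URL only).
    simple = LabeledDataSourceSketch("https://myaccount.blob.core.windows.net/labels")
    # Analog of the form that also supplies a (hypothetical) file list path.
    full = LabeledDataSourceSketch(
        "https://myaccount.blob.core.windows.net/labels", "filelist.jsonl"
    )
    print(simple.file_list_path, full.file_list_path)
```

A default parameter is the idiomatic Python counterpart of a C# overload; in C#, adding an overload rather than changing the existing constructor leaves existing call sites untouched.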

Other Changes

Updated the assets.json tag to reflect the new build.
Made extraction of the operation ID from the Operation-Location header in OperationWithId.cs more robust.
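The PR text does not show the parsing code. The following is only a sketch of one robust approach. It assumes the operation ID is the last path segment of the Operation-Location URL and that any query string (such as api-version) should be ignored. The function name `extract_operation_id` and the example URLs are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def extract_operation_id(operation_location: Optional[str]) -> Optional[str]:
    """Return the last path segment of an Operation-Location URL, or None.

    Assumption (illustrative only): the header holds an absolute URL whose final
    path segment is the operation ID, optionally followed by a query string.
    """
    if not operation_location:
        return None
    parts = urlsplit(operation_location)
    # Drop empty segments so a trailing slash does not yield an empty ID.
    segments = [s for s in parts.path.split("/") if s]
    # Single guard clause: require an absolute http(s) URL with at least one segment.
    if parts.scheme not in ("http", "https") or not parts.netloc or not segments:
        return None
    return segments[-1]


if __name__ == "__main__":
    print(extract_operation_id("https://example.invalid/operations/abc-123?api-version=x"))
```

All checks are folded into one guard clause, similar in spirit to the single guard clause the PR describes for OperationWithId.cs. Because this version drops empty segments, it still needs the emptiness check.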

- Add LabeledDataKnowledgeSource customization with single-param (Uri) constructor
- Add Sample16_CreateAnalyzerWithLabels.cs aligned with Java SDK pattern
- Add Sample16_CreateAnalyzerWithLabels.md documentation
- Add receipt label files (receipt1/receipt2 with .labels.json and .result.json)
- Rename SampleFiles to sample_files for consistency
- Align environment variables with Java SDK (CONTENTUNDERSTANDING_* prefix)
- Update test-resources.bicep and test-resources-post.ps1 output names
- Update all appsettings.json files with new env var names
- Update API listing files with new constructor
- Update README.md with Sample16 references

- Add Azure.Storage.Blobs dependency to test project
- Add CONTENTUNDERSTANDING_TRAINING_DATA_STORAGE_ACCOUNT and CONTAINER env vars
- Auto-generate User Delegation SAS URL when SAS URL not set but account/container provided
- Update Sample16 .cs with fallback SAS generation logic
- Update Sample16 .md documentation with Option A/B pattern
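The SAS fallback described in this commit can be sketched as a small resolution function. This assumes Option A means supplying a ready-made SAS URL and Option B means generating a User Delegation SAS from the storage account and container. `resolve_training_data_url` is hypothetical, and `generate_sas_url` is a caller-supplied stand-in for the sample's GenerateUserDelegationSasUrlAsync helper, so no Azure calls are made here.

```python
from typing import Callable, Optional


def resolve_training_data_url(
    sas_url: Optional[str],
    storage_account: Optional[str],
    container: Optional[str],
    generate_sas_url: Callable[[str, str], str],
) -> str:
    """Pick the training-data container URL using an Option A / Option B fallback.

    Option A (assumed): an explicitly configured SAS URL wins when it is set.
    Option B (assumed): otherwise, if both account and container are set,
    delegate to generate_sas_url, a hypothetical stand-in for the sample's
    User Delegation SAS helper.
    """
    if sas_url:
        return sas_url
    if storage_account and container:
        return generate_sas_url(storage_account, container)
    raise ValueError("Set a SAS URL, or both a storage account and a container name.")


if __name__ == "__main__":
    # Fake generator for illustration; a real one would request a user delegation key.
    fake = lambda account, name: f"https://{account}.blob.core.windows.net/{name}?sig=fake"
    print(resolve_training_data_url(None, "myaccount", "labels", fake))
```

Injecting the generator keeps the selection logic testable without storage credentials.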

- Extract BuildReceiptFieldSchema() and wrap in Snippet region
- Shorten SNIPPET SAS block by calling GenerateUserDelegationSasUrlAsync
- Add Snippet region around GenerateUserDelegationSasUrlAsync
- Add Assertion region for test assertions (consistent with other samples)
- Add DeleteAnalyzerWithLabels snippet with #if SNIPPET/#else pattern
- Consolidate test infrastructure with clear section separator
- Update .md to reference 4 separate snippets for better docs structure

- Add UploadTrainingDataAsync helper: uploads local receipt_labels/ files to container
- Option B now auto-uploads before generating SAS URL (no manual upload needed)
- Add upload snippet to .md documentation
- Update XML doc comments to reflect new auto-upload behavior
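The file-selection half of an upload helper like UploadTrainingDataAsync can be sketched without any Azure dependency: walk the local training-data folder and map each file to a blob name. The function `plan_training_data_upload`, the optional blob prefix, and the example file names are assumptions. The actual upload call (for example through the Azure.Storage.Blobs client) is deliberately left out.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def plan_training_data_upload(local_dir: str, blob_prefix: str = "") -> List[Tuple[Path, str]]:
    """Return (local file, blob name) pairs for every file under local_dir.

    Blob names are forward-slash paths relative to local_dir, optionally placed
    under blob_prefix. Files are returned in sorted order for repeatable uploads.
    """
    root = Path(local_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"training data folder not found: {root}")
    prefix = blob_prefix.strip("/")
    plan = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        plan.append((path, f"{prefix}/{rel}" if prefix else rel))
    return plan


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp, "receipt_labels")
        folder.mkdir()
        # Hypothetical file names modeled on the .labels.json / .result.json pairs.
        for name in ("receipt1.labels.json", "receipt1.result.json"):
            (folder / name).write_text("{}")
        for local, blob in plan_training_data_upload(str(folder), "training"):
            print(local.name, "->", blob)
```

Separating planning from uploading makes it easy to check which files would be sent before any network call.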


Add unit tests to achieve >=80% coverage on all custom code files:
- ContentFieldExtensionsTest.cs: 22 tests for Value property switch branches covering all ContentField subtypes (String, Number, Integer, Date, Time, Boolean, Object, Array, Json, Unknown/default)
- AudioVisualContentDeserializationTest.cs: 16 tests for custom DeserializeAudioVisualContent covering KeyFrameTimesMs casing variants, null values, round-trip unknown properties, empty/multiple items
- ArrayFieldExtensionsTest.cs: 12 tests for Count property, indexer happy paths, ArgumentOutOfRangeException paths, and nested ObjectField arrays
- ContentUnderstandingClientTest.cs: 6 protocol method tests with MockTransport covering OperationWithId wrapping for sync/async Analyze/AnalyzeBinary including WaitUntil.Completed branch

Coverage results (all custom code files):
- ContentField.Extensions.cs: 100% (was 53.8%)
- ArrayField.Extensions.cs: 100% (was 71.4%)
- AudioVisualContent.Customizations.cs: 98.7% (was 77.9%)
- ContentUnderstandingClient.Customizations.cs: 83.3% (was 78.8%)
- OperationWithId.cs: 90.6% (was 81.3%)
- All others: 100% or 83.3% (unchanged)

Added 9 new unit tests and simplified OperationWithId to achieve 100% line coverage across all 10 custom code files:

Tests added to AudioVisualContentDeserializationTest.cs:
- Deserialize_NullTopLevelElement_ReturnsNull: covers null JSON guard

Tests added to ContentUnderstandingClientTest.cs:
- AnalyzeAsync_Protocol_WaitUntilCompleted: async Analyze WaitForCompletionAsync
- AnalyzeBinaryAsync_Protocol_WaitUntilCompleted: async AnalyzeBinary WaitForCompletionAsync
- Analyze_Protocol_ThrowsOnTransportError: sync Analyze catch block
- AnalyzeBinary_Protocol_ThrowsOnTransportError: sync AnalyzeBinary catch block
- AnalyzeAsync_Protocol_ThrowsOnTransportError: async Analyze catch block
- AnalyzeBinaryAsync_Protocol_ThrowsOnTransportError: async AnalyzeBinary catch block
- Analyze_Protocol_InvalidOperationLocation_ThrowsOnIdAccess: OperationWithId fallback path
- GetAnalyzer_Protocol_WithRequestContext_CoversRequestContextParse: non-null RequestContext

Source changes:
- OperationWithId.cs: merged nested if conditions into single guard clause to eliminate dead code (segments.Length is never 0 for valid absolute URIs)

changjian-wang · 2026-02-14T06:50:35Z

@copilot how to trigger CI check?

Copilot · 2026-02-14T06:50:43Z

@changjian-wang I've opened a new pull request, #56291, to work on those changes. Once the pull request is ready, I'll request review from you.

Changjian Wang added 9 commits February 13, 2026 13:53

Refactor Sample16 documentation and code: clarify analyzer creation process with labeled training data, update variable names, and enhance instructions for Azure Blob Storage setup.

d8e1d6e

Enhance Sample16 output: improve analyzer creation messages and add field schema verification details

7646a3e

Update asset tag in assets.json to reflect latest version

a1c4f10

changjian-wang requested review from bojunehsu, chienyuanchang and yungshinlintw as code owners February 14, 2026 06:11

github-actions bot added the Cognitive - Content Understanding label Feb 14, 2026

Copilot AI mentioned this pull request Feb 14, 2026: Labeled Training Data Support #56291 (Open, 9 tasks)

changjian-wang marked this pull request as draft February 14, 2026 08:02

changjian-wang changed the title ~~Labeled Training Data Support~~ [Draft]Labeled Training Data Support Feb 14, 2026

changjian-wang wants to merge 9 commits into cu_sdk/ga from cu_sdk/ga-sample16-labels
