Merged
42 changes: 42 additions & 0 deletions .github/instructions/docs.content.docs.list.instructions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
---
applyTo: "docs/content/docs/**/_index.md"
---

# Documentation Standards

All documentation content **must stay in sync with the codebase**.
List pages automatically show the `description` from the front matter, followed by cards for each sub-page. This layout is applied automatically.

---

## Front Matter

- Only edit the `description`.
- The `description` must be **SEO/GEO friendly**:
  - Use clear, relevant keywords.
  - Keep it concise, about 150–160 characters.
  - Write for humans first.

---

## Content

The content appears **after the list of sub-pages**.
It must describe the overall purpose of the collection or section, based on an **inspection of all current sub-pages**.

Write **concise generalisations** that:
- Summarise common themes, capabilities, and patterns found across the sub-pages.
- Call out notable variations only when they matter to readers choosing where to go next.
- Link to one or two representative sub-pages when helpful, not an exhaustive list.

---

## Rules

- Content appears **after the auto-generated list of sub-pages**; do not add additional lists.
- Use the content to give context and orientation, not item-level documentation.
- Base statements on an actual review of all sub-pages in the section.
- Keep wording concise and consistent with the page’s keywords, and ensure the `description` aligns with the content.
- Update this content whenever sub-pages are added, removed, or materially changed.

---
Original file line number Diff line number Diff line change
@@ -1,11 +1,12 @@
---
applyTo: "docs/content/docs/**"
applyTo: "docs/content/docs/**/index.md"
---

# Documentation Standards

All documentation content **must stay in sync with the codebase**.
Every page must accurately reflect the associated **data files** and **schemas**.
Every page must accurately reflect the associated **data files** and **schemas**.
Existing manually added content must be preserved. Reorganise or adapt it as needed, but do not remove it.

---

Expand All @@ -25,7 +26,8 @@ Each documentation file **may include** the following properties:

## Documentation Structure

Documentation files should generally include these sections (in order):
Documentation files should generally include these sections (in order).
Manually added content should be placed into the most relevant section, or reorganised if necessary.

1. **Overview**
- Subsections: *How It Works*, *Use Cases*
Expand Down
193 changes: 184 additions & 9 deletions docs/content/docs/reference/tools/stringmanipulatortool/index.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
---
title: String Manipulator Tool
description: Used to process the String fields of a work item. This is useful for cleaning up data. It will limit fields to a max length and apply regex replacements based on what is configured. Each regex replacement is applied in order and can be enabled or disabled.
description: Processes and cleans up string fields in work items by applying regex patterns, length limitations, and text transformations. Essential for data cleanup and standardization during migration.
dataFile: reference.tools.stringmanipulatortool.yaml
schemaFile: schema.tools.stringmanipulatortool.json
slug: string-manipulator-tool
aliases:
- /docs/Reference/Tools/StringManipulatorTool
Expand All @@ -12,13 +13,39 @@ date: 2025-06-24T12:07:31Z
discussionId: 2643
---

{{< class-description >}}
## Overview

## Options
The String Manipulator Tool provides powerful text processing capabilities for work item migration. It applies configurable string manipulations to all text fields in work items, enabling data cleanup, standardization, and format corrections during the migration process.

{{< class-options >}}
The tool processes string fields through a series of regex-based manipulators that can remove invalid characters, standardize formats, replace text patterns, and enforce field length limits. Each manipulation is applied in sequence and can be individually enabled or disabled.

### How It Works

The String Manipulator Tool operates on all string fields within work items during migration:

1. **Field Processing**: The tool identifies all string-type fields in each work item
2. **Sequential Application**: Each configured manipulator is applied in the order defined in the configuration
3. **Regex Transformations**: Pattern-based replacements using regular expressions
4. **Length Enforcement**: Truncates fields that exceed the maximum allowed length
5. **Conditional Execution**: Each manipulator can be individually enabled or disabled

The tool is automatically invoked by migration processors and applies transformations before work items are saved to the target system.
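
The processing order described above can be sketched in a few lines. This is an illustrative Python model, not the tool's .NET implementation: the `Manipulator` class and `process_field` function are hypothetical names, and Python's `re` module stands in for the .NET regex engine, whose syntax differs in places.

```python
import re
from dataclasses import dataclass


@dataclass
class Manipulator:
    """Hypothetical stand-in for one entry in the `Manipulators` array."""
    pattern: str
    replacement: str
    enabled: bool = True


def process_field(value: str, manipulators: list[Manipulator], max_length: int) -> str:
    """Apply enabled manipulators in order, then enforce the length limit."""
    for m in manipulators:
        if not m.enabled:
            continue  # disabled manipulators are skipped
        value = re.sub(m.pattern, m.replacement, value)
    return value[:max_length]


manipulators = [
    Manipulator(r"[^\x20-\x7E\r\n\t]", ""),     # strip non-printable characters
    Manipulator(r"\s+", " ", enabled=False),      # disabled, so it has no effect
]
result = process_field("Title\x00 with   junk\x07", manipulators, max_length=12)
print(repr(result))
```

Because each manipulator receives the output of the previous one, reordering the list can change the final value, not only the running time.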

### Use Cases

Common scenarios where the String Manipulator Tool is essential:

## Samples
- **Data Cleanup**: Removing invalid Unicode characters, control characters, or formatting artifacts
- **Format Standardization**: Converting text patterns to consistent formats
- **Length Compliance**: Ensuring field values don't exceed target system limits
- **Character Encoding**: Fixing encoding issues from legacy systems
- **Pattern Replacement**: Updating URLs, paths, or references to match target environment

## Configuration Structure

### Options

{{< class-options >}}

### Sample

Expand All @@ -28,13 +55,161 @@ discussionId: 2643

{{< class-sample sample="defaults" >}}

### Classic
### Basic Examples

The String Manipulator Tool is configured with an array of manipulators, each defining a specific text transformation:

```json
{
  "StringManipulatorTool": {
    "Enabled": true,
    "MaxStringLength": 1000000,
    "Manipulators": [
      {
        "$type": "RegexStringManipulator",
        "Enabled": true,
        "Description": "Remove invalid characters",
        "Pattern": "[^\\x20-\\x7E\\r\\n\\t]",
        "Replacement": ""
      }
    ]
  }
}
```

### Complex Examples

#### Manipulator Types

Currently, the tool supports the following manipulator types:

- **RegexStringManipulator**: Applies regular expression pattern matching and replacement

#### Manipulator Properties

Each manipulator supports these properties:

- **$type**: Specifies the manipulator type (e.g., "RegexStringManipulator")
- **Enabled**: Boolean flag to enable/disable this specific manipulator
- **Description**: Human-readable description of what the manipulator does
- **Pattern**: Regular expression pattern to match text
- **Replacement**: Text to replace matched patterns (can be empty string for removal)

## Common Scenarios

### Removing Invalid Characters

Remove non-printable characters that may cause issues in the target system:

```json
{
  "$type": "RegexStringManipulator",
  "Description": "Remove non-printable characters anywhere in the string",
  "Enabled": true,
  "Pattern": "[^( -~)\n\r\t]+",
  "Replacement": ""
}
```
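
To see what this character class keeps and removes, the sketch below runs the same pattern through Python's `re` module. It only approximates the .NET engine the tool uses, and `strip_invalid` is a hypothetical helper name.

```python
import re

# Same character class as the JSON sample after JSON unescaping:
# anything that is not printable ASCII, newline, carriage return, or tab.
PATTERN = "[^( -~)\n\r\t]+"


def strip_invalid(text: str) -> str:
    return re.sub(PATTERN, "", text)


cleaned = strip_invalid("Fix\x00 login\u200b bug\r\n\tstep 1\x1f")
print(repr(cleaned))

# Accented and non-Latin characters are outside the class and are removed too.
print(strip_invalid("café"))
```

The second call shows a side effect worth checking before migration: any character outside printable ASCII is dropped, including legitimate accented letters.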

### Standardizing Line Endings

Convert all line endings to a consistent format:

```json
{
  "$type": "RegexStringManipulator",
  "Description": "Standardize line endings to CRLF",
  "Enabled": true,
  "Pattern": "\r\n|\n|\r",
  "Replacement": "\r\n"
}
```
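
The order of the alternatives in this pattern matters. A minimal Python sketch of the same substitution (with `to_crlf` as a hypothetical helper, and Python's `re` standing in for the .NET engine) shows why `\r\n` is listed first:

```python
import re


def to_crlf(text: str) -> str:
    # "\r\n" comes first so an existing CRLF is matched as one unit
    # and is not expanded into two line breaks.
    return re.sub("\r\n|\n|\r", "\r\n", text)


mixed = "a\nb\rc\r\nd"
print(repr(to_crlf(mixed)))
```

Running the function twice gives the same result as running it once, so the manipulator is safe to apply to fields that already use CRLF.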

### Cleaning HTML Content

Remove or clean HTML tags from text fields:

```json
{
  "$type": "RegexStringManipulator",
  "Description": "Remove HTML tags",
  "Enabled": true,
  "Pattern": "<[^>]*>",
  "Replacement": ""
}
```
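
This pattern is a blunt instrument, and it helps to try it on a few inputs first. The Python sketch below (an approximation of the .NET behaviour, with `strip_tags` as a hypothetical helper) shows one intended result and two cases that may surprise:

```python
import re

TAG = re.compile(r"<[^>]*>")


def strip_tags(text: str) -> str:
    return TAG.sub("", text)


print(strip_tags("<p>Hello <b>world</b></p>"))  # tags removed, text kept
print(strip_tags("1 < 2 and 3 > 2"))            # plain-text comparison is eaten
print(strip_tags("Tom &amp; Jerry"))            # entities are left untouched
```

Anything between a `<` and the next `>` is removed, whether or not it is markup, and HTML entities such as `&amp;` are not decoded.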

### Fixing Encoding Issues

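Replace common encoding artifacts. Each manipulator has a single `Replacement`, so artifacts that stand for different characters (for example mis-encoded double quotes) need their own manipulators:

```json
{
  "$type": "RegexStringManipulator",
  "Description": "Replace mis-encoded right single quotation marks with an apostrophe",
  "Enabled": true,
  "Pattern": "’",
  "Replacement": "'"
}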
```
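
These artifacts appear when UTF-8 bytes are decoded as Windows-1252. The Python sketch below pairs each artifact with its own replacement so that single and double quotes are not collapsed into one character. The `FIXES` table and `fix_quotes` function are hypothetical, and the artifacts are written as Unicode escapes to keep the source unambiguous.

```python
import re

# Each artifact is the UTF-8 encoding of a curly quote misread as Windows-1252.
FIXES = [
    ("\u00e2\u20ac\u2122", "'"),   # artifact of U+2019, right single quote
    ("\u00e2\u20ac\u0153", '"'),   # artifact of U+201C, left double quote
    ("\u00e2\u20ac\u009d", '"'),   # artifact of U+201D, right double quote
]


def fix_quotes(text: str) -> str:
    for pattern, replacement in FIXES:
        text = re.sub(pattern, replacement, text)
    return text


broken = "It\u00e2\u20ac\u2122s \u00e2\u20ac\u0153done\u00e2\u20ac\u009d"
print(fix_quotes(broken))

# How the first artifact arises: UTF-8 bytes decoded with the wrong code page.
print("\u2019".encode("utf-8").decode("cp1252") == "\u00e2\u20ac\u2122")
```

In the tool's configuration, each row of `FIXES` would correspond to a separate `RegexStringManipulator` entry.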

## Good Practices

### Pattern Testing

- **Test regex patterns** thoroughly before applying to production data
- **Use regex testing tools** to validate patterns against sample data
- **Consider edge cases** and unintended matches in your patterns
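
One lightweight way to follow this advice is a table of inputs and expected outputs that is rerun whenever a pattern changes. The sketch below uses Python's `re` module and a hypothetical `check_pattern` helper; final validation should still happen against the .NET regex engine, since the two dialects are not identical.

```python
import re


def check_pattern(pattern, replacement, cases):
    """Return the cases whose actual output differs from the expected output."""
    failures = []
    for given, expected in cases:
        actual = re.sub(pattern, replacement, given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


cases = [
    ("<b>bold</b>", "bold"),
    ("no markup", "no markup"),
    ("", ""),
    ("a < b", "a < b"),  # a lone "<" has no closing ">", so it must survive
]
failures = check_pattern(r"<[^>]*>", "", cases)
print(failures or "all cases pass")
```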

### Performance Considerations

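- **Order manipulators deliberately**: Each manipulator receives the output of the previous one, so order affects the result as well as the cost; place simpler patterns before complex ones where the result allows it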
- **Use specific patterns**: Avoid overly broad regex that may match unintended content
- **Consider field length**: Set appropriate `MaxStringLength` to prevent excessive processing

### Data Safety

- **Backup source data**: Always maintain backups before applying string manipulations
- **Test with sample data**: Validate manipulations on a subset before full migration
- **Review results**: Check processed fields to ensure transformations are correct

### Configuration Management

- **Document patterns**: Include clear descriptions for each manipulator
- **Version control**: Maintain configuration files in version control
- **Incremental changes**: Test one manipulator at a time when developing complex transformations

## Troubleshooting

### Common Issues

**Manipulations Not Applied:**

- Verify the tool is enabled (`"Enabled": true`)
- Check that individual manipulators are enabled
- Review regex patterns for syntax errors
- Ensure the tool is configured in the processor's tool list

**Unexpected Results:**

- Test regex patterns in isolation with sample data
- Check the order of manipulators (they execute sequentially)
- Verify escape sequences in JSON configuration
- Review field content before and after processing
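
Escape sequences are a frequent source of surprises because JSON unescapes the string before the regex engine sees it. The Python sketch below uses the standard `json` module to show what each form decodes to; `is_valid_json` is a hypothetical helper, and other JSON parsers may be more lenient about invalid escapes than Python's.

```python
import json

# Raw string so Python leaves the backslashes alone and only JSON unescapes them.
raw = r'''
{
  "Pattern": "[^\\x20-\\x7E\\r\\n\\t]",
  "Replacement": "\r\n"
}
'''
config = json.loads(raw)

# "\\x20" in JSON reaches the regex engine as \x20 (a single backslash).
print(config["Pattern"])
# "\r\n" in JSON is decoded to real CR and LF characters.
print(repr(config["Replacement"]))


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# A single backslash before "d" is not a legal JSON escape; it must be doubled.
print(is_valid_json(r'"\d+"'), is_valid_json(r'"\\d+"'))
```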

**Performance Issues:**

{{< class-sample sample="classic" >}}
- Consider reducing `MaxStringLength` if processing very large fields
- Optimize regex patterns to avoid catastrophic backtracking
- Disable unnecessary manipulators
- Process smaller batches of work items

## Metadata
**Regex Pattern Errors:**

{{< class-metadata >}}
- Validate regex syntax using online tools or testing utilities
- Escape special characters properly in JSON configuration
- Consider case sensitivity requirements
- Test patterns against various input scenarios
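
Several of these checks can be automated before a migration run. The sketch below is a hypothetical pre-flight script, assuming the configuration has the shape shown in the basic example on this page. It compiles patterns with Python's `re`, so a pattern that passes here can still be rejected by .NET, and the reverse.

```python
import json
import re


def find_problems(config_text: str) -> list[str]:
    """List disabled or uncompilable manipulators in a config fragment."""
    tool = json.loads(config_text)["StringManipulatorTool"]
    problems = []
    if not tool.get("Enabled", False):
        problems.append("tool is disabled")
    for index, m in enumerate(tool.get("Manipulators", [])):
        if not m.get("Enabled", False):
            problems.append(f"manipulator {index} is disabled")
        try:
            re.compile(m.get("Pattern", ""))
        except re.error as exc:
            problems.append(f"manipulator {index} has an invalid pattern: {exc}")
    return problems


sample = json.dumps({
    "StringManipulatorTool": {
        "Enabled": True,
        "Manipulators": [
            {"Enabled": True, "Pattern": "<[^>]*>", "Replacement": ""},
            {"Enabled": False, "Pattern": "[a-z", "Replacement": ""},
        ],
    }
})
problems = find_problems(sample)
for problem in problems:
    print(problem)
```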

## Schema

Expand Down