diff --git a/UTF8_SELECTOR_SUPPORT.md b/UTF8_SELECTOR_SUPPORT.md
new file mode 100644
index 00000000..5d5b3197
--- /dev/null
+++ b/UTF8_SELECTOR_SUPPORT.md
@@ -0,0 +1,183 @@
+# UTF-8 Selector Support in Terminator
+
+## Status: ✅ Supported (with caveats)
+
+This document describes the state of UTF-8/Unicode support in Terminator selectors, particularly for non-ASCII characters like Chinese, Japanese, Korean, Arabic, etc.
+
+## Summary
+
+**Rust Core**: ✅ Fully supports UTF-8 selectors
+- Selector parsing correctly handles multi-byte UTF-8 characters
+- Comprehensive tests added in `terminator/src/tests/utf8_selector_tests.rs`
+- Tests cover Chinese (中文), Japanese (日本語), Korean (한국어), Arabic (العربية), Cyrillic (Русский), and emoji
+
+**Python Bindings**: ✅ Should work automatically
+- PyO3 handles UTF-8 string conversion from Python to Rust automatically
+- Python 3 `str` objects are Unicode, so no manual encoding is needed
+
+**Potential Issues**:
+1. **Windows UI Automation API**: The underlying `uiautomation` crate (v0.22.0) passes strings to Windows UI Automation
+   - Windows uses UTF-16 internally, but the Rust `windows` crate handles conversion
+   - Element name matching depends on how the target application exposes element names
+   - Some applications may not correctly expose localized element names
+
+2. **Locale-specific behavior**: Element names may differ based on:
+   - Windows display language
+   - Application language settings
+   - Regional settings
+
+## Testing
+
+### Rust Unit Tests
+Run the comprehensive UTF-8 selector tests:
+```bash
+cargo test utf8_selector_tests
+```
+
+These tests verify that selector parsing works correctly with:
+- Chinese characters (e.g., `role:Button|name:提交`)
+- Japanese hiragana/katakana (e.g., `name:こんにちは`)
+- Korean hangul (e.g., `text:안녕하세요`)
+- Arabic RTL text (e.g., `name:مرحبا`)
+- Cyrillic (e.g., `role:Button|name:Привет`)
+- Mixed scripts (e.g., `role:Window|name:Settings 设置`)
+- Emoji (e.g., `name:保存 💾`)
+
+### Python Live Tests
+Use the provided test script on a system with localized applications:
+
+```bash
+# Install the Python package first
+cd bindings/python
+pip install -e .
+
+# Run the test
+python ../../test_chinese_selectors.py
+```
+
+This script:
+1. Opens Calculator
+2. Lists all button names with their UTF-8 byte representations
+3. Tests various selector encoding approaches
+4. Verifies selector string passthrough from Python to Rust
+
+## Known Limitations
+
+1. **Application-dependent**: Whether UTF-8 selectors work depends on how the target application exposes element names
+   - Some apps may only expose English names even on localized Windows
+   - Some apps may expose localized names that match the Windows display language
+
+2. **Windows UI Automation quirks**:
+   - Element names are retrieved via `IUIAutomationElement::get_CurrentName()`
+   - The returned string is in UTF-16, converted to UTF-8 by the `windows` crate
+   - Matching is case-insensitive and uses substring matching (via `contains_name`)
+
+3. **Testing challenges**:
+   - Comprehensive testing requires access to Windows systems with different display languages
+   - Calculator button names vary by Windows version and language
+
+## Recommendations for Users
+
+### For Chinese/CJK Systems:
+
+1. **First, inspect actual element names**:
+```python
+import asyncio
+import terminator
+
+async def inspect_elements():
+    desktop = terminator.Desktop()
+    window = desktop.open_application("calc.exe")
+    await asyncio.sleep(2)
+
+    # Get all buttons to see their actual names
+    buttons = await window.locator("role:Button").all(timeout_ms=5000, depth=10)
+    for btn in buttons[:10]:
+        print(f"Button name: '{btn.name()}'")
+
+asyncio.run(inspect_elements())
+```
+
+2. **Use the actual names you observe**:
+```python
+# If you see the button is named "显示为"
+element = await window.locator("role:Button|name:显示为").first()
+```
+
+3. **Fall back to NativeId when names are not reliable**:
+```python
+# Use AutomationId which is usually language-independent
+element = await window.locator("nativeid:CalculatorResults").first()
+```
+
+### For Workflow Authors:
+
+Prefer language-independent selectors when possible:
+- `nativeid:` - Uses AutomationId (language-independent)
+- `classname:` - Uses class names (language-independent)
+- `#id` - Numeric IDs (generated, language-independent)
+
+Use localized names only when necessary, and document:
+```yaml
+steps:
+  - tool_name: click_element
+    arguments:
+      # Note: This selector is for Chinese Windows
+      # English Windows users should use: role:Button|name:Display
+      selector: "role:Button|name:显示为"
+```
+
+## Implementation Details
+
+### Selector Parsing (Rust)
+File: `terminator/src/selector.rs`
+
+The selector parser uses standard Rust string methods:
+- `split()` - Safe for UTF-8 (splits at character boundaries)
+- `trim()` - Safe for UTF-8
+- `strip_prefix()` - Safe for UTF-8
+
+**Important**: Byte indexing (e.g., `s[5..]`) is safe once the string is known to start with an ASCII prefix (like `"role:"`, `"name:"`), as the index will then always be at a UTF-8 character boundary.
+
+### Windows Platform (Rust)
+File: `terminator/src/platforms/windows/engine.rs`
+
+String matching uses the `uiautomation` crate's `contains_name()` method:
+```rust
+matcher_builder = matcher_builder.contains_name(name);
+```
+
+This passes the UTF-8 string to the `uiautomation` crate, which converts it to UTF-16 for Windows APIs.
+
+### Python Bindings
+Files: `bindings/python/src/desktop.rs`, `bindings/python/src/locator.rs`
+
+PyO3 handles conversion automatically:
+```rust
+pub fn locator(&self, selector: &str) -> PyResult<Locator> {
+    let locator = self.inner.locator(selector);
+    Ok(Locator { inner: locator })
+}
+```
+
+The `&str` parameter in Rust is UTF-8, and PyO3 automatically converts Python `str` objects (Unicode) to Rust's UTF-8 `&str`.
+
+## Related Issues
+
+- Issue #299: Chinese character support in selectors
+- The comprehensive tests were added in response to this issue
+
+## Future Improvements
+
+1. Add integration tests with real localized applications
+2. Test on Windows systems with different display languages
+3. Document known application-specific quirks
+4. Consider adding a helper to detect system locale and suggest appropriate selectors
+
+## Contributing
+
+If you encounter issues with UTF-8 selectors:
+1. Run `test_chinese_selectors.py` and share the output
+2. Report which application and Windows version you're using
+3. Include the actual element names from the UI tree inspection
+4. Specify your Windows display language setting
diff --git a/terminator/src/tests/mod.rs b/terminator/src/tests/mod.rs
index 262692fc..7577e562 100644
--- a/terminator/src/tests/mod.rs
+++ b/terminator/src/tests/mod.rs
@@ -13,6 +13,8 @@ mod performance_tests;
 #[cfg(all(test, target_os = "windows"))]
 mod selector_tests;
 mod test_serialization;
+#[cfg(test)]
+mod utf8_selector_tests;
 
 // Initialize tracing for tests
 pub fn init_tracing() {
diff --git a/terminator/src/tests/utf8_selector_tests.rs b/terminator/src/tests/utf8_selector_tests.rs
new file mode 100644
index 00000000..10488537
--- /dev/null
+++ b/terminator/src/tests/utf8_selector_tests.rs
@@ -0,0 +1,225 @@
+//! Tests for UTF-8 character support in selectors (Chinese, Japanese, Korean, etc.)
+//!
+//! This test file verifies that Terminator correctly handles non-ASCII characters
+//! in selector strings, including Chinese characters, emoji, and other UTF-8 text.
+//!
+//! Related issue: #299
+
+use crate::Selector;
+
+#[test]
+fn test_chinese_characters_in_role_name_selector() {
+    // Test Chinese characters in role|name format
+    let selector_str = "role:Button|name:提交"; // "Submit" in Chinese
+    let selector = Selector::from(selector_str);
+
+    match selector {
+        Selector::Role { role, name } => {
+            assert_eq!(role, "Button");
+            assert_eq!(name, Some("提交".to_string()));
+        }
+        _ => panic!("Expected Role selector, got: {selector:?}"),
+    }
+}
+
+#[test]
+fn test_japanese_characters_in_name_selector() {
+    // Test Japanese characters (Hiragana)
+    let selector_str = "name:こんにちは"; // "Hello" in Japanese
+    let selector = Selector::from(selector_str);
+
+    match selector {
+        Selector::Name(name) => {
+            assert_eq!(name, "こんにちは");
+        }
+        _ => panic!("Expected Name selector, got: {selector:?}"),
+    }
+}
+
+#[test]
+fn test_korean_characters_in_text_selector() {
+    // Test Korean characters (Hangul)
+    let selector_str = "text:안녕하세요"; // "Hello" in Korean
+    let selector = Selector::from(selector_str);
+
+    match selector {
+        Selector::Text(text) => {
+            assert_eq!(text, "안녕하세요");
+        }
+        _ => panic!("Expected Text selector, got: {selector:?}"),
+    }
+}
+
+#[test]
+fn test_emoji_in_selector() {
+    // Test emoji characters
+    let selector_str = "role:Button|name:保存 💾"; // Save with floppy disk emoji
+    let selector = Selector::from(selector_str);
+
+    match selector {
+        Selector::Role { role, name } => {
+            assert_eq!(role, "Button");
+            assert_eq!(name, Some("保存 💾".to_string()));
+        }
+        _ => panic!("Expected Role selector, got: {selector:?}"),
+    }
+}
+
+#[test]
+fn test_mixed_language_selector() {
+    // Test mixed English and Chinese
+    let selector_str = "role:Window|name:Settings 设置";
+    let selector = Selector::from(selector_str);
+
+    match selector {
+        Selector::Role { role, name } => {
+            assert_eq!(role, "Window");
+            assert_eq!(name, Some("Settings 设置".to_string()));
+        }
+        _ => panic!("Expected Role selector, got: {selector:?}"),
+    }
+}
+
+#[test]
+fn test_chinese_in_chained_selector() {
+    // Test Chinese characters in chained selectors
+    let selector_str = "role:Window|name:主窗口 >> role:Button|name:确定";
+    let selector = Selector::from(selector_str);
+
+    match selector {
+        Selector::Chain(parts) => {
+            assert_eq!(parts.len(), 2);
+
+            // First part
+            if let Selector::Role { role, name } = &parts[0] {
+                assert_eq!(role, "Window");
+                assert_eq!(name, &Some("主窗口".to_string())); // "Main Window"
+            } else {
+                panic!("Expected first part to be Role selector");
+            }
+
+            // Second part
+            if let Selector::Role { role, name } = &parts[1] {
+                assert_eq!(role, "Button");
+                assert_eq!(name, &Some("确定".to_string())); // "OK"
+            } else {
+                panic!("Expected second part to be Role selector");
+            }
+        }
+        _ => panic!("Expected Chain selector, got: {selector:?}"),
+    }
+}
+
+#[test]
+fn test_arabic_rtl_text() {
+    // Test Arabic (right-to-left) text
+    let selector_str = "name:مرحبا"; // "Hello" in Arabic
+    let selector = Selector::from(selector_str);
+
+    match selector {
+        Selector::Name(name) => {
+            assert_eq!(name, "مرحبا");
+        }
+        _ => panic!("Expected Name selector, got: {selector:?}"),
+    }
+}
+
+#[test]
+fn test_cyrillic_characters() {
+    // Test Cyrillic characters (Russian)
+    let selector_str = "role:Button|name:Привет"; // "Hello" in Russian
+    let selector = Selector::from(selector_str);
+
+    match selector {
+        Selector::Role { role, name } => {
+            assert_eq!(role, "Button");
+            assert_eq!(name, Some("Привет".to_string()));
+        }
+        _ => panic!("Expected Role selector, got: {selector:?}"),
+    }
+}
+
+#[test]
+fn test_special_unicode_characters() {
+    // Test various Unicode special characters
+    let test_cases = vec![
+        ("name:文本编辑器", "文本编辑器"), // Chinese "Text Editor"
+        ("name:ファイル", "ファイル"), // Japanese "File"
+        ("name:파일", "파일"), // Korean "File"
+        ("name:Файл", "Файл"), // Russian "File"
+        ("name:Αρχείο", "Αρχείο"), // Greek "File"
+        ("name:ملف", "ملف"), // Arabic "File"
+    ];
+
+    for (selector_str, expected_name) in test_cases {
+        let selector = Selector::from(selector_str);
+        match selector {
+            Selector::Name(name) => {
+                assert_eq!(name, expected_name, "Failed for selector: {selector_str}");
+            }
+            _ => panic!("Expected Name selector for '{selector_str}', got: {selector:?}"),
+        }
+    }
+}
+
+#[test]
+fn test_utf8_byte_length_vs_char_length() {
+    // Verify that string slicing works correctly with multi-byte UTF-8 characters
+    // This tests the internal string handling in selector parsing
+    let selector_str = "role:你好"; // Chinese "Hello" - each character is 3 bytes in UTF-8
+    let selector = Selector::from(selector_str);
+
+    match selector {
+        Selector::Role { role, name } => {
+            assert_eq!(role, "你好");
+            assert_eq!(name, None);
+            // Verify byte length != character length
+            assert_eq!(role.len(), 6); // 2 Chinese chars * 3 bytes each
+            assert_eq!(role.chars().count(), 2); // 2 characters
+        }
+        _ => panic!("Expected Role selector, got: {selector:?}"),
+    }
+}
+
+#[test]
+fn test_nativeid_with_chinese() {
+    // Test NativeId selector with Chinese characters
+    let selector_str = "nativeid:按钮_提交";
+    let selector = Selector::from(selector_str);
+
+    match selector {
+        Selector::NativeId(id) => {
+            assert_eq!(id, "按钮_提交");
+        }
+        _ => panic!("Expected NativeId selector, got: {selector:?}"),
+    }
+}
+
+#[test]
+fn test_classname_with_unicode() {
+    // Test ClassName selector with Unicode
+    let selector_str = "classname:UI控件";
+    let selector = Selector::from(selector_str);
+
+    match selector {
+        Selector::ClassName(class) => {
+            assert_eq!(class, "UI控件");
+        }
+        _ => panic!("Expected ClassName selector, got: {selector:?}"),
+    }
+}
+
+#[test]
+fn test_contains_with_chinese() {
+    // Test contains: prefix with Chinese characters
+    let selector_str = "role:Button|contains:提交";
+    let selector = Selector::from(selector_str);
+
+    match selector {
+        Selector::Role { role, name } => {
+            assert_eq!(role, "Button");
+            assert_eq!(name, Some("提交".to_string()));
+        }
+        _ => panic!("Expected Role selector, got: {selector:?}"),
+    }
+}
diff --git a/test_chinese_selectors.py b/test_chinese_selectors.py
new file mode 100644
index 00000000..d453463e
--- /dev/null
+++ b/test_chinese_selectors.py
@@ -0,0 +1,113 @@
+# -*- coding: utf-8 -*-
+"""
+Test script for Chinese/UTF-8 selector support in Terminator
+This tests whether Chinese characters work in selectors on Chinese Windows systems
+"""
+import asyncio
+import sys
+import terminator
+
+
+async def test_chinese_selectors():
+    print("=" * 60)
+    print("Testing UTF-8/Chinese Selector Support")
+    print("=" * 60)
+    print(f"Python version: {sys.version}")
+    print(f"Default encoding: {sys.getdefaultencoding()}")
+    print()
+
+    # Create desktop instance
+    desktop = terminator.Desktop(log_level="debug")
+
+    try:
+        # Open Calculator
+        print("Opening Calculator...")
+        try:
+            calculator_window = desktop.open_application("calc.exe")
+        except Exception as e:
+            print(f"Failed to open calculator: {e}")
+            return
+
+        await asyncio.sleep(2)
+
+        # Get window tree to see actual element names
+        print("\n" + "=" * 60)
+        print("Getting Calculator Window Tree")
+        print("=" * 60)
+
+        # Get all button elements to see their actual names
+        try:
+            buttons = await calculator_window.locator("role:Button").all(timeout_ms=5000, depth=10)
+            print(f"\nFound {len(buttons)} buttons in Calculator")
+            print("\nFirst 20 button names:")
+            for i, btn in enumerate(buttons[:20]):
+                name = btn.name()
+                print(f" {i+1}. Name: '{name}' | Bytes: {name.encode('utf-8')!r}")
+        except Exception as e:
+            print(f"Failed to get buttons: {e}")
+
+        # Test 1: Try to find a button using its English name (baseline test)
+        print("\n" + "=" * 60)
+        print("Test 1: English Selector (Baseline)")
+        print("=" * 60)
+
+        try:
+            # Try finding button "One" (should work on English systems)
+            one_button = await calculator_window.locator("Name:One").first()
+            print(f"✅ Found 'One' button: {one_button.name()}")
+        except Exception as e:
+            print(f"❌ Failed to find 'One' button: {e}")
+
+        # Test 2: Try Chinese selectors if on Chinese system
+        print("\n" + "=" * 60)
+        print("Test 2: Chinese/Unicode Selectors")
+        print("=" * 60)
+
+        # Test different encoding approaches
+        test_cases = [
+            ("Direct UTF-8 string", "name:显示为"),
+            ("Raw string", r"name:显示为"),
+            ("Unicode escape", "name:\u663e\u793a\u4e3a"),
+            ("Explicit encode/decode", ("name:" + "显示为".encode('utf-8').decode('utf-8'))),
+            ("Role with Chinese", "role:Button|name:显示为"),
+        ]
+
+        for description, selector in test_cases:
+            print(f"\nTesting: {description}")
+            print(f" Selector: {selector}")
+            print(f" Selector bytes: {selector.encode('utf-8')!r}")
+            print(f" Selector repr: {repr(selector)}")
+
+            try:
+                element = await calculator_window.locator(selector).first()
+                print(f" ✅ SUCCESS! Found element: {element.name()}")
+            except Exception as e:
+                print(f" ❌ FAILED: {e}")
+
+        # Test 3: Verify selector parsing in Rust
+        print("\n" + "=" * 60)
+        print("Test 3: Selector String Roundtrip")
+        print("=" * 60)
+
+        chinese_text = "显示为"
+        selector_str = f"role:Button|name:{chinese_text}"
+
+        print(f"Original string: {chinese_text}")
+        print(f"Original bytes: {chinese_text.encode('utf-8')!r}")
+        print(f"Selector string: {selector_str}")
+        print(f"Selector bytes: {selector_str.encode('utf-8')!r}")
+
+        # Create locator and check if it's parsed correctly
+        locator = calculator_window.locator(selector_str)
+        print("✅ Locator created successfully")
+
+    except Exception as e:
+        print(f"\n❌ Unexpected error: {e}")
+        import traceback
+        traceback.print_exc()
+
+
+if __name__ == "__main__":
+    print("Starting Chinese selector tests...")
+    asyncio.run(test_chinese_selectors())
+    print("\nTests complete!")
diff --git a/test_notepad.yml b/test_notepad.yml
new file mode 100644
index 00000000..e1fdfb9a
--- /dev/null
+++ b/test_notepad.yml
@@ -0,0 +1,14 @@
+steps:
+  - tool_name: open_application
+    arguments:
+      app_name: "notepad"
+
+  - tool_name: delay
+    arguments:
+      delay_ms: 1500
+
+  - tool_name: type_into_element
+    arguments:
+      selector: "role:Edit"
+      text_to_type: "Hello World from Terminator!"
+      verify_action: false
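
Note on the "different encoding approaches" cases in `test_chinese_selectors.py` and the byte-indexing remark in `UTF8_SELECTOR_SUPPORT.md`: the sketch below is a standalone check that does not import `terminator`. It suggests that the direct, raw, escaped, and encode/decode spellings all produce the same Python `str`, so they should reach the Rust parser as identical UTF-8 bytes, and it mimics slicing off an ASCII prefix by byte offset. The helper names `selector_variants` and `payload_after_ascii_prefix` are hypothetical and exist only for this illustration; they are not part of Terminator.

```python
def selector_variants() -> dict[str, str]:
    """Return the four spellings of the same selector used by the test script."""
    return {
        "direct": "name:显示为",
        "raw": r"name:显示为",
        "escaped": "name:\u663e\u793a\u4e3a",
        "round_trip": "name:" + "显示为".encode("utf-8").decode("utf-8"),
    }


def payload_after_ascii_prefix(selector: str, prefix: str) -> str:
    """Slice off an ASCII prefix by byte offset, loosely mimicking Rust's `s[5..]`.

    Because every ASCII character is exactly one byte in UTF-8, the byte offset
    len(prefix) lands on a character boundary, so decoding the rest succeeds.
    """
    data = selector.encode("utf-8")
    if not prefix.isascii() or not data.startswith(prefix.encode("ascii")):
        raise ValueError(f"selector does not start with ASCII prefix {prefix!r}")
    return data[len(prefix):].decode("utf-8")


def all_variants_identical() -> bool:
    """True when every spelling yields the same str and the same UTF-8 bytes."""
    values = list(selector_variants().values())
    same_text = all(v == values[0] for v in values)
    same_bytes = all(v.encode("utf-8") == values[0].encode("utf-8") for v in values)
    return same_text and same_bytes
```

If this holds, the four "name:" cases in Test 2 exercise one code path four times; a failure in only some of them would point at the environment (for example, source-file encoding) rather than at selector parsing.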