|
| 1 | +# 966. Vowel Spellchecker |
| 2 | + |
| 3 | +Given a `wordlist`, we want to implement a spellchecker that converts a query word into a correct word. |
| 4 | + |
| 5 | +For a given `query` word, the spell checker handles two categories of spelling mistakes: |
| 6 | + |
| 7 | +- Capitalization: If the query matches a word in the wordlist (case-insensitive), |
| 8 | + then the query word is returned with the same case as the case in the wordlist. |
| 9 | + - Example: `wordlist = ["yellow"]`, `query = "YellOw"`: `correct = "yellow"` |
| 10 | + - Example: `wordlist = ["Yellow"]`, `query = "yellow"`: `correct = "Yellow"` |
| 11 | + - Example: `wordlist = ["yellow"]`, `query = "yellow"`: `correct = "yellow"` |
| 12 | + |
| 13 | +- Vowel Errors: If after replacing the vowels `('a', 'e', 'i', 'o', 'u')` of the query word with any vowel individually, |
| 14 | + it matches a word in the wordlist (case-insensitive), then the query word is returned with the same case as the match in the wordlist. |
| 15 | + - Example: `wordlist = ["YellOw"]`, `query = "yollow"`: `correct = "YellOw"` |
| 16 | + - Example: `wordlist = ["YellOw"]`, `query = "yeellow"`: `correct = ""` (no match) |
| 17 | + - Example: `wordlist = ["YellOw"]`, `query = "yllw"`: `correct = ""` (no match) |
| 18 | + |
| 19 | +In addition, the spell checker operates under the following precedence rules: |
| 20 | + |
| 21 | +- When the query exactly matches a word in the wordlist (case-sensitive), you should return the same word back. |
| 22 | +- When the query matches a word up to capitalization, you should return the first such match in the wordlist. |
| 23 | +- When the query matches a word up to vowel errors, you should return the first such match in the wordlist. |
| 24 | +- If the query has no matches in the wordlist, you should return the empty string. |
| 25 | + |
| 26 | +Given some `queries`, return a list of words `answer`, where `answer[i]` is the correct word for `query = queries[i]`. |
| 27 | + |
| 28 | +**Constraints:** |
| 29 | + |
| 30 | +- `1 <= wordlist.length, queries.length <= 5000` |
| 31 | +- `1 <= wordlist[i].length, queries[i].length <= 7` |
| 32 | +- `wordlist[i]` and `queries[i]` consist only of English letters. |
| 33 | + |
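| 34 | +## Basic Idea |
| 35 | + |
| 36 | +This problem asks us to implement a spellchecker that, for each `query`, applies three matching rules in **priority order** to find the correct word: |
| 37 | + |
| 38 | +(1) **exact match (case-sensitive)**, (2) **case-insensitive match**, and (3) **vowel-insensitive match** (a "devoweled" normalization that replaces every vowel with `*`). |
| 39 | +The key point: to resolve each query in constant time, we first **build three indexes** from `wordlist`: |
| 40 | + |
| 41 | +- A case-sensitive `Set` (handles exact matches) |
| 42 | +- A case-insensitive `hash map` (keeps the **first-occurring** word per key; handles capitalization errors) |
| 43 | +- A devoweled `hash map` (likewise keeps the **first-occurring** word per key; handles vowel errors) |
| 44 | + |
| 45 | +At query time, we try each index in priority order and return the empty string if none hits. Vowel checks can be done quickly with a **bitmask** (a, e, i, o, u), and devoweling uses a single pass that lowercases each character and maps vowels to `*`. |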
| 46 | + |
| 47 | +## Solution Steps |
| 48 | + |
| 49 | +### Step 1: Define a vowel bitmask (a, e, i, o, u) for constant-time checks |
| 50 | + |
| 51 | +```typescript |
| 52 | +// Vowel bitmask over 'a'..'z' (0-based index): a, e, i, o, u map to bits 0, 4, 8, 14, 20 |
| 53 | +const VOWEL_BITMASK_A_TO_Z = |
| 54 | +  (1 << 0) | (1 << 4) | (1 << 8) | (1 << 14) | (1 << 20); |
| 55 | +``` |
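
As a rough illustration of how such a mask can be queried, the sketch below wraps the bit test in a small predicate. The names `VOWEL_MASK` and `isLowercaseVowel` are hypothetical and not part of the original solution; the helper assumes its input is a single character and treats anything outside `'a'..'z'` as a non-vowel.

```typescript
// Hypothetical standalone copy of the mask: bits 0, 4, 8, 14, 20 stand for a, e, i, o, u.
const VOWEL_MASK = (1 << 0) | (1 << 4) | (1 << 8) | (1 << 14) | (1 << 20);

// Hypothetical helper: true only when `ch` starts with a lowercase ASCII vowel.
function isLowercaseVowel(ch: string): boolean {
  const index = ch.charCodeAt(0) - 97; // 'a' => 0; NaN for an empty string
  // The range check guards the shift, since only indexes 0..25 are meaningful.
  return index >= 0 && index < 26 && ((VOWEL_MASK >>> index) & 1) === 1;
}
```

Uppercase letters fall outside the `0..25` range here, which is why the solution lowercases each character before testing it.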
| 56 | + |
| 57 | +### Step 2: Implement the "devowel" normalization (lowercase, then replace vowels with `*`) |
| 58 | + |
| 59 | +```typescript |
| 60 | +function devowelWord(word: string): string { |
| 61 | +  const length = word.length; |
| 62 | +  let result = ""; |
| 63 | + |
| 64 | +  for (let i = 0; i < length; i++) { |
| 65 | +    let code = word.charCodeAt(i); |
| 66 | + |
| 67 | +    // Fast lowercase conversion for ASCII uppercase A–Z |
| 68 | +    if (code >= 65 && code <= 90) { |
| 69 | +      code |= 32; |
| 70 | +    } |
| 71 | + |
| 72 | +    const alphaIndex = code - 97; // 'a' => 0 |
| 73 | +    if (alphaIndex >= 0 && alphaIndex < 26) { |
| 74 | +      if (((VOWEL_BITMASK_A_TO_Z >>> alphaIndex) & 1) === 1) { |
| 75 | +        result += "*"; |
| 76 | +      } else { |
| 77 | +        result += String.fromCharCode(code); |
| 78 | +      } |
| 79 | +    } else { |
| 80 | +      result += String.fromCharCode(code); |
| 81 | +    } |
| 82 | +  } |
| 83 | + |
| 84 | +  return result; |
| 85 | +} |
| 86 | +``` |
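
For comparison, the same normalization can probably be expressed with a regular expression. The sketch below is a swapped-in alternative rather than the loop used in this solution, and `devowelWithRegex` is a hypothetical name. It assumes inputs contain only English letters, as the constraints state, and checks the three vowel-error examples from the problem statement.

```typescript
// Hypothetical regex-based alternative to a hand-written devowel loop.
function devowelWithRegex(word: string): string {
  // Lowercase first, then map every vowel to the placeholder '*'.
  return word.toLowerCase().replace(/[aeiou]/g, "*");
}

// Keys for the vowel-error examples in the problem statement.
const target = devowelWithRegex("YellOw");                  // "y*ll*w"
const sameKey = devowelWithRegex("yollow") === target;      // vowels differ only
const extraVowel = devowelWithRegex("yeellow") === target;  // length differs
const missingVowels = devowelWithRegex("yllw") === target;  // vowels dropped
```

Only `sameKey` holds, which matches the statement: swapping one vowel for another is tolerated, while inserting or deleting vowels is not.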
| 87 | + |
| 88 | +### Step 3: Build three indexes from `wordlist` (exact, case-insensitive, devoweled) |
| 89 | + |
| 90 | +```typescript |
| 91 | +// Set for exact (case-sensitive) matches |
| 92 | +const caseSensitiveDictionary: Set<string> = new Set(wordlist); |
| 93 | + |
| 94 | +// Hash maps for case-insensitive and devoweled matches (only the first occurrence is recorded) |
| 95 | +const caseInsensitiveDictionary: Record<string, string> = Object.create(null); |
| 96 | +const devoweledDictionary: Record<string, string> = Object.create(null); |
| 97 | + |
| 98 | +// Build the three indexes up front |
| 99 | +for (let i = 0; i < wordlist.length; i++) { |
| 100 | +  const word = wordlist[i]; |
| 101 | +  const lowerCaseWord = word.toLowerCase(); |
| 102 | + |
| 103 | +  if (caseInsensitiveDictionary[lowerCaseWord] === undefined) { |
| 104 | +    caseInsensitiveDictionary[lowerCaseWord] = word; |
| 105 | +  } |
| 106 | + |
| 107 | +  const devoweledWord = devowelWord(lowerCaseWord); |
| 108 | +  if (devoweledDictionary[devoweledWord] === undefined) { |
| 109 | +    devoweledDictionary[devoweledWord] = word; |
| 110 | +  } |
| 111 | +} |
| 112 | +``` |
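
The "first occurrence wins" rule is what enforces the precedence requirement of returning the first match in the wordlist. A minimal sketch of that rule, using a `Map` instead of a null-prototype object, might look as follows. The helper `buildFirstOccurrenceIndex` and the sample word list are assumptions made for illustration.

```typescript
// Hypothetical generic helper: maps each key to the first word that produced it.
function buildFirstOccurrenceIndex(
  words: string[],
  keyOf: (word: string) => string,
): Map<string, string> {
  const index = new Map<string, string>();
  for (const word of words) {
    const key = keyOf(word);
    // Later words with the same key are ignored, so the earliest one is kept.
    if (!index.has(key)) {
      index.set(key, word);
    }
  }
  return index;
}

// Illustrative input: two spellings of each word, in different cases.
const lowerIndex = buildFirstOccurrenceIndex(
  ["KiTe", "kite", "hare", "Hare"],
  (word) => word.toLowerCase(),
);
```

Here `lowerIndex` holds two entries: `"kite"` maps to `"KiTe"` and `"hare"` maps to `"hare"`, because those spellings appear first in the list.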
| 113 | + |
| 114 | +### Step 4: Process each `query` in priority order and fill in the answers |
| 115 | + |
| 116 | +```typescript |
| 117 | +// Preallocate the output array |
| 118 | +const output = new Array<string>(queries.length); |
| 119 | + |
| 120 | +// Process each query in turn |
| 121 | +for (let i = 0; i < queries.length; i++) { |
| 122 | +  const query = queries[i]; |
| 123 | + |
| 124 | +  // 1. Exact match (case-sensitive) |
| 125 | +  if (caseSensitiveDictionary.has(query)) { |
| 126 | +    output[i] = query; |
| 127 | +  } else { |
| 128 | +    // 2. Case-insensitive match |
| 129 | +    const lowerCaseQuery = query.toLowerCase(); |
| 130 | +    const caseInsensitiveHit = caseInsensitiveDictionary[lowerCaseQuery]; |
| 131 | + |
| 132 | +    if (caseInsensitiveHit !== undefined) { |
| 133 | +      output[i] = caseInsensitiveHit; |
| 134 | +    } else { |
| 135 | +      // 3. Match after devoweling |
| 136 | +      const devoweledQuery = devowelWord(lowerCaseQuery); |
| 137 | +      const devoweledHit = devoweledDictionary[devoweledQuery]; |
| 138 | + |
| 139 | +      if (devoweledHit !== undefined) { |
| 140 | +        output[i] = devoweledHit; |
| 141 | +      } else { |
| 142 | +        // 4. No match |
| 143 | +        output[i] = ""; |
| 144 | +      } |
| 145 | +    } |
| 146 | +  } |
| 147 | +} |
| 148 | + |
| 149 | +return output; |
| 150 | +``` |
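
The fragments in Steps 1 to 4 are shown as pieces of a function body. One way they could be assembled into a single runnable function is sketched below. It is a compact variant rather than the exact code above: it uses `Map` and a regex devowel in place of the bitmask loop, and the entry-point name `spellchecker` with this signature is an assumption. The sample input is illustrative.

```typescript
// Assumed entry-point name and signature, chosen for illustration.
function spellchecker(wordlist: string[], queries: string[]): string[] {
  // Regex stand-in for the bitmask-based devowel step; expects lowercase input.
  const devowel = (lower: string): string => lower.replace(/[aeiou]/g, "*");

  const exactWords = new Set<string>(wordlist);
  const byLowerCase = new Map<string, string>();
  const byDevoweled = new Map<string, string>();

  // Index every word once, keeping only the first word seen for each key.
  for (const word of wordlist) {
    const lower = word.toLowerCase();
    if (!byLowerCase.has(lower)) byLowerCase.set(lower, word);
    const key = devowel(lower);
    if (!byDevoweled.has(key)) byDevoweled.set(key, word);
  }

  // Resolve each query by precedence: exact, case-insensitive, devoweled, none.
  return queries.map((query) => {
    if (exactWords.has(query)) return query;
    const lower = query.toLowerCase();
    return byLowerCase.get(lower) ?? byDevoweled.get(devowel(lower)) ?? "";
  });
}

const sampleAnswers = spellchecker(
  ["KiTe", "kite", "hare", "Hare"],
  ["kite", "Kite", "KiTe", "Hare", "HARE", "Hear", "hear", "keti", "keet", "keto"],
);
// sampleAnswers: ["kite", "KiTe", "KiTe", "Hare", "hare", "", "", "KiTe", "", "KiTe"]
```

Note that `"Hear"` devowels to `h**r` while `"hare"` devowels to `h*r*`, so the two do not collide even though they contain the same letters.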
| 151 | + |
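| 152 | +## Time Complexity |
| 153 | + |
| 154 | +- Preprocessing `wordlist`: each word is lowercased and devoweled once; since word length is bounded by the constant 7, this is linear in `wordlist.length = W`. |
| 155 | +- Query phase: each `query` needs at most three constant-time lookups (`Set` / `hash map`) plus one devoweling pass over at most 7 characters, so the total is linear in `queries.length = Q`. |
| 156 | +- The total time complexity is $O(W + Q)$. |
| 157 | + |
| 158 | +> $O(W + Q)$ |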
| 159 | + |
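| 160 | +## Space Complexity |
| 161 | + |
| 162 | +- Three index structures are stored: `Set(wordlist)`, `caseInsensitiveDictionary`, and `devoweledDictionary`; each grows linearly with `wordlist`. |
| 163 | +- Devoweling and lowercasing only create temporary strings whose size is bounded by the word length (at most 7), i.e. constant. |
| 164 | +- The total space complexity is $O(W)$. |
| 165 | + |
| 166 | +> $O(W)$ |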