Remove unicode diacritics in default filter function #386

**Suggestion**

Currently, `formatInput` in `cmdk/src/command-score.ts` normalizes search strings only by lowercasing them and converting space characters to a plain space:

```typescript
function formatInput(string) {
  // convert all valid space characters to space so they match each other
  return string.toLowerCase().replace(COUNT_SPACE_REGEXP, ' ')
}
```

This does not handle Unicode diacritics, such as the accents in "café" or "naïve". As a result, a search for "cafe" does not match "café" when "é" is stored as the single precomposed character U+00E9.
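A minimal sketch reproducing the mismatch. It assumes `COUNT_SPACE_REGEXP` is a global regex for whitespace and hyphens (`/[\s-]/g`), a stand-in for the constant actually defined in `command-score.ts`, and it uses a plain substring check as a simplified stand-in for the fuzzy scorer.

```typescript
// Assumption: stand-in for the constant defined in command-score.ts.
const COUNT_SPACE_REGEXP = /[\s-]/g

// Current behaviour: lowercase and unify space characters only.
function formatInput(input: string): string {
  return input.toLowerCase().replace(COUNT_SPACE_REGEXP, ' ')
}

const item = formatInput('Caf\u00e9') // "café", with é as precomposed U+00E9
const query = formatInput('cafe')     // "cafe"

// Substring containment as a simplified stand-in for commandScore.
console.log(item.includes(query)) // false: "e" and "é" are different characters
```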

**Proposal:**

Extend `formatInput` to remove Unicode diacritics: decompose characters with `String.prototype.normalize('NFD')`, then strip the resulting combining marks with a regex:

```typescript
function formatInput(string) {
  return string
    .toLowerCase()
    .normalize('NFD') // Decompose unicode characters
    .replace(/[\u0300-\u036f]/g, '') // Remove diacritical marks
    .replace(COUNT_SPACE_REGEXP, ' ')
}
```
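A sketch of the proposed function with a few spot checks, under the same assumption that `COUNT_SPACE_REGEXP` is `/[\s-]/g`. The range `[\u0300-\u036f]` covers only the Combining Diacritical Marks block.

```typescript
// Assumption: stand-in for the constant defined in command-score.ts.
const COUNT_SPACE_REGEXP = /[\s-]/g

// Proposed behaviour: lowercase, decompose, strip combining marks, unify spaces.
function formatInput(input: string): string {
  return input
    .toLowerCase()
    .normalize('NFD') // split precomposed letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, '') // drop marks in the Combining Diacritical Marks block
    .replace(COUNT_SPACE_REGEXP, ' ')
}

console.log(formatInput('Caf\u00e9'))                    // cafe
console.log(formatInput('Na\u00efve'))                   // naive
console.log(formatInput('Cr\u00e8me-Br\u00fbl\u00e9e'))  // creme brulee
console.log(formatInput('Caf\u00e9').includes(formatInput('cafe'))) // true
console.log(formatInput('\u00f8l'))                      // øl (unchanged)
```

The last check shows that this is a conservative choice: letters with no canonical decomposition, such as `ø` or `ł`, pass through unchanged, so "ol" would still not match "øl".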

This would let queries typed without accents match items that contain them, improving search results for users of languages written with diacritics.
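One property worth checking before adopting the change: the normalization can change a string's length. For precomposed input it does not ("é" becomes "e"), but for already-decomposed input it removes a code unit, and NFD splits a Hangul syllable into several jamo that the regex does not strip. If the scorer ever uses an index found in the formatted string to read the unformatted one, such length changes could misalign them. The sketch below uses a hypothetical helper, `stripDiacritics` (the proposal's decompose-and-strip steps in isolation), and only illustrates the length behaviour; it does not exercise cmdk itself.

```typescript
// Hypothetical helper: the proposal's NFD + mark-stripping steps in isolation.
function stripDiacritics(input: string): string {
  return input.normalize('NFD').replace(/[\u0300-\u036f]/g, '')
}

const samples: Array<[string, string]> = [
  ['precomposed', 'caf\u00e9'], // é as one code unit (U+00E9)
  ['decomposed', 'cafe\u0301'], // e followed by COMBINING ACUTE ACCENT
  ['hangul', '\uD55C'],         // 한, which NFD splits into three jamo
]

for (const [label, text] of samples) {
  const stripped = stripDiacritics(text)
  // Compare UTF-16 lengths before and after normalization.
  console.log(label, text.length, '->', stripped.length)
}
// precomposed 4 -> 4
// decomposed 5 -> 4
// hangul 1 -> 3
```

Appending `.normalize('NFC')` after the regex would recompose the Hangul case, but already-decomposed Latin input would still shrink.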

**Location:**
- [`cmdk/src/command-score.ts`](https://github.com/pacocoursey/cmdk/blob/d6fde235386414196bf80d9b9fa91e2cf89a72ea/cmdk/src/command-score.ts)

