Commit d3b071a

Steal highly optimizable and smart algorithm to do LWS detection invented by the clang project.

It does LWS detection in just a handful of instructions and no branches,
as compared to the dozen instructions and 4 branches currently.

Some info on how it works here:

https://pdimov.github.io/blog/2020/07/19/llvm-and-memchr/

==== C code ===

int
is_ws_original(char cp)
{
	return ((cp == ' ') || (cp == '\r') || (cp == 'n') ||
		(cp == '\t') || (cp == ','));
}

int
is_ws_new(unsigned char ch)
{
	const unsigned int mask = (1 << (' ' - 1)) | (1 << ('\r' - 1)) |
		(1 << ('\n' - 1)) | (1 << ('\t' - 1));
	ch--;
	return ch < ' ' && ((1 << ch) & mask);
}

==== Compiled, latest clang/trunk, -O2, x86_64 ====

is_ws_original:                         # @is_ws_original
        mov     eax, 1
        cmp     dil, 32
        ja      .LBB6_1
        movzx   ecx, dil
        movabs  rdx, 4294976000
        bt      rdx, rcx
        jb      .LBB6_4
.LBB6_1:
        cmp     dil, 110
        jne     .LBB6_2
.LBB6_4:
        ret
.LBB6_2:
        xor     eax, eax
        cmp     dil, 44
        sete    al
        ret

is_ws_new:                              # @is_ws_new
        add     dil, -1
        cmp     dil, 32
        setb    al
        movzx   ecx, dil
        mov     edx, -2147478784
        bt      edx, ecx
        setb    cl
        and     cl, al
        movzx   eax, cl
        ret

====
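The trick works because the four whitespace characters have ASCII codes of 32 or less, so after subtracting one, each maps to a bit index in the range 0..31 of a single 32-bit mask. The following is a minimal sketch of an exhaustive check that the mask form agrees with the plain comparison chain on every byte value. The names `is_ws_compare`, `is_ws_mask` and `count_mismatches` are illustrative and do not appear in the commit. The sketch also uses `1u` instead of `1`, on the assumption of a 32-bit `int`, where the value of `1 << 31` is not representable in a signed `int`.

```c
#include <limits.h>

/* Reference: the four comparisons from the old is_ws() macro. */
int is_ws_compare(int c)
{
	return c == ' ' || c == '\r' || c == '\n' || c == '\t';
}

/* Mask form, with unsigned shifts (an assumption of this sketch,
 * the commit itself shifts a signed 1). */
int is_ws_mask(unsigned char ch)
{
	const unsigned int mask = (1u << (' ' - 1)) | (1u << ('\r' - 1)) |
		(1u << ('\n' - 1)) | (1u << ('\t' - 1));

	ch--;	/* 0 wraps to 255, which fails the range test below */
	return ch < ' ' && ((1u << ch) & mask);
}

/* Count the byte values on which the two predicates disagree. */
int count_mismatches(void)
{
	int c, bad = 0;

	for (c = 0; c <= UCHAR_MAX; c++) {
		if (is_ws_compare(c) != is_ws_mask((unsigned char)c))
			bad++;
	}
	return bad;
}
```

Since the input domain is only 256 values, checking all of them is cheap and leaves no untested corner such as the wrap-around of `0` after the decrement.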
1 parent 659b329 commit d3b071a

1 file changed, +8 -1 lines changed
trim.h

@@ -25,7 +25,14 @@
 #include "str.h"
 
 /* whitespace */
-#define is_ws(c) ((c) == ' ' || (c) == '\r' || (c) == '\n' || (c) == '\t')
+static inline int
+is_ws(unsigned char ch)
+{
+	const unsigned int mask = (1 << (' ' - 1)) | (1 << ('\r' - 1)) |
+		(1 << ('\n' - 1)) | (1 << ('\t' - 1));
+	ch--;
+	return ch < ' ' && ((1 << ch) & mask);
+}
 
 /*
  * trim leading ws
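
As an illustration of how an inline predicate like this might be used by a leading-whitespace trim, here is a rough sketch. It is not the code from `trim.h`: the `struct span` type and the names `is_ws_u` and `trim_leading_sketch` are hypothetical stand-ins, and the predicate shifts `1u` rather than `1` to stay within unsigned arithmetic.

```c
#include <stddef.h>

/* Hypothetical pointer-plus-length string, a stand-in for the
 * project's own string type. */
struct span {
	const char *s;
	size_t len;
};

/* Same bit-mask test as in the diff, written with unsigned shifts. */
static inline int is_ws_u(unsigned char ch)
{
	const unsigned int mask = (1u << (' ' - 1)) | (1u << ('\r' - 1)) |
		(1u << ('\n' - 1)) | (1u << ('\t' - 1));

	ch--;
	return ch < ' ' && ((1u << ch) & mask);
}

/* Advance past leading whitespace, shrinking the length to match. */
void trim_leading_sketch(struct span *p)
{
	while (p->len > 0 && is_ws_u((unsigned char)*p->s)) {
		p->s++;
		p->len--;
	}
}
```

Because `is_ws` is now a function taking `unsigned char` rather than a macro, its argument is evaluated exactly once and converted to the range 0..255, so bytes with the high bit set are simply rejected by the `ch < ' '` test.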
