-
Notifications
You must be signed in to change notification settings - Fork 6.2k
planner: preserve IN for varchar-vs-numeric comparisons with a common cmp type #67498
Description
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
CREATE TABLE test_data (
id VARCHAR(10),
col_a VARCHAR(20),
col_b INT,
col_c DECIMAL(10,2),
col_d VARCHAR(30)
);
INSERT INTO test_data VALUES
('1', 'alpha', 1, 1.00, 'x'),
('2', 'beta', 2, 2.00, 'y'),
('3', 'gamma', 3, 3.00, 'z');
EXPLAIN FORMAT = 'plan_tree'
SELECT col_a, col_b
FROM test_data
WHERE id IN (1, 2, 3);A larger real-world shape is a generated SQL with a long numeric literal list against a string column, for example:
EXPLAIN ANALYZE
SELECT col_a, col_b
FROM test_data
WHERE id IN (501,502,503,504,505, ...);2. What did you expect to see? (Required)
TiDB should preserve a single IN predicate while keeping MySQL comparison semantics.
Since MySQL compares VARCHAR vs numeric literals as DOUBLE, the expected predicate shape is:
in(cast(test.test_data.id, double BINARY), 1, 2, 3)This keeps the expression compact even for long IN lists.
3. What did you see instead (Required)
TiDB rewrites the IN predicate into a DNF of equality predicates:
or(
eq(cast(test.test_data.id, double BINARY), 1),
eq(cast(test.test_data.id, double BINARY), 2),
eq(cast(test.test_data.id, double BINARY), 3)
)For long IN lists this produces a very large expression tree in the planner, even though every element uses the same comparison type.
Warning: even after this planner fix, VARCHAR col IN (numeric literals...) is still a non-ideal SQL shape for index usage. The comparison has to follow MySQL's mixed-type semantics and usually cannot use the string index as efficiently as a type-aligned predicate. Users should still align the literal type with the column type whenever possible, for example compare VARCHAR columns with string literals instead of numeric literals.
The root cause is that IN is only preserved when the comparison type is exactly the left operand's eval type. In this case the comparison type is consistently DOUBLE, but the left operand is VARCHAR, so TiDB unnecessarily expands it to OR-of-EQ.
4. What is your TiDB version? (Required)
Observed on master at commit e07318bec6a3d7fd9b3be66c423eeafe25688eaf.
Suggested labels: type/bug, component/expression, sig/planner, severity/moderate, may-affects-8.5