q_ld2_lane_post_inc wrongly detects:
ld2 { v10.S, v11.S }[0], [x22], #8
ld2 { v10.S, v11.S }[1], [x10], #8
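as a jointly destructive pattern and remodels v10/v11 from input/output to output in the first instruction. That is incorrect: with .S lanes the two loads only write lanes 0 and 1 of each register, so the upper 64 bits of v10 and v11 are preserved and both registers must stay input/output.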
Note that this would be correct for
ld2 { v10.D, v11.D }[0], [x22], #8
ld2 { v10.D, v11.D }[1], [x10], #8
as those two loads together do overwrite the entire register.
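For illustration, the condition under which treating the pair as jointly destructive is safe can be sketched in Python as below. This is only a sketch under my own assumptions: lanes_cover_register, LANE_BITS and VECTOR_BITS are hypothetical names made up for this example, not SLOTHY's actual API, and the real fix may look quite different.

```python
# Hypothetical helper, NOT part of SLOTHY: decides whether a group of lane
# loads into the same 128-bit vector register overwrites the whole register.
LANE_BITS = {"B": 8, "H": 16, "S": 32, "D": 64}  # lane width per arrangement
VECTOR_BITS = 128                                # width of a full v register


def lanes_cover_register(arrangement, lane_indices):
    """Return True if writing the given lanes overwrites the entire register."""
    lanes_per_register = VECTOR_BITS // LANE_BITS[arrangement]
    # Only if every lane of the register is written is the old value dead.
    return set(lane_indices) == set(range(lanes_per_register))


# .S lanes 0 and 1 only cover 64 of 128 bits: the register stays input/output.
print(lanes_cover_register("S", [0, 1]))  # False
# .D lanes 0 and 1 cover all 128 bits: the pair is jointly destructive.
print(lanes_cover_register("D", [0, 1]))  # True
```

With a check of this kind, the .S pair above would not be remodelled, while the .D pair still would be.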
This problem seems unlikely in a fully optimized kernel, but it is quite common when using the split heuristic, where it can break the selftest. Below is an example that demonstrates this and occasionally results in a selftest failure:
start:
umull v29.2D, v25.2S, v23.2S
umull v28.2D, v25.2S, v20.2S
zip2 v2.4S, v21.4S, v14.4S
zip1 v22.4S, v21.4S, v14.4S
mov v4.d[0], v2.d[1]
mov v18.d[0], v25.d[1]
mov v7.d[0], v22.d[1]
shl v27.2S, v11.2S, #1
shl v3.2S, v4.2S, #1
shl v5.2S, v18.2S, #1
mul v21.2S, v4.2S, v31.2S
umull v4.2D, v25.2S, v27.2S
shl v14.2S, v2.2S, #1
shl v0.2S, v22.2S, #1
mov v17.d[0], v16.d[1]
shl v26.2S, v7.2S, #1
umull v1.2D, v25.2S, v14.2S
umlal v4.2D, v5.2S, v3.2S
mul v7.2S, v7.2S, v31.2S
mul v10.2S, v2.2S, v31.2S
umlal v1.2D, v5.2S, v26.2S
umlal v1.2D, v16.2S, v0.2S
shl v9.2S, v5.2S, #1
shl v19.2S, v17.2S, #1
umlal v29.2D, v5.2S, v18.2S
umlal v29.2D, v10.2S, v2.2S
umlal v1.2D, v19.2S, v17.2S
umull v6.2D, v25.2S, v26.2S
umull v15.2D, v25.2S, v19.2S
umlal v4.2D, v16.2S, v14.2S
umlal v4.2D, v19.2S, v26.2S
umlal v4.2D, v22.2S, v22.2S
umlal v4.2D, v12.2S, v20.2S
umlal v15.2D, v18.2S, v23.2S
umlal v28.2D, v18.2S, v27.2S
umlal v28.2D, v16.2S, v3.2S
umlal v28.2D, v17.2S, v14.2S
umlal v28.2D, v22.2S, v26.2S
umull v22.2D, v25.2S, v3.2S
umull v13.2D, v25.2S, v0.2S
umlal v6.2D, v18.2S, v0.2S
umull v8.2D, v25.2S, v25.2S
umlal v22.2D, v18.2S, v14.2S
umull v18.2D, v25.2S, v5.2S
umlal v13.2D, v5.2S, v19.2S
umlal v13.2D, v16.2S, v16.2S
umlal v15.2D, v12.2S, v0.2S
mul v5.2S, v11.2S, v31.2S
usra v28.2D, v4.2D, #26
add x10, sp, #64
umlal v18.2D, v12.2S, v23.2S
umlal v18.2D, v5.2S, v19.2S
bic v25.16B, v28.16B, v24.16B
shl v2.2S, v26.2S, #1
shl v20.2S, v3.2S, #1
usra v8.2D, v25.2D, #25
umlal v15.2D, v5.2S, v26.2S
umlal v15.2D, v21.2S, v14.2S
usra v8.2D, v25.2D, #24
umlal v18.2D, v21.2S, v0.2S
umlal v18.2D, v10.2S, v26.2S
usra v8.2D, v25.2D, #21
umlal v22.2D, v16.2S, v26.2S
umlal v22.2D, v17.2S, v0.2S
umlal v8.2D, v7.2S, v26.2S
umlal v8.2D, v12.2S, v9.2S
umlal v6.2D, v16.2S, v19.2S
umlal v6.2D, v12.2S, v14.2S
umlal v1.2D, v5.2S, v11.2S
umlal v22.2D, v12.2S, v27.2S
shl v25.2S, v19.2S, #1
umlal v8.2D, v5.2S, v23.2S
umlal v8.2D, v21.2S, v25.2S
umlal v8.2D, v10.2S, v0.2S
umlal v29.2D, v12.2S, v25.2S
and v25.16B, v4.16B, v30.16B
umlal v6.2D, v5.2S, v3.2S
umlal v13.2D, v21.2S, v3.2S
umlal v13.2D, v12.2S, v2.2S
umlal v29.2D, v5.2S, v0.2S
umlal v29.2D, v21.2S, v2.2S
usra v18.2D, v8.2D, #26
umlal v1.2D, v12.2S, v20.2S
umlal v13.2D, v5.2S, v14.2S
add x22, sp, #168
usra v29.2D, v18.2D, #25
and v9.16B, v28.16B, v24.16B
and v0.16B, v8.16B, v30.16B
usra v15.2D, v29.2D, #26
ld2 { v10.S, v11.S }[1], [x10], #8
and v2.16B, v29.16B, v30.16B
usra v13.2D, v15.2D, #25
and v3.16B, v15.16B, v24.16B
ld2 { v10.S, v11.S }[0], [x22], #8
usra v6.2D, v13.2D, #26
end: