Commit 86deef3
authored
Add SarBp optimizations and accuracy improvements (#1140)
* Add SarBp optimizations and accuracy improvements
This PR includes several SarBp performance optimizations for the fltflt case:
- Adds fltflt_sqrt_fast(), which uses fewer operations at a very slight
accuracy cost.
- Adds fltflt_norm3d(), which uses fewer normalizations and calls
fltflt_sqrt_fast()
- Splits the calculation of bin such that more terms can be precomputed and stored
in shared memory. Reduced inner loop bin calculation from ~24 FLOPs to ~18.
In addition, this PR adjusts he computation of the weight w to preserve more
bits. Previously, the mixed and fltflt implementations computed bin as:
bin = static_cast<loose_compute_t>(diffR * dr_inv) + bin_offset;
w = bin - ::floor(bin)
However, with large bin counts, this can leave relatively few bits of precision
for w. The fltflt and mixed variants have been adjusted to preserve more
accuracy at the cost of performance. All told, the fltflt version is ~15% faster
due to the optimizations, but the mixed-precision version is slower due to increased
use of FP64. In the future, a new option may be added to reduce the precision
of the bin calculation for scenarios/ranges where that makes sense.
Signed-off-by: Thomas Benson <tbenson@nvidia.com>1 parent 5bf9643 commit 86deef3
File tree
7 files changed
+467
-47
lines changed- bench
- 00_misc
- scripts
- include/matx
- kernels
- transforms
- test
- 00_misc
- 00_transform
7 files changed
+467
-47
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
409 | 409 | | |
410 | 410 | | |
411 | 411 | | |
| 412 | + | |
| 413 | + | |
| 414 | + | |
| 415 | + | |
| 416 | + | |
| 417 | + | |
| 418 | + | |
| 419 | + | |
| 420 | + | |
| 421 | + | |
| 422 | + | |
| 423 | + | |
| 424 | + | |
| 425 | + | |
| 426 | + | |
| 427 | + | |
| 428 | + | |
| 429 | + | |
| 430 | + | |
| 431 | + | |
| 432 | + | |
| 433 | + | |
| 434 | + | |
| 435 | + | |
| 436 | + | |
| 437 | + | |
| 438 | + | |
| 439 | + | |
| 440 | + | |
| 441 | + | |
| 442 | + | |
| 443 | + | |
| 444 | + | |
| 445 | + | |
| 446 | + | |
| 447 | + | |
| 448 | + | |
| 449 | + | |
| 450 | + | |
| 451 | + | |
| 452 | + | |
| 453 | + | |
| 454 | + | |
| 455 | + | |
| 456 | + | |
| 457 | + | |
| 458 | + | |
| 459 | + | |
| 460 | + | |
| 461 | + | |
| 462 | + | |
| 463 | + | |
| 464 | + | |
| 465 | + | |
| 466 | + | |
| 467 | + | |
| 468 | + | |
| 469 | + | |
| 470 | + | |
| 471 | + | |
| 472 | + | |
| 473 | + | |
| 474 | + | |
| 475 | + | |
| 476 | + | |
| 477 | + | |
| 478 | + | |
| 479 | + | |
| 480 | + | |
| 481 | + | |
| 482 | + | |
| 483 | + | |
| 484 | + | |
| 485 | + | |
| 486 | + | |
| 487 | + | |
| 488 | + | |
| 489 | + | |
| 490 | + | |
| 491 | + | |
| 492 | + | |
| 493 | + | |
| 494 | + | |
| 495 | + | |
| 496 | + | |
| 497 | + | |
| 498 | + | |
| 499 | + | |
| 500 | + | |
| 501 | + | |
| 502 | + | |
| 503 | + | |
| 504 | + | |
| 505 | + | |
| 506 | + | |
| 507 | + | |
| 508 | + | |
| 509 | + | |
| 510 | + | |
| 511 | + | |
| 512 | + | |
| 513 | + | |
| 514 | + | |
| 515 | + | |
| 516 | + | |
| 517 | + | |
| 518 | + | |
| 519 | + | |
| 520 | + | |
| 521 | + | |
| 522 | + | |
| 523 | + | |
| 524 | + | |
| 525 | + | |
| 526 | + | |
| 527 | + | |
| 528 | + | |
| 529 | + | |
| 530 | + | |
| 531 | + | |
| 532 | + | |
| 533 | + | |
| 534 | + | |
| 535 | + | |
| 536 | + | |
| 537 | + | |
| 538 | + | |
| 539 | + | |
| 540 | + | |
| 541 | + | |
| 542 | + | |
| 543 | + | |
| 544 | + | |
| 545 | + | |
| 546 | + | |
| 547 | + | |
| 548 | + | |
| 549 | + | |
| 550 | + | |
| 551 | + | |
| 552 | + | |
| 553 | + | |
| 554 | + | |
| 555 | + | |
| 556 | + | |
| 557 | + | |
| 558 | + | |
| 559 | + | |
| 560 | + | |
| 561 | + | |
| 562 | + | |
412 | 563 | | |
413 | 564 | | |
414 | 565 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
203 | 203 | | |
204 | 204 | | |
205 | 205 | | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
206 | 249 | | |
207 | 250 | | |
208 | 251 | | |
| |||
254 | 297 | | |
255 | 298 | | |
256 | 299 | | |
257 | | - | |
| 300 | + | |
258 | 301 | | |
259 | 302 | | |
260 | 303 | | |
| |||
386 | 429 | | |
387 | 430 | | |
388 | 431 | | |
389 | | - | |
| 432 | + | |
| 433 | + | |
| 434 | + | |
390 | 435 | | |
391 | 436 | | |
392 | 437 | | |
| |||
410 | 455 | | |
411 | 456 | | |
412 | 457 | | |
413 | | - | |
| 458 | + | |
| 459 | + | |
| 460 | + | |
| 461 | + | |
414 | 462 | | |
415 | 463 | | |
416 | 464 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
53 | 53 | | |
54 | 54 | | |
55 | 55 | | |
56 | | - | |
57 | | - | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
58 | 60 | | |
59 | 61 | | |
60 | 62 | | |
| |||
142 | 144 | | |
143 | 145 | | |
144 | 146 | | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
145 | 151 | | |
146 | 152 | | |
147 | 153 | | |
| |||
519 | 525 | | |
520 | 526 | | |
521 | 527 | | |
| 528 | + | |
| 529 | + | |
| 530 | + | |
| 531 | + | |
| 532 | + | |
| 533 | + | |
| 534 | + | |
| 535 | + | |
| 536 | + | |
| 537 | + | |
| 538 | + | |
| 539 | + | |
| 540 | + | |
| 541 | + | |
| 542 | + | |
| 543 | + | |
| 544 | + | |
| 545 | + | |
| 546 | + | |
| 547 | + | |
| 548 | + | |
| 549 | + | |
| 550 | + | |
| 551 | + | |
| 552 | + | |
| 553 | + | |
| 554 | + | |
| 555 | + | |
| 556 | + | |
| 557 | + | |
| 558 | + | |
| 559 | + | |
| 560 | + | |
| 561 | + | |
| 562 | + | |
| 563 | + | |
| 564 | + | |
| 565 | + | |
| 566 | + | |
| 567 | + | |
| 568 | + | |
| 569 | + | |
| 570 | + | |
| 571 | + | |
| 572 | + | |
| 573 | + | |
| 574 | + | |
| 575 | + | |
| 576 | + | |
| 577 | + | |
| 578 | + | |
| 579 | + | |
| 580 | + | |
| 581 | + | |
522 | 582 | | |
523 | 583 | | |
524 | 584 | | |
| |||
0 commit comments