Squashed 'tpls/kokkos-kernels/' changes from b9c1bab7a..25a31f881
25a31f881 Merge pull request #1877 from ndellingwood/master
b6a2db921 Update master_history.txt
14ad220a9 Merge branch 'release-candidate-4.1.00' for 4.1.00
1592d9ed9 Merge pull request #1874 from ndellingwood/fix-compatibility-kokkos-4.0
9620913d1 Merge pull request #1873 from kokkos/update-changelog-4.1.00
9e9351bd1 CHANGELOG: small updates
a3c07dfad CHANGELOG: organizing enhancements section
2579c4e3c CHANGELOG: reorganizing the new features section
c1176142b Update changelog for 4.1.00
a0d99bf69 Merge pull request #1868 from lucbv/MKL_INT
7871bd233 Merge pull request #1867 from bartlettroscoe/tril-11966-bad-batched-incl-dir
e624a7d3b Update to version 4.1.00
af312b9a0 Merge pull request #1850 from e10harvey/issue1764
340895119 Merge pull request #1865 from ndellingwood/update-testall
ec4a4cb09 Merge pull request #1864 from vqd8a/streams-tests-fix-small-numthreads
77745756f Add tests for nstreams=1
98eb68eda Merge branch 'develop' into streams-tests-fix-small-numthreads
4dbb5838e Check concurrency with nstream instead
c62d07442 cm_test_all_sandia: updates for blake
cec953f37 Merge pull request #1861 from cwpearson/fix/rocm-5.2.0-hang-quick
22b5f4ef1 Merge pull request #1862 from e10harvey/workaround_gnu_bug_81429
03998f350 Merge branch 'develop' into streams-tests-fix-small-numthreads
b2581bb2d Apply clang format
ba75b4b58 Remove redundant file
6a71179ab Restore orig. KokkosSparse_BsrMatrix.hpp
71f04ce8a Workaround checking OMP_NUM_THREADS with number of streams
f75ec31ce sparse/src: Add ifdef for doxgen < v1.9.7
ce8bb989f Benchmark cleanup for par_ilut and spmv (#1853)
6d79eaf5d sparse/src: Work around gnu compiler bug
478a56b53 use host pointer mode in rocBLAS scal
232b5bdac Merge pull request #1814 from e10harvey/issue1804
8b3c95135 Merge pull request #1856 from e10harvey/enable_sphinx_werror
8fae08018 Merge pull request #1783 from e10harvey/batched_gemm_eti
7865e88ac Merge pull request #1857 from e10harvey/issue1673
8b62c3851 Merge pull request #1855 from ndellingwood/issue-1749
eb92728a6 batched/unit_test: Optionally skip simd dcomplex4
558dbe4a9 docs: Update trmm. Add trtri.
24d259b0d docs: Fix blas rst files
dec2bcb8d Remove TestDeviceType
c5b2305aa docs: Enable sphinx -werror
07dc82a8d docs: Fix sphinx warnings
d88ad3523 sparse: Various doxygen fixes
9d723f6fe batched/dense: Add gesv DynRankView runtime checks
a907ca594 Merge pull request #1854 from ndellingwood/patch-match-trilinos-11921
87a384657 Address PR feedback
fea22d883 Revert ".github/workflows: Print out arch in osx CI"
341a4779f Revert ".github/workflows: Print out arch in osx CI"
91c0b606a Revert ".github/workflows: Print out arch in osx CI"
0f54c3da9 Merge pull request #1852 from e10harvey/docs_parilut_handle_fix
48d67ff62 CMakeLists.txt: Add all_libs alias
a8884845a CMakeLists.txt: Add alias to match what is exported from Trilinos
127c28198 Remove non-existant subdir kokkos-kernels/common/common (#11921, #11863)
d7c9a0771 docs/developer: Add Experimental namespace
fa2bdef62 Merge pull request #1843 from e10harvey/docs_compiler_profiling
b43d47557 Merge pull request #1844 from bartlettroscoe/remove-nonexistant-incl-dir
b3328390e KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos/Trilinos#11545)
ef98cb76a Merge pull request #1848 from e10harvey/fix_typos
4b3bab673 Merge pull request #1849 from ndellingwood/update-cmake-option-naming
5b369abef Update cmake option naming in docs/comments
723ab23aa blas/tpls: Fix gemm include guard typo
c5302a1ca docs: Add profiling for compile times
ac60cd4e2 Merge pull request #1841 from cwpearson/fix/spot-check-tpls-rocm
9292be86d batched/dense/impl: Fix headers
407e31a99 Merge pull request #1835 from dalg24/cuda_uvm
3a1ea766b batched/dense: cleanup gemm handle
5ece26d89 batched/dense: cleanup and move ETI into spec file
90c8a5ed1 batched/eti: Use Trans from KokkosBlas
48d647966 cmake: Fix batched eti args
557002b53 batched/CMakeLists.txt: ETI valid args only
d55ba7bf3 batched/dense/unit_test: Add TEST SKIPPED prints
1c256b1a3 batched: fix eti avail and wrapper
033a75e27 Merge pull request #1820 from vqd8a/sptrsv-solve-streams
940217b31 Merge pull request #1840 from ndellingwood/update-caraway-queues
c0349db3a .github/workflows: Print out arch in osx CI
611641996 .github/workflows: Print out arch in osx CI
9d4de5dbe add rocblas and rocsparse to --spot-check-tpls
9ad25c9b9 batched: note that tpl struct is unused
f663066d6 batched: Remove empty decl ETI files
721f388f9 batched: Populate avail eti files
d55fb1054 .github/workflows: Print out arch in osx CI
f64d6361a batched/dense: Add HostLevel Gemm unification layer
62b863de5 batched/dense/impl: Remove forward decls
dca6ee561 batched/dense/src: Add KokkosBatched_HostLevel_Gemm.hpp
40d76ebc6 perf_test/blas/blas3: Add compile-time checks for BatchLayout
57bfb3f0b batched/dense/unit_test: Run tests if ETI_ONLY is disabled
7ad0ede54 Start moving into HostLevel headers
813d02967 minor cleanup
60ddbb25a Fix constexpr branch
7b6073bb9 batched/eti: ETI host-level interfaces
237597a00 cm_test_all_sandia: update to add caraway queues for MI210, MI250
3917bd320 Merge pull request #1821 from lucbv/spmv_benchmark
82d93a25c Support rocSparse in rocm 5.2.0 (#1833)
5070d87b5 Merge pull request #1824 from e10harvey/issue1823
5ea1c3c32 Update perf_test/sparse/KokkosSparse_spmv_benchmark.cpp
2b3a070c1 applying clang-format
1a69ed2ae SpMV benchmark: adding logic for spmv algorithm
29c24f2bd SpMV: applying clang-format to benchmark
e3b6eb19e SpMV: adding logic in benchmark to chose algorithm to test.
09dc9ff27 SpMV: applying clang-format to benchmark source file
f75527cd6 SpMV: adding benchmark for spmv
7df961ef9 Merge pull request #1836 from dalg24/cleanup_kokkos_enable_pthread
49b0c491d Merge pull request #1834 from dalg24/remove_dead_code
08f4a4613 Merge pull request #1828 from ndellingwood/fix-cusparse-version-check
c74db8cc0 Merge pull request #1826 from brian-kelley/FixRhelNightly
058f099e1 Merge pull request #1827 from lucbv/Kokkos_ALL_t
6f26e1527 Drop outdated workarounds for backward compatibility with now unsupported Kokkos versions
e329be8dc Do not bother querying the value of Kokkos_ENABLE_CUDA_UVM
3273a031b Do not adjust KokkosKernels_INST_MEMSPACE_CUDA[UVM]SPACE default value
ebd1406fb Remove dead code guarded by `#ifdef KOKKOSKERNELS_INST_MEMSPACE_CUDAHOSTPINNEDSPACE`
abe8558b1 Remove remaining decl.hpp files
b1e22208f Remove includes of decl.hpp files
ad541587d sparse/eti: Remove unused decl.hpp.in files
2f0ce87ca Merge pull request #1830 from ndellingwood/weaver-update
4a8667228 scripts/cm_test_all_sandia: Update cuda11 modules
09a4820b3 cm_test_all_sandia: updates for weaver
d0f4a9ca0 Merge branch 'develop' into sptrsv-solve-streams
ea3321c2f Apply clang format
bf498cd4a Remove unnecessary code
b3ef19c74 Applying clang-format
28e813086 Sparse: fixing a few issues related to coo2csr and par_ilut benchmark
f30291cd1 spmv cusparse version check modified for cuda/11.1
1424f8aef Kokkos 4 compatibility: modifying the preprocessor logic
990d7db76 Fix errors and warnings in sems-rhel nighly
2bb633d46 .github/workflows: Summarize github-DOCS errors and warnings
69d0a8b5b Add BsrMatrix SpMV in rocSparse TPL, rewrite BsrMatrix SpMV unit tests (#1769)
63eab04f5 Merge pull request #1819 from ndellingwood/fix-rocblas-build-2
86784956b Merge pull request #1816 from cwpearson/ci/KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
7c9e7b433 Merge pull request #1822 from lucbv/ger_doc
3794a36be Merge pull request #1818 from jgfouca/jgfouca/par_ilut_perf_test_refactor2
af4688919 Ger: adding documentation stubs in apidocs
5b1c1f4fa Remove unused variable
19333668c Merge branch 'develop' into sptrsv-solve-streams
89d67ff14 Apply clang format
924cdee42 Add unit test for sptrsv via streams
787c711bb Merge pull request #1686 from e10harvey/coo2crs
725b46b89 apply clang-format
b8a22cc6c blas: fixups for ger exec space instances
146ce522f blas: various rocblas execspace fixes
4f1abd794 apply clang-format
954750d0c rocblas tpl spec: add missing comma separating vars in some macros
42ef78393 Merge pull request #1756 from eeprude/ger2
6e80b37f9 formatting
b60e681da Reorganize par_ilut performance test
bf06fef9a Merge pull request #1810 from ndellingwood/fix-rocblas-build
28a0421c2 Merge pull request #1812 from lucbv/blas2_3_on_stream
98c6509eb Merge pull request #1817 from bartlettroscoe/tril-11545-kokkos-no-subpks-develop
ab0f774cd Workaround for #1777 - cusparse spgemm test hang (#1811)
6c514ff1c Merge pull request #1813 from ndellingwood/update-changelog-4.0.01
4e6c85c39 Docs: adding stubs for trsm and trmm and updating gemv and gemm
cd242ba2f New performance test for par_ilut, ginkgo::par_ilut, and spill (#1799)
0ba9eaa3a Manually remove redundant Kokkos dep (#11545)
099d05784 Run script remove_kokkos_subpackages_from_trilinos_packages_r.sh (#11545)
9d95d49d1 only enable KokkosBlas gesv test for CUDA+MAGMA and HOST+BLAS
ff664866d cm_test_all_sandia: load openblas/0.3.20/rocm/5.2.0 for TPL spot check on caraway
28254863f Apply clang format
166716a87 No need to fence after each level
415deb091 Update changelog
1ae83cf16 Update changelog
4fc4831fb Update changelog
2f78417b7 Some changes in sptrsv_solve_streams for cuSPARSE < 11.3
5d027ccec Add sptrsv_solve_streams for cuSPARSE < 11.3
db991036e BLAS2/3: applying clang-format
c00c8a6e3 BLAS2/3: fixing some TPLs issues with execution space code path
a725974a3 Minor fixes for sptrsv cuSPARSE
1331baf11 Merge pull request #1808 from ndellingwood/master
e65f61147 sparse/unit_test: Use host mirror of RandCsMatrix map
005530354 Minor compilation error. Thanks to Luc for the proper suggestion.
19903279d Formatting
7720c8199 Added explanations
0e26dd1a1 Tests passing now at blake
99cbf779d Possible corrections for test on blake
31f2b0555 Fix name mismatch with rocblas tpl spec layer
5a5a2946c sparse: Encapsulate CooMatrix. Cleanup coo2crs TODO.
c208dacae Update master_history.txt
8809e41ca Update to version 4.0.1
9cee1a3d7 sparse/unit_test: Check last entry of col_map. Improve readability.
946f29a63 Merge branch 'develop' into sptrsv-solve-streams
29034f31a Minor changes to match L solve and U solve implementations
311157f62 Merge pull request #1795 from lucbv/norms_on_stream
2087e7009 BLAS3: starting to add stream support for TPL code path of trmm/trsm
4231677db Formatting
89eab5240 Changes made for compilation in blake
961b6362a Changes for testing in blake
6e06af03c Backup
215a00692 Formatting
ab59a34cf Another typo
ac307232e Typo
5cf9c3ea9 Formatting
0453f0d02 Forgot some spots that need a template parameter for the execution space
f27e4d034 Formatting
792bd5fa8 Correcting compilation errors on blake
a368dd3cb Formatting
6742ef3bf Solving compilation issues on the automatic tests
52a2a2de2 Corrections for some automatic tests that are failing
3a91bb0e5 Proper formatting
9f49fb972 Addressing new feedbacks from Luc.
d94c0139e Minor corrections
629337c26 Needed to format two extra files in kokkos-dev-2 in order for the automatic 'check' step to pass
7ce9d9f83 The clang formatting from kokkos-dev-2 puts a space into these 3 files, which needed (the space) to be removed in my Mac in order for the compilation to work. Tests pass in my Mac.
e41861865 All files formatted with clang 8.0
414210378 Addressed all feedbacks from Luc and Kim
b21194af4 Handling compilation warnings and errors at weaver
99a3b9dac All changes again, because previous branch got changes beyond those related to ger
13c5d8633 BLAS2/3: adding proper execution space interfaces to gemv and gemm
31e00593f Merge branch 'release-candidate-4.0.01' for 4.0.01
a46ebd5e9 Merge pull request #1719 from lucbv/gmres_type_fixes
d3b8bc823 BLAS1: adding final fences for code path that return host results
cb9fc79da Merge pull request #1768 from e10harvey/more_sparse_docs
f83016589 BLAS1: applying clang-format
e36c50e4b BLAS1: nrm2w adding support for execution space overload
20463f2a4 BLAS1: nrm1/nrm2 update CUBLAS calls
a0d52184d BLAS1: nrm2(_squared) updated to have executions_space overload
f0088ab94 BLAS1: nrminf fix in the TPL layer for execution space overload
be556c08a BLAS1 nrminf: adding execution space overload
a760a1d60 BLAS nrm1: fixing issues with TPLs
4538fc446 Blas1: updating nrm1 interface to accept execution space instance
ccf8f1557 Merge pull request #1805 from lucbv/blas1_on_stream_docs
03d678724 BLAS1: clang-format for documentation... : (
6606dde03 BLAS1: documentation adding default space info and non-block statement
daf1edce6 BLAS1: updating documentation for changes in PR #1803
3ce7f2985 Merge pull request #1803 from lucbv/blas1_on_stream
6d673920c Merge branch 'develop' into sptrsv-solve-streams
5f89a772f sparse: Fix intel build error
bb0e2fef3 BLAS1: fix documentation for fill and mult and apply clang-format
bf09ba19b BLAS1: fix CUBLAS TPL layer for axpby and scal
fa03d4884 Update blas1.rst
fb6318907 Merge pull request #15 from brian-kelley/GS_Docs
ffefb5386 BLAS1: applying clang format
9d45383d2 BLAS1: fix some Host BLAS TPL issue with execution space overload
b3d73f1d0 Add doxygen for user-facing Gauss-Seidel functions
2949394c0 BLAS1: apply clang-format
93986fd68 sparse: coo2crs add RandomAccess to BmapViewType
2d3c2c4f4 Update sparse/src/KokkosSparse_par_ilut.hpp
4ad4962c5 Update docs/developer/apidocs/sparse.rst
8a35f819a Update docs/developer/contrib.rst
4ce5d2a4e sparse: coo2crs and crs2coo updates
394409fb4 docs: build_doc
4c6d55b11 docs: Update contrib
6e150ac9d sparse: CooMatrix
6016771b3 sparse: CooMatrix
82e13ca28 Update changelog
0dcbd6a17 par_ilut: make Ut_values view atomic in compute_l_u_factors (#1781)
710a2396b Jgfouca/remove par ilut limitations (#1755)
8233f7330 ParIlut: create and destroy spgemm handle for each usage (#1736)
957298552 GMRES: fixing some type issues related to memory space instantiation
49339eb3f Merge pull request #1661 from jgfouca/jgfouca/par_ilut_test
8077e640b Update changelog
4df81e5d0 Fix #1758 (#1762)
221495705 Merge pull request #1763 from lucbv/roc_tpls_upgrade
98c72b5f4 Merge pull request #1759 from tmranse/tmranse/mdfInterface
cfd5928e2 Update changelog
8928788a4 Update version to 4.0.01
229608457 Patch Trilinos #11663
48ca11b50 Fix kk_generate_diagonally_dominant_sparse_matrix hang (#1689)
99654d8cf Merge pull request #1737 from e10harvey/reduce_test_coverage
8e90d005f Remove unused variable (#1734)
db917b2f4 Merge pull request #1727 from lucbv/cuda_11_4_fixes
6cfc547ce Merge pull request #1704 from e10harvey/doc_typos
1f266de0d Merge pull request #1698 from cwpearson/fix/kk-1692
4b731c4fb Merge pull request #1801 from e10harvey/include_omp_settings
01547c447 Blas1: supporting execution space on BLAS1 kernels
1d33c6f9b scripts: Include OMP settings
f78e4eb74 sparse: specify memory space for coo2crs
ea9db31d1 Merge pull request #1800 from brian-kelley/Fix1798
40eac2958 Fix #1798
790c9f506 Blas1: adding execution space instance interface for abs
f69755715 Merge pull request #1797 from kokkos/cwpearson/docs-apt-update
81477dc0d Update docs.yml
ec611fe92 Blas1: adding execution space overload of axpy and axpby
038def615 sparse: Add coo2crs, crs2coo and CooMatrix
a2a741da2 Merge pull request #1649 from e10harvey/get_ci_back_up
6dc008e11 Merge pull request #1796 from e10harvey/fix-docs-check
0b871d129 Remove deprecated code
2b63c1a61 scripts: Fix github-DOCS
26dac2932 scripts: Final changes for clang 10
a176b931b Fix #1786: check that work array is contiguous in SVD (#1793)
03f48fae6 BLAS: fixes and testing for LayoutStride (#1794)
e3a42e418 Fix compile errors
f3ec3b464 Merge branch 'develop' into sptrsv-solve-streams
bcaa37fc8 Merge pull request #1751 from NexGenAnalytics/benchmark-blas3-tests
507c29f68 par_ilut: make Ut_values view atomic in compute_l_u_factors (#1781)
1a6f22b1c Report layouts used
c025caacd Port blas3 gemm test
5015a2cdf Merge pull request #1733 from NexGenAnalytics/5-google-benchmark-blas2-tests
e2d1a1d69 Merge pull request #1790 from kliegeois/fixUnusedVar
ec392dc43 Merge pull request #1789 from NexGenAnalytics/benchmark-openmp-context
b654dd63b Merge pull request #1784 from masterleinad/fix_sycl_printf
7c798ae97 cuSPARSE trisolve with streams
0fd4f2878 Fix unused variable warnings
b1185f3a9 Include OpenMP environment variables in benchmark context
97187c3af Allow passing additional arguments
20ad98ac6 Add execution space to policies
15d616983 Reduce duplication
5d237f8b6 Support all command line parameters
35ee9ee7e Fix formatting
332485486 Add registration wrapper
34a228689 Parse blas2 custom command line parameters
f38b56ab1 Let benchmark decide number of iterations
03728a8b8 Use CMake helper for ODE_RK benchmark
1d70e7aeb Parse common parameters
10dc298b5 Move warm-up out of benchmarking loop
24923b79e Use separate executable
6c21c4df2 Revert changes to blas1 benchmark
278d18fac Use stored time value
b3da12558 Use correct header
7336d9c2f Add a benchmark for LayoutRight
6d027010a Let benchmark calculate FLOP/s
0678b55b1 Include scalar type in the output
e87d532c2 Let benchmark decide the number of repetitions
3b8c2da3d Remove redundant output
bfc68039d #5: Create blas2 gemv benchmark test
8154037ff Merge pull request #1779 from NexGenAnalytics/8-refactor-cmake-mkl
9f12713ad Add --enable-docs option to cm_generate_makefile (#1785)
0a95fff2d Merge pull request #1776 from tmranse/mdfComplex
31ef8f6bf Intial stream interface
837bf841d Merge branch 'develop' into sptrsv-solve-streams
a645960c9 Merge pull request #1773 from brian-kelley/SortAndMergeEarlyExit
005822bcf Merge pull request #1728 from vqd8a/spiluk_numeric-streams
0564b18d3 Merge branch 'develop' into spiluk_numeric-streams
4ca54ed15 Use KOKKOS_IMPL_DO_NOT_USE_PRINTF in Test_Common_UpperBound.hpp
e2ca0694a Merge branch 'develop' into sptrsv-solve-streams
6dc2a6a53 Re-enable and clean up triangle counting perf test (#1752)
378ffb32e Merge pull request #1770 from kliegeois/device_blas2
dc6f763f3 Remove the printf inside the team kernels.
0ae0d31e1 Formatting & remove unused typedefs
17b71d2b3 Add compile-time checks for SortCrs functions
893132ccd Allowed template arg deduction for sort_, sort_and_merge
d49004f77 Remvoe deprecated KokkosKernels::Impl:: sort functions
f666fba99 Sort and merge improvements
47322fbe5 Merge pull request #1778 from lucbv/fix_gesv_uninitialized
ec7ce2133 Gesv: using a value-initialization after all
397a3c660 Gesv: adding small comment for clarity
2114d03b6 Merge pull request #1754 from lucbv/ode_explicit
2bd997ae3 #8 added SYCL path for MKL in FindTPLMKL.cmake file
788018fd4 Batched Gesv: initializing variable to make compiler happy
6b4b8bb17 ODE: fix small typo and rebase error
22cd43ce1 ODE: adding support for adaptive time stepping
9ff29b38d ODE: adding new component for time integration
51ac81620 use crs_matrix view traits for magnitude view
1c2105bb1 remove deprecated Rank call
8ef7d05e8 Move TeamSpmv and TeamVectorSpmv to KokkosSparse
70db534be add support for complex data types in MDF
8f3574e33 spgemm handle: check that A,B,C graphs never change (#1742)
a975fa3e0 #8 Updated FindTPLMKL.cmake to support SYCL option from kokkos
aa96a83ad Jgfouca/remove par ilut limitations (#1755)
7d6485eaa Formatting
43bf36595 Make Werror build happy
f8b2a5e5a Update docs/developer/apidocs/sparse.rst
4dd7e613c Add par_ilu numeric docs
53599f47d Fix #1758 (#1762)
6c003deb3 Fix the doc of KokkosBlas2_team_spmv.hpp
bebcf360d Using Kokkos::ArithTraits instead of Kokkos::Details::ArithTraits
24cb9017b Add calls to KokkosBlas Gemv and Spmv for team batched kernels when m==1
5edb51a45 #8 update FindTPLMKL.cmake to use find_package(MKL)
c9d22ca1b #8: made functionnal current version (v1) for MKL
5ece7b3dd Merge branch 'develop' into spiluk_numeric-streams
e35ed210b Merge pull request #1763 from lucbv/roc_tpls_upgrade
30bd681ff Merge pull request #1759 from tmranse/tmranse/mdfInterface
75c14cd0b Add par_ilut symbolic docs
a2b18d73e Merge pull request #1765 from e10harvey/host_level_docs
1b123b177 Merge pull request #1767 from e10harvey/update_actions_checkout
a9189f56a clang-format...
3065eb31c ROCSPARSE: fix unused variable in unit-test
01c49a8d2 docs: Add stubs for some sparse APIs
f2c217d57 .github: Update to actions/checkout@v3
3d28a4730 Merge pull request #1711 from cwpearson/feature/search
aaadaa0dd docs: Include BatchedGemm
a0a928194 Merge branch 'develop' into spiluk_numeric-streams
1491bd433 Add exec instance support to sort/sort_and_merge utils (#1744)
8e77c01cc TPLs: replicating changes made in Trilinos for ROCBLAS/ROCSPARSE
45a8d3baf address reviewer comments and run clang-format
b079a4e2d Merge pull request #1672 from brian-kelley/FixSpaddPerftest
25dbdcb9b #7 Removed V2 and V1.
f49d41ead #7: V3: simplest way to get rocsparse and rocblas
5c8d760a3 #7: V2 Added hybrid version for rocblas and rocsparse
8efb0356c #7: (v1): old way for rocsparse and rocblas
27ec2cdb8 Spgemm perf test enhancements (#1664)
a94163cbc Patch Trilinos #11663 (#1757)
0e615295f Merge pull request #1753 from kliegeois/device_blas_refact
a2c1610a8 accept r-value A matrix
f11a70ab6 Merge branch 'develop' into get_ci_back_up
6bcfac5bd Adds team- and thread-based lower-bound and upper-bound search and predicates.
9f2399310 Merge branch 'develop' into spiluk_numeric-streams
b483cfce3 Merge pull request #1732 from cwpearson/fix/kk-1731
c77395716 Add calls to KokkosBlas Dot and Axpy for team batched kernels when m==1
11d442b51 Deprecate Kokkos::Details::ArithTraits (#1748)
a3c919474 Merge pull request #1750 from NexGenAnalytics/1718-print-google-benchmark-version
5595b4a92 Leverage std library in BsrMatrix constructor
943cfc6bb add access to inv permutations to mdf handle
38789c2cc add ability to generate compile_commands.json for clangd
252fbf8a2 Clarify comments for context helper functions
d2f9e0113 Mark functions as inline where appropriate
0912b67ac Include google benchmark lib version in benchmark output
1554ee7a8 Extract benchmark CMake code into a separate file
0e507ae38 openblas is now in standard modulepath
aec946c28 Merge pull request #1737 from e10harvey/reduce_test_coverage
873e2a8b1 Merge pull request #1693 from NexGenAnalytics/5-print-get-CUSPARSE-CUBLAS-versions
2a5309b39 Use concurrency() rather than impl_thread_pool_size()
bf9ed2aee ParIlut: create and destroy spgemm handle for each usage (#1736)
fd7f6e515 cm_test_all_sandia: Add llvm/10.0.1
55f24857e perf test utils: fix device ID parsing (#1739)
a7e7bcb74 Merge pull request #1722 from NexGenAnalytics/5-add-git-info
664bfc4d3 Fix kk_generate_diagonally_dominant_sparse_matrix hang (#1689)
60881471b Remove unused variable (#1734)
2cfc5082b spadd perf test: use common infrastructure
2dff92063 Avoid errors about not finalizing Kokkos
1e0fb0249 Fix/enhance backend issues on spadd perftest
ee059d078 Improve readability
323cefa5d Do not print CUBLAS_VER_BUILD
b6f4c80e9 Rename functions
9cc9328c7 #5: added TplsVersion file and  print methods
54d70dc83 Remove sample benchmark
72de68a8d Revert "Enable benchmarks in CI"
a21ce0982 Enable benchmarks in CI
e8b2d6cd0 Use constexpr variables for git info
2f9352acc Switch to header-only implementation
c32f3ad06 Include git information in benchmark context
3b466361c Generate git information during build
bc9265b0d Fix typo
ff097ec63 Merge pull request #1636 from NexGenAnalytics/5-google-bench-dot-test
be9310d97 Reduce BatchedGemm test coverage
221f7abc0 Work around instance resource limits
4e6c1d76e Merge branch 'develop' into spiluk_numeric-streams
560f37286 Fix unused-parameter nstreams error
cb11f0cff Use clang modules
950f633b7 pull in mkl
5c8067c93 More cleanup.
fa5bdf509 More cleanup
4b4e7b82f Cleanup. Need clang toolchain
f2184cf60 Use openblas tpl
3ac5a6fe1 Use stdlibc++ from gnu 8.2.1
678783275 Get a C++17 stdlibc++ in the path
b8ebb9564 scripts/cm_test_all_sandia:   - Add boiler plate for gnu/10.2.1 and intel/19.0.5.281.
afd686eb5 Merge pull request #1723 from kokkos/docs/cwpearson-html-only
26332eda6 Merge pull request #1727 from lucbv/cuda_11_4_fixes
9b0dfbd0f CUDA 11.4: fixing some failing build while trying to reproduce issue #1725
26bf33311 Merge pull request #1726 from e10harvey/ci_format_docs
ff31df01e .github: Automation reminder
5c2702283 Make Sphinix optional
a9877dc6f Install doxygen-latex for HTML docs
3ec0cb7fc #5: Rebased on develop and added kernels print_configuration call
8be303261 #5: Added better name for benchmark tests
56ef2095f #5: Added team dot benchmark test
4fc790848 #5: Fixed clang-format
e9c968cdd #5: Added dot_mv benchmark test
7be07e5a4 #5: Fixed clang-format errors
7dfe9efde #5: generalized execution space and removed unused include
0361d1d32 #5: Added benchmark dot perf test
482cc00f6 clang format
d83c123ea Add nstreams to symbolic call
08e3824f3 Apply clang format to Test_Sparse_spiluk.hpp
004c1c041 Fix undefined reference errors and clean up printf statements
d17877163 Apply clang format
1f74d4399 Add nstreams to avail_byte calculation
f055b6977 Merge branch 'develop' into spiluk_numeric-streams
f658cc4dd Add spiluk_numeric_streams interface
048155245 Merge pull request #1720 from dalg24/drop_pre_kokkos_36_workaround
f41ff478c Merge pull request #1719 from lucbv/gmres_type_fixes
d9df4fd6b Drop obsolete workaround checking whether KOKKOS_IF_ON_{HOST,DEVICE} macros are defined
e5c8da8fc Merge pull request #1710 from cwpearson/feature/iota
ba311291c Adding fix for LUPrec
3831a680a Merge pull request #1707 from lucbv/kk_config_version
b209a157c Merge pull request #1691 from cwpearson/fix/cmake-force
2f069c4fc Use the options ENABLE_PERFTEST, ENABLE_EXAMPLES (#1667)
4414f46c1 GMRES: fixing some type issues related to memory space instantiation
fa3dd4e13 Merge pull request #1717 from ndellingwood/update-changelog-4.0
b202dcbfd Merge pull request #1714 from cwpearson/ci/format-diff
abcf8d4d1 Merge pull request #1716 from ndellingwood/issue-1715
4f39a18ec Merge pull request #1698 from cwpearson/fix/kk-1692
6ce7ea4ec Merge pull request #1695 from kokkos/update-changelog-to-4.0.0
4abf2a3a8 rocsparse spmv tpl: Fix rocsparse_spmv call for rocm < 5.4.0
8ed861214 Adds KokkosKernels::Impl::Iota, a view-like where iota(i) = i + offset
50758c1b2 Merge pull request #1712 from cwpearson/tests/spmv-controls
813626471 Merge pull request #1701 from cwpearson/fix/kk-issue-1700
3a2064350 Merge pull request #1704 from e10harvey/doc_typos
fcf349d33 print the patch that clang-format-8 wants to apply
6ead86002 add explicit tests of opt-in algorithms
a3ab61082 CUSPARSE_MM_ALG_DEFAULT deprecated by 11.1
8697db1e4 Merge pull request #1709 from lucbv/comp_4_0_0
d63de38b5 Merge pull request #1707 from lucbv/kk_config_version
76968d3f7 Merge pull request #1691 from cwpearson/fix/cmake-force
7f3acf133 Compatibility upgrade: adding compatibility branch in code
8469d478f Kokkos Kernels version: need to use upper case variables
f40aabfea Merge pull request #1706 from lucbv/fix_team_mult
db0071a43 team mult: applying clang-format
562aaffd9 team mult: fix type issue in max_error calculation
d692d3585 Merge pull request #1703 from cwpearson/fix/kk-1702
f1dd58cf7 Merge pull request #1694 from lucbv/test_eti_only_off
0b88c05ed test mixed scalars: adding more comments and sending msg to cerr
31190a68c blas/blas1: Add mult docs
f46b24258 blas/blas1: Fix a couple documentation typos.
e4b324c8c test mixed scalars: incorporate Evan's comments
016384fff View::Rank -> View::rank
feb9f9ae6 use rocsparse_spmv_ex for rocm >= 5.4.0
e9ec43800 Introduce KOKKOSKERNELS_ALL_COMPONENTS_ENABLED variable
5153da336 Merge pull request #1697 from cwpearson/fix/kk-1696
8aa7fa23e cast Kokkos::Impl::integral_constant to int
602c526d7 Tested mixed scalars: removing temporary output
557e62a67 Test mixed scalars: more fixes related to mixed scalar tests
6d73c141e Merge pull request #1687 from lucbv/version_integration_fix
45ffc0849 Versions: fixing the CMake logic to export Kokkos Kernels version
5f5b9e0c5 Merge pull request #1685 from e10harvey/test_eti_only
37efc3bff Merge pull request #1665 from NexGenAnalytics/5-print-configuration
8206953f5 scripts: add --disable-test-eti-only
d2386da91 Merge pull request #1615 from lucbv/gemm_mixed_scalars
1fccf4a27 Mixed Scalars: fixing typo
31a756661 Mixed Scalars: fixing some type conversion in unit-tests
92b82ef88 Mixed Scalars: modifying one more test according to review comment
1507de8dc Mixed Scalars: modifying according to PR comments.
e9f463439 Mix Scalars: fixing the tolerance in axpby
d76e8e18a BLAS: mixed gemm
4a29cafbe Merge pull request #1683 from vqd8a/spiluk-nondeterministic-numeric
2140e99b0 #5 Fixed typo
7f579fb5c #5 rebased on develop and updated print_version method for kernels
d12158be6 #5: Fixed mistake in filename and updated Kernels version key
3ddf1dea0 #5: Fixed clang format and removed form this PR benchmark modification
8c1a89e0e #5: Added inline to avoit multiple define problem
e3c311bd7 #5: updated key verification
95b9ddcb5 #5 Updated print_configuration content format
32d58f6c3 #5: fixed previous commit mistake
634b2cad7 #5: added print_configuration file and its test
b60e9913f #5: moved print_configuration to header only file and added its test
cc11c6d7a #5: Added basis for print_configuration method
9455f6505 BLAS: fix build with KokkosKernels_TEST_ETI_ONLY=OFF
747bb9303 Merge pull request #1661 from jgfouca/jgfouca/par_ilut_test
9ff35198d Add utility KokkosSparse::removeCrsMatrixZeros(A, tol) (#1681)
c7765bc1d Merge pull request #1680 from lucbv/export_version_info
a66a5d6d6 Fix uninitialized error
a67bc42ce Apply clang format
d7ca7e7a4 Merge branch 'develop' into spiluk-nondeterministic-numeric
e2b8df3fd Make hlevel_ptr a separate allocation
6d02704ad Remove one unnecessary barrier
0b5bc7a61 Fix race condition when read and write L_values at the same k
76d9ed4ab formatting
7f78fceb1 Support alpha and beta in LUPrec::apply
304bcdaea Merge pull request #1676 from lucbv/perf_test_wrapper
fd8bf8ae4 Update perf_test/sparse/KokkosSparse_mdf.cpp
a936394f1 Merge branch 'develop' into spiluk-nondeterministic-numeric
b0965b7d4 Spgemm non-reuse: unification layer and TPLs (#1678)
d3ffe8214 Perf Tests: adding utilities and instantiation wrapper
c9e631b61 Version: applying clang-format
c284ef4ac Version: adding unit-test to verify that version info is available
c2486ab14 Merge pull request #1679 from dalg24/view_rank
6a6a51045 Fix warnings
1d19eeabb CMake: export version and subversion to config file
a05f21e3b Prefer View::{R->r}ank
91222dba2 formatting
6a4bf14ce Address GH feedback
9586dd948 Use sptrsv instead of blas::trsm
aac450bd3 Merge pull request #1624 from lucbv/MDF_alg_upgrade
9095beb5c MDF: improving performance and adding performance test
61ba79b8a Merge pull request #1677 from masterleinad/update_sycl
167ad420e Update SYCL docker file to include oneDPL
566570a87 Temporary workaround for Kokkos #5860 (#1675)
d1ee1a43e format fix
e50849b37 Fix for openmp-only
771f0f2cf Fix warnings
6615b77c0 Merge remote-tracking branch 'origin/develop' into jgfouca/par_ilut_test
08b71b3ff Fix @file tags in a few headers
d83b0649c Turn off main par_ilut+gmres test if kokkos::serial is not enabled
a89349ddb formatting
dd930a662 Fixes: trsm expects host views
b9bcc5f49 Add new assert/require macros. Other minor fixes
834a85ece Use the options ENABLE_PERFTEST, ENABLE_EXAMPLES (#1667)
6d6ed244e Merge pull request #1670 from masterleinad/update_sycl
a3ee83b55 Merge pull request #1666 from brian-kelley/FixOmpImplThreads
e72bc3859 SYCL CI: Specify the full path to the compiler
16c97ddb6 Call concurrency(), not impl_thread_pool_size()
dec1753fc Testing working in serial and openmp (IF I force determinism on parIlut)
da90033b7 Merge pull request #1654 from dalg24/clock_tic
56cdbd2c6 Merge pull request #1653 from dalg24/drop_pre_kokkos_36_workaround
55eb42008 Merge pull request #1652 from dalg24/are_integral
f0229f902 Merge pull request #1651 from masterleinad/fix_sycl_printf
5b30c5a05 Merge pull request #1662 from ndellingwood/update-version-4.x
40474b7d9 Merge pull request #1660 from masterleinad/update_sycl
03180cdf1 Merge pull request #1659 from kliegeois/fix_documentation_typo
b4d8ca8bf Update nightly SYCL setup
4df9db90a Hands off Kokkos::Impl::are_integral
f33376a54 Add Impl::are_integral_v helper variable template
4ee798d83 Drop pre Kokkos 3.6 workaround
a4bea4798 Replace printf in device code for SYCL
12e1b814b Do not use Kokkos::Impl::clock_tic, prefer std::chrono to get a random seed
839453184 Merge pull request #1647 from e10harvey/issue1571
93ecefbc9 Fix LUPrec license
3074b4b01 CMakeLists.txt: update version to 4.0.99
45630287b Merge remote-tracking branch 'origin/develop' into jgfouca/par_ilut_test
b4f3dd0eb Fix documentation regressions
51c3c5a0c Fix whitespace
cc38f32ef Add deprecated code disable to docs build.
10d155ad4 Merge branch 'develop' into issue1571
78833d6ca Merge pull request #1658 from lucbv/kokkos_deprecate_ALL_t
86edac3b1 Minor fixes
ead0712ef Merge branch 'develop' into issue1571
d2f273c02 osx-ci: adding option to disable deprecated_code_4 in Kokkos
b846db97e Apply suggestions from code review
fed582cb5 Fix an error in Krylov Handle documentation
6c5744fd6 Applying clang-format
215c6beb0 Benchmarks: for some reason the current version fails to build
11be16b61 Fixing deprecated usage of Kokkos::Impl::ALL_t in favore of Kokkos::ALL_t
e04475d55 Things building
1ea3a7b90 .github/workflows:   - Added docs.yml   - Save cycles with -DKokkos_ENABLE_TESTS=OFF
25b4fb815 Add new par_ilut test
f87b7d566 Clean up numeric and symbolic
547a6608a Clean up spiluk numeric
432c9541c Fix for VOLTA
81f77d0fb Prefer team size 32
0b4b667f1 Use atomic_add again
3adaa70ba Not use atomic_add
916100baf Initial fix

git-subtree-dir: tpls/kokkos-kernels
git-subtree-split: 25a31f8812330cec6e8ac5d8ea99bb9a2045cbab
etphipp committed Sep 6, 2023
1 parent 903980f commit 5c56e14
Showing 564 changed files with 26,757 additions and 11,475 deletions.
84 changes: 84 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,84 @@
name: github-DOCS

on:
pull_request:
branches:
- master
- develop

permissions:
contents: none

jobs:
docs-check:
runs-on: ubuntu-latest
steps:
- name: Install Dependencies
run: |
sudo apt-get update
sudo apt-get install --no-install-recommends doxygen-latex
pip install sphinx
pip install breathe
pip install sphinx-rtd-theme
- name: checkout_kokkos_kernels
uses: actions/checkout@v3
with:
path: kokkos-kernels

- name: checkout_kokkos
uses: actions/checkout@v3
with:
repository: kokkos/kokkos
ref: develop
path: kokkos

- name: configure_kokkos
run: |
mkdir -p kokkos/{build,install}
cd kokkos/build
cmake \
-DCMAKE_CXX_FLAGS="-Werror" \
-DCMAKE_CXX_STANDARD=17 \
-DCMAKE_INSTALL_PREFIX=$PWD/../install \
-DKokkos_ENABLE_COMPILER_WARNINGS=ON \
-DKokkos_ENABLE_DEPRECATED_CODE_3=OFF \
-DKokkos_ENABLE_TESTS=OFF \
-DKokkos_ENABLE_DEPRECATED_CODE_4=OFF \
..
- name: build_and_install_kokkos
working-directory: kokkos/build
run: make -j2 install

- name: configure_kokkos_kernels
run: |
mkdir -p kokkos-kernels/{build,install}
cd kokkos-kernels/build
cmake \
-DKokkos_DIR=$PWD/../../kokkos/install/lib/cmake/Kokkos \
-DCMAKE_INSTALL_PREFIX=$PWD/../install \
-DKokkosKernels_ENABLE_DOCS=ON \
..
- name: build_kokkos_kernels_doxygen
working-directory: kokkos-kernels/build
run: |
echo "Redirecting full output to doxygen.out..."
make Doxygen > doxygen.out 2>&1 || true
error_ret=$(grep 'Error' doxygen.out | head -c 1) || true
if [ ! -z $error_ret ]; then
echo "---- BEGIN: Summary of errors ---- "
cat doxygen.out | grep -i 'error:' || true
echo "---- END: Summary of errors ---- "
echo
echo
echo "---- BEGIN: Summary of warnings ---- "
cat doxygen.out | grep -i 'warning:' || true
echo "---- END: Summary of warnings ---- "
exit 1
fi
- name: build_kokkos_kernels_sphinx
working-directory: kokkos-kernels/build
run: make Sphinx
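The error check in the `build_kokkos_kernels_doxygen` step can be tried locally against a saved log. The following is a minimal sketch, assuming a log file produced by `make Doxygen > doxygen.out 2>&1`; the function name `summarize_doxygen_log` is hypothetical and not part of the workflow, and it tests for a match with `grep -q` instead of capturing the first byte of the match as the step does.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of docs.yml): print a summary of errors and
# warnings from a Doxygen log and return 1 if any line contains 'Error'.
summarize_doxygen_log() {
  local log="$1"
  # Same case-sensitive trigger as the workflow step: any line containing 'Error'.
  if grep -q 'Error' "$log"; then
    echo "---- BEGIN: Summary of errors ----"
    grep -i 'error:' "$log" || true
    echo "---- END: Summary of errors ----"
    echo "---- BEGIN: Summary of warnings ----"
    grep -i 'warning:' "$log" || true
    echo "---- END: Summary of warnings ----"
    return 1
  fi
  return 0
}
```

Calling `summarize_doxygen_log doxygen.out || exit 1` at the end of a local build script would reproduce the pass/fail behaviour of the step.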
13 changes: 11 additions & 2 deletions .github/workflows/format.yml
@@ -13,7 +13,7 @@ jobs:
clang-format-check:
runs-on: ubuntu-20.04
steps:
- - uses: actions/checkout@v2
+ - uses: actions/checkout@v3

- name: Install Dependencies
run: sudo apt install clang-format-8
@@ -31,9 +31,18 @@ jobs:
fi
done
- # If any diffs exist, error out
+ # If any diffs exist, print the patch and error out
+ if [[ ! -z $(git status -s -uno . -- ':!.github') ]]; then
+ echo "The following files require formatting changes:"
+ git status -s -uno . -- ':!.github'
+ echo "==== Begin Format Patch ===="
+ # --cached means show staged changes (git add above)
+ git --no-pager diff --patch --cached
+ echo "==== End Format Patch ===="
+ echo "To automate formatting, see:"
+ echo " https://kokkos-kernels.readthedocs.io/en/latest/developer/style.html#id1"
exit 1
fi
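The "print the patch and error out" logic added to format.yml can be exercised outside CI. This is a minimal sketch, assuming `git` is installed and that the formatter's output has already been staged with `git add`, as the workflow does earlier in the step; the function name `report_format_patch` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of format.yml): if the index of the given
# repository holds changes outside .github, print the staged patch and return 1.
report_format_patch() {
  local repo="$1"
  # -s: short status; -uno: ignore untracked files; ':!.github' excludes that directory.
  if [[ -n "$(git -C "$repo" status -s -uno -- . ':!.github')" ]]; then
    echo "The following files require formatting changes:"
    git -C "$repo" status -s -uno -- . ':!.github'
    echo "==== Begin Format Patch ===="
    # --cached shows the staged changes, i.e. what the formatter rewrote.
    git -C "$repo" --no-pager diff --patch --cached
    echo "==== End Format Patch ===="
    return 1
  fi
  return 0
}
```

Printing the staged diff lets a contributor copy the patch from the CI log and apply it directly instead of re-running the formatter.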
6 changes: 4 additions & 2 deletions .github/workflows/osx.yml
@@ -50,12 +50,12 @@ jobs:

steps:
- name: checkout_kokkos_kernels
- uses: actions/checkout@v2
+ uses: actions/checkout@v3
with:
path: kokkos-kernels

- name: checkout_kokkos
- uses: actions/checkout@v2
+ uses: actions/checkout@v3
with:
repository: kokkos/kokkos
ref: ${{ github.base_ref }}
@@ -72,6 +72,8 @@ jobs:
-DKokkos_ENABLE_COMPILER_WARNINGS=ON \
-DKokkos_ENABLE_DEBUG_BOUNDS_CHECK:BOOL=${{ matrix.debug_bounds_check }} \
-DKokkos_ENABLE_DEPRECATED_CODE_3=OFF \
+ -DKokkos_ENABLE_TESTS=OFF \
+ -DKokkos_ENABLE_DEPRECATED_CODE_4=OFF \
-DCMAKE_BUILD_TYPE=${{ matrix.cmake_build_type }} \
-DCMAKE_INSTALL_PREFIX=$PWD/../install \
..
5 changes: 5 additions & 0 deletions .gitignore
@@ -8,3 +8,8 @@
.project
*.o
TAGS

+ #Clangd indexing
+ compile_commands.json
+ .cache/
+ .vscode/
2 changes: 1 addition & 1 deletion BUILD.md
@@ -192,7 +192,7 @@ endif()
* Whether to pre instantiate kernels for the scalar type double. This option is KokkosKernels_INST_DOUBLE=ON by default. Disabling this may increase build times.
* Default: ON
* KokkosKernels_INST_EXECSPACE_OPENMP: BOOL
- * Whether to pre instantiate kernels for the execution space Kokkos::OpenMP. Disabling this when Kokkos_ENABLE_OpenMP is enabled may increase build times.
+ * Whether to pre instantiate kernels for the execution space Kokkos::OpenMP. Disabling this when Kokkos_ENABLE_OPENMP is enabled may increase build times.
* Default: ON if Kokkos is OpenMP-enabled, OFF otherwise.
* KokkosKernels_INST_EXECSPACE_SERIAL: BOOL
* Whether to build kernels for the execution space Kokkos::Serial. If explicit template instantiation (ETI) is enabled in Trilinos, disabling this when Kokkos_ENABLE_SERIAL is enabled may increase build times.
168 changes: 166 additions & 2 deletions CHANGELOG.md
@@ -1,7 +1,171 @@
# Change Log

## [4.0.0](https://github.com/kokkos/kokkos-kernels/tree/4.0.0) (2023-21-02)
[Full Changelog](https://github.com/kokkos/kokkos-kernels/compare/3.7.01...4.0.0)
## [4.1.00](https://github.com/kokkos/kokkos-kernels/tree/4.1.00) (2023-06-16)
[Full Changelog](https://github.com/kokkos/kokkos-kernels/compare/4.0.01...4.1.00)

### New Features

#### BLAS updates
- Adding interface with execution space instance argument to support execution of BLAS on stream
- Norms on stream [\#1795](https://github.com/kokkos/kokkos-kernels/pull/1795)
- Blas1 on stream [\#1803](https://github.com/kokkos/kokkos-kernels/pull/1803)
- Blas2 and 3 on stream [\#1812](https://github.com/kokkos/kokkos-kernels/pull/1812)
- Improving BLAS level 2 support by adding native implementation and TPL for GER, HER and SYR
- Implementation for BLAS2 ger [\#1756](https://github.com/kokkos/kokkos-kernels/pull/1756)
- Implement BLAS2 syr() and her() functionalities under kokkos-kernels syr() [\#1837](https://github.com/kokkos/kokkos-kernels/pull/1837)

#### Batched updates
- Optimizing algorithms for single input data
- Add calls to KokkosBlas Dot and Axpy for team batched kernels when m==1 [\#1753](https://github.com/kokkos/kokkos-kernels/pull/1753)
- Add calls to KokkosBlas Gemv and Spmv for team batched kernels when m==1 [\#1770](https://github.com/kokkos/kokkos-kernels/pull/1770)

#### Sparse updates
- Adding stream support to ILUK/SPTRSV and sort/merge
- Streams interface for SPILUK numeric [\#1728](https://github.com/kokkos/kokkos-kernels/pull/1728)
- Stream interface for SPTRSV solve [\#1820](https://github.com/kokkos/kokkos-kernels/pull/1820)
- Add exec instance support to sort/sort_and_merge utils [\#1744](https://github.com/kokkos/kokkos-kernels/pull/1744)
- Add BsrMatrix SpMV in rocSparse TPL, rewrite BsrMatrix SpMV unit tests [\#1769](https://github.com/kokkos/kokkos-kernels/pull/1769)
- sparse: Add coo2crs, crs2coo and CooMatrix [\#1686](https://github.com/kokkos/kokkos-kernels/pull/1686)
- Adds team- and thread-based lower-bound and upper-bound search and predicates [\#1711](https://github.com/kokkos/kokkos-kernels/pull/1711)
- Adds KokkosKernels::Impl::Iota, a view-like where iota(i) = i + offset [\#1710](https://github.com/kokkos/kokkos-kernels/pull/1710)

#### Misc updates
- ODE: explicit integration methods [\#1754](https://github.com/kokkos/kokkos-kernels/pull/1754)

### Enhancements:

#### BLAS
- refactor blas3 tests to use benchmark library [\#1751](https://github.com/kokkos/kokkos-kernels/pull/1751)

#### Batched
- batched/eti: ETI host-level interfaces [\#1783](https://github.com/kokkos/kokkos-kernels/pull/1783)
- batched/dense: Add gesv DynRankView runtime checks [\#1850](https://github.com/kokkos/kokkos-kernels/pull/1850)

#### Sparse
- Add support for complex data types in MDF [\#1776](https://github.com/kokkos/kokkos-kernels/pull/1776)
- Sort and merge improvements [\#1773](https://github.com/kokkos/kokkos-kernels/pull/1773)
- spgemm handle: check that A,B,C graphs never change [\#1742](https://github.com/kokkos/kokkos-kernels/pull/1742)
- Fix/enhance backend issues on spadd perftest [\#1672](https://github.com/kokkos/kokkos-kernels/pull/1672)
- Spgemm perf test enhancements [\#1664](https://github.com/kokkos/kokkos-kernels/pull/1664)
- add explicit tests of opt-in algorithms in SpMV [\#1712](https://github.com/kokkos/kokkos-kernels/pull/1712)

#### Common utilities
- Added TplsVersion file and print methods [\#1693](https://github.com/kokkos/kokkos-kernels/pull/1693)
- Add basis skeleton for KokkosKernels::print_configuration [\#1665](https://github.com/kokkos/kokkos-kernels/pull/1665)
- Add git information to benchmark context [\#1722](https://github.com/kokkos/kokkos-kernels/pull/1722)
- Test mixed scalars: more fixes related to mixed scalar tests [\#1694](https://github.com/kokkos/kokkos-kernels/pull/1694)
- PERF TESTS: adding utilities and instantiation wrapper [\#1676](https://github.com/kokkos/kokkos-kernels/pull/1676)

#### TPL support
- Refactor MKL TPL for both CPU and GPU usage [\#1779](https://github.com/kokkos/kokkos-kernels/pull/1779)
- MKL: support indices properly [\#1868](https://github.com/kokkos/kokkos-kernels/pull/1868)
- Use rocsparse_spmv_ex for rocm >= 5.4.0 [\#1701](https://github.com/kokkos/kokkos-kernels/pull/1701)


### Build System:
- Do not change memory spaces instantiation defaults based on Kokkos_ENABLE_CUDA_UVM [\#1835](https://github.com/kokkos/kokkos-kernels/pull/1835)
- KokkosKernels: Remove TriBITS Kokkos subpackages (trilinos/Trilinos#11545) [\#1817](https://github.com/kokkos/kokkos-kernels/pull/1817)
- CMakeLists.txt: Add alias to match what is exported from Trilinos [\#1855](https://github.com/kokkos/kokkos-kernels/pull/1855)
- KokkosKernels: Don't list include for non-existant 'batched' build dir (trilinos/Trilinos#11966) [\#1867](https://github.com/kokkos/kokkos-kernels/pull/1867)
- Remove non-existant subdir kokkos-kernels/common/common (#11921, #11863) [\#1854](https://github.com/kokkos/kokkos-kernels/pull/1854)
- KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos/Trilinos#11545) [\#1844](https://github.com/kokkos/kokkos-kernels/pull/1844)

### Documentation and Testing:
- Enable sphinx werror [\#1856](https://github.com/kokkos/kokkos-kernels/pull/1856)
- Update cmake option naming in docs/comments [\#1849](https://github.com/kokkos/kokkos-kernels/pull/1849)
- docs/developer: Add Experimental namespace [\#1852](https://github.com/kokkos/kokkos-kernels/pull/1852)
- docs: Add profiling for compile times [\#1843](https://github.com/kokkos/kokkos-kernels/pull/1843)
- Ger: adding documentation stubs in apidocs [\#1822](https://github.com/kokkos/kokkos-kernels/pull/1822)
- .github/workflows: Summarize github-DOCS errors and warnings [\#1814](https://github.com/kokkos/kokkos-kernels/pull/1814)
- Blas1: docs update for PR #1803 [\#1805](https://github.com/kokkos/kokkos-kernels/pull/1805)
- apt-get update in hosted runner docs check [\#1797](https://github.com/kokkos/kokkos-kernels/pull/1797)
- scripts: Fix github-DOCS [\#1796](https://github.com/kokkos/kokkos-kernels/pull/1796)
- Add --enable-docs option to cm_generate_makefile [\#1785](https://github.com/kokkos/kokkos-kernels/pull/1785)
- docs: Add stubs for some sparse APIs [\#1768](https://github.com/kokkos/kokkos-kernels/pull/1768)
- .github: Update to actions/checkout@v3 [\#1767](https://github.com/kokkos/kokkos-kernels/pull/1767)
- docs: Include BatchedGemm [\#1765](https://github.com/kokkos/kokkos-kernels/pull/1765)
- .github: Automation reminder [\#1726](https://github.com/kokkos/kokkos-kernels/pull/1726)
- Allow an HTML-only docs build [\#1723](https://github.com/kokkos/kokkos-kernels/pull/1723)
- SYCL CI: Specify the full path to the compiler [\#1670](https://github.com/kokkos/kokkos-kernels/pull/1670)
- Add github DOCS ci check & disable Kokkos tests [\#1647](https://github.com/kokkos/kokkos-kernels/pull/1647)
- Add rocsparse,rocblas, to enabled TPLs in cm_test_all_sandia when --spot-check-tpls [\#1841](https://github.com/kokkos/kokkos-kernels/pull/1841)
- cm_test_all_sandia: update to add caraway queues for MI210, MI250 [\#1840](https://github.com/kokkos/kokkos-kernels/pull/1840)
- Support rocSparse in rocm 5.2.0 [\#1833](https://github.com/kokkos/kokkos-kernels/pull/1833)
- Add KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520 support, only enable KokkosBlas::gesv where supported [\#1816](https://github.com/kokkos/kokkos-kernels/pull/1816)
- scripts: Include OMP settings [\#1801](https://github.com/kokkos/kokkos-kernels/pull/1801)
- Print the patch that clang-format-8 wants to apply [\#1714](https://github.com/kokkos/kokkos-kernels/pull/1714)

### Benchmarks:
- Benchmark cleanup for par_ilut and spmv [\#1853](https://github.com/kokkos/kokkos-kernels/pull/1853)
- SpMV: adding benchmark for spmv [\#1821](https://github.com/kokkos/kokkos-kernels/pull/1821)
- New performance test for par_ilut, ginkgo::par_ilut, and spill [\#1799](https://github.com/kokkos/kokkos-kernels/pull/1799)
- Include OpenMP environment variables in benchmark context [\#1789](https://github.com/kokkos/kokkos-kernels/pull/1789)
- Re-enable and clean up triangle counting perf test [\#1752](https://github.com/kokkos/kokkos-kernels/pull/1752)
- Include google/benchmark lib version in benchmark output [\#1750](https://github.com/kokkos/kokkos-kernels/pull/1750)
- Refactor blas2 test for benchmark feature [\#1733](https://github.com/kokkos/kokkos-kernels/pull/1733)
- Adds a better parilut test with gmres [\#1661](https://github.com/kokkos/kokkos-kernels/pull/1661)
- Refactor blas1 test for benchmark feature [\#1636](https://github.com/kokkos/kokkos-kernels/pull/1636)

### Cleanup:
- Drop outdated workarounds for backward compatibility with Kokkos [\#1836](https://github.com/kokkos/kokkos-kernels/pull/1836)
- Remove dead code guarded [\#1834](https://github.com/kokkos/kokkos-kernels/pull/1834)
- Remove decl ETI files [\#1824](https://github.com/kokkos/kokkos-kernels/pull/1824)
- Reorganize par_ilut performance test [\#1818](https://github.com/kokkos/kokkos-kernels/pull/1818)
- Deprecate Kokkos::Details::ArithTraits [\#1748](https://github.com/kokkos/kokkos-kernels/pull/1748)
- Drop obsolete workaround #ifdef KOKKOS_IF_ON_HOST [\#1720](https://github.com/kokkos/kokkos-kernels/pull/1720)
- Drop pre Kokkos 3.6 workaround [\#1653](https://github.com/kokkos/kokkos-kernels/pull/1653)
- View::Rank -> View::rank [\#1703](https://github.com/kokkos/kokkos-kernels/pull/1703)
- Prefer Kokkos::View::{R->r}ank [\#1679](https://github.com/kokkos/kokkos-kernels/pull/1679)
- Call concurrency(), not impl_thread_pool_size() [\#1666](https://github.com/kokkos/kokkos-kernels/pull/1666)
- Kokkos moves ALL_t out of Impl namespace [\#1658](https://github.com/kokkos/kokkos-kernels/pull/1658)
- Add KokkosKernels::Impl::are_integral_v helper variable template and quit using Kokkos::Impl::are_integral trait [\#1652](https://github.com/kokkos/kokkos-kernels/pull/1652)

### Bug Fixes:
- Kokkos 4 compatibility: modifying the preprocessor logic [\#1827](https://github.com/kokkos/kokkos-kernels/pull/1827)
- blas/tpls: Fix gemm include guard typo [\#1848](https://github.com/kokkos/kokkos-kernels/pull/1848)
- spmv cusparse version check modified for cuda/11.1 [\#1828](https://github.com/kokkos/kokkos-kernels/pull/1828)
- Workaround for #1777 - cusparse spgemm test hang [\#1811](https://github.com/kokkos/kokkos-kernels/pull/1811)
- Fix 1798 [\#1800](https://github.com/kokkos/kokkos-kernels/pull/1800)
- BLAS: fixes and testing for LayoutStride [\#1794](https://github.com/kokkos/kokkos-kernels/pull/1794)
- Fix 1786: check that work array is contiguous in SVD [\#1793](https://github.com/kokkos/kokkos-kernels/pull/1793)
- Fix unused variable warnings [\#1790](https://github.com/kokkos/kokkos-kernels/pull/1790)
- Use KOKKOS_IMPL_DO_NOT_USE_PRINTF in Test_Common_UpperBound.hpp [\#1784](https://github.com/kokkos/kokkos-kernels/pull/1784)
- Batched Gesv: initializing variable to make compiler happy [\#1778](https://github.com/kokkos/kokkos-kernels/pull/1778)
- perf test utils: fix device ID parsing [\#1739](https://github.com/kokkos/kokkos-kernels/pull/1739)
- Fix OOB and improve comments in BsrMatrix COO constructor [\#1732](https://github.com/kokkos/kokkos-kernels/pull/1732)
- batched/unit_test: Disable simd dcomplex4 test in for intel > 19.05 and <= 2021. [\#1857](https://github.com/kokkos/kokkos-kernels/pull/1857)
- rocsparse spmv tpl: Fix rocsparse_spmv call for rocm < 5.4.0 [\#1716](https://github.com/kokkos/kokkos-kernels/pull/1716)
- compatibility with 4.0.0 [\#1709](https://github.com/kokkos/kokkos-kernels/pull/1709)
- team mult: fix type issue in max_error calculation [\#1706](https://github.com/kokkos/kokkos-kernels/pull/1706)
- cast Kokkos::Impl::integral_constant to int [\#1697](https://github.com/kokkos/kokkos-kernels/pull/1697)


## [4.0.01](https://github.com/kokkos/kokkos-kernels/tree/4.0.01) (2023-04-19)
[Full Changelog](https://github.com/kokkos/kokkos-kernels/compare/4.0.00...4.0.01)

### Bug Fixes:
- Use the options ENABLE_PERFTEST, ENABLE_EXAMPLES [\#1667](https://github.com/kokkos/kokkos-kernels/pull/1667)
- Introduce KOKKOSKERNELS_ALL_COMPONENTS_ENABLED variable [\#1691](https://github.com/kokkos/kokkos-kernels/pull/1691)
- Kokkos Kernels version: need to use upper case variables [\#1707](https://github.com/kokkos/kokkos-kernels/pull/1707)
- CUSPARSE_MM_ALG_DEFAULT deprecated by cuSparse 11.1 [\#1698](https://github.com/kokkos/kokkos-kernels/pull/1698)
- blas1: Fix a couple documentation typos [\#1704](https://github.com/kokkos/kokkos-kernels/pull/1704)
- CUDA 11.4: fixing some -Werror [\#1727](https://github.com/kokkos/kokkos-kernels/pull/1727)
- Remove unused variable in KokkosSparse_spgemm_numeric_tpl_spec_decl.hpp [\#1734](https://github.com/kokkos/kokkos-kernels/pull/1734)
- Reduce BatchedGemm test coverage time [\#1737](https://github.com/kokkos/kokkos-kernels/pull/1737)
- Fix kk_generate_diagonally_dominant_sparse_matrix hang [\#1689](https://github.com/kokkos/kokkos-kernels/pull/1689)
- Temporary spgemm workaround matching Trilinos 11663 [\#1757](https://github.com/kokkos/kokkos-kernels/pull/1757)
- MDF: Minor changes to interface for ifpack2 impl [\#1759](https://github.com/kokkos/kokkos-kernels/pull/1759)
- Rocm TPL support upgrade [\#1763](https://github.com/kokkos/kokkos-kernels/pull/1763)
- Fix BLAS cmake check for complex types [\#1762](https://github.com/kokkos/kokkos-kernels/pull/1762)
- ParIlut: Adds a better parilut test with gmres [\#1661](https://github.com/kokkos/kokkos-kernels/pull/1661)
- GMRES: fixing some type issues related to memory space instantiation (partial) [\#1719](https://github.com/kokkos/kokkos-kernels/pull/1719)
- ParIlut: create and destroy spgemm handle for each usage [\#1736](https://github.com/kokkos/kokkos-kernels/pull/1736)
- ParIlut: remove par ilut limitations [\#1755](https://github.com/kokkos/kokkos-kernels/pull/1755)
- ParIlut: make Ut_values view atomic in compute_l_u_factors [\#1781](https://github.com/kokkos/kokkos-kernels/pull/1781)


## [4.0.0](https://github.com/kokkos/kokkos-kernels/tree/4.0.00) (2023-21-02)
[Full Changelog](https://github.com/kokkos/kokkos-kernels/compare/3.7.01...4.0.00)

### Features:
- Copyright update 4.0 [\#1657](https://github.com/kokkos/kokkos-kernels/pull/1657)