merge master branch

flatironinstitute · Oct 22, 2024 · 0618ec9
2 parents aa2be60 + 67214a3

Showing 163 changed files with 7,097 additions and 4,360 deletions.
diff --git a/.github/workflows/cmake_ci.yml b/.github/workflows/cmake_ci.yml
@@ -55,3 +55,33 @@ jobs:
         working-directory: ./build
         run: |
           ctest -C ${{matrix.build_type}} --output-on-failure
+
+      - name: Set up Python
+        if: matrix.finufft_static_linking
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.10'
+
+      - name: Build Python wheels
+        if: matrix.finufft_static_linking
+        env:
+          MACOSX_DEPLOYMENT_TARGET: 13
+        shell: bash
+        run: |
+          python3 -m pip install \
+            --verbose \
+            -C cmake.define.CMAKE_BUILD_TYPE=${{ matrix.build_type }} \
+            -C cmake.define.FINUFFT_ARCH_FLAGS=${{ matrix.arch_flags }} \
+            -C cmake.define.FINUFFT_USE_DUCC0=${{ matrix.ducc_fft }} \
+            python/finufft
+
+      - name: Install pytest
+        if: matrix.finufft_static_linking
+        run: |
+          python3 -m pip install --upgrade pip
+          python3 -m pip install pytest
+
+      - name: Test Python package
+        if: matrix.finufft_static_linking
+        run: |
+          python3 -m pytest python/finufft/test
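The "Build Python wheels" step above forwards matrix values to CMake through pip's `-C` (`--config-settings`) flag. It uses `cmake.define.<NAME>=<VALUE>` keys, which the package's build backend (presumably scikit-build-core, since that is the key format it understands) turns into CMake cache definitions. Below is a minimal Python sketch of how that command line is assembled. It assumes a plain dict stands in for the GitHub Actions `matrix` context. The helper name `wheel_build_command` and the example values are hypothetical.

```python
import shlex
from typing import Dict, List


def wheel_build_command(matrix: Dict[str, str]) -> List[str]:
    """Assemble the pip invocation used by the "Build Python wheels" step.

    `matrix` is a hypothetical stand-in for the GitHub Actions matrix entry;
    its keys mirror matrix.build_type, matrix.arch_flags and matrix.ducc_fft.
    """
    # Each CMake cache variable is passed to the build backend as a
    # "-C cmake.define.<NAME>=<VALUE>" config setting.
    defines = {
        "CMAKE_BUILD_TYPE": matrix["build_type"],
        "FINUFFT_ARCH_FLAGS": matrix["arch_flags"],
        "FINUFFT_USE_DUCC0": matrix["ducc_fft"],
    }
    cmd = ["python3", "-m", "pip", "install", "--verbose"]
    for name, value in defines.items():
        cmd += ["-C", f"cmake.define.{name}={value}"]
    cmd.append("python/finufft")  # path to the Python package source
    return cmd


if __name__ == "__main__":
    # Example values are made up for illustration.
    example = {"build_type": "Release", "arch_flags": "native", "ducc_fft": "On"}
    print(shlex.join(wheel_build_command(example)))
```

Keeping one `-C` pair per define mirrors the workflow, where each CMake variable appears on its own continuation line.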
diff --git a/.github/workflows/python_build_wheels.yml b/.github/workflows/python_build_wheels.yml
@@ -18,15 +18,22 @@ jobs:
         with:
           package-dir: 'python/finufft'
         env:
-          CIBW_BEFORE_ALL_MACOS: brew install gcc@14 fftw
+          CIBW_BEFORE_ALL_MACOS: |
+            # In order to reinstall a version of GCC compatible with older versions of macOS, we need to first uninstall the existing version.
+            brew uninstall gcc
+            pkg=$(brew fetch --force --bottle-tag=monterey gcc | grep 'Downloaded to.*monterey.*' | cut -d' ' -f3)
+            brew install $pkg
+
+            pkg=$(brew fetch --force --bottle-tag=monterey fftw | grep 'Downloaded to.*monterey.*' | cut -d' ' -f3)
+            brew install $pkg
           CIBW_ARCHS_MACOS: "x86_64"
           # Need following versions of GCC for compatibility with fftw
           # installed by homebrew. Similarly, we set the macOS version
           # for compatibility with those libraries.
           CIBW_ENVIRONMENT_MACOS: >
             CC=gcc-14
             CXX=g++-14
-            MACOSX_DEPLOYMENT_TARGET=13
+            MACOSX_DEPLOYMENT_TARGET=12
 
       - uses: actions/upload-artifact@v4
         with:
@@ -46,18 +53,18 @@ jobs:
           package-dir: 'python/finufft'
         env:
           CIBW_ARCHS_MACOS: "arm64"
-          # Make sure to install the ARM64-specific versions of FFTW and GCC.
-          # Perhaps this is done automatically on the macos-14 image. We should
-          # look into this further.
           CIBW_BEFORE_ALL_MACOS: |
-            pkg=$(brew fetch --force --bottle-tag=arm64_ventura fftw | grep 'Downloaded to' | cut -d' ' -f3)
+            # In order to reinstall a version of GCC compatible with older versions of macOS, we need to first uninstall the existing version.
+            brew uninstall gcc
+            pkg=$(brew fetch --force --bottle-tag=arm64_monterey gcc | grep 'Downloaded to.*monterey.*' | cut -d' ' -f3)
             brew install $pkg
-            pkg=$(brew fetch --force --bottle-tag=arm64_ventura gcc | grep 'Downloaded to' | cut -d' ' -f3)
+
+            pkg=$(brew fetch --force --bottle-tag=arm64_monterey fftw | grep 'Downloaded to.*monterey.*' | cut -d' ' -f3)
             brew install $pkg
           CIBW_ENVIRONMENT_MACOS: >
             CC=gcc-14
             CXX=g++-14
-            MACOSX_DEPLOYMENT_TARGET=14
+            MACOSX_DEPLOYMENT_TARGET=12
 
       - uses: actions/upload-artifact@v4
         with:
@@ -85,11 +92,6 @@ jobs:
         uses: pypa/[email protected]
         with:
           package-dir: 'python/finufft'
-        env:
-          # This is required to force cmake to avoid using MSVC (the default).
-          # By setting the generator to Ninja, cmake will pick gcc (mingw64)
-          # as the compiler.
-          CIBW_CONFIG_SETTINGS: "cmake.args='-G Ninja'"
 
       - uses: actions/upload-artifact@v4
         with:

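The macOS `CIBW_BEFORE_ALL_MACOS` scripts above select the downloaded bottle from `brew fetch` output with `grep 'Downloaded to.*monterey.*' | cut -d' ' -f3`. The Python sketch below reproduces that text processing so its assumptions are visible. It assumes brew reports each download on a line shaped like `Downloaded to: <path>`; the exact Homebrew wording is an assumption, and the sample lines are made up. `extract_bottle_paths` is a hypothetical name.

```python
import re
from typing import Iterable, List

# Same unanchored pattern as: grep 'Downloaded to.*monterey.*'
MONTEREY_LINE = re.compile(r"Downloaded to.*monterey.*")


def extract_bottle_paths(lines: Iterable[str]) -> List[str]:
    """Mimic `grep PATTERN | cut -d' ' -f3` over brew fetch output."""
    paths: List[str] = []
    for line in lines:
        if not MONTEREY_LINE.search(line):
            continue  # grep drops non-matching lines
        # cut -d' ' splits on every single space; -f3 is the third field (1-based).
        fields = line.split(" ")
        # A matching line always contains a space, so cut prints field 3,
        # or an empty string when the line has fewer than three fields.
        paths.append(fields[2] if len(fields) >= 3 else "")
    return paths


if __name__ == "__main__":
    sample = [
        "==> Fetching gcc",
        "Downloaded to: /cache/abc--gcc--14.monterey.bottle.tar.gz",
        "Downloaded to: /cache/def--gcc.bottle_manifest.json",
    ]
    print(extract_bottle_paths(sample))
```

Two consequences follow. If more than one line matched, the unquoted `$pkg` would expand to several paths. A cache path containing a space would also be cut short at the first space. The workflow therefore implicitly relies on exactly one matching path that contains no spaces.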
diff --git a/CHANGELOG b/CHANGELOG
@@ -1,7 +1,26 @@
 List of features / changes made / release notes, in reverse chronological order.
 If not stated, FINUFFT is assumed (cuFINUFFT <=1.3 is listed separately).
 
-V 2.3.0-rc1 (8/6/24)
+Master (10/8/24)
+
+* Support and docs for opts.gpu_spreadinterponly=1 for MRI "density compensation
+  estimation" type 1&2 use-case with upsampfac=1.0 PR564 (Chaithya G R).
+* reduced roundoff error in a[n] phase calc in CPU onedim_fseries_kernel().
+   PR534 (Barnett).
+* GPU code type 1,2 also reduced round-off error in phases, to match CPU code;
+  rationalized onedim_{fseries,nuft}_* GPU codes to match CPU (Barbone, Barnett)
+* Added type 3 in 1D, 2D, and 3D, in the GPU library cufinufft. PR #517, Barbone
+  - Removed the CPU fseries computation (used for benchmark, no longer needed)
+  - Added complex arithmetic support for cuda_complex type
+  - Added tests for type 3 in 1D, 2D, and 3D and cuda_complex arithmetic
+  - Minor fixes on the GPU code:
+    a) removed memory leaks in case of errors
+    b) renamed maxbatchsize to batchsize
+* Add options for user-provided FFTW locker (PR548, Blackwell). These options
+  can be used to prevent crashes when a user is creating/destroying FFTW
+  plans and FINUFFT plans in threads simultaneously.
+
+V 2.3.0 (9/5/24)
 
 * Switched C++ standards from C++14 to C++17, allowing various templating
   improvements (Barbone).
@@ -72,6 +91,9 @@ V 2.3.0-rc1 (8/6/24)
   test/finufft?d_test.cpp to reduce CI fails due to random numbers on some
   platforms in single-prec (with DUCC, etc). (Barnett PR516)
 * fix GPU segfault due to stream deletion as pointer not value (Barbone PR520)
+* new performance-tracking doc page comparing releases (Barbone) #527
+* fix various Py 3.8 wheel and numpy distutils logging issues #549 #545
+* Cmake option to control -fPIC in static build; default now ON (as v2.2) #551
 
 V 2.2.0 (12/12/23)
 

diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -19,6 +19,7 @@ option(FINUFFT_USE_OPENMP "Whether to use OpenMP for parallelization. If disable
 option(FINUFFT_USE_CPU "Whether to build the ordinary FINUFFT library (libfinufft)." ON)
 option(FINUFFT_USE_CUDA "Whether to build CUDA accelerated FINUFFT library (libcufinufft). This is completely independent of the main FINUFFT library" OFF)
 option(FINUFFT_STATIC_LINKING "If ON builds the static finufft library, if OFF build a shared finufft library." ON)
+option(FINUFFT_POSITION_INDEPENDENT_CODE "Whether to build the finufft library with position independent code (-fPIC). This is forced ON when FINUFFT_SHARED_LINKING is ON." ON)
 option(FINUFFT_BUILD_DEVEL "Whether to build development executables" OFF)
 option(FINUFFT_BUILD_EXAMPLES "Whether to build the FINUFFT examples" OFF)
 option(FINUFFT_BUILD_TESTS "Whether to build the FINUFFT tests" OFF)
@@ -37,6 +38,11 @@ cmake_dependent_option(FINUFFT_STATIC_LINKING "Disable static libraries in the c
 cmake_dependent_option(FINUFFT_SHARED_LINKING "Shared should be the opposite of static linking" ON "NOT FINUFFT_STATIC_LINKING" OFF)
 # cmake-format: on
 
+# When building shared libraries, we need to build with -fPIC in all cases
+if(FINUFFT_SHARED_LINKING)
+  set(FINUFFT_POSITION_INDEPENDENT_CODE ON)
+endif()
+
 include(cmake/utils.cmake)
 
 set(FINUFFT_CXX_FLAGS_RELEASE
@@ -115,9 +121,7 @@ endif()
 
 # This set of sources is compiled twice, once in single precision and once in
 # double precision. The single precision compilation is done with -DSINGLE
-set(FINUFFT_PRECISION_DEPENDENT_SOURCES
-    src/finufft.cpp src/fft.cpp src/simpleinterfaces.cpp src/spreadinterp.cpp
-    src/utils.cpp)
+set(FINUFFT_PRECISION_DEPENDENT_SOURCES)
 
 # If we're building for Fortran, make sure we also include the translation
 # layer.
@@ -231,7 +235,7 @@ function(set_finufft_options target)
   set_target_properties(
     ${target}
     PROPERTIES MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>"
-               POSITION_INDEPENDENT_CODE ${FINUFFT_SHARED_LINKING})
+               POSITION_INDEPENDENT_CODE ${FINUFFT_POSITION_INDEPENDENT_CODE})
   enable_asan(${target})
   if(FINUFFT_USE_OPENMP)
     target_link_libraries(${target} PRIVATE OpenMP::OpenMP_CXX)
@@ -246,25 +250,30 @@ endfunction()
 
 if(FINUFFT_USE_CPU)
   # Main finufft libraries
-  add_library(finufft_f32 OBJECT ${FINUFFT_PRECISION_DEPENDENT_SOURCES})
-  target_compile_definitions(finufft_f32 PRIVATE SINGLE)
-  set_finufft_options(finufft_f32)
-
-  add_library(finufft_f64 OBJECT ${FINUFFT_PRECISION_DEPENDENT_SOURCES})
-  set_finufft_options(finufft_f64)
   if(NOT FINUFFT_STATIC_LINKING)
-    add_library(finufft SHARED src/utils_precindep.cpp
-                               contrib/legendre_rule_fast.cpp)
+    add_library(
+      finufft SHARED
+      src/spreadinterp.cpp
+      src/utils.cpp
+      contrib/legendre_rule_fast.cpp
+      src/fft.cpp
+      src/finufft_core.cpp
+      src/simpleinterfaces.cpp
+      fortran/finufftfort.cpp)
   else()
-    add_library(finufft STATIC src/utils_precindep.cpp
-                               contrib/legendre_rule_fast.cpp)
+    add_library(
+      finufft STATIC
+      src/spreadinterp.cpp
+      src/utils.cpp
+      contrib/legendre_rule_fast.cpp
+      src/fft.cpp
+      src/finufft_core.cpp
+      src/simpleinterfaces.cpp
+      fortran/finufftfort.cpp)
   endif()
-  target_link_libraries(finufft PRIVATE finufft_f32 finufft_f64)
   set_finufft_options(finufft)
 
   if(WIN32 AND FINUFFT_SHARED_LINKING)
-    target_compile_definitions(finufft_f32 PRIVATE dll_EXPORTS FINUFFT_DLL)
-    target_compile_definitions(finufft_f64 PRIVATE dll_EXPORTS FINUFFT_DLL)
     target_compile_definitions(finufft PRIVATE dll_EXPORTS FINUFFT_DLL)
   endif()
   find_library(MATH_LIBRARY m)
@@ -342,12 +351,14 @@ if(FINUFFT_BUILD_PYTHON)
   add_subdirectory(python)
 endif()
 
-message(STATUS " CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}")
+# cmake-format: off
 message(STATUS "FINUFFT configuration summary:")
+message(STATUS "  CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}")
 message(STATUS "  FINUFFT_USE_CPU: ${FINUFFT_USE_CPU}")
 message(STATUS "  FINUFFT_USE_CUDA: ${FINUFFT_USE_CUDA}")
 message(STATUS "  FINUFFT_USE_OPENMP: ${FINUFFT_USE_OPENMP}")
 message(STATUS "  FINUFFT_STATIC_LINKING: ${FINUFFT_STATIC_LINKING}")
+message(STATUS "  FINUFFT_POSITION_INDEPENDENT_CODE: ${FINUFFT_POSITION_INDEPENDENT_CODE}")
 message(STATUS "  FINUFFT_ENABLE_INSTALL: ${FINUFFT_ENABLE_INSTALL}")
 message(STATUS "  FINUFFT_BUILD_EXAMPLES: ${FINUFFT_BUILD_EXAMPLES}")
 message(STATUS "  FINUFFT_BUILD_TESTS: ${FINUFFT_BUILD_TESTS}")
@@ -359,7 +370,7 @@ message(STATUS "  FINUFFT_FFTW_SUFFIX: ${FINUFFT_FFTW_SUFFIX}")
 message(STATUS "  FINUFFT_FFTW_LIBRARIES: ${FINUFFT_FFTW_LIBRARIES}")
 message(STATUS "  FINUFFT_ARCH_FLAGS: ${FINUFFT_ARCH_FLAGS}")
 message(STATUS "  FINUFFT_USE_DUCC0: ${FINUFFT_USE_DUCC0}")
-
+# cmake-format: on
 if(FINUFFT_ENABLE_INSTALL)
   include(GNUInstallDirs)
   install(TARGETS ${INSTALL_TARGETS} PUBLIC_HEADER)

diff --git a/LICENSE b/LICENSE
@@ -1,6 +1,6 @@
-Copyright (C) 2017-2023 The Simons Foundation, Inc. - All Rights Reserved.
+Copyright (C) 2017-2024 The Simons Foundation, Inc. - All Rights Reserved.
 
-Lead developer: Alex H. Barnett; see docs/ackn.rst for other contributors.
+See docs/ackn.rst for the list of code authors and contributors.
 
 ------
 
@@ -29,16 +29,22 @@ tutorial/utils/lgwt.m
 
 If you find this library useful, or it helps you in creating software
 or publications, please let us know, and acknowledge that fact by citing our
-repository:
+source repository:
 
   https://github.com/flatironinstitute/finufft
 
-and the corresponding journal articles (particularly the first):
+and the corresponding journal articles (particularly the first for the CPU
+and/or the last for the GPU):
 
   A parallel non-uniform fast Fourier transform library based on an
   ``exponential of semicircle'' kernel. A. H. Barnett, J. F. Magland,
   and L. af Klinteberg.  SIAM J. Sci. Comput. 41(5), C479-C504 (2019).
 
-  Aliasing error of the exp$(\beta \sqrt{1-z^2})$ kernel in the
+  Aliasing error of the $\exp (\beta \sqrt{1-z^2})$ kernel in the
   nonuniform fast Fourier transform. A. H. Barnett,
   Appl. Comput. Harmon. Anal. 51, 1-16 (2021).
+
+  cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs,
+  Yu-hsuan Shih, Garrett Wright, Joakim Andén, Johannes Blaschke, and
+  Alex H. Barnett. PDSEC2021 workshop of the IPDPS2021 conference.
+  https://arxiv.org/abs/2102.08463
diff --git a/README.md b/README.md
@@ -15,7 +15,7 @@ see `docs/ackn.rst` for full list of contributors.
 
 <img align="right" src="docs/spreadpic.png" width="400"/>
 
-This is a lightweight CPU library to compute the three standard types of nonuniform FFT to a specified precision, in one, two, or three dimensions. It is written in C++ with interfaces to C, Fortran, MATLAB/octave, Python, and (in a separate [repository](https://github.com/ludvigak/FINUFFT.jl)) Julia. It now also integrates the GPU CUDA library cuFINUFFT (which currently does all but type 3).
+This is a lightweight CPU library to compute the three standard types of nonuniform FFT to a specified precision, in one, two, or three dimensions. It is written in C++ with interfaces to C, Fortran, MATLAB/octave, Python, and (in a separate [repository](https://github.com/ludvigak/FINUFFT.jl)) Julia. It now also integrates the GPU CUDA library cuFINUFFT.
 
 Please see the [online documentation](http://finufft.readthedocs.io/en/latest/index.html) which can also be downloaded as a [PDF manual](https://finufft.readthedocs.io/_/downloads/en/latest/pdf/), and a [project overview](https://users.flatironinstitute.org/~ahb/notes/finufft-project-summary-2023.pdf).
 You will also want to see CPU example codes in the directories `examples`, `test`, `fortran`, `matlab/test`, `matlab/examples`, `python/finufft/test`, etc, and GPU examples in `examples/cuda`, `test/cuda`, etc.

diff --git a/cmake/setupDUCC.cmake b/cmake/setupDUCC.cmake
@@ -29,7 +29,7 @@ if(ducc0_ADDED)
   set_target_properties(
     ducc0
     PROPERTIES MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>"
-               POSITION_INDEPENDENT_CODE ${FINUFFT_SHARED_LINKING})
+               POSITION_INDEPENDENT_CODE ${FINUFFT_POSITION_INDEPENDENT_CODE})
   check_cxx_compiler_flag(-ffast-math HAS_FAST_MATH)
   if(HAS_FAST_MATH)
     target_compile_options(ducc0 PRIVATE -ffast-math)

diff --git a/cmake/setupFFTW.cmake b/cmake/setupFFTW.cmake
@@ -72,7 +72,8 @@ if(FINUFFT_FFTW_LIBRARIES STREQUAL DEFAULT OR FINUFFT_FFTW_LIBRARIES STREQUAL
       set_target_properties(
         ${element}
         PROPERTIES MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>"
-                   POSITION_INDEPENDENT_CODE ${FINUFFT_SHARED_LINKING})
+                   POSITION_INDEPENDENT_CODE
+                   ${FINUFFT_POSITION_INDEPENDENT_CODE})
     endforeach()
 
     target_include_directories(

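The CMake changes above introduce `FINUFFT_POSITION_INDEPENDENT_CODE` (default ON) and force it ON whenever `FINUFFT_SHARED_LINKING` is ON. The resolved value is then applied as the `POSITION_INDEPENDENT_CODE` property of `finufft`, `ducc0`, and the FFTW targets configured in `setupFFTW.cmake`. The Python sketch below models only how these options combine. It assumes `FINUFFT_SHARED_LINKING` stays at its `cmake_dependent_option` default, so it is simply the opposite of `FINUFFT_STATIC_LINKING`. `LinkConfig` and `resolve_link_options` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkConfig:
    static_linking: bool
    shared_linking: bool
    position_independent_code: bool


def resolve_link_options(static_linking: bool = True,
                         position_independent_code: bool = True) -> LinkConfig:
    """Mimic how the CMake options shown in this diff combine.

    Defaults match the option() declarations: static linking ON, PIC ON.
    Simplification: FINUFFT_SHARED_LINKING is assumed to stay at its
    cmake_dependent_option default, i.e. the opposite of static linking.
    """
    shared_linking = not static_linking
    # if(FINUFFT_SHARED_LINKING) set(FINUFFT_POSITION_INDEPENDENT_CODE ON)
    if shared_linking:
        position_independent_code = True
    return LinkConfig(static_linking, shared_linking, position_independent_code)


if __name__ == "__main__":
    print(resolve_link_options())                                # static, -fPIC on
    print(resolve_link_options(position_independent_code=False))  # static, -fPIC off
    print(resolve_link_options(static_linking=False,
                               position_independent_code=False))  # shared, forced on
```

With the defaults, a static build gets `-fPIC`, matching v2.2 (CHANGELOG entry #551). Passing `-DFINUFFT_POSITION_INDEPENDENT_CODE=OFF` only has an effect on static builds.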
diff --git a/devel/README b/devel/README
@@ -21,4 +21,7 @@ Tweaks should be done here, and see instructions there for resulting acc test.
 Another code that has to match ../src/spreadinterp.cpp is:
 reverse_engineer_tol.m
 
-Barnett 7/22/24
+Re measuring overall accuracy, to compare kernels, make matlab, and run:
+matlab/test/fig_accuracy.m
+
+Barnett 8/20/24
diff --git a/docs/c_gpu.rst b/docs/c_gpu.rst
@@ -1,3 +1,5 @@
+.. _c_gpu:
+
 C interface (GPU)
 =================
 
@@ -289,6 +291,8 @@ This deallocates all arrays inside the ``plan`` struct, freeing all internal mem
 Note: the plan (being just a pointer to the plan struct) is not actually "destroyed"; rather, its internal struct is destroyed.
 There is no need for further deallocation of the plan.
 
+.. _opts_gpu:
+
 Options for GPU code
 --------------------
 
@@ -311,11 +315,9 @@ while ``modeord=1`` selects FFT-style ordering starting at zero and wrapping ove
 
 **gpu_device_id**: Sets the GPU device ID. Leave at default unless you know what you're doing. [To be documented]
 
-Diagnostic options
-~~~~~~~~~~~~~~~~~~
+**gpu_spreadinterponly**: If ``0`` do the NUFFT as intended. If ``1``, omit the FFT and deconvolution (diagonal division by kernel Fourier transform) steps, which returns *garbage answers as a NUFFT*, but allows advanced users to perform an isolated spreading or interpolation using the usual type 1 or type 2 ``cufinufft`` interface. To do this, the nonzero flag value must be used *only* with ``upsampfac=1.0`` (since no upsampling takes place), and ``kerevalmeth=1``. The known use-case here is estimating so-called density compensation, conventionally used in MRI (see `MRI-NUFFT <https://mind-inria.github.io/mri-nufft/nufft.html>`_), although it might also be useful in spectral Ewald. Please note that this flag is also internally used by type 3 transforms (although it was originally a debug flag).
+
 
-**gpu_spreadinterponly**: if ``0`` do the NUFFT as intended. If ``1``, omit the FFT and kernel FT deconvolution steps and return garbage answers.
-Nonzero value is *only* to be used to aid timing tests (although currently there are no timing codes that exploit this option), and will give wrong or undefined answers for the NUFFT transforms!
 
 
 Algorithm performance options
@@ -326,7 +328,7 @@ Algorithm performance options
 * ``gpu_method=0`` : makes an automatic choice of one of the below methods, based on our heuristics.
 
 * ``gpu_method=1`` : uses a nonuniform points-driven method, either unsorted which is referred to as GM in our paper, or sorted which is called GM-sort in our paper, depending on option ``gpu_sort`` below
-  
+
 * ``gpu_method=2`` : for spreading only, ie, type 1 transforms, uses a shared memory output-block driven method, referred to as SM in our paper. Has no effect for interpolation (type 2 transforms)
 
 * ``gpu_method>2`` : (various unsupported experimental methods due to Melody Shih, not for regular users. E.g. ``3`` tests an idea of Paul Springer's to group NU points when spreading, ``4`` is a block gather method of possible interest.)
@@ -335,7 +337,7 @@ Algorithm performance options
 
 **gpu_kerevalmeth**: ``0`` use direct (reference) kernel evaluation, which is not recommended for speed (however, it allows nonstandard ``opts.upsampfac`` to be used). ``1`` use Horner piecewise polynomial evaluation (recommended, and enforces ``upsampfac=2.0``)
 
-**upsampfac**: set upsampling factor. For the recommended ``kerevalmeth=1`` you must choose the standard ``upsampfac=2.0``. If you are willing to risk a slower kernel evaluation, you may set any ``upsampfac>1.0``, but this is experimental and unsupported.
+**upsampfac**: set upsampling factor. For the recommended ``kerevalmeth=1`` you must choose the standard ``upsampfac=2.0``. If you are willing to risk a slower kernel evaluation, you may set any ``upsampfac>1.0``, but this is experimental and unsupported. Finally, ``upsampfac=1.0`` is an advanced GPU setting only to be paired with the "spread/interpolate only" mode triggered by setting ``gpu_spreadinterponly=1`` (see options above); do not use this unless you know what you are doing!
 
 **gpu_maxsubprobsize**: maximum number of NU points to be handled in a single subproblem in the spreading SM method (``gpu_method=2`` only)
 

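The option text above ties `gpu_spreadinterponly=1` to `upsampfac=1.0` and Horner evaluation (`kerevalmeth=1`, documented under `gpu_kerevalmeth`). An ordinary NUFFT with Horner evaluation instead needs `upsampfac=2.0`, and any other `upsampfac>1.0` is experimental with direct evaluation. Below is a minimal Python sketch that encodes these documented constraints as a pre-flight check. `check_gpu_opts` is a hypothetical helper, not part of cufinufft or its Python interface, and the library's own validation may differ.

```python
from typing import List


def check_gpu_opts(gpu_spreadinterponly: int, upsampfac: float,
                   gpu_kerevalmeth: int) -> List[str]:
    """Return problems found; an empty list means the combination agrees
    with the documented rules. Hypothetical helper, not a cufinufft API."""
    problems: List[str] = []
    if gpu_spreadinterponly not in (0, 1):
        problems.append("gpu_spreadinterponly must be 0 or 1")
    elif gpu_spreadinterponly == 1:
        # Spread/interpolate-only mode: no upsampling, Horner kernel evaluation.
        if upsampfac != 1.0:
            problems.append("gpu_spreadinterponly=1 requires upsampfac=1.0")
        if gpu_kerevalmeth != 1:
            problems.append("gpu_spreadinterponly=1 requires gpu_kerevalmeth=1")
    else:
        # Ordinary NUFFT: upsampfac=1.0 is reserved for spreadinterponly mode.
        if upsampfac <= 1.0:
            problems.append("an ordinary NUFFT needs upsampfac > 1.0")
        elif gpu_kerevalmeth == 1 and upsampfac != 2.0:
            problems.append("gpu_kerevalmeth=1 requires upsampfac=2.0")
    return problems


if __name__ == "__main__":
    # Settings for the density-compensation use-case described above.
    print(check_gpu_opts(gpu_spreadinterponly=1, upsampfac=1.0, gpu_kerevalmeth=1))
```

Collecting every problem instead of stopping at the first lets a caller report all inconsistent settings at once.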
diff --git a/docs/conf.py b/docs/conf.py
@@ -74,9 +74,9 @@
 # built documents.
 #
 # The short X.Y version.
-version = u'2.3-rc1'
+version = u'2.3'
 # The full version, including alpha/beta/rc tags.
-release = u'2.3.0-rc1'
+release = u'2.3.0'
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.