merge master branch
lu1and10 committed Oct 22, 2024
2 parents aa2be60 + 67214a3 commit 0618ec9
Showing 163 changed files with 7,097 additions and 4,360 deletions.
30 changes: 30 additions & 0 deletions .github/workflows/cmake_ci.yml
@@ -55,3 +55,33 @@ jobs:
working-directory: ./build
run: |
ctest -C ${{matrix.build_type}} --output-on-failure
- name: Set up Python
if: matrix.finufft_static_linking
uses: actions/setup-python@v5
with:
python-version: '3.10'

- name: Build Python wheels
if: matrix.finufft_static_linking
env:
MACOSX_DEPLOYMENT_TARGET: 13
shell: bash
run: |
python3 -m pip install \
--verbose \
-C cmake.define.CMAKE_BUILD_TYPE=${{ matrix.build_type }} \
-C cmake.define.FINUFFT_ARCH_FLAGS=${{ matrix.arch_flags }} \
-C cmake.define.FINUFFT_USE_DUCC0=${{ matrix.ducc_fft }} \
python/finufft
- name: Install pytest
if: matrix.finufft_static_linking
run: |
python3 -m pip install --upgrade pip
python3 -m pip install pytest
- name: Test Python package
if: matrix.finufft_static_linking
run: |
python3 -m pytest python/finufft/test
28 changes: 15 additions & 13 deletions .github/workflows/python_build_wheels.yml
@@ -18,15 +18,22 @@ jobs:
with:
package-dir: 'python/finufft'
env:
CIBW_BEFORE_ALL_MACOS: brew install gcc@14 fftw
CIBW_BEFORE_ALL_MACOS: |
# In order to reinstall a version of GCC compatible with older versions of macOS, we need to first uninstall the existing version.
brew uninstall gcc
pkg=$(brew fetch --force --bottle-tag=monterey gcc | grep 'Downloaded to.*monterey.*' | cut -d' ' -f3)
brew install $pkg
pkg=$(brew fetch --force --bottle-tag=monterey fftw | grep 'Downloaded to.*monterey.*' | cut -d' ' -f3)
brew install $pkg
CIBW_ARCHS_MACOS: "x86_64"
# Need following versions of GCC for compatibility with fftw
# installed by homebrew. Similarly, we set the macOS version
# for compatibility with those libraries.
CIBW_ENVIRONMENT_MACOS: >
CC=gcc-14
CXX=g++-14
MACOSX_DEPLOYMENT_TARGET=13
MACOSX_DEPLOYMENT_TARGET=12
- uses: actions/upload-artifact@v4
with:
@@ -46,18 +53,18 @@ jobs:
package-dir: 'python/finufft'
env:
CIBW_ARCHS_MACOS: "arm64"
# Make sure to install the ARM64-specific versions of FFTW and GCC.
# Perhaps this is done automatically on the macos-14 image. We should
# look into this further.
CIBW_BEFORE_ALL_MACOS: |
pkg=$(brew fetch --force --bottle-tag=arm64_ventura fftw | grep 'Downloaded to' | cut -d' ' -f3)
# In order to reinstall a version of GCC compatible with older versions of macOS, we need to first uninstall the existing version.
brew uninstall gcc
pkg=$(brew fetch --force --bottle-tag=arm64_monterey gcc | grep 'Downloaded to.*monterey.*' | cut -d' ' -f3)
brew install $pkg
pkg=$(brew fetch --force --bottle-tag=arm64_ventura gcc | grep 'Downloaded to' | cut -d' ' -f3)
pkg=$(brew fetch --force --bottle-tag=arm64_monterey fftw | grep 'Downloaded to.*monterey.*' | cut -d' ' -f3)
brew install $pkg
CIBW_ENVIRONMENT_MACOS: >
CC=gcc-14
CXX=g++-14
MACOSX_DEPLOYMENT_TARGET=14
MACOSX_DEPLOYMENT_TARGET=12
- uses: actions/upload-artifact@v4
with:
@@ -85,11 +92,6 @@ jobs:
uses: pypa/[email protected]
with:
package-dir: 'python/finufft'
env:
# This is required to force cmake to avoid using MSVC (the default).
# By setting the generator to Ninja, cmake will pick gcc (mingw64)
# as the compiler.
CIBW_CONFIG_SETTINGS: "cmake.args='-G Ninja'"

- uses: actions/upload-artifact@v4
with:
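
The macOS steps in this workflow pull the path of a fetched bottle out of `brew fetch` output with `grep` and `cut`. The following is a minimal sketch of that extraction, run on a made-up sample line rather than real Homebrew output. The function name `extract_bottle_path`, the cache path, and the GCC version are all hypothetical, and actual `brew fetch` output may differ between Homebrew versions.

```shell
#!/usr/bin/env bash
# Sketch: isolate the bottle path the same way the workflow's pipeline does.
# extract_bottle_path is a hypothetical helper, not part of the workflow.
extract_bottle_path() {
  # Keep only the "Downloaded to" line that mentions the requested bottle tag,
  # then take the third space-separated field (the file path).
  grep "Downloaded to.*$1.*" | cut -d' ' -f3
}

# Stand-in for `brew fetch` output; the path and version are made up.
sample_output='==> Fetching gcc
Downloaded to: /tmp/cache/gcc--14.2.0.monterey.bottle.tar.gz'

pkg=$(printf '%s\n' "$sample_output" | extract_bottle_path monterey)
echo "$pkg"
```

Because `cut -d' ' -f3` splits on single spaces, this approach presumably breaks if the cache path itself contains a space, which is worth keeping in mind if the pipeline is reused elsewhere.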
24 changes: 23 additions & 1 deletion CHANGELOG
@@ -1,7 +1,26 @@
List of features / changes made / release notes, in reverse chronological order.
If not stated, FINUFFT is assumed (cuFINUFFT <=1.3 is listed separately).

V 2.3.0-rc1 (8/6/24)
Master (10/8/24)

* Support and docs for opts.gpu_spreadinterponly=1 for MRI "density compensation
estimation" type 1&2 use-case with upsampfac=1.0 PR564 (Chaithya G R).
* reduced roundoff error in a[n] phase calc in CPU onedim_fseries_kernel().
PR534 (Barnett).
* GPU code type 1,2 also reduced round-off error in phases, to match CPU code;
rationalized onedim_{fseries,nuft}_* GPU codes to match CPU (Barbone, Barnett)
* Added type 3 in 1D, 2D, and 3D, in the GPU library cufinufft. PR #517, Barbone
- Removed the CPU fseries computation (used for benchmark, no longer needed)
- Added complex arithmetic support for cuda_complex type
- Added tests for type 3 in 1D, 2D, and 3D and cuda_complex arithmetic
- Minor fixes on the GPU code:
a) removed memory leaks in case of errors
b) renamed maxbatchsize to batchsize
* Add options for user-provided FFTW locker (PR548, Blackwell). These options
can be used to prevent crashes when a user is creating/destroying FFTW
plans and FINUFFT plans in threads simultaneously.

V 2.3.0 (9/5/24)

* Switched C++ standards from C++14 to C++17, allowing various templating
improvements (Barbone).
@@ -72,6 +91,9 @@ V 2.3.0-rc1 (8/6/24)
test/finufft?d_test.cpp to reduce CI fails due to random numbers on some
platforms in single-prec (with DUCC, etc). (Barnett PR516)
* fix GPU segfault due to stream deletion as pointer not value (Barbone PR520)
* new performance-tracking doc page comparing releases (Barbone) #527
* fix various Py 3.8 wheel and numpy distutils logging issues #549 #545
* Cmake option to control -fPIC in static build; default now ON (as v2.2) #551

V 2.2.0 (12/12/23)

49 changes: 30 additions & 19 deletions CMakeLists.txt
@@ -19,6 +19,7 @@ option(FINUFFT_USE_OPENMP "Whether to use OpenMP for parallelization. If disable
option(FINUFFT_USE_CPU "Whether to build the ordinary FINUFFT library (libfinufft)." ON)
option(FINUFFT_USE_CUDA "Whether to build CUDA accelerated FINUFFT library (libcufinufft). This is completely independent of the main FINUFFT library" OFF)
option(FINUFFT_STATIC_LINKING "If ON builds the static finufft library, if OFF build a shared finufft library." ON)
option(FINUFFT_POSITION_INDEPENDENT_CODE "Whether to build the finufft library with position independent code (-fPIC). This is forced ON when FINUFFT_SHARED_LINKING is ON." ON)
option(FINUFFT_BUILD_DEVEL "Whether to build development executables" OFF)
option(FINUFFT_BUILD_EXAMPLES "Whether to build the FINUFFT examples" OFF)
option(FINUFFT_BUILD_TESTS "Whether to build the FINUFFT tests" OFF)
@@ -37,6 +38,11 @@ cmake_dependent_option(FINUFFT_STATIC_LINKING "Disable static libraries in the c
cmake_dependent_option(FINUFFT_SHARED_LINKING "Shared should be the opposite of static linking" ON "NOT FINUFFT_STATIC_LINKING" OFF)
# cmake-format: on

# When building shared libraries, we need to build with -fPIC in all cases
if(FINUFFT_SHARED_LINKING)
set(FINUFFT_POSITION_INDEPENDENT_CODE ON)
endif()

include(cmake/utils.cmake)

set(FINUFFT_CXX_FLAGS_RELEASE
@@ -115,9 +121,7 @@ endif()

# This set of sources is compiled twice, once in single precision and once in
# double precision. The single precision compilation is done with -DSINGLE
set(FINUFFT_PRECISION_DEPENDENT_SOURCES
src/finufft.cpp src/fft.cpp src/simpleinterfaces.cpp src/spreadinterp.cpp
src/utils.cpp)
set(FINUFFT_PRECISION_DEPENDENT_SOURCES)

# If we're building for Fortran, make sure we also include the translation
# layer.
@@ -231,7 +235,7 @@ function(set_finufft_options target)
set_target_properties(
${target}
PROPERTIES MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>"
POSITION_INDEPENDENT_CODE ${FINUFFT_SHARED_LINKING})
POSITION_INDEPENDENT_CODE ${FINUFFT_POSITION_INDEPENDENT_CODE})
enable_asan(${target})
if(FINUFFT_USE_OPENMP)
target_link_libraries(${target} PRIVATE OpenMP::OpenMP_CXX)
@@ -246,25 +250,30 @@ endfunction()

if(FINUFFT_USE_CPU)
# Main finufft libraries
add_library(finufft_f32 OBJECT ${FINUFFT_PRECISION_DEPENDENT_SOURCES})
target_compile_definitions(finufft_f32 PRIVATE SINGLE)
set_finufft_options(finufft_f32)

add_library(finufft_f64 OBJECT ${FINUFFT_PRECISION_DEPENDENT_SOURCES})
set_finufft_options(finufft_f64)
if(NOT FINUFFT_STATIC_LINKING)
add_library(finufft SHARED src/utils_precindep.cpp
contrib/legendre_rule_fast.cpp)
add_library(
finufft SHARED
src/spreadinterp.cpp
src/utils.cpp
contrib/legendre_rule_fast.cpp
src/fft.cpp
src/finufft_core.cpp
src/simpleinterfaces.cpp
fortran/finufftfort.cpp)
else()
add_library(finufft STATIC src/utils_precindep.cpp
contrib/legendre_rule_fast.cpp)
add_library(
finufft STATIC
src/spreadinterp.cpp
src/utils.cpp
contrib/legendre_rule_fast.cpp
src/fft.cpp
src/finufft_core.cpp
src/simpleinterfaces.cpp
fortran/finufftfort.cpp)
endif()
target_link_libraries(finufft PRIVATE finufft_f32 finufft_f64)
set_finufft_options(finufft)

if(WIN32 AND FINUFFT_SHARED_LINKING)
target_compile_definitions(finufft_f32 PRIVATE dll_EXPORTS FINUFFT_DLL)
target_compile_definitions(finufft_f64 PRIVATE dll_EXPORTS FINUFFT_DLL)
target_compile_definitions(finufft PRIVATE dll_EXPORTS FINUFFT_DLL)
endif()
find_library(MATH_LIBRARY m)
@@ -342,12 +351,14 @@ if(FINUFFT_BUILD_PYTHON)
add_subdirectory(python)
endif()

message(STATUS " CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}")
# cmake-format: off
message(STATUS "FINUFFT configuration summary:")
message(STATUS " CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}")
message(STATUS " FINUFFT_USE_CPU: ${FINUFFT_USE_CPU}")
message(STATUS " FINUFFT_USE_CUDA: ${FINUFFT_USE_CUDA}")
message(STATUS " FINUFFT_USE_OPENMP: ${FINUFFT_USE_OPENMP}")
message(STATUS " FINUFFT_STATIC_LINKING: ${FINUFFT_STATIC_LINKING}")
message(STATUS " FINUFFT_POSITION_INDEPENDENT_CODE: ${FINUFFT_POSITION_INDEPENDENT_CODE}")
message(STATUS " FINUFFT_ENABLE_INSTALL: ${FINUFFT_ENABLE_INSTALL}")
message(STATUS " FINUFFT_BUILD_EXAMPLES: ${FINUFFT_BUILD_EXAMPLES}")
message(STATUS " FINUFFT_BUILD_TESTS: ${FINUFFT_BUILD_TESTS}")
@@ -359,7 +370,7 @@ message(STATUS " FINUFFT_FFTW_SUFFIX: ${FINUFFT_FFTW_SUFFIX}")
message(STATUS " FINUFFT_FFTW_LIBRARIES: ${FINUFFT_FFTW_LIBRARIES}")
message(STATUS " FINUFFT_ARCH_FLAGS: ${FINUFFT_ARCH_FLAGS}")
message(STATUS " FINUFFT_USE_DUCC0: ${FINUFFT_USE_DUCC0}")

# cmake-format: on
if(FINUFFT_ENABLE_INSTALL)
include(GNUInstallDirs)
install(TARGETS ${INSTALL_TARGETS} PUBLIC_HEADER)
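
The CMakeLists.txt changes in this commit decouple `-fPIC` from the link mode: `FINUFFT_POSITION_INDEPENDENT_CODE` defaults to ON and is forced ON whenever a shared library is built. The sketch below mirrors that resolution in shell, purely for illustration. The function `effective_pic` is a hypothetical name and is not part of the build system; it assumes, as the dependent option in the diff suggests, that shared linking is simply the opposite of `FINUFFT_STATIC_LINKING`.

```shell
#!/usr/bin/env bash
# Sketch: mirror how the CMake logic resolves position-independent code.
# Arguments: $1 = FINUFFT_STATIC_LINKING (ON/OFF),
#            $2 = requested FINUFFT_POSITION_INDEPENDENT_CODE (ON/OFF).
# effective_pic is a hypothetical helper, not part of the FINUFFT build.
effective_pic() {
  local static_linking=$1 requested_pic=$2
  if [ "$static_linking" = "OFF" ]; then
    # Shared builds always get -fPIC, whatever was requested.
    echo ON
  else
    # Static builds honor the user's choice (the option defaults to ON).
    echo "$requested_pic"
  fi
}

static_no_pic=$(effective_pic ON OFF)
shared_no_pic=$(effective_pic OFF OFF)
echo "static, PIC requested OFF -> $static_no_pic"
echo "shared, PIC requested OFF -> $shared_no_pic"
```

Under the same assumptions, a static build without `-fPIC` would be configured with something like `cmake -S . -B build -DFINUFFT_STATIC_LINKING=ON -DFINUFFT_POSITION_INDEPENDENT_CODE=OFF`.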
16 changes: 11 additions & 5 deletions LICENSE
@@ -1,6 +1,6 @@
Copyright (C) 2017-2023 The Simons Foundation, Inc. - All Rights Reserved.
Copyright (C) 2017-2024 The Simons Foundation, Inc. - All Rights Reserved.

Lead developer: Alex H. Barnett; see docs/ackn.rst for other contributors.
See docs/ackn.rst for the list of code authors and contributors.

------

@@ -29,16 +29,22 @@ tutorial/utils/lgwt.m

If you find this library useful, or it helps you in creating software
or publications, please let us know, and acknowledge that fact by citing our
repository:
source repository:

https://github.com/flatironinstitute/finufft

and the corresponding journal articles (particularly the first):
and the corresponding journal articles (particularly the first for the CPU
and/or the last for the GPU):

A parallel non-uniform fast Fourier transform library based on an
``exponential of semicircle'' kernel. A. H. Barnett, J. F. Magland,
and L. af Klinteberg. SIAM J. Sci. Comput. 41(5), C479-C504 (2019).

Aliasing error of the exp$(\beta \sqrt{1-z^2})$ kernel in the
Aliasing error of the $\exp (\beta \sqrt{1-z^2})$ kernel in the
nonuniform fast Fourier transform. A. H. Barnett,
Appl. Comput. Harmon. Anal. 51, 1-16 (2021).

cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs,
Yu-hsuan Shih, Garrett Wright, Joakim Andén, Johannes Blaschke, and
Alex H. Barnett. PDSEC2021 workshop of the IPDPS2021 conference.
https://arxiv.org/abs/2102.08463
2 changes: 1 addition & 1 deletion README.md
@@ -15,7 +15,7 @@ see `docs/ackn.rst` for full list of contributors.

<img align="right" src="docs/spreadpic.png" width="400"/>

This is a lightweight CPU library to compute the three standard types of nonuniform FFT to a specified precision, in one, two, or three dimensions. It is written in C++ with interfaces to C, Fortran, MATLAB/octave, Python, and (in a separate [repository](https://github.com/ludvigak/FINUFFT.jl)) Julia. It now also integrates the GPU CUDA library cuFINUFFT (which currently does all but type 3).
This is a lightweight CPU library to compute the three standard types of nonuniform FFT to a specified precision, in one, two, or three dimensions. It is written in C++ with interfaces to C, Fortran, MATLAB/octave, Python, and (in a separate [repository](https://github.com/ludvigak/FINUFFT.jl)) Julia. It now also integrates the GPU CUDA library cuFINUFFT.

Please see the [online documentation](http://finufft.readthedocs.io/en/latest/index.html) which can also be downloaded as a [PDF manual](https://finufft.readthedocs.io/_/downloads/en/latest/pdf/), and a [project overview](https://users.flatironinstitute.org/~ahb/notes/finufft-project-summary-2023.pdf).
You will also want to see CPU example codes in the directories `examples`, `test`, `fortran`, `matlab/test`, `matlab/examples`, `python/finufft/test`, etc, and GPU examples in `examples/cuda`, `test/cuda`, etc.
2 changes: 1 addition & 1 deletion cmake/setupDUCC.cmake
@@ -29,7 +29,7 @@ if(ducc0_ADDED)
set_target_properties(
ducc0
PROPERTIES MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>"
POSITION_INDEPENDENT_CODE ${FINUFFT_SHARED_LINKING})
POSITION_INDEPENDENT_CODE ${FINUFFT_POSITION_INDEPENDENT_CODE})
check_cxx_compiler_flag(-ffast-math HAS_FAST_MATH)
if(HAS_FAST_MATH)
target_compile_options(ducc0 PRIVATE -ffast-math)
3 changes: 2 additions & 1 deletion cmake/setupFFTW.cmake
@@ -72,7 +72,8 @@ if(FINUFFT_FFTW_LIBRARIES STREQUAL DEFAULT OR FINUFFT_FFTW_LIBRARIES STREQUAL
set_target_properties(
${element}
PROPERTIES MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>"
POSITION_INDEPENDENT_CODE ${FINUFFT_SHARED_LINKING})
POSITION_INDEPENDENT_CODE
${FINUFFT_POSITION_INDEPENDENT_CODE})
endforeach()

target_include_directories(
5 changes: 4 additions & 1 deletion devel/README
@@ -21,4 +21,7 @@ Tweaks should be done here, and see instructions there for resulting acc test.
Another code that has to match ../src/spreadinterp.cpp is:
reverse_engineer_tol.m

Barnett 7/22/24
Re measuring overall accuracy, to compare kernels, make matlab, and run:
matlab/test/fig_accuracy.m

Barnett 8/20/24
14 changes: 8 additions & 6 deletions docs/c_gpu.rst
@@ -1,3 +1,5 @@
.. _c_gpu:

C interface (GPU)
=================

@@ -289,6 +291,8 @@ This deallocates all arrays inside the ``plan`` struct, freeing all internal mem
Note: the plan (being just a pointer to the plan struct) is not actually "destroyed"; rather, its internal struct is destroyed.
There is no need for further deallocation of the plan.

.. _opts_gpu:

Options for GPU code
--------------------

@@ -311,11 +315,9 @@ while ``modeord=1`` selects FFT-style ordering starting at zero and wrapping ove

**gpu_device_id**: Sets the GPU device ID. Leave at default unless you know what you're doing. [To be documented]

Diagnostic options
~~~~~~~~~~~~~~~~~~
**gpu_spreadinterponly**: If ``0`` do the NUFFT as intended. If ``1``, omit the FFT and deconvolution (diagonal division by kernel Fourier transform) steps, which returns *garbage answers as a NUFFT*, but allows advanced users to perform an isolated spreading or interpolation using the usual type 1 or type 2 ``cufinufft`` interface. To do this, the nonzero flag value must be used *only* with ``upsampfac=1.0`` (since no upsampling takes place), and ``kerevalmeth=1``. The known use-case here is estimating so-called density compensation, conventionally used in MRI (see `MRI-NUFFT <https://mind-inria.github.io/mri-nufft/nufft.html>`_), although it might also be useful in spectral Ewald. Please note that this flag is also internally used by type 3 transforms (although it was originally a debug flag).


**gpu_spreadinterponly**: if ``0`` do the NUFFT as intended. If ``1``, omit the FFT and kernel FT deconvolution steps and return garbage answers.
Nonzero value is *only* to be used to aid timing tests (although currently there are no timing codes that exploit this option), and will give wrong or undefined answers for the NUFFT transforms!


Algorithm performance options
@@ -326,7 +328,7 @@ Algorithm performance options
* ``gpu_method=0`` : makes an automatic choice of one of the below methods, based on our heuristics.

* ``gpu_method=1`` : uses a nonuniform points-driven method, either unsorted which is referred to as GM in our paper, or sorted which is called GM-sort in our paper, depending on option ``gpu_sort`` below

* ``gpu_method=2`` : for spreading only, ie, type 1 transforms, uses a shared memory output-block driven method, referred to as SM in our paper. Has no effect for interpolation (type 2 transforms)

* ``gpu_method>2`` : (various unsupported experimental methods due to Melody Shih, not for regular users. Eg ``3`` tests an idea of Paul Springer's to group NU points when spreading, ``4`` is a block gather method of possible interest.)
@@ -335,7 +337,7 @@ Algorithm performance options

**gpu_kerevalmeth**: ``0`` use direct (reference) kernel evaluation, which is not recommended for speed (however, it allows nonstandard ``opts.upsampfac`` to be used). ``1`` use Horner piecewise polynomial evaluation (recommended, and enforces ``upsampfac=2.0``)

**upsampfac**: set upsampling factor. For the recommended ``kerevalmeth=1`` you must choose the standard ``upsampfac=2.0``. If you are willing to risk a slower kernel evaluation, you may set any ``upsampfac>1.0``, but this is experimental and unsupported.
**upsampfac**: set upsampling factor. For the recommended ``kerevalmeth=1`` you must choose the standard ``upsampfac=2.0``. If you are willing to risk a slower kernel evaluation, you may set any ``upsampfac>1.0``, but this is experimental and unsupported. Finally, ``upsampfac=1.0`` is an advanced GPU setting only to be paired with the "spread/interpolate only" mode triggered by setting ``gpu_spreadinterponly=1`` (see options above); do not use this unless you know what you are doing!

**gpu_maxsubprobsize**: maximum number of NU points to be handled in a single subproblem in the spreading SM method (``gpu_method=2`` only)

4 changes: 2 additions & 2 deletions docs/conf.py
@@ -74,9 +74,9 @@
# built documents.
#
# The short X.Y version.
version = u'2.3-rc1'
version = u'2.3'
# The full version, including alpha/beta/rc tags.
release = u'2.3.0-rc1'
release = u'2.3.0'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.