Bad performance of views/slicing #2734

Open
@razorx89

Hi,

I am still getting used to the library, but was able to isolate an unexpected performance hit. I want to update just a subregion of a pre-allocated 1D tensor. Maybe there is a better pattern to achieve the same result?

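#include <chrono>
#include <cstddef>
#include <iostream>  // std::cout, std::endl
#include <xtensor/xrandom.hpp>
#include <xtensor/xtensor.hpp>
#include <xtensor/xview.hpp>  // xt::view, xt::all, xt::range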

double mean_milliseconds_from_total(std::chrono::nanoseconds total,
                                    size_t num_repeats) {
  std::chrono::duration<double, std::milli> total_ms = total;
  return total_ms.count() / (double)num_repeats;
}

int main() {
  size_t num_repeats = 100;
  xt::xtensor<double, 1> a = xt::random::rand<double>({10000000});
  xt::xtensor<double, 1> b = xt::random::rand<double>({10000000});
  xt::xtensor<double, 1> c = xt::zeros<double>({10000000});

  // case 1: full tensor
  auto started = std::chrono::high_resolution_clock::now();
  for (size_t i = 0; i < num_repeats; ++i)
    c = a + b;
  auto finished = std::chrono::high_resolution_clock::now();
  std::cout << "elapsed time: "
            << mean_milliseconds_from_total(finished - started, num_repeats)
            << "ms" << std::endl;

  // case 2: view of tensor with xt::all()
  started = std::chrono::high_resolution_clock::now();
  for (size_t i = 0; i < num_repeats; ++i)
    xt::view(c, xt::all()) = xt::view(a + b, xt::all());
  finished = std::chrono::high_resolution_clock::now();
  std::cout << "elapsed time: "
            << mean_milliseconds_from_total(finished - started, num_repeats)
            << "ms" << std::endl;

  // case 3: view of tensor with xt::range()
  started = std::chrono::high_resolution_clock::now();
  for (size_t i = 0; i < num_repeats; ++i)
    xt::view(c, xt::range(0, c.size())) =
        xt::view(a + b, xt::range(0, c.size()));
  finished = std::chrono::high_resolution_clock::now();
  std::cout << "elapsed time: "
            << mean_milliseconds_from_total(finished - started, num_repeats)
            << "ms" << std::endl;
  return 0;
}

Result:

elapsed time: 8.00238ms
elapsed time: 31.0913ms
elapsed time: 30.9484ms

I understand that views may carry some overhead, but all three cases do essentially the same work (same memory layout, contiguous memory, same range, step size of one), and the view versions are nearly 4x slower (about 31 ms versus 8 ms per iteration). Is this expected behavior, or am I doing something wrong?

Thanks.

Versions:

  • xtl v0.7.5
  • xtensor v0.24.7
  • Apple clang version 14.0.3 (clang-1403.0.22.14.1)
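
As a reference point for the subregion update described above, here is a minimal standard-library sketch of the same operation on plain buffers. It is not an xtensor API and says nothing about how xtensor implements views. The helper name `add_into_subrange` and the use of `std::vector<double>` are assumptions made for illustration only. Timing something like this on the same sizes might help separate the cost of the arithmetic itself from any overhead added by the view machinery.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of xtensor): writes a[i] + b[i] into c[i]
// for every i in the half-open range [first, last) and leaves the rest of c
// untouched. All three buffers are assumed to be contiguous.
void add_into_subrange(const std::vector<double>& a,
                       const std::vector<double>& b,
                       std::vector<double>& c,
                       std::size_t first,
                       std::size_t last) {
  // Reject ranges that are reversed or that run past any of the buffers.
  if (first > last || last > a.size() || last > b.size() || last > c.size()) {
    throw std::out_of_range("add_into_subrange: invalid range");
  }
  const auto lo = static_cast<std::ptrdiff_t>(first);
  const auto hi = static_cast<std::ptrdiff_t>(last);
  // Element-wise sum over the subrange, written in place into c.
  std::transform(a.begin() + lo, a.begin() + hi, b.begin() + lo,
                 c.begin() + lo, std::plus<double>());
}
```

Calling `add_into_subrange(a, b, c, 0, c.size())` would correspond to case 3 in the benchmark, and a narrower range would update only part of the pre-allocated output.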
