GVRS C Performance

Gary Lucas edited this page Jul 30, 2024 · 20 revisions

Introduction

The GvrsReadPerformance program implements a set of tests to evaluate the performance of different aspects of the GVRS C API read operation. The text below shows timing results for the Row-Major test, which was run on a number of different platforms. The Row-Major test loops over the cells in a raster grid on a row-by-row basis, performing one data read operation for each cell. Row-Major is a particularly demanding test that exercises the full suite of functions for a GVRS read operation. As such, it provides a good benchmark for evaluating performance.

Performance Test

For this test, we used the ETOPO1 data set. ETOPO1 is a global-scale grid of elevation and ocean-depth values that contains over 233 million data cells. For Windows, the test was run on a medium-size laptop computer with a solid-state drive (SSD). For testing under Linux, we used virtual computer instances running on Amazon cloud-based services (Amazon EC2 instances). Amazon offers a free tier which supports one of their lower-end virtual computer configurations, the t2.micro. We also tested using some of their fee-based virtual computer configurations.

Data source:   ETOPO1_v1.0.4.gvrs   (uncompressed)
Raster size:   10800 x 21600 for 233280000 grid cells
Tile size:     120 x 180 for 10800 tiles
File size:     455.9 MiB
Data type:     2 byte integers
Operating System   Processor                    Memory    Time (sec)   Operations (Million/sec)
Windows            Intel i7-8750H 2.2 GHz       16 GiB    1.892        123.30
Amazon Linux       m6i.large (Intel) 2.9 GHz    8 GiB     1.026        227.37
Amazon Linux       c6a.large (AMD EPYC)         4 GiB     0.981        237.79
Amazon Linux       m6g.large (ARM 64-bit)       8 GiB     1.625        143.56
Amazon Linux       t2.micro (x86_64)            1.0 GiB   2.572        90.70
Debian Linux       t2.micro (x86_64)            1.0 GiB   2.559        91.15
Amazon Linux       t4g.nano (ARM 64-bit)        0.5 GiB   6.124        38.09

Amazon's m6i family of Intel(R) based processors attempts to strike a balance between memory and performance. The c6a family is optimized for computation. The GVRS C API does not implement multi-threading at this time, so it does not take advantage of multi-core processor capabilities.

Running on the cloud shows that the GVRS C API functions correctly across different variations of Linux and on different hardware with acceptable performance.

Comparison with original Java-based implementation

As a basis of comparison, we ran the original Java equivalent of the Row-Major test on the same Windows laptop that was used to obtain the results above.

Implementation Time (sec) Operations (Million/sec)
GVRS-C API 1.892 123.30
GVRS-Java 2.094 111.39

The current GVRS-C implementation is slightly faster than the original Java-based implementation, completing the test in about 10 percent less time, but the change is not dramatic.

Performance Bottlenecks

Profiling with the Linux gprof utility indicates that 84% of the processing time for the Row-Major test is spent in a single function, GvrsElementReadInt. The gprof figure excludes the time spent in functions that GvrsElementReadInt calls internally, so this large share of the processing time is contributed by the code in GvrsElementReadInt alone. That code is therefore a candidate for further investigation in seeking ways to improve processing throughput.

The ETOPO1 product used for this test was stored without data compression. For products stored with data compression, the GVRS-C API decompresses subsets of the source file on an as-needed basis. When it does, there is an additional processing cost for decompressing data. Performance issues related to data compression will be the topic of a future article. For now, we note that the Row-Major test requires 4.8 seconds to process the compressed version of ETOPO1.

Memory use

The amount of memory used by the GVRS C API is configurable. Software developers usually configure GVRS based on the needs of the application and the pattern-of-access for the data. The largest contributor is the management of tiles, which are pieces of the virtual raster that are swapped in and out of memory on an as-needed basis. Tiles are managed using a tile cache which is discussed in detail at our Gridfour Project Notes page. The Row-Major test requires a tile cache configured for its "large" size. The maximum memory use (including the cache and other components) for the Row-Major test was under 6 megabytes.

On the other hand, the default "medium" cache size is more than adequate for many applications. Cache size depends, in part, on the configuration of the input GVRS data file. For the ETOPO1 data set, the medium cache size requires only 0.4 megabytes.

Conclusion

One of the goals of the GVRS C software project is to provide an API that is frugal in its use of computer resources. To that end, we seek an efficient design and implementation. The fact that the code supports good throughput even on lower-end platforms suggests that the software does not consume excessive resources and that it meets its design goals.

In future work, we hope to conduct testing on dedicated Linux processors and also on single-board computer configurations such as the Raspberry Pi.