Scalability with number of IAA engines

I ran some benchmarks dispatching jobs to the IAA with a single work queue and 8 cores, sweeping the number of engines assigned to the work queue from 1 to 8. With one engine I measured an inbound throughput on the IAA (checked using pcm_accel) of ~1.9 GB/s, with 2 engines 3.8 GB/s, and then it stayed flat at 3.8 GB/s as I added more engines. I tested with input sizes up to 512 MB and with up to 32 threads. The plateau already appears at thread counts, input sizes, and core counts well below these maximums; the values above are just the largest I used in the benchmark. Is this expected? My impression from some recent papers on Intel's on-chip accelerators is that the maximum throughput should be closer to 30 GB/s.
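To make the scaling gap concrete, here is a minimal Python sketch that compares the figures quoted above against ideal linear scaling from the single-engine result. The constant and function names are made up for illustration, and the `min()` model only restates the reported behaviour (about 1.9 GB/s per engine, flat at 3.8 GB/s from 2 engines onward); it is not a model of the hardware.

```python
# Approximate throughput figures quoted in the post (GB/s, read from pcm_accel).
SINGLE_ENGINE_GBPS = 1.9   # measured with 1 engine
PLATEAU_GBPS = 3.8         # measured with 2 engines, flat up to 8


def ideal_gbps(engines):
    """Throughput if every added engine contributed the single-engine rate."""
    return engines * SINGLE_ENGINE_GBPS


def observed_gbps(engines):
    """Reported behaviour: linear up to the plateau, then flat."""
    return min(ideal_gbps(engines), PLATEAU_GBPS)


def scaling_efficiency(engines):
    """Observed throughput as a fraction of ideal linear scaling."""
    return observed_gbps(engines) / ideal_gbps(engines)


if __name__ == "__main__":
    for n in range(1, 9):
        print(f"{n} engines: observed {observed_gbps(n):.1f} GB/s, "
              f"ideal {ideal_gbps(n):.1f} GB/s, "
              f"efficiency {scaling_efficiency(n):.0%}")
```

Under these numbers, even perfect linear scaling across 8 engines would give roughly 15 GB/s, so the question covers both the plateau at 2 engines and the distance from the ~30 GB/s figure.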

Scalability with number of IAA engines #51
