
Scalability with number of IAA engines #51

Open
@liam0215

Description


I ran some benchmarks dispatching jobs to the IAA through a single work queue on 8 cores, sweeping the number of engines assigned to that work queue from 1 to 8. Measured with pcm_accel, inbound IAA throughput was ~1.9 GB/s with one engine and 3.8 GB/s with two, then stayed flat at 3.8 GB/s as I added more engines.

I tested input sizes up to 512 MB and up to 32 threads. Those are only the maximums I used for the benchmark; the plateau already appeared at much lower thread counts, input sizes, and core counts.

Is this expected? My impression from some recent papers on Intel's on-chip accelerators is that the maximum throughput should be closer to 30 GB/s.
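To put the plateau in numbers, here is a small Python sketch that treats the measured single-engine rate as a per-engine baseline and compares each engine count against ideal linear scaling. It only restates the pcm_accel figures reported above; the linear-scaling baseline is an assumption for comparison, not a claim about how the IAA hardware is supposed to behave.

```python
# Inbound IAA throughput (GB/s, measured with pcm_accel) per engine count,
# as reported above: 1.9 GB/s with one engine, 3.8 GB/s with two,
# and flat at 3.8 GB/s from three through eight engines.
observed = {1: 1.9, 2: 3.8}
observed.update({n: 3.8 for n in range(3, 9)})

# Assumption: the single-engine rate is used as the ideal per-engine rate.
PER_ENGINE = observed[1]


def scaling_efficiency(n_engines: int) -> float:
    """Observed throughput as a fraction of ideal linear scaling from one engine."""
    ideal = n_engines * PER_ENGINE
    return observed[n_engines] / ideal


for n in sorted(observed):
    ideal = n * PER_ENGINE
    print(f"{n} engines: observed {observed[n]:.1f} GB/s, "
          f"ideal {ideal:.1f} GB/s, efficiency {scaling_efficiency(n):.0%}")

# Ceiling if all 8 engines scaled perfectly at the single-engine rate.
print(f"8-engine linear ceiling: {8 * PER_ENGINE:.1f} GB/s")
```

By this arithmetic, efficiency drops to 25% at 8 engines, and even perfect linear scaling at ~1.9 GB/s per engine would reach only about 15.2 GB/s, roughly half of the ~30 GB/s figure.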
