Skip to content

KHASHE for multithreaded insertions #8

@7PintsOfCherryGarcia

Description

@7PintsOfCherryGarcia

I will use KHASHE in order to insert char * keys concurrently.

Assuming N bits my current strategy is to load keys into 2**N different arrays using the lower N bits of the hashed keys to place them into each of the arrays.

//Skipped a lot of notation for (hopefully) better clarity
uint32_t **ensa;
char *key;
while (key = loadkeys()) {
    low = hash(key) & ( (1U << N) - 1 );    
    uint32_t *a = ensa[low];
    a[nextpos++] = key

I can then proceed to insert my keys into a KHASHE table from different threads where each thread works on each of the different arrays. All the keys from every array should map uniquely to the same hash table of the ensemble avoiding any race condition.

This seems a bit wasteful as the hash of every key is computed twice. Once for mapping keys to the initial arrays and again when inserting in the hash table any workaround to this?

Fibonacci hashing is used throughout khashl to protect against bad hashes. I would like to use it to get better distribution of my keys to the different arrays/hash tables of the ensemble. Would this be necessary if I use kh_hash_str()?

Upon further inspection of khashl.h I noticed that:

khashl/khashl.h

Line 313 in 715d5f1

if (*absent) ++g->count; \

Creates a race condition between my threads at insertion time.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions