Implement Difference attack #34

Open
yoid2000 opened this issue Jan 28, 2019 · 4 comments

@yoid2000
Contributor

yoid2000 commented Jan 28, 2019

For this issue, implement the difference attack described in section 5.2.2 of the Extended Diffix paper (https://aircloak.com/wp-content/uploads/Complete-Diffix.pdf). The criterion is singling out. You can see an example of a singling out attack at https://github.com/gda-score/code/blob/master/attacks/dumbList_SingOut.py.

This attack has two parts.

  1. The attacker must find a user that can be isolated.
  2. The attacker must make the set of queries that isolates the user.

To isolate a user, the attacker must find two queries where the counts of distinct users differ by exactly 1. The easiest way to do that is with a not-equals condition (<>). What we do is find a column that is likely to have many user-unique values. This could be any column where the number of distinct values is, say, 50% or more of the number of distinct users. The uid column is always such a column, and so is lastname.

You can use the function getTableCharacteristics() to determine which columns apply.

https://gda-score.github.io/gdaScore.m.html#gdaScore.gdaAttack.getTableCharacteristics
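As a rough sketch of the column selection, assuming the 50% rule means "distinct values divided by distinct users is at least 0.5": the function below works on a plain dict of per-column distinct-value counts. The dict layout, the function name, and the sample counts are made up for illustration; they are not the actual return format of getTableCharacteristics().

```python
def candidate_iso_columns(distinct_values, distinct_users, threshold=0.5):
    """Return the columns whose number of distinct values is at least
    `threshold` times the number of distinct users.

    distinct_values: dict mapping column name -> number of distinct values
                     (hypothetical layout, not the gdaScore format).
    """
    if distinct_users <= 0:
        return []
    return sorted(
        col
        for col, n_values in distinct_values.items()
        if n_values / distinct_users >= threshold
    )


# Hypothetical counts, for illustration only
counts = {"uid": 5369, "lastname": 4100, "gender": 2}
print(candidate_iso_columns(counts, distinct_users=5369))  # prints ['lastname', 'uid']
```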

After selecting one such column, do an askKnowledge() query to get the contents of that column. Then select values in the column for which there is only one user. Call these col_iso and val_iso.
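Picking the single-user values could look roughly like this. The sketch assumes the knowledge answer has already been reduced to (uid, value) pairs; the real reply format is not shown here, and the function name is made up.

```python
from collections import defaultdict


def single_user_values(rows):
    """Return the values that belong to exactly one distinct uid.

    rows: iterable of (uid, value) pairs (assumed shape).
    """
    users_per_value = defaultdict(set)
    for uid, value in rows:
        users_per_value[value].add(uid)
    # Keep only values seen for a single user; these are the val_iso candidates.
    return [value for value, uids in users_per_value.items() if len(uids) == 1]
```

When col_iso is the uid column itself, every value qualifies, since each uid identifies one user.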

For every other column (other than col_iso), we make two queries using askAttack(). One query looks like this:

select col_other, count(distinct uid)
from table
where col_iso <> val_iso
group by 1

And the other query like this:

select col_other, count(distinct uid)
from table
group by 1
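A small helper that builds this pair of queries might look as follows. It is only a sketch: the helper names are made up, and table and column names are assumed to be trusted identifiers, so only the isolating value is quoted.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (numbers bare, everything else quoted)."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def difference_queries(table, col_other, col_iso, val_iso, uid="uid"):
    """Return (query_without_victim, query_with_victim) for one attack pair."""
    without_victim = (
        f"select {col_other}, count(distinct {uid}) from {table} "
        f"where {col_iso} <> {sql_literal(val_iso)} group by 1"
    )
    with_victim = (
        f"select {col_other}, count(distinct {uid}) from {table} group by 1"
    )
    return without_victim, with_victim
```

For example, difference_queries("accounts", "ssn", "uid", 2848) gives one query with the condition uid <> 2848 and one without any condition.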

We are looking for the col_other value where the user (the victim) is not in the first query but is in the second. We'll assume that the value where the difference between the second count and the first count is largest will be that value.
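The comparison of the two answers can be sketched like this, assuming each answer is a list of [value, count] rows. Skipping values that are missing from either answer is a choice made for this sketch, not something required by the attack description.

```python
def guess_victim_value(excluded_rows, full_rows):
    """Return (value, difference) for the col_other value whose count grows
    the most from the query that excludes the victim to the query that does not.

    excluded_rows: rows from the query with the `<>` condition.
    full_rows:     rows from the query without any condition.
    Returns (None, None) if the two answers share no values.
    """
    excluded = {value: count for value, count in excluded_rows}
    best_value, best_diff = None, None
    for value, count in full_rows:
        if value not in excluded:
            continue  # sketch choice: ignore values absent from one answer
        diff = count - excluded[value]
        if best_diff is None or diff > best_diff:
            best_value, best_diff = value, diff
    return best_value, best_diff
```

Because the counts are noisy, the largest difference is not guaranteed to be exactly 1, which is why the value is treated as a guess.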

Then we make a claim using askClaim() based on this value.

For each col_other make 20 attack pairs (i.e. use 20 different val_iso values) and 20 corresponding claims.
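The overall loop could be organised roughly as below. Both callables are hypothetical stand-ins: attack_once would run one query pair and return the guessed col_other value (or None), and claim would wrap the actual claim call. Skipping the claim when there is no guess is a choice of this sketch.

```python
from itertools import islice


def run_difference_attack(col_others, iso_values, attack_once, claim,
                          pairs_per_column=20):
    """Run up to `pairs_per_column` attack pairs per col_other and claim each guess.

    col_others:  columns to attack (every column except col_iso).
    iso_values:  list of val_iso candidates (values with a single user).
    attack_once: callable(col_other, val_iso) -> guessed value or None (hypothetical).
    claim:       callable(col_other, val_iso, guess) -> None (hypothetical).
    Returns the number of claims made.
    """
    n_claims = 0
    for col_other in col_others:
        # Use a different val_iso for each attack pair on this column.
        for val_iso in islice(iso_values, pairs_per_column):
            guess = attack_once(col_other, val_iso)
            if guess is None:
                continue
            claim(col_other, val_iso, guess)
            n_claims += 1
    return n_claims
```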

@yoid2000
Contributor Author

yoid2000 commented Feb 6, 2019

I've written a short article explaining how to write an attack. It is here: https://www.gda-score.org/quick-guide-to-writing-attacks/

@resha1417

Hello sir,

I got some results while running the attack, and I want to check that my interpretation of them is correct.
When I attack the ssn column, I get results like this:
For the 1st query: SELECT ssn, count(DISTINCT uid) FROM accounts GROUP BY ssn
Result: [['*', 5369]]
For the 2nd query: SELECT ssn, count(DISTINCT uid) FROM accounts WHERE uid<>2848 GROUP BY ssn
Result: [['*', 5364]]

As we discussed, I make a claim for a column when query 1 and query 2 return the same number of pairs. Here I get 1 pair for both queries, but because the column is fully anonymized, I cannot tell which ssn uid=2848 belongs to: the results give a maximum difference of 5, but instead of an ssn value there is only *. So should I also make claims for these kinds of (fully anonymized) columns?

Regards,
Resha

@yoid2000
Contributor Author

yoid2000 commented Apr 23, 2019 via email

@resha1417

resha1417 commented Apr 23, 2019 via email
