Check for duplicates #48

Open
wants to merge 1 commit into
base: master

antoine-de
Contributor

This PR aims to close #44 (and follows Qwant#26).

Add an option to check for duplicates: --check-duplicates=10

This will run geocoder tester as always, and for each query, after the tests on the expected fields, we'll check that no two objects among the first n results of the response are duplicates.

If the option is not given, everything should run as usual.

Two objects count as duplicates when the user can't tell them apart, so we implemented something quite specific to Qwant's display of the autocomplete response:

  • for a POI, we consider the object's label and its address
  • for the other objects, only the label

For the moment this mechanism is quite hardcoded in get_label_for_duplicates; I'm completely open to suggestions if you see a more generic way to do this.
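
To make the rule above concrete, here is a minimal sketch of the comparison. It is not the code from this PR: `duplicate_key` and `find_duplicates` are hypothetical names, results are assumed to be flat dictionaries with `label`, `type` and `addr` keys, and the key tuple for a POI is modelled on the entries shown in the error log. The third result is made-up sample data.

```python
from collections import defaultdict


def duplicate_key(props):
    """Return what the user would see for one result (hypothetical helper)."""
    label = props.get("label")
    if props.get("type") == "poi":
        # A POI is displayed with its address, so both must match.
        return (label, "poi", props.get("addr"))
    # Any other object is told apart by its label only.
    return (label,)


def find_duplicates(results, limit=None):
    """Group the first `limit` results by key; keep groups with several entries.

    `limit=None` stands for the option being absent: nothing is checked.
    """
    if limit is None:
        return {}
    groups = defaultdict(list)
    for props in results[:limit]:
        groups[duplicate_key(props)].append(props)
    return {key: items for key, items in groups.items() if len(items) > 1}


results = [
    {"label": "Reuilly (Indre) (Reuilly)", "type": "poi",
     "id": "osm:node:1854248363", "addr": "Sentier des Tournelles (Reuilly)"},
    {"label": "Reuilly (Indre) (Reuilly)", "type": "poi",
     "id": "osm:node:4498318505", "addr": "Sentier des Tournelles (Reuilly)"},
    # Made-up non-POI entry, only here to show the label-only branch.
    {"label": "Indre (France)", "type": "zone", "id": "example:zone:1"},
]

duplicates = find_duplicates(results, limit=10)
for key, items in duplicates.items():
    print(key, [item["id"] for item in items])
```

A more generic variant could take the key function as a parameter, so that each geocoder front end supplies its own notion of "what the user sees".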

The error log will be formatted like:

Duplicates found in the response
# Search was: indre
## Entry ('Reuilly (Indre) (Reuilly)', 'poi', 'Sentier des Tournelles (Reuilly)') has been found for:
           label           |         id          | type | osm_id | housenumber | street | postcode |  city   | country |        lat        |        lon         |               addr               | poi_types 
———————————————————————————|—————————————————————|——————|————————|—————————————|————————|——————————|—————————|—————————|———————————————————|————————————————————|——————————————————————————————————|———————————
 Reuilly (Indre) (Reuilly) | osm:node:1854248363 | poi  |   _    |      _      |   _    |  36260   | Reuilly |    _    | 47.08530172468403 | 2.0474608578328177 | Sentier des Tournelles (Reuilly) |  railway  
 Reuilly (Indre) (Reuilly) | osm:node:4498318505 | poi  |   _    |      _      |   _    |  36260   | Reuilly |    _    | 47.08529686318019 | 2.047508718499927  | Sentier des Tournelles (Reuilly) |  railway  

## Entry ('Indre Oslofjord (Oslo)', 'poi', 'Tøyengata (Oslo)') has been found for:
         label          |        id         | type | osm_id | housenumber | street | postcode | city | country |        lat        |        lon         |       addr       | poi_types 
————————————————————————|———————————————————|——————|————————|—————————————|————————|——————————|——————|—————————|———————————————————|————————————————————|——————————————————|———————————
 Indre Oslofjord (Oslo) | osm:way:233882196 | poi  |   _    |      _      |   _    |    _     | Oslo |    _    | 59.91907628783925 | 10.771447863393677 | Tøyengata (Oslo) |  garden   
 Indre Oslofjord (Oslo) | osm:way:233882197 | poi  |   _    |      _      |   _    |    _     | Oslo |    _    | 59.91908412491954 | 10.771563565673642 | Tøyengata (Oslo) |  garden   

Add an option to check for duplicates: `--check-duplicates=10`

This will run geocoder tester as always, and for each query, after the
tests on the expected fields, we'll check that no two objects among the
first `n` results of the response are duplicates.

If the option is not given, everything should run as usual.

Two objects count as duplicates when the user can't tell them apart,
so we implemented something quite specific to Qwant's display of the
autocomplete response:

* for a POI, we consider the object's label and its address
* for the other objects, only the label

For the moment this mechanism is quite hardcoded in
get_label_for_dupplicates; I'm completely open to suggestions if you
see a more generic way to do this.

The error log will be formatted like:

```
______________________________________________________________ Search: centre médico-psychologique 12, rue de cuire 69004 lyon ______________________________________________________________

Duplicates found in the response
        label        |    id    | type | osm_id | housenumber |    street    | postcode | city | country |    lat     |    lon    | addr | poi_types
—————————————————————|——————————|——————|————————|—————————————|——————————————|——————————|——————|—————————|————————————|———————————|——————|———————————
 Rue de Cuire (Lyon) | 22597275 |street|   _    |      _      | Rue de Cuire |  69004   | Lyon |    _    | 45.7754093 | 4.8307156 |  _   |     _
 Rue de Cuire (Lyon) | 22597282 |street|   _    |      _      | Rue de Cuire |  69004   | Lyon |    _    | 45.7754093 | 4.8307156 |  _   |     _

             label             |           id            | type | osm_id | housenumber |       street        | postcode | city | country |    lat    |   lon    | addr | poi_types
———————————————————————————————|—————————————————————————|——————|————————|—————————————|—————————————————————|——————————|——————|—————————|———————————|——————————|——————|———————————
 12 Rue des Cuirassiers (Lyon) | addr:4.856194;45.758659 |house |   _    |     12      | Rue des Cuirassiers |  69003   | Lyon |    _    | 45.758659 | 4.856194 |  _   |     _
 12 Rue des Cuirassiers (Lyon) | addr:4.856503;45.758678 |house |   _    |     12      | Rue des Cuirassiers |  69003   | Lyon |    _    | 45.758678 | 4.856503 |  _   |     _

```
Contributor

nlehuby commented Oct 8, 2018

Any chance we can have this PR reviewed soon? 🙏

Successfully merging this pull request may close these issues.

Test for duplicates
2 participants