OGC-508: Replace Elasticsearch by Postgres v3 #1559
base: master
Conversation
…o hybrid properties)
…c documents in python rather than psql
src/onegov/org/models/search.py
Outdated
func.setweight(
    func.to_tsvector(
        language,
        getattr(model.fts_idx_data, field, '')),
The weighted vector is based on the static data from the fts_idx_data column, generated upon update or reindex events. With this approach no additional …
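For reference, a hypothetical raw-SQL equivalent of the weighted-vector expression above. The field name `title`, the weight label `A`, the language `german`, and the `->>` JSON access are illustrative assumptions, not taken from the diff:

```sql
-- Sketch only: roughly what the SQLAlchemy expression compiles to.
-- 'title' stands in for one indexed field; 'A' is the highest of the
-- four Postgres weight labels (A > B > C > D).
SELECT setweight(
           to_tsvector('german', coalesce(fts_idx_data->>'title', '')),
           'A'
       ) AS weighted_vector
FROM some_model;
```

Because the vector is built from the static `fts_idx_data` snapshot, the weight assignment only changes when the row is reindexed, not on every query.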
@Daverball Final review for postgres searching on separate views
Force-pushed from a95eef6 to 0eef94f (Compare)
It looks fairly close to a first version we can deploy; however, there are some engineering decisions that don't make sense to me and harm performance significantly, so I would like you to revisit those problem areas.
else:
    results = self.generic_search()

return results[self.offset:self.offset + self.batch_size]
It's not ideal that we always retrieve all the results and then filter them. But I realize it may be difficult to do all the filtering and sorting in pure Postgres, and we'd still have to retrieve a full count of all the entries, so we're not saving as much in query time as we would in object translation overhead. The latter, however, may be significantly larger than the former for large result sets.
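One way to avoid materializing the full result set in Python is to let Postgres return both the page and the total count in a single query, via a window function. This is a sketch under assumed names (`some_model`, `fts_idx`, language `german`); `websearch_to_tsquery` requires Postgres 11+:

```sql
-- Sketch: fetch one page plus the total hit count in one round trip,
-- so Python never has to load and slice every matching row.
SELECT id,
       ts_rank_cd(fts_idx, query) AS rank,
       count(*) OVER ()           AS total_hits  -- window count over all matches
FROM   some_model,
       websearch_to_tsquery('german', 'test') AS query
WHERE  fts_idx @@ query
ORDER  BY rank DESC
LIMIT  10 OFFSET 0;
```

Every returned row carries the same `total_hits`, so the pagination count comes for free with the first page.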
Fix typo Co-authored-by: David Salvisberg <[email protected]>
Remove unnecessary `all()` call Co-authored-by: David Salvisberg <[email protected]>
This is what I was hinting at, you need to do this in two steps in separate locations, you can't do it in one function.
Filter polymorphic query by polymorphic identity for Searchable models Co-authored-by: David Salvisberg <[email protected]>
Rework base model filter Co-authored-by: David Salvisberg <[email protected]>
@Daverball Could you please check my latest changes?
The indexer looks a lot better now. But I think we still have fundamental problems with the actual search. It's time to consider alternative architectures, such as a single shared table for the search metadata, so we can do all the counting, filtering and slicing of the entries in the database in a single query, and only have to go out and fetch the models we're actually displaying results for on the current page.
This should mean we now only have one potentially expensive fts query, with the rest turning into simple "Fetch these primary keys from table X and these others from table Y" queries that should be very fast.
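A minimal sketch of what that shared search table could look like. All names here are illustrative, not from the PR:

```sql
-- Sketch of a shared search-metadata table; every searchable model
-- writes one row here at index time.
CREATE TABLE search_index (
    id          bigserial   PRIMARY KEY,
    owner_type  text        NOT NULL,  -- e.g. the polymorphic identity / table name
    owner_id    text        NOT NULL,  -- primary key of the indexed model
    fts_idx     tsvector    NOT NULL,  -- precomputed weighted vector
    last_change timestamptz,
    UNIQUE (owner_type, owner_id)
);

CREATE INDEX search_index_fts ON search_index USING gin (fts_idx);
```

With this layout the expensive fts query runs once against `search_index`, and the follow-up per-table queries reduce to primary-key lookups, which the planner serves from the tables' existing indexes.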
@cached_property
def available_documents(self) -> int:
    if not self.number_of_docs:
        _ = self.load_batch_results
    return self.number_of_docs

@cached_property
def available_results(self) -> int:
    if not self.number_of_results:
        _ = self.load_batch_results
    return self.number_of_results
I still don't know what this means; this should be one number, and it should be the same regardless. If there's a difference, there's a bug.
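Collapsing the two properties into a single cached count would remove the ambiguity. A minimal sketch (the `SearchResults` class and its `_hits` attribute are hypothetical stand-ins, not names from the PR):

```python
from functools import cached_property


class SearchResults:
    """Hypothetical sketch: one count, computed once, served from cache."""

    def __init__(self, hits: list) -> None:
        self._hits = hits

    @cached_property
    def available_results(self) -> int:
        # Single source of truth: "documents" and "results" are the
        # same number, so only one property needs to exist.
        return len(self._hits)


results = SearchResults(['a', 'b', 'c'])
print(results.available_results)  # -> 3
```

`cached_property` also removes the `if not self.number_of_results:` dance, since the first access computes and stores the value.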
decay_rank = (
    func.ts_rank_cd(model.fts_idx, ts_query, 0) *
    cast(func.pow(0.9,
        func.extract('epoch',
            func.now() - func.coalesce(
                cast(
                    model.fts_idx_data[
                        'es_last_change'].astext,
                    DateTime),
                func.now())
        ) / 86400),
        Numeric)
).label('rank')
We may want to add an index for this expression (although I'm not sure how well database indexes hold up for time-based expressions), since I suspect that this is quite slow to compute, especially since it relies on JSON data and string-to-date conversions. The other thing you can do to speed this up would be to store the epoch, rather than a string timestamp, in es_last_change, so you don't need any data type conversions.
The other thing I'm not sure about is whether this will always do the right thing: since we store tz-naive dates in the database, NOW() will return a value in the database's default timezone, which could be configured to something other than UTC, which makes this a bit fragile.
It will work for us right now, but I'd prefer something more portable and robust.
Maybe it would be even better to calculate this rank during indexing and then have a daily/weekly/monthly cronjob to re-calculate the search rank. I think this would be more than precise enough, since new entries and recently changed entries will all bunch up with the same high search rank.
This way we can also encode things like the custom event sorting into this rank and have to do less work after we get our results.
It honestly might be best to define a new table for searching at this point that contains all the search metadata and a reference pointing to the original model. That way we can perform a search using a singular query and can do the counting, sorting and slicing of that query entirely on the database. Having to do this in Python will slow down things by a lot for large instances with search terms that return many results.
self.search_models = {
    model for base in self.request.app.session_manager.bases
    for model in searchable_sqlalchemy_models(base)}
This is still not really correct for performing as few non-overlapping queries as possible. But if we go with a separate table for the search index, this should simplify away to some degree.
Search: Adds postgres search including views
/search-postgres?q=test
TYPE: Feature
LINK: ogc-508