Torus is a plug-and-play Elixir library that seamlessly integrates PostgreSQL's search into Ecto, streamlining the construction of advanced search queries.
The package can be installed by adding torus to your list of dependencies in mix.exs:
def deps do
  [
    {:torus, "~> 0.3"}
  ]
end
Then, in any query, you can (for example) add a prefixed full-text search:
import Torus
# ...
Post
# ... your complex query
|> Torus.full_text([p], [p.title, p.body], "uncove hogwar")
|> select([p], p.title)
|> Repo.all()
["Uncovered hogwarts"]
See full_text/5 for more details.
- Pattern matching: Searches for a specific pattern in a string.
  iex> insert_posts!(["Wand", "Magic wand", "Owl"])
  ...> Post
  ...> |> Torus.ilike([p], [p.title], "wan%")
  ...> |> select([p], p.title)
  ...> |> Repo.all()
  ["Wand"]
  See like/5, ilike/5, and similar_to/5 for more details.
- Similarity: Searches for items that are closely alike based on attributes, often using measures like cosine similarity or Euclidean distance. It is great for fuzzy searching and ignoring typos in short texts.
  iex> insert_posts!(["Hogwarts Secrets", "Quidditch Fever", "Hogwart’s Secret"])
  ...> Post
  ...> |> Torus.similarity([p], [p.title], "hoggwarrds")
  ...> |> limit(2)
  ...> |> select([p], p.title)
  ...> |> Repo.all()
  ["Hogwarts Secrets", "Hogwart’s Secret"]
  See similarity/5 for more details.
- Text Search Vectors: Uses term-document matrix vectors for full-text search, enabling efficient querying and ranking based on term frequency (see PostgreSQL: Full Text Search). It is great for quickly returning relevant results from large datasets.
  iex> insert_post!(title: "Hogwarts Shocker", body: "A spell disrupts the Quidditch Cup.")
  ...> insert_post!(title: "Diagon Bombshell", body: "Secrets uncovered in the heart of Hogwarts.")
  ...> insert_post!(title: "Completely unrelated", body: "No magic here!")
  ...> Post
  ...> |> Torus.full_text([p], [p.title, p.body], "uncov hogwar")
  ...> |> select([p], p.title)
  ...> |> Repo.all()
  ["Diagon Bombshell"]
  See full_text/5 for more details.
- Semantic Search: Understands the contextual meaning of queries to match and retrieve related content, often utilizing natural language processing (see Semantic Search with PostgreSQL and OpenAI Embeddings).
  insert_post!(title: "Hogwarts Shocker", body: "A spell disrupts the Quidditch Cup.")
  insert_post!(title: "Diagon Bombshell", body: "Secrets uncovered in the heart of Hogwarts.")
  insert_post!(title: "Completely unrelated", body: "No magic here!")
  embedding_vector = Torus.to_vector("A magic school in the UK")
  Post
  |> Torus.semantic([p], p.embedding, embedding_vector)
  |> select([p], p.title)
  |> Repo.all()
  ["Diagon Bombshell"]
  See semantic/5 for more details.
- Hybrid Search: Combines multiple search techniques (e.g., keyword and semantic) to leverage their strengths for more accurate results.
Will be added soon.
- 3rd Party Engines/Providers: Utilizes external services or software specifically designed for optimized and scalable search capabilities, such as Elasticsearch or Algolia.
Torus is designed to be as efficient and relevant as possible from the start, but handling large datasets and complex search queries tends to be tricky. The best way to balance relevance and performance is to:
- Create a query that returns results that are as relevant as possible (by tweaking the options of the search function). If an option is missing, feel free to open an issue or contribute it back, or implement it manually using fragments.
- Test its performance on real production data. Maybe it's good enough already?
- If it's not:
  - See the optimization sections for your search type in the Torus docs
  - Inspect your query using Torus.QueryInspector.tap_substituted_sql/3 or Torus.QueryInspector.tap_explain_analyze/3
  - Based on the resulting SQL, add indexes for the queried columns/vectors
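As an illustration of the last step, indexes can be added from an Ecto migration. The following is only a sketch under several assumptions: a hypothetical posts table with title and body columns, the english text search configuration, and made-up module and index names. PostgreSQL only uses an expression index when the indexed expression matches the one in the query, so the to_tsvector expression here should be adjusted to whatever SQL Torus actually generates for your query.

```elixir
defmodule MyApp.Repo.Migrations.AddPostsSearchIndexes do
  use Ecto.Migration

  def change do
    # pg_trgm provides the trigram operator class used by the index below.
    # Rolling back drops the extension; remove the second argument's effect
    # (or split into up/down) if other tables depend on it.
    execute(
      "CREATE EXTENSION IF NOT EXISTS pg_trgm",
      "DROP EXTENSION IF EXISTS pg_trgm"
    )

    # Trigram GIN index: can serve LIKE/ILIKE patterns and trigram
    # similarity operators on the title column.
    create index(:posts, ["title gin_trgm_ops"],
             using: "GIN",
             name: :posts_title_trgm_index
           )

    # Expression GIN index for full-text search over title and body.
    # Assumption: the query builds its tsvector with this exact expression.
    create index(
             :posts,
             ["(to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, '')))"],
             using: "GIN",
             name: :posts_full_text_index
           )
  end
end
```

After running the migration, re-check the query plan on production-like data to confirm the indexes are actually used.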
Torus offers a few helpers to debug, explain, and analyze your queries before using them in production. See Torus.QueryInspector for more details.
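A minimal sketch of how these helpers might be dropped into a query pipeline. It assumes that the tap_* helpers take the query and the repo as their first two arguments and return the query unchanged so the pipeline can continue; that argument order is an assumption here, so check the Torus.QueryInspector docs for the exact signatures. Post, Repo, and the search term are placeholders.

```elixir
import Ecto.Query
import Torus

Post
|> Torus.full_text([p], [p.title, p.body], "uncov hogwar")
|> select([p], p.title)
# Assumed behavior: prints the SQL with parameters substituted in,
# then passes the query along.
|> Torus.QueryInspector.tap_substituted_sql(Repo)
# Assumed behavior: prints the EXPLAIN ANALYZE plan, then passes the
# query along. EXPLAIN ANALYZE executes the query to measure it.
|> Torus.QueryInspector.tap_explain_analyze(Repo)
|> Repo.all()
```

Because these calls sit in the middle of the pipeline, they can be removed again without touching the rest of the query once the investigation is done.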
For now, Torus supports pattern match, similarity, full-text, and semantic search, with plans to expand support further. These docs will be updated with more examples on which search type to choose and how to make them more performant (by adding indexes or using specific functions).
- Add support for highlighting search results (based on the ts_headline function).
- Extend similarity search to support fuzzystrmatch extension distance options.