From 9527749f50e3d3c6f52a44ba814c8b1182fdac31 Mon Sep 17 00:00:00 2001
From: "mergify[bot]" <37929162+mergify[bot]@users.noreply.github.com>
Date: Tue, 30 Jan 2024 09:19:03 +0100
Subject: [PATCH] Suggest chunking for large ELSER fields (#2660) (#2662)

(cherry picked from commit f4dacc9dd2b116377ceea3c2707ad1f97356f582)

Co-authored-by: Sean Story
---
 docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
index 49973223b..ddfc41275 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
@@ -397,9 +397,11 @@ image::images/ml-nlp-elser-v2-test.png[alt="Testing ELSER",align="center"]
 * ELSER works best on small-to-medium sized fields that contain natural
 language. For connector or web crawler use cases, this aligns best with fields
 like _title_, _description_, _summary_, or _abstract_. As ELSER encodes the
-first 512 tokens of a field, it may not be as good a match for `body_content` on
-web crawler documents, or body fields resulting from extracting text from office
-documents with connectors.
+first 512 tokens of a field, it may not provide as relevant of results for large
+fields. For example, `body_content` on web crawler documents, or body fields
+resulting from extracting text from office documents with connectors. For larger
+fields like these, consider "chunking" the content into multiple values, where
+each chunk can be under 512 tokens.
 * Larger documents take longer at ingestion time, and {infer} time per document
 also increases the more fields in a document that need to be processed.
 * The more fields your pipeline has to perform inference on, the longer it takes
@@ -510,4 +512,4 @@ image::images/ml-nlp-elser-v2-opt-bm-results.png[alt="ELSER V2 optimized benchma
 respectively 14 docs/s and 16 docs/s, indicating a performance improvement due
 to virtual cores of 12%.
 
-image::images/ml-nlp-elser-v2-cp-bm-results.png[alt="ELSER V2 cross-platform benchmarks",align="center"]
\ No newline at end of file
+image::images/ml-nlp-elser-v2-cp-bm-results.png[alt="ELSER V2 cross-platform benchmarks",align="center"]
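
The chunking suggested in this patch could be sketched roughly as follows. This is a minimal illustration and not part of the commit. It assumes a whitespace word count is an acceptable stand-in for ELSER's token count; since the model's tokenizer generally produces more tokens than words, the 300-word budget is a hypothetical, conservative value chosen to stay under 512 tokens. The `chunk_words` helper and the `body_content_chunks` field name are also hypothetical.

```python
from typing import List


def chunk_words(text: str, max_words: int = 300) -> List[str]:
    """Split text into chunks of at most max_words whitespace-separated words.

    Assumption: word count only approximates the model's token count, so
    max_words is kept well below the 512-token limit.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


# Hypothetical document shaped like a web crawler result.
doc = {"title": "Example page", "body_content": "word " * 1000}

# Store the chunks as multiple values of a separate (hypothetical) field.
doc["body_content_chunks"] = chunk_words(doc["body_content"])
print(len(doc["body_content_chunks"]))
```

Splitting on a fixed word count can cut sentences in half. Splitting on paragraph or sentence boundaries first, then packing those units up to the budget, would likely keep each chunk more coherent.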