
Commit 4490619

[DOCS] Adds pre-cleaning recommendation to ELSER docs. (#2796) (#2798)
(cherry picked from commit 34a6c7b)
Co-authored-by: István Zoltán Szabó <[email protected]>
1 parent cc20c5a commit 4490619


1 file changed (+11 -0 lines)

docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc

Lines changed: 11 additions & 0 deletions
@@ -440,6 +440,17 @@ per document to ingest.
 To learn more about ELSER performance, refer to the <<elser-benchmarks>>.
 
 
+[discrete]
+[[pre-cleaning]]
+== Pre-cleaning input text
+
+The quality of the input text significantly affects the quality of the embeddings.
+To achieve the best results, it's recommended to clean the input text before generating embeddings.
+The exact preprocessing you may need to do heavily depends on your text.
+For example, if your text contains HTML tags, use the {ref}/htmlstrip-processor.html[HTML strip processor] in an ingest pipeline to remove unnecessary elements.
+Always review and clean your input text before ingestion to eliminate any irrelevant entities that might affect the results.
+
+
 [discrete]
 [[elser-adaptive-allocations]]
 == Adaptive allocations
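The recommendation above can be illustrated with a minimal Python sketch. It builds an ingest pipeline definition that runs the `html_strip` processor before ELSER inference. The pipeline name `elser-pre-cleaning`, the field names `content` and `content_embedding`, and the model ID `.elser_model_2` are assumptions chosen for illustration, so adapt them to your own deployment and mapping. The script only prints the request in Dev Tools console form and does not contact a cluster.

```python
import json

# Hypothetical names for illustration only; adjust to your own deployment.
PIPELINE_ID = "elser-pre-cleaning"   # hypothetical pipeline name
SOURCE_FIELD = "content"             # hypothetical field holding raw HTML text
TARGET_FIELD = "content_embedding"   # hypothetical field receiving ELSER output
ELSER_MODEL_ID = ".elser_model_2"    # assumes ELSER v2 is deployed under this ID


def build_pipeline_body(source_field: str, target_field: str, model_id: str) -> dict:
    """Return an ingest pipeline body that strips HTML, then runs ELSER."""
    return {
        "description": "Strip HTML tags, then generate ELSER embeddings",
        "processors": [
            # Step 1 (pre-cleaning): remove HTML tags from the source field.
            # Without target_field, the field is overwritten in place.
            {
                "html_strip": {
                    "field": source_field,
                    "ignore_missing": True,  # skip documents lacking the field
                }
            },
            # Step 2 (inference): ELSER reads the already-cleaned text.
            {
                "inference": {
                    "model_id": model_id,
                    "input_output": [
                        {
                            "input_field": source_field,
                            "output_field": target_field,
                        }
                    ],
                }
            },
        ],
    }


if __name__ == "__main__":
    body = build_pipeline_body(SOURCE_FIELD, TARGET_FIELD, ELSER_MODEL_ID)
    # Emit the request as it would be typed into the Kibana Dev Tools console.
    print(f"PUT _ingest/pipeline/{PIPELINE_ID}")
    print(json.dumps(body, indent=2))
```

Ingest processors run in the order they are listed. For that reason, `html_strip` has to come before `inference` so that ELSER only sees cleaned text. If you need to keep the raw HTML in the stored document, you can set `target_field` on `html_strip` and point the inference `input_field` at that field instead. Before you index real data, you can pass sample documents through the pipeline with the simulate pipeline API (`POST _ingest/pipeline/<pipeline-id>/_simulate`) and inspect the cleaned field.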
