
Content de-duplication #1496

Answered by bogdankostic
sekh77 asked this question in Questions
Sep 23, 2021 · 4 comments · 22 replies

Hi @sekh77!

We have a MostSimilarDocumentsPipeline (see here) that allows you to find the most similar documents given one document. For creating document embeddings, you might want to use a sentence-transformers model (see here for details).
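To make the idea concrete, here is a minimal, library-free sketch of what "most similar documents" means for de-duplication: compare document embeddings by cosine similarity and flag pairs above a threshold. This is not the Haystack API. The function names, the toy three-dimensional vectors, and the 0.95 threshold are all assumptions for illustration; in practice the embeddings would come from a sentence-transformers model and the threshold would need tuning on your data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, top_k=3):
    """Return up to top_k (doc_id, score) pairs most similar to query_id."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def find_duplicates(embeddings, threshold=0.95):
    """Return sorted (id_a, id_b) pairs whose similarity meets the threshold."""
    ids = sorted(embeddings)
    pairs = []
    for i, id_a in enumerate(ids):
        for id_b in ids[i + 1:]:
            if cosine_similarity(embeddings[id_a], embeddings[id_b]) >= threshold:
                pairs.append((id_a, id_b))
    return pairs


# Hypothetical toy embeddings; real ones would have hundreds of dimensions.
toy_embeddings = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.99, 0.05, 0.0],  # nearly identical to doc_a
    "doc_c": [0.0, 1.0, 0.0],    # unrelated to doc_a
}

duplicates = find_duplicates(toy_embeddings)
nearest_to_a = most_similar("doc_a", toy_embeddings, top_k=1)
```

The pairwise loop is quadratic in the number of documents, so for a large corpus you would likely rely on the document store's own similarity search rather than comparing every pair yourself.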

I hope this answers your question :)

Answer selected by bogdankostic
7 participants