I am creating a small application using Spring AI with MongoDB Atlas (running in a local Docker container) to store the RAG data.
I want to "seed" the MongoDB database with some content when the service starts. The content is a list of documents with metadata. The problem is that this content is inserted every time the application starts, and I have not found a way to prevent duplicate data from being inserted. I can't simply remove all data from the database, because I want to add more data later that should persist and stay there even when the service is restarted and possibly seeded with different or newer presets.
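The requirement above (seed on startup, reseed only when the presets change, never touch other data) is often approached with a fingerprint of the seed set. A minimal JDK-only sketch, assuming the fingerprint is persisted somewhere between restarts (a marker document, a file; that part is left open): `SeedFingerprint`, `SeedEntry`, `of`, and `needsReseed` are hypothetical names, and `SeedEntry` is a stand-in, not Spring AI's `Document`. Metadata keys are sorted through a `TreeMap` because `Map.of` does not specify an iteration order, so the hash must not depend on it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: an order-independent SHA-256 fingerprint of a seed set.
final class SeedFingerprint {

    // Minimal stand-in for one seed entry (text + metadata); NOT Spring AI's Document.
    record SeedEntry(String text, Map<String, String> metadata) {}

    static String of(List<SeedEntry> entries) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            for (SeedEntry e : entries) {
                // TreeMap sorts the keys, so the map's own iteration order cannot change the hash.
                String line = e.text() + "\t" + new TreeMap<>(e.metadata()) + "\n";
                sha.update(line.getBytes(StandardCharsets.UTF_8));
            }
            return HexFormat.of().formatHex(sha.digest());
        } catch (NoSuchAlgorithmException ex) {
            // Java platforms are required to provide SHA-256, so this should not happen.
            throw new IllegalStateException(ex);
        }
    }

    // Reseed only if nothing was stored yet or the seed content has changed.
    static boolean needsReseed(String storedFingerprint, String currentFingerprint) {
        return storedFingerprint == null || !storedFingerprint.equals(currentFingerprint);
    }
}
```

A fingerprint only answers whether to reseed; removing the previous seed documents when it changes is a separate step.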
Right now I'm trying something like this:
@Autowired
public void init(VectorStore vectorStore) {
    List<Document> documents = List.of(
            new Document("Once there was a little Girl",
                    Map.of("type", "init", "pos", "1", "plot", "1")),
            new Document("The girls name was Mary",
                    Map.of("type", "init", "pos", "2", "plot", "1")),
            new Document("Once there was a little Boy",
                    Map.of("type", "init", "pos", "1", "plot", "2")),
            new Document("The boys name was Peter",
                    Map.of("type", "init", "pos", "2", "plot", "2")),
            new Document("Peter was a wild kid",
                    Map.of("type", "init", "pos", "3", "plot", "2"))
    );
    List<String> collect = vectorStore.similaritySearch("type == 'init'")
            .stream().map(Document::getId).collect(Collectors.toList());
    vectorStore.delete(collect);
    vectorStore.add(documents);
}
This doesn't work: one document's metadata map seems to be stored slightly differently (in MongoDB I can see that the order of the fields in its metadata map is different), and that row is not removed in the delete step. So with each start, this row is duplicated. The behaviour is quite consistent: when I change the value of type from init to story, a different row escapes deletion. This drives me mad...
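One thing worth checking before blaming the metadata layout: as far as I know, Spring AI's single-argument `similaritySearch(String)` uses its argument as query text to embed, not as a filter expression, and returns only the default top K results (4 in the versions I have looked at). With five seed documents, a search-then-delete cleanup would then always miss exactly one of them, and changing the query text would change which one is missed. The stdlib-only sketch below models that effect; `TopKDemo`, `Scored`, and the scores are hypothetical, and nothing here calls Spring AI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stdlib-only model of a top-K similarity search; NOT Spring AI code.
final class TopKDemo {

    // A stored document id with a made-up similarity score for some query.
    record Scored(String id, double score) {}

    // Ids of the k highest-scoring entries, as a top-K vector search would return them.
    static List<String> topK(List<Scored> candidates, int k) {
        List<Scored> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Double.compare(b.score(), a.score())); // descending by score
        return sorted.stream().limit(k).map(Scored::id).toList();
    }

    // Ids that "search, then delete every hit" leaves behind when k < number of documents.
    static List<String> survivors(List<Scored> candidates, int k) {
        List<String> hits = topK(candidates, k);
        return candidates.stream().map(Scored::id).filter(id -> !hits.contains(id)).toList();
    }
}
```

If this is the cause, one option to try is passing `type == 'init'` as a filter expression together with a large enough top K through `SearchRequest`; the builder method names differ between Spring AI versions, and for the MongoDB Atlas store the filtered metadata field may also need to be declared as filterable, so check the reference docs for the version in use.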
I would like a way to provide initial data to the DB that may change as the service evolves, without filling the DB with additional junk that will presumably cause problems later. (I assume that will be the case; I'm not yet at a stage where I can verify it, but it is annoying nevertheless.)
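One common way to keep re-seeding from piling up duplicates is to give every seed document a stable ID derived from a business key (here type, plot, and position) instead of a random one, so a later run writes the same IDs again. A minimal JDK-only sketch: `SeedIds`, `idFor`, `staleIds`, and `UpsertStore` are hypothetical names, and `UpsertStore` is an in-memory stand-in that ASSUMES the real store's `add` overwrites an existing ID rather than inserting a second row. A name-based UUID is used because some vector stores expect UUID-shaped IDs.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical helper: stable IDs for seed documents, so re-seeding overwrites instead of duplicating.
final class SeedIds {

    // Name-based (version 3) UUID from a stable key such as "init/plot-1/pos-2".
    static String idFor(String type, String plot, String pos) {
        String key = type + "/plot-" + plot + "/pos-" + pos;
        return UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8)).toString();
    }

    // Seed IDs from an earlier release that the current seed set no longer contains.
    static Set<String> staleIds(Set<String> previousSeedIds, Set<String> currentSeedIds) {
        Set<String> stale = new HashSet<>(previousSeedIds);
        stale.removeAll(currentSeedIds);
        return stale;
    }

    // In-memory stand-in for a store whose add() upserts by id (an ASSUMPTION about the real store).
    static final class UpsertStore {
        private final Map<String, String> byId = new HashMap<>();

        void add(String id, String text) {
            byId.put(id, text); // same id -> overwrite, never a second row
        }

        int size() {
            return byId.size();
        }
    }
}
```

In Spring AI such an ID would go through a `Document` constructor that accepts an explicit id. The previous seed IDs for `staleIds` could come from a filtered query on the seed metadata or from a stored list; whether `add` on the MongoDB Atlas store overwrites or rejects an existing id should be verified for the version in use.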
Has anyone solved something similar?
Source: java - Initializing RAG using vectorstore without duplicates - Stack Overflow (http://www.betaflare.com/web/1742342365a2456858.html)