I am writing a function that indexes some resources using LlamaIndex, with Milvus as the vector database.
When storing the data, I also include metadata for each ingested resource. I am trying to understand the correct way to avoid re-indexing all the documents every time I call my function: only the documents that are missing from the index should be added. My idea is to use an id that I keep in the metadata to tell which documents are already indexed.
This is how I currently ingest and persist my data, without checking whether a document is already indexed:
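from llama_index.core import Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# get_content_paths_list, get_metadata_paths_list and get_or_create_collection
# are my own helpers, defined elsewhere.
documents = SimpleDirectoryReader(
    input_files=get_content_paths_list(),
    file_metadata=get_metadata_paths_list(),
).load_data()

# Embedding model
Settings.embed_model = HuggingFaceEmbedding(model_name="dunzhang/stella_en_1.5B_v5")
# LLM served by Ollama
Settings.llm = Ollama(model="llama3.2", request_timeout=360.0)

storage_context = StorageContext.from_defaults(
    vector_store=get_or_create_collection(dim=1024, collection_name="my_collection")
)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, show_progress=True
)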
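To make the idea concrete, here is a minimal sketch of the filtering step I have in mind, written in plain Python so it runs without Milvus. Everything in it is an assumption for illustration: `Doc` is only a stand-in for a LlamaIndex document, `filter_new_documents` is a hypothetical helper, and the metadata key `"id"` is a guess at how the id is stored. How `existing_ids` gets populated is left open, since that depends on how the collection stores the metadata.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Set


@dataclass
class Doc:
    """Stand-in for a LlamaIndex document: only text and metadata matter here."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def filter_new_documents(
    documents: Iterable[Any],
    existing_ids: Set[str],
    id_key: str = "id",  # assumed metadata key holding the resource id
) -> List[Any]:
    """Return the documents whose metadata id is not already indexed.

    Duplicate ids inside the same batch are also dropped, keeping the first.
    """
    seen = set(existing_ids)  # copy, so the caller's set is not mutated
    fresh = []
    for doc in documents:
        doc_id = doc.metadata.get(id_key)
        if doc_id is None:
            # Without an id we cannot tell whether the document is indexed.
            raise ValueError(f"document has no '{id_key}' in its metadata")
        if doc_id in seen:
            continue
        seen.add(doc_id)
        fresh.append(doc)
    return fresh


# Example: id "1" is already indexed, id "2" appears twice in the batch.
docs = [
    Doc("first", {"id": "1"}),
    Doc("second", {"id": "2"}),
    Doc("third", {"id": "3"}),
    Doc("second again", {"id": "2"}),
]
new_docs = filter_new_documents(docs, existing_ids={"1"})
```

The filtered list would then be the only thing passed on for indexing, for example by attaching to the existing collection with `VectorStoreIndex.from_vector_store(...)` and calling `index.insert(doc)` for each new document. What I am unsure about is the lookup itself: I assume the ids could be read back with something like pymilvus `MilvusClient.query(...)` and `output_fields`, but the field name to query depends on how the vector store persists the metadata.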