I am writing a function that indexes some resources with LlamaIndex, using Milvus as the vector database.

When storing the data, I also include metadata for each ingested resource. I am trying to understand the correct way to avoid re-indexing all the documents every time I call my function: only documents that are missing from the index should be ingested. My idea was to use an id that I keep in the metadata.
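
A minimal sketch of that idea in plain Python, assuming every document carries its id under a metadata key named `resource_id` (a hypothetical name; substitute whatever key your metadata actually uses). Given a set of ids already present in the index, it keeps only the documents whose id is new, and it also drops repeats within the same batch. `Doc` is a hypothetical stand-in; LlamaIndex `Document` objects also expose a `metadata` dict, so the function should accept them unchanged.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Set


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded document; only `metadata` is used."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def select_unindexed(
    docs: Iterable[Doc], indexed_ids: Set[str], id_key: str = "resource_id"
) -> List[Doc]:
    """Return only the documents whose metadata id is not yet indexed.

    Assumes every document has `id_key` in its metadata (KeyError otherwise).
    Duplicate ids inside `docs` are kept only once (first occurrence wins).
    """
    seen = set(indexed_ids)  # copy so the caller's set is not mutated
    fresh: List[Doc] = []
    for doc in docs:
        doc_id = doc.metadata[id_key]
        if doc_id in seen:
            continue  # already in the index, or already selected in this batch
        seen.add(doc_id)
        fresh.append(doc)
    return fresh
```

In the real code, `docs` would be the list returned by `SimpleDirectoryReader(...).load_data()`, and only the result would be passed on for embedding.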

This is how I currently ingest and persist my data, without checking whether a document is already indexed:

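from llama_index.core import Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

documents = SimpleDirectoryReader(
    input_files=get_content_paths_list(),
    # file_metadata must be a callable mapping a file path to its metadata dict
    file_metadata=get_metadata_paths_list(),
).load_data()

# Embedding model used to vectorize each chunk
Settings.embed_model = HuggingFaceEmbedding(model_name="dunzhang/stella_en_1.5B_v5")

# LLM served locally through Ollama
Settings.llm = Ollama(model="llama3.2", request_timeout=360.0)

# Milvus-backed storage; dim must match the embedding model's output size
storage_context = StorageContext.from_defaults(
    vector_store=get_or_create_collection(dim=1024, collection_name="my_collection")
)

# Embeds every loaded document and writes it to Milvus (no duplicate check)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, show_progress=True
)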
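
To find out which ids are already indexed, one option is to ask Milvus directly before calling `from_documents`. The sketch below assumes the id is stored as a filterable field (for example a dynamic field) named `resource_id` in `my_collection`; how LlamaIndex's Milvus vector store lays out metadata depends on its configuration, so check the collection schema first. The client is duck-typed: anything with a pymilvus `MilvusClient`-style `query(collection_name=..., filter=..., output_fields=...)` method works. Ids are checked in batches to keep each filter expression and result set small, and since one document is usually split into several nodes, the same id can come back several times, which the returned set absorbs.

```python
import json
from typing import Iterable, List, Set


def build_in_filter(field_name: str, ids: List[str]) -> str:
    """Build a Milvus boolean expression such as: resource_id in ["a", "b"]."""
    # json.dumps produces a double-quoted list literal, escaping embedded quotes.
    return f"{field_name} in {json.dumps(ids, ensure_ascii=False)}"


def fetch_indexed_ids(
    client,
    collection_name: str,
    field_name: str,
    candidate_ids: Iterable[str],
    batch_size: int = 500,
) -> Set[str]:
    """Return the subset of candidate_ids that already exist in the collection."""
    ids = list(dict.fromkeys(candidate_ids))  # de-duplicate, keep order
    found: Set[str] = set()
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        rows = client.query(
            collection_name=collection_name,
            filter=build_in_filter(field_name, batch),
            output_fields=[field_name],
        )
        # Each row is a dict; several nodes of one document share the same id.
        found.update(row[field_name] for row in rows)
    return found
```

With pymilvus, `client` could be `MilvusClient(uri="http://localhost:19530")` (the URI is an assumption about your deployment). The documents whose `resource_id` is not in the returned set can then be passed to `VectorStoreIndex.from_documents(new_docs, storage_context=storage_context)`, which should append them to the existing collection as long as the vector store is not created with `overwrite=True`.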

本文标签: llama indexSearch Milvus db before reindexing a documentStack Overflow