I'm building a chatbot with LlamaIndex on top of an Ollama LLM. It reads a set of PDF files and answers queries about their contents. Initially, I used this embedding model:
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
but it took a long time to build the index.
Now I'm using this one:
model_name="distilbert-base-uncased"
This "distilbert-base-uncased" model reduced my indexing time, but the answers I'm getting are not great, and query processing now takes longer (1 to 3 minutes).
I want to improve both the query processing time and the answer quality. I'm new to this field (LLMs), so I apologize if my explanation is unclear.
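One thing I'm considering trying (not sure if it's the right call): the log below warns that distilbert-base-uncased isn't a sentence-transformers model, so a smaller model that was actually trained for sentence embeddings might give better retrieval without the bge-base indexing cost. A sketch of the settings change, assuming the same HuggingFaceEmbedding API I'm already using:

```python
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Sketch only: "BAAI/bge-small-en-v1.5" is a smaller sibling of the bge-base
# model I started with, trained specifically for sentence embeddings, so it
# should avoid the mean-pooling fallback that distilbert-base-uncased triggers.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
```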
My output with timing:
Starting to read documents...
Time taken for document loading: 38.18 seconds
Document reading completed.
Setting up the embedding and language models...
No sentence-transformers model found with name distilbert-base-uncased. Creating a new one with mean pooling.
Time taken for embedding model and language model setup: 1.84 seconds
Model setup completed.
Starting index creation...
Time taken for index creation: 305.35 seconds
Index creation completed.
Enter your query (or type 'exit' to quit): what is Generics and give me the example program for Generic Classes?
Do you want to proceed with this query? (yes to proceed, no to skip): yes
Response: The ability .....
Query executed in 100.83 seconds.
Start Time: 2024-11-22 16:59:49
End Time: 2024-11-22 17:01:29
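Since the 305-second index build above is paid again on every run, I'm also wondering about a build-once, cache-on-disk pattern. A generic stdlib sketch of the idea (`load_or_build` is a name I made up; a real VectorStoreIndex may not be picklable, and llama_index apparently has its own persistence API, so this only illustrates the shape):

```python
import os
import pickle

def load_or_build(cache_path, build_fn):
    """Return the cached object from disk if present; otherwise build and cache it.

    Sketch of the build-once idea only: for an actual VectorStoreIndex,
    llama_index's own persistence API is likely the right tool, not pickle.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    obj = build_fn()  # pay the expensive build exactly once
    with open(cache_path, "wb") as f:
        pickle.dump(obj, f)
    return obj
```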
import time

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama


def log_time(start_time, action):
    """Helper function to log the time taken for a specific action."""
    elapsed_time = time.time() - start_time
    print(f"Time taken for {action}: {elapsed_time:.2f} seconds")
    return elapsed_time


# Load documents from the 'data' directory
def load_documents(directory):
    print("Starting to read documents...")
    start_time = time.time()  # Start time for document loading
    documents = SimpleDirectoryReader(directory).load_data()
    log_time(start_time, "document loading")
    print("Document reading completed.")
    return documents


# Set up embedding model and language model
def setup_models():
    print("Setting up the embedding and language models...")
    start_time = time.time()  # Start time for model setup
    Settings.embed_model = HuggingFaceEmbedding(model_name="distilbert-base-uncased")
    Settings.llm = Ollama(model="llama3", request_timeout=3600.0)
    log_time(start_time, "embedding model and language model setup")
    print("Model setup completed.")


# Create an index from the loaded documents
def create_index(documents):
    print("Starting index creation...")
    start_time = time.time()  # Start time for index creation
    index = VectorStoreIndex.from_documents(documents)
    log_time(start_time, "index creation")
    print("Index creation completed.")
    return index


# Query the index in a loop until the user exits
def query_engine_loop(query_engine):
    while True:
        user_query = input("Enter your query (or type 'exit' to quit): ")
        if user_query.lower() == 'exit':
            print("Exiting the query engine.")
            break
        confirm = input("Do you want to proceed with this query? (yes to proceed, no to skip): ").strip().lower()
        if confirm == 'yes':
            start_time = time.time()  # Start time for query execution
            response = query_engine.query(user_query)
            end_time = time.time()  # End time for query execution
            print("Response:", response)
            print(f"Query executed in {end_time - start_time:.2f} seconds.")
            print(f"Start Time: {time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(start_time))}")
            print(f"End Time: {time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(end_time))}")
        elif confirm == 'no':
            print("Query skipped.")
        else:
            print("Invalid input. Please type 'yes' or 'no'.")


def main():
    # Load documents
    documents = load_documents("data")
    # Set up the embedding model and LLM
    setup_models()
    # Create the index
    index = create_index(documents)
    # Create a query engine from the index
    query_engine = index.as_query_engine()
    # Start the querying loop
    query_engine_loop(query_engine)


if __name__ == "__main__":
    main()
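As an aside on the log_time helper in the code above: the repeated start_time/log_time pairs could be folded into a context manager. A pure-stdlib sketch (`timed` is a name I made up), mirroring log_time's output format:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(action):
    """Print how long the enclosed block took, in log_time's format."""
    start = time.time()
    try:
        yield
    finally:
        # Runs even if the block raises, so the timing is always logged
        print(f"Time taken for {action}: {time.time() - start:.2f} seconds")
```

Usage would be e.g. `with timed("index creation"): index = VectorStoreIndex.from_documents(documents)`, replacing the explicit start/stop bookkeeping in each function.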
Thanks!!!
(Original question: "python - How to improve query execution timing and indexing better in Ollama with llama_index Using a Local Model?" on Stack Overflow)