I'm facing an issue streaming the response from the ConversationalRetrievalChain below, used in a RAG application in Python. The application uses pgvector for embeddings, LangChain as the LLM framework, and FastAPI to serve it. Without streaming, the RAG works normally; I'm unable to figure out how to stream with ConversationalRetrievalChain.

Note: ConversationalRetrievalChain is deprecated, but it should still support streaming.


from langchain.callbacks.streaming_stdout_final_only import FinalStreamingStdOutCallbackHandler
from langchain.chains import ConversationalRetrievalChain
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_aws import ChatBedrock

# The LLM must be constructed before the reranker that wraps it
llm = ChatBedrock(
    model_id="anthropic.claude-3-5-sonnet",
    model_kwargs=model_kwargs,
    client=client,
    streaming=True,
    callbacks=[FinalStreamingStdOutCallbackHandler()],
)

# BM25 retriever wrapped with an LLM-based contextual compression reranker
bm25_retriever = get_bm25_retriever(index[1])
llm_reranker = LLMChainExtractor.from_llm(llm)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=llm_reranker,
    base_retriever=bm25_retriever,
)

conversation_with_retrieval = ConversationalRetrievalChain.from_llm(
    llm,
    compression_retriever,
    chain_type="stuff",
    memory=memory,
    get_chat_history=lambda h: h,
    return_source_documents=False,
    verbose=True,
    combine_docs_chain_kwargs={
        "prompt": prompt.partial(format_instructions=parser.get_format_instructions())
    },
)
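
For context, this is roughly how I'm trying to serve the chain from FastAPI (a simplified sketch, not my real app; the route and request model are placeholders):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class Question(BaseModel):
    question: str

@app.post("/chat")  # placeholder route
async def chat(payload: Question):
    async def token_generator():
        # Intended to forward chunks to the client as they arrive;
        # ConversationalRetrievalChain puts its generated text under "answer".
        async for chunk in conversation_with_retrieval.astream(
            {"question": payload.question}
        ):
            if "answer" in chunk:
                yield chunk["answer"]

    return StreamingResponse(token_generator(), media_type="text/plain")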

I've tried several approaches with ConversationalRetrievalChain:

  1. stream : for chunk in conversation_with_retrieval.stream(input_text)

  2. astream: async for chunk in conversation_with_retrieval.astream(input_text)

  3. Also tried RunnablePassthrough and RunnableSequence.

  4. Also tried replacing ConversationalRetrievalChain with create_retrieval_chain, which gives a peculiar issue: it streams only after generating the complete response, instead of streaming chunk by chunk as it goes (see the sketch after this list).
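
For attempt 4, this is roughly what I ran (a sketch; input_text is the user question, and llm, prompt, and compression_retriever are the objects from the setup above). Even with this, astream() only starts emitting once the full answer has been generated:

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

# LCEL-style replacement for ConversationalRetrievalChain (no memory here)
combine_docs_chain = create_stuff_documents_chain(llm, prompt)
retrieval_chain = create_retrieval_chain(compression_retriever, combine_docs_chain)

async for chunk in retrieval_chain.astream({"input": input_text}):
    # chunks are dicts; the generated text arrives under the "answer" key
    print(chunk.get("answer", ""), end="", flush=True)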

If anyone can help, I'd appreciate it.
