Handle follow-up or clarifying questions in RAG scenarios (with ease)

Posted by AdditionalWeb107@reddit | LocalLLaMA | 3 comments

There are several threads here on Reddit, like this [one](https://www.reddit.com/r/LocalLLaMA/comments/18mqwg6/best_practice_for_rag_with_followup_chat/) and this [one](https://www.reddit.com/r/LangChain/comments/1djcvh0/chat_history_for_rag_how_to_search_for_follow_up/), that highlight the challenges of handling follow-up questions from a user, especially in RAG scenarios. These are multi-turn conversations that fall into patterns like the following.

# Adjusting a Retrieval

**User:** What are the benefits of renewable energy?
**Assistant:** Renewable energy reduces greenhouse gas emissions, lowers air pollution, and provides sustainable power sources like solar and wind...
**User:** Include cost considerations in the response.

**OR...**

# Clarifying a Response

**User:** Can you tell me about the history of the internet?
**Assistant:** The internet was developed from research programs like ARPANET in the late 1960s...
**User:** Can you focus on how ARPANET worked?

**OR...**

# Switching Intent

**User:** What are the symptoms of diabetes?
**Assistant:** Common symptoms include frequent urination, excessive thirst, fatigue, and blurry vision.
**User:** How is it diagnosed?

Most of these scenarios require carefully crafting, editing, and optimizing prompts so that an LLM can rewrite the follow-up query and extract the relevant contextual information before retrieval is triggered to answer the question. The whole process is slow and error-prone, and it adds significant latency.

[Arch](https://github.com/katanemo/archgw) (an intelligent gateway for agents) has pushed out an update (0.1.7) that accurately handles multi-turn intent: it extracts relevant contextual information and calls downstream developer APIs (aka function calling) in under 500 ms! Arch is an open-source infrastructure gateway for agents, built so that developers can focus on what matters most. It is engineered with purpose-built, fast LLMs for seamlessly integrating prompts with APIs (among other things).
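The conventional pipeline described above (one LLM call to rewrite the follow-up into a standalone query, then retrieval on that query) can be sketched in Python as below. This is a generic illustration of the rewrite-then-retrieve approach, not Arch's implementation: the prompt wording, `build_rewrite_prompt`, `rewrite_then_retrieve`, and the toy `fake_llm` / `keyword_retrieve` stand-ins are all hypothetical, and a real system would plug in an actual model client and vector store.

```python
from typing import Callable, List, Set, Tuple

# A conversation turn is (role, text), e.g. ("user", "What are the symptoms of diabetes?")
Turn = Tuple[str, str]

# Hypothetical instruction text; real systems tune this wording per model.
REWRITE_INSTRUCTIONS = (
    "Rewrite the final user message as a standalone search query. "
    "Resolve pronouns and implicit references using the conversation. "
    "Return only the rewritten query."
)


def build_rewrite_prompt(history: List[Turn], follow_up: str, max_turns: int = 6) -> str:
    """Assemble a query-rewriting prompt from the most recent turns plus the follow-up."""
    recent = history[-max_turns:]  # keep only the last few turns to bound prompt size
    lines = [REWRITE_INSTRUCTIONS, "", "Conversation:"]
    lines += [f"{role}: {text}" for role, text in recent]
    lines += [f"user: {follow_up}", "", "Standalone query:"]
    return "\n".join(lines)


def rewrite_then_retrieve(
    history: List[Turn],
    follow_up: str,
    llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> Tuple[str, List[str]]:
    """Two sequential steps: an LLM rewrite of the follow-up, then retrieval on the result."""
    standalone = llm(build_rewrite_prompt(history, follow_up)).strip()
    return standalone, retrieve(standalone)


# --- Toy stand-ins (hypothetical) so the sketch runs without a model or vector store ---
DOCS = [
    "Diabetes is diagnosed using blood tests such as the A1C test.",
    "Solar and wind are common sources of renewable energy.",
]


def _words(text: str) -> Set[str]:
    # Lowercased words longer than three characters, with surrounding punctuation removed
    return {w.strip("?.,!").lower() for w in text.split() if len(w.strip("?.,!")) > 3}


def keyword_retrieve(query: str) -> List[str]:
    """Return documents that share at least one word with the query."""
    terms = _words(query)
    return [doc for doc in DOCS if terms & _words(doc)]


def fake_llm(prompt: str) -> str:
    # Placeholder for a real model call; it ignores the prompt and returns a canned rewrite.
    return " How is diabetes diagnosed? "


history = [
    ("user", "What are the symptoms of diabetes?"),
    ("assistant", "Common symptoms include frequent urination, excessive thirst, fatigue, and blurry vision."),
]
query, hits = rewrite_then_retrieve(history, "How is it diagnosed?", fake_llm, keyword_retrieve)
```

Because the rewrite call must finish before retrieval can start, every follow-up pays for an extra sequential model round trip, which is where the added latency in this approach comes from.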
More details on how multi-turn handling works are in the docs: [https://docs.archgw.com/build_with_arch/multi_turn.html](https://docs.archgw.com/build_with_arch/multi_turn.html), and you can run the demo here: [https://github.com/katanemo/archgw/tree/main/demos/multi_turn_rag_agent](https://github.com/katanemo/archgw/tree/main/demos/multi_turn_rag_agent).

The high-level architecture and request flow are shown below, followed by a sample multi-turn interaction that Arch can help developers build quickly.

[Prompt to API processing handled via Arch Gateway](https://preview.redd.it/s61q7r39ho8e1.png?width=2626&format=png&auto=webp&s=97a4827bdc86663bbf52a8524a2d6e8f677d7c98)

[Example of a multi-turn response handled via Arch](https://preview.redd.it/407oqppxeo8e1.png?width=1064&format=png&auto=webp&s=72ccdd6020de6ce229199e69727f01eeb1ae072b)

**Disclaimer**: I am one of the core contributors to [https://github.com/katanemo/archgw](https://github.com/katanemo/archgw), and I would love to answer any questions you may have.

3 Comments

paskalby@reddit

Thanks, this looks very promising. I would like to try integrating it into a project where multi-turn handling is crucial, but I have a cloud-deployment requirement (Azure Foundry or Google AI). What experience is there on this front?

AdditionalWeb107@reddit (OP)

We are actively working with customers who are deploying our bits to Azure. If you like, it would be great to have you join our Discord channel (link: https://github.com/katanemo/archgw?tab=readme-ov-file#Contact) so we can chat more about your design and application goals.
