Handle follow-up or clarifying questions in RAG scenarios (with ease)

Posted by AdditionalWeb107@reddit | LocalLLaMA

There are several threads here on Reddit, like this [one](https://www.reddit.com/r/LocalLLaMA/comments/18mqwg6/best_practice_for_rag_with_followup_chat/) and this [one](https://www.reddit.com/r/LangChain/comments/1djcvh0/chat_history_for_rag_how_to_search_for_follow_up/), that highlight the challenges of effectively handling follow-up questions from a user, especially in RAG scenarios. Specifically, these are multi-turn conversations that can range from adjusting a retrieval, to clarifying a response, to switching intent:

# Adjusting a Retrieval

**User:** What are the benefits of renewable energy?

**Assistant:** Renewable energy reduces greenhouse gas emissions, lowers air pollution, and provides sustainable power sources like solar and wind...

**User:** Include cost considerations in the response.

# Clarifying a Response

**User:** Can you tell me about the history of the internet?

**Assistant:** The internet was developed from research programs like ARPANET in the late 1960s...

**User:** Can you focus on how ARPANET worked?

# Switching Intent

**User:** What are the symptoms of diabetes?

**Assistant:** Common symptoms include frequent urination, excessive thirst, fatigue, and blurry vision.

**User:** How is it diagnosed?

Most of these scenarios require carefully crafting, editing, and optimizing prompts to an LLM to rewrite the follow-up query, extract relevant contextual information, and then trigger retrieval to answer the question. The whole process is slow, error-prone, and adds significant latency.

[Arch](https://github.com/katanemo/archgw) (an intelligent gateway for agents) pushed out an update (0.1.7) to accurately handle multi-turn intent, extracting relevant contextual information and calling downstream developer APIs (aka function calling) in <500ms! Arch is an open-source infrastructure gateway for agents, so that developers can focus on what matters most. It is engineered with purpose-built (fast) LLMs for the seamless integration of prompts with APIs (among other things).
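For reference, the hand-rolled approach mentioned above (prompt an LLM to rewrite the follow-up into a standalone query, then trigger retrieval) might look roughly like the sketch below. This is not Arch's API. Every name here (`build_rewrite_prompt`, `rewrite_follow_up`, `answer_follow_up`, and the `llm` and `retrieve` callables) is hypothetical and only illustrates the pattern, assuming you plug in your own model client and retriever.

```python
from typing import Callable, List, Tuple

# One conversation turn as (role, text), e.g. ("User", "How is it diagnosed?").
Turn = Tuple[str, str]


def build_rewrite_prompt(history: List[Turn], follow_up: str) -> str:
    """Build a prompt asking an LLM to turn a follow-up into a standalone query."""
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        "Rewrite the final user message as a standalone search query, "
        "using the conversation for context. Return only the query.\n\n"
        f"Conversation:\n{transcript}\n\n"
        f"Final user message: {follow_up}\n"
        "Standalone query:"
    )


def rewrite_follow_up(
    history: List[Turn], follow_up: str, llm: Callable[[str], str]
) -> str:
    """Return the rewritten query. `llm` is any callable: prompt -> completion."""
    return llm(build_rewrite_prompt(history, follow_up)).strip()


def answer_follow_up(
    history: List[Turn],
    follow_up: str,
    llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> Tuple[str, List[str]]:
    """Rewrite the follow-up, then run retrieval on the standalone query."""
    query = rewrite_follow_up(history, follow_up, llm)
    return query, retrieve(query)
```

Note that the rewrite step is an extra LLM round trip before retrieval even starts, which is where much of the added latency in this approach comes from.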
More details on how multi-turn works: [https://docs.archgw.com/build_with_arch/multi_turn.html](https://docs.archgw.com/build_with_arch/multi_turn.html). You can run the demo here: [https://github.com/katanemo/archgw/tree/main/demos/multi_turn_rag_agent](https://github.com/katanemo/archgw/tree/main/demos/multi_turn_rag_agent)

The first image below shows the high-level architecture and request flow, and the second shows a sample multi-turn interaction that Arch can help developers build quickly.

[Prompt to API processing handled via Arch Gateway](https://preview.redd.it/s61q7r39ho8e1.png?width=2626&format=png&auto=webp&s=97a4827bdc86663bbf52a8524a2d6e8f677d7c98)

[Example of a multi-turn response handled via Arch](https://preview.redd.it/407oqppxeo8e1.png?width=1064&format=png&auto=webp&s=72ccdd6020de6ce229199e69727f01eeb1ae072b)

**Disclaimer**: I am one of the core contributors to [https://github.com/katanemo/archgw](https://github.com/katanemo/archgw) and would love to answer any questions you may have.