Handle follow-up or clarifying questions in RAG scenarios (with ease)
Posted by AdditionalWeb107@reddit | LocalLLaMA | View on Reddit | 3 comments
There several threads here on reddit like this [one](https://www.reddit.com/r/LocalLLaMA/comments/18mqwg6/best_practice_for_rag_with_followup_chat/) and this [one](https://www.reddit.com/r/LangChain/comments/1djcvh0/chat_history_for_rag_how_to_search_for_follow_up/) that highlight challenges with effectively handling follow-up questions from a user, especially in RAG scenarios. Specifically, these are multi-turn conversations that can range from
**Adjusting a Retrieval**
**User:** What are the benefits of renewable energy?
**Assistant:** Renewable energy reduces greenhouse gas emissions, lowers air pollution, and provides sustainable power sources like solar and wind....
**User:** Include cost considerations in the response.
**OR.....**
# Clarifying a Response
**User:** Can you tell me about the history of the internet?
**Assistant:** The internet was developed from research programs like ARPANET in the late 1960s....
**User:** Can you focus on how ARPANET worked?
**OR...**
# Switching Intent
**User:** What are the symptoms of diabetes?
**Assistant:** Common symptoms include frequent urination, excessive thirst, fatigue, and blurry vision.
**User:** How is it diagnosed?
Most of these scenarios requires carefully crafting, editing and optimizing prompts to an LLM to rewrite the follow-up query, extract relevant contextual information and then trigger retrieval to answer the question. The whole process is slow, error prone and adds significant latency.
[Arch](https://github.com/katanemo/archgw) (an intelligent gateway for agents) pushed out an update (0.1.7) to accurately handle multi-turn intent, extracting relevant contextual information and calling downstream developer APIs (aka function calling) in <500ms! Arch is an open source infrastructure gateway for agents so that developers can focus on what matters most. Arch is engineered with purpose-built (fast) LLMs for the seamless integration of prompts with APIs (among other things). More details on how that multi-turn works: [https://docs.archgw.com/build\_with\_arch/multi\_turn.html](https://docs.archgw.com/build_with_arch/multi_turn.html) and you can run the demo here: [https://github.com/katanemo/archgw/tree/main/demos/multi\_turn\_rag\_agent](https://github.com/katanemo/archgw/tree/main/demos/multi_turn_rag_agent)
The high-level architecture and request flow looks like this, and below is a sample multi-turn interaction that it can help developers build quickly.
[Prompt to API processing handled via Arch Gateway](https://preview.redd.it/s61q7r39ho8e1.png?width=2626&format=png&auto=webp&s=97a4827bdc86663bbf52a8524a2d6e8f677d7c98)
[Example of a multi-turn response handled via Arch](https://preview.redd.it/407oqppxeo8e1.png?width=1064&format=png&auto=webp&s=72ccdd6020de6ce229199e69727f01eeb1ae072b)
**Disclaimer**: I am one of the core contributors to [https://github.com/katanemo/archgw](https://github.com/katanemo/archgw) \- and would love to answer any questions you may have.
3 Comments
paskalby@reddit
AdditionalWeb107@reddit (OP)
Difficult-Opening-65@reddit