Add Reactome user guide Q&A with intent-based routing. #166
Open
heliamoh wants to merge 4 commits into
Pull request overview
Adds a new “Reactome user guide” knowledge source to the chatbot, including an ingestion/embedding pipeline and an intent router so UI/how-to questions can be answered from Reactome’s website documentation rather than the biological knowledgebase.
Changes:
- Introduces user guide HTML fetching + section chunking to generate Chroma embeddings via `embeddings_manager`.
- Adds a user guide RAG chain (with citation-oriented prompting) and an intent classifier to route queries between `reactome` vs `userguide`.
- Updates safety + rephrase prompts to treat Reactome UI/how-to questions as in-scope, and updates docs/dependencies accordingly.
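
To make the routing idea concrete, here is a minimal sketch of intent-based dispatch between the two labels. It is not the PR's implementation: `classify_intent` is a hypothetical keyword stub standing in for the LLM-based classifier, and `route` and the `chains` mapping are illustrative names only. The fallback assumes the user guide chain may be missing when its embeddings have not been built.

```python
from typing import Callable, Dict, Literal

Intent = Literal["reactome", "userguide"]


def classify_intent(question: str) -> Intent:
    """Hypothetical stand-in for the LLM classifier: a plain keyword match."""
    ui_hints = ("how do i", "where can i", "download", "button", "browser")
    lowered = question.lower()
    return "userguide" if any(hint in lowered for hint in ui_hints) else "reactome"


def route(question: str, chains: Dict[str, Callable[[str], str]]) -> str:
    """Send the question to the chain registered for its intent."""
    intent = classify_intent(question)
    # Assumption: the user guide chain is only registered when its
    # embeddings exist, so fall back to the biology chain otherwise.
    chain = chains.get(intent) or chains["reactome"]
    return chain(question)
```

With both chains registered, a how-to question goes to the `userguide` chain and a mechanism question goes to the `reactome` chain. With only the `reactome` chain registered, every question lands there.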
Reviewed changes
Copilot reviewed 15 out of 17 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/retrievers/userguide/retriever.py | Creates a Chroma-backed retriever for user guide section embeddings. |
| src/retrievers/userguide/rag.py | Builds a user guide RAG chain wired to the user guide retriever + prompt. |
| src/retrievers/userguide/prompt.py | Adds a citation-strict prompt for user guide Q&A. |
| src/retrievers/rag_chain.py | Extends the shared RAG chain factory to accept an optional document_prompt. |
| src/data_generation/userguide/urls.py | Defines canonical Reactome user guide URLs to ingest. |
| src/data_generation/userguide/html_loader.py | Loads cached HTML and splits pages into section-level Documents for embedding. |
| src/data_generation/userguide/fetch.py | Downloads user guide HTML pages with a cache and polite request pacing. |
| src/data_generation/userguide/init.py | Implements user guide embedding generation and Chroma persistence. |
| src/agent/tasks/safety_checker.py | Expands “relevance” to include Reactome UI/how-to questions. |
| src/agent/tasks/rephrase.py | Prevents UI/how-to questions from being rewritten into biology/mechanism questions. |
| src/agent/tasks/intent_classifier.py | Adds an LLM-based classifier to route queries to reactome vs userguide. |
| src/agent/profiles/react_to_me.py | Integrates intent routing and dynamically enables user guide RAG when embeddings exist. |
| README.md | Documents date-based versioning for user guide embeddings. |
| pyproject.toml | Adds HTML parsing dependencies for ingestion. |
| poetry.lock | Locks new dependencies and their transitive requirements. |
| docs/embeddings_manager.md | Documents embeddings_manager make .../userguide/<YYYY-MM> usage and --force. |
| bin/embeddings_manager | Adds userguide target and ensures embeddings archive directory exists before pulling. |
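
As a rough illustration of the section-level chunking described for `html_loader.py`, the sketch below splits a page into one record per heading. It uses only the standard library's `html.parser` rather than the HTML parsing dependencies the PR adds, and `SectionSplitter` and `split_sections` are hypothetical names, not code from this pull request.

```python
from html.parser import HTMLParser
from typing import Dict, List


class SectionSplitter(HTMLParser):
    """Collect text under each h1-h3 heading as one section.

    Simplification: text inside <script>/<style> is not filtered out.
    """

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self) -> None:
        super().__init__()
        self.sections: List[Dict[str, str]] = []
        self._in_heading = False
        self._title = ""
        self._title_parts: List[str] = []
        self._body_parts: List[str] = []

    def flush(self) -> None:
        """Store the section gathered so far, if it has any content."""
        body = " ".join(" ".join(self._body_parts).split())
        if self._title or body:
            self.sections.append({"title": self._title, "text": body})
        self._body_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.flush()  # a new heading closes the previous section
            self._in_heading = True
            self._title_parts = []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._in_heading = False
            self._title = " ".join("".join(self._title_parts).split())

    def handle_data(self, data):
        target = self._title_parts if self._in_heading else self._body_parts
        target.append(data)


def split_sections(html: str) -> List[Dict[str, str]]:
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit the final section
    return parser.sections
```

Each returned record could then be wrapped in a Document with the page URL as metadata, which is what citation-oriented prompting would need. That wiring is left out here.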
Comments suppressed due to low confidence (1)
src/agent/profiles/react_to_me.py:158
`generate_answer` indexes `state["chat_history"]`, but the initial graph invocation passes only `user_input` (`AgentGraph.ainvoke` uses `InputState(user_input=...)`), so `chat_history` may be absent on the first turn and this can raise a `KeyError`. Use `state.get("chat_history")` with an empty-list/seed-message fallback instead.
state["chat_history"]
if state["chat_history"]
else [HumanMessage(state["user_input"])]
),
},
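
A minimal sketch of the suggested fallback, assuming the state behaves like a plain dict. `seed_history` is a hypothetical helper name, and a `("human", text)` tuple stands in for `HumanMessage` so the example runs without extra dependencies.

```python
from typing import Any, Dict, List


def seed_history(state: Dict[str, Any]) -> List[Any]:
    """Return the chat history, seeding it from user_input when absent or empty."""
    history = state.get("chat_history")  # no KeyError when the key is missing
    if history:
        return history
    # Stand-in for [HumanMessage(state["user_input"])] in the real code.
    return [("human", state["user_input"])]
```

This covers both first-turn cases: the key being missing entirely and the key holding an empty list.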