Problem: Your Team's Knowledge Is Scattered Everywhere
Specs live in Notion. Bug context lives in Jira. Decisions live in Slack. When someone asks "why did we build it this way?", you spend 20 minutes searching three tools.
This tutorial builds a single RAG pipeline that indexes all three and answers questions across them.
You'll learn:
- How to pull data from Notion, Slack, and Jira using LlamaIndex readers
- How to merge multiple indexes into one queryable `ComposableGraph`
- How to run it locally with Ollama or swap in OpenAI
Time: 20 min | Level: Intermediate
Why This Happens
LlamaIndex treats each data source as a separate VectorStoreIndex by default. Querying across sources requires composing them — which LlamaIndex supports natively but doesn't wire up automatically.
What you're building:
Notion pages ──┐
Slack messages ─┼──► LlamaIndex ComposableGraph ──► Single query endpoint
Jira issues ───┘
Common pain points this solves:
- "Which Slack thread explained why we deprecated that endpoint?" — answered
- "What's the Jira ticket status for the feature in this Notion spec?" — answered
- Having to log into three tools to answer one question
Solution
Step 1: Install Dependencies
pip install llama-index llama-index-readers-notion \
llama-index-readers-slack llama-index-readers-jira \
llama-index-llms-ollama python-dotenv
Create a .env file:
NOTION_INTEGRATION_TOKEN=secret_xxx
SLACK_BOT_TOKEN=xoxb-xxx
JIRA_SERVER=https://yourcompany.atlassian.net
JIRA_EMAIL=you@company.com
JIRA_API_TOKEN=xxx
Expected: No errors. If you hit dependency conflicts, use a virtualenv — LlamaIndex has transitive deps that can clash with older projects.
Step 2: Load Data From Each Source
import os
from dotenv import load_dotenv
from llama_index.readers.notion import NotionPageReader
from llama_index.readers.slack import SlackReader
from llama_index.readers.jira import JiraReader
load_dotenv()
# Notion: pass a list of page IDs to index
notion_reader = NotionPageReader(
integration_token=os.environ["NOTION_INTEGRATION_TOKEN"]
)
notion_docs = notion_reader.load_data(
page_ids=["your-page-id-1", "your-page-id-2"]
)
# Slack: pulls messages from the specified channels (date range is configurable)
slack_reader = SlackReader(
slack_token=os.environ["SLACK_BOT_TOKEN"],
earliest_date_timestamp=None, # None = all history
)
slack_docs = slack_reader.load_data(
channel_ids=["C0XXXXXXX", "C0YYYYYYY"] # Your channel IDs
)
# Jira: JQL query gives you fine-grained control
jira_reader = JiraReader(
email=os.environ["JIRA_EMAIL"],
api_token=os.environ["JIRA_API_TOKEN"],
server_url=os.environ["JIRA_SERVER"],
)
jira_docs = jira_reader.load_data(
query="project = ENG AND updated >= -30d ORDER BY updated DESC"
)
print(f"Loaded: {len(notion_docs)} Notion, {len(slack_docs)} Slack, {len(jira_docs)} Jira")
Expected output:
Loaded: 12 Notion, 847 Slack, 203 Jira
If it fails:
- Notion 401: Make sure your integration is connected to the pages you're trying to read (Notion's "Share" panel → invite your integration)
- Slack empty results: Bot needs the `channels:history` and `channels:read` OAuth scopes
- Jira 403: API tokens require "Browse Projects" permission on the project
Step 3: Build Separate Indexes Per Source
from llama_index.core import VectorStoreIndex, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
# Using Ollama locally — swap for OpenAI if you prefer
Settings.llm = Ollama(model="llama3.2", request_timeout=120.0)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")
# Each source gets its own index
# This lets you query sources independently later if needed
notion_index = VectorStoreIndex.from_documents(notion_docs)
slack_index = VectorStoreIndex.from_documents(slack_docs)
jira_index = VectorStoreIndex.from_documents(jira_docs)
print("Indexes built.")
Why separate indexes? It lets LlamaIndex route queries to the most relevant source before synthesizing a final answer. A single merged index loses this routing capability.
Expected: Embedding calls will take 30-90 seconds depending on document count and your hardware.
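To swap in OpenAI instead of Ollama, the same global `Settings` hooks apply. A sketch assuming `llama-index-llms-openai` and `llama-index-embeddings-openai` are installed and `OPENAI_API_KEY` is set in your environment; the model names are examples, not requirements:

```python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Both settings are global, so every index built after this point
# uses OpenAI for embeddings and synthesis
Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model_name="text-embedding-3-small")
```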
Step 4: Compose Into a Single Graph
from llama_index.core import ComposableGraph
from llama_index.core.indices.keyword_table import SimpleKeywordTableIndex
# Summary descriptions tell the router what each index contains
# These are critical — vague summaries produce poor routing
index_summaries = [
"Contains Notion pages: product specs, design docs, meeting notes, and RFCs.",
"Contains Slack messages: team discussions, decisions, and async Q&A threads.",
"Contains Jira issues: bug reports, feature requests, task status, and sprint history.",
]
graph = ComposableGraph.from_indices(
SimpleKeywordTableIndex, # Top-level router index
[notion_index, slack_index, jira_index],
index_summaries=index_summaries,
max_keywords_per_chunk=10,
)
print("Graph composed.")
If routing feels off: Rewrite your index_summaries to be more specific. The router uses these descriptions to decide which index to query — they're the most impactful tuning lever you have.
Step 5: Query Across All Sources
query_engine = graph.as_query_engine(
verbose=True # Shows which index was queried
)
# Try a cross-source question
response = query_engine.query(
"What was the final decision on authentication approach, "
"and is there a Jira ticket tracking the implementation?"
)
print(response)
Expected output:
> Querying with idx: notion_index...
> Querying with idx: jira_index...
Based on the Notion spec (Auth RFC v2), the team decided to use JWTs
with a 15-minute expiry. The implementation is tracked in ENG-4821
(currently In Progress, assigned to @dana).
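To verify which chunks an answer actually drew on, you can inspect the response's `source_nodes` (a standard attribute on LlamaIndex response objects). A small helper sketch; the `show_sources` name is ours:

```python
def show_sources(response, max_chars: int = 200) -> None:
    """Print score, metadata, and a text snippet for each retrieved chunk."""
    for scored_node in response.source_nodes:
        print(f"score={scored_node.score}")
        print(scored_node.node.metadata)
        print(scored_node.node.get_content()[:max_chars])
```

Call `show_sources(response)` after any `query_engine.query(...)` to spot-check whether the answer came from the document you expected.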
Verification
Run a sanity check against a question you already know the answer to:
# Pick something you can verify manually
test_response = query_engine.query(
"What did the team decide about the database migration timeline?"
)
print(test_response)
Cross-check the answer against your actual Slack/Notion/Jira. If the answer is wrong, check:
- Was the relevant document loaded? (`print(len(notion_docs))`, etc.)
- Is the source description in `index_summaries` specific enough?
- For Slack: are older messages indexed? (check `earliest_date_timestamp`)
Persisting the Index (Optional but Recommended)
Re-embedding on every run is slow. Persist to disk:
from llama_index.core import StorageContext, load_index_from_storage
# Save
notion_index.storage_context.persist("./storage/notion")
slack_index.storage_context.persist("./storage/slack")
jira_index.storage_context.persist("./storage/jira")
# Load next time (skip re-embedding)
storage_context = StorageContext.from_defaults(persist_dir="./storage/notion")
notion_index = load_index_from_storage(storage_context)
Set up a nightly cron to refresh Slack and Jira (they change daily). Notion pages change less often — weekly refresh is usually fine.
What You Learned
- LlamaIndex readers abstract away API auth and pagination for each source
- `ComposableGraph` routes queries to the right index using summary descriptions you write
- Index summaries are the most important tuning knob — be specific about what each source contains
- Limitation: `ComposableGraph` does keyword routing, not semantic routing. For production, consider `RouterQueryEngine` with an LLM-based selector for better accuracy
- When not to use this: If your team's Slack has millions of messages, embedding all of it is expensive. Filter by channel or date range aggressively
Next step: Wrap query_engine in a FastAPI endpoint and point your team's Slack bot at it.
Tested on LlamaIndex 0.12.x, Python 3.12, Ollama 0.5, macOS & Ubuntu 24.04