Use cases

Ways teams are building with Pellumin.

Six concrete scenarios — with code, architecture, and the outcome you should expect — covering the patterns we see most.

Customer Support

AI support copilot that grounds every answer

Cut median response time in half — without giving up on accuracy.

Read scenario →
Autonomous Agents

A research agent that delivers reports, not bullet points

Plan → search → rank → synthesize, all behind a single endpoint.

Read scenario →
RAG / LLM apps

Live RAG that doesn't go stale

Skip the crawler, skip the vector DB, skip the re-indexing cron.

Read scenario →
Content & SEO

Briefing material your writers actually use

Hand every writer a structured, cited brief — generated in 20 seconds.

Read scenario →
Market Intelligence

A daily intelligence digest, on autopilot

Wake up to a digest of competitors, pricing changes, and industry moves.

Read scenario →
Internal Tools

Live web search for your internal copilots

Give every internal tool — dashboards, briefings, ops — a current-information superpower.

Read scenario →
Customer Support

AI support copilot that grounds every answer

Cut median response time in half — without giving up on accuracy.

Built for
Support engineers · CX leaders · Product teams
First-reply time: 12 min
With Pellumin: 3 min

The problem

Your support team answers the same questions every day. Your docs cover most of them, but agents have to dig through five tabs to find the exact answer. Meanwhile, AI suggestion tools confidently hallucinate features that don't exist.

Why this is hard to build yourself

  • Static FAQ tools go stale within weeks of every release.
  • Public-LLM autocomplete invents endpoints, parameters, pricing — and your agents have to catch every one.
  • Building your own retrieval stack means standing up a crawler, an embedder, a vector store, and a query layer — a multi-quarter project.

How it works with Pellumin

  • Send each incoming ticket to Smart Search, scoped to your docs domain.
  • Display the cited answer and source cards directly in the agent's inbox.
  • Reply in two clicks, or one if the answer is clean enough to send as-is.

Architecture

  1. Inbound webhook (e.g. Zendesk) → your service (receiver sketched below)
  2. Your service → Smart Search with `q = site:docs.example.com {ticket}`
  3. Render answer + source list in the agent UI
  4. Agent reviews, edits if needed, sends

Implementation (Python)

import os, httpx

API = "https://api.pellumin.com"
KEY = os.environ["PELLUMIN_API_KEY"]
DOCS = "docs.example.com"

def support_suggestion(ticket: str) -> dict:
    """Return a draft answer + source citations for a support ticket."""
    r = httpx.get(
        f"{API}/api/summary",
        params={"q": f"site:{DOCS} {ticket}"},
        headers={"Authorization": f"Bearer {KEY}"},
        timeout=20,
    )
    r.raise_for_status()
    return r.json()

suggestion = support_suggestion("How do I rotate my API key?")
# suggestion["answer"]  → markdown draft, ready to paste
# suggestion["results"] → docs links to send the user

Outcomes

Setup time: Under a day
Per-reply cost: ≈ $0.016
Answer freshness: Real-time (every release)
Citations included: Always

Autonomous Agents

A research agent that delivers reports, not bullet points

Plan → search → rank → synthesize, all behind a single endpoint.

Built for
Founders shipping AI products · Agent platform builders · R&D teams
Build vs. integrate: 1 quarter
With Pellumin: 1 afternoon

The problem

Your agent needs to produce structured, citable research on demand — not regurgitated paragraphs from a stale corpus. Stitching together a planner, multiple search calls, ranking, and extraction yourself burns weeks of engineering for a feature you'd rather not own.

Why this is hard to build yourself

  • Multi-step research is genuinely complex: planning, parallel I/O, ranking, extraction, synthesis — each a moving target.
  • Most search APIs return links, not answers. You still own the entire post-processing pipeline.
  • Token costs creep up as your agent spirals through iterations.

How it works with Pellumin

  • Hand a topic to Deep Search; let it plan its own sub-questions and run them in parallel.
  • Receive a structured Markdown report with a fixed shape and inline citations.
  • Show progress in your UI by reading sub-questions and source rankings from the response.

Architecture

  1. User asks a multi-faceted question
  2. Your agent decides depth is needed → calls Deep Search (routing heuristic sketched below)
  3. Deep Search internally plans 4–5 sub-questions, runs parallel searches, ranks the merged pool, extracts the top sources, and synthesizes a report
  4. You render the markdown report + citation chips in your UI

Implementation (Python)

import os, asyncio, httpx

API = "https://api.pellumin.com"
KEY = os.environ["PELLUMIN_API_KEY"]

async def deep_research(topic: str) -> dict:
    async with httpx.AsyncClient(timeout=90) as c:
        r = await c.get(
            f"{API}/api/research",
            params={"q": topic},
            headers={"Authorization": f"Bearer {KEY}"},
        )
        r.raise_for_status()
        return r.json()

report = asyncio.run(
    deep_research("compare modern rust web frameworks for production")
)
print(report["answer"])              # full markdown report
print(report["follow_up_questions"]) # planned sub-questions
print(len(report["results"]))        # ranked sources
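
The "decide depth is needed" step is yours to define. Here is a minimal sketch of one possible routing heuristic, building on the `deep_research` helper above; the keyword list and length threshold are illustrative assumptions, not a recommended policy, and quick lookups fall back to Smart Search (`/api/summary`):

# Illustrative routing for architecture step 2 (the heuristic is an assumption):
# broad or comparative questions escalate to Deep Search, the rest use Smart Search.
DEPTH_HINTS = ("compare", "landscape", "state of", "pros and cons", "research")

def needs_depth(question: str) -> bool:
    q = question.lower()
    return len(q.split()) > 12 or any(hint in q for hint in DEPTH_HINTS)

async def answer(question: str) -> dict:
    if needs_depth(question):
        return await deep_research(question)
    async with httpx.AsyncClient(timeout=30) as c:
        r = await c.get(
            f"{API}/api/summary",
            params={"q": question},
            headers={"Authorization": f"Bearer {KEY}"},
        )
        r.raise_for_status()
        return r.json()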

Outcomes

Eng. weeks saved: 6–8 per agent
Per-report cost: ≈ $0.08
Time to first report: 10–30 s
Sources per report: ~10 ranked

RAG / LLM apps

Live RAG that doesn't go stale

Skip the crawler, skip the vector DB, skip the re-indexing cron.

Built for
AI app builders · Embedded chatbot teams · Product engineers
Setup: 8–12 weeks
With Pellumin: 1 afternoon

The problem

Your chat product needs grounding in current information. Building a continuous-crawl + vector pipeline costs months of work and ongoing infra spend — for a feature that's only valuable when the data is fresh.

Why this is hard to build yourself

  • Vector indexes need re-embedding every time the source moves, which is constantly.
  • Cold storage of HTML, embeddings, and metadata stacks up quickly.
  • You're paying to refresh content most users never query.

How it works with Pellumin

  • Call Smart Search per turn for grounded answers, fetching live web content only when the user actually asks.
  • For deeper questions, escalate to Deep Search and cache the result for 24 hours.
  • No vector DB, no crawler, no embedding cost — you only pay for what users ask about.

Architecture

  1. User sends a chat message
  2. Your service routes: factual lookup → Smart Search (~2s), multi-faceted topic → Deep Search (~20s)
  3. Pass the cited answer plus the source array into your AI's context window
  4. Render the response with source cards in the UI

Implementation (TypeScript)

const API = "https://api.pellumin.com";
const KEY = process.env.PELLUMIN_API_KEY!;

async function groundedChat(turn: string) {
  // 1. Get a cited answer from the live web.
  const sr = await fetch(
    `${API}/api/summary?q=${encodeURIComponent(turn)}`,
    { headers: { Authorization: `Bearer ${KEY}` } }
  );
  const grounding = await sr.json();

  // 2. Inline the answer + sources into your AI's context.
  return {
    reply: grounding.answer,
    sources: grounding.results,
  };
}

const out = await groundedChat("What changed in TLS 1.3 vs 1.2?");

Outcomes

Infra to maintain: Zero (just the API)
Data freshness: Real-time, per-query
Cost per chat turn: ≈ $0.016
Citations rendered: Inline + source list

Content & SEO

Briefing material your writers actually use

Hand every writer a structured, cited brief — generated in 20 seconds.

Built for
Content teams · SEO leads · Editorial managers
Brief-writing time: 60 min
With Pellumin: 5 min

The problem

Your writers spend the first hour of every article doing the same shallow research — five tabs of competitor posts, three news pieces, and a Wikipedia skim. The output varies wildly; some briefs are great, most are thin.

Why this is hard to build yourself

  • Manual research is the slowest, least-loved part of the workflow.
  • Outsourced research has unpredictable quality and review burden.
  • Templated AI tools regurgitate content from training data, missing this week's developments.

How it works with Pellumin

  • Drop the working title into Deep Search.
  • Receive a structured Markdown brief with sub-topics, sources, and citations.
  • Paste straight into the writer's doc, or post to Slack.

Architecture

  1. Writer adds a working title to a Google Sheet / Notion DB
  2. A nightly cron calls Deep Search for each new title (batch loop sketched below)
  3. The Markdown brief is appended to the row, with source URLs as a list
  4. Writer opens the brief and starts with the research already done

Implementation (Python)

import os, sys, httpx

API = "https://api.pellumin.com"
KEY = os.environ["PELLUMIN_API_KEY"]

def brief(title: str) -> str:
    r = httpx.get(
        f"{API}/api/research",
        params={"q": title},
        headers={"Authorization": f"Bearer {KEY}"},
        timeout=120,
    )
    r.raise_for_status()
    data = r.json()
    out = [f"# Working brief — {data['query']}", ""]
    out.append(data["answer"])
    out.append("\n## Sources to cite")
    for src in data["results"]:
        out.append(f"- [{src['title']}]({src['url']})")
    return "\n".join(out)

print(brief(sys.argv[1]))
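
The nightly cron from step 2 can be as simple as looping `brief()` over whatever titles are new. Here is a minimal sketch, assuming a plain `titles.txt` input and a `briefs/` output directory as stand-ins for the Google Sheet / Notion read-and-append, which you would wire up with their own APIs:

# Sketch of the nightly batch (architecture steps 2–3). titles.txt and briefs/
# are assumptions standing in for your Sheet / Notion integration.
import pathlib, re

def run_batch(title_file: str = "titles.txt", out_dir: str = "briefs") -> None:
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    for title in pathlib.Path(title_file).read_text().splitlines():
        title = title.strip()
        if not title:
            continue
        slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
        (out / f"{slug}.md").write_text(brief(title))

Swap the file read for your Sheet or Notion client and the file write for a row update; the Deep Search call in the middle stays the same.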

Outcomes

Time to brief: ≈ 20 seconds
Cost per brief: ≈ $0.08
Citations included: 10+ ranked sources
Consistency: Same shape every brief

Market Intelligence

A daily intelligence digest, on autopilot

Wake up to a digest of competitors, pricing changes, and industry moves.

Built for
Founders · PMs · Strategy & ops
Morning routine: 45 min
With Pellumin: 5 min

The problem

You want to know what's happening in your space without spending an hour every morning on it. Most monitoring tools either bury you in noise or stay too high-level to be useful.

Why this is hard to build yourself

  • Generic news aggregators surface everything except the signal you actually need.
  • Custom monitoring stacks are brittle: keyword-only matches miss context.
  • Synthesizing a digest still takes 30+ minutes of human time.

How it works with Pellumin

  • Define a small set of recurring research questions (competitors, regulatory shifts, pricing).
  • Schedule a daily job that runs Deep Search for each.
  • Receive synthesized, cited briefs by email or Slack every morning.

Architecture

  1. Cron job (or GitHub Action) runs daily at 7 AM
  2. For each topic in your watchlist → Deep Search
  3. Reports concatenated into a single Markdown digest
  4. Posted to a Slack channel / sent via email (Slack sketch below)

Implementation (Python)

# scripts/daily_intel.py — run from cron
import os, datetime, httpx

API = "https://api.pellumin.com"
KEY = os.environ["PELLUMIN_API_KEY"]

TOPICS = [
    "latest pricing updates from our top 5 competitors",
    "industry hiring trends this week",
    "regulatory changes affecting our market",
]

def section(topic: str) -> str:
    r = httpx.get(f"{API}/api/research", params={"q": topic},
                  headers={"Authorization": f"Bearer {KEY}"}, timeout=120)
    r.raise_for_status()
    data = r.json()
    return f"## {topic}\n\n{data['answer']}\n"

today = datetime.date.today().isoformat()
print(f"# Market Intelligence — {today}\n")
for t in TOPICS:
    print(section(t))
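
To deliver the digest (step 4) rather than print it, the simplest route is a Slack incoming webhook. This is a minimal sketch, assuming the webhook URL lives in a hypothetical SLACK_WEBHOOK_URL environment variable; email delivery would follow the same pattern:

def post_digest(markdown: str) -> None:
    """Post the finished digest to Slack; fall back to stdout if unset."""
    webhook = os.environ.get("SLACK_WEBHOOK_URL")   # assumption: set in the cron env
    if not webhook:
        print(markdown)
        return
    httpx.post(webhook, json={"text": markdown}, timeout=30).raise_for_status()

post_digest(
    f"# Market Intelligence — {today}\n\n" + "\n".join(section(t) for t in TOPICS)
)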

Outcomes

Setup: Single script + cron
Daily cost: ≈ $0.24 (3 topics)
Time saved: 30 min / day
Format: Same Markdown every day

Internal Tools

Live web search for your internal copilots

Give every internal tool — dashboards, briefings, ops — a current-information superpower.

Built for
IT · Internal platform teams · Ops & analytics
Time to live: Weeks
With Pellumin: An hour

The problem

Internal tools at most companies are stuck in the past: stale data, manual lookups, and a dozen open tabs. Embedding live web search makes them feel modern — but routing through your model provider, paying their search markup, and dealing with their rate limits is fiddly.

Why this is hard to build yourself

  • Internal traffic spikes irregularly (Monday morning, everyone checks the dashboard).
  • Cost predictability matters: finance wants to know what each dashboard costs.
  • Compliance teams need to see what was queried, when, by whom.

How it works with Pellumin

  • Add Smart Search to any dashboard with a single endpoint call.
  • Per-workspace usage logs make finance and compliance happy.
  • Per-credit pricing makes cost forecasts predictable.

Architecture

  1. Internal tool calls Smart Search through a thin server-side proxy
  2. The workspace audit log records every query (visible to admins)
  3. Dashboard renders the cited answer + source cards

Implementation (TypeScript)

// Next.js Route Handler (server-side)
import { NextRequest, NextResponse } from "next/server";

const API = "https://api.pellumin.com";
const KEY = process.env.PELLUMIN_API_KEY!;

export async function GET(req: NextRequest) {
  const q = req.nextUrl.searchParams.get("q") ?? "";
  const r = await fetch(
    `${API}/api/summary?q=${encodeURIComponent(q)}`,
    { headers: { Authorization: `Bearer ${KEY}` } }
  );
  return NextResponse.json(await r.json());
}

Outcomes

Audit log: Per-user, per-call
Cost predictability: Flat per credit
Setup: Server route only
Key safety: Stays server-side

One API. Pick the pattern that fits.

1,000 free credits a month — no card required.

Start free →