How a global fintech cut tier-1 support cost by 42% with HITL routing

Last summer, our tier-1 support queue was running 4,200 tickets a week. The team was great, but it was also burning out. Average handle time was creeping up. CSAT was creeping down. Our BPO contract was up for renewal at a price that, frankly, didn’t match the value we were getting.

We had tried two off-the-shelf chatbots before. Both fell over the moment a ticket left the happy path. So when our CEO floated “let’s try one more thing, with agents this time,” I was skeptical — but interested.

Where we started

We wrote down what we actually wanted. Not “AI strategy.” A list:

Don’t lock us into one model. We’ve watched too many vendor races to commit for five years.
Don’t store user tokens. Our security team would not negotiate on this.
Audit everything. Our auditors want per-request logs they can search.
Let humans approve the risky stuff. Specifically: refunds above $200.

We evaluated seven platforms. Most of them passed three out of four. Thaliq was the only one that passed all four — and the only one that didn’t need a six-week SOW just to prove it.

“Thaliq was the only platform that didn’t ask us to pick a model, a channel, or a vendor to bet on for the next five years. That’s the whole point.”

— Reverb · internal evaluation memo

The shape of the solution

The agent is a few hundred lines of declarative spec. We define it once. It runs the same whether the request comes from our web widget, our WhatsApp business number, or our backend job that sweeps email overnight.

agent.yaml

```yaml
name: tier1-support
model: auto                  # router picks best per request
tools:
  - zendesk_create_ticket
  - refund_authorize
  - kb_search
hitl:
  on_refund_above: 200
  on_sentiment: frustrated
guardrails:
  pii_redaction: true
  prompt_injection_screen: true
audit: full
isolation: per_tenant
```
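The `hitl:` block describes two independent triggers: a refund amount and a sentiment label. Below is a minimal Python sketch of how a rule check like that could be evaluated against each drafted action. It assumes strict "greater than" semantics for `on_refund_above`. `HitlPolicy`, `DraftAction`, and `needs_human` are hypothetical names, not Thaliq's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HitlPolicy:
    """Hypothetical in-memory mirror of the `hitl:` block in agent.yaml."""
    on_refund_above: float = 200.0    # refunds strictly above this amount need a human
    on_sentiment: str = "frustrated"  # this sentiment label always escalates


@dataclass(frozen=True)
class DraftAction:
    """An action the agent has drafted but not yet executed (hypothetical shape)."""
    tool: str                         # e.g. "refund_authorize"
    amount: Optional[float] = None    # refund amount in USD, if the tool takes one
    sentiment: str = "neutral"        # label from the sentiment classifier


def needs_human(action: DraftAction, policy: HitlPolicy) -> Optional[str]:
    """Return why a human must approve `action`, or None if the agent may proceed."""
    if action.sentiment == policy.on_sentiment:
        return f"sentiment={action.sentiment}"
    if (
        action.tool == "refund_authorize"
        and action.amount is not None
        and action.amount > policy.on_refund_above
    ):
        return f"refund {action.amount:.2f} > {policy.on_refund_above:.2f}"
    return None


if __name__ == "__main__":
    policy = HitlPolicy()
    print(needs_human(DraftAction("refund_authorize", amount=150.0), policy))  # None
    print(needs_human(DraftAction("refund_authorize", amount=450.0), policy))  # refund 450.00 > 200.00
    print(needs_human(DraftAction("kb_search", sentiment="frustrated"), policy))  # sentiment=frustrated
```

Returning a reason string instead of a bare boolean gives the review queue something to display and log alongside each escalation.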

The interesting line is model: auto. The router picks between Claude Sonnet, GPT-4o, and a smaller fallback based on cost, latency, and capability. We didn’t have to think about that for three months. Then GPT-4o got cheaper and we changed one config line.
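The router's exact weighting of cost, latency, and capability is not specified here. The sketch below shows one plausible policy: filter by capability, then by latency budget, then pick the cheapest. `ModelOption`, `route`, the model names, and every number are hypothetical stand-ins. They are not Thaliq's router and not real provider pricing.

```python
from dataclasses import dataclass, replace
from typing import Sequence


@dataclass(frozen=True)
class ModelOption:
    """One routable model. All numbers below are placeholders, not real prices."""
    name: str
    capability: int       # coarse tier: higher handles harder requests
    cost_per_1k: float    # USD per 1k tokens
    p50_latency_ms: int   # observed median latency


def route(options: Sequence[ModelOption], min_capability: int,
          latency_budget_ms: int) -> ModelOption:
    """Cheapest model that is capable enough and fits the latency budget."""
    capable = [m for m in options if m.capability >= min_capability]
    if not capable:
        # Nothing clears the bar: use the most capable model available.
        return max(options, key=lambda m: m.capability)
    in_budget = [m for m in capable if m.p50_latency_ms <= latency_budget_ms]
    if not in_budget:
        # Every capable model is too slow: take the fastest of them.
        return min(capable, key=lambda m: m.p50_latency_ms)
    # Cheapest first; lower latency breaks ties.
    return min(in_budget, key=lambda m: (m.cost_per_1k, m.p50_latency_ms))


MODELS = [
    ModelOption("large-a", capability=3, cost_per_1k=3.0, p50_latency_ms=900),
    ModelOption("large-b", capability=3, cost_per_1k=2.5, p50_latency_ms=700),
    ModelOption("small-fallback", capability=1, cost_per_1k=0.2, p50_latency_ms=300),
]

if __name__ == "__main__":
    print(route(MODELS, min_capability=1, latency_budget_ms=1000).name)  # small-fallback
    print(route(MODELS, min_capability=3, latency_budget_ms=1000).name)  # large-b
    # A price drop is a one-field change to the option table.
    cheaper = [replace(MODELS[0], cost_per_1k=2.0)] + MODELS[1:]
    print(route(cheaper, min_capability=3, latency_budget_ms=1000).name)  # large-a
```

The last call shows the property described above. When a price changes, only the option table is edited, and routing decisions follow without touching the agent spec.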

What “tools” actually means

Each tool is a real function with a typed schema. The agent doesn’t free-text its way into our Zendesk — it calls a function, the function authenticates as the user (not as Thaliq), and we get an audit trail of every call.
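Below is a plain-Python sketch of that pattern. It is not Thaliq's actual tool interface. Typed arguments are validated before anything runs, the call is made as the user rather than the platform, and every call produces exactly one audit record, whether it succeeds or fails. `CreateTicketArgs`, `call_tool`, `AUDIT_LOG`, and `fake_create_ticket` are hypothetical. A real system would pass a delegated credential, not a bare user ID.

```python
import time
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class CreateTicketArgs:
    """Hypothetical typed schema for a ticket-creation tool."""
    subject: str
    body: str
    priority: str = "normal"

    def __post_init__(self) -> None:
        if self.priority not in ("low", "normal", "high"):
            raise ValueError(f"invalid priority: {self.priority!r}")


AUDIT_LOG: List[Dict[str, Any]] = []  # stand-in for a searchable audit store


def call_tool(name: str, schema: type, raw_args: Dict[str, Any],
              acting_user: str, impl: Callable[[Any, str], Any]) -> Any:
    """Validate model-produced args against `schema`, run `impl` as `acting_user`,
    and append exactly one audit record whether the call succeeds or fails."""
    entry: Dict[str, Any] = {"ts": time.time(), "tool": name,
                             "user": acting_user, "args": dict(raw_args), "ok": False}
    try:
        allowed = {f.name for f in fields(schema)}
        unknown = set(raw_args) - allowed
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        args = schema(**raw_args)  # TypeError if a required field is missing
        for f in fields(schema):
            if not isinstance(getattr(args, f.name), f.type):
                raise TypeError(f"{f.name} must be {f.type.__name__}")
        result = impl(args, acting_user)  # the tool acts as the user, not the platform
        entry["ok"] = True
        return result
    except Exception as exc:
        entry["error"] = str(exc)
        raise
    finally:
        AUDIT_LOG.append(entry)


def fake_create_ticket(args: CreateTicketArgs, user: str) -> str:
    """Stand-in for a real helpdesk call made with the user's delegated credentials."""
    return f"ticket for {user}: {args.subject}"


if __name__ == "__main__":
    print(call_tool("zendesk_create_ticket", CreateTicketArgs,
                    {"subject": "Card declined", "body": "Payment failed twice"},
                    "user-42", fake_create_ticket))  # ticket for user-42: Card declined
    print(AUDIT_LOG[-1]["ok"])  # True
```

Writing the audit record in `finally` means rejected calls, such as a model inventing an `amount` field on a ticket tool, leave a trace too.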

Guardrails and HITL

The honest answer is: we did not trust the agent on day one. We’re a fintech. The phrase “the AI refunded the customer $4,000” is not one we ever want to say.

So for the first two weeks we set the refund threshold to $50. The agent would draft any refund above that amount, but a human on our ops team had to approve it. Every review outcome (accepted, rejected, or modified) fed back into our metrics.

After two weeks, we moved the threshold to $100; after four weeks, to $200. Every escalation is still logged and reviewable. We sleep fine.
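The staged rollout comes down to a threshold that depends on how long the agent has been live. The minimal sketch below assumes 0-indexed weeks and that the $200 threshold took effect four weeks after launch. `RAMP`, `threshold_for_week`, `requires_approval`, and `review_summary` are hypothetical names. The review tally only illustrates how accepted, rejected, and modified outcomes could feed a metric.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# (first_week, threshold_usd): $50 in weeks 0-1, $100 in weeks 2-3, $200 from week 4.
RAMP: List[Tuple[int, float]] = [(0, 50.0), (2, 100.0), (4, 200.0)]


def threshold_for_week(week: int, ramp: List[Tuple[int, float]] = RAMP) -> float:
    """Refund amount above which a human must approve, for a 0-indexed rollout week."""
    if week < 0:
        raise ValueError("week must be >= 0")
    current = ramp[0][1]
    for first_week, threshold in ramp:  # ramp is sorted by first_week
        if week >= first_week:
            current = threshold
    return current


def requires_approval(amount: float, week: int) -> bool:
    """True if a drafted refund of `amount` needs a human in the given week."""
    return amount > threshold_for_week(week)


def review_summary(outcomes: Iterable[str]) -> Dict[str, float]:
    """Share of human reviews that were accepted, rejected, or modified."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in ("accepted", "rejected", "modified")}
    return {k: counts[k] / total for k in ("accepted", "rejected", "modified")}


if __name__ == "__main__":
    print(requires_approval(75.0, week=1))  # True
    print(requires_approval(75.0, week=2))  # False
    print(review_summary(["accepted", "accepted", "rejected", "modified"]))
    # {'accepted': 0.5, 'rejected': 0.25, 'modified': 0.25}
```

Keeping the schedule as data rather than branching logic makes each step of the ramp a one-line, reviewable change.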

What didn’t work the first time

A few things, candidly:

Prompt drift. Our first version of the system prompt was 1,800 words. The agent got polite and useless. We cut it to 240 and added tool descriptions instead.
Knowledge base latency. Our help center was hosted by a vendor whose search endpoint was slow. We mirrored it into a Thaliq RAG index, and first response time dropped from 4s to 800ms.
Frustration detection. The first sentiment classifier was over-eager. About 1 in 7 tickets was getting flagged as “frustrated.” We tuned it, and now it’s roughly 1 in 30 — which matches what humans flag.
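The sentiment fix is a calibration problem: choose the cutoff so the classifier flags tickets at roughly the rate humans do. The tuning method isn't described above. The sketch below shows one common approach, offered as an assumption rather than necessarily what was done: sort classifier scores on a sample and take the score at the target rank. `cutoff_for_rate` and `flag_rate` are hypothetical helpers. Scores are assumed to rise with frustration, and the sample in the usage lines is synthetic.

```python
import math
from typing import Sequence


def cutoff_for_rate(scores: Sequence[float], target_rate: float) -> float:
    """Pick a score cutoff so that roughly `target_rate` of tickets are flagged.

    A ticket is flagged when its score >= cutoff. With distinct scores this flags
    exactly round(len(scores) * target_rate) tickets; ties can flag a few more.
    """
    if not scores:
        raise ValueError("need at least one score")
    if not 0.0 < target_rate < 1.0:
        raise ValueError("target_rate must be in (0, 1)")
    ranked = sorted(scores, reverse=True)
    k = round(len(ranked) * target_rate)  # how many tickets we want flagged
    if k == 0:
        return math.inf                   # sample too small: flag nothing
    return ranked[k - 1]


def flag_rate(scores: Sequence[float], cutoff: float) -> float:
    """Fraction of tickets at or above the cutoff."""
    return sum(1 for s in scores if s >= cutoff) / len(scores)


if __name__ == "__main__":
    scores = [i / 300 for i in range(300)]     # synthetic, evenly spread scores
    cutoff = cutoff_for_rate(scores, 1 / 30)   # aim for about 1 in 30
    print(sum(s >= cutoff for s in scores))    # 10
```

In practice the target rate would come from a sample that humans have labeled, so that the agent's flag rate is checked against the same tickets reviewers saw.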

The numbers

Numbers after 90 days in production, cohort of 12,400 weekly tickets:

71% auto-resolved (target: 60%)
−42% cost per resolved ticket
4.7/5 post-chat CSAT
−38% median time to first response

The cost number is the one that closed our internal business case. The CSAT number is the one that closed it with our customer team. “Faster and cheaper” wasn’t going to be enough if customers hated it. They didn’t.

What’s next

We’re moving tier-2 onto the same stack next quarter — billing disputes and account changes. Different tools, same shape. The thing we’re most excited about isn’t the new agent; it’s that we already have the surface, the audit pipeline, and the HITL queue we need.

If you’re evaluating agentic platforms right now, two pieces of unsolicited advice. One: write down your four hard requirements before you take the first demo. Two: ask the vendor how their streams behave when a human steps in. The answer tells you everything.

Mariana Köhler

VP CustOps · Reverb
