Enterprise AI Thinking
A Framework for
Getting Enterprise AI Right
AI adoption in enterprises fails more often than it succeeds — not because the technology doesn't work,
but because the organisational, data, and security foundations aren't in place. This page covers the
core questions every enterprise must answer and the horizontal capabilities that underpin serious AI deployments.
01 — Getting Started
Where Do We Start?
Most AI initiatives fail at the first step: choosing the wrong problem. The right entry point balances
near-term value with long-term capability building — and it starts with honest assessment, not enthusiasm.
Not every problem needs AI. Start by listing the 10–15 repetitive, high-effort tasks in your organisation
where the cost of a wrong answer is low and the volume is high. These are your best pilot candidates.
Avoid starting with high-stakes decisions (credit, compliance, HR) until you have operational experience.
Use a frontier model API (GPT-4, Claude, Gemini) when speed matters and your data isn't sensitive.
Fine-tune an open-weight model (Llama, Mistral) when you need data control or cost at scale.
Build from scratch only for competitive-moat use cases where the model behaviour is deeply proprietary.
Most enterprises should start with APIs and migrate selectively.
Quick wins (document summarisation, internal Q&A, meeting notes) deliver ROI within weeks and build
internal confidence. Strategic bets (agentic workflows, customer-facing AI, decision automation) need
12–18 months of infrastructure and trust-building. Run both as a portfolio, and don't let
quick wins become the permanent ceiling.
AI readiness isn't about GPU clusters. It's about: (1) clean, accessible data in the relevant domain,
(2) at least one technically literate champion in the business team, (3) a governance structure that
can say "no" to a use case, and (4) executive tolerance for a 6-month iteration cycle before
measuring ROI.
02 — Data Privacy
What About Data Privacy?
Enterprise data is fundamentally different from public internet data. Before any AI deployment touches
real business information, the data governance questions need clear answers — legally, contractually,
and architecturally.
India's Digital Personal Data Protection (DPDP) Act, sector regulations (RBI, SEBI, IRDAI), and
customer contracts may restrict where data can be processed. Audit which data crosses borders when
you call a cloud LLM API. Many enterprises are surprised to find their contracts prohibit sending
certain data to US-based inference endpoints.
Read the data processing addendum, not just the privacy policy. Key questions: Is my data used to
train future models? Are prompts logged, and for how long? What is the data retention policy on
conversation history? Enterprise-tier agreements from major vendors typically offer opt-outs, but
you must negotiate them — they are not on by default.
A data gateway (see capabilities section) can strip or pseudonymise PII before it leaves your
perimeter. Customer names, Aadhaar numbers, account IDs, and mobile numbers should be replaced
with tokens before hitting any external LLM. The model can still reason; the sensitive identifiers
never leave your network.
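The tokenisation step can be sketched in a few lines. This is a minimal illustration, not production-grade PII detection: the regex patterns, token format, and function names are all assumptions for the example, and a real gateway would use a proper entity recogniser and a secure token vault.

```python
import re

# Illustrative patterns only -- real PII detection needs more than regex.
PATTERNS = {
    "AADHAAR": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b"),  # 12-digit Aadhaar
    "MOBILE": re.compile(r"\b[6-9]\d{9}\b"),              # Indian mobile number
}

def pseudonymise(text: str) -> tuple[str, dict[str, str]]:
    """Replace sensitive identifiers with tokens before the text leaves
    the perimeter; return the mapping so the gateway can re-substitute
    real values into the model's response."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        for i, match in enumerate(pattern.findall(text)):
            token = f"<{label}_{i}>"
            mapping[token] = match
            text = text.replace(match, token, 1)
    return text, mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Re-substitute the original values inside the perimeter."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```

The model sees `<MOBILE_0>` instead of the number, can still reason about "the customer's mobile number", and the gateway restores the real value only after the response is back inside the network.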
For the most sensitive workloads (legal, HR, patient data, trading signals), run models inside
your own infrastructure. Open-weight models like Llama 3 and Mistral can run on a modest 4×A10
cluster or even CPU-only for smaller models. Cloud providers also offer VPC-isolated endpoints
where your data never touches shared infrastructure.
03 — Security
What Are the Security Implications?
AI systems introduce attack surfaces that traditional enterprise security wasn't designed for.
The threats are real, the mitigations are known, but they require deliberate architectural choices.
Prompt injection is the AI equivalent of SQL injection: a malicious input in a document or email
can hijack what your AI agent does. If your AI reads external content (emails, PDFs, web pages)
and then takes actions, it is vulnerable. Mitigations include strict output parsing, privilege
separation, and never letting the AI both read untrusted input and take high-value actions in
the same pipeline step.
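Privilege separation can be made concrete with a sketch like the following. The function names, the action allowlist, and the two-step split are illustrative assumptions, but the principle is exactly as above: the step that reads untrusted content can only produce text, and the step that acts validates every requested action against an allowlist, so instructions injected into a document cannot name an arbitrary action.

```python
# Actions the pipeline is ever allowed to take -- illustrative allowlist.
ALLOWED_ACTIONS = {"draft_reply", "summarise", "escalate_to_human"}

def read_untrusted(document: str) -> str:
    """Reader step: its output is treated as data, never as instructions.
    In a real pipeline this would call a model that has no tool access."""
    return f"[summary of {len(document)} chars of untrusted content]"

def execute(action: str, payload: str) -> str:
    """Actor step: refuses anything outside the allowlist, so a prompt
    injected into the document cannot trigger an unlisted action."""
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"action {action!r} not permitted")
    return f"executed {action}"
```

The key property is that no string produced by `read_untrusted` ever flows into the `action` argument of `execute`; the action is chosen by trusted orchestration code.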
Retrieval-augmented generation (RAG) pulls documents into the context window at query time.
If your retrieval layer doesn't enforce document-level access controls, a user can ask a question
that retrieves a document they aren't supposed to see. Apply row-level security in your vector
store, and ensure the retrieval pipeline respects the same permissions as your document system.
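The access check belongs inside the retrieval call itself, not in a layer above it. A minimal sketch, assuming an in-memory store with a per-document role list (both illustrative; in practice the filter is pushed down into the vector store query):

```python
# Illustrative document store with per-document ACLs.
DOCS = [
    {"id": "d1", "text": "Q3 board pack", "allowed_roles": {"cfo", "board"}},
    {"id": "d2", "text": "Public FAQ", "allowed_roles": {"cfo", "board", "analyst"}},
]

def retrieve(query: str, user_roles: set[str]) -> list[str]:
    """Return only documents the calling user is entitled to see.
    The ACL check runs before ranking, so restricted documents never
    enter the context window at all."""
    return [
        d["text"] for d in DOCS
        if d["allowed_roles"] & user_roles  # set intersection = any shared role
    ]
```

An analyst asking a question that would semantically match the board pack simply never retrieves it; the model cannot leak what it never saw.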
Treat AI agents like privileged service accounts, not superusers. Give each agent the minimum
permissions required for its task. Use short-lived credentials for external API calls. Apply
the same IAM policies you'd apply to a junior analyst — they can read, they can draft, but
they can't commit, transfer, or delete without a second confirmation.
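The "second confirmation" rule can be enforced in code rather than policy. A sketch, with an illustrative set of high-value verbs and a hypothetical approval flag standing in for a real approval workflow:

```python
# Verbs that always require explicit human approval -- illustrative list.
HIGH_VALUE = {"commit", "transfer", "delete"}

def run_tool(verb: str, args: dict, approved: bool = False) -> str:
    """Gate high-value verbs behind an approval flag. Low-stakes verbs
    (read, draft, summarise) pass straight through; high-value verbs
    return a pending marker until a human signs off."""
    if verb in HIGH_VALUE and not approved:
        return f"PENDING_APPROVAL:{verb}"
    return f"done:{verb}"
```

The agent itself never holds the credential that flips `approved` to `True`; that belongs to the human reviewer's session.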
Every AI decision that affects a customer or triggers a workflow should be logged: the prompt,
the retrieved context, the model response, and the downstream action. Not for ML debugging —
for compliance, incident response, and regulator conversations. Build structured logging into
your AI layer from day one; retrofitting it later is painful.
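The logging itself is simple if it is designed in from the start. A sketch of one structured record per AI decision, written as a JSON line; the field names are illustrative, but the four elements match the list above: prompt, retrieved context, response, and downstream action.

```python
import json
import time
import uuid

def audit_record(prompt: str, context_ids: list[str],
                 response: str, action: str) -> dict:
    """One structured record per AI decision: enough to reconstruct the
    decision later for compliance or incident response."""
    return {
        "event_id": str(uuid.uuid4()),
        "ts": time.time(),
        "prompt": prompt,
        "retrieved_context": context_ids,  # which documents were in scope
        "model_response": response,
        "downstream_action": action,       # what the system did with it
    }

def to_log_line(record: dict) -> str:
    """JSON lines: append-only, greppable, easy to ship to a SIEM."""
    return json.dumps(record, sort_keys=True)
```

Writing these as append-only JSON lines keeps the format trivially parseable when a regulator asks what the system saw before it acted.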
04 — From Pilots to Production
How Do We Scale Beyond the Pilot?
Most enterprise AI pilots succeed on their own terms. Most never reach production. The gap isn't
technical — it's organisational, economic, and architectural.
Common causes: (1) No production owner — the pilot was run by IT or a consultant with no
internal champion. (2) Data isn't clean enough at scale — the pilot used curated data.
(3) Integration cost was underestimated — connecting to legacy systems takes 10× longer than
the AI part. (4) The KPI was fuzzy — "better experience" doesn't survive a budget review.
The biggest scaling blockers are human, not technical. Risk and compliance teams need a risk
framework before they'll approve a live deployment. Legal needs liability language for AI
outputs. HR needs a reskilling narrative. Line managers need to trust the output enough to
act on it. These conversations take months and should start during the pilot, not after.
A pilot running 100 queries a day has very different infrastructure requirements from 100,000.
Latency, token costs, rate limits, and model availability become real engineering problems.
Plan for caching, request batching, fallback models, and cost alerting. A single LLM call
can cost ₹0.01 — at 10 million calls a month, that's ₹1 lakh in inference alone.
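The arithmetic is trivial, but it is worth automating so the alert fires before the invoice arrives. A sketch with illustrative numbers matching the example above:

```python
def monthly_inference_cost(cost_per_call_inr: float,
                           calls_per_month: int) -> float:
    """Projected monthly spend: straight multiplication."""
    return cost_per_call_inr * calls_per_month

def cost_alert(projected_inr: float, budget_inr: float) -> bool:
    """True when the projection breaches budget -- the signal to switch
    traffic to a cheaper model or serve more requests from cache."""
    return projected_inr > budget_inr
```

At ₹0.01 per call and 10 million calls, the projection lands at ₹1,00,000 (₹1 lakh), exactly the figure above; the alert is what turns that from a surprise into a routing decision.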
"Hours saved" is a start, but the CFO wants to see headcount efficiency, error rate reduction,
or revenue per FTE. Pick one KPI per use case, establish a baseline before the pilot, and
measure the same way after. Also track failure modes: hallucination rate, escalation rate,
user override rate. These tell you where the model isn't trusted yet.
05 — Horizontal Capabilities
The Enterprise AI Capabilities Stack
Beyond individual use cases, enterprises need a reusable infrastructure layer. These six capabilities
appear in every serious enterprise AI architecture — they don't belong to any one use case;
they serve all of them.
🔀
Model Routing & Orchestration
Route each request to the right model based on task type, required intelligence, latency, and cost. A question about a PDF header doesn't need GPT-4 — it needs a fast, cheap model. Orchestration layers like LangGraph or custom routers manage multi-step agent workflows.
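A router does not have to be sophisticated to pay for itself. A minimal rule-based sketch, where the task categories, model names, and latency threshold are all illustrative placeholders:

```python
# Illustrative routing table: cheap model by default, escalate by task type.
ROUTES = {
    "extraction": "small-fast-model",
    "qa": "small-fast-model",
    "reasoning": "frontier-model",
    "agentic": "frontier-model",
}

def route(task_type: str, latency_budget_ms: int = 2000) -> str:
    """Pick a model by task type; a tight latency budget overrides the
    escalation to a big model, since the big model is also the slow one."""
    model = ROUTES.get(task_type, "frontier-model")  # unknown tasks escalate
    if latency_budget_ms < 500:
        return "small-fast-model"
    return model
```

Even this crude version captures the point: the PDF-header question takes the cheap path, and only genuinely hard tasks pay frontier-model prices.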
🚪
Data Gateway
A single controlled layer through which all AI systems access enterprise data. Enforces access policies, strips PII before external calls, logs every data access event, and provides a consistent interface for RAG pipelines, agents, and analytics tools.
🔌
MCP — Model Context Protocol
An emerging open standard (Anthropic-originated, now broadly adopted) that lets AI models connect to tools, databases, and APIs through a standardised interface. Think of it as USB-C for AI integrations — instead of building custom connectors for each tool, you implement MCP once.
🛡️
Security-Friendly Architecture
VPC-isolated inference endpoints, zero-trust networking for agent-to-tool calls, air-gapped deployments for regulated workloads, and secrets management for API keys. AI adds new attack surfaces; the architecture must treat it as a privileged internal service, not a SaaS integration.
📡
Evaluation & Observability
Continuous measurement of model quality in production — not just accuracy on a benchmark, but hallucination rate, citation accuracy, user override rate, and latency percentiles. Tools like LangSmith, Arize, or custom eval harnesses catch model drift before it becomes a business incident.
🤝
Human-in-the-Loop Design
Deliberate insertion of human review at the right points in AI workflows — not everywhere (defeats the purpose) but at high-stakes, low-volume decision points. The design question is: when does the cost of a wrong AI answer exceed the cost of a human review? Build the answer into the workflow.
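That design question can be written down as an expected-value check. A sketch, with an illustrative review cost and a model-confidence input standing in for whatever calibration signal the system actually produces:

```python
def needs_review(stakes_inr: float, model_confidence: float,
                 review_cost_inr: float = 500.0) -> bool:
    """Route to a human when the expected cost of a wrong AI answer
    exceeds the cost of a human review. All numbers are illustrative."""
    expected_error_cost = (1.0 - model_confidence) * stakes_inr
    return expected_error_cost > review_cost_inr
```

A ₹1 lakh decision at 90% confidence carries an expected error cost of roughly ₹10,000, well above a ₹500 review, so it goes to a human; a ₹1,000 decision at the same confidence does not. The threshold makes the high-stakes/low-volume trade-off explicit instead of leaving it to intuition.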
Who This Is For
The Right Audience for This Conversation
Enterprise AI decisions are made by a specific set of people — and resisted by others.
This is for
- CEOs and MDs setting the AI agenda
- CIOs and CTOs evaluating architecture choices
- Business leaders owning a high-value use case
- Risk, compliance, and legal heads stress-testing AI
- Strategy and transformation teams running pilots
Not the focus
- Developers building AI applications (different conversation)
- Data scientists optimising model performance
- Vendors pitching AI products
- Those looking for a specific tool recommendation
About
Swapnil Pawar works at the intersection of quantitative finance, AI systems, and enterprise strategy.
He has built and deployed AI applications across financial services, legal, and research domains —
and has advised senior leadership teams on how to think about AI adoption at scale.
His background spans investment management, management consulting, and entrepreneurship.
He holds a degree from IIT Bombay and an MBA from IIM Ahmedabad.
This page reflects frameworks developed through engagements with enterprise teams — not vendor
material, not academic theory. It is updated as the landscape evolves.