AWS Lambda Explained: Pricing, Cold Starts, and When to Use It

June 4, 2026 · 7 min read · AWS Lambda, Serverless, AWS, Cold Starts

AWS Lambda is the compute layer that makes “serverless” serverless: you ship a function, AWS runs it on demand, and you pay only while it’s actually executing. Here’s what it is, how the pricing really works, why cold starts happen, and where Lambda fits — and doesn’t.

What AWS Lambda is

Lambda runs your code in response to events — an HTTP request, a queue message, a schedule — without any server for you to provision or manage. You hand AWS a function and a bit of config (memory, timeout); it handles the machine, the OS, scaling, and patching. When ten thousand requests arrive at once, Lambda runs many copies in parallel; when none arrive, it runs nothing and charges nothing.

That last property is the whole point. There’s no instance sitting idle between requests, which is why a Lambda-based app scales to zero and underpins the near-zero idle cost of a serverless stack.
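To make "a function and a bit of config" concrete, here is a minimal sketch of what the function half might look like in Python. It assumes the trigger is an API Gateway HTTP API using payload format 2.0; queue messages and schedules deliver differently shaped events, and the response body here is purely illustrative.

```python
import json


def handler(event, context):
    """Entry point that Lambda calls once per incoming event.

    Assumes an API Gateway HTTP API (payload format 2.0) event.
    """
    http = event.get("requestContext", {}).get("http", {})
    method = http.get("method", "UNKNOWN")
    path = event.get("rawPath", "/")

    # For HTTP triggers, the return value is translated into the HTTP response.
    return {
        "statusCode": 200,
        "headers": {"content-type": "application/json"},
        "body": json.dumps({"method": method, "path": path}),
    }
```

Memory and timeout live in the function's configuration rather than in the code, so the same handler can be resized without a code change.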

How the pricing actually works

Lambda bills on two axes, and understanding both demystifies the bill:

Requests — a flat charge per invocation (currently $0.20 per million).
Duration — measured in GB-seconds: how long your function ran, multiplied by the memory you allocated. A function with more memory costs more per second but also gets proportionally more CPU, so it often finishes faster — meaning more memory is sometimes cheaper overall.

There’s also a standing free tier — on the order of a million requests and 400,000 GB-seconds a month — that doesn’t expire after your first year. For a low-traffic app, that free tier alone often covers the entire compute bill, which is why early-stage serverless costs round to the price of the domain’s hosted zone. (We break the full bill down in what it costs to run a serverless SaaS.)
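The two billing axes and the free tier can be turned into a rough estimator. This is a sketch under stated assumptions: the request price and free-tier sizes are the figures quoted above, while the per-GB-second price is an assumed value (a commonly published x86 rate) that you should check against the current AWS price list for your region and architecture. The function names are illustrative.

```python
# From the text above.
PRICE_PER_MILLION_REQUESTS = 0.20
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000

# ASSUMPTION: not stated in the article; verify against current AWS pricing.
PRICE_PER_GB_SECOND = 0.0000166667


def gb_seconds(invocations, avg_duration_ms, memory_mb):
    """Duration usage: seconds run, multiplied by allocated memory in GB."""
    return invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)


def monthly_cost(invocations, avg_duration_ms, memory_mb):
    """Estimated monthly Lambda bill in USD after the free tier."""
    billable_requests = max(0, invocations - FREE_REQUESTS)
    usage = gb_seconds(invocations, avg_duration_ms, memory_mb)
    billable_gb_seconds = max(0.0, usage - FREE_GB_SECONDS)

    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    duration_cost = billable_gb_seconds * PRICE_PER_GB_SECOND
    return round(request_cost + duration_cost, 2)


# A low-traffic app stays inside the free tier.
low_traffic = monthly_cost(500_000, avg_duration_ms=100, memory_mb=512)

# A busier app: 10M requests at 200 ms and 1 GB.
busy = monthly_cost(10_000_000, avg_duration_ms=200, memory_mb=1024)
```

Under these assumed rates the low-traffic case comes to zero and the busier one to roughly 28 dollars a month. The same `gb_seconds` helper also shows the memory trade-off: a million invocations at 512 MB and 400 ms use 200,000 GB-seconds, while the same work at 1024 MB finishing in 180 ms uses 180,000, so the larger allocation is the cheaper one in that hypothetical case.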

Cold starts, explained

The most-discussed Lambda quirk. When a request arrives and no warm copy of your function is available, Lambda has to create a new execution environment — download your code, start the runtime, run your initialization — before handling the request. That first-request latency is a cold start. Subsequent requests reuse the warm environment and skip it.

What actually moves the needle on cold starts:

Runtime and package size. Lean Node.js or Python functions cold-start in tens to low-hundreds of milliseconds; a bloated bundle or a heavy framework is what turns a cold start into something users notice. Keep the deployment package small.
Initialization work. Code outside your handler runs once per cold start. Establishing a database pool or loading a large config there is fine; doing expensive work that could be lazy is not.
VPC attachment used to be a major cold-start tax; modern Lambda networking has largely fixed it, but staying out of a VPC when you don’t need one is still simplest.
Provisioned Concurrency keeps a set number of environments warm for a fee — worth it for latency-critical paths, unnecessary for most apps, and it does reintroduce a small always-on cost.
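The point about initialization work is easier to see in code. The sketch below separates module-scope state, which is set up once per cold start and reused by warm invocations, from an expensive resource that is created lazily on first use. The `_load_report_engine` helper and the `needs_report` event field are hypothetical stand-ins, not part of any AWS API.

```python
import time

# Module scope runs once per cold start; warm invocations reuse these values.
_invocations = 0

# Expensive resource that only some requests need, so it is created lazily.
_report_engine = None


def _load_report_engine():
    """Hypothetical stand-in for slow setup, such as a heavy import."""
    return {"loaded_at": time.time()}


def get_report_engine():
    """Build the engine on first use, then reuse it while the environment is warm."""
    global _report_engine
    if _report_engine is None:
        _report_engine = _load_report_engine()
    return _report_engine


def handler(event, context):
    global _invocations
    _invocations += 1

    # The first invocation in a fresh environment is the cold one.
    result = {"cold_start": _invocations == 1, "invocation": _invocations}

    # Only requests that need the engine pay for loading it.
    if event.get("needs_report"):
        result["engine_loaded_at"] = get_report_engine()["loaded_at"]
    return result
```

Requests that never set `needs_report` skip the expensive setup entirely, which keeps the cold start short for the common path.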

For the large majority of SaaS workloads, cold starts on a lean function are a non-issue: a few hundred milliseconds on an occasional first request, with static assets served from a CDN and never touching Lambda at all.

Where Lambda fits

Lambda shines for spiky, request-driven, or event-driven work: HTTP APIs, server-rendered pages, webhook handlers, scheduled jobs, queue and stream processors, glue between AWS services. Anything that’s bursty or mostly idle is a near-perfect fit, because you’re billed for the bursts and nothing for the idle.

Where it doesn’t

Long-running work. Lambda caps a single invocation at 15 minutes. For longer jobs, use Step Functions to orchestrate, or a container service.
Sustained, high, steady throughput. If a function runs flat-out 24/7, an always-on container (ECS/Fargate) can be cheaper than per-request billing. The crossover only matters at consistently high load.
Specialized or heavy runtimes — large ML models, GPU work, or anything needing more than Lambda’s memory/compute or local-storage limits.
Ultra-low-latency tails, where even an occasional cold start of a few hundred milliseconds blows the latency budget and you would rather not pay for Provisioned Concurrency.
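For work that sits near the 15-minute cap, one common pattern is to process in slices and stop before the invocation times out, handing a cursor back to whatever orchestrates the job (for example, a Step Functions loop). The sketch below is one way to do that, not a prescribed method. It uses the Python runtime's `context.get_remaining_time_in_millis()`; the event shape (`items`, `cursor`), the `process` function, and the safety margin are assumptions made for illustration.

```python
SAFETY_MARGIN_MS = 10_000  # assumed buffer; tune for your workload


def process(item):
    """Hypothetical unit of work; replace with the real job step."""
    return item * 2


def handler(event, context):
    items = event["items"]
    cursor = event.get("cursor", 0)  # where a previous invocation stopped
    processed = []

    while cursor < len(items):
        # Stop early if the invocation is close to its timeout.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break
        processed.append(process(items[cursor]))
        cursor += 1

    # The caller re-invokes with this cursor until done is True.
    return {"processed": processed, "cursor": cursor, "done": cursor >= len(items)}
```

If a job cannot be sliced this way, that is usually the signal to move it to a container service instead.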

How it fits the SaaS stack

In a serverless SaaS, Lambda is the workhorse behind nearly everything that runs code: the Hono API, the server-rendered web app, the billing webhook handler, and background jobs — each a function behind an API Gateway HTTP API, with CloudFront in front for caching and static assets. Pair that with Aurora DSQL for data and you have a stack with no always-on compute at all, which is exactly why idle costs almost nothing.

The bottom line

Lambda trades a little control — runtime limits, the occasional cold start — for an enormous operational win: no servers, automatic scaling, and a bill that tracks usage instead of capacity. For the request-driven workloads most SaaS products are built from, that trade is overwhelmingly worth it, and it’s why Lambda is the default compute layer in the stack we build on.

Skip the wiring and start from a working stack

cdkbase is a fork-ready AWS serverless template that ships everything in this article — CDK infrastructure, Cognito auth, Aurora DSQL, a Hono API, Stripe billing, and web/SPA/mobile frontends — already wired together and built for Claude Code. See pricing or read the getting-started guide.