“I'm batch-calling the OpenAI API and keep getting 429 rate-limit errors”
Rate-limit OpenAI calls with a buffer (stop hitting 429s)
Create a buffer pointed at the OpenAI endpoint with a requests-per-second cap, then push your prompts in. Fliq releases them at your rate, in order, retrying 429s for free.
You loop over a few thousand rows and fire an OpenAI completion for each. Halfway through you start eating 429 Too Many Requests, and your Promise.all turns into a mess of ad-hoc sleeps and retries. A Fliq buffer is a token bucket in front of one endpoint: set a per-second limit once, push every request in, and Fliq drains them at exactly that rate — in submission order, one at a time.
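To make the token-bucket idea concrete, here is a minimal in-memory sketch. It is not Fliq's implementation: the `TokenBucket` class, its method names, the burst size of 1, and the injectable clock are all hypothetical, chosen only to show how a per-second cap spaces requests out.

```javascript
// Illustrative token bucket, NOT Fliq's actual code.
// `ratePerSec` plays the role of the buffer's rate_limit.
class TokenBucket {
  constructor(ratePerSec, now = () => Date.now(), burst = 1) {
    this.ratePerSec = ratePerSec;
    this.capacity = burst; // assumed burst of 1 keeps requests evenly spaced
    this.tokens = burst;   // start full so the first request goes out at once
    this.now = now;        // injectable clock in milliseconds (handy for tests)
    this.last = now();
  }

  // Add tokens for the time elapsed since the last refill, capped at capacity.
  refill() {
    const t = this.now();
    const elapsedSec = (t - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.last = t;
  }

  // Take one token if available; returns true when a request may be sent now.
  tryTake() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With `ratePerSec` set to 8, `tryTake()` succeeds at most once every 125 ms, however many callers are waiting.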
The request
Create the buffer with the OpenAI URL, your auth header, and a rate_limit. Then push one item per prompt.
# 1. Create the buffer (do this once)
curl -X POST https://api.fliq.sh/buffers \
  -H "Authorization: Bearer fliq_sk_your_token" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "openai-completions",
    "url": "https://api.openai.com/v1/chat/completions",
    "method": "POST",
    "headers": {
      "Authorization": "Bearer sk-openai-...",
      "Content-Type": "application/json"
    },
    "rate_limit": 8,
    "max_retries": 3,
    "backoff": "exponential"
  }'

# 2. Push an item per prompt (BUFFER_ID from the create response)
curl -X POST https://api.fliq.sh/buffers/BUFFER_ID/items \
  -H "Authorization: Bearer fliq_sk_your_token" \
  -H "Content-Type: application/json" \
  -d '{
    "body": "{\"model\":\"gpt-4o-mini\",\"messages\":[{\"role\":\"user\",\"content\":\"Summarise row 1\"}]}"
  }'

const FLIQ = { Authorization: "Bearer fliq_sk_your_token", "Content-Type": "application/json" };

// 1. Create the buffer once.
const buf = await (await fetch("https://api.fliq.sh/buffers", {
  method: "POST",
  headers: FLIQ,
  body: JSON.stringify({
    name: "openai-completions",
    url: "https://api.openai.com/v1/chat/completions",
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    rate_limit: 8, // <= 8 requests/sec to OpenAI
    max_retries: 3,
    backoff: "exponential",
  }),
})).json();

// 2. Push every prompt; Fliq paces the delivery.
for (const row of rows) {
  await fetch(`https://api.fliq.sh/buffers/${buf.id}/items`, {
    method: "POST",
    headers: FLIQ,
    body: JSON.stringify({
      // The item body is itself a JSON string, hence the nested stringify.
      body: JSON.stringify({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: `Summarise ${row.text}` }],
      }),
    }),
  });
}

Pushing an item is cheap and returns immediately, so you can enqueue thousands without blocking.
What Fliq handles for you
- The rate cap. A per-second token bucket means OpenAI never sees more than rate_limit requests/sec. No bursts, no Promise.all stampede.
- 429s for free. If the endpoint returns 429, Fliq reschedules the item using the Retry-After header, and it does not count against the item's retry budget.
- In-order, one at a time. At most one request per buffer is in flight; a failing item holds its place and retries with backoff rather than letting later items jump ahead.
- Status at a glance. GET /buffers/BUFFER_ID/stats returns the pending/running/completed/failed breakdown across the whole buffer.
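The 429 rule in the list above can be sketched as a small decision function. This only illustrates the behaviour described and is not Fliq's code: `nextAction`, the `item` and `response` shapes, treating every other non-2xx status as a retryable failure, the 1-second backoff base, and the 1-second fallback when Retry-After is missing are all assumptions.

```javascript
// Hypothetical sketch of the retry rule described above; names and shapes are assumptions.
// item:     { attempts: number }  failures that have counted against the retry budget
// response: { status: number, retryAfter?: string }  retryAfter = Retry-After header value
function nextAction(item, response, maxRetries = 3) {
  if (response.status >= 200 && response.status < 300) {
    return { action: "complete" };
  }

  if (response.status === 429) {
    // Rate-limited: wait as long as the endpoint asks and leave the budget untouched.
    // Only the delay-in-seconds form of Retry-After is handled here; an HTTP-date
    // value (or a missing header) falls back to an assumed 1 second.
    const seconds = Number(response.retryAfter);
    const delayMs = Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : 1000;
    return { action: "retry", delayMs, attempts: item.attempts };
  }

  // Any other failure consumes one retry from the budget.
  const attempts = item.attempts + 1;
  if (attempts > maxRetries) {
    return { action: "fail", attempts };
  }
  // Exponential backoff with an assumed 1-second base: 1s, 2s, 4s, ...
  return { action: "retry", delayMs: 1000 * 2 ** (attempts - 1), attempts };
}
```

Because the 429 branch returns `attempts` unchanged, an item can be rate-limited any number of times and still have its full `max_retries` budget left for real failures.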
Related
- Buffers — token bucket, ordering, and 429 handling in full
- Distributed rate limiting without Redis — how buffers work under the hood
- Drip-feed Shopify bulk updates with a buffer
Reference: /docs/buffers