Why We Built Fliq: The Case Against Self-Hosted Job Queues

We didn’t set out to build a product. We set out to learn Go.

In late 2025, my co-founder and I wanted to build something real with Go — not a toy project, but something that would force us to deal with concurrency, database transactions, graceful shutdowns, and all the other things you only learn by actually shipping production software.

We picked a distributed job scheduler because it hit all those requirements. Schedule HTTP requests. Execute them on time. Retry on failure. Simple in concept, deeply complex in implementation.

Somewhere along the way, we realized we were building something people actually needed.

The pain we kept seeing

Both of us had spent years building web applications. And in every project, at some point, someone would say: “We need to run this thing later.”

Send a follow-up email in 3 days. Retry a failed webhook in 5 minutes. Expire a session after 24 hours. Check if a payment cleared tomorrow morning. Archive old records every night.

The patterns are universal. The solutions are not.

The self-hosted path

In most teams, “run this later” turns into a ticket for the infrastructure team. The conversation goes something like:

“Let’s just use cron.” Works for simple recurring jobs. Breaks down when you need dynamic scheduling, per-user jobs, or retry logic.
“Let’s add Redis + BullMQ.” Now you’re running a Redis instance. You need to monitor it, scale it, handle failures when Redis goes down. Your application code is now tightly coupled to a specific queue library.
“Let’s use Celery/Sidekiq.” Same problems as BullMQ, but now with more moving parts. Celery needs a message broker (Redis or RabbitMQ) and, if you want to keep task results, a result backend (Redis or PostgreSQL). Add the worker processes and that’s up to three services for “run this later.”
“Let’s use AWS Step Functions.” Powerful, but the JSON state machine DSL is painful to write and debug. Costs add up fast. And now your application logic lives in AWS instead of your codebase.

Every path adds infrastructure. Every piece of infrastructure needs monitoring, scaling, and on-call coverage.

The insight: it’s just HTTP

The realization that changed everything was simple: most background jobs are just HTTP requests that need to happen later.

Think about it:

Send an email — POST to your email API
Retry a webhook — POST to the webhook URL again
Expire a trial — POST to your billing endpoint
Generate a report — POST to your report generation endpoint

You don’t need a sophisticated message queue for this. You don’t need a task graph. You don’t need distributed state machines. You need someone to call a URL at a specific time, and retry if it fails.

That’s Fliq in one sentence.

What Fliq actually is

Fliq is an HTTP workflow engine. You tell it:

What URL to call
When to call it (specific time or cron expression)
What to send (method, headers, body)
How to handle failures (retry count)

And it handles the rest: firing the request on time, retrying on failure with exponential backoff, and recording the full execution history.

curl -X POST https://api.fliq.sh/v1/jobs \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-app.com/api/send-reminder",
    "method": "POST",
    "body": "{\"userId\": \"user_123\", \"type\": \"trial-expiry\"}",
    "headers": {"Content-Type": "application/json"},
    "scheduled_at": "2026-03-30T10:00:00Z",
    "max_retries": 3
  }'

That’s it. One API call. No Redis, no queue workers, no infrastructure.
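
The exponential backoff mentioned above can be pictured with a small sketch. Fliq’s actual backoff parameters aren’t stated here, so the one-second base, the doubling factor, and the one-minute cap below are assumptions chosen for illustration, and backoffDelayMs is a hypothetical helper rather than part of any Fliq SDK.

```typescript
// Hypothetical retry schedule: the base delay, multiplier, and cap are
// illustrative assumptions, not Fliq's documented values.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60000): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  // Double the delay on each retry: base, 2*base, 4*base, ... up to the cap.
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}

// Delays before retries 1..3, matching "max_retries": 3 in the request above.
const schedule = [1, 2, 3].map((n) => backoffDelayMs(n));
```

With these assumed parameters, a job with max_retries set to 3 would wait 1, 2, and 4 seconds before its three retries. Production schedulers often add random jitter so that many failing jobs don’t retry in lockstep.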

Why HTTP beats message queues for most use cases

Message queues (Kafka, RabbitMQ, SQS) are designed for high-throughput, ordered message processing between decoupled services. They’re the right tool when you need:

Ordered processing guarantees
Fan-out to multiple consumers
Back-pressure and flow control
Exactly-once processing semantics (on brokers that offer them)

But most background jobs don’t need any of that. They need: “Call this URL at this time. Retry if it fails.”

HTTP-based scheduling has several advantages for this use case:

1. Universal compatibility

Every framework, every language, every platform can receive HTTP requests. Your job handler is just an API endpoint. There’s no SDK to install, no queue library to learn, no consumer process to run.

// This is a Fliq job handler. It's also just a normal API route.
export async function POST(request: Request) {
  const { userId } = await request.json();
  await sendReminderEmail(userId);
  return Response.json({ sent: true });
}

2. Works with serverless

Serverless functions (Vercel, Cloudflare Workers, AWS Lambda) can’t run persistent queue consumers. They’re designed for request-response. Fliq works perfectly with serverless because it sends HTTP requests — which is exactly what serverless functions are built to receive.

3. No infrastructure to manage

No Redis instance. No RabbitMQ cluster. No Kafka topics. No consumer groups. No dead letter queues. Fliq is a managed service — you make API calls, we handle the infrastructure.

4. Built-in observability

Every HTTP response has a status code, a body, and a response time. Fliq records all of this for every execution attempt, so you can inspect each job’s history without setting up Prometheus, Grafana, or custom dashboards.

The technical challenges we solved

Building a reliable job scheduler sounds simple. It’s not. Here are some of the harder problems we tackled:

At-least-once execution, idempotency on you

One of the hardest problems in distributed systems is making sure a job runs even when machines crash and networks fail, without promising an exactly-once guarantee we can’t keep. We use the FOR UPDATE SKIP LOCKED pattern in PostgreSQL for job claiming: each worker claims jobs atomically, and a heartbeat-plus-reaper pattern handles crash recovery.
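
A claim query along these lines is one common way to use FOR UPDATE SKIP LOCKED. This is a minimal sketch, not Fliq’s actual query: the jobs table, its columns, and the buildClaimQuery helper are all hypothetical. It returns a { text, values } object, the query-config shape that the node-postgres client accepts.

```typescript
// Query config in the { text, values } shape that node-postgres accepts.
interface QueryConfig {
  text: string;
  values: Array<string | number>;
}

// Table and column names (jobs, status, scheduled_at, claimed_by, heartbeat_at)
// are hypothetical; Fliq's real schema may differ.
function buildClaimQuery(workerId: string, batchSize: number): QueryConfig {
  const text = `
    WITH due AS (
      SELECT id
      FROM jobs
      WHERE status = 'pending' AND scheduled_at <= now()
      ORDER BY scheduled_at
      LIMIT $1
      FOR UPDATE SKIP LOCKED  -- skip rows another worker has already locked
    )
    UPDATE jobs
    SET status = 'running', claimed_by = $2, heartbeat_at = now()
    FROM due
    WHERE jobs.id = due.id
    RETURNING jobs.id`;
  return { text, values: [batchSize, workerId] };
}
```

Because locked rows are skipped rather than waited on, two workers running this statement at the same time claim disjoint sets of jobs instead of blocking each other.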

If a worker crashes mid-execution, the reaper detects the missing heartbeat and makes the job available for another worker to claim. The job runs again — so delivery is at-least-once, and every call carries a stable X-Fliq-Delivery-Id so your handler can deduplicate. Make your handlers idempotent.
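
On the handler side, deduplication can be as simple as remembering which delivery IDs have already been processed. The sketch below assumes the ID arrives in the X-Fliq-Delivery-Id header described above; handleDelivery, the Outcome values, and the in-memory Set are illustrative, and a real handler would keep the IDs in durable storage.

```typescript
// Header name taken from the article; everything else here is illustrative.
const DELIVERY_ID_HEADER = "x-fliq-delivery-id";

type Outcome = "processed" | "duplicate" | "rejected";

// In production this would be a durable store (for example a table with a
// unique constraint on the delivery ID); a Set keeps the sketch short.
const seenDeliveries = new Set<string>();

function handleDelivery(
  headers: Record<string, string>,
  sideEffect: () => void,
): Outcome {
  // HTTP header names are case-insensitive, so compare in lower case.
  let deliveryId: string | undefined;
  for (const name of Object.keys(headers)) {
    if (name.toLowerCase() === DELIVERY_ID_HEADER) deliveryId = headers[name];
  }
  if (deliveryId === undefined) return "rejected";
  if (seenDeliveries.has(deliveryId)) return "duplicate"; // work already done
  sideEffect();
  seenDeliveries.add(deliveryId); // record only after the work succeeded
  return "processed";
}
```

Recording the ID only after the side effect succeeds means a failed attempt is retried rather than silently skipped.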

Fast, indexed pickup

There’s no edge network and no magic. One Postgres database is the source of truth; workers poll an indexed queue and claim due jobs within about two seconds of their fire time. Boring and predictable beats a latency number we’d have to make up.

Two-phase attempt tracking

We open an execution attempt record before making the HTTP call, and close it after. If the worker crashes during the HTTP call, we have a record of the attempt — including the fact that it didn’t complete. This is crucial for debugging and for preventing silent failures.

No HTTP inside transactions

A lesson learned the hard way: never make an HTTP call inside a database transaction. The HTTP call might take seconds (or time out), holding a database connection the entire time. Under load, this exhausts your connection pool and takes down the entire service.

We structure every job execution as: read from DB, close transaction, make HTTP call, write result to DB. The HTTP call happens outside any transaction.
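
The read, call, write structure, together with the two-phase attempt record, might look like the following sketch. AttemptStore, executeJob, and callTarget are hypothetical names; each store method stands in for a short transaction that commits before it returns, so nothing is held open during the HTTP call.

```typescript
interface AttemptResult {
  status?: number; // HTTP status code, if a response arrived
  error?: string;  // set when the call threw (timeout, connection reset, ...)
}

// Hypothetical storage interface: each method represents its own short
// transaction that commits before the promise resolves.
interface AttemptStore {
  openAttempt(jobId: string): Promise<number>; // resolves to the attempt ID
  closeAttempt(attemptId: number, result: AttemptResult): Promise<void>;
}

async function executeJob(
  jobId: string,
  store: AttemptStore,
  callTarget: () => Promise<number>, // performs the HTTP request, resolves to the status code
): Promise<void> {
  // Phase 1: record that an attempt started, and commit.
  const attemptId = await store.openAttempt(jobId);

  let result: AttemptResult;
  try {
    // No transaction is held while the network call is in flight.
    result = { status: await callTarget() };
  } catch (err) {
    result = { error: err instanceof Error ? err.message : String(err) };
  }

  // Phase 2: record the outcome in a second short transaction.
  await store.closeAttempt(attemptId, result);
}
```

If the process dies between the two phases, the attempt row stays open with no outcome, which records that the attempt started but never completed.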

Where we are today

Fliq is in public beta, running on a Postgres-native stack we operate ourselves — no Redis, no Kafka, no invented SLA. We show our live production status on the status page instead of quoting an uptime number we haven’t earned yet. Today:

Free during beta: 100,000 executions/day, no card
Pay-as-you-go: $1 per 100,000 executions, retries included
Open source: the Go backend and dashboard are on GitHub — self-host the whole thing

It’s a fit for teams building SaaS billing flows, email automation, webhook retries, scheduled jobs, and AI agent workflows.

The future: AI-native scheduling

One of the most exciting developments we’re seeing is AI agents that need to schedule actions. An agent might decide: “I should check the stock price in 4 hours and buy if it’s below $150.” Or: “Remind the user about this task tomorrow morning.”

We’re building an MCP server (in beta) so AI agents can schedule Fliq jobs through natural language. The agent won’t need to understand cron expressions or HTTP headers — it describes what should happen and when, and the MCP server translates that into a Fliq API call.

This is where we think the market is heading: infrastructure for the AI internet, where agents and automation need the same scheduling primitives that human-built applications do.

Try it yourself

If you’re tired of managing Redis, debugging Celery, or writing CloudFormation for EventBridge rules, give Fliq a try. The free tier gives you 100,000 executions per day, enough to build and test most scheduling workflows.
