Skip to main content

Error Handling and Retries

What you will learn

  • How Zigflow handles activity failures
  • The default retry policy
  • How to configure retries
  • The try and catch pattern
  • The raise task
  • Current limitations

Activity retries

Activities in Zigflow (HTTP calls, external activity calls, container runs) are retried automatically by Temporal when they fail.

The default retry policy is:

SettingDefault
Initial interval1 second
Backoff coefficient2.0
Maximum interval1 minute
Maximum attempts5

A failing activity retries at 1s, 2s, 4s, 8s and 16s (capped at 60s) before giving up.


Configuring retries

Override the retry policy for a task using metadata.activityOptions.retryPolicy. Durations are expressed as objects with seconds, minutes, hours or days keys:

- callApi:
metadata:
activityOptions:
retryPolicy:
initialInterval:
seconds: 5
backoffCoefficient: 1.5
maximumInterval:
seconds: 30
maximumAttempts: 3
nonRetryableErrorTypes:
- validation-error
call: http
with:
method: post
endpoint: https://api.example.com/process

nonRetryableErrorTypes accepts a list of error type strings. Tasks that fail with those error types are not retried.

Set maximumAttempts: 1 to disable retries entirely for a task.


The try/catch pattern

Use the try task to catch failures and handle them gracefully.

The try block runs as a child workflow. If any task inside it fails, the catch block runs instead.

- fetchUser:
try:
- getUser:
call: http
output:
as:
user: ${ . }
with:
method: get
endpoint: https://api.example.com/users/999
catch:
do:
- handleMissing:
output:
as:
error: ${ . }
set:
message: User not found

The catch block receives the error as its input. The output of the try task is whatever the catch block returns.


The raise task

Use the raise task to throw an explicit error and stop the workflow.

Errors follow the RFC 7807 Problem Details format.

- checkPermission:
switch:
- denied:
when: ${ $data.role != "admin" }
then: rejectRequest
- default:
then: processRequest

- rejectRequest:
raise:
error:
type: https://serverlessworkflow.io/spec/1.0.0/errors/authorization
status: 403
title: Forbidden
detail: Only admin users can perform this action

Standard error types from the Serverless Workflow specification:

TypeStatus
https://serverlessworkflow.io/spec/1.0.0/errors/configuration400
https://serverlessworkflow.io/spec/1.0.0/errors/validation400
https://serverlessworkflow.io/spec/1.0.0/errors/expression400
https://serverlessworkflow.io/spec/1.0.0/errors/authentication401
https://serverlessworkflow.io/spec/1.0.0/errors/authorization403
https://serverlessworkflow.io/spec/1.0.0/errors/timeout408
https://serverlessworkflow.io/spec/1.0.0/errors/communication500
https://serverlessworkflow.io/spec/1.0.0/errors/runtime500

The status shown is the recommended default. Use the HTTP status code that best describes the error.


Current limitations

  • The catch block cannot filter by error type. It catches all errors from the try block.
  • There is no finally equivalent. Clean-up logic must go after the try task in the main do list.

Common mistakes

Expecting retries to continue after the maximum attempt count. Once maximumAttempts is exhausted, the activity fails permanently. The error propagates up. Wrap the task in a try block to handle this case.

Using raise inside a try block and expecting the catch to handle it. A raise inside the try block is caught by the catch block. Use this intentionally only if you want to normalise errors to a consistent format.

Not setting startToCloseTimeout for long-running activities. The default start-to-close timeout is 5 minutes. Long-running activities (such as container executions or waiting on external systems) should increase this via metadata.activityOptions.startToCloseTimeout.