DEV Community

AranaDeDoros

What Happens When One Parallel Call Fails? Structured Concurrency in Scala

When building backend systems, we often fan out to multiple services in parallel:

  • Price providers
  • Recommendation engines
  • Search indexes
  • Payment gateways

The real question isn't "How do I run these in parallel?"

It's:

  • What happens if one fails?
  • What happens if one times out?
  • Do retries leak work?
  • Can I keep partial results?

I explored this while building a small price aggregation simulator in Scala 3 using Cats Effect.


The Scenario

The setup was simple:

  • Multiple providers
  • Each returns a price
  • We call them in parallel
  • We aggregate the results

But the interesting part wasn't parallelism.

It was failure semantics.

The Core Abstraction

trait PriceProvider[F[_]]:
  def name: ProviderName
  def fetchPrice(productId: ProductId): F[Price]
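The trait references `ProviderName`, `ProductId`, and `Price`, which aren't shown in the post. A minimal sketch of what they might look like (the field names are my assumption):

```scala
// Hypothetical domain types; the original post does not show them.
final case class ProviderName(value: String)
final case class ProductId(value: String)
final case class Price(amount: BigDecimal, currency: String = "USD")
```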

Providers may fail.

One specific failure is modeled explicitly:

sealed abstract class ProviderError(message: String) extends Throwable(message)

final case class TimeOutError(provider: String, time: LocalDateTime)
    extends ProviderError(s"$provider timed out")

Timeouts are not transport hacks.

They are part of the domain.
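One way to enforce that is at the call site: convert Cats Effect's generic timeout into the domain error with `timeoutTo`. A sketch, assuming the error type above (the helper name and the time-capture logic are illustrative):

```scala
//> using dep org.typelevel::cats-effect:3.5.4
import cats.effect.IO
import scala.concurrent.duration.*
import java.time.LocalDateTime

// Redefined here so the sketch is self-contained.
sealed abstract class ProviderError(message: String) extends Throwable(message)
final case class TimeOutError(provider: String, time: LocalDateTime)
    extends ProviderError(s"$provider timed out")

// If `raw` doesn't finish within the budget, fail with the domain error
// instead of Cats Effect's generic TimeoutException.
def withDomainTimeout[A](provider: String, budget: FiniteDuration)(raw: IO[A]): IO[A] =
  raw.timeoutTo(
    budget,
    IO(LocalDateTime.now()).flatMap(t => IO.raiseError(TimeOutError(provider, t)))
  )
```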

Retry as a Decorator

Instead of baking retry into the provider, I wrapped providers with a retry policy:

def retryOnTimeout[A](
  fa: IO[A],
  maxRetries: Int,
  delay: FiniteDuration
): IO[A] =

  def loop(attempt: Int): IO[A] =
    fa.handleErrorWith {
      case _: TimeOutError if attempt < maxRetries =>
        IO.sleep(delay) *> loop(attempt + 1)
      case e =>
        IO.raiseError(e)
    }

  loop(0)
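To see the policy in action, here's a self-contained sketch that simulates a provider timing out twice before succeeding (the `Ref`-based counter and the values are illustrative):

```scala
//> using dep org.typelevel::cats-effect:3.5.4
import cats.effect.{IO, Ref}
import scala.concurrent.duration.*
import java.time.LocalDateTime

sealed abstract class ProviderError(message: String) extends Throwable(message)
final case class TimeOutError(provider: String, time: LocalDateTime)
    extends ProviderError(s"$provider timed out")

def retryOnTimeout[A](fa: IO[A], maxRetries: Int, delay: FiniteDuration): IO[A] =
  def loop(attempt: Int): IO[A] =
    fa.handleErrorWith {
      case _: TimeOutError if attempt < maxRetries =>
        IO.sleep(delay) *> loop(attempt + 1)
      case e => IO.raiseError(e)
    }
  loop(0)

val program: IO[(Int, Int)] =
  for
    calls <- Ref.of[IO, Int](0)
    // Fails with a timeout on the first two calls, then returns 42.
    flaky = calls.updateAndGet(_ + 1).flatMap { n =>
              if n < 3 then IO.raiseError(TimeOutError("flaky", LocalDateTime.now()))
              else IO.pure(42)
            }
    price <- retryOnTimeout(flaky, maxRetries = 3, delay = 10.millis)
    n     <- calls.get
  yield (price, n)
```

Running `program` yields the price together with the call count: three calls, one result.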

Then composed it:

final class RetryingProvider(
  underlying: PriceProvider[IO],
  maxRetries: Int,
  delay: FiniteDuration
) extends PriceProvider[IO]:

  override def name: ProviderName = underlying.name

  override def fetchPrice(productId: ProductId): IO[Price] =
    retryOnTimeout(
      underlying.fetchPrice(productId),
      maxRetries,
      delay
    )

Important detail:

  • Only timeouts retry
  • Other failures propagate immediately
  • The underlying provider remains untouched

Retry is a policy layer, not embedded behavior.

Aggregation Strategy #1 — Fail Fast

def fetchAll(
  providers: List[PriceProvider[IO]],
  productId: ProductId
): IO[List[Price]] =
  providers.parTraverse(_.fetchPrice(productId))

Semantics:

  • All providers run in parallel
  • If one fails, the whole operation fails
  • Remaining providers are cancelled

This is strict and correct when all results are required.
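A sketch of that cancellation behavior, using two stand-in tasks instead of real providers (names and durations are illustrative): one task fails immediately, and the slow sibling is cancelled before it can record its side effect.

```scala
//> using dep org.typelevel::cats-effect:3.5.4
import cats.effect.{IO, Ref}
import cats.syntax.all.*
import scala.concurrent.duration.*

val demo: IO[(Boolean, Boolean)] =
  for
    ran   <- Ref.of[IO, Boolean](false)
    tasks  = List(
               IO.raiseError[Int](new RuntimeException("provider down")),
               // The slow sibling: cancelled by parTraverse before it finishes.
               IO.sleep(1.second) *> ran.set(true).as(1)
             )
    result <- tasks.parTraverse(identity).attempt
    flag   <- ran.get
  yield (result.isLeft, flag)
```

`demo` produces `(true, false)`: the aggregate operation failed, and the slow task's flag was never set.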

Aggregation Strategy #2 — Keep Partial Results

Sometimes partial success is acceptable.

So instead of failing the entire computation, we capture results explicitly:

def fetchSafe(
  provider: PriceProvider[IO],
  productId: ProductId
): IO[ProviderResult] =
  provider.fetchPrice(productId)
    .map(price =>
      ProviderResult.Success(provider.name, price)
    )
    .handleError { e =>
      // Any failure (timeout or otherwise) becomes a Failure result,
      // so the effect itself always completes.
      ProviderResult.Failure(provider.name, e)
    }

Then aggregate:

def fetchAllPartial(
  providers: List[PriceProvider[IO]],
  productId: ProductId
): IO[List[ProviderResult]] =
  providers.parTraverse(p => fetchSafe(p, productId))

Now:

  • All providers run in parallel
  • Failures don't cancel siblings
  • Partial results are preserved
  • The operation always completes

Different policy. Same building blocks.
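A sketch of the contrast, with stand-in providers and a `ProviderResult` enum mirroring the one used above (the names are illustrative):

```scala
//> using dep org.typelevel::cats-effect:3.5.4
import cats.effect.IO
import cats.syntax.all.*

enum ProviderResult:
  case Success(provider: String, price: BigDecimal)
  case Failure(provider: String, error: Throwable)

// Capture success and failure as values instead of letting errors propagate.
def fetchSafe(name: String, fetch: IO[BigDecimal]): IO[ProviderResult] =
  fetch
    .map(price => ProviderResult.Success(name, price))
    .handleError(e => ProviderResult.Failure(name, e))

val demo: IO[List[ProviderResult]] =
  List(
    "a" -> IO.pure(BigDecimal(10)),
    "b" -> IO.raiseError[BigDecimal](new RuntimeException("down")),
    "c" -> IO.pure(BigDecimal(12))
  ).parTraverse { case (name, fetch) => fetchSafe(name, fetch) }
```

Here `demo` completes with three results: two successes and one captured failure, with "b"'s error cancelling nothing.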

Optional: Enforcing a Quorum

You can even require a minimum number of successes:

def requireAtLeast(
  n: Int,
  results: List[ProviderResult]
): IO[List[(ProviderName, Price)]] =
  val successes = results.collect {
    case ProviderResult.Success(p, price) => (p, price)
  }

  if successes.size >= n then IO.pure(successes)
  else IO.raiseError(new RuntimeException("Not enough providers"))
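The quorum check itself is pure list logic, so it can be exercised without `IO` at all. A self-contained sketch (the enum and helper names are illustrative):

```scala
enum ProviderResult:
  case Success(provider: String, price: BigDecimal)
  case Failure(provider: String, error: String)

// Extract only the successful (provider, price) pairs.
def successesOf(results: List[ProviderResult]): List[(String, BigDecimal)] =
  results.collect { case ProviderResult.Success(p, price) => (p, price) }

// True when at least n providers answered successfully.
def meetsQuorum(n: Int, results: List[ProviderResult]): Boolean =
  successesOf(results).size >= n

val results = List(
  ProviderResult.Success("a", BigDecimal(10)),
  ProviderResult.Failure("b", "timed out"),
  ProviderResult.Success("c", BigDecimal(12))
)
```

With two of three providers succeeding, a quorum of 2 is met and a quorum of 3 is not.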

Now the system supports:

  • Fail-fast
  • Partial aggregation
  • Quorum-based acceptance

All composed explicitly.

What This Experiment Revealed

The interesting part wasn't syntax. It was separation of concerns.

The system has three independent policy layers:

1. Failure classification (timeouts vs other errors)

2. Retry behavior (decorator)

3. Aggregation strategy (fail-fast vs partial vs quorum)

None of them are tangled together.
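Stacked together, the three layers compose in one expression. A condensed, self-contained sketch over plain `(name, IO[BigDecimal])` tasks rather than the full trait (the names and retry settings are illustrative):

```scala
//> using dep org.typelevel::cats-effect:3.5.4
import cats.effect.IO
import cats.syntax.all.*
import scala.concurrent.duration.*

// Layer 1: failure classification.
final case class TimeOutError(provider: String) extends Throwable(s"$provider timed out")

// Layer 2: retry policy — only timeouts retry.
def retryOnTimeout[A](fa: IO[A], maxRetries: Int, delay: FiniteDuration): IO[A] =
  def loop(attempt: Int): IO[A] =
    fa.handleErrorWith {
      case _: TimeOutError if attempt < maxRetries => IO.sleep(delay) *> loop(attempt + 1)
      case e                                       => IO.raiseError(e)
    }
  loop(0)

// Layer 3: partial aggregation plus a quorum check.
def quorum(n: Int, tasks: List[(String, IO[BigDecimal])]): IO[List[(String, BigDecimal)]] =
  tasks
    .parTraverse { case (name, fetch) =>
      retryOnTimeout(fetch, maxRetries = 2, delay = 10.millis)
        .map(p => name -> (Right(p): Either[Throwable, BigDecimal]))
        .handleError(e => name -> Left(e))
    }
    .flatMap { results =>
      val ok = results.collect { case (name, Right(p)) => name -> p }
      if ok.size >= n then IO.pure(ok)
      else IO.raiseError(new RuntimeException("Not enough providers"))
    }
```

Swapping any one layer (a different retry policy, a different quorum) leaves the other two untouched.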

That's the real value.

  • When to retry
  • When to cancel
  • When partial results are acceptable
  • When to fail the whole operation

The abstraction you choose determines how clearly you can express those decisions.

And that's a backend engineering concern, not a language feature.
