Blazing Fast, Cost Effective & Resilient AWS SQS Listener

Hi there, I’m Sanjay 👋!

About me

  • 🔭 Working: Lead Software Engineer - working on Spring Boot, Reactive Programming, Microservices, Kafka, Cassandra, Kubernetes, AWS.
  • 🖥️ Interests: I love building cool Software & Systems, Self-Hosting, Gaming
  • 🌱 Learning: Go | Rust | Scala | Design Patterns
  • 💬 Ask me about: Java | Reactive Spring | Containers | AWS
  • 🧑‍🤝‍🧑 Collaboration: Looking to collaborate on several projects over here, check out my GitHub

Languages, Frameworks and Platforms

Java, Spring, Kotlin, Project Reactor, Kafka, Cassandra, AWS, Kubernetes

What is AWS SQS?

  • SQS offers a Secure, Durable, and Highly Available Hosted Queue
  • Used to integrate and decouple distributed software systems and components
  • Standard queues support at-least-once message delivery, and FIFO queues support exactly-once message processing and high-throughput mode
  • Offers Dead Letter Queues

SQS Concepts w.r.t Lambda

  • Queue Types: Standard and FIFO
  • Visibility timeout: period during which a received message is hidden from other consumers; the message becomes visible again if it is not deleted before the period ends
  • In-Flight messages: messages that have been received by a consumer but not yet deleted
  • SQS Event Source Mapping: the Lambda service reads messages from an SQS queue & invokes the Lambda function
  • Enable trigger: enable or disable the SQS-Lambda integration
  • Batch size: number of records to send to the function in each batch. Standard: 10,000 (max). FIFO: 10 (max)
  • Batch window: wait time (in seconds) to gather records before invoking the function; applicable to Standard queues only
  • Reserved Concurrency: the maximum number of concurrent instances allocated to your function.
  • Maximum Concurrency: the number of concurrent instances of the function that an Amazon SQS event source can invoke.
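
To make the batch size and batch window settings concrete, here is a minimal sketch of the rule they imply: a batch is handed to the function once enough records are buffered or the window has elapsed. The class and method names (`BatchTrigger`, `isValidBatchSize`, `shouldInvoke`) are hypothetical and not part of any AWS SDK. The real service also applies other limits, such as payload size, that are not modelled here.

```java
/** Hypothetical helper; models only the batch size and batch window rules from the slide. */
final class BatchTrigger {
    static final int MAX_BATCH_STANDARD = 10_000; // limit quoted above for Standard queues
    static final int MAX_BATCH_FIFO = 10;         // limit quoted above for FIFO queues

    /** Checks a configured batch size against the limit for the queue type. */
    static boolean isValidBatchSize(int batchSize, boolean fifo) {
        int max = fifo ? MAX_BATCH_FIFO : MAX_BATCH_STANDARD;
        return batchSize >= 1 && batchSize <= max;
    }

    /**
     * Decides whether the buffered records should be sent to the function now:
     * either the batch is full, or the window has elapsed since the first record.
     */
    static boolean shouldInvoke(int buffered, int batchSize,
                                int secondsSinceFirstRecord, int windowSeconds) {
        if (buffered == 0) {
            return false; // nothing to send
        }
        return buffered >= batchSize || secondsSinceFirstRecord >= windowSeconds;
    }
}
```

For example, with a batch size of 100 and a 5 second window, 40 buffered records are held back at second 2 and sent at second 5.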

Message Lifecycle

Figure: SQS message lifecycle

SQS Visibility Timeout

Figure: SQS visibility timeout
  • To avoid message loss, SQS does not delete a message on receipt; it is the consumer's responsibility to delete the message after processing
  • The message remains in the queue after it is received, but SQS applies a visibility timeout to prevent other consumers from processing the same message again
  • The default visibility timeout is 30 seconds. It can be set from 0 seconds to 12 hours
  • It can be set at the queue level or changed dynamically per message
  • Caution: With FIFO queues, choose message group IDs that spread messages across many groups, so that a failing message blocks only its own group
    Note:
  • For Standard queues, the visibility timeout isn't a guarantee against receiving a message twice.
  • FIFO queues allow the producer or consumer to attempt multiple retries: producers can retry a send using the same deduplication ID, and consumers don't receive further messages for the same message group ID until the in-flight message is deleted or its visibility timeout expires
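
The receive, hide, delete cycle described above can be illustrated with a toy in-memory model. This is a sketch for intuition only: `VisibilityTimeoutModel` and its methods are hypothetical names, it does not call SQS, and time is passed in explicitly as seconds.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of visibility timeout (hypothetical; not the SQS API). */
final class VisibilityTimeoutModel {
    // messageId -> time (in seconds) from which the message is visible to consumers
    private final Map<String, Long> visibleAt = new LinkedHashMap<>();

    void send(String messageId, long nowSeconds) {
        visibleAt.put(messageId, nowSeconds);
    }

    /** Returns every visible message and hides each one for the given timeout. */
    List<String> receive(long nowSeconds, long visibilityTimeoutSeconds) {
        List<String> received = new ArrayList<>();
        for (Map.Entry<String, Long> entry : visibleAt.entrySet()) {
            if (entry.getValue() <= nowSeconds) {
                received.add(entry.getKey());
                // The message stays in the queue but is hidden until the timeout ends.
                entry.setValue(nowSeconds + visibilityTimeoutSeconds);
            }
        }
        return received;
    }

    /** The consumer deletes a message once it has been processed successfully. */
    void delete(String messageId) {
        visibleAt.remove(messageId);
    }
}
```

In this model a message that is received but never deleted shows up again on the first receive call after its timeout has passed, which is the redelivery behaviour the slide warns about.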

SQS Lambda Integration Architecture

Figure: SQS Lambda integration architecture
  1. The Lambda service starts with five pollers, which poll the SQS queue every WaitTimeSeconds seconds
    (defaults to 20s).*
  2. The function code is invoked when BatchSize messages have accumulated or MaximumBatchingWindowInSeconds has elapsed since the first message was received.
    Function invocation is scaled up if more messages are available in the queue.
  3. The function responds with the list of failed messages (if ReportBatchItemFailures is enabled) or with Success/Error.
  4. Lambda Service deletes the successfully processed messages.
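
Steps 3 and 4 can be sketched as follows, assuming messages are represented only by their IDs. The handler returns the IDs that failed, and everything else in the batch is treated as deletable. `PartialBatchSketch` and its methods are hypothetical names. The JSON shape (`batchItemFailures` with `itemIdentifier` entries) is the one Lambda uses for partial batch responses, and the string building here is deliberately naive (no escaping).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of partial batch failure reporting. */
final class PartialBatchSketch {

    /** Runs the handler per message ID; a false result or an exception counts as a failure. */
    static List<String> failedIds(List<String> batch, Predicate<String> handler) {
        List<String> failed = new ArrayList<>();
        for (String id : batch) {
            boolean ok;
            try {
                ok = handler.test(id);
            } catch (RuntimeException e) {
                ok = false;
            }
            if (!ok) {
                failed.add(id);
            }
        }
        return failed;
    }

    /** Messages that may be deleted: the batch minus the reported failures. */
    static List<String> deletable(List<String> batch, List<String> failed) {
        List<String> ok = new ArrayList<>(batch);
        ok.removeAll(failed);
        return ok;
    }

    /** Renders the failures in the partial batch response shape. */
    static String toResponseJson(List<String> failed) {
        StringBuilder sb = new StringBuilder("{\"batchItemFailures\":[");
        for (int i = 0; i < failed.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"itemIdentifier\":\"").append(failed.get(i)).append("\"}");
        }
        return sb.append("]}").toString();
    }
}
```

An empty failure list means the whole batch succeeded, so every message in it can be deleted.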

Note: *There is no clear documentation from AWS regarding SQS polling frequency.

Lambda Scaling

  • Standard Queues: Lambda uses long polling & initially reads up to 5 batches to invoke your function.
  • Up until 06 Nov 2023, Lambda added up to 60 concurrent executions per minute, scaling up to a maximum of 1,250 concurrent executions in approximately 20 minutes.
  • Now Lambda functions can scale up to 5x faster, adding up to 300 concurrent executions per minute.
  • Even at peak, the maximum number of messages processed concurrently by Lambda = batch size (10 to 10K messages) x 1,250 executions = 12.5K to 12.5M messages
  • FIFO queues: Lambda sends messages to your function in the order that it receives them and ensures that messages in the same group are delivered to Lambda in order.
  • Lambda sorts the messages into groups and sends only one batch at a time for a group.
  • Your function can scale in concurrency to the number of active message groups.
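
The Standard queue figures above can be checked with a little arithmetic. This sketch assumes a simple linear ramp from 5 initial concurrent executions to the 1,250 cap, which is a simplification of the real scaling behaviour. `LambdaScalingMath` is a hypothetical helper name.

```java
/** Hypothetical arithmetic helper for the scaling figures on this slide. */
final class LambdaScalingMath {

    /** Whole minutes needed to ramp from the initial concurrency to the cap (rounded up). */
    static long minutesToReach(int initial, int cap, int addedPerMinute) {
        int remaining = Math.max(0, cap - initial);
        return (remaining + addedPerMinute - 1) / addedPerMinute; // ceiling division
    }

    /** Upper bound on messages being processed at once. */
    static long maxMessagesInFlight(int batchSize, int concurrency) {
        return (long) batchSize * concurrency;
    }
}
```

Under these assumptions, `minutesToReach(5, 1250, 60)` gives 21 minutes and `minutesToReach(5, 1250, 300)` gives 5 minutes, in line with "approximately 20 minutes" and "up to 5x faster". `maxMessagesInFlight(10, 1250)` is 12,500 and `maxMessagesInFlight(10_000, 1250)` is 12,500,000.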

Faster polling scale-up for AWS Lambda

How can we do this ourselves? But Better!

  1. Implement a Dynamic SQS Poller, based on Number of Messages.
  2. Convert Message to POJO & Concurrently Invoke Processing Logic.
  3. For FIFO Queues, make sure to process messages from same GroupId in Sequence, while processing the Groups Concurrently.
  4. Categorize Exceptions into Retryable & Non-Retryable.
  5. Implement Error Handling, Retries and Timeout.
  6. Publish the Failed Messages Directly to the DLQ and Mark Them as Processed.
  7. Collect the Processed messages & Delete those in Batch.
  8. Implement Listener Health-check - when Listener stops polling/processing, set the health to DOWN.
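
Steps 1 and 7 reduce to two small calculations, sketched below under stated assumptions. The poller count is derived from the approximate queue depth and clamped to a configured range, and processed messages are split into groups of at most 10, since SQS returns at most 10 messages per receive call and accepts at most 10 entries per batch delete. `ListenerSizing`, `pollerCount` and `chunk` are hypothetical names, not the speaker's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sizing helpers for a self-managed SQS listener. */
final class ListenerSizing {
    static final int MAX_PER_RECEIVE = 10;      // SQS returns at most 10 messages per receive call
    static final int MAX_PER_DELETE_BATCH = 10; // SQS accepts at most 10 entries per batch delete

    /** Number of concurrent pollers for the current queue depth, clamped to [minPollers, maxPollers]. */
    static int pollerCount(int approximateMessages, int minPollers, int maxPollers) {
        int needed = (approximateMessages + MAX_PER_RECEIVE - 1) / MAX_PER_RECEIVE; // ceiling division
        return Math.max(minPollers, Math.min(maxPollers, needed));
    }

    /** Splits items into consecutive chunks of at most the given size, preserving order. */
    static <T> List<List<T>> chunk(List<T> items, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            int end = Math.min(i + size, items.size());
            chunks.add(new ArrayList<>(items.subList(i, end)));
        }
        return chunks;
    }
}
```

With these helpers, 95 waiting messages and a range of 1 to 20 pollers yield 10 pollers, and 25 processed receipt handles are deleted in three batch calls of 10, 10 and 5.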

Code
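
As an illustration of the FIFO requirement from the list above (messages of one group in sequence, groups concurrently), here is a minimal sketch using only the JDK. It is not the speaker's code. `GroupOrderedProcessor` is a hypothetical name, and each message is modelled as a two-element array of group ID and body rather than a real SQS message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

/** Hypothetical sketch: sequential within a message group, concurrent across groups. */
final class GroupOrderedProcessor {

    /** Each message is {groupId, body}; blocks until every group has been processed. */
    static void process(List<String[]> messages, Consumer<String[]> handler, ExecutorService pool) {
        // Bucket messages by group ID, keeping arrival order inside each bucket.
        Map<String, List<String[]>> byGroup = new LinkedHashMap<>();
        for (String[] message : messages) {
            byGroup.computeIfAbsent(message[0], k -> new ArrayList<>()).add(message);
        }

        // One task per group: the task walks its bucket in order,
        // while different groups run on different pool threads.
        List<CompletableFuture<Void>> tasks = new ArrayList<>();
        for (List<String[]> group : byGroup.values()) {
            tasks.add(CompletableFuture.runAsync(() -> group.forEach(handler), pool));
        }
        CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();
    }
}
```

A handler exception in one group stops that group's remaining messages and surfaces from `join()`, while other groups continue. A fuller listener would catch it per message and apply the retry and DLQ steps from the list.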

Questions?
