Blazing Fast
Cost Effective
& Resilient
AWS SQS Listener
Hi there, I’m Sanjay 👋!
About me
- 🔭 Working: Lead Software Engineer - working on Spring Boot, Reactive Programming, Microservices, Kafka, Cassandra, Kubernetes, AWS.
- 🖥️ Interests: I love building cool Software & Systems, Self-Hosting, Gaming
- 🌱 Learning:
Go | Rust | Scala | Design Patterns
- 💬 Ask me about: Java | Reactive Spring | Containers | AWS
- 🧑🤝🧑 Collaboration: Looking to collaborate on several projects over here, check out my GitHub
What is AWS SQS
- SQS offers a Secure, Durable, and Highly Available Hosted Queue
- Used to integrate and decouple distributed software systems and components
- Standard queues support at-least-once message delivery, and FIFO queues support exactly-once message processing and high-throughput mode
- Offers Dead Letter Queues
SQS Concepts w.r.t Lambda
- Queue Types: Standard and FIFO
- Visibility timeout: wait time after which message is visible again if not deleted after processing
- In-Flight messages: messages that are received by the consumer but not deleted
- SQS Event Source Mapping: Lambda Service reads items from a SQS & invokes Lambda Function
- Enable trigger: Enable or disable SQS-Lambda Integration
- Batch size: number of records to send to the function in each batch. Standard: 10,000(max). FIFO: 10(max)
- Batch window: wait time (in second) to gather records before invoking the function, applicable for Standard Queue
- Reserved Concurrency: the maximum number of concurrent instances allocated to your function.
- Maximum Concurrency: the number of concurrent instances of the function that an Amazon SQS event source can invoke.
Message Lifecycle
SQS Visibility Timeout
- To avoid message loss, it's consumers responsibility to delete the message after processing
- Message remains in queue after it is received, but SQS sets a visibility timeout to prevent other consumer from processing same message again
- Default visibility timeout is 30 seconds. Can be set between 0 seconds to 12 hours
- Can be set a Queue level or dynamically changed per message
- Caution: When using FIFO make sure to use Message GroupId which provides high distribution to avoid blocking processing due to error
Note:
- For Standard queues, the visibility timeout isn't a guarantee against receiving a message twice.
- FIFO queues allow the producer or consumer to attempt multiple retries: producers can retry send using deduplicationId and consumers doesn't receive messages for same message groupId unless deleted or timed-out
SQS Lambda Integration Architecture
- FIVE Instances Lambda Service polls SQS queue every WaitTimeSeconds secs
(defaults to 20s). More info here.*
- Function code is invoked when BatchSize number of messages are accumulated or MaximumBatchingWindowInSeconds are passed after receiving first message.
Function invocation is scaled if more messages are available in queue.
- Function responds with List of failed messages (if ReportBatchItemFailures is enabled) or Success/Error.
- Lambda Service deletes the successfully processed messages.
Note: *There is no clear documentation from AWS regarding SQS polling frequency.
Lambda Scaling
- Standard Queues: Lambda uses long polling & reads up to 5 batches to invoke your function.
- Up until 06 Nov 2023, the Lambda was adding up to 60 concurrent executions/minute, scaling up to a maximum of 1,250 concurrent executions in approximately 20 minutes.
- Now Lambda functions can scale up to 5x faster, adding up to 300 concurrent executions/minute.
- Even at peak performance, the Maximum number of messages processed concurrently by Lambda = 10 to 10K messages in a batch x 1250 executions = 12.5K to 12.5M
- FIFO queues: Lambda sends messages to your function in the order that it receives them and ensures that messages in the same group are delivered to Lambda in order.
- Lambda sorts the messages into groups and sends only one batch at a time for a group.
- Your function can scale in concurrency to the number of active message groups.
How can we do this ourselves? But Better!
- Implement a Dynamic SQS Poller, based on Number of Messages.
- Convert Message to POJO & Concurrently Invoke Processing Logic.
- For FIFO Queues, make sure to process messages from same GroupId in Sequence, while processing the Groups Concurrently.
- Categorize Exceptions into Retryable & Non-Retryable.
- Implement Error Handling, Retries and Timeout.
- Publish the Failed Messages Directly to the DLQ and Mark it as Processed.
- Collect the Processed messages & Delete those in Batch.
- Implement Listener Health-check - when Listener stops polling/processing, set the health to DOWN.