Blazing Fast

Cost Effective

Hi there, I’m Sanjay 👋!

🔭 Working: Lead Software Engineer - working on Spring Boot, Reactive Programming, Microservices, Kafka, Cassandra, Kubernetes, AWS.
🖥️ Interests: I love building cool Software & Systems, Self-Hosting, Gaming
🌱 Learning: Go | Rust | Scala | Design Patterns
💬 Ask me about: Java | Reactive Spring | Containers | AWS
🧑‍🤝‍🧑 Collaboration: Looking to collaborate on several projects over here, check out my GitHub

SQS offers a Secure, Durable, and Highly Available Hosted Queue
Used to integrate and decouple distributed software systems and components
Standard queues support at-least-once message delivery, and FIFO queues support exactly-once message processing and high-throughput mode
Offers Dead Letter Queues

Queue Types: Standard and FIFO
Visibility timeout: period during which a received message is hidden from other consumers; if the message is not deleted within this period, it becomes visible again
In-Flight messages: messages that have been received by a consumer but not yet deleted
SQS Event Source Mapping: Lambda Service reads messages from an SQS queue & invokes the Lambda Function
Enable trigger: Enable or disable SQS-Lambda Integration
Batch size: number of records to send to the function in each batch. Maximum: 10,000 for Standard queues, 10 for FIFO queues
Batch window: wait time (in seconds) to gather records before invoking the function; applicable to Standard queues only
Reserved Concurrency: the maximum number of concurrent instances allocated to your function.
Maximum Concurrency: the number of concurrent instances of the function that an Amazon SQS event source can invoke.
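As a rough illustration of how these settings constrain each other, here is a minimal validation sketch. The `EventSourceSettings` class and `validate` function are hypothetical names, not part of any AWS SDK; the batch size limits come from the list above, and the 300-second cap on the batch window is an assumption you should verify against the current AWS documentation.

```python
from dataclasses import dataclass

# Batch size limits are taken from the list above. The 300-second cap on the
# batch window is an assumption to verify against the AWS documentation.
MAX_BATCH_SIZE = {"standard": 10_000, "fifo": 10}
MAX_BATCH_WINDOW_SECONDS = 300


@dataclass
class EventSourceSettings:
    """Hypothetical container for the SQS trigger settings described above."""
    queue_type: str                 # "standard" or "fifo"
    batch_size: int = 10
    batch_window_seconds: int = 0
    enabled: bool = True


def validate(settings: EventSourceSettings) -> list[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    limit = MAX_BATCH_SIZE.get(settings.queue_type)
    if limit is None:
        return [f"unknown queue type: {settings.queue_type}"]
    problems = []
    if not 1 <= settings.batch_size <= limit:
        problems.append(f"batch size must be between 1 and {limit}")
    if settings.queue_type == "fifo" and settings.batch_window_seconds > 0:
        problems.append("batch window applies to standard queues only")
    if not 0 <= settings.batch_window_seconds <= MAX_BATCH_WINDOW_SECONDS:
        problems.append("batch window out of range")
    return problems
```

Running a check like this before deploying can catch a FIFO trigger that was copied from a Standard queue configuration with a large batch size.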

SQS does not delete a message when it is received. To avoid message loss, it is the consumer's responsibility to delete the message only after processing succeeds
The message remains in the queue after it is received, but SQS applies a visibility timeout to prevent other consumers from processing the same message again
Default visibility timeout is 30 seconds. It can be set from 0 seconds to 12 hours
It can be set at the queue level or changed dynamically per message
Caution: When using FIFO queues, choose a MessageGroupId that distributes messages across many groups, so that a failing message blocks only its own group instead of stalling most of the queue
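The receive, hide, and delete cycle described above can be sketched with a toy in-memory model. `TinyQueue` is a hypothetical class written only for illustration; it is not an SQS client and ignores everything except the visibility timeout behaviour.

```python
import time


class TinyQueue:
    """Toy in-memory model of the visibility timeout; not a real SQS client."""

    def __init__(self, visibility_timeout=30.0, clock=time.monotonic):
        self.visibility_timeout = visibility_timeout  # queue-level default
        self.clock = clock
        self._messages = {}     # message id -> body
        self._visible_at = {}   # message id -> time it becomes visible again

    def send(self, message_id, body):
        self._messages[message_id] = body
        self._visible_at[message_id] = self.clock()

    def receive(self):
        """Return one visible message and hide it for the visibility timeout."""
        now = self.clock()
        for message_id, body in self._messages.items():
            if self._visible_at[message_id] <= now:
                self._visible_at[message_id] = now + self.visibility_timeout
                return message_id, body
        return None

    def change_visibility(self, message_id, timeout):
        """Per-message override, e.g. to buy more time for slow processing."""
        self._visible_at[message_id] = self.clock() + timeout

    def delete(self, message_id):
        """Called by the consumer once processing has succeeded."""
        self._messages.pop(message_id, None)
        self._visible_at.pop(message_id, None)
```

A message that is received but never deleted reappears once the timeout passes, which is how a crashed consumer's work gets picked up again.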

Note:

For Standard queues, the visibility timeout isn't a guarantee against receiving a message twice.
FIFO queues allow the producer or consumer to attempt multiple retries: producers can retry a send using the same deduplication ID, and consumers don't receive further messages for the same message group ID until the in-flight message is deleted or its visibility timeout expires
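Because a Standard queue may deliver the same message more than once, consumers are usually written to be idempotent. Below is a minimal sketch of one way to do that, assuming the message ID is a good enough deduplication key; `IdempotentProcessor` is a hypothetical helper, and the in-memory set stands in for whatever shared store a real system would use.

```python
class IdempotentProcessor:
    """Skips messages whose id has already been processed successfully."""

    def __init__(self, handler):
        self.handler = handler
        # Assumption: an in-memory set is enough for the sketch. With several
        # consumer instances this would need to be a shared, durable store.
        self._seen = set()

    def process(self, message_id, body):
        """Return True if the handler ran, False if the message was a duplicate."""
        if message_id in self._seen:
            return False
        self.handler(body)
        # Record the id only after the handler succeeded, so that a failed
        # attempt is retried when the message becomes visible again.
        self._seen.add(message_id)
        return True
```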

Lambda Service starts with five concurrent pollers, each long-polling the SQS queue with a WaitTimeSeconds of 20s by default.*
Function code is invoked when BatchSize messages have accumulated or MaximumBatchingWindowInSeconds has elapsed since the first message was received.
Function invocation is scaled if more messages are available in queue.
Function responds with List of failed messages (if ReportBatchItemFailures is enabled) or Success/Error.
Lambda Service deletes the successfully processed messages.
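To make the partial-failure step concrete, here is a small handler sketch for the case where ReportBatchItemFailures is enabled. The `batchItemFailures` / `itemIdentifier` response shape and the `Records`, `messageId` and `body` event fields follow the SQS event format; `process_record` and its "amount" rule are hypothetical placeholders for real business logic.

```python
import json


def process_record(body: dict) -> None:
    """Hypothetical business logic; raises an exception when processing fails."""
    if body.get("amount", 0) < 0:
        raise ValueError("amount must not be negative")


def handler(event, context):
    """Process each SQS record and report only the failed ones back to Lambda."""
    failures = []
    for record in event["Records"]:
        try:
            process_record(json.loads(record["body"]))
        except Exception:
            # Only this message is returned to the queue; the rest of the
            # batch is treated as successfully processed and deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

An empty `batchItemFailures` list signals that the whole batch succeeded. For FIFO queues, it is generally safer to stop at the first failure and report the remaining messages as failed too, so that ordering within the group is preserved.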

Note: *There is no clear documentation from AWS regarding SQS polling frequency.

Standard Queues: Lambda uses long polling & reads up to 5 batches to invoke your function.
Up until 06 Nov 2023, Lambda added up to 60 concurrent executions/minute, scaling up to a maximum of 1,250 concurrent executions in approximately 20 minutes.
Now Lambda functions can scale up to 5x faster, adding up to 300 concurrent executions/minute.
Even at peak, the maximum number of messages processed concurrently by Lambda = batch size (10 to 10,000 messages) x 1,250 executions = 12,500 to 12.5 million messages
FIFO queues: Lambda sends messages to your function in the order that it receives them and ensures that messages in the same group are delivered to Lambda in order.
Lambda sorts the messages into groups and sends only one batch at a time for a group.
Your function can scale in concurrency to the number of active message groups.
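The scaling figures for Standard queues can be checked with a little arithmetic. This is a simplified linear model that assumes scaling starts from zero and always adds the full per-minute increment; the function names are made up for illustration.

```python
import math


def minutes_to_reach(target_concurrency: int, added_per_minute: int) -> int:
    """Minutes needed to reach the target under a simple linear ramp."""
    return math.ceil(target_concurrency / added_per_minute)


def max_messages_in_flight(batch_size: int, concurrency: int = 1250) -> int:
    """Upper bound on messages being processed at the same moment."""
    return batch_size * concurrency


old_ramp = minutes_to_reach(1250, 60)     # old rate: 60 executions/minute
new_ramp = minutes_to_reach(1250, 300)    # new rate: 300 executions/minute
low = max_messages_in_flight(10)          # batch size of 10
high = max_messages_in_flight(10_000)     # batch size of 10,000
```

Under this model the old rate needs about 21 minutes to reach 1,250 concurrent executions and the new rate about 5, and the in-flight bound ranges from 12,500 to 12,500,000 messages.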

Implement a Dynamic SQS Poller, based on Number of Messages.
Convert Message to POJO & Concurrently Invoke Processing Logic.
For FIFO Queues, make sure to process messages from same GroupId in Sequence, while processing the Groups Concurrently.
Categorize Exceptions into Retryable & Non-Retryable.
Implement Error Handling, Retries and Timeout.
Publish the Failed Messages Directly to the DLQ and Mark them as Processed.
Collect the Processed Messages & Delete them in Batches.
Implement Listener Health-check - when Listener stops polling/processing, set the health to DOWN.
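A few of the steps above (grouping, concurrent processing, exception categories, DLQ handling and batched deletes) can be sketched as follows. This is a simplified outline under stated assumptions, not a complete listener: messages are plain dicts with hypothetical `id`, `group_id` and `body` keys, `handle` and `send_to_dlq` are callables supplied by the caller, and polling, timeouts and the health-check are left out.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class RetryableError(Exception):
    """Transient failure that is worth retrying (hypothetical category)."""


class NonRetryableError(Exception):
    """Permanent failure that goes straight to the DLQ (hypothetical category)."""


def process_group(messages, handle, send_to_dlq, max_attempts=3):
    """Process one group's messages in order; return ids that can be deleted."""
    done = []
    for message in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                handle(message)
                break
            except RetryableError:
                if attempt == max_attempts:
                    send_to_dlq(message)   # retries exhausted
            except NonRetryableError:
                send_to_dlq(message)       # retrying would not help
                break
        # Succeeded or parked in the DLQ: either way it counts as processed.
        done.append(message["id"])
    return done


def process_batch(messages, handle, send_to_dlq, workers=4):
    """Run groups concurrently while keeping order inside each group."""
    groups = defaultdict(list)
    for message in messages:
        # Standard-queue messages have no group, so each forms its own group.
        groups[message.get("group_id", message["id"])].append(message)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_group, group, handle, send_to_dlq)
                   for group in groups.values()]
        return [message_id for future in futures for message_id in future.result()]


def chunks(ids, size=10):
    """Split ids into lists of at most `size` (SQS batch deletes take up to 10)."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]
```

The ids returned by `process_batch` would then be passed, one chunk at a time, to a batch delete call. Any exception outside the two categories propagates to the caller, which is a deliberate choice here so that unexpected errors are not silently swallowed.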