How to improve AWS Java Lambda Performance by 80%

Diogo Peixoto
Mar 4, 2024


Performance problems often arise when developing AWS Java Lambda functions. Two questions come up frequently: how can we decrease startup time to improve Lambda responsiveness during horizontal scaling, and how can we increase Lambda throughput across calls served by the same execution environment?

This article presents some techniques and AWS features that helped us reduce the duration of the first Lambda call by approximately 81% and of subsequent calls by approximately 73%. Before jumping into them, however, let's first understand how a Lambda function is executed.

AWS Java Lambda execution

When an AWS Java Lambda function receives its first request, AWS Lambda prepares an execution environment. To do that, it downloads the function code and creates the environment with the specified runtime, memory, and configuration. Then, Lambda runs the initialization code outside the event handler method and, finally, runs the code inside the handler [1].

Image 1: Timeline of cold start and invocation tasks (source: https://aws.amazon.com/blogs/compute/operating-lambda-performance-optimization-part-1/)

After completing a request, AWS Lambda retains the execution environment instead of destroying it. How long the environment is kept before being destroyed depends on many factors that developers cannot control. Because we cannot control when it is deleted, we should not rely on reusing the execution environment to achieve a good execution time.

Downloading the code and starting a new execution environment before running the initialization code is called a cold start. An invocation that reuses an existing environment, and therefore skips these steps, is called a warm start.
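To make the distinction concrete, here is a minimal plain-Java sketch (no AWS dependencies) of a common way to tell the two apart from inside the handler: a static flag, which is initialized once per JVM and therefore once per execution environment. `ColdStartAwareHandler` and `ColdStartDemo` are hypothetical names, and the loop in `main` only simulates Lambda reusing one environment for several invocations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Lambda handler class (no AWS libraries needed).
class ColdStartAwareHandler {

    // Static fields are initialized once per JVM, i.e. once per execution
    // environment, so this flag is true only for the first invocation.
    private static boolean coldStart = true;

    String handleRequest(String input) {
        String kind = coldStart ? "cold" : "warm";
        coldStart = false; // every later call in this JVM is a warm start
        return kind + ":" + input;
    }
}

class ColdStartDemo {
    public static void main(String[] args) {
        // Simulates three invocations served by the same execution environment.
        ColdStartAwareHandler handler = new ColdStartAwareHandler();
        List<String> results = new ArrayList<>();
        for (String event : new String[] {"a", "b", "c"}) {
            results.add(handler.handleRequest(event));
        }
        System.out.println(results); // [cold:a, warm:b, warm:c]
    }
}
```

In a real function, logging this flag next to each invocation is one way to correlate slow requests with cold starts in CloudWatch Logs.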

Example

Throughout this article, we use a simple example to demonstrate the concepts presented and to measure their improvements.

Image 2: Lambda reading S3 bucket object

In this example, an AWS Java Lambda function reads an S3 object. To get the code, check the medium-code repository.

1. Initialize AWS Lambda dependencies outside the handler method

One simple technique for better performance across Lambda calls is to initialize resources outside the handler method.

This way, resource initialization runs only once per execution environment instead of on every invocation. Good candidates for initialization outside the handler method are database clients, AWS service clients such as S3Client, and any other resources that don't depend on the handler method parameters.
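A minimal plain-Java sketch of the difference, with no AWS dependencies: `FakeClient` is a hypothetical stand-in for an expensive client such as S3Client, and the two handler classes only mimic the shape of a Lambda handler. It assumes, as Scenario 2 relies on, that one handler instance is reused for every invocation in the same execution environment. The counter shows how many clients each approach builds for three invocations.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive SDK client such as S3Client.
class FakeClient {
    static final AtomicInteger CREATED = new AtomicInteger();

    FakeClient() {
        CREATED.incrementAndGet(); // count every construction
    }

    String fetch(String key) {
        return "content-of-" + key;
    }
}

// Dependencies inside the handler: a new client on every invocation.
class PerCallHandler {
    String handleRequest(String key) {
        FakeClient client = new FakeClient();
        return client.fetch(key);
    }
}

// Dependencies outside the handler: the client is built once, when the
// handler instance is created, and reused by later invocations.
class SharedClientHandler {
    private final FakeClient client = new FakeClient();

    String handleRequest(String key) {
        return client.fetch(key);
    }
}

class InitOutsideHandlerDemo {
    public static void main(String[] args) {
        PerCallHandler perCall = new PerCallHandler();
        for (int i = 0; i < 3; i++) perCall.handleRequest("k" + i);
        int perCallCount = FakeClient.CREATED.getAndSet(0); // read and reset

        SharedClientHandler shared = new SharedClientHandler();
        for (int i = 0; i < 3; i++) shared.handleRequest("k" + i);
        int sharedCount = FakeClient.CREATED.get();

        System.out.println(perCallCount + " vs " + sharedCount); // 3 vs 1
    }
}
```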

The following sections show the results: the handler duration improved by approximately 45% for the warm start and approximately 61% for the cold start. That is a large improvement from changing only a couple of lines of code.

Scenario 1: Dependencies inside handler

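import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

public class App implements RequestHandler<String, String> {

    @Override
    public String handleRequest(String input, Context context) {
        // A new S3Client is built on every invocation, cold or warm
        S3Client s3 = S3Client.builder().region(Region.CA_CENTRAL_1).build();
        // Do something
        return input; // placeholder result; the real logic is elided
    }
}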
## Cold invocation
Picked up JAVA_TOOL_OPTIONS: -XX:+TieredCompilation -XX:TieredStopAtLevel=4
START RequestId: 32faf192-1591-4eae-9d20-1dd347760277 Version: $LATEST
END RequestId: 32faf192-1591-4eae-9d20-1dd347760277
REPORT RequestId: 32faf192-1591-4eae-9d20-1dd347760277 Duration: 11407.33 ms Billed Duration: 11408 ms Memory Size: 512 MB Max Memory Used: 175 MB Init Duration: 536.28 ms


## Warm invocation
START RequestId: 00a389b7-03bf-47f4-bb42-03f5cfe992f6 Version: $LATEST
END RequestId: 00a389b7-03bf-47f4-bb42-03f5cfe992f6
REPORT RequestId: 00a389b7-03bf-47f4-bb42-03f5cfe992f6 Duration: 289.39 ms Billed Duration: 290 ms Memory Size: 512 MB Max Memory Used: 176 MB

Scenario 2: Dependencies outside handler

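import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

public class App implements RequestHandler<String, String> {

    // Built once per execution environment, during the Init phase
    private final S3Client s3 = S3Client.builder().region(Region.CA_CENTRAL_1).build();

    @Override
    public String handleRequest(String input, Context context) {
        // Do something
        return input; // placeholder result; the real logic is elided
    }
}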
## Cold invocation
Picked up JAVA_TOOL_OPTIONS: -XX:+TieredCompilation -XX:TieredStopAtLevel=4
START RequestId: b3c8fcc8-afd6-466e-9ae1-de2022f09dcb Version: $LATEST
END RequestId: b3c8fcc8-afd6-466e-9ae1-de2022f09dcb
REPORT RequestId: b3c8fcc8-afd6-466e-9ae1-de2022f09dcb Duration: 4369.99 ms Billed Duration: 4370 ms Memory Size: 512 MB Max Memory Used: 182 MB Init Duration: 2329.03 ms

## Warm invocation
START RequestId: 82c95df7-b0e9-4436-aa09-f93c61a586b0 Version: $LATEST
END RequestId: 82c95df7-b0e9-4436-aa09-f93c61a586b0
REPORT RequestId: 82c95df7-b0e9-4436-aa09-f93c61a586b0 Duration: 156.52 ms Billed Duration: 157 ms Memory Size: 512 MB Max Memory Used: 180 MB

Results

Table 1: Results of dependencies inside and outside handler method.

With dependencies outside the handler method, the only metric that is worse than with dependencies inside the handler is the initialization duration (2329.03 ms versus 536.28 ms). This is because, in Scenario 2, AWS needs to initialize the S3Client dependency during the initialization phase, while in Scenario 1 the client is created while the handler code executes.
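To check the trade-off, here is a small sketch that recomputes the percentages from the REPORT lines of Scenario 1 and Scenario 2 above. The class and method names are hypothetical; "reduction" is simply (baseline − improved) / baseline.

```java
import java.util.Locale;

// Durations (ms) copied from the Scenario 1 and Scenario 2 REPORT lines.
class Scenario1vs2Math {

    // Percentage reduction from a baseline value to an improved value.
    static double reduction(double baseline, double improved) {
        return (baseline - improved) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double insideCold = 11407.33, insideInit = 536.28, insideWarm = 289.39;
        double outsideCold = 4369.99, outsideInit = 2329.03, outsideWarm = 156.52;

        // Locale.ROOT keeps a '.' decimal separator regardless of the host locale.
        System.out.printf(Locale.ROOT, "cold handler: %.1f%%%n",
                reduction(insideCold, outsideCold));
        System.out.printf(Locale.ROOT, "warm handler: %.1f%%%n",
                reduction(insideWarm, outsideWarm));
        // Adds the Init Duration to each cold invocation.
        System.out.printf(Locale.ROOT, "cold init + handler: %.1f%%%n",
                reduction(insideCold + insideInit, outsideCold + outsideInit));
    }
}
```

With these numbers the cold handler duration drops by about 61.7% and the warm one by about 45.9%. Even after adding the worse Init Duration to both cold totals (11943.61 ms versus 6699.02 ms), the cold invocation is still about 43.9% faster.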

2. Set JVM to use only C1 compiler

HotSpot is a Java performance engine that performs runtime analysis and employs techniques such as just-in-time (JIT) compilation and adaptive optimization to improve application execution.

A JIT compiler compiles frequently executed sections of bytecode into native code, which lets a Java application perform close to an ahead-of-time compiled language.

Tiered compilation was introduced in JDK 7. It gives developers more control over the level of optimization they want, according to the application profile, and lets them strike a balance between the JIT compilers C1 and C2 [2].

In a nutshell, C1 is best suited for short-lived applications and applications that need fast startup times. C2, meanwhile, is better for long-lived applications that need the best overall performance, but it uses more memory and takes longer to reach that performance. For more detailed information on C1 and C2, refer to How Tiered Compilation works in OpenJDK [3].

In the AWS Lambda Java 21 runtime, the JVM flag for tiered compilation is set to stop at level 1 by default, whereas in the Java 11 runtime and earlier it is not. In those runtimes, we can set the tiered compilation level to 1 by adding the JAVA_TOOL_OPTIONS environment variable with the value -XX:+TieredCompilation -XX:TieredStopAtLevel=1.

Image 3: Setting lambda environment variable

So, if you test your Lambda with a Java version above 11, you won't see any improvement, because the default level is already 1. To measure the improvement, set -XX:TieredStopAtLevel=4, collect the metrics, and then set it back to -XX:TieredStopAtLevel=1.
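Because the default differs between runtimes, it can help to confirm which level a given environment actually runs with. A minimal sketch, assuming a HotSpot-based JVM (the `com.sun.management` API is JDK-specific rather than part of Java SE): query the effective flag values through `HotSpotDiagnosticMXBean`. `TieredCompilationCheck` is a hypothetical class name; in a Lambda function you could call `option(...)` from the handler class, for instance once in a static initializer, and read the result in CloudWatch Logs.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper that reports the effective tiered-compilation flags
// of the running HotSpot JVM.
class TieredCompilationCheck {

    // Throws IllegalArgumentException if the JVM has no flag with this name.
    static VMOption option(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return hotspot.getVMOption(name);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"TieredCompilation", "TieredStopAtLevel"}) {
            VMOption opt = option(name);
            // getOrigin() says where the value came from, e.g. the JVM default
            // or an environment variable such as JAVA_TOOL_OPTIONS.
            System.out.println(name + "=" + opt.getValue() + " (origin: " + opt.getOrigin() + ")");
        }
    }
}
```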

Compiler level 4

Compiler level 4 corresponds to Scenario 2 above (dependencies outside the handler, with JAVA_TOOL_OPTIONS set to -XX:+TieredCompilation -XX:TieredStopAtLevel=4): a cold invocation Duration of 4369.99 ms (Init Duration: 2329.03 ms) and a warm invocation Duration of 156.52 ms.

Compiler level 1

## Cold invocation
Picked up JAVA_TOOL_OPTIONS: -XX:+TieredCompilation -XX:TieredStopAtLevel=1
START RequestId: ccc18a2a-358d-4a51-a93c-1321124f5782 Version: $LATEST
END RequestId: ccc18a2a-358d-4a51-a93c-1321124f5782
REPORT RequestId: ccc18a2a-358d-4a51-a93c-1321124f5782 Duration: 2777.24 ms Billed Duration: 2778 ms Memory Size: 512 MB Max Memory Used: 160 MB Init Duration: 1585.66 ms

## Warm invocation
START RequestId: 844af3be-ce92-4b31-9bf2-376b83b703b2 Version: $LATEST
END RequestId: 844af3be-ce92-4b31-9bf2-376b83b703b2
REPORT RequestId: 844af3be-ce92-4b31-9bf2-376b83b703b2 Duration: 85.72 ms Billed Duration: 86 ms Memory Size: 512 MB Max Memory Used: 163 MB

Results

Table 2: Results of compiler level 1 and level 4.

Since our example targets startup time, we set the compiler level to 1, which reduced the cold execution duration by ~36% and the warm execution duration by ~45% compared with compiler level 4.

3. SnapStart

Lambda SnapStart for Java can improve startup performance for latency-sensitive applications by up to 10x at no extra cost, typically with no changes to your function code. The largest contributor to startup latency (often referred to as cold start time) is the time that Lambda spends initializing the function, which includes loading the function’s code, starting the runtime, and initializing the function code [4].

With SnapStart enabled, AWS Lambda initializes the function when you publish a function version. AWS initializes the execution environment, takes a snapshot of the memory and disk state of the Firecracker microVM, encrypts the snapshot, and caches it.

When the function is invoked for the first time, or when it scales up, AWS resumes new execution environments from the cached snapshot instead of initializing them from scratch, which improves startup latency. You can activate Lambda SnapStart in the AWS console or with the AWS CDK. For more information, refer to the AWS docs.

With SnapStart enabled, the first invocation took 2138.88 ms, an improvement of ~23% over our previous test. In addition, restoring the Firecracker microVM snapshot took 626.11 ms, compared with 1585.66 ms for a full Lambda initialization.

## Cold invocation
RESTORE_START Runtime Version: java:21.v10 Runtime Version ARN: arn:aws:lambda:ca-central-1::runtime:2b04f677de1d4fc77196dcb1051a81ee8856b6dc4bb78e941f2bc400187237e1
RESTORE_REPORT Restore Duration: 626.11 ms
START RequestId: c56e5cb2-2687-4eb5-891b-50d89e50057b Version: 1
END RequestId: c56e5cb2-2687-4eb5-891b-50d89e50057b
REPORT RequestId: c56e5cb2-2687-4eb5-891b-50d89e50057b Duration: 2138.88 ms Billed Duration: 2312 ms Memory Size: 512 MB Max Memory Used: 147 MB Restore Duration: 626.11 ms Billed Restore Duration: 173 ms


## Warm invocation
START RequestId: 2d695e88-237e-49b2-8c0a-ab56dd65227f Version: 1
END RequestId: 2d695e88-237e-49b2-8c0a-ab56dd65227f
REPORT RequestId: 2d695e88-237e-49b2-8c0a-ab56dd65227f Duration: 75.60 ms Billed Duration: 76 ms Memory Size: 512 MB Max Memory Used: 150 MB

Table 3: Results comparing SnapStart enabled and disabled

Conclusion

In summary, initializing dependencies outside the handler method, setting the JIT compiler level to 1, and enabling SnapStart reduced the handler duration by ~81% for cold execution and by ~73% for warm execution.

It is important to note that, in the cold execution, the longer initialization phase was more than compensated by the faster handler execution, resulting in an overall improvement of ~81% in handler duration.

Table 4: Results without any improvements and with all improvements
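The conclusion's figures can be reproduced from the REPORT lines with a few lines of arithmetic. This sketch compares the baseline (Scenario 1, compiler level 4) with the final setup (dependencies outside the handler, compiler level 1, SnapStart); the class name is hypothetical, and the last line also counts the Init Duration of the baseline and the Restore Duration of the SnapStart version.

```java
import java.util.Locale;

// Baseline vs. all improvements, using durations (ms) from this article's logs.
class OverallImprovement {

    // Percentage reduction from a baseline value to an improved value.
    static double reduction(double baseline, double improved) {
        return (baseline - improved) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double baseDuration = 11407.33, baseInit = 536.28, baseWarm = 289.39;
        double snapDuration = 2138.88, snapRestore = 626.11, snapWarm = 75.60;

        System.out.printf(Locale.ROOT, "cold Duration: %.1f%%%n",
                reduction(baseDuration, snapDuration));
        System.out.printf(Locale.ROOT, "warm Duration: %.1f%%%n",
                reduction(baseWarm, snapWarm));
        // Init Duration (baseline) vs. Restore Duration (SnapStart) added in.
        System.out.printf(Locale.ROOT, "cold Init/Restore + Duration: %.1f%%%n",
                reduction(baseDuration + baseInit, snapDuration + snapRestore));
    }
}
```

This gives roughly 81% for the cold handler duration and 73.9% for the warm one. Counting the initialization or restore phase as well (11943.61 ms versus 2764.99 ms), the cold invocation is still about 77% faster.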

These three techniques improved not only startup time but also execution time. As a bonus, if you want to dive deeper into Lambda performance, you can refer to:

  1. Optimizing AWS Lambda function performance for Java
  2. Operating Lambda: Performance optimization — Part 1
  3. Operating Lambda: Performance optimization — Part 2
  4. Operating Lambda: Performance optimization — Part 3
  5. Improving startup performance with Lambda SnapStart
  6. Reduce SDK startup time for AWS Lambda

References

[1] https://aws.amazon.com/blogs/compute/operating-lambda-performance-optimization-part-1/

[2] https://docs.oracle.com/javase/8/docs/technotes/guides/vm/performance-enhancements-7.html

[3] https://devblogs.microsoft.com/java/how-tiered-compilation-works-in-openjdk/

[4] https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html
