Handling binary responses with a custom AWS Lambda runtime in Rust
2022-10-18
This blog post shares how I handle binary responses with a custom AWS Lambda runtime written in Rust.
Background
We can serve arbitrary dynamic responses from an Amazon API Gateway REST API (REST API) with a Lambda integration. I decided to implement an AWS Lambda (Lambda) function that produces dynamic binary data for my REST API and, as I have been learning Rust, to write it in Rust*. Please note that this blog post deals with a Lambda function for Lambda non-proxy integration.
* I believe that Python and Node.js (JavaScript) are not very good at processing binary data. Go may be a good choice for binary data handling, and AWS provides an official Lambda runtime for Go, though I want to learn Rust anyway.
Lambda runtime for Rust
Although AWS is seemingly enthusiastic about Rust, they provide no official Lambda runtime for Rust.
So we have to implement a custom Lambda runtime in Rust or create a container.
If you look around for how to implement a Lambda function in Rust, you will likely come across the library aws-lambda-rust-runtime.
It does the heavy lifting for us when we implement a custom Lambda runtime in Rust.
Thanks to aws-lambda-rust-runtime, implementing a Lambda function in Rust is fairly easy.
There is a simple tutorial on the GitHub repository of aws-lambda-rust-runtime.
I will not repeat it here, but the excerpt below shows the actual Rust code you have to write:
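    use lambda_runtime::{service_fn, Error, LambdaEvent};
    use serde_json::{json, Value};

    #[tokio::main]
    async fn main() -> Result<(), Error> {
        // Wrap the handler in a service and hand it to the runtime.
        let func = service_fn(func);
        lambda_runtime::run(func).await?;
        Ok(())
    }

    async fn func(event: LambdaEvent<Value>) -> Result<Value, Error> {
        let (event, _context) = event.into_parts();
        let first_name = event["firstName"].as_str().unwrap_or("world");

        Ok(json!({ "message": format!("Hello, {}!", first_name) }))
    }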
Dealing with binary data with aws-lambda-rust-runtime
aws-lambda-rust-runtime works very well as long as our output is JSON-encodable.
Returning Vec<u8> does not work
The run function of the core module lambda_runtime accepts any service that returns a value implementing serde::Serialize.
To output binary, I first thought my service function could simply return a Vec<u8> and tried to do so.
It turned out that lambda_runtime produced a JSON representation of an array instead of a BLOB.
For instance, when I provided lambda_runtime with a Vec<u8> of [0x61, 0x62, 0x63], I got [97,98,99] instead of abc*.
* 0x61, 0x62, and 0x63 represent 'a', 'b', and 'c' in ASCII, respectively.
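To make the difference concrete, here is a minimal, dependency-free sketch. The function to_json_array is a hypothetical stand-in for what a JSON serializer does with a Vec<u8>; it is not the actual serde_json implementation, only an illustration of the shape of its output.

```rust
/// Hypothetical stand-in for JSON-serializing a byte vector:
/// every byte becomes a decimal number in a JSON array.
fn to_json_array(bytes: &[u8]) -> String {
    let items: Vec<String> = bytes.iter().map(|b| b.to_string()).collect();
    format!("[{}]", items.join(","))
}

/// What I actually wanted: the bytes themselves, untouched.
fn to_raw(bytes: &[u8]) -> Vec<u8> {
    bytes.to_vec()
}
```

Calling to_json_array(&[0x61, 0x62, 0x63]) yields the ten-character text [97,98,99], whereas to_raw returns the three bytes that spell abc.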
How does lambda_runtime handle results?
lambda_runtime::run converts every service function output into JSON. This behavior is hard-coded in lambda-runtime/src/requests.rs#L77-L85, where you can find the call serde_json::to_vec(&self.body).
Any solution?
There was a discussion about binary responses on GitHub, which suggested using the lambda_http module.
As far as I could tell, lambda_http was designed for Lambda proxy integration, so it could not simply produce a plain BLOB.
Then I came up with the following two workarounds:
- Embed a Base64-encoded binary as a field value in a JSON object, extract it with a mapping template for integration responses, and apply CONVERT_TO_BINARY. (This method does not produce a simple BLOB either.)
- Tweak lambda_runtime so that it can handle a raw binary output.
The easier pathway should have been the first one. However, I took the second one* as it was an opportunity to practice Rust.
* Later, I found my efforts were unnecessary. You can jump to the Section "Simpler solution" for a much simpler and easier way.
Tweak for lambda_runtime
While the primary goal of the tweak was to support binary responses, I set a secondary one: make sure that existing programs that output a JSON-serializable object continue to work (backward compatibility).
This section introduces
- the IntoRequest trait, which is the outlet of function responses
- my workaround: the introduction of IntoBytes and RawBytes
IntoRequest trait
IntoRequest (lambda-runtime/src/requests.rs#L8-L10) serializes every successful output from your service function and creates a request object to send to the invocation response endpoint for the Lambda function*:
    pub(crate) trait IntoRequest {
        fn into_req(self) -> Result<Request<Body>, Error>;
    }
While IntoRequest plays a key role in lambda_runtime, you do not directly implement it*2 for your function outputs.
There is a "bridge" between a function result and IntoRequest: EventCompletionRequest<T>, where T is the type of your function output (lambda-runtime/src/requests.rs#L68-L71):
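    pub(crate) struct EventCompletionRequest<'a, T> {
        pub(crate) request_id: &'a str,
        pub(crate) body: T,
    }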
lambda_runtime wraps your function output with EventCompletionRequest<T> and processes it as IntoRequest (lambda-runtime/src/lib.rs#L164-L168):
    EventCompletionRequest {
        request_id,
        body: response,
    }
    .into_req()
IntoRequest is implemented for EventCompletionRequest<T> only where T is serde::Serialize (lambda-runtime/src/requests.rs#L73-L75).
That is why any Serialize value that your service function outputs becomes JSON.
* The word "Request" in the name IntoRequest may confuse you because the trait represents a function response, but "Request" here stands for a request sent to the invocation response endpoint for a custom Lambda runtime.
*2 IntoRequest is not exported anyway.
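The flow described in this section can be modeled without the real crates. The following is only a sketch under stated assumptions: ToJson is a hypothetical stand-in for serde::Serialize, and SimpleRequest is a hypothetical stand-in for the HTTP request type. The path format is the invocation response path of the Lambda Runtime API.

```rust
/// Hypothetical stand-in for `serde::Serialize` (the real trait is much richer).
trait ToJson {
    fn to_json(&self) -> String;
}

impl ToJson for Vec<u8> {
    fn to_json(&self) -> String {
        let items: Vec<String> = self.iter().map(|b| b.to_string()).collect();
        format!("[{}]", items.join(","))
    }
}

/// Hypothetical, simplified request: just a path and a body.
struct SimpleRequest {
    path: String,
    body: Vec<u8>,
}

trait IntoRequest {
    fn into_req(self) -> SimpleRequest;
}

struct EventCompletionRequest<'a, T> {
    request_id: &'a str,
    body: T,
}

// Blanket implementation: every JSON-serializable body goes through
// this single code path, so the request body is always JSON text.
impl<'a, T: ToJson> IntoRequest for EventCompletionRequest<'a, T> {
    fn into_req(self) -> SimpleRequest {
        SimpleRequest {
            path: format!(
                "/2018-06-01/runtime/invocation/{}/response",
                self.request_id
            ),
            body: self.body.to_json().into_bytes(),
        }
    }
}
```

Because the only implementation of IntoRequest is the blanket one, there is no place where a caller could ask for the body to be passed through as raw bytes.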
Introduction of IntoBytes and RawBytes
We have to somehow generalize the serde_json::to_vec call in IntoRequest::into_req.
We can consider it a conversion from a service output to a byte sequence (Vec<u8>).
So how about implementing IntoRequest specifically for EventCompletionRequest<'a, Vec<u8>>?
Unfortunately, this does not work because we cannot make lambda_runtime::run accept both Serialize and Vec<u8> as a service output; its signature requires the output type to be Serialize.
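    pub async fn run<A, B, F>(handler: F) -> Result<(), Error>
    where
        F: Service<LambdaEvent<A>>,
        F::Future: Future<Output = Result<B, F::Error>>,
        F::Error: fmt::Debug + fmt::Display,
        A: for<'de> Deserialize<'de>,
        B: Serialize,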
We need a new abstraction that can represent either a Serialize or a raw byte sequence.
Well, how about introducing a new trait IntoBytes and implementing IntoRequest for EventCompletionRequest<'a, IntoBytes> instead of EventCompletionRequest<'a, Serialize>?
Then, we have to rewrite the signature of lambda_runtime::run.
How about requiring the service output to be IntoBytes instead of Serialize?
If we do so, we will lose backward compatibility; i.e., the service function can no longer simply return a Serialize.
To work around this, I have introduced another struct, IntoBytesBridge, and implemented IntoBytes for IntoBytesBridge<Serialize>.
Thanks to IntoBytesBridge, we can rewrite the signature of lambda_runtime::run so that it requires IntoBytesBridge of the service output, rather than the output itself, to be IntoBytes.
It still works with Serialize because IntoBytes is implemented for IntoBytesBridge<Serialize>.
Now we introduce a new data type RawBytes to express our intention to output a raw byte sequence, and implement IntoBytes for IntoBytesBridge<RawBytes>.
Then we can output a raw byte sequence by wrapping the output from our service function with RawBytes.
For example, a service function that returns the bytes [0x61, 0x62, 0x63] wrapped in RawBytes outputs the raw string abc rather than the JSON array [97,98,99].
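The bridge pattern described in this section can be sketched with the standard library alone. This is not the code in my fork: ToJson is a hypothetical stand-in for serde::Serialize, the method name to_bytes and the function complete (which plays the role of the runtime) are assumptions made for illustration, and the field layouts of IntoBytesBridge and RawBytes are guesses.

```rust
/// Hypothetical stand-in for `serde::Serialize`.
trait ToJson {
    fn to_json(&self) -> String;
}

impl ToJson for String {
    fn to_json(&self) -> String {
        // Naive quoting; assumes no characters that need escaping.
        format!("\"{}\"", self)
    }
}

/// Conversion of a service output into the bytes of the response body.
trait IntoBytes {
    fn to_bytes(self) -> Vec<u8>;
}

/// Wrapper that lets both kinds of output share one trait bound.
struct IntoBytesBridge<'a, T>(&'a T);

/// Marker type meaning "send these bytes as they are".
struct RawBytes(Vec<u8>);

// JSON-serializable outputs keep their old behavior.
impl<'a, T: ToJson> IntoBytes for IntoBytesBridge<'a, T> {
    fn to_bytes(self) -> Vec<u8> {
        self.0.to_json().into_bytes()
    }
}

// `RawBytes` does not implement `ToJson`, so this impl does not
// overlap with the blanket one above.
impl<'a> IntoBytes for IntoBytesBridge<'a, RawBytes> {
    fn to_bytes(self) -> Vec<u8> {
        (self.0).0.clone()
    }
}

/// Plays the role of the runtime: it accepts any output whose bridge
/// implements `IntoBytes`.
fn complete<'a, B>(output: &'a B) -> Vec<u8>
where
    IntoBytesBridge<'a, B>: IntoBytes,
{
    IntoBytesBridge(output).to_bytes()
}
```

With this sketch, complete(&String::from("abc")) still produces the JSON text "abc" including the quotation marks, while complete(&RawBytes(vec![0x61, 0x62, 0x63])) produces the three raw bytes. The design hinges on RawBytes not being JSON-serializable; otherwise the two implementations would conflict.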
Limitation of the Lambda integration?
A strange thing happened when I tried the new feature developed in the previous section to implement my REST API.
My API outputted byte sequences slightly different from what I had returned as service responses.
I realized that my API occasionally produced a triple (0xEF, 0xBF, 0xBD).
After looking into the problem, it turned out the triple had been substituted for a byte that was supposed to have its most significant bit set to 1; in other words, a byte whose bitwise AND with 0x80 was not zero.
The triple (0xEF, 0xBF, 0xBD) is indeed the UTF-8 representation of the replacement character (U+FFFD), which meant someone expecting a valid UTF-8 sequence had replaced unwanted bytes in my API output.
So who was responsible for that?
The endpoint for Lambda invocation responses?
Or the Lambda integration of the Amazon API Gateway?
Or even the Rust dependencies?
When I directly invoked my Lambda function with the AWS CLI command aws lambda invoke, I got the exact byte sequence I expected.
So the Lambda invocation response endpoint and the Rust dependencies were in the clear, and the Lambda integration was the cause.
I knew Lambda proxy integration expects JSON, which is always a valid UTF-8 sequence, as the output.
But I had not realized Lambda non-proxy integration also expects a UTF-8 sequence as the output*.
* I have not confirmed this in an authoritative source yet.
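I do not know what API Gateway does internally, but the same symptom is easy to reproduce with a lossy UTF-8 decode. The sketch below uses Rust's standard String::from_utf8_lossy; the helper name through_utf8 is made up for this illustration.

```rust
/// Decodes the bytes as UTF-8 in the lossy way, then returns the
/// re-encoded bytes. Invalid bytes come back as U+FFFD (EF BF BD).
fn through_utf8(bytes: &[u8]) -> Vec<u8> {
    String::from_utf8_lossy(bytes).into_owned().into_bytes()
}
```

A pure ASCII input such as abc passes through unchanged, while the input [0x61, 0x80, 0x62] comes back as [0x61, 0xEF, 0xBF, 0xBD, 0x62]: the lone byte 0x80 is not valid UTF-8, so it is replaced with the three-byte replacement character.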
Workaround
My workaround was to encode the service output into Base64 and specify CONVERT_TO_BINARY for contentHandling.
But one question arises: if we have to encode a binary into a Base64 text anyway, did we really need the extension of lambda_runtime?
This question leads to a much simpler solution described in the next section.
Simpler solution
In the last section, we learned Lambda non-proxy integration does not allow a service to output an arbitrary byte sequence.
Thus, we have to Base64-encode our service outputs anyway.
It means that the benefits of tweaking lambda_runtime are half-lost.
In the section "Any solution?", I suggested another solution that requires no tweaks to lambda_runtime:
- Embed a Base64-encoded binary as a field value in a JSON object, extract it with a mapping template for integration responses, and apply CONVERT_TO_BINARY.
However, I feel it is a bit awkward because we have to write a mapping template only to extract a field value.
So, how about directly outputting a Base64-encoded String and serializing it as JSON?
Since Serialize is implemented for String, our service functions can output it without any tweaks; the function simply returns the Base64 text as a String.
I initially thought it would not work because the output would be enclosed in extra double quotation marks (") and become an invalid Base64 text.
But it turned out to work!
When I got an output via aws lambda invoke, it actually included the enclosing double quotation marks.
But Lambda integration somehow knew how to deal with them and correctly decoded the intended Base64 text.
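To show what actually travels over the wire in this approach, here is a dependency-free sketch. In a real function you would more likely use a Base64 crate and let the JSON serializer do the quoting; base64_encode and to_json_string below are hand-written, hypothetical helpers for illustration only.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encodes bytes as standard Base64 with `=` padding (RFC 4648).
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into the low 24 bits of a u32.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        // Emit four 6-bit groups, padding the ones with no input bytes.
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        if chunk.len() > 1 {
            out.push(ALPHABET[(n >> 6) as usize & 63] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[n as usize & 63] as char);
        } else {
            out.push('=');
        }
    }
    out
}

/// What JSON serialization does to a string that needs no escaping:
/// it wraps the string in double quotation marks.
fn to_json_string(s: &str) -> String {
    format!("\"{}\"", s)
}
```

For the bytes of abc, base64_encode returns YWJj, and to_json_string turns that into "YWJj" with the quotation marks included, which matches what I observed via aws lambda invoke. Bytes with the most significant bit set are no problem either, because the encoded text is plain ASCII; for instance, the single byte 0x80 becomes gA==.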
Wrap up
In this blog post, we saw how aws-lambda-rust-runtime helps us implement a custom Lambda runtime in Rust.
Then I showed you my tweaks to aws-lambda-rust-runtime to handle binary data.
However, we found that simply returning a Base64-encoded String was the easiest way to deal with binary outputs with Lambda integration for Amazon API Gateway.
Although they have turned out not to be very useful, you can find my tweaks to aws-lambda-rust-runtime on my GitHub fork.
Appendix
Building a Rust Lambda runtime with CDK
I found rust.aws-cdk-lambda helpful for building a Lambda function in Rust with the AWS Cloud Development Kit (CDK).