Pipelines for AWS Lambda – Part 4: Troubleshooting

Nov 5, 2021

TL/DR;

It’s best to test Lambda "inside-out" by fist making sure the Lambda works, then the invocation (in this case API Gateway), then external. Pipeline error logging and CloudWatch logging are your best friends for troubleshooting.

Background

In this series of posts we walked through the following steps for using the AWS Serverless Application Model (SAM) for setting up a GitHub Actions pipeline for deploying serverless functions written in Node.js. Previous posts covered:

Part 1: The Deployment Stack – Writing a reusable CloudFormation template to create required AWS resources for the pipeline
Part 2: The Code – Writing a basic REST API in Node.js
Part 3: The Pipeline – Configuring the GitHub pipeline generated by SAM to work with Node.js and different branch naming conventions

When building a SAM application, you have two choices for how you configure your API gateway: Api and HttpApi. You can review the AWS Documentation for a comparison of these two options. We will discuss techniques that apply to each of these options.

Linting

I can't say this enough. When you are looking for problems with your application, Look for horses and not zebras. The longer you spend trying to track down the cause of a problem without finding a solution, the more likely the answer is staring you right in the face. Very often, problems can be found with static code analysis ("linting") and specifically eslint if you are using Node.js. While working on the proof of concept for this post, I chased a bug way too long that was just an invalid reference inserted by my IDE. The error message pointed me to the exact line but since the error message said it couldn’t find a reference (and I thought I hadn’t added any new references), I thought there was something wrong with loading the dependencies. Since the code I was using for this series was so simple, I didn't bother adding eslint. As soon as I did, I found the issue since it highlighted the unused reference introduced by the IDE. The moral of the story is use linting to find easily-fixed problems.

Unit Testing

I don't want to get into a philosophical conversation on what is and is not a unit test. I might venture into that conversation another day. For the sake of this post, let's consider "unit testing" analogous to "local testing" – any test than can be run outside of the AWS ecosystem. This way you can run the test in your local dev environment or in the pipeline. These test are extremely important to successful development for microservices and the cloud. You need to be able to test the atomic transaction your lambda is supposed to perform. The great part of Lambda is that you can invoke the code multiple ways. The same function can be invoked from an API Gateway like in our example or from an SQS queue, SNS topic, CloudWatch event, etc. If your code works, it should work across any use case. Of course if you are integrating with other AWS services like S3 or need network connectivity, then the permissions and resources need to be configured correctly in AWS. However none of this matters if there is a bug in your code. Test your code thoroughly and shoot for 100% code coverage even if that means your "unit test" smells more like an "integration test" (ex: use docker run or docker-compose to spin up a database in a container rather to test CRUD transactions). Structure your code based on business logic and then have a handler function that only handles the routing of parameters from the event parameters to your function(s). Then test this function.

Note that you can also run Lambda functions in a container. The easiest way to do this is with sam local invoke which will use the information in your SAM template to create the Lambda Function in a container. For this simple Hello World example, I think this is a perfectly valid technique. However, as you start adding other AWS services to your Lambda, you would need to extend any permissions needed to run the Lambda to an access key shared with the developer (i.e the developer’s persona IAM user and role). In other words, you have achieve all of the same security requirements in your local environment that need to be met in the AWS account. I would argue you are better off always running in a dev AWS account rather than locally. This might seem unnecessarily painful at first, but if you follow these other testing techniques, you are very unlikely to have issues and you will actually move more quickly since everything is developed and tested within the ecosystem of the pipeline and AWS account so you don’t run into configuration problems due to differences between the local and AWS environment.

Validating the SAM Template

Even though we are using SAM for the GitHub Actions pipeline, you can follow all of the steps in this series of posts without every having to use the SAM CLI. This is by design. I am a firm believer that you should be able to develop for AWS using only code and your standard development tools and services so these posts are intended to document a process that follows that belief. However, since we are using SAM for deployment, it is good to use SAM for local features where it makes sense. Before you can run SAM, you will need to make sure you have installed it as defined in the AWS documentation. To validate your SAM template (template.yaml in our example), simply run sam validate. Note that you may need to specify your region with the --region option if you have not configured this in you default AWS configuration. If this is the case, sam validate will respond with this information. Below is an example of an error found using sam validate:

$ sam validate
2021-11-04 11:17:30 Loading policies from IAM...
2021-11-04 11:17:32 Finished loading policies from IAM.
Template provided at '/Users/doug/code/aws-sam-demo/template.yaml' was invalid SAM Template.
Error: [InvalidResourceException('LambdaNodeApi', "Invalid value for 'Cors' property.")] ('LambdaNodeApi', "Invalid value for 'Cors' property.")

In this example, I had used the AllowOrigin key inside the CorsConfiguration section for an HttpApi but the correct key is AllowOrigins. Note that this error does not point you to the exact line so it is important to review the exact syntax for the section referenced by the error.

If everything is good, you should see output that looks something like this:

$ sam validate
2021-11-04 10:58:12 Loading policies from IAM...
2021-11-04 10:58:14 Finished loading policies from IAM.
/Users/doug/code/aws-sam-demo/template.yaml is a valid SAM Template

Troubleshooting Lambda Function

Testing Invocation

You can test your Lambda function in the AWS console. Navigate to the function (you can start with the CloudFormation stack for your most recent deployment if you aren’t sure about the name of your function) and select the "Test" tab. The default test data will be based off of the "hello-world" template. This does NOT match the schema for a call from an API Gateway so you will most likely need to modify the data to match the values your code expects from the event parameter.

Finding information on the syntax of the event parameter for your Lambda function is surprisingly difficult since there are multiple ways to invoke a Lambda function and each option has its own unique schema for the event value. This matrix provided in the AWS documentation points to all of the various invocation methods. Since we are invoking our Lambdas from an API Gateway in our example, you might want to review the schema for the event as defined for API Gateway invocation provided in the AWS documentation.

Debug Logging

By default, logs and metrics are enabled for all Lambda functions created with AWS SAM. To view the CloudWatch logs, simply navigate to the Lambda function, select the "Monitor" tab, and then click "View logs in CloudWatch". Typically, there will be a unique log stream for each invocation of your function. Select the log stream to see the logs. Note that anything you write to the standard output (console.log in Node.js) is written to the CloudWatch log stream.

Note that even though the event schema is defined and documented, not all values are implemented for every configuration or use case. Therefore you may want to write the event value to the CloudWatch logs as follows:

console.log(event);

Troubleshooting Api Option

Testing via AWS Console

The Api option supports testing directly in the AWS console. You can navigate to the API gateway (again, the CloudFormation stack is your friend here) and select “Resources” from the menu and then select a method ("GET" in our example). Then click the lightning bold icon to access the test page. On this page, you can enter any content required for the request (path/query parameters, headers, body, etc.) and then click the "Test" button to test the API.

After you test the API, you will see the response status, body, headers, and log output on the right-hand side of the page. One important item in the log output is Endpoint request body after transformations. This will show you the value of the event parameter passed to your Lambda function. You can also see the return value of your Lambda displayed as Endpoint response body before transformations. If your response status or body isn’t what you expected, you should review your response object compared to the syntax as defined in the AWS documentation.

Troubleshooting HttpApi Option

Debug Logging

Before you can log API Gateway activity for APIs created with the HttpApi option, you have to enable CloudWatch at the account level. Review the this gist which you can add to the deployment stack template as we did in Part 1.

One key benefit of the HttpApi option is support for generic JWT authorizers which are convenient if your are using a third-party authentication provider such as Auth0. The HttpApi supports a FailOnWarnings property which defaults to false. You can change this value to true as shown below:

  LambdaNodeApi:
    Type: AWS::Serverless::HttpApi
    Properties:
      FailOnWarnings: true

Enabling this setting will provide information on "Warnings" which could actually be preventing AWS from creating resources required for your API to function. The example below shows a failure in the pipeline that occurred during sam deploy due to missing the Audience configuration for the JWT authorizer.

-------------------------------------------------------------------------------------------------
ResourceStatus           ResourceType             LogicalResourceId        ResourceStatusReason   
-------------------------------------------------------------------------------------------------
UPDATE_IN_PROGRESS       AWS::ApiGatewayV2::Api   LambdaNodeApi            -                      
UPDATE_FAILED            AWS::ApiGatewayV2::Api   LambdaNodeApi            Warnings found during  
                                                                           import:         CORS   
                                                                           Scheme is malformed,   
                                                                           ignoring.              
                                                                           Unable to create       
                                                                           Authorizer 'LambdaNode 
                                                                           Authorizer': Audience  
                                                                           list must have at      
                                                                           least 1 item for JWT   
                                                                           Authorizer. Ignoring.  
                                                                           Unable to put method   
                                                                           'GET' on resource at   
                                                                           path '/': Invalid      
                                                                           authorizer definition. 
                                                                           Setting the            
                                                                           authorization type to  
                                                                           JWT requires a valid   
                                                                           authorizer. Ignoring.  
                                                                           (Service:              
                                                                           AmazonApiGatewayV2;    
                                                                           Status Code: 400;      
                                                                           Error Code:            
                                                                           BadRequestException;   
                                                                           Request ID: 43a34e55-d 
                                                                           0d0-4ed2-8571-eb473e71 
                                                                           a9e2; Proxy: null)     
                                                                           (Service: null; Status 
                                                                           Code: 404; Error Code: 
                                                                           BadRequestException;   
                                                                           Request ID: null;      
                                                                           Proxy: null)

Remote Testing

The final phase of testing is to actually execute the API "in the field". This can be done using a tool such as Postman or curl.

What Do You Mean CloudFront Error?

One error to watch out for when testing your function through the API Gateway is a CloudFront error. This may seem like a strange error since nowhere in this series of posts do we mention CloudFront, but I have a skill for finding strange errors that aren’t easy to find solutions for. I invoked a function that supported path parameters but passed in an invalid path and a JWT for an authorizer that was also not expected. The body of the response looked like this:

{
    "message": "'[JWT VALUE GOES HERE]' not a valid key=value pair (missing equal-sign) in Authorization header: 'Bearer [JWT VALUE GOES HERE]'."
}

More confusing was the X-Cache header in the response which stated "Error from cloudfront". My invalid request was being blocked by CloudFront which sits in front of your API Gateway as part of the AWS infrastructure. This was particularly difficult to discover since CloudFront was blocking the API from being called so even once I enabled logging for my API Gateway built on the HttpApi option, I was still not seeing any activity (or error).

Summary

I recommend an "inside-out" approach to troubleshooting as follows:

Use static code analysis or "linting".
Structure your code based on business logic and shoot for 100% code coverage testing this code.
Validate your SAM template with sam validate.
Test your Lambda function in the AWS console.
Use debug logging (possibly logging the event parameter) to troubleshoot the Lambda.
Test APIs created using the Api option using the AWS console.
Use FailOnWarnings and enable CloudWatch logs for to find issues with the HttpApi option.
Test from outside AWS with a tool such as curl or Postman.
CloudFront errors usually mean you are sending a request that is way off target for your API Gateway (probably calling the wrong API).