Pipelines for AWS Lambda – Part 4: Troubleshooting

TL/DR;

It’s best to test Lambda “inside-out” by first making sure the Lambda function itself works, then the invocation (in this case API Gateway), then external access. Pipeline error logging and CloudWatch logging are your best friends for troubleshooting.

Background

In this series of posts we walked through the steps for using the AWS Serverless Application Model (SAM) to set up a GitHub Actions pipeline for deploying serverless functions written in Node.js. Previous posts covered the deployment stack (Part 1), the code (Part 2), and the pipeline itself (Part 3).

When building a SAM application, you have two choices for how you configure your API gateway: Api and HttpApi. You can review the AWS Documentation for a comparison of these two options. We will discuss techniques that apply to each of these options.

Linting

I can’t say this enough. When you are looking for problems with your application, look for horses and not zebras. The longer you spend trying to track down the cause of a problem without finding a solution, the more likely the answer is staring you right in the face. Very often, problems can be found with static code analysis (“linting”) and specifically eslint if you are using Node.js. While working on the proof of concept for this post, I chased a bug for way too long that turned out to be just an invalid reference inserted by my IDE. The error message pointed me to the exact line, but since the error message said it couldn’t find a reference (and I thought I hadn’t added any new references), I thought there was something wrong with loading the dependencies. Since the code I was using for this series was so simple, I didn’t bother adding eslint. As soon as I did, I found the issue since it highlighted the unused reference introduced by the IDE. The moral of the story: use linting to find easily-fixed problems.
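
Setting up eslint takes one small config file. Below is a minimal sketch of an .eslintrc.js (my own example, not part of the scaffolded project); the no-unused-vars check in the recommended rule set is what would have flagged the stray reference my IDE inserted:

// .eslintrc.js – a minimal sketch; adjust env and rules to taste
module.exports = {
    env: {
        node: true,   // Lambda code runs on Node.js
        es2021: true
    },
    extends: 'eslint:recommended',   // includes no-unused-vars and no-undef
    rules: {
        // surface unused requires/imports as errors rather than warnings
        'no-unused-vars': 'error'
    }
};

Run it locally with npx eslint . and wire the same command into the pipeline so it runs on every push.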

Unit Testing

I don’t want to get into a philosophical conversation on what is and is not a unit test. I might venture into that conversation another day. For the sake of this post, let’s consider “unit testing” analogous to “local testing” – any test that can be run outside of the AWS ecosystem. This way you can run the test in your local dev environment or in the pipeline. These tests are extremely important to successful development for microservices and the cloud. You need to be able to test the atomic transaction your Lambda is supposed to perform. The great part of Lambda is that you can invoke the code multiple ways. The same function can be invoked from an API Gateway like in our example or from an SQS queue, SNS topic, CloudWatch event, etc. If your code works, it should work across any use case. Of course, if you are integrating with other AWS services like S3 or need network connectivity, then the permissions and resources need to be configured correctly in AWS. However, none of this matters if there is a bug in your code. Test your code thoroughly and shoot for 100% code coverage even if that means your “unit test” smells more like an “integration test” (ex: use docker run or docker-compose to spin up a database in a container to test CRUD transactions rather than mocking the database). Structure your code based on business logic and then have a handler function that only handles the routing of parameters from the event parameter to your function(s). Then test this function, as shown in the sketch below.
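
Here is a minimal sketch of that structure (the file names and the greet function are hypothetical, not from the scaffolded project). The business logic lives in a plain module with no AWS dependencies, the handler only maps event values onto it, and the test runs anywhere Node.js runs:

// greeting.js – pure business logic, no AWS dependencies
exports.greet = (name) => {
    return { message: `hello ${name || 'world'}` };
};

// app.js – thin handler that only routes event values to the logic
const { greet } = require('./greeting');

exports.lambdaHandler = async (event) => {
    // pathParameters is how API Gateway passes path values;
    // other event sources would map their own fields here
    const name = event.pathParameters && event.pathParameters.name;
    return {
        statusCode: 200,
        body: JSON.stringify(greet(name))
    };
};

// greeting.test.js – runs locally or in the pipeline, no AWS required
const assert = require('assert');
const { greet: greetFn } = require('./greeting');
assert.deepStrictEqual(greetFn('Doug'), { message: 'hello Doug' });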

Note that you can also run Lambda functions in a container. The easiest way to do this is with sam local invoke, which will use the information in your SAM template to create the Lambda function in a container. For this simple Hello World example, I think this is a perfectly valid technique. However, as you start adding other AWS services to your Lambda, you would need to extend any permissions needed to run the Lambda to an access key shared with the developer (i.e. the developer’s personal IAM user and role). In other words, you have to achieve all of the same security requirements in your local environment that need to be met in the AWS account. I would argue you are better off always running in a dev AWS account rather than locally. This might seem unnecessarily painful at first, but if you follow these other testing techniques, you are very unlikely to have issues. You will actually move more quickly since everything is developed and tested within the ecosystem of the pipeline and the AWS account, so you don’t run into configuration problems caused by differences between the local and AWS environments.
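
For reference, invoking the scaffolded function in a local container looks like this (HelloWorldFunction is the logical ID from the SAM template; the event file path is a hypothetical sample event saved locally):

$ sam local invoke HelloWorldFunction --event events/event.json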

Validating the SAM Template

Even though we are using SAM for the GitHub Actions pipeline, you can follow all of the steps in this series of posts without ever having to use the SAM CLI. This is by design. I am a firm believer that you should be able to develop for AWS using only code and your standard development tools and services, so these posts are intended to document a process that follows that belief. However, since we are using SAM for deployment, it is good to use SAM for local features where it makes sense. Before you can run SAM, you will need to make sure you have installed it as defined in the AWS documentation. To validate your SAM template (template.yaml in our example), simply run sam validate. Note that you may need to specify your region with the --region option if you have not configured this in your default AWS configuration. If this is the case, sam validate will respond with this information. Below is an example of an error found using sam validate:

$ sam validate
2021-11-04 11:17:30 Loading policies from IAM...
2021-11-04 11:17:32 Finished loading policies from IAM.
Template provided at '/Users/doug/code/aws-sam-demo/template.yaml' was invalid SAM Template.
Error: [InvalidResourceException('LambdaNodeApi', "Invalid value for 'Cors' property.")] ('LambdaNodeApi', "Invalid value for 'Cors' property.")

In this example, I had used the AllowOrigin key inside the CorsConfiguration section for an HttpApi, but the correct key is AllowOrigins. Note that this error does not point you to the exact line, so it is important to review the exact syntax for the section referenced by the error.
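
For reference, a corrected CorsConfiguration might look like the sketch below (the origin value is a placeholder; the point is the plural AllowOrigins key):

  LambdaNodeApi:
    Type: AWS::Serverless::HttpApi
    Properties:
      CorsConfiguration:
        AllowOrigins:   # plural – the singular AllowOrigin fails validation
          - https://example.com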

If everything is good, you should see output that looks something like this:

$ sam validate
2021-11-04 10:58:12 Loading policies from IAM...
2021-11-04 10:58:14 Finished loading policies from IAM.
/Users/doug/code/aws-sam-demo/template.yaml is a valid SAM Template

Troubleshooting Lambda Function

Testing Invocation

You can test your Lambda function in the AWS console. Navigate to the function (you can start with the CloudFormation stack for your most recent deployment if you aren’t sure about the name of your function) and select the “Test” tab. The default test data will be based off of the “hello-world” template. This does NOT match the schema for a call from an API Gateway so you will most likely need to modify the data to match the values your code expects from the event parameter.

Finding information on the syntax of the event parameter for your Lambda function is surprisingly difficult since there are multiple ways to invoke a Lambda function and each option has its own unique schema for the event value. This matrix provided in the AWS documentation points to all of the various invocation methods. Since we are invoking our Lambdas from an API Gateway in our example, you might want to review the event schema as defined for API Gateway invocation in the AWS documentation.
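
As a starting point, here is a trimmed sketch of a test event for the Api (REST, payload format 1.0) option, based on the documented proxy format with placeholder values; note that the HttpApi option uses a different version 2.0 payload:

{
  "resource": "/hello",
  "path": "/hello",
  "httpMethod": "GET",
  "headers": { "Accept": "*/*" },
  "queryStringParameters": null,
  "pathParameters": null,
  "body": null,
  "isBase64Encoded": false
}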

Debug Logging

By default, logs and metrics are enabled for all Lambda functions created with AWS SAM. To view the CloudWatch logs, simply navigate to the Lambda function, select the “Monitor” tab, and then click “View logs in CloudWatch”. Typically, there will be a unique log stream for each instance of your function (for infrequent test invocations, that usually means one stream per invocation). Select the log stream to see the logs. Note that anything you write to the standard output (console.log in Node.js) is written to the CloudWatch log stream.

Note that even though the event schema is defined and documented, not all values are implemented for every configuration or use case. Therefore you may want to write the event value to the CloudWatch logs as follows:

  // stringify so nested objects aren't truncated in the CloudWatch output
  console.log(JSON.stringify(event, null, 2));

Troubleshooting Api Option

Testing via AWS Console

The Api option supports testing directly in the AWS console. You can navigate to the API Gateway (again, the CloudFormation stack is your friend here), select “Resources” from the menu, and then select a method (“GET” in our example). Then click the lightning bolt icon to access the test page. On this page, you can enter any content required for the request (path/query parameters, headers, body, etc.) and then click the “Test” button to test the API.

After you test the API, you will see the response status, body, headers, and log output on the right-hand side of the page. One important item in the log output is Endpoint request body after transformations. This will show you the value of the event parameter passed to your Lambda function. You can also see the return value of your Lambda displayed as Endpoint response body before transformations. If your response status or body isn’t what you expected, you should review your response object compared to the syntax as defined in the AWS documentation.
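
For reference, this is the general shape of the response object a proxy integration expects back from your Lambda; this sketch mirrors the scaffolded function, and the headers entry is an optional addition of mine:

exports.lambdaHandler = async (event) => {
    return {
        statusCode: 200,                                  // numeric HTTP status
        headers: { 'Content-Type': 'application/json' },  // optional response headers
        body: JSON.stringify({ message: 'hello world' })  // body must be a string
    };
};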

Troubleshooting HttpApi Option

Debug Logging

Before you can log API Gateway activity for APIs created with the HttpApi option, you have to enable CloudWatch at the account level. Review this gist, which you can add to the deployment stack template as we did in Part 1.

One key benefit of the HttpApi option is support for generic JWT authorizers, which are convenient if you are using a third-party authentication provider such as Auth0. The HttpApi type also supports a FailOnWarnings property which defaults to false. You can change this value to true as shown below:

  LambdaNodeApi:
    Type: AWS::Serverless::HttpApi
    Properties:
      FailOnWarnings: true

Enabling this setting will provide information on “Warnings” which could actually be preventing AWS from creating resources required for your API to function. The example below shows a failure in the pipeline that occurred during sam deploy due to a missing Audience configuration for the JWT authorizer.

-------------------------------------------------------------------------------------------------
ResourceStatus       ResourceType             LogicalResourceId   ResourceStatusReason
-------------------------------------------------------------------------------------------------
UPDATE_IN_PROGRESS   AWS::ApiGatewayV2::Api   LambdaNodeApi       -
UPDATE_FAILED        AWS::ApiGatewayV2::Api   LambdaNodeApi       Warnings found during import:
                     CORS Scheme is malformed, ignoring.
                     Unable to create Authorizer 'LambdaNodeAuthorizer': Audience list must have
                     at least 1 item for JWT Authorizer. Ignoring.
                     Unable to put method 'GET' on resource at path '/': Invalid authorizer
                     definition. Setting the authorization type to JWT requires a valid
                     authorizer. Ignoring.
                     (Service: AmazonApiGatewayV2; Status Code: 400; Error Code:
                     BadRequestException; Request ID: 43a34e55-d0d0-4ed2-8571-eb473e71a9e2;
                     Proxy: null) (Service: null; Status Code: 404; Error Code:
                     BadRequestException; Request ID: null; Proxy: null)
-------------------------------------------------------------------------------------------------
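
The fix for this particular warning is to give the JWT authorizer an Audience list. Below is a sketch of the relevant HttpApi configuration; the issuer and audience values are placeholders for whatever your identity provider (Auth0 in my case) assigns:

  LambdaNodeApi:
    Type: AWS::Serverless::HttpApi
    Properties:
      FailOnWarnings: true
      Auth:
        DefaultAuthorizer: LambdaNodeAuthorizer
        Authorizers:
          LambdaNodeAuthorizer:
            IdentitySource: $request.header.Authorization
            JwtConfiguration:
              issuer: https://your-tenant.auth0.com/   # placeholder
              audience:
                - https://your-api-identifier          # placeholder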

Remote Testing

The final phase of testing is to actually execute the API “in the field”. This can be done using a tool such as Postman or curl.
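
With curl, that looks something like this; the hostname comes from the HelloWorldApi value in your stack’s Outputs, and the API ID shown here is a placeholder:

$ curl https://abc123def4.execute-api.us-east-1.amazonaws.com/Prod/hello/
{"message":"hello world"}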

What Do You Mean CloudFront Error?

One error to watch out for when testing your function through the API Gateway is a CloudFront error. This may seem like a strange error since nowhere in this series of posts do we mention CloudFront, but I have a skill for finding strange errors that aren’t easy to find solutions for. I invoked a function that supported path parameters but passed in an invalid path, along with a JWT for an authorizer that also wasn’t expected. The body of the response looked like this:

{
    "message": "'[JWT VALUE GOES HERE]' not a valid key=value pair (missing equal-sign) in Authorization header: 'Bearer [JWT VALUE GOES HERE]'."
}

More confusing was the X-Cache header in the response, which stated “Error from cloudfront”. My invalid request was being blocked by CloudFront, which sits in front of your API Gateway as part of the AWS infrastructure. This was particularly difficult to discover since CloudFront was blocking the API from being called, so even once I enabled logging for my API Gateway built on the HttpApi option, I was still not seeing any activity (or errors).

Summary

I recommend an “inside-out” approach to troubleshooting as follows:

  • Use static code analysis or “linting”.
  • Structure your code based on business logic and shoot for 100% code coverage testing this code.
  • Validate your SAM template with sam validate.
  • Test your Lambda function in the AWS console.
  • Use debug logging (possibly logging the event parameter) to troubleshoot the Lambda.
  • Test APIs created using the Api option using the AWS console.
  • Use FailOnWarnings and enable CloudWatch logs to find issues with the HttpApi option.
  • Test from outside AWS with a tool such as curl or Postman.
  • CloudFront errors usually mean you are sending a request that is way off target for your API Gateway (probably calling the wrong API).

Pipelines for AWS Lambda – Part 3: The Pipeline

TL/DR;

You can create a GitHub Actions pipeline with sam pipeline init, but it will be configured for Python and for feature branches whose names start with “feature”.

GitHub Action Pipeline

The next step of the tutorial is to run sam pipeline init. Unlike sam pipeline bootstrap, this command does not deploy resources directly to AWS. In the example below, I entered placeholders for the ARNs of the resources created in Part 1, but you would enter your actual ARNs when configuring your pipeline.

$ sam pipeline init

sam pipeline init generates a pipeline configuration file that your CI/CD system
can use to deploy serverless applications using AWS SAM.
We will guide you through the process to bootstrap resources for each stage,
then walk through the details necessary for creating the pipeline config file.

Please ensure you are in the root folder of your SAM application before you begin.

Select a pipeline structure template to get started:
Select template
	1 - AWS Quick Start Pipeline Templates
	2 - Custom Pipeline Template Location
Choice: 1

Cloning from https://github.com/aws/aws-sam-cli-pipeline-init-templates.git
CI/CD system
	1 - Jenkins
	2 - GitLab CI/CD
	3 - GitHub Actions
	4 - AWS CodePipeline
Choice: 3
You are using the 2-stage pipeline template.
 _________    _________ 
|         |  |         |
| Stage 1 |->| Stage 2 |
|_________|  |_________|

Checking for existing stages...

[!] None detected in this account.

To set up stage(s), please quit the process using Ctrl+C and use one of the following commands:
sam pipeline init --bootstrap       To be guided through the stage and config file creation process.
sam pipeline bootstrap              To specify details for an individual stage.

To reference stage resources bootstrapped in a different account, press enter to proceed []: 

This template configures a pipeline that deploys a serverless application to a testing and a production stage.

What is the GitHub secret name for pipeline user account access key ID? [AWS_ACCESS_KEY_ID]: 
What is the GitHub Secret name for pipeline user account access key secret? [AWS_SECRET_ACCESS_KEY]: 
What is the git branch used for production deployments? [main]: 
What is the template file path? [template.yaml]: 
We use the stage name to automatically retrieve the bootstrapped resources created when you ran `sam pipeline bootstrap`.


What is the name of stage 1 (as provided during the bootstrapping)?
Select an index or enter the stage name: Build
What is the sam application stack name for stage 1? [sam-app]: build-stack
What is the pipeline execution role ARN for stage 1?: pipeline-execution-arn
What is the CloudFormation execution role ARN for stage 1?: cloudformation-execution-arn
What is the S3 bucket name for artifacts for stage 1?: build-bucket
What is the ECR repository URI for stage 1? []: 
What is the AWS region for stage 1?: us-east-1
Stage 1 configured successfully, configuring stage 2.


What is the name of stage 2 (as provided during the bootstrapping)?
Select an index or enter the stage name: deploy
What is the sam application stack name for stage 2? [sam-app]: deploy-stack
What is the pipeline execution role ARN for stage 2?: pipeline-execution-arn
What is the CloudFormation execution role ARN for stage 2?: cloudformation-execution-arn
What is the S3 bucket name for artifacts for stage 2?: deploy-bucket
What is the ECR repository URI for stage 2? []: 
What is the AWS region for stage 2?: us-east-1
Stage 2 configured successfully.

SUMMARY
We will generate a pipeline config file based on the following information:
	What is the GitHub secret name for pipeline user account access key ID?: AWS_ACCESS_KEY_ID
	What is the GitHub Secret name for pipeline user account access key secret?: AWS_SECRET_ACCESS_KEY
	What is the git branch used for production deployments?: main
	What is the template file path?: template.yaml
	What is the name of stage 1 (as provided during the bootstrapping)?
Select an index or enter the stage name: Build
	What is the sam application stack name for stage 1?: build-stack
	What is the pipeline execution role ARN for stage 1?: pipeline-execution-arn
	What is the CloudFormation execution role ARN for stage 1?: cloudformation-execution-arn
	What is the S3 bucket name for artifacts for stage 1?: build-bucket
	What is the ECR repository URI for stage 1?: 
	What is the AWS region for stage 1?: us-east-1
	What is the name of stage 2 (as provided during the bootstrapping)?
Select an index or enter the stage name: deploy
	What is the sam application stack name for stage 2?: deploy-stack
	What is the pipeline execution role ARN for stage 2?: pipeline-execution-arn
	What is the CloudFormation execution role ARN for stage 2?: cloudformation-execution-arn
	What is the S3 bucket name for artifacts for stage 2?: deploy-bucket
	What is the ECR repository URI for stage 2?: 
	What is the AWS region for stage 2?: us-east-1
Successfully created the pipeline configuration file(s):
	- .github/workflows/pipeline.yaml

This will create the GitHub pipeline configuration which I have captured in this gist.

Configuring Runtime Platform

Using the default template creates a pipeline to build a Python app, so the first change I made was to replace this line, which configures the Python actions:

      - uses: actions/setup-python@v2

with the Node.js configuration as follows (note the version specification):

      - uses: actions/setup-node@v2
        with:
          node-version: '14'

Configuring Branch Naming

Another minor issue with the generated pipeline is that it assumes a certain convention for naming branches, where all feature branches start with “feature”. Typically for my open source projects, I just use the GitHub issue number and title as my branch name (something like 123-my-issue-title). Therefore, I modified the branch filters at the top of the pipeline configuration as follows:

on:
  push:
    branches:
      - 'main'
      - '[0-9]+**'

Then I modified the build-and-deploy-feature stage as follows so it runs on any branch other than main:

  build-and-deploy-feature:
    # this stage is triggered for any branch other than main,
    # and will build and deploy to a stack named after the branch.
    if: github.ref != 'refs/heads/main'

A similar change was required for delete-feature since this stage runs only when a feature branch is deleted. Notice that the condition looks at github.event.ref and not github.ref as shown in the previous change.

  delete-feature:
    if: github.event.ref != 'refs/heads/main' && github.event_name == 'delete'

Finally, this naming convention breaks sam deploy since it uses a CloudFormation stack name that matches the branch name. Since the stack name cannot start with a number, I added a “feature-” prefix to the stack name in the build-and-deploy-feature stage as shown:

      - name: Deploy to feature stack in the testing account
        shell: bash
        run: |
          sam deploy --stack-name feature-$(echo ${GITHUB_REF##*/} | tr -cd '[a-zA-Z0-9-]') \
            --capabilities CAPABILITY_IAM \
            --region ${TESTING_REGION} \
            --s3-bucket ${TESTING_ARTIFACTS_BUCKET} \
            --no-fail-on-empty-changeset \
            --role-arn ${TESTING_CLOUDFORMATION_EXECUTION_ROLE}

Summary

The pipeline configuration created by sam pipeline init is fairly comprehensive. It handles creating a unique deployment stack for feature branches, deleting those stacks when the branch is deleted, and a multi-phase deployment for production which includes integration tests. Unfortunately, this pipeline defaults to Python, so we have to update it to Node.js or whatever platform you prefer. Also, it assumes all feature branches are prefixed with “feature”, so we need to modify the template if we are not following this convention.

Pipelines for AWS Lambda – Part 2: The Code

TL/DR;

One of the great things about AWS Lambda is that you can write your code and deploy without worrying about the hosting environment (kind of). So let’s talk about what that code should look like so you really don’t have to worry.

Background

As I mentioned in my previous post, the AWS Serverless Application Model (SAM), has made (some) things better about developing serverless functions in AWS Lambda. We are going to create a fairly basic Hello World API. The code itself is relatively simple but Lambda only works when deployed with all of the correct resources and permissions linked correctly. Using SAM, we will deploy the Lambda function and an API gateway. The resources and permissions for this initial implementation are pretty simple, but there are still mistakes that can be made so I’ll walk through the troubleshooting steps.

Creating the Lambda Code

Before we talk about deployment, we need to have some code to deploy. To make sure we capture all of the things we need for our function to work, we are just going to scaffold a new project using sam init. There is a large collection of starter templates maintained by AWS, and SAM uses this repository to scaffold new projects. The transcript below shows the selections I used to generate a “hello world” project in Node.js:

$ sam init
Which template source would you like to use?
	1 - AWS Quick Start Templates
	2 - Custom Template Location
Choice: 1
What package type would you like to use?
	1 - Zip (artifact is a zip uploaded to S3)	
	2 - Image (artifact is an image uploaded to an ECR image repository)
Package type: 1

Which runtime would you like to use?
	1 - nodejs14.x
	2 - python3.9
	3 - ruby2.7
	4 - go1.x
	5 - java11
	6 - dotnetcore3.1
	7 - nodejs12.x
	8 - nodejs10.x
	9 - python3.8
	10 - python3.7
	11 - python3.6
	12 - python2.7
	13 - ruby2.5
	14 - java8.al2
	15 - java8
	16 - dotnetcore2.1
Runtime: 1

Project name [sam-app]: sam-test-node

Cloning from https://github.com/aws/aws-sam-cli-app-templates

AWS quick start application templates:
	1 - Hello World Example
	2 - Step Functions Sample App (Stock Trader)
	3 - Quick Start: From Scratch
	4 - Quick Start: Scheduled Events
	5 - Quick Start: S3
	6 - Quick Start: SNS
	7 - Quick Start: SQS
	8 - Quick Start: Web Backend
Template selection: 1

    -----------------------
    Generating application:
    -----------------------
    Name: sam-test-node
    Runtime: nodejs14.x
    Dependency Manager: npm
    Application Template: hello-world
    Output Directory: .
    
    Next steps can be found in the README file at ./sam-test-node/README.md

You can view the template code in GitHub to see what is created. Let’s walk through each file.

Function Code

In the hello-world folder, you will find app.js. This file contains all of the code required for the function. There is some commented-out code that requires axios for making a simple HTTP call, but the active code does not have any dependencies, so if you simply upload this code into a new Lambda function and test it via the AWS console, you will get a simple output message looking like this:

{ "message": "hello world" }

The full code for the function is below:

// const axios = require('axios')
// const url = 'http://checkip.amazonaws.com/';
let response;

/**
 *
 * Event doc: https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-lambda-proxy-integrations.html#api-gateway-simple-proxy-for-lambda-input-format
 * @param {Object} event - API Gateway Lambda Proxy Input Format
 *
 * Context doc: https://docs.aws.amazon.com/lambda/latest/dg/nodejs-prog-model-context.html 
 * @param {Object} context
 *
 * Return doc: https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-lambda-proxy-integrations.html
 * @returns {Object} object - API Gateway Lambda Proxy Output Format
 * 
 */
exports.lambdaHandler = async (event, context) => {
    try {
        // const ret = await axios(url);
        response = {
            'statusCode': 200,
            'body': JSON.stringify({
                message: 'hello world',
                // location: ret.data.trim()
            })
        }
    } catch (err) {
        console.log(err);
        return err;
    }

    return response
};

There isn’t much code here, but what is here is very important. First off, the lambdaHandler function is exposed as a static function, meaning you do not need to create an instance of a class to invoke it. This is important because this is how Lambda expects to invoke the handler, so when you specify the handler in the Lambda configuration, it must point to a static function.

Also notice that the handler function is marked async. If you do not specify an async function, a third parameter named callback will be passed to your handler and you will need to invoke this callback as shown in the AWS documentation.
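
For illustration, a minimal non-async version of the same handler might look like this (my own sketch of the documented callback pattern, not part of the generated code):

// Non-async handler: Lambda supplies the callback as the third argument
exports.lambdaHandler = (event, context, callback) => {
    // first argument is an error (or null), second is the success response
    callback(null, {
        statusCode: 200,
        body: JSON.stringify({ message: 'hello world' })
    });
};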

Note: The event parameter varies based on the type of invocation and documentation is not as thorough as you would think. If you write the parameter out with console.log(event), you can see the contents in the CloudWatch log for the Lambda.

Note that the error handler in this code catches the error, logs it, and returns it. If you want the invocation itself to be logged as an error, you should rethrow (or simply not catch) the error, since an async handler that returns normally counts as a successful invocation. Along the same lines, if your Lambda returns a valid response with an error statusCode value (ex: 500), it will still be logged as a successful invocation since the Lambda itself did not fail.
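
Here is a sketch of the distinction, as hypothetical variations on the generated catch block:

exports.lambdaHandler = async (event) => {
    try {
        // ... business logic ...
        return { statusCode: 200, body: JSON.stringify({ message: 'hello world' }) };
    } catch (err) {
        console.log(err);
        // Rethrowing marks the invocation itself as failed in Lambda's metrics and logs:
        throw err;
        // Returning an HTTP error instead would count as a successful invocation
        // that happens to report an error to the caller:
        // return { statusCode: 500, body: JSON.stringify({ message: err.message }) };
    }
};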

SAM Template

The next file generated by sam init is the template.yaml file which is also placed in the root folder. This template is similar to CloudFormation and in fact can contain most CloudFormation syntax. However, SAM provides simplified syntax and linkage for creating serverless applications. Let’s take a look at the file generated when I ran sam init.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  sam-test-node

  Sample SAM Template for sam-test-node
  
# More info about Globals: https://github.com/awslabs/serverless-application-model/blob/master/docs/globals.rst
Globals:
  Function:
    Timeout: 3

Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction
    Properties:
      CodeUri: hello-world/
      Handler: app.lambdaHandler
      Runtime: nodejs14.x
      Events:
        HelloWorld:
          Type: Api # More info about API Event Source: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#api
          Properties:
            Path: /hello
            Method: get

Outputs:
  # ServerlessRestApi is an implicit API created out of Events key under Serverless::Function
  # Find out more about other implicit resources you can reference within SAM
  # https://github.com/awslabs/serverless-application-model/blob/master/docs/internals/generated_resources.rst#api
  HelloWorldApi:
    Description: "API Gateway endpoint URL for Prod stage for Hello World function"
    Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/hello/"
  HelloWorldFunction:
    Description: "Hello World Lambda Function ARN"
    Value: !GetAtt HelloWorldFunction.Arn
  HelloWorldFunctionIamRole:
    Description: "Implicit IAM Role created for Hello World function"
    Value: !GetAtt HelloWorldFunctionRole.Arn

Note: We don’t specify a name for our Lambda function or API Gateway. When we deploy using SAM, we provide a stack name that is used for the CloudFormation stack but also carried to other resources for consistent naming. This allows us to identify resources created for testing purposes based off of the branch they were created from.

The first meaningful section of the template is the Globals configuration. This allows you to specify – you guessed it – global information that applies to all resources. In this example, the timeout is set to 3 seconds for all Lambda functions. This just happens to be the default, but you can set any default values here you want to apply for all functions. Since we only have one function in this template, we could have just as easily placed this Timeout key in the Properties section of the Lambda function configuration, but it is placed in Globals as an example.

The second important section is the Resources section. Even though there is only one resource specified, SAM will actually create 2 resources: the Lambda function and the API Gateway. The deployment process will also create a third resource: a Lambda Application which will provide information on all of the resources, deployments, and Lambda invocations all in one place.

The first key under Resources is HelloWorldFunction. This is a logical ID that can be used to reference the function in other parts of the template. The Type key specifies that this is a Lambda function, and the Properties key contains all of the configuration for the function (see the AWS documentation for more options for configuring a function). The CodeUri key is optional and defines the base path for your code and, as I mentioned before, the Handler key points to the static function in your Lambda code. If you define multiple Lambda functions in one template and all of your code is in a folder such as src or bin, you can define the CodeUri in the Globals section and have it apply to all of your functions. Otherwise, you can simply include the path in the Handler key like hello-world/app.lambdaHandler and remove the CodeUri key. The Runtime key allows you to specify a specific version of your runtime. I’m using Node.js version 14 in this series of posts, but you can find the list of supported runtimes in the AWS documentation.
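
As a sketch of that Globals approach (assuming a hypothetical layout where all function code lives under a src/ folder):

Globals:
  Function:
    Timeout: 3
    Runtime: nodejs14.x
    CodeUri: src/    # base path applied to every function in the template

Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      # resolved inside the global CodeUri, i.e. src/hello-world/app.js
      Handler: hello-world/app.lambdaHandler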

The Events section under the Lambda function is where the most significant SAM magic happens. Once again we provide a logical key (HelloWorld), give it a Type value (Api), and then configure the resource with Properties. In this example we set the Path of the API to /hello and the Method to get. Under the hood, there is a lot more going on. SAM does all of the following based on just these 2 entries:

  • Creates an API Gateway
  • Creates a Route with path /hello and HTTP method GET
  • Creates an integration between the /hello route and the Lambda function
  • Creates a deployment and a Prod stage for the API Gateway (note the /Prod/ path in the Outputs section)

Summary

In this post, we created code using sam init. The two most important files created by this command are the Lambda function code and the SAM template (template.yaml). The code generated by SAM is obviously just a placeholder and will require significant editing. The SAM template is very important to how your function is deployed, but we are sticking with the basic Lambda “Hello World” example with a REST API.

Pipelines for AWS Lambda – Part 1: The Deployment Stack

TL/DR;

Serverless applications are complex and AWS doesn’t do much to make setting up pipelines for them easy. I’m going to walk through how to use CloudFormation templates to configure multiple pipelines.

Background

As I posted before, Tutorials are (Often) Bad and AWS tutorials are no exception. AWS has a tendency to use the CLI or console for tutorials. While this is a fine way to learn, you would never want to use these techniques outside of a sandbox environment. For production applications, you want to use a deployment pipeline. To create a pipeline to deploy to AWS, you need to configure a User with the permissions that the pipeline will need. However, you also should control the creation of Identity and Access Management (IAM) resources such as Users. This creates a “chicken and egg” situation: How do you allow your organization to manage creation of IAM resources which are required to create a deployment pipeline when you don’t have a deployment pipeline to help manage this activity? The short answer is CloudFormation.

While I love the concept of serverless applications, the development experience has always been a challenge. With the introduction of the AWS Serverless Application Model (SAM), things definitely got better, but it is still difficult to find good documentation, and SAM itself does not always follow what I consider AWS best practices. In this series of posts, I’ll walk through the creation of a simple REST API written in Node.js and hosted in AWS Lambda behind an API Gateway. I will highlight all of the various “gotchas” I stumbled on along the way. To keep things simpler for this example, I’m not going to be using containers to deploy my Lambda function. In this first post, I want to focus on using CloudFormation to set up the AWS resources required for your pipeline.

Creating the Deployment Stack

So right off the bat, when trying to follow the tutorial on setting up a SAM pipeline, I noticed that the very first step, sam pipeline bootstrap, created resources in AWS. Thankfully, this command does use CloudFormation. However, there is no way to specify a stack name, apply tags, etc., so I don’t understand why SAM doesn’t give you the option of just creating the CloudFormation template and then executing it on your own. At least you can grab the template from the stack, which is what I have done in this gist.

The template creates the following resources:

  • A bucket to store pipeline artifacts (your Lambda code)
  • A policy to allow the pipeline to write to the bucket
  • A bucket to log activity on the artifacts bucket
  • A policy to allow the bucket activity to be logged
  • A role to be assumed when CloudFormation templates are executed
  • A role to be assumed by your pipeline
  • A policy to allow the pipeline to create and execute CloudFormation change sets, and to read and write artifacts to the bucket
  • A user granted the pipeline execution role
  • An access key for the pipeline user
  • A Secrets Manager entry to store the pipeline user credentials

That is a lot of resources and we aren’t even doing anything with Lambda yet. These are simply the resources required to run the pipeline.

I modified the template to remove any container-related resources and added names to most of the resources. You can find this version in this gist.

You can run this template in the AWS console by going to CloudFormation->Stacks, selecting Create stack->With new resources (standard), selecting “Upload a template file”, and choosing the file saved from the gist. You must provide a stack name, but you can leave the default parameter values or enter your own unique identifier to be used for the resource names.

If you save the template from the gist as deployment-stack.yml, you can create the stack using the AWS CLI as follows:

$ aws cloudformation create-stack \
--stack-name aws-sam-demo \
--capabilities CAPABILITY_AUTO_EXPAND \
--template-body file:///$PWD/deployment-stack.yml

Note you will need to also specify --region if you have not already defined a default region in your local AWS settings.

Adding Secrets

Managing sensitive data can seem more complicated than necessary sometimes. Since we are building a pipeline with GitHub Actions which supports its own Secrets management, it may seem intuitive to use this to store all of your sensitive information. However, you should only use GitHub secrets (or any pipeline-based secure storage) to store information about connecting to AWS and not for information used by AWS. This is because we will be using CloudFormation to deploy to AWS and if you pass sensitive information via either a parameter or environment variable, it will be visible as plain text in the CloudFormation stack configuration.

For secrets that will be controlled by AWS, you can add the secret to the CloudFormation template and just allow AWS to set the value (and potentially rotate the secret). Below is a CloudFormation template that can be used to create a Secrets Manager entry for a password generated by AWS.

AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  SecretId:
      Type: String
      Default: DbSecret
Resources:
  PostgresSecret:
    Type: 'AWS::SecretsManager::Secret'
    Properties:
      Name: !Sub ${SecretId}
      GenerateSecretString:
        GenerateStringKey: "DB_PASSWORD"
        SecretStringTemplate: '{ "DB_USER": "admin" }'
        PasswordLength: 30

This will create a Secret named “DbSecret” in the format shown below:

{
  "DB_USER": "admin",
  "DB_PASSWORD": "[generated password goes here]"
}

For secrets that are defined outside of AWS (ex: third-party API keys), you just need to create the Secret and then enter the sensitive values either via the console or the CLI. While this manual process may seem problematic, it can be secure as long as you manage who can update the secrets.
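
Either way, your Lambda can read the secret at runtime instead of receiving it through a parameter. Below is a minimal sketch using the AWS SDK bundled with the Node.js runtime; it assumes the function’s role has secretsmanager:GetSecretValue permission and uses the DbSecret name from the template above:

// Read DB credentials from Secrets Manager at runtime
const AWS = require('aws-sdk');
const secretsManager = new AWS.SecretsManager();

exports.lambdaHandler = async (event) => {
    const data = await secretsManager.getSecretValue({ SecretId: 'DbSecret' }).promise();
    const { DB_USER, DB_PASSWORD } = JSON.parse(data.SecretString);
    // ...connect to the database with DB_USER/DB_PASSWORD...
    return { statusCode: 200, body: JSON.stringify({ ok: true }) };
};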

Enabling API Gateway Logging

As described in the AWS Documentation, the API Gateway service does not have access to write to CloudWatch logs by default. Thankfully I found this gist:

AWSTemplateFormatVersion: '2010-09-09'
Resources:
  ApiGwAccountConfig:
    Type: "AWS::ApiGateway::Account"
    Properties:
      CloudWatchRoleArn: !GetAtt "ApiGatewayLoggingRole.Arn"
  ApiGatewayLoggingRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - "apigateway.amazonaws.com"
            Action: "sts:AssumeRole"
      Path: "/"
      ManagedPolicyArns:
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"

This template only needs to be executed once per AWS account, so you can run it on its own to enable logging for your API Gateways. Note that you can still control whether logging is enabled for each gateway; this just makes sure the service can write to logs.
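
Once the account-level role exists, logging remains opt-in per gateway. As a sketch, an HttpApi can be pointed at a log group like this (the log group resource and format string are my own example, not part of the gist):

  ApiAccessLogs:
    Type: AWS::Logs::LogGroup
    Properties:
      RetentionInDays: 7
  LambdaNodeApi:
    Type: AWS::Serverless::HttpApi
    Properties:
      AccessLogSettings:
        DestinationArn: !GetAtt ApiAccessLogs.Arn
        Format: '$context.requestId $context.httpMethod $context.path $context.status'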

Summary

Before you can deploy a Lambda Function using AWS SAM, you need to create resources (primarily IAM resources). You can create these resources with sam pipeline bootstrap, but you won’t have much control over the details of the resources. Therefore, I recommend using a CloudFormation template that matches the template generated by SAM. This same template can be used over and over for multiple stacks.

CodeTender: A Brief History

I was lucky to be involved in building a microservices platform with Vasil Kovatchev. Vasil is a great software architect and overall great guy. One concept he introduced was the “bartender” script. The idea was to build a working microservice with a placeholder name that won’t conflict with any code. The first such template was called “Pangalactic Gargleblaster”. The bartender script replaces placeholders in the code (“Pangalactic” and “Gargleblaster”) and does some other stuff to make the service work with the platform. The bartender script is a bash script, and the “other stuff” is…well…more bash script. We were quickly serving up not just Pangalactic Gargleblasters, but also Flaming Volcanos and Tequila Sunrises and Whisky Sours. I fell in love with this concept. I don’t, however, love the bash script, since it is very tightly coupled to our ecosystem. Since Node.js is a de facto standard for dev utilities (no offense, Python), I set out to build a CLI with these basic requirements:

  • Clone a git repo or copy a folder to a specified folder
  • Display prompts to collect values to replace placeholders with
  • Run scripts before or after replacing placeholders

Since “bartender” is a bit too common of a name, I went with “CodeTender”. That was March of 2018 (according to my commit history). Fast forward a few years and version 0.0.4-alpha was still full of bugs and my somewhat ambitious issue list was collecting dust along with the code. A somewhat unrelated thread with Vasil and another colleague reminded me of CodeTender so I set out to get it working. A few minutes (hours maybe) later, and the basics were working. So of course, time to use it.

I have been playing with Home Assistant lately and wanted to start experimenting with custom dashboard cards. I found a boilerplate project and figured: why just clone the repo when I could get CodeTender working even better and make a Home Assistant custom card template? So after about another week, I have burned down a lot of my issue list and added some cool features. It is still very alpha, but it’s close to ready for prime time.

I will formally launch CodeTender in a future post, but for anyone interested, you can check it out on my GitHub.

Don’t Dispose Your Own EF Connections

I’m working on upgrading a framework to dotnet core so I am moving from .Net 2.x conventions to netstandard 2.2. Our code was using DbContext.Database.Connection to get DB connections for custom SQL. I needed to switch to DbContext.Database.GetDbConnection(). I made the wrong assumption that GetDbConnection() was a factory method and returned a new connection every time. Therefore I made sure I was disposing of each connection. Tests immediately started failing with “System.InvalidOperationException: ‘The ConnectionString property has not been initialized.'” After investing way too much time due to the complexity of the framework and my own stubbornness, I narrowed the issue down to the following scenario:

    using (var conn = context.Database.GetDbConnection())
    {
      conn.Open();
      using (var cmd = conn.CreateCommand())
      {
        cmd.CommandText = "SELECT * FROM sys.databases";
        cmd.ExecuteNonQuery();
      }
    }

    using (var conn = context.Database.GetDbConnection())
    {
      conn.Open();
      using (var cmd = conn.CreateCommand())
      {
        cmd.CommandText = "SELECT * FROM sys.databases";
        cmd.ExecuteNonQuery();
      }
    }

The real issue is the second call to GetDbConnection(). This does not, in fact, return a new instance; it appears to return the previous connection, whose ConnectionString property has been set to an empty string by the earlier Dispose(), causing the exception about ConnectionString not being initialized. You can test this yourself with the following:

    var conn2 = context.Database.GetDbConnection();
    Console.WriteLine(conn2.ConnectionString);
    conn2.Dispose();
    Console.WriteLine(conn2.ConnectionString);

The fix is to simply not dispose of your connections or commands. As indicated in this issue comment, disposing of the context will dispose of any connections created using GetDbConnection(). Therefore the correct implementation of this use case is as follows:

  using (var context = new MyContext())
  {
    var conn = context.Database.GetDbConnection();
    conn.Open();
    var cmd = conn.CreateCommand();
    cmd.CommandText = "SELECT * FROM sys.databases";
    cmd.ExecuteNonQuery();
    conn.Close();

    var conn2 = context.Database.GetDbConnection();
    conn2.Open();
    cmd = conn2.CreateCommand();
    cmd.CommandText = "SELECT * FROM sys.databases";
    cmd.ExecuteNonQuery();
  }

Look for Horses, Not Zebras

I have two quotes from old friends that have stuck in my head in certain situations.  The first is, “dancing is like standing still, only faster.”  As a runner, I have adapted that to, “running is like falling down, only slower.”  The second is, “when you hear hoofbeats, think of horses, not zebras.”  As a programmer, I have found there is not much better advice than this second quote.

You would think that the more experience you have, the more likely you are to look for the obvious answer.  I’ve found that the opposite is true – while you are going to make “stupid” mistakes less frequently, the majority of your mistakes are still going to be “stupid”. Unfortunately with great power comes great ego, and the most difficult bugs are caused by the occurrence of something that was “never going to happen”.  I happened to stumble through 3 of these self-induced hair-pulling-out exercises in a matter of days.  To make matters worse, I don’t really code at work anymore.  This was a hobby project that I can only work on for an hour or two each night.

First was the infinite loop (and subsequent stack overflow) – one of the few errors that can’t be captured with standard error logging techniques.  This problem only happened on the production server, so instead of doing the obvious and duplicating the data down to dev so I could debug, I decided to add debug logging until I found the problem.  This was the first symptom of my bravado.  My application is a state machine “engine” that runs in a loop. As things happen and the state changes, the loop eventually completes.  I narrowed down the problem to the state where the error occurred but had trouble getting closer than that.  I was convinced that some dark magic or data anomaly was at play, so I spent a few nights adding debugging, publishing, running, repeating.

Finally, I stopped looking for zebras.  Instead of continuing to try to zero in on the line of code through trial and error addition of debug logging, I put 3 lines of code in the most obvious places the problem could occur – the 3 while loops that could get called inside my main loop. That immediately pointed me to the loop in question so then I threw an exception when a condition causing an infinite loop occurred.  It probably took me all of 5 minutes to determine the state that caused the problem.

So next I had to figure out how the code got into that state.  All of the information I needed to solve the rest of the problem, I saw within minutes – and completely ignored.  I saw that the state in question could only come from the database because the key data was never set anywhere by code.  Therefore, the code that allowed the infinite loop condition must have been in the code that loaded the data from the database.  However, rather than looking right where all indications pointed, I looked for zebras.  Since I knew how to find the problem, I could now duplicate it on my dev box and debug.  That meant I could spend time inspecting variables at breakpoints to see why the problem was happening.  After all, this infinite loop was caused by a recursive linked-list relationship – something that was “never going to happen”.  There couldn’t have been an obvious reason for something like this to happen, so I had to dig for a needle in the haystack instead of following the thread attached to the needle.  In spite of myself, I did eventually figure out that the problem could absolutely be solved when the data was loaded into the engine.  Technically this is after the data is extracted from the database, but the recursive situation doesn’t matter until it gets to the engine, so that’s where I put the fix to truncate the recursive linked list since the duplicate reference shouldn’t have been there anyway.

Fast forward a few days.  With my problem solved, I was able to run my engine for a few days and then started to look at the output.  Then I stumbled on a new error – “An error occurred while reading from the store provider’s data reader. See the inner exception for details” – and the inner exception was “Arithmetic overflow error for data type tinyint, value = 2288”.  So, the expert programmer has managed to break the Entity Framework.  Of course I checked the “obvious” problem – that my model had the wrong data type.  Then I went off the deep end.  Unfortunately, there are people who have had legitimate problems with EF or linq-to-SQL mapping parameters incorrectly (2288 was a parameter, not a database value).  I converted my linq expression into a SQL command and then realized I was once again looking for zebras.  Instead of looking at the columns in my database that would map to tinyint, I assumed the problem was in the Entity Framework – code written by Microsoft and used by thousands if not millions of people – and not in MY code.  When I stepped through my code and looked at the parameters, I noticed there was a parameter that shouldn’t have even been there.  That was my problem – I passed the parameters into my method in the wrong order!  Instead of passing “int, null, int, true” I was passing “int, int, null, true”.  The 2nd and 3rd parameters were both nullable ints, so as far as the compiler was concerned this was all good.  Unfortunately, it meant I passed my very large primary key into a parameter that was expecting a much smaller value – something that should have been very obvious from the error.

This post has been sitting in draft for so long that I have now had several more “events”.  The most recent was while doing a demo on AngularJS directives with my team at work.  I was trying to illustrate scope isolation and no matter what I tried I couldn’t figure out why the attribute from the directive specified in my HTML wasn’t making its way to my link function in the directive code.  When I turned off scope isolation and pointed at the parent scope, everything worked fine.  Even my more Angular-savvy team members were stumped.  Then came the “duh” moment – I actually had 2 directives nested inside each other.  I was looking at the code in the inner directive and passing data into the outer directive.  The HTML of the inner directive wasn’t passing the value from the parent.  Yet another zebra chase.

So in conclusion, drop the attitude!  When you have problems, they are most likely the most basic and amateur issues.  Rein in those horses and let the zebras smack you in the head; don’t go looking for them.

UPDATE: 29 January 2019

I have done it again! I was working on a new framework feature and couldn’t get my test to work. I just KNEW it was a problem with the third-party software I was using. More specifically, I wasn’t sure what I was trying to do was even supported. Then somehow I managed to get everything to work directly with said software. After days of struggling, I discovered I was writing the final portion of my test immediately after testing deleting the exact thing I am trying to test. As usual, my code was doing EXACTLY what I told it to do.

On a side note, since the original post, I was introduced to the concept of rubber duck code reviews. I highly recommend this and your coworkers will thank me.