CloudFormation Custom Resources With AWS Lambda

I've recently gotten my hands dirty with AWS CloudFormation custom resources. A CloudFormation stack I was working on had a very basic implementation, and when it was changed in production it didn't behave the way I expected. In the middle of a production rollout, I had to wait an hour for CloudFormation to time out before I could fix some code on the fly and retry, which was very frustrating. The main use case for this stack was a B2B white-label solution where DNS could be driven from configuration files. This blog post walks you through what I would do differently this time around.
What is a CustomResource?
AWS CloudFormation is Amazon's declarative resource provisioning and management tool. You define your resources in JSON or YAML templates, then call the CloudFormation APIs to enforce that declared state. It has its pros and cons, and one con is that not every resource is supported by CloudFormation. For these cases you can use custom resources: you point the resource at a Lambda function, CloudFormation invokes it on each lifecycle event, and the function responds back to CloudFormation with the outcome of the action you invoked. This gives users a lot of flexibility but, as described below, it has certain pitfalls you should avoid.
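To make that contract concrete, here is a minimal sketch of the request CloudFormation sends to the backing Lambda. The field names follow the AWS custom resource request format; the sample values (ARNs, URL, request ID) are placeholders, and `missing_fields` is a hypothetical helper, not part of any AWS library. The function answers by sending a JSON document with an HTTP PUT to the pre-signed `ResponseURL`.

```python
from typing import Any, Dict, List

# Fields CloudFormation includes in every custom resource request.
ALWAYS_PRESENT = [
    "RequestType", "ResponseURL", "StackId", "RequestId",
    "ResourceType", "LogicalResourceId", "ResourceProperties",
]


def missing_fields(event: Dict[str, Any]) -> List[str]:
    """Return the expected request fields that are absent from the event."""
    expected = list(ALWAYS_PRESENT)
    request_type = event.get("RequestType")
    # Update and Delete requests refer to a resource that already exists.
    if request_type in ("Update", "Delete"):
        expected.append("PhysicalResourceId")
    # Update requests also carry the properties from before the change.
    if request_type == "Update":
        expected.append("OldResourceProperties")
    return [name for name in expected if name not in event]


# Illustrative Create request; every value here is a placeholder.
sample_create_event = {
    "RequestType": "Create",
    "ResponseURL": "https://example.com/presigned-url",
    "StackId": "arn:aws:cloudformation:eu-west-1:123456789012:stack/example/abc",
    "RequestId": "example-request-id",
    "ResourceType": "Custom::DNSConfigCustom",
    "LogicalResourceId": "DNSConfigCustom",
    "ResourceProperties": {
        "ServiceToken": "arn:aws:lambda:eu-west-1:123456789012:function:example",
        "Action": "getDNSConfig",
    },
}

print(missing_fields(sample_create_event))  # []
```

The conditional fields are the ones that matter for routing: a Delete carries the `PhysicalResourceId` you returned earlier, which is how the function knows which resource it is being asked to clean up.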
The Problem
DNSConfigCustom:
  DeletionPolicy: Retain
  UpdateReplacePolicy: Retain
  Type: Custom::DNSConfigCustom
  Properties:
    ServiceToken: !Ref LambdaArn
    Action: getDNSConfig
    AppName: !Ref AppName
    Param1: !Ref Param1
    Param2: !Ref Param2
CloudFrontDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Comment: !Sub '${AWS::StackName} ${Environment}'
      Enabled: true
      Aliases: !GetAtt DNSConfigCustom.CertificateArn
      ViewerCertificate:
        SslSupportMethod: sni-only
        MinimumProtocolVersion: TLSv1.2_2021
        AcmCertificateArn: !GetAtt DNSConfigCustom.CertificateArn
      Logging:
        Bucket: !Sub '${LogsS3Bucket}.s3.amazonaws.com'
        Prefix: cloudfront-web
.....

The above snippet seems simple enough, right? I'm creating a CloudFront distribution and using a Lambda-backed custom resource to get the DNS configuration and an ACM certificate ARN. Notice anything missing?
    ServiceTimeout: 300

This one missing line in the custom resource's Properties has cost me hours of waiting. If the Lambda never responds, CloudFormation waits for the default timeout of 1 hour (3600 seconds) before giving up on the resource.
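Because a missing ServiceTimeout is so easy to overlook, a check along these lines could run in CI. It's a sketch that assumes the template has already been parsed into a Python dict (for example from the JSON form of the template), and the function name is my own. It flags any resource whose type is `AWS::CloudFormation::CustomResource` or starts with `Custom::` and has no `ServiceTimeout` property.

```python
from typing import Any, Dict, List


def custom_resources_without_timeout(template: Dict[str, Any]) -> List[str]:
    """List the logical IDs of custom resources that don't set ServiceTimeout."""
    offenders = []
    for logical_id, resource in template.get("Resources", {}).items():
        resource_type = resource.get("Type", "")
        is_custom = (
            resource_type == "AWS::CloudFormation::CustomResource"
            or resource_type.startswith("Custom::")
        )
        if is_custom and "ServiceTimeout" not in resource.get("Properties", {}):
            offenders.append(logical_id)
    return offenders


# The template from above, reduced to the parts this check looks at.
template = {
    "Resources": {
        "DNSConfigCustom": {
            "Type": "Custom::DNSConfigCustom",
            "Properties": {"ServiceToken": {"Ref": "LambdaArn"}, "Action": "getDNSConfig"},
        },
        "CloudFrontDistribution": {"Type": "AWS::CloudFront::Distribution"},
    }
}

print(custom_resources_without_timeout(template))  # ['DNSConfigCustom']
```

One caveat if you start from YAML: PyYAML's `yaml.safe_load` rejects short-form tags such as `!Ref` and `!GetAtt`, so the template needs a loader that understands those tags before a check like this can run.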
Unfortunately, the Lambda backing this custom resource wasn't coded defensively and has multiple points where it can fail without responding to CloudFormation. There was an in-house library of sorts for handling custom resources in many different places, but it wasn't built consistently. In many places exceptions are thrown and never caught, leaving CloudFormation stuck waiting for a response that's never going to arrive.
The Solution(s)
I analyzed the setup and refactored a few things:
- Added ServiceTimeout to every custom resource. A Lambda function can run for at most 15 minutes anyway, so I'm not sure why CloudFormation landed on 60 minutes as the default. For my calls, I set the timeout to 5 minutes, since none of them looked heavy.
- Restructured the existing Lambda to handle exceptions and always respond to CloudFormation (I'll go through some examples below).
- Restructured the existing Lambda to handle all CloudFormation lifecycle events (not just Create/Update; it was missing Delete calls).
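One way to make "always respond" consistent across an in-house library, rather than relying on each handler to remember it, is a decorator. This is a minimal sketch: `always_respond` and its `send(event, context, status, data, reason)` callback are hypothetical, and in a real Lambda `send` would be a thin adapter around whatever function PUTs the response. It leaves physical resource ID handling out for brevity.

```python
import functools
import logging
from typing import Any, Callable, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)

# Hypothetical callback shape: (event, context, status, data, reason) -> None
Sender = Callable[[Dict[str, Any], Any, str, Dict[str, Any], Optional[str]], None]
Handler = Callable[[Dict[str, Any], Any], Optional[Dict[str, Any]]]


def always_respond(send: Sender) -> Callable[[Handler], Callable[[Dict[str, Any], Any], None]]:
    """Wrap a custom resource handler so CloudFormation always gets an answer."""
    def decorator(handler: Handler) -> Callable[[Dict[str, Any], Any], None]:
        @functools.wraps(handler)
        def wrapper(event: Dict[str, Any], context: Any) -> None:
            try:
                data = handler(event, context)
            except Exception as exc:
                # Any failure, expected or not, becomes a FAILED response.
                logger.exception("Custom resource handler failed")
                send(event, context, "FAILED", {}, str(exc))
                return
            send(event, context, "SUCCESS", data or {}, None)
        return wrapper
    return decorator


# Usage: a handler with a typical bug (a missing property) still answers.
sent: List[Tuple[str, Any]] = []


@always_respond(lambda event, context, status, data, reason: sent.append((status, reason)))
def lookup_dns_config(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    return {"Domain": event["ResourceProperties"]["OrganizationName"]}


lookup_dns_config({"RequestType": "Create", "ResourceProperties": {}}, None)
print(sent)  # [('FAILED', "'OrganizationName'")]
```

The success response is sent outside the `try` on purpose, so a problem while sending SUCCESS isn't misreported as a handler failure.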
The Refactored Lambda Architecture
I broke the Lambda's logic out into a few functions and wrapped the overall call stack in a try/except. If an exception is thrown, the except block still reports back to CloudFormation, so we don't have to fall back on the ServiceTimeout.
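A try/except covers exceptions, but not the case where the Lambda itself hits its configured timeout: the runtime stops the invocation and no except block runs. A common mitigation is a timer that fires shortly before the deadline and sends a FAILED response. This is a sketch under my own assumptions: `start_timeout_watchdog` and the 5-second margin are hypothetical, while `get_remaining_time_in_millis()` is a real method on the Lambda context object.

```python
import threading
from typing import Any, Callable

# Hypothetical margin: how long before the Lambda deadline the watchdog fires.
SAFETY_MARGIN_SECONDS = 5.0


def start_timeout_watchdog(context: Any, on_timeout: Callable[[], None]) -> threading.Timer:
    """Run on_timeout shortly before the Lambda invocation runs out of time."""
    # get_remaining_time_in_millis() is provided by the Lambda context object.
    remaining = context.get_remaining_time_in_millis() / 1000.0
    timer = threading.Timer(max(remaining - SAFETY_MARGIN_SECONDS, 0.0), on_timeout)
    timer.daemon = True  # don't keep the process alive just for the watchdog
    timer.start()
    return timer


# Local usage with a stand-in context that reports 5.2 seconds remaining.
class FakeContext:
    def get_remaining_time_in_millis(self) -> int:
        return 5200


fired = threading.Event()
watchdog = start_timeout_watchdog(FakeContext(), fired.set)
print(fired.wait(2))  # True: the timer fires about 0.2 seconds in
```

In a handler you would start the timer first thing, with `on_timeout` sending a FAILED response, and call `timer.cancel()` as soon as the real response has gone out. Cancelling matters because Lambda can reuse the execution environment, and a leftover timer could otherwise fire during a later invocation.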
lambda_handler
As you can see below, only three lines in the handler aren't wrapped in the try/except, and they are very unlikely to throw an exception.
The except clause always calls the send_response function, which ensures CloudFormation isn't left hanging.
import json
import logging
from typing import Any, Dict, Optional

import urllib3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Shared HTTP client that send_response uses to PUT results to CloudFormation
http = urllib3.PoolManager()


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """
    Main Lambda handler for CloudFormation custom resource.

    Args:
        event: CloudFormation event object
        context: Lambda context object

    Returns:
        Response dictionary
    """
    logger.info(f"Received event: {json.dumps(event)}")
    request_type = event.get('RequestType')
    physical_resource_id = event.get('PhysicalResourceId')

    try:
        # Route to appropriate handler based on request type
        # (handle_create, handle_update and handle_delete are defined elsewhere, not shown)
        if request_type == 'Create':
            response_data = handle_create(event, context)
            # Generate physical resource ID for new resource
            # (the Lambda context exposes the request ID as aws_request_id)
            organization_name = event.get('ResourceProperties', {}).get('OrganizationName', 'unknown')
            physical_resource_id = f"dns-acm-info-{organization_name}-{context.aws_request_id[:8]}"
        elif request_type == 'Update':
            response_data = handle_update(event, context)
            # Keep existing physical resource ID
            if not physical_resource_id:
                physical_resource_id = context.log_stream_name
        elif request_type == 'Delete':
            response_data = handle_delete(event, context)
            # Keep existing physical resource ID
            if not physical_resource_id:
                physical_resource_id = context.log_stream_name
        else:
            raise ValueError(f"Unsupported request type: {request_type}")

        # Send success response
        send_response(
            event=event,
            context=context,
            response_status='SUCCESS',
            response_data=response_data,
            physical_resource_id=physical_resource_id
        )
        return {
            'statusCode': 200,
            'body': json.dumps(response_data)
        }
    except Exception as e:
        logger.error(f"Error processing request: {e}", exc_info=True)
        # Always send response to CloudFormation to prevent stack from hanging
        error_response_data = {
            'Error': str(e),
            'Message': 'Failed to process custom resource request'
        }
        send_response(
            event=event,
            context=context,
            response_status='FAILED',
            response_data=error_response_data,
            physical_resource_id=physical_resource_id or context.log_stream_name,
            reason=str(e)
        )
        # Return error response
        return {
            'statusCode': 500,
            'body': json.dumps(error_response_data)
        }
send_response
Routing every CloudFormation response through this one function keeps the responses consistent across the whole Lambda invocation.
def send_response(event: Dict[str, Any], context: Any, response_status: str,
                  response_data: Dict[str, Any], physical_resource_id: Optional[str] = None,
                  reason: Optional[str] = None) -> None:
    """
    Send response to CloudFormation pre-signed URL.

    Args:
        event: CloudFormation event object
        context: Lambda context object
        response_status: SUCCESS or FAILED
        response_data: Data to return to CloudFormation
        physical_resource_id: Unique identifier for the custom resource
        reason: Reason for failure (if applicable)
    """
    response_url = event.get('ResponseURL')
    if not response_url:
        logger.error("No ResponseURL found in event")
        return

    # Use provided physical_resource_id or generate from context
    if physical_resource_id is None:
        physical_resource_id = context.log_stream_name

    response_body = {
        'Status': response_status,
        'Reason': reason or f'See CloudWatch Log Stream: {context.log_stream_name}',
        'PhysicalResourceId': physical_resource_id,
        'StackId': event.get('StackId'),
        'RequestId': event.get('RequestId'),
        'LogicalResourceId': event.get('LogicalResourceId'),
        'Data': response_data
    }

    json_response_body = json.dumps(response_body)
    logger.info(f"Response body: {json_response_body}")

    headers = {
        'Content-Type': '',
        'Content-Length': str(len(json_response_body))
    }

    try:
        response = http.request(
            'PUT',
            response_url,
            body=json_response_body.encode('utf-8'),
            headers=headers
        )
        logger.info(f"CloudFormation response status: {response.status}")
    except Exception as e:
        logger.error(f"Failed to send response to CloudFormation: {e}")
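One more failure mode worth guarding against: as I understand the CloudFormation documentation, a custom resource response body can't exceed 4096 bytes, and the handler above passes `str(e)` as the `Reason`, so a long exception message could get the response itself rejected. A minimal sketch, assuming you only want to shorten `Reason` and leave `Data` alone; `fit_response_body` is a hypothetical helper that could stand in for the plain `json.dumps(response_body)` call in `send_response`.

```python
import json
from typing import Any, Dict

# Documented size limit for a custom resource response body.
MAX_RESPONSE_BYTES = 4096


def fit_response_body(body: Dict[str, Any]) -> str:
    """Serialize a response body, shortening Reason until it fits the limit.

    If Data alone is too large, the result can still exceed the limit:
    trimming Reason can't fix that, so callers should check the length too.
    """
    body = dict(body)  # don't mutate the caller's dict
    encoded = json.dumps(body)
    reason = str(body.get("Reason", ""))
    while len(encoded.encode("utf-8")) > MAX_RESPONSE_BYTES and reason:
        overflow = len(encoded.encode("utf-8")) - MAX_RESPONSE_BYTES
        # Cut at least the overflow, plus room for the "..." marker.
        reason = reason[: max(len(reason) - overflow - 3, 0)]
        body["Reason"] = reason + "..."
        encoded = json.dumps(body)
    return encoded


long_failure = {
    "Status": "FAILED",
    "Reason": "x" * 10_000,
    "PhysicalResourceId": "dns-acm-info-example",
    "StackId": "example-stack-id",
    "RequestId": "example-request-id",
    "LogicalResourceId": "DNSConfigCustom",
    "Data": {},
}
print(len(fit_response_body(long_failure)))  # 4096
```

The loop recomputes the size after every cut because JSON escaping (for example of non-ASCII characters) means removing N characters from `Reason` doesn't always remove exactly N bytes from the body.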