Categories
AWS Compute

Lambda endpoint configuration

AWS Lambda is a serverless managed service: Lambda code runs without you managing any servers or provisioning any hardware. This post is not about Lambda features but about Lambda endpoint configuration, and I intend to keep it that way.

Endpoints are used to connect to AWS services programmatically. The connection uses AWS PrivateLink. When an organization does not want to expose its VPC to the public internet in order to communicate with AWS Lambda, endpoints come to the rescue. An endpoint is a regional service.

Endpoints come in two types –

  • Interface
  • Gateway

I am using an interface endpoint in my design and have created two Lambda functions for testing. When the endpoint is created and configured, we need to associate subnets with it. The endpoint creates a network interface in the environment and allocates an IP in every subnet the endpoint is associated with. This interface/IP is used for invoking Lambda: both Lambda functions will use the same interface IP for communication.

Architecture Diagram image-1

Note : This is just one use case for endpoints. The main use case is sharing a Lambda function, or any other service, as software as a service. Clients can share such a service across AWS accounts or even across organizations.

The above design could also be implemented by defining a VPC interface while creating each Lambda function. The drawback: if you have 10 Lambda functions to share in the VPC, you need 10 IPs from the subnet, which is overkill. With endpoints you need just 1 IP to invoke all Lambda functions for a given AWS account.
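
For illustration, here is a minimal boto3 sketch of what the CloudFormation template sets up: an interface endpoint for the Lambda service. The VPC, subnet, and security group IDs are placeholders, not values from my template.

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Create an interface endpoint for the Lambda service (placeholder IDs).
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",              # your VPC
    ServiceName="com.amazonaws.us-west-2.lambda",
    SubnetIds=["subnet-0123456789abcdef0"],     # one ENI/IP per associated subnet
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,  # resolve lambda.us-west-2.amazonaws.com to the private IP
)
print(response["VpcEndpoint"]["VpcEndpointId"])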

Source Code

Download the “lambda-endpoint-connection.yaml” file from the link below –

https://github.com/yogeshagrawal11/cloud/tree/master/aws/lambda/endpoint

Prerequisite

Create a user with access to run the CloudFormation template.

Install the AWS SDK and configure your AWS environment. I am using the “us-west-2” region. If you want to use a different region, use the appropriate AMI from the parameters (apologies for the bloat, I have added AMIs for all regions as parameters).

Open the “lambda-endpoint-connection.yaml” file in any text editor and change the following to your system IP. The default value allows SSH from all addresses.

Change the default value to your IP in /32 format – Image 2

Create and download an EC2 instance key pair and update the key pair name in this field. Download the key pair file to the same location as the CloudFormation template.

Update the correct EC2 key pair name – Image 3

Add the correct VPC CIDR information; otherwise, the default will be used.

Implementation

The CloudFormation template will create the following resources (not every resource is mentioned in this list) –

  • VpcName : VPC for this test
  • Subnet1 : A subnet that is totally private except for SSH
  • AppInternetGateway : Internet gateway used only to connect my system to the EC2 instance
  • AppSecurityGroup : Allows port 22 from my system to EC2 and allows all communication within the VPC
  • EC2AccessLambdaRole : This role allows the EC2 instance to invoke the Lambda functions
  • LambdaRole : This role allows the Lambda functions to create log groups in CloudWatch, so print output can be checked there
  • RootInstanceProfile : Instance profile for the instance; uses EC2AccessLambdaRole for assuming permissions
  • EC2Instance : Instance used to invoke the Lambda functions
  • LambdaFunction : First Lambda function
  • SecoundLambdaFunction : Second Lambda function
  • LambdaVPCEndpoint : Lambda VPC endpoint

Run the following command to validate that the template is working fine –

aws cloudformation validate-template --template-body file://lambda-endpoint-connection.yaml

Create the stack by executing the following command –

aws cloudformation create-stack --stack-name lambdaEndpoint --template-body file://lambda-endpoint-connection.yaml --capabilities CAPABILITY_IAM

Stack creation via CLI – Image 4

This creates the stack in the background; it will take a couple of minutes. Check that your stack was created successfully in the Events section of CloudFormation.

Stack creation events – Image 5

Ensure Stack is created successfully.

Stack creation completed – Image 6

Stack outputs are saved as key-value pairs. Take note of the instance PublicIP; we need this output to SSH into the EC2 instance to check Lambda access. Also take note of the FirstLambdaFunction and SecondLambdaFunction values; we need these values to invoke the Lambda functions.

Stack output – Image 7

Ensure the two Lambda functions were created successfully and keep a note of both function names. We need the function names for invoking from our EC2 instance.

Lambda function – Image 8

The VPC endpoint configuration is created. The EC2 instance connects internally via a private DNS name, derived as <servicename>.<region_name>.amazonaws.com.

In our case the service name is “lambda”.

Endpoint configuration – Image 9

The endpoint assigns an IP address in every associated subnet. In our case we associated just one subnet, so a single IP address is assigned. IP 10.1.1.78 is part of the subnet the endpoint is associated with.

Endpoint Subnet association – Image 10

Assign a security group to the endpoint. If you need to block certain EC2 instances from reaching the Lambda functions, this security group can be used for that purpose. An IAM policy can also be used to restrict who can invoke the Lambda functions.

Endpoint security group configuration – Image 11

Policy definition: full access allows any user or service to access the Lambda functions. I highly recommend restricting which services and EC2 instances can invoke Lambda, via the endpoint policy and security group.

Endpoint policy – Image 12

The endpoint creates a network interface in the VPC environment, and the IP is assigned to this network interface.

Network interface – Image 13

The subnet IP count also shows one fewer available address in the /24 subnet.

VPC subnet configuration – Image 14

The route table has a route to the internet gateway so that I can connect from my system via SSH.

Subnet route table configuration – Image 15

The security group only allows access to port 22 from the outside world; within the VPC, all ports are open for inbound and outbound traffic.

Security group inbound rule – Image 16
Security group outbound rule – Image 17

Log in to the newly created instance using the same key pair created during the prerequisite phase –

SSH to system – image 18

Configure AWS with region “us-west-2”, or whichever region you selected.

Configure AWS using us-west-2 – Image 19

List the available functions using “aws lambda list-functions”.

Lambda function – Image 20

To invoke a function, use the following commands. We don't have access to any external HTTPS connection, but we are still able to reach the Lambda functions.

  • aws lambda invoke --function-name <first_function_name> first_response.json
  • aws lambda invoke --function-name <second_function_name> second_response.json

Output from each Lambda function is saved in JSON format.
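
The same invocation works from the boto3 SDK instead of the CLI; a minimal sketch (the function name placeholder is the one from the stack outputs):

import boto3, json

lam = boto3.client("lambda", region_name="us-west-2")

# Synchronous invoke; through the endpoint this stays on the private network.
resp = lam.invoke(FunctionName="<first_function_name>", Payload=json.dumps({}))
result = json.loads(resp["Payload"].read())
print(result["body"])  # matches the output from the Lambda function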

Invoke lambda function – Image 21

Reading the output files: the “body” key matches the output from the Lambda function.

Output from lambda function – Image 22

CloudWatch events: if a payload is defined while invoking the function, it is visible in the CloudWatch events as well.

Output from Cloud watch events – Image 23

Clean-up

Delete the stack to clean up. Enter the following command –

  • aws cloudformation delete-stack --stack-name lambdaEndpoint

Conclusion

The Lambda endpoint is a new feature that connects to Lambda via AWS PrivateLink over the AWS internal network. Security is elevated, since there is no need to open your VPC to external traffic for Lambda execution. It is a great way to use Lambda as function-as-a-service or to use Lambda across multiple AWS accounts in an organization.

Enjoy !!! Keep Building !!!!

Categories
AWS Networking

AWS Transit Gateway

Networking is a big challenge, with growing demands, diversified environments, and datacenters being created across the world; the limit is just imagination. Enterprises work across different sites and geographies, but the common vein that joins those environments is the network. With growing demands, it is getting complicated to manage routes between sites. AWS Transit Gateway (TGW) was born to make network engineers' lives easy. TGW helps with the following features –

  • Connect multiple VPC networks together for a given account
  • Connect multiple VPC networks across multiple AWS accounts
  • Inter-region connectivity across multiple VPCs
  • Connect an on-premises datacenter with the VPC network via VPN or Direct Connect
  • Connect multiple cloud environments via VPN using BGP

Benefits of Transit Gateway

Easy connectivity : AWS Transit Gateway is a cloud router and helps with easy network deployment. Routes can and will be easily propagated into the environment after adding a new network to the TGW.

Better visibility and control : AWS Transit Gateway Network Manager is used to monitor Amazon VPCs and edge locations from a central location. This helps identify and react to network issues quickly.

Flexible multicast : TGW supports multicast, which helps send the same content to multiple destinations.

Better security : Amazon VPC and TGW traffic always remains on the Amazon private network. Data is encrypted and protected against common network exploits.

Inter-region peering : AWS Transit Gateway inter-region peering allows customers to route traffic across AWS Regions using the AWS global network. Inter-region peering provides a simple and cost-effective way to share resources between AWS Regions or replicate data for geographic redundancy.

Transit Gateway Components

There are 4 major components in a transit gateway –

Attachments : Attach a network component to the gateway. Each attachment is added to a single route table. The following network components can be connected to a TGW –

  • Amazon VPC
  • An AWS Direct Connect gateway
  • Peering connection with another transit gateway
  • VPN connection to on-prem or multi-cloud network

Transit gateway route table : A default route table is created, and a TGW can have multiple route tables. A route table defines the boundary for connections. Attachments are added to route tables: a given route table can have multiple attachments, whereas an attachment can only be added to a single route table.

A route table includes dynamic and static routes. It determines the next hop for a given destination IP.

Association : To attach an attachment to a route table, we use an association. Each attachment is associated with a single route table, but a route table can have multiple attachments.

Route propagation : All VPCs and VPNs associated with a route table can dynamically propagate routes to it. If a VPN is configured with the BGP protocol, routes from the VPN network are propagated to the transit gateway automatically. For a VPC, one must create static routes to send traffic to the transit gateway. Peering attachments do not dynamically add routes to the route table, so we need to add static routes.
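
To make the association/propagation distinction concrete, here is a hedged boto3 sketch; the route table and attachment IDs are placeholders, and my actual implementation below uses Terraform rather than the SDK.

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

rt = "tgw-rtb-0123456789abcdef0"      # transit gateway route table (placeholder)
att = "tgw-attach-0123456789abcdef0"  # VPC attachment (placeholder)

# Association: traffic arriving from this attachment is routed using this table.
ec2.associate_transit_gateway_route_table(
    TransitGatewayRouteTableId=rt, TransitGatewayAttachmentId=att)

# Propagation: this attachment's CIDRs are advertised into this table.
ec2.enable_transit_gateway_route_table_propagation(
    TransitGatewayRouteTableId=rt, TransitGatewayAttachmentId=att)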

Architecture Design

We are going to test the following TGW scenarios. In this architecture design I am creating a “management VPC” that will be shared by the entire organization. This VPC could host Active Directory, DNS, DHCP, NTP, or similar common services for the organization.

Project_VPC1 and Project_VPC2 will be able to communicate with each other and with management_vpc. Private_VPC is an isolated network (a private project): it will not be able to communicate with the project VPCs, but should be able to communicate with management_vpc.

Following is the architecture for this design –

Design document – Image 1

Pre-requisites

  • Region name
  • AMI ID – the AMI ID depends upon the region
  • Instance role. We don't need this one explicitly, as we are not accessing any services from the instances.
  • Instance key pair : Create an instance key pair and add the key pair name in the parameter store. The parameter name should be “ec2-keypair”, and the value should be the name of your key pair.

Source code

https://github.com/yogeshagrawal11/cloud/tree/master/aws/Network/Transit%20Gateway

Cost

This implementation has a cost associated with it. With the attached configuration, if testing is done within 1 hour, it should not cost more than 50 cents.

For the latest charges, check the AWS pricing calculator.

Implementation

If this is your first time with Terraform, check this blog to get started –

https://cloudtechsavvy.com/2020/09/20/terraform-initial-setup/

Run the following commands to start Terraform –

  • ./terraform init
  • ./terraform plan
  • ./terraform apply --auto-approve

I am using Terraform for the implementation. Following is the Terraform output –

A total of 47 (not 45) resources are configured.

4 VPCs created.

4 subnets created. If you observe, the available IPs are 1 less than usual, because one IP in each subnet is used by the transit gateway for data transfer and routing.

4 route tables created. Each route table uses the transit gateway as the target for the other VPC networks.

Security groups – these are the most important configuration in the real world. For DNS you would allow port 53; for an AD server, open the appropriate ports. In my case, I am using ping to check communication.

The private VPC will only be able to communicate with the management network.

The project VPCs will be able to communicate with the other VPCs, but not with the private VPC.

Note : We don't have to explicitly block traffic in the project VPCs; this is blocked by the transit gateway, since we are not going to add the propagation.

Transit gateway created. Remember, if ASN 64512 is already used by an existing VPN, a parameter can be added to change it.

DNS support helps reach the cloud with DNS names rather than IP addresses, certainly a useful feature.

A transit gateway can be peered with another transit gateway for inter-region data transfer between VPCs over the Amazon private network. It is advisable to disable auto-accept of shared attachments for security reasons.

A default route table is created, and any VPC not explicitly attached elsewhere is attached to the default route table.

Each VPC needs to be added to the transit gateway as an attachment.

Route tables are created per the segregation needed in the environment. In my case I am creating 3 route tables for 4 VPCs. In enterprise environments we generally create 5 route tables, with separate route tables for the backup and security environments.

Since project VPC1 and project VPC2 have the same network requirements, I added them to the same route table.

Management Route table

The management route table has the management VPC attachment. Propagation is added from every network that needs to communicate with the management VPC. In this case, the management VPC should be able to communicate with all other networks, so propagation is added from all of them; their routes are then propagated automatically.

Private Network Route table

The private VPC is attached to the private network route table. The private network should be able to communicate with the management VPC, so propagation is added for the management VPC; the route to the management VPC is added automatically after propagation.

Project Route table

The project route table has attachments from both project VPCs. Propagation is added for the other project VPC network and the management network, and the respective routes are added.

Testing Environment

The management server is able to ping instances in both the private and project environments.

A project VPC can talk to the management VPC and the other project VPC, but not to the private VPC.

The private VPC is able to talk to the management VPC but not to any project VPC. That keeps the private VPC private within the organization.

Delete terraform configuration

To delete the Terraform configuration, run the command below and ensure all resources are destroyed.

./terraform destroy --auto-approve

Conclusion

A transit gateway is a tool to connect multiple VPC, VPN, and Direct Connect networks so they communicate over a private network. A transit gateway can also be used to isolate network traffic. This makes routing comparatively easy.

SD-WAN partner solutions can be used to automate adding new remote sites into the AWS network.

References

https://docs.aws.amazon.com/index.html

Categories
Featured GCP - Networking Google Cloud

Multi-Cloud site-to-site Network Connectivity

Multi-cloud architecture is a smarter way to utilize public, private, and hybrid environments. Enterprises want the option to choose multiple cloud providers for their use cases, and multi-cloud is nowadays very popular with enterprise and mid-size companies. Following are the benefits and considerations when selecting a multi-cloud environment.

Redundancy: Having more than one cloud provider helps with redundancy. If a particular region or service of a given cloud provider fails, we can configure redundancy by failing over to another cloud provider.

Scalability: This point may not be that important but is definitely worth considering. Sometimes it is a lengthy process to increase resource limits for a cloud account, which can be safeguarded against by having multiple cloud providers.

Cost: Cost can also be viewed through competition. Some services are cheaper in one cloud environment, some in another. This helps determine the cheapest solution for the enterprise.

Features: This is the prime reason for a multi-cloud environment. It gives you the flexibility to choose the environment best suited to the application's needs, rather than just whatever is available at the time.

Customer lock-in: Some vendors have lock-in periods for specific services. Enterprises mostly wish to avoid this lock-in time. With multiple clouds we have more options for choosing the right vendor.

Nearest termination point/customer reach : Using a regional cloud provider helps the enterprise stay near the datacenter or user. This improves performance and reduces latency. On top of that, each cloud provider's global reach is different, so implement the cloud provider whose reach is better for the end user.

This procedure can be implemented for any VPN connection with the BGP protocol. I am using dynamic routing, but static routing can be used as well. Below is the architecture diagram for my VPN connectivity.

Architecture Diagram – Image 1

Pre-Requisites

Download the Terraform software; versions used –

  • Terraform v0.12.26
  • provider.aws v2.70.0
  • provider.google v3.0.0-beta.1

AWS and Google accounts should be configured for Terraform access.

I am using the “us-west-2” region for AWS and the “us-west1” region for Google. If you plan to use a different region, select the appropriate instance image ID and update it.

Create an EC2 instance key pair and add the key pair name information into the parameter store.

Change the BGP IPs if needed. I am using the defaults; these IPs should work as long as they are not already used in your existing environment.

Source Code

Please download all files from the location below:

https://github.com/yogeshagrawal11/cloud/tree/master/google/Network

Implementation

Follow my Terraform initial setup guide in case you are new to Terraform.

https://cloudtechsavvy.com/2020/09/20/terraform-initial-setup/

  • Download “aws_vpn.tf” and “google_vpn.tf”
  • Run “./terraform init” to initialize the Terraform setup
  • Run “./terraform plan” to verify connectivity to the clouds and check for errors
  • Execute “./terraform apply --auto-approve” to start the implementation

Output from Terraform. Take note of the IP addresses, which will be used later for configuration.

IP address – Image 2.

IPSec Sharekey

I am using the AWS parameter store to hold the passwords for the VPN tunnels; two parameters will be used. I am not encrypting these keys, but it is advisable to encrypt them as per security best practice.

AWS vpn shared key – image 3

Take note of the values of both the “vpn_sharedkey_aws_to_gcp_tunnel1” and “vpn_sharedkey_aws_to_gcp_tunnel2” parameters. These values will be used while creating the IPSec tunnels.

AWS VPN shared key value – Image 4
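
If you prefer to fetch the values programmatically rather than from the console, a small boto3 sketch (the parameter names come from the Terraform code; strip trailing whitespace before pasting into GCP, as noted above):

import boto3

ssm = boto3.client("ssm", region_name="us-west-2")

for name in ("vpn_sharedkey_aws_to_gcp_tunnel1", "vpn_sharedkey_aws_to_gcp_tunnel2"):
    value = ssm.get_parameter(Name=name)["Parameter"]["Value"]
    print(name, "=", value.strip())  # strip trailing spaces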

AWS & GCP network configuration

A VPC with CIDR 10.1.0.0/16 is created and the route table is attached.

AWS VPC config – Image 5

GCP VPC configuration; the subnet is attached.

GCP VPC configuration – Image 6

The GCP firewall allows traffic from the AWS subnet to the GCP subnet.

GCP Firewall Allowing only ip from AWS subnet for icmp and SSH – image 7

The customer gateway should have the IP address of the GCP VPN gateway (not the AWS VPN gateway address). The ASN should be in the private range, 64512 through 65534. The ASN used in the AWS customer gateway is the one managed by GCP.

AWS Customer Gateway Config = GCP VPN Gateway config

AWS Customer Gateway – Image 8

The GCP VPN gateway IP information matches the customer gateway. Forwarding rules are mandatory for tunnel creation; Terraform creates those rules automatically.

Google VPN gateway image 9

The AWS virtual private gateway gets the next ASN number; it is advisable to use the next number. As a best practice, I use odd numbers for one provider (like GCP) and even numbers for another (like AWS). This configuration also works with on-premises network devices, where you define precedence: all on-premises devices get lower ASN numbers, and so on.

AWS Virtual private gateway with ASN 65002 – Image 10

Attach your site-to-site VPN connection to the virtual private gateway and the customer gateway. This creates one VPN connection between the customer gateway (GCP VPN gateway) and the AWS virtual private gateway. I am using “ipsec.1” for the connection type.

AWS – Site-to-Site-VPN Connection – Image 11

This also creates two tunnels. I am using dynamic routing; as of this writing, the BGP protocol has a limit of 100 routes that can be exchanged over the VPN. Tunnel information is as follows –

AWS tunnels are down due to GCP configuration pending – Image 12
Tunnel IP address issue

The tunnels are configured properly but are in the down state because the corresponding GCP tunnels are not yet created. I tried to create those tunnels using Terraform, but the issue was that both AWS and GCP took the first IP (169.254.1.9) of the 169.254.1.8/30 subnet as their own, with the second IP (169.254.1.10) allocated as the peer. Instead, AWS should have the first IP, and the second IP in the subnet should be used by the GCP cloud router.

The correct BGP IPs for GCP are –

  • Tunnel 1 – cloud router IP 169.254.1.10 (second IP in subnet) and BGP peer IP (from AWS) = 169.254.1.9 (which is correctly configured)
  • Tunnel 2 – cloud router IP 169.254.1.14 (second IP in subnet) and BGP peer IP (from AWS) = 169.254.1.13 (which is correctly configured)

Create Tunnels in GCP

Now create two tunnels on the GCP VPN gateway with the following configuration –

  • Remote peer IP address : 35.161.67.220 – value from Terraform output “aws_tunnel1_public_address”
  • IKE version = 1
  • IKE pre-shared key = copy the value of the “vpn_sharedkey_aws_to_gcp_tunnel1” parameter from the AWS parameter store. Note: do not copy trailing spaces.
  • Cloud router = gcp-cloud-router
  • BGP session information –
    • BGP name = bgp1
    • Peer ASN = 65002
    • Cloud router BGP IP = 169.254.1.10 – value of “aws_tunnel1_inside_gcp_address” from the Terraform output
    • BGP peer IP = 169.254.1.9 – value of “aws_tunnel1_inside_aws_address” from the Terraform output
GCP BGP session config for tunnel1. – image 13
GCP VPN tunnel1 configuration – image 14

Perform the same activity on tunnel-dynamic2 with the following details –

  • Remote peer IP address : 35.163.174.84 – value from Terraform output “aws_tunnel2_public_address”
  • IKE version = 1
  • IKE pre-shared key = copy the value of the “vpn_sharedkey_aws_to_gcp_tunnel2” parameter from the AWS parameter store. Note: do not copy trailing spaces.
  • Cloud router = gcp-cloud-router
  • BGP session information –
    • BGP name = bgp2
    • Peer ASN = 65002
    • Cloud router BGP IP = 169.254.1.14 – value of “aws_tunnel2_inside_gcp_address” from the Terraform output
    • BGP peer IP = 169.254.1.13 – value of “aws_tunnel2_inside_aws_address” from the Terraform output

Upon completing this configuration, both tunnels should be up and running in both the GCP and AWS environments. Try refreshing the page in case the status has not changed.

GCP – Both tunnels are up – Image 15
AWS – Both Tunnels are up – Image 16

This completes our network connectivity between the AWS and GCP environments.
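
To confirm both tunnels are UP from the AWS side without the console, a boto3 sketch reading the VPN telemetry:

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Each ipsec.1 connection carries two tunnels; telemetry reports per-tunnel state.
for vpn in ec2.describe_vpn_connections()["VpnConnections"]:
    for t in vpn.get("VgwTelemetry", []):
        print(vpn["VpnConnectionId"], t["OutsideIpAddress"], t["Status"])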

Testing

To test, I am going to log in to my AWS instance with the key name defined in the parameter store. Use the following IP from the Terraform output.

EC2 and Instance IP – Image 17
Login into AWS instance via public ip from Image 17 – Image 18

We have allowed the ICMP protocol (for ping) and the SSH port from the AWS environment to the GCP environment, so we will try to ping the GCP instance's private IP from the AWS instance's private IP address.

SSH and ping test to GCP via private ip – Image 19

Voila, the ping works. I could not log in to GCP over SSH because I have not copied the instance's JSON key file to the EC2 instance, which SSH needs to authenticate correctly.

GCP instance access on external IP

GCP get Public ip – Image 20

The ping test from the AWS EC2 instance to the GCP instance's public IP fails, as expected, for two reasons: we don't have an internet gateway set up on the GCP VPC, and we have not allowed ICMP and SSH in the firewall from the external world.

GCP test ssh and ICMP with public ip – Image 21

Test is successful.

Deletion of environment

Since we created the GCP tunnels separately, we need to delete those tunnels before deleting the infrastructure with Terraform.

Go to GCP > VPN > Cloud VPN Tunnels

Select both newly created tunnels and click “Delete”

GCP – Delete tunnel1 – Image 22

Once the tunnels are deleted, run the following command from the Terraform environment –

./terraform destroy --auto-approve

Terraform Delete – Image 23

Make sure all 25 resources are deleted.

Conclusion

Multi-cloud is the new normal, and private network connectivity is something everyone wants to have. I have given an example with compute instances, but this can be extended to a multi-tier architecture. Get the best of both worlds by implementing this solution.

Keep Building…

Categories
AWS Compute Database Featured Machine Learning Storage

Image Processing with Lambda/AWS API Gateway

We are clicking pics every day, and the image datastore industry is spreading into our lifestyle; massive numbers of images keep being added every day. In this story, I present a tool to search for images of a given object or celebrity, like Google Images. Don't get me wrong, this is nowhere near Google Images, which crawls weblinks; this story is limited to images in a single object store.

Architecture

Images are copied and stored in the S3 bucket. I am using external tools to copy the images; this external tool can be anything, such as the S3 CLI or a simple AWS SDK script. The S3 state-change event triggers the Lambda function to perform image recognition analysis. I am performing two types of analysis: a general analysis of the environment/objects, and a celebrity analysis. Once the analysis is performed, the data is stored in DynamoDB, using the “keyname” from S3 as the primary key for the images. All labels generated from image recognition are stored as attributes in the newly assigned item.

The API gateway is used to search for images containing any value or celebrity. A search triggers the Lambda function, which generates a pre-signed URL for each image and delivers it to the client. These pre-signed URLs expire in 10 minutes if the user does not download the images.

Download code 

Feel free to download code from my GitHub repo.

https://github.com/yogeshagrawal11/cloud/tree/master/aws/Image%20recognition

Prerequisites

Parameters to be added in the parameter store –

  • myregion = region name where the whole environment is set up. A multi-region setup would need configuration changes with a load balancer.
  • imagedb = DynamoDB table name
    • Create the DynamoDB table with a string primary key named “s3key”. The “s3key” attribute stores the image's S3 key name.
  • s3bucket = S3 bucket name
    • Create the S3 bucket named in the parameter store, with an “/image” folder into which all images are to be copied.

Implementation

IAM Role

Create two IAM roles. The first IAM role has read/write access to DynamoDB, log streams, image recognition (Rekognition), and S3. The function (image_process_function.py) is assigned this role; policy information is below. I am using AWS managed policies for simplicity, but ensure you use an appropriate role with minimum access. Use the following AWS managed policies –

  • AmazonS3FullAccess
  • AmazonDynamoDBFullAccess
  • AmazonRekognitionFullAccess
  • AWSLambdaBasicExecutionRole
  • AmazonSSMReadOnlyAccess

The second role is used by the second Lambda function (search_images_from_db_function.py), which reads the DynamoDB database for matching images and key names. The following AWS managed policies should be added to the role –

  • AmazonDynamoDBReadOnlyAccess
  • AmazonS3ReadOnlyAccess
  • AmazonSSMReadOnlyAccess
  • AWSLambdaBasicExecutionRole

Dynamodb Table

Create an empty DynamoDB table, “imagerek”, to store all label information. The primary key for this table must be “s3key”; if the primary key is not named “s3key”, this solution will not work.

“s3key” holds the key name from the S3 image datastore.

Imagerek table

Lambda Function (image_process_function.py)

The image function is triggered after images are uploaded to S3. The function performs two image recognition operations. The first verifies and labels all objects discovered in the image (Python function definition – “rek_labels”).

It adds, deletes, or updates labels in the imagerek database accordingly.

The second part of this function checks the image for any celebrities present (Python function definition – “rek_celebrities”).

Upon gathering this information, the function adds it to the DynamoDB table specified in the parameter store. The primary key for each image is its key name from the S3 bucket.
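
As a minimal sketch of the two Rekognition calls the function makes (in the real function the bucket and key come from the S3 trigger event; the names here are placeholders):

import boto3

rek = boto3.client("rekognition")

image = {"S3Object": {"Bucket": "my-image-bucket", "Name": "image/photo1.jpg"}}

# General object/scene analysis.
labels = rek.detect_labels(Image=image, MaxLabels=10, MinConfidence=80)
print([l["Name"] for l in labels["Labels"]])

# Celebrity analysis.
celebs = rek.recognize_celebrities(Image=image)
print([c["Name"] for c in celebs["CelebrityFaces"]])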

Lambda function (search_images_from_db_function.py)

The second Lambda function is used to search images, with the input provided by the API gateway. Once the inputs are received, the DynamoDB database is searched for the specific keywords.

Once the file key names are retrieved, the same function creates a “pre-signed” URL for each image and sends those links back to the API gateway as an HTML page.

The images' pre-signed URLs are sent back as an HTML page that is displayed via the API gateway. In a real-life scenario, images would be processed and presented by the application/web layer.
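
Generating a pre-signed URL is essentially a one-liner per image; a sketch with placeholder bucket/key names (600 seconds gives the 10-minute expiry mentioned above):

import boto3

s3 = boto3.client("s3")

# The key name comes back from the DynamoDB search.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-image-bucket", "Key": "image/photo1.jpg"},
    ExpiresIn=600,  # 10 minutes
)
print(url)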

Images uploaded to S3

Use any technique to upload images to S3 storage: the CLI, the boto SDK, the REST API, or any other custom application. Once images are uploaded, the Lambda function is triggered. Ensure you create the “image” folder in the S3 bucket and upload all images to that folder, and please ensure the Lambda functions are deployed before images are uploaded to the S3 bucket.

API Gateway

The idea of this design is centered more around solution design than application development, so I am using the API gateway to send inputs to the Lambda function. Currently the application does not support multiple inputs, but that can certainly be added. After receiving the response from the Lambda function, the API displays the images.

API gateway configuration

Search for API Gateway in the AWS console and click on Create API.
Choose the HTTP API type.
Integrate the Lambda function “search_image_funct” with the API gateway and select an appropriate API name.
Add a route for “/”.

The default stage will be used. For a better CI/CD process, try using the canary method for new version deployments.

We will use the default stage.
Review API configuration

The selected URL will be used to search for images.

The search link is the API URL, then “?searchfor=”, then the thing to search for:

<API gateway url>/?searchfor=<things to search>

Search Examples

I am going to search for some of the images that were uploaded as test images.

Searching for Jeff Bezos

Searching for Sundar Pichai

Searching for water

Searching for roses

Searching for universe images

Architecture images
Disclaimer

Images are used for educational purposes. If it is not appropriate to use any of these images, please post a comment and I will remove them.

Categories
AWS Compute Featured Security & Identity Storage Terraform

EFS and EC2 instance creation using Terraform templating

Automating implementation and reducing the time to deploy complex environments is key. In this story, I set up an environment fairly common in the industry: mapping an NFS filesystem across multiple subnets. This is a very basic configuration, but complexity starts when you want to use the same template for deploying the entire application in one go.

I am using the Terraform template function to achieve this. I certainly could use “Ansible” or “Chef” or any other tool, but I wanted to keep it relatively simple and have things done using just a single input file.

Architecture Diagram

I am creating a single EFS filesystem that is part of a given region and has a single mount target in each AZ. I am using a maximum of 3 AZs in this document; the AZ count can be increased in case more redundancy is needed.

A single instance is started in each AZ and mounts the newly created EFS using the local IP. An internet gateway is attached so that I can access the instances from my local environment to check that EFS is working fine.

The parameter store is used to get the “keypair” name.

Image for post
Architecture Diagram. Image-1

Source Code

Download source code for this implementation from Github page —

https://github.com/yogeshagrawal11/cloud/tree/master/aws/EFS/MutiAZ%20EFS%20implementation

Download the main.tf, terraform.tfvars, and user_data_import.tpl files.

user_data_import.tpl is the user_data template file. You can add or modify any commands you would like to execute at boot time. I am mainly using this file to mount the newly created EFS filesystem automatically on each EC2 instance.

The new EFS name is part of the input, and the UNIX mountpoint is also an input. If the VPC and subnets are already created and you want to use the same subnets, make sure to add a “data” block in main.tf accordingly and change the “EFS” and “instance” blocks to match.

Image for post

Please change the localip parameter to your own subnet IP, from which you need SSH access to each EC2 instance. Do not use the default 0.0.0.0/0, which opens port 22 to the whole world.

Image for post

Execute Terraform job

To execute the Terraform job, please download the Terraform files and enter the following commands.

aws configure

terraform init

terraform plan

terraform apply

Please review the Terraform documentation for more information. You can send me your questions as well.

This job will create a total of 32 resources. Cost will be very minimal if you use the attached configuration and perform the cleanup task after testing.

Image for post

The “efsip” output lists the EFS IP for each Availability Zone. Since I am working in the first 3 Availability Zones, 3 IPs are assigned for intra-AZ communication. “instance_public_ip” (the output name has a typo in the code) is the public IP address for each instance created in a given AZ; I will use these public IPs to connect to each EC2 instance.

Verify the filesystem is mounted successfully. Each instance uses its own AZ's EFS IP to connect, and EFS is mounted successfully.

Image for post

Perform a read/write test from each instance. I create a new file from one of the instances, and the file is visible from the other two instances.

Image for post
Image for post

Tags are added to the EFS filesystem in case they are needed for local scripting purposes.

Image for post

Elastic Filesystem Configuration

The EFS filesystem is created with 3 mount targets.

Image for post

An access point is used to mount the filesystem at “/”; this can easily be changed as needed.

Image for post

The filesystem spans 3 Availability Zones, and each Availability Zone has a different IP address.

Image for post
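
To list the per-AZ mount target IPs programmatically, a boto3 sketch (the filesystem ID is a placeholder):

import boto3

efs = boto3.client("efs", region_name="us-west-2")

# One mount target per subnet/AZ, each with its own IP.
for mt in efs.describe_mount_targets(FileSystemId="fs-0123456789abcdef0")["MountTargets"]:
    print(mt["SubnetId"], mt["IpAddress"], mt["LifeCycleState"])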

Clean up

To clean up, enter the following command.

terraform destroy

Categories
AWS Compute

AWS EC2 instance management — Starting and Stopping instances using tagging

Shut down and start instances during non-business hours to reduce operating cost.

Architecture Diagram

The AWS Lambda function is called by a CloudWatch rule at a regular interval, every hour. The Lambda function reviews the tags of each instance and verifies whether the instance needs a shutdown or startup depending on the local time. The Lambda code can be placed on S3 for safekeeping and imported from there.

A new log stream and group are created to keep track of shutdown and startup events. Amazon SNS topics can be used for event notification. A custom role is created to get EC2 instance information and write logs to the CloudWatch event.

Instance tagging determines whether an instance needs to shut down or start up.

Image for post
Architecture Diag. Image 1

Tagging format

Key name: “shutdown”

Values format :

weekday:<hours to shut down, separated by commas>;weekend:<hours to shut down, separated by commas>;

weekday: Monday to Friday

weekend: Saturday and Sunday

This format can be changed, but a code change is needed for that.

P.S. The keywords include a colon (:) and semicolon (;).
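
The exact parsing lives in the GitHub code; purely as an illustration, and assuming the hours are comma-separated, a tag value like “weekday:19,20;weekend:0;” could be parsed along these lines (parse_shutdown_tag is a hypothetical helper, not the repo's function):

def parse_shutdown_tag(value):
    """Rough sketch: 'weekday:19,20;weekend:0;' -> {'weekday': [19, 20], 'weekend': [0]}."""
    schedule = {}
    for part in value.split(";"):
        if ":" in part:
            period, hours = part.split(":", 1)
            schedule[period.strip()] = [int(h) for h in hours.split(",") if h.strip()]
    return schedule

print(parse_shutdown_tag("weekday:19,20;weekend:0;"))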

Role Policy

Create a role policy with the following permissions.

Image for post
image 2

Create a role and attach the newly created policy to it.

Image for post
image 3

Lambda Function

Create a Lambda function with the Python 3.8 runtime and select the newly created role for the Lambda function.

Image for post
image 4

Download the Archive.zip code file from my GitHub link below –

https://github.com/yogeshagrawal11/cloud/tree/master/aws/EC2/instance_shutdown_script

To add the function code, click on Actions and upload the zip file.

Image for post
image 5

After uploading the zip file you will see that “lambda_function.py” is the actual script for the function. I am using the default handler in the script. If the handler information is changed, please make the appropriate change to either the function name or the file name.

Filename: lambda_function.py

Python function name: lambda_handler

Image for post
image 6

I am using the “pytz” module to get correct timezone conversion, so that module's code needs to be uploaded too.

For region-to-timezone mapping, I have created a dictionary. If your region is not part of the list, feel free to add a new key with the region name and its respective timezone.

Image for post
image 7
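
The idea, sketched with a couple of entries (the real dictionary in the code covers more regions):

from datetime import datetime
import pytz

# Region name -> timezone; extend with your regions as needed.
REGION_TZ = {
    "us-west-2": "US/Pacific",
    "us-east-1": "US/Eastern",
    "ap-south-1": "Asia/Kolkata",
}

now = datetime.now(pytz.timezone(REGION_TZ["us-west-2"]))
print(now.hour, now.strftime("%A"))  # local hour and weekday drive the decision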

On our magical Earth, sunset and sunrise happen at the same moment in different regions, so I call the function to perform both activities simultaneously: “system_function” generates the lists of all instances that need to stop and start.

Image for post
image 8

In case of a large instance count, I am using a paginator.

Image for post
image 9

The paginator reduces the list by filtering on the tag value: if the “shutdown” tag is not set, the instance is not part of the list. The same function is used for starting and shutting down instances; for shutting down I additionally filter for running instances, and for starting I filter for stopped instances.

Image for post
image 10
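
A sketch of that paginator-plus-filter pattern (filters shown for the shutdown pass over running instances):

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

paginator = ec2.get_paginator("describe_instances")
pages = paginator.paginate(Filters=[
    {"Name": "tag-key", "Values": ["shutdown"]},             # only tagged instances
    {"Name": "instance-state-name", "Values": ["running"]},  # use 'stopped' for the start pass
])
for page in pages:
    for res in page["Reservations"]:
        for inst in res["Instances"]:
            print(inst["InstanceId"])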

Set a timeout value of more than 3 seconds; it normally takes me 10 to 15 seconds. This value can vary with the environment and the number of instances.

Please use the following handler or make any appropriate changes.

Lambda handler = lambda_function.lambda_handler

Image for post
image 11

Lambda Permissions :

Lambda permissions for creating log streams/log groups and adding events in CloudWatch.

Image for post
image 12

The Lambda function also has permission to get instance status, read instance tags, and start and stop instances.

Image for post
image 13

Cloud watch Rule

Create a CloudWatch rule that triggers the Lambda function at the start of every hour, every day.

Image for post
image 14
Image for post
image 15
Image for post
image 16

Testing

Current time: Fri 4:18 pm PST

In the Oregon region, instance 1 will be shut down and instance 2 will be started.

Image for post
image 17

In the North Virginia region (+3 hr), east 2 and east 3 should be shut down.

Image for post
image 18

Manually run the Lambda function.

Image for post
image 19

CloudWatch log events

Image for post
image 20

Oregon instances started and stopped as expected.

Image for post
image 21

North Virginia instances stopped

Image for post
image 22

The Mumbai region instance stopped as well.

Image for post
image 23

Disclaimer :

An SNS topic is not triggered here, but one can be added to notify on any instance stop and start.

A database can be used to track the number of hours saved by the EC2 start/stop script.

Categories
Analytics AWS

AWS — ETL transformation

Introduction

ETL (Extract, Transform, and Load) is the data process of copying data from one or more sources into a destination system. Data-warehousing projects combine data from different source systems, or read a portion of data from a large set. Data stored in a flat file is converted into a queryable format at the presentation layer, where analytics can be performed.

Here, AWS Glue will be used to transform the data into the requested format.

Image for post
Architecture Design (image-1)

Extract

Data is placed in the S3 bucket as a flat file in CSV format. This file (or files) will get transformed by Glue.

The input file for testing can be downloaded from the link below —

Transform

AWS Glue is used to fetch data from flat files and create a Glue table from them. I am working on a very small dataset with “US National Park” information. I have updated some of the data with additional fields that would break a naive transformation.

Input data — this lists all US National Parks and their information.

Image for post
image-2

Yellowstone Park spans 3 states, so quotes are used to make that field a single “state” column; a bare comma (“,”) separator would cause issues, with the state value coming out as just “Idaho” rather than “Idaho, Montana, Wyoming”. So we need to transform this data into a meaningful form.

Image for post
image-3
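
Python's csv module demonstrates the behavior we need from the table parser; with a quote character, the three states stay in one field, while a naive split breaks them apart:

import csv
import io

row = 'Yellowstone,"Idaho, Montana, Wyoming",1872\n'  # sample row for illustration

print(next(csv.reader(io.StringIO(row))))  # ['Yellowstone', 'Idaho, Montana, Wyoming', '1872']
print(row.strip().split(","))              # naive split breaks the state field apart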

Load

Once the table is created from the flat file, AWS Athena can fetch the data for analytics purposes or load it into a new environment. I am using “AwsDataCatalog” as the Athena data connector for AWS Glue.


Implementing using Terraform

The Terraform software tool can be used to deploy infrastructure as code. We can change infrastructure efficiently and repeatedly by changing input parameters.

The following information is needed –

· S3 bucket information and the folder name where the input data is present. As a best practice, I have added my bucket information to the parameter store rather than adding it to code. Make sure the parameter “s3bucket” is created with the value <s3://bucketname/>.

Image for post

· Glue Database and table name

· Table schema matching the input data

Provider information: I have intentionally added the “access_key” and “secret_key” parameters and commented them out, to make the point that no one should put that information in the provider block; instead, create an AWS profile using the “aws configure” command. My profile name is “ya”; the profile can be different for different applications or implementations.

Image for post

Download the Terraform source code from the location below –

https://github.com/yogeshagrawal11/cloud/tree/master/aws/athena/s3_to_athena/

Once the source code is downloaded into the same directory as the Terraform software, run the following commands —

aws configure : to configure the AWS profile. Make sure appropriate roles are assigned.

terraform init : to initialize Terraform

terraform plan : to review the plan

terraform apply : to deploy your infrastructure

Run the Terraform code to deploy the infrastructure as code. This code creates two resources: the AWS Glue database and the AWS Glue table.

Image for post

Implementing using AWS console

AWS Glue

I am not using a crawler here, since this document covers only the first step of the ETL process. A crawler is needed when the input data is not static; in that case, we would create a schedule to run the crawler periodically for new data.

Create the AWS Glue database

Image for post
Image for post

Create the AWS Glue table. The location is the S3 input location: select the S3 bucket and folder name where the input data is stored. Add the Glue table name and select the appropriate Glue database name.

Image for post
Image for post

Select S3 as the source type and pick the bucket name by clicking the folder icon next to the text box.

Image for post

Select the classification and delimiter as per the inputs. Avro and Parquet are newer, growing data formats; JSON is still widely accepted by many IoT devices worldwide as an output format. Select the appropriate classification and delimiter.

Image for post

Add all columns.

Image for post
Image for post
Image for post

Table properties

Since my data has double quotes, I have to transform that data from multiple fields into a single field using a quote character. The default “serde serialization” library is also changed to OpenCSVSerde.

Modify the table properties: add the appropriate Serde serialization library and Serde parameters as mentioned below. These parameters define how each row is parsed: if a quote char is found in a field, that field is treated as a single entity until the closing quote, as in the case of “Yellowstone” national park.

Image for post
Image for post

Columns and data types can be defined manually or generated automatically. Defining them manually normally gives better results.

Image for post

Athena Magic

Use the “AwsDataCatalog” data source to connect Athena with AWS Glue.

Image for post

Enjoy running queries on data.

Image for post

Most national parks by state.

Image for post
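
The same kind of query can also be submitted programmatically; a boto3 sketch where the database, table, and output bucket names are placeholders:

import boto3

athena = boto3.client("athena", region_name="us-west-2")

resp = athena.start_query_execution(
    QueryString="SELECT state, count(*) AS parks FROM parks GROUP BY state ORDER BY parks DESC",
    QueryExecutionContext={"Database": "my_glue_db", "Catalog": "AwsDataCatalog"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(resp["QueryExecutionId"])  # poll get_query_execution / get_query_results with this id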

Athena is a good tool for getting data from different connectors like AWS DocumentDB, AWS DynamoDB, Redshift, or even Apache HBase. But that is another story.

Conclusion

AWS Glue is a great tool for performing different types of ETL operations. Use a crawler schedule to run and update data periodically. AWS makes data movement seamless, and new jobs can be scheduled via Scala or Python.

Resources :

https://www.terraform.io/intro/index.html

https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html

https://docs.aws.amazon.com/athena/latest/ug/what-is.html

Categories
AWS Compute

AWS Lambda 101

Lambda function Introduction

The Lambda function is AWS's offering commonly known as Function as a Service. AWS Lambda helps run code without provisioning or managing the underlying servers. Many languages are supported by Lambda, and the list keeps growing; as of July 2020, .NET, Go, Java, Node.js, Ruby, and my personal favorite, Python, are among the supported languages. Lambda is developed with high availability in mind and is capable of scaling during bursts of requests.

We need to grant access to the Lambda function as per its use. Normally access is granted via an IAM role.

IAM Policy

Create a policy that is able to create a log group and log stream. This is the basic execution permission required for the Lambda function; without it, the Lambda function will not be able to generate logs. Logging can be used for custom triggering events or tracking/debugging purposes.

Image for post
image-1
Image for post
image-2

IAM Role

The role is created to grant permission for specific tasks. If the function needs S3 access, add the appropriate policy to the role or create a custom policy. Always grant least privilege to the function, as per AWS security best practice.

Image for post

Select newly created policy

Image for post
image-4

Attach the appropriate policy (image-5). Adding a role description and tags is good practice in IAM. Click on Create Role.

Image for post
image-5

Lambda function

To create the Lambda function, go to Services and select Lambda.

Click on Create function.

Image for post
image-6

We have three options to choose from; the simplest is “Author from scratch”. With this option we will create a “Hello world” function and also verify that logging is working as expected.

“Use a blueprint”: AWS has already created lots of useful functions that we can use to get started, like returning the current status of an AWS Batch job or retrieving an object from S3.

“Browse serverless app repository”: This deploys a sample Lambda application from the application repository. We can also use a private repository to pull code from.

Select an appropriate runtime environment, and select the role that we created in image-5.

Image for post
image-7

The designer guides you on how the Lambda function is triggered. It can be triggered by different events like SNS topics, SQS, or even CloudWatch logs; there are many different ways to trigger the Lambda function.

The Lambda function can be used for batch-oriented work or scripting purposes; you can use a CloudWatch rule to trigger the function on a crontab-style schedule.

A destination can be an SQS queue, an SNS topic, or even a CloudWatch log stream. We can also read a file from S3 and upload it back to S3 after performing a transformation within the Lambda function.

Image for post
image-8
Image for post
image-8a

This Hello world function is very simple: if it is invoked from a webhook, it returns status code 200 with the body “Hello from Lambda!”. It also writes logs into the log stream. “event” and “context” are used to get input values as well as the HTTP context information that invoked the Lambda function.
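
The function body is essentially the hello-world blueprint, something like this minimal sketch:

import json

def lambda_handler(event, context):
    # 'event' carries the input payload; 'context' carries invocation metadata.
    print("Received event:", json.dumps(event))  # goes to the CloudWatch log stream
    return {
        "statusCode": 200,
        "body": json.dumps("Hello from Lambda!"),
    }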

An environment variable can be used to pass any static parameter to the function, such as the bucket name when downloading a file from an S3 bucket, or the table name when writing data into a DynamoDB database.

For security reasons, do not add your “Access Key” or “Secret Key” values as environment variables. Use an encrypted parameter store for this purpose instead.

Image for post
image-9

The handler is the most important config parameter in the Lambda function. It has two parts separated by a period (.) (image-10): the first part is the file name, and the second part is the function definition that is run when the Lambda function is invoked (image-8a). I kept the handler value default, but I always recommend giving it some meaningful name.

Memory is the amount of memory dedicated to the Lambda function; it depends upon the activity you are performing and can be changed.

The timeout value determines how long the function runs before timing out. If the activity you are performing takes longer than the timeout value specified, the Lambda function will be abruptly stopped. So give some buffer time in the timeout value.

Image for post
image-10

By default, the Lambda function does not need to be part of any VPC, but if Lambda functions need to communicate with EC2 instances or an on-premises environment, or be invoked from an EC2 instance, we need the VPC configuration. You can still trigger the Lambda function using the AWS SDK with the VPC configuration.

It is very common to store output data generated by a Lambda function in Elastic Filesystem (EFS), since the Lambda function does not have persistent storage. The temporary ephemeral storage lasts only for the execution of the function, so EFS can be used to store all output.

Image for post
image-11

The Permissions tab lets you verify the actions permitted for the Lambda function on a given resource. The dropdown can be used to select among multiple resources. As per the screenshot below, the Lambda function can create a log group and log stream, and put log messages into Amazon CloudWatch Logs.

Image for post
image-12

We can trigger the Lambda function with different triggers. Here, I am creating a test event to trigger it: click on “Configure test events”. We are not sending any input values to the function while invoking it; key-value pairs can be used (image-14) to send values to the Lambda function.

Image for post
image-13
Image for post
image-14

Once the event is created, click on “Test” to invoke the Lambda function.

Image for post
image-15

The log output is shown below. Click on the “logs” URL to open the CloudWatch log group created by the Lambda function; the log group will be available there.

Image for post
image-16

A Lambda event will either create a new log stream or write to an existing one. Logs are put into the log stream (image-17), and each log stream consists of many log events.

Image for post
image-16
Image for post
image-17

Deleting Lambda function

To delete the Lambda function, just select it, click on Actions, and choose Delete.

Image for post
image-18
Image for post
image-19

The Lambda function is successfully deleted.

Image for post
image-20

The CloudWatch log group and streams are not deleted by default. You can delete the logs from CloudWatch or export them to S3 for cheaper storage.

To delete the log group, go to CloudWatch, select Logs -> Log groups, and select the appropriate log group name in the format below. Click on Actions and delete the log groups.

/aws/lambda/<functionname>

Image for post
image-21
Image for post
image-22

Conclusion

The Lambda function is the best way to run short jobs or scripts. It can work with a webhook API or an SNS/SQS environment. Have fun exploring the many uses of Lambda functions.