Categories
AWS Featured Networking

AWS Global Accelerator

Global Accelerator is a fully managed network traffic manager. Global Accelerator is a network layer service in which you create accelerators to improve availability and performance for internet applications used by a global audience. This network layer service is for a global customer using the Amazon network backend. As per Amazon documentation, customers have seen 60% network improvement while using this Global accelerator.

Global Accelerator provides “anycast” public IP from Amazon pool or you can provide one. With this static entry point, you will improve the availability and reliability of your application. Global Accelerator improves performance for all applications over TCP and UDP protocols by proxying packets at the edge. Accelerator ingress traffic to AWS closest to the end-user resulting in better performance.

Global Accelerator helps to create “Robust architecture“. It will help to increase network stability by using the AWS backend network. It provides health checking-based routing.

Traffic routing

Accelerator routes traffic to optimal AWS endpoint based on –

  • Endpoint Health (For Disaster recovery solution)
  • Client Location(Geo fencing)
  • User-configured weights(For active/active or active/passive) applications.

Endpoint Configuration

The accelerator can use a TCP/UDP-based protocol. It will use the following endpoint –

  • Application Load balancer
  • Network Load balancer
  • Elastic IP address

Use Cases

  • Gaming Industry (Use NLB instread of ALB)
  • IoT based data collection
  • Disaster recovery(Fast regional failover)
  • Voice over IP

Architecture for Disaster Recovery

I am using Global Accelerator for Disaster Recovery configuration. The same configuration can be used for Active/Active inter-regional application configuration.

The user will connect to Global Accelerator DNS address. Route 53 can easily be used for custom website links. Create DNS “A” record pointing to AWS Global Accelerator “Anycast” IP address. AWS global accelerator will transfer data user to appropriate Load balancer. In this blog, I am mainly focusing on the usage of AWS global accelerator, After ALB, you can create tiered software architecture. Take a look at my other Three-tier architecture blog for more details.

Source Code

If you like to have source code email me at info@cloudtechsavvy.com

Demonstration

I am using terraform to deploy my infrastructure. Following are the resources that will be deployed using terraform –

  • Working in two different region – “us-west-2(primary)” and “us-east-1(DR/secondary)”
  • Global Acceleretor with two endpoint each targeted to ALB of that region. Traffic distributed equally.
  • Application Load balancer (ALB) one at each region. Target group points to EC2.
  • Two EC2 instance with Apache installed on it. They will act as web/app server for my application. In production environment, auto-scaling will be used for better Elasticity.
  • one VPC per region
  • Two subnet per region in different AZ
  • Security group, Internet gateway, Route table, IAM role for instance profile etc.

Output from Terraform. This will create 46 resources total.

  • application_dns_name : Point this address to route 53 “A” record for customer website link
  • accelerator_static_ip_sets : anycast ip address
  • lb_dns_name : Load balaner IP for region 1
  • lb_dns_name_east : Load balaner IP for region 2
  • region1 : Region 1
  • region2_dr : Region 2
  • vpc_cidr : VPC CIDR Region 1
  • vpc_cidr_dr : VPC CIDR Region 2

I am pointing to “Application_dns_name” link that generated from terraform output. My webserver is connected to “us-west-2” region than “us-east-1” because “us-west-2” region is closed to me.

I am from Seattle so, I will point to “us-west-2b” region by default not “us-east-1”

Both Availability zones are used in nearly round-robin way.

The website will use both availability zone. Now request is coming from a different AZ.

AWS global accelerator instance is created with two static anycast IP address. You can point “Rote 53” to both IP addresses for redundancy.

Configured TCP port 80 listeners. That means this accelerator will listen on port 80. You can use licentner on any port with TCP and UDP protocol.

The endpoint group is per region. I have configured two regions one for “us-east-1” and “us-west-2”. Traffic dial determines how my load is distributed across regions.

For active/active configuration : use weight at same @ 100.

For Disaster recovery or “active/passive” configuration: use the first region as 100% and DR region uses 0%. Then DR region will not accept any traffic until the primary region is down.

Each endpoint is grouped together with different endpoints. One can use NLB, ALB, or Elastic IP endpoint as different endoiint. Weight will be defined as a priority. In my case, I am pointing my request to ALB. Those ALB intern transfer request to EC2 instance for web application. Both group are pointed to its own ALB from a its region.

Disaster Simulation

To simulate disaster in a region, I am deleting my ALB from “us-west-2” region(which is closest to me).

After deleting I am ALB, I am not able to access the website momentarily.

Endpoint configuration detected that “us-west-2” endpoint is unhealthy.

I have constantly, set my ping to “anycast” IP address. I am still able to get a response. Since I cant simulate region failover, I cant test anycast IP availability.

After “Ten Sec“, My website was redirected to the east region. Unfortunately, this could be better but I have not tested very well.

RPO = 10 sec

RTO will be different for different customers.

Both AZs are taking part into website service. In case of disaster recovery, you may just need one AZ.

Fallback

To simulate AWS region recovery, I am running my terraform environment again. This will create ALB and request listeners.

Accelerator was able to detect that “us-west-2” region Endpoint become healthy again.

My new traffic is diverted back to “us-west-2” region

Both DR Region failover and fallback worked successfully.

Conclusion

I am very impressed with the way the global accelerator worked. The multi-region complication was reduced. Failover and Failback were seamless. If you have a multi-region database like “AWS Aurora” this will be a very great option to reduce “RPO” and increase redundancy. This will definitely increase the user experience.

The next Step, for me to do testing on performance and user experience features for AWS Global Accelerator.

Categories
Analytics AWS miscellaneous

CSV, JSON conversion into Parquet

In this world of data explosion, every company working on consolidate data into common data format. Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.

Parquet is built from the ground up with complex nested data structures in mind, and uses the record shredding and assembly algorithm described in the Dremel paper. We believe this approach is superior to simple flattening of nested name spaces.

Parquet is built to support very efficient compression and encoding schemes. Multiple projects have demonstrated the performance impact of applying the right compression and encoding scheme to the data. Parquet allows compression schemes to be specified on a per-column level, and is future-proofed to allow adding more encodings as they are invented and implemented.

Parquet is built to be used by anyone. The Hadoop ecosystem is rich with data processing frameworks, and we are not interested in playing favorites. We believe that an efficient, well-implemented columnar storage substrate should be useful to all frameworks without the cost of extensive and difficult to set up dependencies.

AWS Glue

Source Data

I am using AWS Glue convert csv and json file to create parquet file. At this time I have some data in csv and some data in json format. CSV Data is stored in AWS S3 into source/movies/csv folder. JSON data is stored in AWS S3 into source/movies/json folder. All files are stored in those locations.

CSV input data

JSON input data

AWS Glue Implementation

Glue Classifier

A classifier reads the data in a data store. If it recognizes the format of the data, it generates a schema. The classifier also returns a certainty number to indicate how certain the format recognition was.

AWS Glue provides a set of built-in classifiers, but you can also create custom classifiers. AWS Glue invokes custom classifiers first, in the order that you specify in your crawler definition. Depending on the results that are returned from custom classifiers, AWS Glue might also invoke built-in classifiers. If a classifier returns certainty=1.0 during processing, it indicates that it’s 100 percent certain that it can create the correct schema. AWS Glue then uses the output of that classifier.

CSV Classifier

I am creating CSV Classifier. Column delimiter is “,” and quote symbols are double-quote. Will also have heading.

JSON Classifier

Create json classifier

AWS Job Studio

CSV file reading job. Point source location to S3 location where csv folder is located

Please change long datatype to integer datatype.

Enter location where your parquet file needed to be store.

Create IAM role with following permissions –

  • S3 – read, list and write permission
  • Cloud watch – Log group and log stream creation as well as log insert permission
  • Glue – Service role

Create json job and difference source. Target should be same folder.

Now Parquet file is generated and saved at below location.

Create Athena table to access parquet file and list your records.

PS. After carefully looking my Athena query does not support long integer value from my parquet that needed to be fixed.

After fixing that I am able to get integer information.

Conclusion

Parquet file is comparatively faster than csv and json due to its columnar data structure so most data lake in industry started using it. As long as

Categories
AWS Management

Randomizer template for Cloud formation

An old colleague of mine reached out to me for creating a random string within cloud formation. If one has not used it in past it can get tricky. I wish amazon created a function for the same but then how would I have showcased my love to SERVERLESS with this blog. I will be using Lambda function for creating random strings and Cloud formation custom function to call that lambda function.

Template Description

RandomizerTemplate.yaml

The parameter needed for a random string is the length of the string. By default, this template will create 6 char string.

Parameter : Image 1

Lambda function and respective execution role. I have created this lambda function in python. Since this will be created once and used many times in the future don’t need any expertise for python language. Just use this function as is that will suffice usage.

Lambda function to generate parameter string : Image 2

This template will export Lambda function ARN (RandomizerLambdaArn). This Lambda function arn will be used in the custom resource into a service token section.


Image 3

Calling Randomizer in your template (CreateBucketTemplate.yaml)

This is a way to call randomizer function in the template. Copy-paste this function into a template where a random string is needed.

Calling custom resource : Image 4

To get random string use the following commands –

  • RandomizerLambda.RandomString for random string
  • RandomizerLambda.Lower_RandomString for lower case random string
  • RandomizerLambda.Upper_RandomString for upper case random string
  • RandomizerLambda.RandomNumber for numeric string
Use of ramdom string in cloudformation resource : Image 5

Download Randomizer template

Download code from the following GitHub link.

https://github.com/yogeshagrawal11/cloud/tree/master/aws/Cloud%20Formation/Randomizer

  • RandomizerTemplate.yaml : Template to create a Randomizer lambda function.
  • CreateBucketTemplate.yaml : Template to create a S3 bucket using Random string.

Prereq

Download template to a folder. Please ensure id used for implementing the template do have access to create the following resources –

  • IAM Role
  • IAM Policy
  • Lambda function. Lambda function indeed will create Log group, Log stream for events.
  • A custom resource to call the Lambda function for a random string.

Ensure cloud formation export name is not present in your environment. Export name – “RandomizerLambdaArn”

Implementation

I am working on CLI for implementation. This template can easily be deployed from the AWS console as well.

Download AWS SDK and configure it. Make sure to have proper access. Run the following command to implement the randomizer template.

  • To configure aws SDK environment run –
    • aws configure
  • To validate randomizer template is good
    • aws cloudformation validate-template –template-body “file://RandomizerTemplate.yaml”
Validate Randomizer template – Image 1
  • To install randomizer stack
    • aws cloudformation create-stack –stack-name randomizerStack –disable-rollback –capabilities CAPABILITY_IAM –template-body “file://RandomizerTemplate.yaml”
Creating Randomizer template – Image 2

Ensure Stack is configured successfully

Template status – Image 3

Lambda function is created. Lambda execution role and policy created you can use existing role as well if needed to reduce role count. Lambda function will create AWS Cloudwatch loggroup and logstream for Lambda function metrics and output information. This is very useful. One can use this lambda function and parameters like a project, stackname, application name in stack output which can be tracked as well for accounting or analysis purposes.

Randomizerstack has a default input character length as 6 but it can be changed upon request of the stack.

Outputs do have random string-like, alphanumeric character, numeric character or just lower alphabets(used for S3 bucket name)

Validating bucket creation template.

Creating a bucket using a template. I am passing parameter value of 10 is nothing but I needed 10 character string for bucketname

  • aws cloudformation create-stack –stack-name CreateBucketStack –parameters ParameterKey=RandomStringLength,ParameterValue=10 –template-body “file://CreateBucketTemplate.yaml”

A new bucket is created with 10 random string characters.

Creating a new bucket with default 6 char length string.

All 3 stacks are created.

New bucket with default 6 length character.

Default 6 value is assigned to the “RandomStringLength” parameter via RandomizerStack.

Both buckets are created. First bucket created with 10 char string whereas the second one with 6 char.

Clean up

Delete all 3 stacks via cli or GUI

CLI command to delete all 3 stacks

  • aws cloudformation delete-stack –stack-name CreateBucketStack1
  • aws cloudformation delete-stack –stack-name CreateBucketStack
  • aws cloudformation delete-stack –stack-name randomizerStack

Lambda function will create Cloudwatch loggroup and log stream. Delete those log group by going to Cloudwatch -> Log groups -> Select appropriate log group by filtering “randomizer”. Select checkbox. Go to action and click delete.

Conclusion

Use this randomizer template for the need of a randomizer string. Very useful for ami name, autoscaling group name, and S3 bucket names.

PS. Security is not in mind with this blog. The intention is purely to kickstart my builders.

Enjoy !!!!!

Categories
AWS Compute

Lambda endpoint configuration

AWS Lambda is serverless managed service. Lambda code will run without managing any server and provisioning any hardware. This post is not for Lambda feature but to Lambda Endpoint configuration. I am intend to keep that way.

Endpoints are used to connect AWS service programmatically. This connection uses “AWS privatelink“. In case organization does not want to expose VPC to public internet to communicate with AWS Lambda, endpoint comes to rescue. Endpoint is regional service.

Endpoint are two types –

  • Interface
  • Gateway

I am using Endpoint Interface in my design. I have created two lambda function for testing. When endpoint is created and configured, we need to associate subnet with endpoint. Endpoint creates interface in environment and add allocate ip in all subnet where Endpoint is associated with. This interface\ip is been used for lambda invoking. Both lambda function will use same interface ip for communication.

Architecture Diagram image-1

Note : This is one of the usecase of endpoint. Main, usecase for endpoint is sharing lambda function or any other service as Software as service. Clients can share this service across AWS accounts or even across organization.

Above design, we can still implemented by defining VPC interface while creating Lamda function. Drawback here, if you have 10 lambda function to share in the VPC, we need 10 ip from subnet which is overkill. With endpoints you need just 1 ip to invoke all lambda function for given AWS account.

Source Code

Download “lambda-endpoint-connection.yaml” file from below link –

https://github.com/yogeshagrawal11/cloud/tree/master/aws/lambda/endpoint

Prerequisite

Create user with access to run formation template.

Install AWS SDK and configure AWS environment. I am using “us-west-2” region. In case if you want use different region please use appropriate AMI from parameter. (My apologies, I have added all regions ami in parameter)

Open file “lambda-endpoint-connection.yaml” in any text editor and change following ur systemip. default value is ssh is allowed from all instances

Change Default value to your ip/32 format – Image 2

Create and Download Ec2 instance keypair and update keypair name in this field. Download keypair key file same location as cloud formation template.

update correct EC2 keypair name – Image 3

Add correct VPC cidr information. If not, default will be used.

Implementation

Cloud formation template will create following resources(not all resource mentioned in list) –

  • VpcName : VPC for this test
  • Subnet1 : Subnet totally private except to ssh
  • AppInternetGateway : Internet gateway used just to connect my system with EC2 instance.
  • AppSecurityGroup : Allows port no 22 from my system to EC2 and allows all communication within VPC
  • EC2AccessLambdaRole : This role allows EC2 instance to invoke lambda function.
  • LambdaRole : This role allows Lambda function to create log groups in cloud watch to check print output in cloud watch
  • RootInstanceProfile : Instance profile for instance. Uses EC2AccessLambdaRole for assuming permission
  • EC2Instance : Instance to invoke lambda function
  • LambdaFunction : First lambda function
  • SecoundLambdaFunction : Second lambda function
  • LambdaVPCEndpoint : Lambda vpc endpoint

Run following command to validate template is working fine

aws cloudformation validate-template –template-body file://lambda-endpoint-connection.yaml

Create Stack by executing following command –

aws cloudformation create-stack –stack-name lambdaEndpoint –template-body –capabilities CAPABILITY_IAM file://lambda-endpoint-connection.yaml

Stack creation via CLI – Image 4

This will create stack in background it will take couple of minutes. Check your stack is created successful in Events section of cloud formation.

Stack creation events – Image 5

Ensure Stack is created successfully.

Stack creation completed – Image 6

Stack outputs are saved in key value pair. Take a note of Instance publicIP. We need this output for ssh into EC2 instance to check lambda access. Take note of FirstLambdaFunction and SecondLambdaFunction values we need these value to invoke Lambda function.

Stack output – Image 7

Ensure two lambda functions created successfully. Keep a note of both function name. We need function name for invoking from our EC2 instance.

Lambda function – Image 8

VPC – Endpoint configuration is created. EC2 instance internally created via private DNS name. That name is derived as <servicename>.<region_name>.amazonaws.com.

In out case servicename is “lambda”.

Endpoint configuration – Image 9

Endpoint assigns ip address in all allocated subnets. In our case we have assigned VPC to just one subnet so that it assigns single ip address. IP 10.1.1.78 is part of subnet where endpoint is associated with.

Endpoint Subnet association – Image 10

Assign security group to endpoint. In case need to stop access for any EC2 instances to access lambda function. This security group can be used for security reason. We can use Iam policy as well to restrict access from invoking Lambda function.

Endpoint security group configuration – Image 11

Policy definition. Full access allows any user or service to access lambda function. I highly recommend to restrict access from any services or EC2 instance via Endpoint policy and security group.

Endpoint policy – Image 12

Endpoint creates network interface in VPC environment. IP is assigned to this network interface.

Network interface – Image 13

Subnet ip count also shows ip counts is reduced for /24 masked subnet.

VPC subnet configuration – Image 14

Routed table has route with internet gateway to connect my system via ssh.

Subnet route table configuration – Image 15

Security group only allows access to port 22 from entire world and all ports are open within VPC communication for inbound and outbound traffic.

Security group inbound rule – Image 16
Security group outbound rule – Image 17

Login to newly created instance use same keypair that created during pre-req phase –

SSH to system – image 18

Configure AWS with region “us-west-2” or select any region you may like to select.

Configure AWS using us-west-2 – Image 19

To check list of functions using “aws lambda list-functions

Lambda function – Image 20

To invoke function use following command. We dont have access to any external https connection but we are still able to access lambda function.

  • aws lambda invoke –function-name <first_function_name> first_response.json
  • aws lambda invoke –function-name <second_function_name> second_response.json

Output from lambda function is saved into json format.

Invoke lambda function – Image 21

Reading output files. “body” key matches output from lambda function.

Output from lambda function – Image 22

Cloud watch events. If payload is defined while invoking function, this will be visible in cloud watch event as well.

Output from Cloud watch events – Image 23

Clean-up

Delete stack to cleanup. Enter following command –

  • aws cloudformation delete-stack –stack-name lambdaEndpoint

Conclusion

Lambda Endpoint is new feature and connects lambda via AWS privatelinks via AWS internal network. Again, security is elevated as no need to open your VPC to external traffic for lambda execution. Great way to use Lambda for function as service or using Lambda across multiple AWS accounts across organization.

Enjoy !!! Keep Building !!!!

Categories
AWS Networking

AWS Transit Gateway

Networking is a big challenge with growing demands on diversified environments and creating datacenter across the world. Limit is just imagination. Enterprises works around different sites, different geography but common vein that join those environments are Network. With growing demands, it’s getting complicated to manage routes between sites. AWS Transit Gateway(TGW) is born to make Network Engineers life easy. TGW helps with following features –

  • Connect multiple VPC network environment together for given account
  • Connect multiple VPC network across multiple AWS account
  • Inter region connectivity across multiple VPC
  • Connect on-premise datacenter with VPC network via VPN or
  • Connect multiple cloud environment via VPN using BGP.

Benefits of Transit Gateway

Easy Connectivity : AWS Transit Gateway is cloud router and help easy deployment of network. Routes can\will be easily propagated into environment after adding new network to TGW.

Better visibility and control : AWS Transit Gateway Network Manager used to monitor Amazon VPC’s and edge locations from central location. This will helps to identify and react on network issue quickly.

Flexible multicast : TGW supports multicast. Multicast helps sending same content to multiple destinations.

Better Security: Amazon VPC and TGW traffic always remain on Amazon private environment. Data is encrypted. Data is also protected for common network exploits.

Inter-region peering : AWS Transit Gateway inter-region peering allows customers to route traffic across AWS Regions using the AWS global network. Inter-region peering provides a simple and cost-effective way to share resources between AWS Regions or replicate data for geographic redundancy.

Transit Gateway Components

There are 4 major components in transit gateway –

Attachments : Attach network component to gateway. Attachments will be added to single route table. Following are network devices can be connected to TGW –

  • Amazon VPC
  • An AWS Direct Connect gateway
  • Peering connection with another transit gateway
  • VPN connection to on-prem or multi-cloud network

Transit gateway route table : Default route table will be created. TGW can have multiple route table. Route table defines boundary for connection. Attachments will be added to route table. Given route table can have multiple attachment where as attachment can only be added to single route table.

Route table includes dynamic and static routes. It determine next stop by given destination ip.

Association : To attach your attachment to route table we use association. Each attachment is associated with single route table but route table can have multiple attachments.

Route Propagation : All VPC and VPN associated to route table can dynamically propagate routes to route table. If VPN configured with BGP protocol then routes from VPN network can automatically propagated to transit gateway. For VPC one must create static routes to send traffic to transit gateway. Peering attachments does not dynamically added routes to route table so we need to add static routes.

Architecture Design

We are going to test following TGW scenarios. In this architecture design I am creating “management VPC” that will be shared for entire organization. This VPC can be used for Active Directory, DNS, DHCP or NTP like common services for organization.

Project_VPC1 and Project_VPC2 will be able communicate with each other and managent_vpc. Private_VPC is isolated network(private project) and will not be able to communicate with project VPC’s, but should be able to communicate with management_vpc.

Following is architecture for this design –

Design document – Image 1

Pre-requisites

  • Region name
  • Ami ID – “ami id” depend upon region
  • Instance Role. We dont need this one explicitly, as we are not accessing any services or environment from instances.
  • Instance key pair : Create instance key pair. Add key pair name in parameterstore. Parameter name should be – “ec2-keypair”. Value should be name of your keypair name.

Source code

https://github.com/yogeshagrawal11/cloud/tree/master/aws/Network/Transit%20Gateway

Cost

This implementation has cost associated with it. With attached configuration testing is done within 1 hr then it will not take more than 50 cents.

For latest charges, look for AWS price calculator

Implementation

First time to Terraform check this blog to get started –

https://cloudtechsavvy.com/2020/09/20/terraform-initial-setup/

Run following command to start terraform –

  • ./terraform init
  • ./terraform plan
  • ./terraform apply –auto-approve

I am using terraform for implementation. Following is output for terraform –

Total 47(not 45) devices are configured.

4 VPC created

4 subnet created if you observe available ips are 1 less because one of the ip in a subnet will be used by transit gateway for data transfer and routing.

4 Route table created. Each route table will use transit gateway as target for other VPC network.

Security group – These are most important configuration configuration in real world. For DNS you will allow port 53 or AD server open appropriate ports. In my case, I am using ping for checking communication.

Private VPC will only able to communicate with management network.

Project VPC will able to communicate with other VPC where as not able to communicate with Private VPC.

Note : We don’t have to explicitly, block network in project VPC this should be blocked by transit gateway as we are not going to add propagation.

Transit Gateway created. Remember if 64512 ASN is used by existing VPN then this can be added as parameter to change it.

DNS support enables help to reach out cloud with dns names rather than ip address ,certainly a useful feature.

Transit gateway can be shared with other transit gateway for inter regional data transfer for VPC’s over Amazon private network. Its advisable to “disable” auto accept shared for security reason.

Default route table is created and all VPC not explicitly attached will be attached to default route table.

Each VPN needed to add to transit gateway as attachment.

Each route table is created. Route table can be created as per segregation one needed into environment. In my case I am creating 3 route table for 4 VPC’s. Generally in Enterprise environment, we do create 5 route table. Separate route table to backup and security environment.

Since project VPC1 and project VPC2 should have same network requirement so I added them to same route table.

Management Route table

Management route table has management VPC attachment. Propagation added from all network which needed to communicate with management VPC. In this case, management VPC should be able to communicate with all other network so added propagation from all networks. This will add all routes propagated automatically.

Private Network Route table

Private VPC attached to private network route table. Private network should be able to communicated with management VPC so added propagation for management VPC. Also route for management VPC is added automatically after propagation.

Project Route table

Project route table do have attachment from both project VPC. Propagation added for other project VPC network and management network. Respective routes are added.

Testing Environment

Management server is able to ping both private and project environment instance.

Project VPC can talk to management VPC and other project VPC’s but not with private VPC

Private VPC able to talk to management vpc but not able to communicated with any project VPC’s. That makes private VPC private within organization.

Delete terraform configuration

To delete terrform configuration. Ensure all resources are destroyed

./terraform destroy –auto-approve

Conclusion

Transit gateway is tool to connect multiple VPC, VPN and direct connect network to make communication over private network. Transit gateway can be used to isolate network traffic. This makes routing comparatively easy.

SD-WAN partner solution can be used to automate adding new remote site into AWS network.

References

https://docs.aws.amazon.com/index.html

Categories
AWS Compute Database Featured Machine Learning Storage

Image Processing with Lambda/AWS API Gateway

We are clicking pics every day and the Image datastore industry is spreading its way to our lifestyle. Massive amounts of images are kept on adding every day. In this story, I like to present a tool to search for images of a given object or celebrity like Google images. Don’t get me wrong, this is nowhere near Google images. Google images crawl weblinks. This story just belongs to same object-store.

Architecture

Images will be copied and stored in the S3 bucket. I am using external tools to copy images. This external tool can be anything like S3 CLI or simple AWS. S3 state change event will trigger the Lambda function to perform image recognition analysis. I am performing two types of analysis, general analysis about environment/object, secondly celebrity analysis. Once the analysis is performed, data will be stored in Dynamodb. Dynamodb is using “keyname” from S3 as the primary key for the images. All labels generated from image recognition will be stored as an attribute in the newly assigned item. 

API gateway will be used to search for any images containing any value or celebrity. That will trigger the Lambda function will generate a pre-signed URL for each image and deliver it to the client. This pre-signed URL will expire in 10mins if the user will not download those images.

Download code 

Feel free to download code from my GitHub repo.

https://github.com/yogeshagrawal11/cloud/tree/master/aws/Image%20recognition

Prerequisites

Parameters to be added in parameter store — 

  • myregion = Region name where all environment is setup. Multiregion setup needed configuration change with Load balancer
  • imagedb = Dynamodb table name 
    • Create dynamodb table with the primary key as string and name of primary key attribute “s3key”. “s3key” attribute will store image S3 keyname.
  • s3bucket = S3 bucketname 
    • Create S3 bucket named specified in parameter store. Create “/image” folder where all images to be copied. 

Implementation

IAM Role

Create Two IAM roles. First IAM role used with Readwrite access to Dynamodb, Log steam, image recognition and S3 access. Function(image_process_function.py) will assigned this role. Policy information as below. I am using AWS managed policy for simplicity but ensure to use the appropriate role of minimum access. Use following AWS managed policy — 

  • AmazonS3FullAccess
  • AmazonDynamoDBFullAccess
  • AmazonRecognitionFullAccess
  • AWSLambdaBasicFullAccess
  • AmazonSSMReadOnlyAccess

Second role use for second lambda function (search_images_from_db_function.py) used read DynamoDB database for correct images and keyname. Following are AWS managed policy should be added into role 

  • AmazonDynamoDBReadOnlyAccess
  • AmazonS3readOnlyAccess
  • AmazonSSMReadOnlyAccess
  • AWSLambdaBasicFullAccess

Dynamodb Table

Create empty Dynamodb table “imagerek” to store all label information into database. Primary key for this database should be “s3key”. If primary key is not named as S3key this solution will not work.

S3key has keyname from S3 images datastore.

Imagerek table

Lambda Function (image_process_function.py)

Image function will get triggered after uploading images to S3 system. Function will perform two image recognition operations. First will verify all object and label all object discovered from image. Python function definition – “rek_labels”.

To Add, delete or update labels from imagerek database.

The second part of this function will check for images for any celebrity present. Python function definition – “rek_celebrities”.

Upon gathering information function will add this information into dynamodb table that has specified in the parameter store. The primary key for this image is “keyname” from S3 bucket. 

Lambda function (search_images_from_db_function.py)

Second Lambda function will be used to search images that input is provided by API gateway. Upon inputs are received, images will be searched for specific keywords in dynamodb database.

Once the file keyname is received same function will create “pre-signed” url for images and send those links back to API gateway as html page.

Image’s pre-signed url will be sent back to as html page that will be displayed by api gateway. In real life scenario, images will be processed and presented by application\web layer.

Images uploaded to S3

Use any technique to upload images to S3 storage. One can copy images to S3 storage via cli, boto sdk, Rest API or any other custom application. Once images are uploaded lambda function will be triggered. Ensure to create “image” folder into S3 bucket and upload all images to folder. Please ensure lambda functions are deployed before images are uploaded to S3 bucket.

API Gateway

An idea if this design mainly centered around solution designing than developing an application. So I am using API gateway to send inputs to the Lambda function. Currently, the application does not support for multiple inputs but certainly can be added. After receiving responses from Lambda function, API will display images. 

API gateway configuration

Search for API gateway in AWS console, Click on create api
Choose HTTP API type.
Integrate Lambda function “search_image_funct” with API gateway and select appropriate API name.
Add routes for “/”

Default stage will be used. For better CI/CD process, try using canary method for new version deployment.

We will use default stage.
Review API configuration

Selected url will be used to search for image.

Search link is api url then “?searchfor=” and things to search

<API gateway url>/?searchfor=<things to search>

Search Examples

I am going to search some of the images those are uploaded as testing images.

Searching for Jeff Bezos

Searching for Sundar pichai

Searching for water

Searching for roses

search for universe images

Architecture images
Disclaimer

Images are used for educational purpose. Anyway if its not appropriate to use images, please post comments I will remove it.

Categories
Application AWS Database Featured Terraform

Three Tier Architecture with AWS

In this story, I am planning to create three-tier architecture with the help of AWS resources. First-tier Load Balancer, Second tier(webserver) considered as application logic and last tier Database. I am using Dynamodb for the NoSQL database.

Architecture

An auto-scaling group is created with a minimum of 2 instances. ASG has two subnets both in a different availability zone. This auto-scaling group will be used as a target group for application Load Balancer. In my configuration, instances are not reached directly via there public address over port 80 will only application load balancer will be forwarding a request to EC2 instance. Session get terminated at the application load balancer.

Two buckets are needed, the first S3 bucket used to store userdata and AWS Dynamodb script in S3. The second bucket will be used for ALB to store logs. IAM roles.

Configuration list

  • data.aws_ssm_parameter.s3bucket: S3 bucket information to storage scripts
  • aws_vpc.app_vpc: VPC for environment
  • aws_eip.lb_eip: Elastic IP address for Load balancer
  • aws_iam_role.app_s3_dynamodb_access_role: Role for EC2 instnace profile
  • data.aws_availability_zones.azs: To get list of all availability zones
  • data.aws_ssm_parameter.accesslogbucket: S3 bucketname to storage ALB logs
  • aws_dynamodb_table.app-dynamodb-table: Dynamoddb Table resource
  • aws_iam_role_policy.app_s3_dynamodb_access_role_policy: Policy to attach on role “app_s3_dynamodb_access_role”. Dynamodb full access is granted please grant appropriate access for your application need
  • aws_iam_instance_profile.app_instance_profile: EC2 instance profile to access S3 storage and Dynamodb table
  • aws_subnet.app_subnets: Multiple subnets are created with VPC per Availability zone in region
  • aws_lb_target_group.app-lb-tg: Target group for ALB
  • aws_security_group.app_sg_allow_public: Security group for LB. Port 80 open from all world.
  • aws_internet_gateway.app_ig: Internet gateway
  • aws_lb.app-lb: Application load balancer
  • app_s3_dynamodb_access_role : To access Dynamodb and S3 account from Lambda function
  • aws_route_table.app_rt: Route table
  • aws_security_group.app_sg_allow_localip: Security group to allow ssh access from “localip” from variables file and ALB to access EC2 instance over port 80
  • aws_instance.app-web: This is template instance will be used for AMI creation used for Launch configuration and Autoscaling group (ASG)
  • aws_lb_listener.app-lb_listner: ALB Listner for healthcheck
  • aws_ami_from_instance.app-ami: AMI resource will create ami from “app-web” instance. Will use this ami to create launch configuration.
  • aws_launch_configuration.app-launch-config: EC2 instnace launch configuration used to create Autoscalling group.
  • aws_autoscaling_group.app-asg: Autoscaling group used create two instance in different availability zone. ALB will send request on these ASG.

Source code

Please download source code from my GitHub Repo —

https://github.com/yogeshagrawal11/cloud/tree/master/aws/3%20Tier%20app

  • aws-userdata-script.sh : This will will run during userdata is executed. File will get information list instance-id, Publicip, lcoal ip and Availability zone name from metadata server and copy that to “/var/www/html/index.html” file.
  • nps_parks.csv : Is input file to copy data from S3 and add into dynamodb table
  • dynamodb.py : file used above input file and create new table and insert a record into the table. This table now used to sorting and output is store again in “/var/www/html/index.html” for future view. Objecting is to ensure instances from different availability zones able to comminited to Database our 3rd layer.
  • user_data.tpl : Userdata template file used by terraform
  • terraform.tfvars : Terraform varible file
  • main.tf : Terraform program file

PS. I don’t want to use this story to create a full-blown application.

Prerequisites

Download all files from the Github repository.

Download “terraform” software and copy at same downloaded location

Create S3 bucket to store scripts. Create “userdata” directory in bucket toplevel and upload “aws-userdata-script.sh”, “nps_parks.csv” and “dynamodb.py” file at that location. script “EC2 instance will copy these script using user-data template file.

Create key pair for EC2 instance.

Create the following parameter —

accesslogbucket : <buckname for ALB logs> You can use same bucket name as userdata.

ec2_keyname : <Key pair name>

s3bucket : s3://<bucketname>. Please ensure to prefix “s3://” before bucket name in value parameter.

Image for post
image -2

Configuration Output

Starting running terraform template you will see below output

The output is the Load balancer output link. You can add this output to DNS records for future access. For this exercise, we will use this address directly to access our application.

Image for post
image -3

Load balancer configuration. DNS name to access your ALB endpoint. VPc, Availability zone, and security group configuration. The public security group will be used to get traffic from world to ALB on port 80. image-5 has information about S3 location where ALB will going to save logs.

Image for post
image — 4
Image for post
image — 5

ALB target group configuration and health check details. Healthcheck is performed on “/” parent page. This can be changed as per different application endpoints. Image-7 has information about instances registered to the target group via the Autoscaling group.

Image for post
image — 6
Image for post
image — 7

I am first creating a sample instance “ya-web”. Using this application to create “golden-ami”. This AMI is been used for launch configuration and to create the Autoscaling Group(ASG). Normally golden AMI already created. That AMI information can be inputted as a variable in “terraform.tfvars” files. image — 9 is the Autoscaling group configuration. Minimum/maximum capacity can be altered as part of input as well.

Image for post
image — 8
Image for post
image — 9

Instance information. “ya-web” is a template vm. Other two vis are part of autoscaling group.

Image for post
image — 10

Accessing application with a Load Balancer. LB transferred the request to the first instance in AZ “us-west-2a”. Instance able to pull data from DynamoDB using boto API and because of instance profile, we created in our resource file. The image-12 request is transferred to a 2nd instance for different AZ “us-west-2b”. I am using stickiness for 20sec. This can be managed via cookies as well. My idea of the application is make it a simple kind of “hello world” application to get the bare minimum configuration.

Image for post
ALB transferring request to First instance, image-11
Image for post
ALB transferring request to First instance, image-12

Instance public IPs are not able to access from outside world(image — 13). Only ssh and ping(icmp) are allowed from localip defined variables file.

Image for post
image-13(a)
Image for post
image — 13(b)

Disclaimer

Network security and Identity security needed to be improved for production use.

Categories
AWS Compute Featured Security & Identity Storage Terraform

EFS and EC2 instance creation using Terraform templating

Automating implementation and reducing time to deploy complex environments is key. In this story, I am planning to get one of the environments that fairly used in the industry to map NFS FS over multiple subnets. This is a very basic configuration but complexity starts when you wanted to use the same template for deploying the entire application in one go.

I am using the Terraform template function to achieve this. I am certainly can use “Ansible” or “Chef” or any other tool but I wanted to make it relatively simple and have things done by just using a single input file.

Architecture Diagram

I am creating a single EFS FS that will be part of a given region and will have a single mount target in that AZ. I am planning to use a maximum of 3 AZ in this document. AZ count can be increased in case needed for more redundancy.

Single instance started in each AZ and mounted newly created EFS using local IP. Internet gateway attached so that my local environment I could be able to access instances to check EFS is working fine.

Parameter store used to get a “keypair” name.

Image for post
Architecture Diagram. Image-1

Source Code

Download source code for this implementation from Github page —

https://github.com/yogeshagrawal11/cloud/tree/master/aws/EFS/MutiAZ%20EFS%20implementation

Download main.tf, terraform.tfvars and user_data_import.tpl file

user_data_import.tpl is user_data template file. You can add or modify any commands you like to execute during boot time. Mainly I am using this file to mount newly created EFS FS automatically on EC2 instance.

New EFS name is part of the input and UNIX mountpoint is also part of the input. If VPC and subnet already created and wanted to use same subnet make sure to add the “data” block in main.tf accordingly and change “EFS” and “instance” block accordingly.

Image for post

Please change localip parameter to your own domain subnet ip from where you need ssh access to each EC2 instance. Do not use default 0.0.0.0/0 which opens port 22 for all world.

Image for post

Execute Terraform job

To execute terraform job please download terraform file and entier following commands.

aws configure

terraform init

terraform plan

terraform apply

Please review terraform documentation for more information. You can send your questions as well.

This job will create total of 32 resources. Const be very minimum if you will use the attached configuration and upon testing perform the cleanup task.

Image for post

Output “efsip” are EFS IP for each Availability Zone. Since I am working on the first 3 availability zone, I did assign 3 IP for inter AZ communication. “instance_public_ip(typo)” is an IP address for each instance that I created in given AZ. I will use this public ip to connect to each EC2 instance.

Verify FS is mounted successfully. Each instance used its own EFS IP from AZ to connect. EFS is mounted successfully.

Image for post

Perform Read/Write test from each instance. I am creating new file from one of the instance and the file is visible from other two instances.

Image for post
Image for post

Tags are added as per EFS FS in case needed for local scripting purposes.

Image for post

Elastic Filesystem Configuration

EFS fs is created with 3 mount point

Image for post

Access point to used mount FS as “/” this can be easily changed as per need.

Image for post

FS is part of 3 Availability zone and each availability zone has a different IP address.

Image for post

Clean up

To cleanup enter following command

terraform destroy

Categories
AWS Compute Database Featured Management Security & Identity

Convert object-oriented data to Nosql Dynamodb — 101

The IoT Ecosystem is buzz words and needed lots of data management. We receive data but how to make use of data is the most important. This design is a very small portion of a bigger portfolio. Much more application can be integrated into this design. There are many ways to perform this transformation. Athena and Glue certainly can be used here.

Design overview

Consider this design is a bare minimum requirement to convert object-oriented data into data used for analytics. I am trying to use managed service as much as possible in this design but the 3rd party tool can be used for this design.

Application or IoT device will dump data into the S3 bucket. Data can have a variable field as long as the name file is common. S3 will trigger the Lambda function upon put request completion. This Lambda function will download files from the S3 bucket and copy in its temporary storage configured via lambda. To make historic trending of time-series data, Dynamodb hash key can be used with a combination of “name” and “timestamp”.

Lambda will convert the CSV file into JSON and add each row as an item into Amazon DynamoDB Table. Upon success, lambda will send a notification to SNS topics. SNS topic is configured with two types of subscriptions “SMS” and “SQS”.

Failure events can be sent to another topic for reiteration.

Image for post
Architechture Diag (image 1)

Use Case:

This code can be used with a little tweak to get a set of S3 data and perform analysis on them (Instead of put trigger use copy or post-event). Say some team likes to perform analytics from all data for last month. We can use this type of environment and provide the DynamoDB database for this specific analysis. Once work is done all configuration will be done.

Infrastructure as code (IaC)

IaC is one of the most important application deployment tools. It will reduce errors and provide highly repeatable infrastructure. This will help me not to manually configuring parameters. All parameter resource names are prefixed with “appname” variable. Thus, the same configuration can be used for different application environments or teams.

I chose Terraform to implement this so that a hybrid implementation is an option in case of any customer requirement. Terraform support all major cloud environment. Obviously, we need to appropriately change the resources.

Terraform Provider information. I highly recommend setting up a profile while running the “terraform init” command so that different environments used with different access.

Image for post

Avoid using “access_key” and “secret_key”. You can also create ec2 instance with proper IAM role for Terraform deployment.

Following resource configuration will be added into the environment for this implementation –

  • app_lambda_role

Lambda function will use this role for internal usage. Mainly, this role should include S3 read access, Cloud watch log group and stream write access, Dynamodb add/read/update item access and SNS publish access.

  • app-lambda-cloud-watch-policy

Policy created with the above role access.

  • app-lambda-cloud-watch-policy-attachment

Attach the policy to the role.

  • allow_bucket

This will be used to trigger the lambda function from S3.

  • app-lambda-func

Lambda function will be run after triggered by S3.

  • bucket_notification

S3 notification resource that will trigger lambda function on events specified in the notification. “prefix” and “suffix” configuration can be used for different types of environments.

  • app-snstopic

SNS topic where lambda function will send notification of successful events. PS. I have not configured notification on failure events. Create another topic for the same and update the lambda code accordingly.

  • app-sns-target

SNS-target will connect “SQS” as a subscription for “app-snstopic”

  • app-snstopic-sms

SMS topic is created. We can club topics with just another SMS subscription. I wanted to ensure we have different topics to send different kinds of data. Like for SQS, we can send information about which rows are failed and try that row information. SMS topic will have concise information.

  • app-sms-target

Sms-target topic will connect SMS phone no or list of phone nowhere an event is sent.

  • app-sqs

Queue with information. This can be used to notify topics that are not successful. Lambda function can be triggered to resolve those issues or try again. I have not added that functionality.

  • app-dynamodb-table

The table will be created as per input schema. Hash-key is important and all input data should have hash-key. If hash-key is not present then the item will not be inserted into the Dynamodb NoSQL database. In my input, “name” field is used as hash-key

Source code

Download source code from below Github link –

https://github.com/yogeshagrawal11/cloud/tree/master/aws/DynamoDB/S3%20to%20Dynamodb

Download zip file and main.tf and terraform.tfvars. Change appropriate values in “terraform.tfvars” file.

Image for post

Download zip file at the same location as the terraform key. Lambda function will be created by the below terraform resource.

Image for post

Terraform apply command Output

The following resource will be created using

“terraform apply -auto-approve“ Command. All 12 resources will be created.

Image for post

Lambda function created.

Image for post

Input file uploaded to s3.

Image for post

Input file format.

Image for post

Dynamodb table created by terraform.

Image for post

Lambda function triggers after uploading the input file.

Image for post

Data inserted into nps_parks table by insertS3intoDynamodb lambda function

Image for post

SNS topic created

Image for post

SQS queue

The message is posted into SQS queue

Image for post

Next Step

  • Add an application to analyze data from DynamoDB and present virtualization information.
  • Add realist changeable data, not static data that I used in my case study.

Disclaimer

1. Code is available with Apache license agreement

2. Do not use this code for production. Educational purposes only.

3. Needed to improve security around the environment

4. Tighten IAM policy required for production use

5. I have not created a topic for the failure event

6. A failure domain is not considered in this design

7. Lambda function is created with base minimum code and not performing data validation

Categories
AWS Compute

AWS EC2 instance management — Starting and Storage instances using tagging

Perform shutting and starting of instances during non-business hours to reduce operating cost.

Architecture Diagram

AWS Lambda function will be called using CloudWatch rule at regular intervals every hour. Lambda function will review tags of each instance and verify weather instance needed shutdown or startup depend upon local time. Lambda code can be placed on S3 for safekeeping to import from.

New Log stream and group created to keep track of shutdown and startup events. Amazon SNS topics can be used for event notification. A custom role is created to get EC2 instance information and write logs to the CloudWatch event.

Instance tagging will be used to determine the instance needed to shut down or startup.

Image for post
Architecture Diag. Image 1

Tagging format

Key name: “shutdown”

Values format :

weekday:<hours to shutdown separated by hours>;weekend: <hours to shutdown separated by hours>;

weekday: Monday to Friday

weekend: Saturday and Sunday

This format can be changed but code change needed for that.

PS. Keywords include colon(:) and semicolon(;)

Role Policy

Create Role policy with the following permissions.

Image for post
image 2

Create a role and attach it to the newly created policy.

Image for post
image 3

Lambda Function

Create a Lambda function with runtime python3.8. A select newly created role for the Lambda function.

Image for post
image 4

Download Archive.zip file code from my GitHub below link –

https://github.com/yogeshagrawal11/cloud/tree/master/aws/EC2/instance_shutdown_script

To add function code. Click on Action and upload a zip file.

Image for post
image 5

After uploading a zip file you would see “lambda_function.py” is an actual script for instance. I am using the default handler in the script. In case, handler information is changed please make appropriate changes either in function name or file name.

Filename: lambda_function.py

Python function name: lambda_handler

Image for post
image 6

I am using “pytz” module to get the correct timezone conversion so needed to upload that module code.

For regions to timezone mapping, I have created a dictionary. In case your region is not part of the list feel free to add a new key with region name and respective timezone.

Image for post
image 7

In our magical earth, sunset and sunrise are happening at the same time for different regions. So I am calling function to perform both activities simultaneously. “system_function” will generate a list of all instances that need to stop and start.

Image for post
image 8

In the case of the large instance count, I am using a paginator.

Image for post
image 9

Paginator will reduce the list by filtering tag value. If the “shutdown” tag is not set then the instance will not be part of the list. Using the same function for starting and shutting down instances. Hence, for shutting down instances I am also filtering for running instances. Or else to start instances I am filtering for stopped instances.

Image for post
image 10

Set a timeout value of more than 3sec. Normally it takes me 10 to 15 sec. This value can vary with the environment and no of instances.

Please use the following handler or make any appropriate changes.

Lambda handler = lambda_function.lambda_handler

Image for post
image 11

Lambda Permissions :

Lambda permission for creating log stream/log group and adding events in cloud watch.

Image for post
image 12

Lambda function also has permission to get instance status, able to read instance tags. Start and stop instances.

Image for post
image 13

Cloud watch Rule

Create a CloudWatch rule that will trigger the Lambda function every start of the hour every day.

Image for post
image 14
Image for post
image 15
Image for post
image 16

Testing

Current time: Fri 4:18 pm PST

In the Oregon region, I have instance1 will be shut down and instance 2 will be started.

Image for post
image 17

In North Virginia region, east 2 and east 3 should be shutdown(+3hr)

Image for post
image 18

Manually run Lambda function

Image for post
image 19

Logwatch events

Image for post
image 20

Oregon instances Started and stopped as expected

Image for post
image 21

North Virginia instances stopped

Image for post
image 22

Mumbai region instance closed as well

Image for post
image 23

Disclaimer :

SNS topic is not triggered but can be added for any stopping and starting instances.

Database can be used to track no of hours saved due to starting and stopping Ec2 instance script