We take pictures every day, and image storage is working its way into our daily lives. Massive numbers of images are added every day. In this story, I would like to present a tool that searches for images of a given object or celebrity, similar to Google Images. Don’t get me wrong, this is nowhere near Google Images. Google Images crawls web links; this story only covers images kept in a single object store.
Images are copied to and stored in an S3 bucket. I am using an external tool to copy the images; it can be anything, such as the S3 CLI or plain AWS tooling. An S3 state change event triggers a Lambda function that performs image recognition analysis. I perform two types of analysis: first a general analysis of the environment and objects, and second a celebrity analysis. Once the analysis is complete, the data is stored in DynamoDB. DynamoDB uses the “keyname” from S3 as the primary key for each image. All labels generated by image recognition are stored as attributes of the newly created item.
API Gateway is used to search for images containing a given object or celebrity. The request triggers a Lambda function that generates a pre-signed URL for each matching image and delivers it to the client. Each pre-signed URL expires after 10 minutes if the user does not download the image.
Feel free to download the code from my GitHub repo.
Parameters to be added to Parameter Store:
- myregion = Name of the region where the whole environment is set up. A multi-region setup needs a configuration change with a load balancer.
- imagedb = DynamoDB table name
- Create the DynamoDB table with a primary key of type string named “s3key”. The “s3key” attribute stores the S3 keyname of the image.
- s3bucket = S3 bucket name
- Create the S3 bucket with the name specified in Parameter Store. Create an “/image” folder to which all images will be copied.
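As a rough illustration of how a function might read these three parameters, here is a minimal sketch. The helper names `load_settings` and `make_ssm_client` are hypothetical and not taken from the repo; the sketch assumes the parameters are stored as plain strings under exactly the names listed above.

```python
def load_settings(ssm_client, names=("myregion", "imagedb", "s3bucket")):
    """Return a dict of parameter name -> value read from Parameter Store."""
    response = ssm_client.get_parameters(Names=list(names))
    missing = response.get("InvalidParameters", [])
    if missing:
        # Fail early if a parameter has not been created yet.
        raise KeyError("Missing parameters: {}".format(", ".join(missing)))
    return {p["Name"]: p["Value"] for p in response["Parameters"]}


def make_ssm_client(region_name=None):
    """Create a real SSM client (assumes AWS credentials are configured)."""
    import boto3  # imported lazily so load_settings can be tried offline

    return boto3.client("ssm", region_name=region_name)
```

Calling `load_settings(make_ssm_client())` inside a Lambda function would then return a dictionary with the keys `myregion`, `imagedb`, and `s3bucket`.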
Create two IAM roles. The first role has read/write access to DynamoDB, log streams, image recognition, and S3. The first function (image_process_function.py) is assigned this role. I am using AWS managed policies for simplicity, but make sure to use a role with the minimum access required. Use the following AWS managed policies:
The second role is for the second Lambda function (search_images_from_db_function.py), which reads the DynamoDB table to find the matching images and their keynames. The following AWS managed policies should be added to this role:
Create an empty DynamoDB table, “imagerek”, to store all label information. The primary key for this table must be named “s3key”; if the primary key has any other name, this solution will not work.
The “s3key” attribute holds the keyname from the S3 image datastore.
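The table can be created in the console, but for reference, a minimal boto3 sketch could look like this. The helper names are hypothetical, and on-demand billing (`PAY_PER_REQUEST`) is my assumption; the only requirements stated above are the table name and the string primary key “s3key”.

```python
def table_definition(table_name="imagerek"):
    """Build create_table arguments for a table keyed on the string "s3key"."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "s3key", "AttributeType": "S"},  # S = string
        ],
        "KeySchema": [
            {"AttributeName": "s3key", "KeyType": "HASH"},  # partition key
        ],
        # Assumption: on-demand billing, so no capacity units are needed.
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_image_table(dynamodb_client, table_name="imagerek"):
    """Create the table with a DynamoDB client, e.g. boto3.client("dynamodb")."""
    return dynamodb_client.create_table(**table_definition(table_name))
```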
Lambda Function (image_process_function.py)
This function is triggered after images are uploaded to S3. It performs two image recognition operations. The first detects the objects in the image and labels each one. Python function definition: “rek_labels”.
The second part of the function checks the image for any celebrities present. Python function definition: “rek_celebrities”.
After gathering this information, the function adds it to the DynamoDB table specified in Parameter Store. The primary key for the image is the “keyname” from the S3 bucket.
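To make the flow above concrete, here is a simplified sketch, not the exact code from the repo. The function names “rek_labels” and “rek_celebrities” come from the description above, but their signatures, the `labels` and `celebrities` attribute names, the `MaxLabels` value, and the hard-coded table name are my assumptions for illustration.

```python
from urllib.parse import unquote_plus


def rek_labels(rekognition, bucket, key, max_labels=10):
    """Return the names of the objects Rekognition detects in the image."""
    response = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
    )
    return [label["Name"] for label in response["Labels"]]


def rek_celebrities(rekognition, bucket, key):
    """Return the names of any celebrities Rekognition recognizes."""
    response = rekognition.recognize_celebrities(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return [face["Name"] for face in response["CelebrityFaces"]]


def process_record(record, rekognition, table):
    """Analyze one S3 event record and store the result under "s3key"."""
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(record["s3"]["object"]["key"])
    item = {
        "s3key": key,
        "labels": rek_labels(rekognition, bucket, key),  # assumed attribute
        "celebrities": rek_celebrities(rekognition, bucket, key),  # assumed
    }
    table.put_item(Item=item)
    return item


def lambda_handler(event, context):
    import boto3  # imported lazily so the helpers above can be tried offline

    rekognition = boto3.client("rekognition")
    # Assumption: table name hard-coded here; the article reads it from
    # the "imagedb" parameter in Parameter Store.
    table = boto3.resource("dynamodb").Table("imagerek")
    return [process_record(r, rekognition, table) for r in event["Records"]]
```

Passing the clients in as arguments keeps `process_record` easy to try locally with stand-in objects before deploying.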
Lambda function (search_images_from_db_function.py)
The second Lambda function searches for images based on input provided by API Gateway. Once the input is received, the DynamoDB table is searched for the specified keywords.
Once the file keynames are found, the same function creates a “pre-signed” URL for each image and sends those links back to API Gateway as an HTML page.
API Gateway displays that HTML page. In a real-life scenario, the images would be processed and presented by an application or web layer.
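The search flow described above might be sketched as follows. This is a simplified illustration rather than the code from the repo: it assumes the item attributes are lists named `labels` and `celebrities`, that API Gateway uses Lambda proxy integration (so the query string arrives in `queryStringParameters`), and it filters in Python after a full table scan, which is simple but not efficient for large tables. All helper names are hypothetical.

```python
from html import escape


def scan_all(table):
    """Yield every item in the table, following scan pagination."""
    kwargs = {}
    while True:
        page = table.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break
        kwargs["ExclusiveStartKey"] = last_key


def find_keys(table, term):
    """Return the S3 keys whose labels or celebrities match the term."""
    wanted = term.strip().lower()
    keys = []
    for item in scan_all(table):
        names = item.get("labels", []) + item.get("celebrities", [])
        if any(wanted == str(name).lower() for name in names):
            keys.append(item["s3key"])
    return keys


def presign(s3_client, bucket, key, expires=600):
    """Create a download link that expires after 10 minutes (600 seconds)."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires,
    )


def render_html(urls):
    """Build a very small HTML page with one <img> tag per URL."""
    tags = "".join(
        '<img src="{}" width="300">'.format(escape(url, quote=True))
        for url in urls
    )
    return "<html><body>{}</body></html>".format(tags or "No images found")


def search(event, table, s3_client, bucket):
    """Handle one API Gateway request and return an HTML response."""
    params = event.get("queryStringParameters") or {}
    term = params.get("searchfor", "")
    urls = [presign(s3_client, bucket, key) for key in find_keys(table, term)]
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/html"},
        "body": render_html(urls),
    }
```

In a real handler, `table` would come from `boto3.resource("dynamodb").Table(...)` and `s3_client` from `boto3.client("s3")`, with the table and bucket names read from Parameter Store.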
Images uploaded to S3
Use any technique to upload images to S3. You can copy images via the CLI, the boto SDK, the REST API, or any custom application. Once images are uploaded, the Lambda function is triggered. Make sure to create the “image” folder in the S3 bucket and upload all images to that folder. Also make sure the Lambda functions are deployed before images are uploaded to the S3 bucket.
This design is centered more on solution design than on application development, so I am using API Gateway to send inputs to the Lambda function. Currently, the application does not support multiple inputs, but that can certainly be added. After receiving the response from the Lambda function, API Gateway displays the images.
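If you go the boto SDK route, a minimal upload sketch could look like the following. The helper names and the example file names are hypothetical; the only thing taken from the setup above is that every object lands under the “image” folder.

```python
from pathlib import Path


def object_key(path, prefix="image"):
    """Place the file under the "image" folder, keeping its file name."""
    return "{}/{}".format(prefix.strip("/"), Path(path).name)


def upload_images(s3_client, bucket, paths, prefix="image"):
    """Upload each local file and return the S3 keys that were written."""
    keys = []
    for path in paths:
        key = object_key(path, prefix)
        # upload_file(Filename, Bucket, Key) is the standard boto3 S3 call.
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys
```

For example, `upload_images(boto3.client("s3"), "<your bucket>", ["photos/cat.jpg"])` would store the file as `image/cat.jpg`, and each upload then triggers the processing function.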
API gateway configuration
The default stage is used. For a better CI/CD process, try a canary deployment for new versions.
The selected URL is used to search for images.
The search link is the API URL followed by “?searchfor=” and the thing to search for:
<API gateway url>/?searchfor=<things to search>
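Search terms with spaces or special characters need to be URL-encoded. A small sketch of building the link, where `search_url` is a hypothetical helper and the host name is only a placeholder:

```python
from urllib.parse import urlencode


def search_url(api_url, term):
    """Build <API gateway url>/?searchfor=<term> with the term URL-encoded."""
    return "{}/?{}".format(api_url.rstrip("/"), urlencode({"searchfor": term}))


# Placeholder host; use the invoke URL of your own API Gateway stage.
print(search_url("https://example.invalid/default", "Jane Doe"))
# prints https://example.invalid/default/?searchfor=Jane+Doe
```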
I am going to search for some of the images that were uploaded as test images.
The images are used for educational purposes. If it is not appropriate to use any of them, please post a comment and I will remove it.