Introduction to Edge AI Inference with AWS IoT Greengrass V2/SageMaker
I recently held a study meeting titled "Introduction to Edge AI Inference with AWS IoT Greengrass V2/SageMaker" to introduce AWS IoT Greengrass V2 and SageMaker.
I would like to share the content of that presentation in the hope that it helps you get started with edge inference on AWS. You can find the example code used in this post in my GitHub repository.
Prerequisites
Participants are expected to:
- Be interested in edge AI inference on AWS.
- Have basic knowledge of machine learning and AWS.
Goals in this post:
- Provide an overview of edge AI inference on AWS.
- Demonstrate examples using IoT Greengrass V2, SageMaker, SageMaker Neo, and SageMaker Edge Manager.
Edge Computing Overview
Edge computing is a distributed computing paradigm that brings computation and data storage closer to the sources of data. This is expected to improve response times and save bandwidth.
Edge computing provides benefits such as:
- Low latency (especially important for real-time applications such as driving automation)
- Decreased security risks
- Lower communication costs
However, edge computing may have some disadvantages, such as:
- Challenges in vertical scaling, which is easier in cloud environments
- Requirements for capacity planning
- Infrastructure management
Edge computing is used not only for AI but also for content delivery networks (CDNs).
One of the current challenges with edge AI is computing capacity. Modern machine learning and deep learning models have enormous numbers of parameters and demand substantial computing resources, so various efforts have been made to optimize ML/DL models for edge AI. 1
IoT Greengrass V2
Overview
AWS IoT Greengrass is an open source Internet of Things (IoT) edge runtime and cloud service that helps you build, deploy and manage IoT applications on your devices.
IoT Greengrass V2 offers:
- Support for Windows 2 and Linux (Ubuntu, Raspberry Pi OS, and so on).
- Support for both x86 and ARM architectures.
- Support for Lambda functions.
- Deep Learning Runtime (DLR) for edge AI inference.
Concept
The IoT Greengrass V2 concept includes the following:
- Greengrass Core Devices:
- Run Greengrass Core on your edge.
- Are registered as AWS IoT Things.
- Communicate with AWS.
- Greengrass Client Devices:
- Communicate with a Greengrass Core Device using MQTT.
- Are registered as AWS IoT Things.
- Communicate with other client devices when a Greengrass Core Device is used as a message broker.
- Greengrass Components:
- Are software running on Greengrass Core Devices.
- Are implemented and registered by users.
- Deployment:
- Consists of instructions from AWS to Greengrass Core Devices.
SageMaker
SageMaker Overview
SageMaker is a managed machine learning service on AWS. Users can apply 17 built-in algorithms to perform machine learning with less code 3, and it supports major deep learning frameworks such as TensorFlow and PyTorch 4.
SageMaker offers the following types of inference endpoints:
- SageMaker Hosting Services (Similar to EC2)
- SageMaker Serverless Endpoints (Preview) (Similar to Lambda)
- Asynchronous Inference (Similar to SQS, EC2 and Autoscaling)
- Batch Transform (Similar to AWS Batch)
SageMaker Neo Overview
Neo is a capability of Amazon SageMaker that enables machine learning models to train once and run anywhere in the cloud and at the edge.
With a single click, SageMaker Neo optimizes the trained model and compiles it into an executable. The compiler uses a machine learning model to apply the performance optimizations that extract the best available performance for your model on the cloud instance or edge device.
SageMaker Edge Manager Overview
Amazon SageMaker Edge Manager provides model management for edge devices so you can optimize, secure, monitor, and maintain machine learning models on fleets of edge devices such as smart cameras, robots, personal computers, and mobile devices.
The example in this post does not use the monitoring feature.
Example of Edge AI Inference
Overview
In this example, an EC2 instance is considered an edge device.
The steps below walk you through edge inference:
- Setting up
- Preparing AWS resources
- Implementing training script
- Implementing inference script
- SageMaker
- Training with SageMaker
- Compiling model with SageMaker Neo
- Packaging model with SageMaker Edge Manager
- Greengrass
- Setting up Greengrass Core
- Registering Greengrass Component for edge inference
- Deploying Greengrass Component
- Testing
Preparing AWS resources
Prepare the following AWS resources beforehand. For details on each resource, see the CloudFormation template provided in my GitHub repository.
| Resource | Name | Description |
|---|---|---|
| IAM User | greengrass-core-setup-user | For setting up Greengrass Core |
| IAM Role | sagemaker-execution-role | SageMaker execution role |
| IAM Role | GreengrassV2TokenExchangeRole | Greengrass Core role |
| S3 | sagemaker-ml-model-artifacts-{account_id}-{region} | Bucket for ML models |
You can create these resources by running the following command.
% aws cloudformation deploy --template-file ./cfn.yaml --stack-name greengrass-sample --capabilities CAPABILITY_NAMED_IAM
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - greengrass-sample
Implementing training script
This example uses PyTorch’s pre-trained VGG16 model, so install it with the following command.
% pip install torch torchvision
Write training.py with the following content, which will run in SageMaker.
import argparse
import os
from datetime import datetime

import torch
from torchvision import models


def fit(model: torch.nn.modules.Module) -> None:
    # Write some training code...
    pass


def save(model: torch.nn.modules.Module, path: str) -> None:
    suffix = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
    path = os.path.join(path, f'model-{suffix}.pt')
    # If you use `model.state_dict()`, SageMaker compilation will fail.
    torch.save(model, path)


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Hyperparameters sent by the client are passed as command-line arguments to the script.
    # Input data and model directories
    parser.add_argument('--model_dir', type=str)
    parser.add_argument('--sm_model_dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    args, _ = parser.parse_known_args()
    return args


if __name__ == '__main__':
    args = parse_args()
    vgg16 = models.vgg16(pretrained=True)
    fit(vgg16)
    save(vgg16, args.sm_model_dir)
When implementing training scripts for SageMaker, you can follow the same approach as you would in your local environment. However, the following points need to be considered. Please refer to the official documentation for further information.
- Runtime arguments
- Environment variables
- Training dataset
  - If you are using FILE mode, the training dataset is automatically replicated from S3 to your SageMaker instance (see the sketch below).
  - If you are using PIPE mode, you will need to implement code for reading the streamed training data. For more information, please read the official documentation.
- Model saving directory
  - The SM_MODEL_DIR environment variable can be used for saving your trained model, which will be automatically uploaded to S3.
The example above does not include the training code.
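As an illustration of FILE mode, here is a minimal sketch (not part of the original script) that reads the "train" channel directory through the SM_CHANNEL_TRAIN environment variable. The fallback path is the SageMaker default channel directory, and the wildcard file pattern is just an assumption for illustration.

```python
# Minimal sketch, assuming FILE mode: SageMaker copies the S3 data of the
# "train" channel into the directory given by SM_CHANNEL_TRAIN before the
# entry point starts, so it can be read like local files.
import glob
import os

train_dir = os.environ.get('SM_CHANNEL_TRAIN', '/opt/ml/input/data/train')
train_files = sorted(glob.glob(os.path.join(train_dir, '*')))
print(f'Found {len(train_files)} training files in {train_dir}')
```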
Implementing inference script
You can load the model compiled by SageMaker Neo using Deep Learning Runtime (DLR). Install DLR with the following command.
% pip install dlr
Write inference.py with the following content, which will run on your Greengrass Core.
import argparse
import glob
import json
import os
import time
from typing import Iterator

import numpy as np
from dlr import DLRModel


def load_model() -> DLRModel:
    return DLRModel('/greengrass/v2/work/vgg16-component')


def load_labels() -> dict:
    path = os.path.dirname(os.path.abspath(__file__))
    # See https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json
    path = os.path.join(path, 'imagenet_class_index.json')
    with open(path, 'r') as f:
        labels = json.load(f)
    return labels


def iter_files(path: str) -> Iterator[str]:
    path = path[:-1] if path.endswith('/') else path
    files = glob.glob(f'{path}/*.npy')
    for file in files:
        yield file


def predict(model: DLRModel, image: np.ndarray) -> np.ndarray:
    return model.run(image)[0]


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument('--test_dir', type=str)
    parser.add_argument('--interval', type=int, default=300)
    args, _ = parser.parse_known_args()
    return args


def start(model: DLRModel, path: str, labels: dict) -> None:
    for file in iter_files(path):
        image = np.load(file)
        y = predict(model, image)
        index = int(np.argmax(y))
        label = labels.get(str(index), '')
        print(f'Prediction result of {file}: {label}')


if __name__ == '__main__':
    args = parse_args()
    print(f'args: {args}')
    model = load_model()
    labels = load_labels()
    if args.interval == 0:
        start(model, args.test_dir, labels)
    else:
        while True:
            start(model, args.test_dir, labels)
            print(f'Sleeping for {args.interval} seconds...')
            time.sleep(args.interval)
Please note that PyTorch expects torch.Tensor as input data, while models compiled by SageMaker Neo expect numpy.ndarray.
For information on the input shapes of PyTorch’s pre-trained models, refer to the official documentation.
In this example, inference runs periodically, at the interval specified by the --interval argument, on the /greengrass/v2/work/vgg16-inference-component/images/*.npy files placed on your Greengrass Core.
You may also refer to aws.greengrass.DLRImageClassification or aws.greengrass.DLRObjectDetection, which are AWS-provided components.
To register an inference Greengrass Component, upload a zip file containing the inference script and associated files to your S3 bucket.
% cd vgg16-inference-component
% zip vgg16-inference-component-1.0.0.zip inference.py imagenet_class_index.json
% aws s3 cp vgg16-inference-component-1.0.0.zip s3://{YOUR_BUCKET}/artifacts/
Training with SageMaker
Install SageMaker Python SDK with the following command.
% pip install sagemaker
To queue a SageMaker training job from your local environment, write training_job.py with the following content.
Of course, you can also use the SageMaker management console instead, which is not covered in this post.
from sagemaker.pytorch import PyTorch

AWS_ACCOUNT_ID = '123456789012'
S3_BUCKET = f's3://sagemaker-ml-model-artifacts-{AWS_ACCOUNT_ID}-ap-northeast-1'

if __name__ == '__main__':
    pytorch_estimator = PyTorch(
        entry_point='training.py',
        source_dir='./',
        role='sagemaker-execution-role',
        instance_count=1,
        instance_type='ml.m5.large',
        framework_version='1.10.0',
        py_version='py38',
        output_path=f'{S3_BUCKET}/models/trained',
        hyperparameters={}
    )
    pytorch_estimator.fit()
After running the script, you will see billing information in the execution log. In the following log, for example, 255 seconds are billed.
% python training_job.py
…
2022-02-16 15:41:56 Uploading - Uploading generated training model
2022-02-16 15:42:56 Completed - Training job completed
ProfilerReport-1645025749: NoIssuesFound
Training seconds: 255
Billable seconds: 255
The model will be saved to your S3 bucket (output/model.tar.gz). It will then be compiled and optimized using SageMaker Neo.
Compiling model with SageMaker Neo
Create a SageMaker compilation job. In this example, the job took about 4 minutes to complete.
Specify the input configuration according to the following:
| Field | Value |
|---|---|
| Artifact | S3 URI of model.tar.gz |
| Input shape | Model input shape |
| Framework | PyTorch |
| Framework version | 1.8 |
For input shape, the official documentation describes the following.
All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.
Specify the output configuration as you like.
If you specify rasp4b as the Target device, the model will be compiled for a 64-bit architecture. Therefore, you will not be able to load it on a 32-bit OS such as the 32-bit Raspberry Pi OS; you need to use a 64-bit OS instead. Although I could not find this information in the official AWS documentation, it was mentioned in the AWS Forum.
You will see the following at the bottom of the page.
The library libdlr.so compiled by Sagemaker Neo with target rasp4b returns “ELF-64 bit LSB pie executable, ARM aarch64, version 1 (GNU/Linux), dynamically linked,
You can leave the following settings at their defaults.
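If you prefer not to use the console, the compilation job can also be created with boto3. The following is a minimal sketch based on the settings in the table above; the job name, role ARN, S3 URIs, and target device are placeholders you would replace with your own values.

```python
# Minimal sketch of a SageMaker Neo compilation job via boto3 (same settings as
# the console walkthrough above). All names, ARNs, and S3 URIs are placeholders.
import boto3

sagemaker = boto3.client('sagemaker', region_name='ap-northeast-1')

sagemaker.create_compilation_job(
    CompilationJobName='vgg16-compilation-job',
    RoleArn='arn:aws:iam::123456789012:role/sagemaker-execution-role',
    InputConfig={
        # S3 URI of the model.tar.gz produced by the training job
        'S3Uri': 's3://sagemaker-ml-model-artifacts-123456789012-ap-northeast-1/models/trained/<TRAINING_JOB_NAME>/output/model.tar.gz',
        # VGG16 expects mini-batches of 3-channel 224x224 images: (N, C, H, W)
        'DataInputConfig': '{"input0": [1, 3, 224, 224]}',
        'Framework': 'PYTORCH',
        'FrameworkVersion': '1.8',
    },
    OutputConfig={
        'S3OutputLocation': 's3://sagemaker-ml-model-artifacts-123456789012-ap-northeast-1/models/compiled/',
        # Choose the target matching your edge device, e.g. 'rasp4b' for Raspberry Pi 4
        'TargetDevice': 'ml_m5',
    },
    StoppingCondition={'MaxRuntimeInSeconds': 900},
)
```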
Packaging model with SageMaker Edge Manager
Create a SageMaker Edge Packaging Job.
Enter the SageMaker Neo compilation job name.
If you choose Greengrass V2 component as the deployment preset, the compiled model will be:
- Registered as a Greengrass V2 component by SageMaker Edge Manager.
- Saved to /greengrass/v2/work/vgg16-component/ on the Greengrass Core.
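The packaging step can also be started with boto3. Below is a minimal sketch with placeholder names, assuming the Neo compilation job from the previous step and the Greengrass V2 component preset.

```python
# Minimal sketch of a SageMaker Edge Manager packaging job via boto3.
# Job, model, and component names are placeholders.
import boto3

sagemaker = boto3.client('sagemaker', region_name='ap-northeast-1')

sagemaker.create_edge_packaging_job(
    EdgePackagingJobName='vgg16-packaging-job',
    CompilationJobName='vgg16-compilation-job',  # the Neo compilation job name
    ModelName='vgg16',
    ModelVersion='1.0',
    RoleArn='arn:aws:iam::123456789012:role/sagemaker-execution-role',
    OutputConfig={
        'S3OutputLocation': 's3://sagemaker-ml-model-artifacts-123456789012-ap-northeast-1/models/packaged/',
        # Register the packaged model as a Greengrass V2 component
        'PresetDeploymentType': 'GreengrassV2Component',
        'PresetDeploymentConfig': '{"ComponentName": "vgg16-component", "ComponentVersion": "1.0.0"}',
    },
)
```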
Setting up Greengrass Core
Set up Greengrass Core on your edge device. This post uses an EC2 instance running Ubuntu 20.04.3. For detailed instructions on how to install Greengrass Core, please refer to the official documentation.
Please note that MQTT over TLS uses port 8883. If the port is not open, you will need to follow the manual setup guide.
Install JDK.
% sudo apt install default-jdk
% java -version
Add a user and a group for Greengrass Core.
% sudo useradd --system --create-home ggc_user
% sudo groupadd --system ggc_group
Configure AWS credentials.
% # Set the credentials of the greengrass-core-setup-user provisioned by CloudFormation
% export AWS_ACCESS_KEY_ID=
% export AWS_SECRET_ACCESS_KEY=
Install Greengrass Core.
% curl -s https://d2s8p88vqu9w66.cloudfront.net/releases/greengrass-nucleus-latest.zip > greengrass-nucleus-latest.zip
% unzip greengrass-nucleus-latest.zip -d GreengrassInstaller && rm greengrass-nucleus-latest.zip
% sudo -E java -Droot="/greengrass/v2" -Dlog.store=FILE \
-jar ./GreengrassInstaller/lib/Greengrass.jar \
--aws-region ap-northeast-1 \
--thing-name MyGreengrassCore \
--thing-group-name MyGreengrassCoreGroup \
--thing-policy-name GreengrassV2IoTThingPolicy \
--tes-role-name GreengrassV2TokenExchangeRole \
--tes-role-alias-name GreengrassCoreTokenExchangeRoleAlias \
--component-default-user ggc_user:ggc_group \
--provision true \
--setup-system-service true
Check the Greengrass Core service. Given its memory usage, the core device should have at least 2 GB of memory.
% sudo systemctl status greengrass
● greengrass.service - Greengrass Core
Loaded: loaded (/etc/systemd/system/greengrass.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2022-02-16 05:09:16 UTC; 1 day 2h ago
Main PID: 1454 (sh)
Tasks: 51 (limit: 2197)
Memory: 734.2M
CGroup: /system.slice/greengrass.service
This post uses automatic resource provisioning, so the following AWS resources were provisioned automatically. Alternatively, you can set them up with manual resource provisioning.
| Resource | Name |
|---|---|
| Thing | MyGreengrassCore |
| Thing Group | MyGreengrassCoreGroup |
| Thing Policy | GreengrassV2IoTThingPolicy |
| Token Exchange Role | GreengrassV2TokenExchangeRole |
| Token Exchange Role Alias | GreengrassCoreTokenExchangeRoleAlias |
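As an optional check (not part of the original setup), you can verify that the core device registered correctly by listing core devices with boto3:

```python
# Minimal sketch: list registered Greengrass core devices and their status.
import boto3

greengrass = boto3.client('greengrassv2', region_name='ap-northeast-1')

for device in greengrass.list_core_devices().get('coreDevices', []):
    print(device['coreDeviceThingName'], device['status'])
```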
Registering Greengrass Component for edge inference
Create recipe.yaml to register a Greengrass Component for inference.
For more information on the component recipe, please refer to the official documentation.
RecipeFormatVersion: '2020-01-25'
ComponentName: vgg16-inference-component
ComponentVersion: 1.0.0
ComponentDescription: Inference component for VGG16
ComponentPublisher: Iret
# Arguments to be passed.
ComponentConfiguration:
  DefaultConfiguration:
    Interval: 60
# Dependencies which will be installed with this component.
ComponentDependencies:
  variant.DLR:
    VersionRequirement: ">=1.6.5 <1.7.0"
    DependencyType: HARD
  vgg16-component:
    VersionRequirement: ">=1.0.0"
    DependencyType: HARD
Manifests:
  - Name: Linux
    Platform:
      os: linux
    Lifecycle:
      Run:
        RequiresPrivilege: true
        Script: |
          . {variant.DLR:configuration:/MLRootPath}/greengrass_ml_dlr_venv/bin/activate
          python3 -u {artifacts:decompressedPath}/vgg16-inference-component-1.0.0/inference.py --interval {configuration:/Interval} --test_dir {work:path}/images/
    Artifacts:
      - Uri: s3://sagemaker-ml-model-artifacts-123456789012-ap-northeast-1/artifacts/vgg16-inference-component-1.0.0.zip
        Unarchive: ZIP
Interval in the above example specifies the inference interval in seconds. You can specify component dependencies in ComponentDependencies.
In this example, the following must be specified:
- variant.DLR
  - Needed for loading models compiled by SageMaker Neo. For more details, please refer to the official documentation.
  - Provides a Python virtual environment at /greengrass/v2/work/variant.DLR/greengrass_ml/greengrass_ml_dlr_venv on your Greengrass Core device.
- vgg16-component
  - The model compiled by SageMaker Neo.
  - Registered by SageMaker Edge Manager.
After creating recipe.yaml, create the Greengrass component.
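This post registers the component from the management console, but as a rough sketch, the same registration can be done with boto3 by passing the recipe inline:

```python
# Minimal sketch: register the inference component from recipe.yaml via the
# Greengrass V2 API instead of the console.
import boto3

greengrass = boto3.client('greengrassv2', region_name='ap-northeast-1')

with open('recipe.yaml', 'rb') as f:
    recipe = f.read()

response = greengrass.create_component_version(inlineRecipe=recipe)
print(response['arn'], response['status']['componentState'])
```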
When the component is deployed, Greengrass Core downloads and extracts the artifacts from S3. Greengrass Core validates the checksums of the artifacts, so if an artifact is overwritten directly in S3, the component status will become broken. Please refer to the official documentation for more details.
Deploying Greengrass Component
In this step, you can choose the components to deploy.
For My components, specify the following. Note that vgg16-component will be installed even if you do not select it, because the vgg16-inference-component recipe has a HARD dependency on it.
| Name | Description |
|---|---|
| vgg16-component | The VGG16 component packaged by SageMaker Edge Manager |
| vgg16-inference-component | The inference component |
For Public components, specify the following. Note that variant.DLR will be installed even if you do not select it, because the vgg16-inference-component recipe has a HARD dependency on it.
| Name | Description |
|---|---|
| variant.DLR | The component necessary for loading the model |
| aws.greengrass.Nucleus | The component necessary for Greengrass Core |
Press Next with no configuration changes.
Press Next again with no configuration changes.
After reviewing, press Deploy to start deploying the components.
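For reference, the same deployment can be started programmatically. The following is a minimal sketch with boto3; the thing group ARN and the component versions are placeholders that should match what is actually registered in your account.

```python
# Minimal sketch: deploy the components to the thing group via the Greengrass V2 API.
# The target ARN and component versions are placeholders.
import boto3

greengrass = boto3.client('greengrassv2', region_name='ap-northeast-1')

response = greengrass.create_deployment(
    targetArn='arn:aws:iot:ap-northeast-1:123456789012:thinggroup/MyGreengrassCoreGroup',
    deploymentName='vgg16-inference-deployment',
    components={
        'vgg16-component': {'componentVersion': '1.0.0'},
        'vgg16-inference-component': {'componentVersion': '1.0.0'},
        'variant.DLR': {'componentVersion': '1.6.5'},             # placeholder version
        'aws.greengrass.Nucleus': {'componentVersion': '2.5.0'},  # placeholder version
    },
)
print(response['deploymentId'])
```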
Testing
To test inference on your Greengrass Core, follow these steps:
- Pre-trained PyTorch models expect a 4-dimensional tensor (N, C, H, W) as input, so convert the images for inference into NumPy arrays. For more information, please refer to the official documentation.
- Transfer the converted data to /greengrass/v2/work/vgg16-inference-component/images/ on your Greengrass Core device.
- Check the /greengrass/v2/logs/vgg16-inference-component.log file on your Greengrass Core device.
You can use the following Python script to convert images into NumPy arrays.
import argparse
import os

from PIL import Image
import numpy as np
import torch
from torchvision import transforms


def load_image_to_tensor(path: str) -> torch.Tensor:
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    img = Image.open(path)
    tensor_3d = preprocess(img)
    return torch.unsqueeze(tensor_3d, 0)


def save(tensor: torch.Tensor, path: str) -> None:
    np.save(path, tensor.numpy())


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument('image', type=str)
    args, _ = parser.parse_known_args()
    return args


if __name__ == '__main__':
    args = parse_args()
    image = args.image
    tensor = load_image_to_tensor(image)
    save(tensor, os.path.basename(image) + '.npy')
Run the script with the following command.
% python convert_img_to_npy.py <YOUR_IMAGE>
Transfer the converted NumPy array data to your Greengrass Core device. Then you can see the inference results in /greengrass/v2/logs/vgg16-inference-component.log.
% scp xxx.jpg.npy <GREENGRASS_HOST>:/greengrass/v2/work/vgg16-inference-component/images/
% ssh <GREENGRASS_HOST>
% tail -f /greengrass/v2/logs/vgg16-inference-component.log
…
2022-02-19T21:32:21.993Z [INFO] (Copier) vgg16-inference-component: stdout. Prediction result of /greengrass/v2/work/vgg16-inference-component/images/keyboard.jpg.npy: ['n03085013', 'computer_keyboard']. {scriptName=services.vgg16-inference-component.lifecycle.Run.Script, serviceName=vgg16-inference-component, currentState=RUNNING}
2022-02-19T21:32:22.257Z [INFO] (Copier) vgg16-inference-component: stdout. Prediction result of /greengrass/v2/work/vgg16-inference-component/images/pen.jpg.npy: ['n03388183', 'fountain_pen']. {scriptName=services.vgg16-inference-component.lifecycle.Run.Script, serviceName=vgg16-inference-component, currentState=RUNNING}
This post used two images, keyboard.jpg and pen.jpg. The inference result for keyboard.jpg is computer_keyboard, and the result for pen.jpg is fountain_pen.
Conclusion
AWS users can easily implement edge AI inference using IoT Greengrass V2 and the SageMaker ecosystem.
I hope you will find this post useful.
Footnotes
1. CACM. Shrinking Artificial Intelligence. https://cacm.acm.org/magazines/2022/1/257437-shrinking-artificial-intelligence/fulltext ↩
2. https://aws.amazon.com/about-aws/whats-new/2021/11/aws-iot-greengrass-support-windows-devices/ ↩
3. https://docs.aws.amazon.com/sagemaker/latest/dg/common-info-all-im-models.html ↩
4. https://docs.aws.amazon.com/sagemaker/latest/dg/frameworks.html ↩