Real-Time Face Detection with Raspberry Pi, Kinesis Video Streams, and AWS Rekognition

In this note, we will implement a face detection system using a USB camera connected to a Raspberry Pi, leveraging Kinesis Video Streams and Rekognition Video for processing and detecting faces in real-time.
Requirements
Hardware Requirements
- Raspberry Pi 4B with 4GB RAM
  - Running Ubuntu 23.10 (installed via Raspberry Pi Imager)
- USB Camera
Software Requirements
- GStreamer
- Amazon Kinesis Video Streams CPP Producer, GStreamer Plugin and JNI
- AWS SAM CLI
- Python 3.11
Building AWS Resources
AWS SAM Template
```yaml
AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31
Description: face-detector-using-kinesis-video-streams

Resources:
  Function:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: face-detector-function
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.11
      Architectures:
        - arm64
      Timeout: 3
      MemorySize: 128
      Role: !GetAtt FunctionIAMRole.Arn
      Events:
        KinesisEvent:
          Type: Kinesis
          Properties:
            Stream: !GetAtt KinesisStream.Arn
            MaximumBatchingWindowInSeconds: 10
            MaximumRetryAttempts: 3
            StartingPosition: LATEST

  FunctionIAMRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: face-detector-function-role
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
        - arn:aws:iam::aws:policy/service-role/AWSLambdaKinesisExecutionRole
      Policies:
        - PolicyName: policy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - kinesisvideo:GetHLSStreamingSessionURL
                  - kinesisvideo:GetDataEndpoint
                Resource: !GetAtt KinesisVideoStream.Arn

  KinesisVideoStream:
    Type: AWS::KinesisVideo::Stream
    Properties:
      Name: face-detector-kinesis-video-stream
      DataRetentionInHours: 24

  RekognitionCollection:
    Type: AWS::Rekognition::Collection
    Properties:
      CollectionId: FaceCollection

  RekognitionStreamProcessor:
    Type: AWS::Rekognition::StreamProcessor
    Properties:
      Name: face-detector-rekognition-stream-processor
      KinesisVideoStream:
        Arn: !GetAtt KinesisVideoStream.Arn
      KinesisDataStream:
        Arn: !GetAtt KinesisStream.Arn
      RoleArn: !GetAtt RekognitionStreamProcessorIAMRole.Arn
      FaceSearchSettings:
        CollectionId: !Ref RekognitionCollection
        FaceMatchThreshold: 80
      DataSharingPreference:
        OptIn: false

  KinesisStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: face-detector-kinesis-stream
      StreamModeDetails:
        StreamMode: ON_DEMAND

  RekognitionStreamProcessorIAMRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: face-detector-rekognition-stream-processor-role
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: rekognition.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonRekognitionServiceRole
      Policies:
        - PolicyName: policy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - kinesis:PutRecord
                  - kinesis:PutRecords
                Resource:
                  - !GetAtt KinesisStream.Arn
```
Lambda Function
- The Rekognition Video stream processor streams detected face data to the Kinesis Data Stream; each Kinesis record carries the event as a Base64-encoded string, which the handler decodes.
- The Lambda function then generates an HLS URL for the corresponding video segment (see get_hls_streaming_session_url below).
```python
import base64
import json
import logging
from datetime import datetime, timedelta, timezone
from functools import cache

import boto3

JST = timezone(timedelta(hours=9))
kvs_client = boto3.client('kinesisvideo')
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def lambda_handler(event: dict, context: dict) -> dict:
    for record in event['Records']:
        base64_data = record['kinesis']['data']
        stream_processor_event = json.loads(base64.b64decode(base64_data).decode())
        # Refer to https://docs.aws.amazon.com/rekognition/latest/dg/streaming-video-kinesis-output.html
        # for details on the structure.

        if not stream_processor_event['FaceSearchResponse']:
            continue

        logger.info(stream_processor_event)
        url = get_hls_streaming_session_url(stream_processor_event)
        logger.info(url)

    return {
        'statusCode': 200,
    }


@cache
def get_kvs_am_client(api_name: str, stream_arn: str):
    # Retrieves the data endpoint for the stream.
    # See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kinesisvideo/client/get_data_endpoint.html
    endpoint = kvs_client.get_data_endpoint(
        APIName=api_name.upper(),
        StreamARN=stream_arn,
    )['DataEndpoint']
    return boto3.client('kinesis-video-archived-media', endpoint_url=endpoint)


def get_hls_streaming_session_url(stream_processor_event: dict) -> str:
    # Generates an HLS streaming URL for the video stream.
    # See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kinesis-video-archived-media/client/get_hls_streaming_session_url.html
    kinesis_video = stream_processor_event['InputInformation']['KinesisVideo']
    stream_arn = kinesis_video['StreamArn']
    kvs_am_client = get_kvs_am_client('get_hls_streaming_session_url', stream_arn)
    start_timestamp = datetime.fromtimestamp(kinesis_video['ServerTimestamp'], JST)
    end_timestamp = start_timestamp + timedelta(minutes=1)

    return kvs_am_client.get_hls_streaming_session_url(
        StreamARN=stream_arn,
        PlaybackMode='ON_DEMAND',
        HLSFragmentSelector={
            'FragmentSelectorType': 'SERVER_TIMESTAMP',
            'TimestampRange': {
                'StartTimestamp': start_timestamp,
                'EndTimestamp': end_timestamp,
            },
        },
        ContainerFormat='FRAGMENTED_MP4',
        Expires=300,
    )['HLSStreamingSessionURL']
```
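Before deploying, you can sanity-check the handler's decoding path locally with a synthetic Kinesis event. This is a minimal sketch: it assumes the code above is saved as src/app.py and importable, boto3 is installed, and a default AWS region is configured; because FaceSearchResponse is empty, no AWS APIs are actually called.

```python
# Local smoke test for lambda_handler with a hypothetical Kinesis event.
# FaceSearchResponse is empty, so the handler skips the HLS URL generation.
import base64
import json

from app import lambda_handler  # assumes src/app.py from above is on the path

payload = {
    'InputInformation': {'KinesisVideo': {}},
    'StreamProcessorInformation': {'Status': 'RUNNING'},
    'FaceSearchResponse': [],
}
event = {
    'Records': [
        {'kinesis': {'data': base64.b64encode(json.dumps(payload).encode()).decode()}}
    ]
}

print(lambda_handler(event, None))  # expects {'statusCode': 200}
```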
Deploying the Stack
Build and deploy the SAM application using the following commands:
```bash
sam build
sam deploy
```
Indexing Faces
To detect faces using the USB camera, index (register) faces into a Rekognition face collection beforehand. The IndexFaces API is used for this purpose.
Replace the following with the actual values:
- <YOUR_BUCKET>
- <YOUR_OBJECT>
- <PERSON_ID>
```bash
aws rekognition index-faces \
  --image '{"S3Object": {"Bucket": "<YOUR_BUCKET>", "Name": "<YOUR_OBJECT>"}}' \
  --collection-id FaceCollection \
  --external-image-id <PERSON_ID>
```
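If you prefer to script this step, the equivalent boto3 call looks roughly like the sketch below; the bucket, object key, and person ID are placeholders you would substitute.

```python
# Hypothetical boto3 equivalent of the index-faces CLI call above.
import boto3

rekognition = boto3.client('rekognition')

response = rekognition.index_faces(
    CollectionId='FaceCollection',
    Image={'S3Object': {'Bucket': '<YOUR_BUCKET>', 'Name': '<YOUR_OBJECT>'}},
    ExternalImageId='<PERSON_ID>',
)
print([record['Face']['FaceId'] for record in response['FaceRecords']])
```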
Rekognition does not store actual images in the face collection. Instead, it extracts and saves facial features as metadata.
https://docs.aws.amazon.com/rekognition/latest/dg/add-faces-to-collection-procedure.html
For each face detected, Amazon Rekognition extracts facial features and stores the feature information in a database. In addition, the command stores metadata for each face that’s detected in the specified face collection. Amazon Rekognition doesn’t store the actual image bytes.
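To confirm what was stored, you can list the collection's entries; they contain only metadata such as FaceId and ExternalImageId, never image bytes. A minimal sketch using boto3:

```python
# List the metadata stored in the face collection (no image bytes are returned).
import boto3

rekognition = boto3.client('rekognition')

for face in rekognition.list_faces(CollectionId='FaceCollection')['Faces']:
    print(face['FaceId'], face.get('ExternalImageId'))
```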
Setting Up the Video Producer
This example uses the Raspberry Pi 4B with 4GB RAM running Ubuntu 23.10 as the video producer.
Building GStreamer Plugin
AWS provides the Amazon Kinesis Video Streams CPP Producer, GStreamer Plugin and JNI. This SDK facilitates video streaming from the Raspberry Pi to Kinesis Video Streams.
While AWS offers a Docker image for the GStreamer plugin, the image may not run on the Raspberry Pi because of its CPU architecture, so this note builds the plugin from source instead.
Run the following commands. Depending on your system’s specifications, the build may take 20 minutes or more.
```bash
sudo apt update
sudo apt upgrade
sudo apt install \
  make \
  cmake \
  build-essential \
  m4 \
  autoconf \
  default-jdk
sudo apt install \
  libssl-dev \
  libcurl4-openssl-dev \
  liblog4cplus-dev \
  libgstreamer1.0-dev \
  libgstreamer-plugins-base1.0-dev \
  gstreamer1.0-plugins-base-apps \
  gstreamer1.0-plugins-bad \
  gstreamer1.0-plugins-good \
  gstreamer1.0-plugins-ugly \
  gstreamer1.0-tools
```
```bash
git clone https://github.com/awslabs/amazon-kinesis-video-streams-producer-sdk-cpp.git
mkdir -p amazon-kinesis-video-streams-producer-sdk-cpp/build
cd amazon-kinesis-video-streams-producer-sdk-cpp/build

sudo cmake .. -DBUILD_GSTREAMER_PLUGIN=ON -DBUILD_JNI=TRUE
sudo make
```
Once the build completes, verify the result with the following commands:
```bash
cd ~/amazon-kinesis-video-streams-producer-sdk-cpp
export GST_PLUGIN_PATH=`pwd`/build
export LD_LIBRARY_PATH=`pwd`/open-source/local/lib
gst-inspect-1.0 kvssink
```
The output should display details similar to this:
```
Factory Details:
  Rank                     primary + 10 (266)
  Long-name                KVS Sink
  Klass                    Sink/Video/Network
  Description              GStreamer AWS KVS plugin
  Author                   AWS KVS <kinesis-video-support@amazon.com>
...
```
To avoid resetting environment variables every time, add the following exports to your ~/.profile:

```bash
echo "" >> ~/.profile
echo "# GStreamer" >> ~/.profile
echo "export GST_PLUGIN_PATH=$GST_PLUGIN_PATH" >> ~/.profile
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> ~/.profile
```
Running GStreamer
After building the plugin, connect your USB camera to the Raspberry Pi and run the following command to stream video data to Kinesis Video Streams.
Be sure to replace the following with the actual values:
- <KINESIS_VIDEO_STREAM_NAME>
- <YOUR_ACCESS_KEY>
- <YOUR_SECRET_KEY>
- <YOUR_AWS_REGION>
Enhancing video quality (e.g., increasing resolution or frame rate) may result in higher AWS costs.
```bash
gst-launch-1.0 -v v4l2src device=/dev/video0 \
  ! videoconvert \
  ! video/x-raw,format=I420,width=320,height=240,framerate=5/1 \
  ! x264enc bframes=0 key-int-max=45 bitrate=500 tune=zerolatency \
  ! video/x-h264,stream-format=avc,alignment=au \
  ! kvssink stream-name=<KINESIS_VIDEO_STREAM_NAME> storage-size=128 \
    access-key="<YOUR_ACCESS_KEY>" secret-key="<YOUR_SECRET_KEY>" aws-region="<YOUR_AWS_REGION>"
```
You can verify the live stream by navigating to the Kinesis Video Streams management console.
Testing
Starting the Rekognition Video Stream Processor
Start the Rekognition Video stream processor. The stream processor consumes the Kinesis video stream, searches it for faces that match the face collection, and writes the results to the Kinesis data stream.
Run the following command to start the stream processor:
```bash
aws rekognition start-stream-processor \
  --name face-detector-rekognition-stream-processor
```
Verify the status of the stream processor to ensure it is running:
```bash
aws rekognition describe-stream-processor \
  --name face-detector-rekognition-stream-processor | grep "Status"
```
The expected output should show "Status": "RUNNING".
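If you are scripting the setup, the same start-and-verify steps can be done with boto3. This is a minimal sketch that reuses the processor name from the SAM template above:

```python
# Start the stream processor and check its status via boto3.
import boto3

rekognition = boto3.client('rekognition')
name = 'face-detector-rekognition-stream-processor'

rekognition.start_stream_processor(Name=name)
print(rekognition.describe_stream_processor(Name=name)['Status'])  # 'STARTING', then 'RUNNING'
```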
Capturing Faces
Once the USB camera captures video, the Rekognition Video stream processor analyzes the video stream and detects faces based on the face collection.
To check the results, view the Lambda function logs with the following command:
```bash
sam logs -n Function \
  --stack-name face-detector-using-kinesis-video-streams \
  --tail
```
The log records include detailed information about the stream processor events, such as the following example:
{ "InputInformation": { "KinesisVideo": { "StreamArn": "arn:aws:kinesisvideo:<AWS_REGION>:<AWS_ACCOUNT_ID>:stream/face-detector-kinesis-video-stream/xxxxxxxxxxxxx", "FragmentNumber": "91343852333181501717324262640137742175000164731", "ServerTimestamp": 1702208586.022, "ProducerTimestamp": 1702208585.699, "FrameOffsetInSeconds": 0.0, } }, "StreamProcessorInformation": {"Status": "RUNNING"}, "FaceSearchResponse": [ { "DetectedFace": { "BoundingBox": { "Height": 0.4744676, "Width": 0.29107505, "Left": 0.33036956, "Top": 0.19599175, }, "Confidence": 99.99677, "Landmarks": [ {"X": 0.41322955, "Y": 0.33761832, "Type": "eyeLeft"}, {"X": 0.54405355, "Y": 0.34024307, "Type": "eyeRight"}, {"X": 0.424819, "Y": 0.5417343, "Type": "mouthLeft"}, {"X": 0.5342691, "Y": 0.54362005, "Type": "mouthRight"}, {"X": 0.48934412, "Y": 0.43806323, "Type": "nose"}, ], "Pose": {"Pitch": 5.547308, "Roll": 0.85795176, "Yaw": 4.76913}, "Quality": {"Brightness": 57.938313, "Sharpness": 46.0298}, }, "MatchedFaces": [ { "Similarity": 99.986176, "Face": { "BoundingBox": { "Height": 0.417963, "Width": 0.406223, "Left": 0.28826, "Top": 0.242463, }, "FaceId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", "Confidence": 99.996605, "ImageId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", "ExternalImageId": "iwasa", }, } ], } ],}
HLS URL for Video Playback
The logs also include the generated HLS URL for on-demand video playback, such as:
```
https://x-xxxxxxxx.kinesisvideo.<AWS_REGION>.amazonaws.com/hls/v1/getHLSMasterPlaylist.m3u8?SessionToken=xxxxxxxxxx
```
Open the HLS URL using a supported browser like Safari or Edge.
Chrome does not natively support HLS playback. You can use a third-party extension, such as Native HLS Playback.
Cleaning Up
Clean up all the AWS resources provisioned during this example with the following commands:

```bash
aws rekognition stop-stream-processor \
  --name face-detector-rekognition-stream-processor
sam delete
```