Implementing Face Detector on USB Camera Connected to Raspberry Pi using Kinesis Video Streams and Rekognition Video

Takahiro Iwasa

Using Kinesis Video Streams and Rekognition Video, I implemented a face detector for a USB camera connected to a Raspberry Pi. Let me show you how to implement it.

Overview

The USB camera connected to the Raspberry Pi streams video to a Kinesis video stream. A Rekognition Video stream processor analyzes the stream against a face collection and writes detection results to a Kinesis data stream, which triggers a Lambda function that generates an HLS URL for playback.

Prerequisites

Video Producer

In this post, the following is used as the video producer:

  • Raspberry Pi 4B (4GB RAM) running Ubuntu 23.10
  • USB camera

AWS Resources

This post uses the following AWS resources:

  • Kinesis Video Streams
  • Rekognition Video (stream processor and face collection)
  • Kinesis Data Streams
  • Lambda (deployed with AWS SAM)

Creating SAM Application

You can pull the example code used in this post from my GitHub repository.

Directory Structure

/
|-- src/
|   |-- app.py
|   `-- requirements.txt
|-- samconfig.toml
`-- template.yaml

AWS SAM Template

AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31
Description: face-detector-using-kinesis-video-streams

Resources:
  Function:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: face-detector-function
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.11
      Architectures:
        - arm64
      Timeout: 3
      MemorySize: 128
      Role: !GetAtt FunctionIAMRole.Arn
      Events:
        KinesisEvent:
          Type: Kinesis
          Properties:
            Stream: !GetAtt KinesisStream.Arn
            MaximumBatchingWindowInSeconds: 10
            MaximumRetryAttempts: 3
            StartingPosition: LATEST

  FunctionIAMRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: face-detector-function-role
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
        - arn:aws:iam::aws:policy/service-role/AWSLambdaKinesisExecutionRole
      Policies:
        - PolicyName: policy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - kinesisvideo:GetHLSStreamingSessionURL
                  - kinesisvideo:GetDataEndpoint
                Resource: !GetAtt KinesisVideoStream.Arn

  KinesisVideoStream:
    Type: AWS::KinesisVideo::Stream
    Properties:
      Name: face-detector-kinesis-video-stream
      DataRetentionInHours: 24

  RekognitionCollection:
    Type: AWS::Rekognition::Collection
    Properties:
      CollectionId: FaceCollection

  RekognitionStreamProcessor:
    Type: AWS::Rekognition::StreamProcessor
    Properties:
      Name: face-detector-rekognition-stream-processor
      KinesisVideoStream:
        Arn: !GetAtt KinesisVideoStream.Arn
      KinesisDataStream:
        Arn: !GetAtt KinesisStream.Arn
      RoleArn: !GetAtt RekognitionStreamProcessorIAMRole.Arn
      FaceSearchSettings:
        CollectionId: !Ref RekognitionCollection
        FaceMatchThreshold: 80
      DataSharingPreference:
        OptIn: false

  KinesisStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: face-detector-kinesis-stream
      StreamModeDetails:
        StreamMode: ON_DEMAND

  RekognitionStreamProcessorIAMRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: face-detector-rekognition-stream-processor-role
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: rekognition.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonRekognitionServiceRole
      Policies:
        - PolicyName: policy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - kinesis:PutRecord
                  - kinesis:PutRecords
                Resource:
                  - !GetAtt KinesisStream.Arn

Python Script

requirements.txt

Leave this file empty; boto3 is already included in the AWS Lambda Python runtime.

app.py

The Rekognition Video stream processor streams detected face data to the Kinesis data stream, and each record can be obtained as a Base64-encoded string in lambda_handler. For information about the data structure, please refer to the official documentation.

The Lambda function generates an HLS URL using the KinesisVideoArchivedMedia#get_hls_streaming_session_url API in get_hls_streaming_session_url.

import base64
import json
import logging
from datetime import datetime, timedelta, timezone
from functools import cache

import boto3

JST = timezone(timedelta(hours=9))
kvs_client = boto3.client('kinesisvideo')
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def lambda_handler(event: dict, context: dict) -> dict:
    for record in event['Records']:
        base64_data = record['kinesis']['data']
        stream_processor_event = json.loads(base64.b64decode(base64_data).decode())
        # For more information about the result, please refer to https://docs.aws.amazon.com/rekognition/latest/dg/streaming-video-kinesis-output.html

        if not stream_processor_event['FaceSearchResponse']:
            continue

        logger.info(stream_processor_event)
        url = get_hls_streaming_session_url(stream_processor_event)
        logger.info(url)

    return {
        'statusCode': 200,
    }


@cache
def get_kvs_am_client(api_name: str, stream_arn: str):
    # See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kinesisvideo/client/get_data_endpoint.html
    endpoint = kvs_client.get_data_endpoint(
        APIName=api_name.upper(),
        StreamARN=stream_arn
    )['DataEndpoint']
    return boto3.client('kinesis-video-archived-media', endpoint_url=endpoint)


def get_hls_streaming_session_url(stream_processor_event: dict) -> str:
    # See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kinesis-video-archived-media/client/get_hls_streaming_session_url.html

    kinesis_video = stream_processor_event['InputInformation']['KinesisVideo']
    stream_arn = kinesis_video['StreamArn']
    kvs_am_client = get_kvs_am_client('get_hls_streaming_session_url', stream_arn)
    start_timestamp = datetime.fromtimestamp(kinesis_video['ServerTimestamp'], JST)
    end_timestamp = datetime.fromtimestamp(kinesis_video['ServerTimestamp'], JST) + timedelta(minutes=1)

    return kvs_am_client.get_hls_streaming_session_url(
        StreamARN=stream_arn,
        PlaybackMode='ON_DEMAND',
        HLSFragmentSelector={
            'FragmentSelectorType': 'SERVER_TIMESTAMP',
            'TimestampRange': {
                'StartTimestamp': start_timestamp,
                'EndTimestamp': end_timestamp,
            },
        },
        ContainerFormat='FRAGMENTED_MP4',
        Expires=300,
    )['HLSStreamingSessionURL']
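
If you want to exercise the handler locally before deploying, a minimal sketch like the following wraps a fabricated stream processor event in the Kinesis record format and calls lambda_handler directly. Note that the payload below is a trimmed-down assumption rather than a real Rekognition event, and the HLS call still reaches AWS, so valid credentials and an existing video stream ARN are required.

import base64
import json

from app import lambda_handler

# Fabricated stream processor event containing only the fields the handler reads.
sample_event = {
    'InputInformation': {
        'KinesisVideo': {
            'StreamArn': '<YOUR_KINESIS_VIDEO_STREAM_ARN>',
            'ServerTimestamp': 1702208586.022,
        }
    },
    'FaceSearchResponse': [{'DetectedFace': {}, 'MatchedFaces': []}],
}
record = {'kinesis': {'data': base64.b64encode(json.dumps(sample_event).encode()).decode()}}
print(lambda_handler({'Records': [record]}, {}))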

Build and Deploy

Build and deploy with the following commands.

sam build
sam deploy

Indexing Faces

Index the faces that you want the USB camera to detect. Before running the following command, replace the placeholders <YOUR_BUCKET>, <YOUR_OBJECT>, and <PERSON_ID> with your actual values.

aws rekognition index-faces \
  --image '{"S3Object": {"Bucket": "<YOUR_BUCKET>", "Name": "<YOUR_OBJECT>"}}' \
  --collection-id FaceCollection \
  --external-image-id <PERSON_ID>

Note that Rekognition does not store actual images in collections. From the official documentation:

For each face detected, Amazon Rekognition extracts facial features and stores the feature information in a database. In addition, the command stores metadata for each face that’s detected in the specified face collection. Amazon Rekognition doesn’t store the actual image bytes.
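
To confirm that the faces were indexed, you can list the collection contents. The following is a minimal boto3 sketch, assuming default credentials and the FaceCollection ID used above:

import boto3

# List the faces indexed in the collection to verify the index-faces call succeeded.
rekognition = boto3.client('rekognition')
for face in rekognition.list_faces(CollectionId='FaceCollection')['Faces']:
    # FaceId is generated by Rekognition; ExternalImageId is the <PERSON_ID> you supplied.
    print(face['FaceId'], face.get('ExternalImageId'))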

Setting up Video Producer

In this post, a Raspberry Pi 4B with 4GB RAM running Ubuntu 23.10 is used as the video producer.

Build the AWS-provided Amazon Kinesis Video Streams CPP Producer, GStreamer Plugin and JNI.

Although AWS provides Docker images containing the GStreamer plugin, these did not work on my Raspberry Pi.

Building GStreamer Plugin

Run the following commands to build the plugin. Depending on your machine specs, it may take about 20 minutes or more.

sudo apt update
sudo apt upgrade
sudo apt install \
  make \
  cmake \
  build-essential \
  m4 \
  autoconf \
  default-jdk
sudo apt install \
  libssl-dev \
  libcurl4-openssl-dev \
  liblog4cplus-dev \
  libgstreamer1.0-dev \
  libgstreamer-plugins-base1.0-dev \
  gstreamer1.0-plugins-base-apps \
  gstreamer1.0-plugins-bad \
  gstreamer1.0-plugins-good \
  gstreamer1.0-plugins-ugly \
  gstreamer1.0-tools

git clone https://github.com/awslabs/amazon-kinesis-video-streams-producer-sdk-cpp.git
mkdir -p amazon-kinesis-video-streams-producer-sdk-cpp/build
cd amazon-kinesis-video-streams-producer-sdk-cpp/build

sudo cmake .. -DBUILD_GSTREAMER_PLUGIN=ON -DBUILD_JNI=TRUE
sudo make

Check the build result with the following commands.

cd ~/amazon-kinesis-video-streams-producer-sdk-cpp
export GST_PLUGIN_PATH=`pwd`/build
export LD_LIBRARY_PATH=`pwd`/open-source/local/lib
gst-inspect-1.0 kvssink

Output like the following should be displayed.

Factory Details:
  Rank                     primary + 10 (266)
  Long-name                KVS Sink
  Klass                    Sink/Video/Network
  Description              GStreamer AWS KVS plugin
  Author                   AWS KVS <[email protected]>
...

To avoid setting these variables on every login, it is useful to add the export lines to your ~/.profile.

echo "" >> ~/.profile
echo "# GStreamer" >> ~/.profile
echo "export GST_PLUGIN_PATH=$GST_PLUGIN_PATH" >> ~/.profile
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> ~/.profile

Running GStreamer

Connect the USB camera to your device, and run the following command.

gst-launch-1.0 -v v4l2src device=/dev/video0 \
  ! videoconvert \
  ! video/x-raw,format=I420,width=320,height=240,framerate=5/1 \
  ! x264enc bframes=0 key-int-max=45 bitrate=500 tune=zerolatency \
  ! video/x-h264,stream-format=avc,alignment=au \
  ! kvssink stream-name=<KINESIS_VIDEO_STREAM_NAME> storage-size=128 access-key="<YOUR_ACCESS_KEY>" secret-key="<YOUR_SECRET_KEY>" aws-region="<YOUR_AWS_REGION>"

You can check the live streaming video from the camera in the Kinesis Video Streams management console.
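
Besides the management console, you can verify ingestion programmatically. The following is a rough boto3 sketch (the five-minute window is an arbitrary assumption) that lists the fragments recently ingested into the video stream:

from datetime import datetime, timedelta, timezone

import boto3

STREAM_NAME = 'face-detector-kinesis-video-stream'

kvs = boto3.client('kinesisvideo')
# LIST_FRAGMENTS requires a stream-specific data endpoint.
endpoint = kvs.get_data_endpoint(StreamName=STREAM_NAME, APIName='LIST_FRAGMENTS')['DataEndpoint']
kvs_am = boto3.client('kinesis-video-archived-media', endpoint_url=endpoint)

now = datetime.now(timezone.utc)
fragments = kvs_am.list_fragments(
    StreamName=STREAM_NAME,
    FragmentSelector={
        'FragmentSelectorType': 'SERVER_TIMESTAMP',
        'TimestampRange': {'StartTimestamp': now - timedelta(minutes=5), 'EndTimestamp': now},
    },
)['Fragments']
print(f'{len(fragments)} fragments ingested in the last 5 minutes')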

Testing

Starting Rekognition Video stream processor

Start the Rekognition Video stream processor. It subscribes to the Kinesis video stream, detects faces using the face collection, and streams the results to the Kinesis data stream.

aws rekognition start-stream-processor --name face-detector-rekognition-stream-processor

Check that the stream processor shows "Status": "RUNNING".

aws rekognition describe-stream-processor --name face-detector-rekognition-stream-processor | grep "Status"
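
The same check can be done from boto3; a small sketch like the following polls until the processor reaches RUNNING (the 5-second interval is an arbitrary choice):

import time

import boto3

rekognition = boto3.client('rekognition')
while True:
    # The status transitions from STARTING to RUNNING shortly after start-stream-processor.
    status = rekognition.describe_stream_processor(
        Name='face-detector-rekognition-stream-processor'
    )['Status']
    print(status)
    if status in ('RUNNING', 'FAILED'):
        break
    time.sleep(5)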

Capturing Faces

After the USB camera captures faces, the video data is processed in the following order:

  1. The video data will be streamed to the Kinesis Video Stream.
  2. The streamed data will be processed by the Rekognition Video stream processor.
  3. The stream processor results will be streamed to the Kinesis Data Stream.
  4. HLS urls will be generated by the Lambda function.

Check the Lambda function log records with the following command.

sam logs -n Function --stack-name face-detector-using-kinesis-video-streams --tail

The log records include stream processor event data like the following.

{
    "InputInformation": {
        "KinesisVideo": {
            "StreamArn": "arn:aws:kinesisvideo:<AWS_REGION>:<AWS_ACCOUNT_ID>:stream/face-detector-kinesis-video-stream/xxxxxxxxxxxxx",
            "FragmentNumber": "91343852333181501717324262640137742175000164731",
            "ServerTimestamp": 1702208586.022,
            "ProducerTimestamp": 1702208585.699,
            "FrameOffsetInSeconds": 0.0,
        }
    },
    "StreamProcessorInformation": {"Status": "RUNNING"},
    "FaceSearchResponse": [
        {
            "DetectedFace": {
                "BoundingBox": {
                    "Height": 0.4744676,
                    "Width": 0.29107505,
                    "Left": 0.33036956,
                    "Top": 0.19599175,
                },
                "Confidence": 99.99677,
                "Landmarks": [
                    {"X": 0.41322955, "Y": 0.33761832, "Type": "eyeLeft"},
                    {"X": 0.54405355, "Y": 0.34024307, "Type": "eyeRight"},
                    {"X": 0.424819, "Y": 0.5417343, "Type": "mouthLeft"},
                    {"X": 0.5342691, "Y": 0.54362005, "Type": "mouthRight"},
                    {"X": 0.48934412, "Y": 0.43806323, "Type": "nose"},
                ],
                "Pose": {"Pitch": 5.547308, "Roll": 0.85795176, "Yaw": 4.76913},
                "Quality": {"Brightness": 57.938313, "Sharpness": 46.0298},
            },
            "MatchedFaces": [
                {
                    "Similarity": 99.986176,
                    "Face": {
                        "BoundingBox": {
                            "Height": 0.417963,
                            "Width": 0.406223,
                            "Left": 0.28826,
                            "Top": 0.242463,
                        },
                        "FaceId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
                        "Confidence": 99.996605,
                        "ImageId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
                        "ExternalImageId": "iwasa",
                    },
                }
            ],
        }
    ],
}

You can also find the HLS URL in the log records, like the following.

https://x-xxxxxxxx.kinesisvideo.<AWS_REGION>.amazonaws.com/hls/v1/getHLSMasterPlaylist.m3u8?SessionToken=xxxxxxxxxx

To watch the on-demand video, open the URL in Safari or Edge, which can play HLS natively.
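
For a quick sanity check without a browser, fetching the master playlist is enough. The sketch below assumes the requests package is installed and the URL has not yet expired (it is valid for 300 seconds):

import requests

# A valid HLS session returns an #EXTM3U playlist document.
response = requests.get('<HLS_STREAMING_SESSION_URL>')
response.raise_for_status()
print(response.text.splitlines()[0])  # expected: #EXTM3U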

Cleaning Up

Clean up the provisioned AWS resources with the following commands.

aws rekognition stop-stream-processor --name face-detector-rekognition-stream-processor
sam delete

Pricing

Here is an example simulation based on the following conditions.

  • The AWS region ap-northeast-1 will be used.
  • The USB camera will continuously stream video data to the Kinesis video stream, at about 1 GB per day, stored for one day.
  • The Rekognition Video stream processor will continuously analyze the Kinesis video stream.
  • The face collection contains 10 faces.
  • The Kinesis data stream will have one shard provisioned at all times.
  • The Lambda function (Arm/128MB) will be invoked every second, with a billable duration of about 100 ms per invocation.
  • Users will watch on-demand video via HLS URLs for one hour per day.

Kinesis Video Streams

| Item | Price |
| --- | --- |
| Data Ingested into Kinesis Video Streams (per GB data ingested) | $0.01097 |
| Data Consumed from Kinesis Video Streams (per GB data egressed) | $0.01097 |
| Data Consumed from Kinesis Video Streams using HLS (per GB data egressed) | $0.01536 |
| Data Stored in Kinesis Video Streams (per GB-Month data stored) | $0.025 |

Simulation

| Item | Expression | Price |
| --- | --- | --- |
| Data Ingested | 1 GB * 31 days * $0.01097 | $0.34007 |
| Data Consumed | 1 GB * 31 days * $0.01097 | $0.34007 |
| Data Consumed using HLS | 1 hour * 31 days * $0.01536 | $0.47616 |
| Data Stored | 1 GB * 31 days * $0.025 | $0.775 |
| Total | - | $1.9 |

Rekognition Video

| Item | Price |
| --- | --- |
| Face Vector Storage | $0.000013/face metadata per month |
| Face Search | $0.15/min |

Simulation

| Item | Expression | Price |
| --- | --- | --- |
| Face Vector Storage | 10 faces * $0.000013 | $0.00013 |
| Face Search | 60 minutes * 24 hours * 31 days * $0.15 | $6,696 |
| Total | - | $6,696 |

Kinesis Data Streams

| Item | Price |
| --- | --- |
| Shard Hour (1MB/second ingress, 2MB/second egress) | $0.0195 |
| PUT Payload Units, per 1,000,000 units | $0.0215 |

Simulation

| Item | Expression | Price |
| --- | --- | --- |
| Shard Hour | 1 shard * 24 hours * 31 days * $0.0195 | $14.508 |
| PUT Payload Units | 1 unit * 60 seconds * 60 minutes * 24 hours * 31 days * $0.0215 / 1,000,000 | $0.0575856 |
| Total | - | $14.6 |

Lambda

| Item | Price |
| --- | --- |
| Price per 1ms (128MB) | $0.0000000017 |

Simulation

| Item | Expression | Price |
| --- | --- | --- |
| Price per 1ms | 100ms * 60 seconds * 60 minutes * 24 hours * 31 days * $0.0000000017 | $0.455328 |
| Total | - | $0.5 |
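
As a cross-check, the following sketch reproduces the arithmetic of the simulations above (prices as listed, 31-day month). The grand total comes to roughly $6,713 per month, dominated almost entirely by Rekognition face search.

# Rough reproduction of the pricing simulation above (ap-northeast-1, 31-day month).
# Prices are the ones listed in the tables and may change; verify against the official pricing pages.
DAYS = 31

kinesis_video_streams = (
    1 * DAYS * 0.01097     # data ingested (1 GB per day)
    + 1 * DAYS * 0.01097   # data consumed by the stream processor
    + 1 * DAYS * 0.01536   # data consumed via HLS (1 hour per day)
    + 1 * DAYS * 0.025     # data stored (1 GB, 1-day retention)
)
rekognition = 10 * 0.000013 + 60 * 24 * DAYS * 0.15        # face storage + face search
kinesis_data_streams = 1 * 24 * DAYS * 0.0195 + 60 * 60 * 24 * DAYS * 0.0215 / 1_000_000
lambda_cost = 100 * 60 * 60 * 24 * DAYS * 0.0000000017     # 100 ms per invocation, one per second

total = kinesis_video_streams + rekognition + kinesis_data_streams + lambda_cost
print(f'{total:,.2f} USD per month')  # roughly 6,713 USD, dominated by Rekognition face search
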
Takahiro Iwasa

Software Developer at KAKEHASHI Inc.
Involved in the requirements definition, design, and development of cloud-native applications using AWS. Now building a new prescription data collection platform at KAKEHASHI Inc. Japan AWS Top Engineers 2020-2023.