How to Enable Partition Projection in AWS Athena

How to Enable Partition Projection in AWS Athena

Takahiro Iwasa
Takahiro Iwasa
2 min read
Athena

Athena Partition Projection, introduced in June 2020, eliminates the need to run MSCK REPAIR TABLE to add new partitions manually.

Building Backend

The key point in the CloudFormation template is the GlueTable.TableInput.Parameters section of the GlueTable definition (lines 27–35). This section enables Partition Projection with the following important properties:

  • projection.enabled: Enables partition projection.
  • projection.year_month.format: Specifies the date format for partitions.
  • projection.year_month.range: Defines the range for partitions, supporting NOW with offsets (e.g., NOW+9HOURS).
💡 Tip

The projection.year_month.range property is treated as UTC. If you intend to use a different timezone, you need to specify the time difference using a format like NOW+9HOURS.

For more details, please refer to the official documentation.

stack.yml
AWSTemplateFormatVersion: "2010-09-09"
Description: Stack for Athena partition projection sample
Resources:
S3:
Type: AWS::S3::Bucket
Properties:
BucketName: athena-partition-projection-logs
BucketEncryption:
ServerSideEncryptionConfiguration:
- ServerSideEncryptionByDefault:
SSEAlgorithm: AES256
GlueDatabase:
Type: AWS::Glue::Database
Properties:
DatabaseInput:
Name: sample
CatalogId: !Ref AWS::AccountId
GlueTable:
Type: AWS::Glue::Table
Properties:
DatabaseName: !Ref GlueDatabase
CatalogId: !Ref AWS::AccountId
TableInput:
TableType: EXTERNAL_TABLE
Parameters:
classification: json
"projection.enabled": true
"projection.year_month.format": yyyy/MM
"projection.year_month.interval": 1
"projection.year_month.interval.unit": MONTHS
"projection.year_month.range": 2021/09,NOW
"projection.year_month.type": date
"storage.location.template": s3://athena-partition-projection-logs/${year_month}
StorageDescriptor:
Columns:
- Name: id
Type: int
- Name: message
Type: string
Location: !Sub s3://${S3}/
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: false
NumberOfBuckets: 0
SerdeInfo:
SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
Parameters:
paths: id,message
StoredAsSubDirectories: false
PartitionKeys:
- Name: year_month
Type: string
Retention: 0
Name: sample_logs

Deploy the CloudFormation stack with the following command:

Terminal window
aws cloudformation deploy \
--template-file stack.yml \
--stack-name athena-partition-projection-sample

Uploading Data

Upload data to the S3 bucket for testing:

Terminal window
echo '{"id": 1, "message": "hello"}' > 2021-09.json
echo '{"id": 2, "message": "world"}' > 2021-10.json
aws s3 cp 2021-09.json s3://athena-partition-projection-logs/2021/09/
aws s3 cp 2021-10.json s3://athena-partition-projection-logs/2021/10/

Verify that the files are uploaded:

Terminal window
aws s3 ls s3://athena-partition-projection-logs/2021/

Querying Data

Run the following SQL query to fetch data from the 2021/09 partition:

SELECT * FROM "sample"."sample_logs"
WHERE year_month = '2021/09'
LIMIT 10;

Expected result:

1 hello 2021/09

Similarly, query the 2021/10 partition:

SELECT * FROM "sample"."sample_logs"
WHERE year_month = '2021/10'
LIMIT 10;

Expected result:

2 world 2021/10

Cleaning Up

Clean up all the AWS resources provisioned during this example with the following command:

Terminal window
aws s3 rm --recursive s3://athena-partition-projection-logs
aws cloudformation delete-stack --stack-name athena-partition-projection-sample
Takahiro Iwasa

Takahiro Iwasa

Software Developer
Involved in the requirements definition, design, and development of cloud-native applications using AWS. Japan AWS Top Engineers 2020-2023.