How to Enable Partition Projection in AWS Athena

Athena Partition Projection, introduced in June 2020, eliminates the need to run MSCK REPAIR TABLE
to add new partitions manually.
Building Backend
The key point in the CloudFormation template is the GlueTable.TableInput.Parameters
section of the GlueTable
definition (lines 27–35). This section enables Partition Projection with the following important properties:
projection.enabled
: Enables partition projection.projection.year_month.format
: Specifies the date format for partitions.projection.year_month.range
: Defines the range for partitions, supportingNOW
with offsets (e.g.,NOW+9HOURS
).
The projection.year_month.range
property is treated as UTC. If you intend to use a different timezone, you need to specify the time difference using a format like NOW+9HOURS
.
For more details, please refer to the official documentation.
AWSTemplateFormatVersion: "2010-09-09"Description: Stack for Athena partition projection sampleResources: S3: Type: AWS::S3::Bucket Properties: BucketName: athena-partition-projection-logs BucketEncryption: ServerSideEncryptionConfiguration: - ServerSideEncryptionByDefault: SSEAlgorithm: AES256
GlueDatabase: Type: AWS::Glue::Database Properties: DatabaseInput: Name: sample CatalogId: !Ref AWS::AccountId
GlueTable: Type: AWS::Glue::Table Properties: DatabaseName: !Ref GlueDatabase CatalogId: !Ref AWS::AccountId TableInput: TableType: EXTERNAL_TABLE Parameters: classification: json "projection.enabled": true "projection.year_month.format": yyyy/MM "projection.year_month.interval": 1 "projection.year_month.interval.unit": MONTHS "projection.year_month.range": 2021/09,NOW "projection.year_month.type": date "storage.location.template": s3://athena-partition-projection-logs/${year_month} StorageDescriptor: Columns: - Name: id Type: int - Name: message Type: string Location: !Sub s3://${S3}/ InputFormat: org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: false NumberOfBuckets: 0 SerdeInfo: SerializationLibrary: org.openx.data.jsonserde.JsonSerDe Parameters: paths: id,message StoredAsSubDirectories: false PartitionKeys: - Name: year_month Type: string Retention: 0 Name: sample_logs
Deploy the CloudFormation stack with the following command:
aws cloudformation deploy \ --template-file stack.yml \ --stack-name athena-partition-projection-sample
Uploading Data
Upload data to the S3 bucket for testing:
echo '{"id": 1, "message": "hello"}' > 2021-09.jsonecho '{"id": 2, "message": "world"}' > 2021-10.jsonaws s3 cp 2021-09.json s3://athena-partition-projection-logs/2021/09/aws s3 cp 2021-10.json s3://athena-partition-projection-logs/2021/10/
Verify that the files are uploaded:
aws s3 ls s3://athena-partition-projection-logs/2021/
Querying Data
Run the following SQL query to fetch data from the 2021/09
partition:
SELECT * FROM "sample"."sample_logs"WHERE year_month = '2021/09'LIMIT 10;
Expected result:
1 hello 2021/09
Similarly, query the 2021/10
partition:
SELECT * FROM "sample"."sample_logs"WHERE year_month = '2021/10'LIMIT 10;
Expected result:
2 world 2021/10
Cleaning Up
Clean up all the AWS resources provisioned during this example with the following command:
aws s3 rm --recursive s3://athena-partition-projection-logsaws cloudformation delete-stack --stack-name athena-partition-projection-sample