Keep S3 backup for 30 days using the aws s3 sync command

I have an S3 bucket which does not have versioning or lifecycle rules set, due to a client decision, and it contains data as old as 10 years. However, we also want to keep a backup of the files that have been worked on in the last 30 days.

I am planning to create a new S3 bucket, turn on versioning, and set a lifecycle rule to delete files older than 30 days. After that, I will run a cron job to do an aws s3 sync from the source bucket to the destination bucket.

So, files older than 30 days will get deleted from the destination bucket, which is fine. However, my concern is that the next aws s3 sync run will restore to the destination the old files that were deleted. Is that correct? If so, how do I resolve this and keep only the last 30 days of files?
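
For context, the setup I have in mind looks roughly like this in boto3 (the bucket name is a placeholder):

    import boto3

    s3 = boto3.client("s3")

    # Placeholder name for the new backup bucket.
    BACKUP_BUCKET = "my-backup-bucket"

    # Turn on versioning for the backup bucket.
    s3.put_bucket_versioning(
        Bucket=BACKUP_BUCKET,
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Expire objects 30 days after they land in the backup bucket.
    s3.put_bucket_lifecycle_configuration(
        Bucket=BACKUP_BUCKET,
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-after-30-days",
                    "Status": "Enabled",
                    "Filter": {"Prefix": ""},
                    "Expiration": {"Days": 30},
                }
            ]
        },
    )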

asked 2 days ago by shohkhan
  • I think you will need to find or create a more sophisticated tool than aws s3 sync, one that can ignore source files older than a certain date and ignore destination files that are absent (archived). – jarmod Commented 2 days ago
  • You could set up replication and set up your lifecycle rules. Live replication will only replicate new objects; it won't copy missing objects. Note that pretty much whatever you do, the lifecycle rule will only remove objects 30 days after you copy them, not 30 days after they were created in the primary bucket. – Anon Coward Commented 2 days ago
  • I don't think your plan will work. When you sync the objects to a new bucket, they will all have a Modification Date of 'now', so the lifecycle rule won't delete any. If your goal is to keep the existing bucket but only have the 'new' bucket contain 'recent' files, then use S3 Replication. By default it will not copy any historical objects, but it will copy any new objects created. You can then find a way to copy the objects from the last 30 days. – John Rotenstein Commented yesterday
  • @JohnRotenstein and AnonCoward, I have looked into replication before. However, the issue is that it requires versioning to be turned on, and the client's multiple teams update many files tens to hundreds of times a day. That would mean each of the recent files would have a lot of copies, and that becomes expensive for us and the client. So we do not have versioning on. However, if I turn versioning on and set the lifecycle rule for noncurrent versions to zero days, does that mean no versions will be created? That would solve it, but I have not seen explicit docs on this. – shohkhan Commented yesterday
  • Versions would be created, but the lifecycle rules would delete them each day. – John Rotenstein Commented yesterday
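
What the last two comments describe, a rule that removes noncurrent versions as quickly as possible, could look roughly like this in boto3 (the bucket name is a placeholder; noncurrent versions cannot be expired after zero days, one day is the minimum):

    import boto3

    s3 = boto3.client("s3")

    # Placeholder name for the versioned backup bucket.
    BACKUP_BUCKET = "my-backup-bucket"

    # Remove overwritten (noncurrent) versions one day after they become
    # noncurrent; 1 is the smallest value the API accepts.
    # Note: this call replaces any existing lifecycle configuration on the
    # bucket, so any other rules must be included in the same request.
    s3.put_bucket_lifecycle_configuration(
        Bucket=BACKUP_BUCKET,
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-noncurrent-versions",
                    "Status": "Enabled",
                    "Filter": {"Prefix": ""},
                    "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
                }
            ]
        },
    )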

2 Answers

This isn't possible with aws s3 sync, but it would be a perfect use case for S3 Event Notifications.

You can use the Amazon S3 Event Notifications feature to receive notifications when certain events happen in your S3 bucket. To enable notifications, add a notification configuration that identifies the events that you want Amazon S3 to publish. Make sure that it also identifies the destinations where you want Amazon S3 to send the notifications. You store this configuration in the notification subresource that's associated with a bucket.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/EventNotifications.html

The idea would be to set up an event notification for object creation or modification on bucket A, and have that notification trigger a Lambda function which copies the object from bucket A to bucket B.
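
A minimal, untested sketch of that Lambda function (the destination bucket is passed in via an environment variable, which is an assumption here, not something from the question):

    import os
    import urllib.parse

    import boto3

    s3 = boto3.client("s3")

    # Assumed environment variable holding the backup bucket name.
    DEST_BUCKET = os.environ["DEST_BUCKET"]


    def lambda_handler(event, context):
        # Each record describes one object-created event from bucket A.
        for record in event["Records"]:
            src_bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in the notification payload.
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

            # Copy the new or modified object to bucket B under the same key.
            s3.copy_object(
                Bucket=DEST_BUCKET,
                Key=key,
                CopySource={"Bucket": src_bucket, "Key": key},
            )

The lifecycle rule on bucket B then expires each copy 30 days after it arrives there.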

You could build your own replication.

  • Create an Amazon S3 Event on the S3 bucket that triggers an AWS Lambda function whenever a new object is created
  • Code the AWS Lambda function to copy the object to the other bucket

This way, you will not require versioning, and you can also add logic to only copy objects in particular paths or with particular extensions (e.g. just .csv files), as sketched at the end of this answer.

The code is very simple, see: AWS-Lambda function (Python) to copy file from S3 - perform manipulation - store output in another S3 - Stack Overflow
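
To wire up the trigger itself, the notification configuration could look roughly like this (a sketch only; the bucket name and function ARN are placeholders, the suffix filter illustrates the ".csv only" idea mentioned above, and the Lambda must already allow invocation by s3.amazonaws.com):

    import boto3

    s3 = boto3.client("s3")

    # Placeholder values for the source bucket and the copy function.
    SOURCE_BUCKET = "my-source-bucket"
    COPY_FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:copy-to-backup"

    # Invoke the copy function for every new object whose key ends in .csv.
    s3.put_bucket_notification_configuration(
        Bucket=SOURCE_BUCKET,
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [
                {
                    "Id": "copy-csv-to-backup",
                    "LambdaFunctionArn": COPY_FUNCTION_ARN,
                    "Events": ["s3:ObjectCreated:*"],
                    "Filter": {
                        "Key": {
                            "FilterRules": [
                                {"Name": "suffix", "Value": ".csv"}
                            ]
                        }
                    },
                }
            ]
        },
    )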
