
I am taking over a website that is about 20 years old. It has hundreds of thousands of images in folders right on the web server, taking up close to 100 GB. The folder structure is quite hierarchical and I'd rather not modify it too much. The database design isn't great, and currently, to display an image, there is simple PHP code that concatenates the various folder metadata and the file name, e.g. https://website/country/county/cemetery/filename.

The above has become extremely messy, and there are many broken links because typos are constantly being corrected (thereby changing the path), files are moved to different folders, and so on.

I was hoping to leverage AWS S3 for a few reasons, though this is my first time trying it. The API is very easy to work with; I already have about a dozen endpoints working, such as creating new folders and renaming files.

My question came about when I realized there doesn't seem to be a unique ID as part of the folder metadata in S3. The object key, which all the documentation says to use to identify an object, can actually change with any object modification.

I think I might have a way forward because I know some of the issues the clients are facing. In part, the database and the folder contents seem to be out of sync, meaning files have been uploaded or deleted without the database being updated.

I was thinking that I would create a new MS SQL database (because this is what I am familiar with), and the front end will be in Blazor. I thought I could store the unique database column IDs as tags in S3.

If my Image table has columns for a Country FK and a Cemetery FK, then I could store these in the S3 tags collection.

This way, I could query the DB for all images with a given Cemetery ID, and query the API for all tags that equal that Cemetery ID, which should allow me to spot any anomalies. I also ought to be able to maintain the tags more easily whenever something is modified, by querying for the old value and updating to the new value.
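As a sketch of that reconciliation step (in Python purely for illustration, rather than the Blazor/.NET stack; the two ID lists are assumed to have already been fetched from the database and from the S3 tag queries):

```python
def find_anomalies(db_image_ids, s3_image_ids):
    """Compare image IDs known to the database with IDs found in S3 tags
    for the same cemetery, and report differences in both directions."""
    db, s3 = set(db_image_ids), set(s3_image_ids)
    missing_in_s3 = sorted(db - s3)   # DB rows with no matching object
    missing_in_db = sorted(s3 - db)   # objects with no matching DB row
    return missing_in_s3, missing_in_db

# Hypothetical IDs for one cemetery:
print(find_anomalies([101, 102, 103], [102, 103, 104]))  # ([101], [104])
```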

This should also then allow me to display the images using the GetPreSignedUrl function, building the key by string interpolation from the tags, e.g.:

var objectKey = $"Country/{CountryTag}/{CemeteryTag}/{Filename}"; // bucket name is passed separately, not as part of the key
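The same key construction as a Python sketch (the country/cemetery values are made up); note that the bucket name is a separate parameter to the presigned-URL call rather than part of the object key:

```python
def object_key(country_tag, cemetery_tag, filename):
    # Build the S3 object key from the tag values. The bucket name is
    # NOT part of the key -- it is a separate argument when signing.
    return f"Country/{country_tag}/{cemetery_tag}/{filename}"

key = object_key("Canada", "Hillcrest", "IMG_1372.jpeg")
# key == "Country/Canada/Hillcrest/IMG_1372.jpeg"
# With the AWS SDK this key would then be signed, e.g. with boto3:
#   s3.generate_presigned_url("get_object",
#       Params={"Bucket": "my-bucket", "Key": key}, ExpiresIn=3600)
```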

The overall question is: is there a better way to do this, or am I just discovering something everyone already knows?

Thanks for your ideas.

Asked Feb 10 at 20:58 by Tim Cadieux
  • My advice is this: don't try to mediate between an old system and a clean new one. Start fresh with a well-normalized database, and separate concerns. It shouldn't really matter HOW the images are stored -- maybe on the existing file system using just folder names to find files, maybe with GUIDs on a CDN, whatever. So long as you can map whatever mess exists now onto your new system, you're set. – Bennyboy1973 Commented Feb 10 at 21:27
  • 2 "Object Key, which all the documentation says to use to identify an object can actually change with any object modification" No, it can't. The object key uniquely identifies an S3 object. It never changes. – Anon Coward Commented Feb 10 at 21:36
  • The object key can never change. What can happen is that you delete an object at key X and place it at a different key Y. In that regard, S3 cannot help you if you have to constantly fix and move files around manually. – luk2302 Commented Feb 10 at 22:00
  • If I look at an object key right now, it looks like this: Volunteers/Test/IMG_1372.jpeg. If I rename Test to XXX, the object key is now Volunteers/xxx/IMG_1372.jpeg. I realize that it's a NEW object key, but the overall thought is the same: it means I cannot use it as the stable basis for this image. – Tim Cadieux Commented Feb 10 at 22:02
  • A rename in S3 is a copy and delete. The copy creates a new object and the delete removes the old one. S3 is not a file system, and treating it like one will lead to confusion. S3 object keys never change. – Anon Coward Commented Feb 10 at 22:04

1 Answer


Amazon S3 is an object store. It supports the concept of folders, but the reality is that folders do not exist.

If you store an object called Volunteers/Test/IMG_1372.jpeg, then the Key of the object contains the full path. It doesn't actually get placed inside a Volunteers or Test folder -- rather, it is simply named Volunteers/Test/IMG_1372.jpeg. Some API calls let you navigate by folder level, but this is only provided as a convenience.
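A small Python sketch of what that convenience amounts to: grouping flat keys by prefix, the way ListObjectsV2 does when given a Delimiter (the keys here are made up):

```python
def common_prefixes(keys, prefix="", delimiter="/"):
    """Mimic ListObjectsV2's CommonPrefixes: group flat object keys by the
    next 'folder' level under the given prefix. No folders actually exist."""
    out = set()
    for k in keys:
        if k.startswith(prefix):
            rest = k[len(prefix):]
            if delimiter in rest:
                out.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
    return sorted(out)

keys = ["Volunteers/Test/IMG_1372.jpeg",
        "Volunteers/Test/IMG_1373.jpeg",
        "Volunteers/Other/IMG_0001.jpeg"]
print(common_prefixes(keys, "Volunteers/"))  # ['Volunteers/Other/', 'Volunteers/Test/']
```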

Once the number of objects grows large (e.g. 10,000+), you might find it easier if you do not use the key hierarchy as a database. That is, do not assign properties or attributes to objects based on their location in a hierarchy. Instead:

  • Store objects with a UUID as the name, with all objects in the same "folder"
  • The name and location of the object have no significance -- the name is just an identifier
  • Use a database to keep track of each object and its attributes, or, more likely, use your existing database and just add the object's UUID to it
  • Do not store information in object tags -- tags must be retrieved individually, and it isn't possible to query tags across multiple objects. Put that information in your database instead.
  • There is never a need to rename an object -- just update the relevant information in the database
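The bullet points above can be sketched as follows (Python with an in-memory SQLite table standing in for the real MS SQL database; the table and column names are made up):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Images (
    ImageId   INTEGER PRIMARY KEY,
    ObjectKey TEXT NOT NULL UNIQUE,  -- e.g. 'images/<uuid>.jpeg', never renamed
    CountryId INTEGER,
    CemeteryId INTEGER,
    OriginalFilename TEXT)""")

def register_image(original_filename, country_id, cemetery_id):
    # The S3 key is a flat, meaningless UUID; everything searchable
    # (country, cemetery, original name) lives only in the database.
    key = f"images/{uuid.uuid4()}.jpeg"
    conn.execute(
        "INSERT INTO Images (ObjectKey, CountryId, CemeteryId, OriginalFilename) "
        "VALUES (?, ?, ?, ?)",
        (key, country_id, cemetery_id, original_filename))
    return key

key = register_image("IMG_1372.jpeg", country_id=1, cemetery_id=7)
# "Moving" the image to another cemetery touches only the database row,
# never the S3 object:
conn.execute("UPDATE Images SET CemeteryId = ? WHERE ObjectKey = ?", (9, key))
```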

The benefit of this method is that the database and folder contents won't drift out of sync because all information is kept in the database, rather than in the name of the files and folders. The disadvantage is that you can't look at an object and know what it is without consulting the database.

There is actually a new Amazon S3 feature that can provide a database table populated with information about the objects stored in an S3 bucket, including tags. The table is automatically kept up to date by S3. I don't think it is a good match for your particular situation, but I'm mentioning it here since it is useful in situations where there are 100k+ objects in a bucket. See: Introducing queryable object metadata for Amazon S3 buckets (preview) | AWS News Blog

Tags: amazon-s3 | What metadata should I track in my database to keep track of my images stored in S3 - Stack Overflow