I’m new to AWS Glue but want to use it to ingest large amounts of data from a CSV file stored in S3 into a PostgreSQL database.

The data is client-provided and contains a mix of “required” and “optional” fields. Optional fields can be anything the client needs (e.g. client 1 provides a field called “telephone_num”, whereas client 2 might provide a field called “gross_annual_profit”). Required fields, as specified by the schema, are “customer_ref”, “first_name”, “last_name”, etc.

For example:

Client 1 would provide a CSV file as follows:

customerRef,firstName,lastName,telephoneNum,position
'CUST1234','Adam','Ant','07777777777','Lead Singer'
'CUST9876','Hank','Marvin','07777777778','Guitarist'

I would need to transform this into the following PostgreSQL rows:

customer_ref,first_name,last_name,additional_data
'CUST1234','Adam','Ant','{"telephoneNum":"07777777777","position":"Lead Singer"}'
'CUST9876','Hank','Marvin','{"telephoneNum":"07777777778","position":"Guitarist"}'

Client 2 would provide a CSV file as follows:

customerRef,firstName,lastName,oscarNominations,oscarWins
'STAR1234','Tom','Hanks','6','2'
'STAR9876','Kate','Winslet','7','1'

I would need to transform this into the following PostgreSQL rows:

customer_ref,first_name,last_name,additional_data
'STAR1234','Tom','Hanks','{"oscarNominations":"6","oscarWins":"2"}'
'STAR9876','Kate','Winslet','{"oscarNominations":"7","oscarWins":"1"}'

I need to collect the optional fields, which are unknown to the schema, into a JSON block and store it as a string in the PostgreSQL database, in the same row as the required fields.
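To make the intended mapping concrete, here is a minimal sketch in plain Python using only the standard library. It is not Glue-specific and is not a tested Glue job. The `REQUIRED` mapping and the function names `to_row` and `transform` are hypothetical, chosen for illustration, and the single-quote quoting is assumed from the sample files above.

```python
import csv
import io
import json

# Assumed mapping from the CSV's camelCase headers to the required schema columns.
REQUIRED = {
    "customerRef": "customer_ref",
    "firstName": "first_name",
    "lastName": "last_name",
}


def to_row(record):
    """Split one CSV record into the required columns plus a JSON string of the rest."""
    row = {column: record[header] for header, column in REQUIRED.items()}
    # Anything not named in REQUIRED is treated as an optional, client-specific field.
    extras = {key: value for key, value in record.items() if key not in REQUIRED}
    row["additional_data"] = json.dumps(extras, separators=(",", ":"))
    return row


def transform(csv_text):
    """Parse single-quoted CSV text and return one dict per output row."""
    reader = csv.DictReader(io.StringIO(csv_text), quotechar="'")
    return [to_row(record) for record in reader]


sample = (
    "customerRef,firstName,lastName,telephoneNum,position\n"
    "'CUST1234','Adam','Ant','07777777777','Lead Singer'\n"
    "'CUST9876','Hank','Marvin','07777777778','Guitarist'\n"
)
rows = transform(sample)
for row in rows:
    print(row)
```

Because the optional columns are found by exclusion rather than by name, the same function would handle Client 2's file without changes. Where this per-record logic would live inside a Glue job (for example a function applied to each record of a DynamicFrame) is the open part of the question and is not shown here.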

Is this possible using Glue?

I have spent considerable time researching this but can’t find anything close to my use case. I have also created a simple Glue job, but can’t find a “stock” transformation that does what I’m trying to achieve. TIA

Tags: aws glue · Handling columns in CSV not known by the schema · Stack Overflow