
I have the following CSV, published by a third party, in which the values for a specific column contain a comma (for some unexplainable reason). The value for that column is either absent or enclosed in square brackets and double quotes, as it represents a range.

Following are some such records from the CSV:

A,B
xxxxxxxxx,"['05-01', '06-30']"
yyyyyyyyy,"['04-01', '04-30']"
zzzzzzzzz,

The culprit is obviously the second column. Is there a way to parse this CSV correctly in Apache Spark (Scala) so as to obtain the following dataframe:

+---------+--------------------+
|A        |B                   |
+---------+--------------------+
|xxxxxxxxx|"['05-01', '06-30']"|
|yyyyyyyyy|"['04-01', '04-30']"|
|zzzzzzzzz|null                |
+---------+--------------------+

  • This question is similar to: Parsing .csv file using Java 8 Stream. If you believe it's different, please edit the question to make it clear how it's different and/or how the answers to that question are not helpful for your problem. – talex
  • The suggested question doesn't talk about escaping delimiters. I didn't ask about a library to parse CSV but about dealing with an uncanny delimiter inside the column value. – ashish.g

1 Answer


The default values of the delimiter and quote options allow you to parse the given CSV correctly:

scala> scala.io.Source.fromFile("source.csv").mkString
res2: String =
"A,B
xxxxxxxxx,"['05-01', '06-30']"
yyyyyyyyy,"['04-01', '04-30']"
zzzzzzzzz,
"

scala> val df = spark.read.option("header", "true").csv("source.csv")
df: org.apache.spark.sql.DataFrame = [A: string, B: string]

scala> df.show()
+---------+------------------+
|        A|                 B|
+---------+------------------+
|xxxxxxxxx|['05-01', '06-30']|
|yyyyyyyyy|['04-01', '04-30']|
|zzzzzzzzz|              NULL|
+---------+------------------+

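For clarity, the same read with the relevant options spelled out (a minimal sketch; delimiter and quote are standard Spark CSV reader options, and the values shown are already Spark's defaults, so this is equivalent to the call above):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("csv-quotes").getOrCreate()

// "delimiter" (alias of "sep") and "quote" default to "," and "\"" respectively,
// so this reads exactly like spark.read.option("header", "true").csv(...)
val df = spark.read
  .option("header", "true")
  .option("delimiter", ",") // field separator (default)
  .option("quote", "\"")    // quoting character (default)
  .csv("source.csv")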

NOTE that the value of B does not have the double quotes around it. This is the correct interpretation of the given CSV content per the CSV format (RFC 4180):

  1. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:

    "aaa","b CRLF bb","ccc" CRLF zzz,yyy,xxx
