admin管理员组文章数量:1314574
Sample documents of a collection
DB: MYDB
Collection: MYCOLL
{
"_id": {
"$oid": "678a78375cb6955814197272"
},
"ID": 1019397,
"INFO": "A=5242;AF=987;C=82622"
}
{
"_id": {
"$oid": "678a78375cb6955814197272"
},
"ID": 1019397,
"INFO": "A=h242;AF=9y87;C=8w622"
}
I would like to have expected output as following:
{
"_id": {
"$oid": "678a78375cb6955814197272"
},
"ID": 1019397,
"INFO": "A=5242;AF=987;C=82622",
"A":"5242",
"AF":"987",
"C":"82622"
}
I have tried to do it myself, but for now I created the following aggregate code for a specific value in CSV. which is not what I want to do. Can you please tell me how can I make it better in performance and create (updating the current collection document) in the form as I showed above?
db["MYCOLL"].aggregate([
{
$project: {
INFO_ARR: {
$split:["$INFO",";"]
}
}
},
{
$project: {
AF_ARR: {
$arrayElemAt: [ "$INFO_ARR" , 1 ] }
}
},
{
$project: {
AF_FREQ_ARR: {
$split: [ "$AF_ARR" , "=" ]}
}
},
{
$project: {
AF: {
$arrayElemAt: [ "$AF_FREQ_ARR" , 1 ] }
}
},
{ $merge : { into: { db: "MYDB", coll: "MYCOLL" }, on: "_id", whenMatched: "merge", whenNotMatched: "insert" } }
])
Note: I can not use javascript as the collection is of size 400GB and is having millions of records. Hence, Javascript may be too slow to update all of the records. Am I wrong about this ?
Sample documents of a collection
DB: MYDB
Collection: MYCOLL
{
"_id": {
"$oid": "678a78375cb6955814197272"
},
"ID": 1019397,
"INFO": "A=5242;AF=987;C=82622"
}
{
"_id": {
"$oid": "678a78375cb6955814197272"
},
"ID": 1019397,
"INFO": "A=h242;AF=9y87;C=8w622"
}
I would like to have expected output as following:
{
"_id": {
"$oid": "678a78375cb6955814197272"
},
"ID": 1019397,
"INFO": "A=5242;AF=987;C=82622",
"A":"5242",
"AF":"987",
"C":"82622"
}
I have tried to do it myself, but for now I created the following aggregate code for a specific value in CSV. which is not what I want to do. Can you please tell me how can I make it better in performance and create (updating the current collection document) in the form as I showed above?
db["MYCOLL"].aggregate([
{
$project: {
INFO_ARR: {
$split:["$INFO",";"]
}
}
},
{
$project: {
AF_ARR: {
$arrayElemAt: [ "$INFO_ARR" , 1 ] }
}
},
{
$project: {
AF_FREQ_ARR: {
$split: [ "$AF_ARR" , "=" ]}
}
},
{
$project: {
AF: {
$arrayElemAt: [ "$AF_FREQ_ARR" , 1 ] }
}
},
{ $merge : { into: { db: "MYDB", coll: "MYCOLL" }, on: "_id", whenMatched: "merge", whenNotMatched: "insert" } }
])
Note: I can not use javascript as the collection is of size 400GB and is having millions of records. Hence, Javascript may be too slow to update all of the records. Am I wrong about this ?
Share Improve this question asked Jan 30 at 12:57 ShahanShahan 491 silver badge7 bronze badges 02 Answers
Reset to default 3After splitting the array on semi-colons ;
, split each item of that result on =
. That gives an array of pairs like [ ["A", "5242"], ["AF", "987"], ["C", "82622"]
.
Then use arrayToObject
which will treat the 1st item as the key and the 2nd item as the value. After that, merge with Root and remove the temp array.
I've done the transformations as separate steps but you can combine them into one:
db.mycoll.aggregate([
{
$set: {
INFO_ARR: { $split: ["$INFO", ";"] }
}
},
{
$set: {
INFO_ARR: {
$map: {
input: "$INFO_ARR",
in: { $split: ["$$this", "="] }
}
}
}
},
{
$set: {
INFO_ARR: { $arrayToObject: "$INFO_ARR" }
}
},
{
$replaceWith: {
$mergeObjects: ["$$ROOT", "$INFO_ARR"]
}
},
{
$unset: "INFO_ARR"
},
{
$merge: {
into: {
db: "MYDB",
coll: "MYCOLL"
},
on: "_id",
whenMatched: "merge",
whenNotMatched: "insert"
}
}
])
Mongo Playground
Also, this can be done as an updateMany
query since every doc needs to be updated anyway: Mongo Playground with update & aggregation syntax
You can use the following aggregation pipeline to split the contents of the INFO
property and add the values as properties:
[
{
$project: {
INFO: 1
}
},
{
$set: {
info_arr: {
"$split": [
"$INFO",
";"
]
}
}
},
{
$set: {
info_arr2: {
"$map": {
"input": "$info_arr",
"in": {
k: {
$first: {
$split: [
"$$this",
"="
]
}
},
v: {
$last: {
$split: [
"$$this",
"="
]
}
}
}
}
}
}
},
{
$set: {
info_obj: {
"$arrayToObject": "$info_arr2"
}
}
},
{
"$replaceRoot": {
"newRoot": {
"$mergeObjects": [
"$$ROOT",
"$info_obj"
]
}
}
},
{
$unset: [
"info_arr",
"info_arr2",
"info_obj"
]
}
]
- The pipeline first splits the field content at the semicolons, then splits the individual lines at the equality sign and assigns the key to a
k
field, the value to av
field. - At this point, the
$arrayToObject
operator is used to create an object from the array contents. - Afterwards, the properties of the newly created subdocument are merged with the
$$ROOT
document. - At the end, the temporary properties are removed using an
$unset
stage.
At the end, the documents look like this:
{
"A": "5242",
"AF": "987",
"C": "82622",
"INFO": "A=5242;AF=987;C=82622",
"_id": ObjectId("5a934e000102030405000001")
}
If you add a $merge
stage, you can add the new properties to existing documents in the collection.
See this playground to test.
本文标签:
版权声明:本文标题:Converting csv (";" based) string value of a field to new key:value items in same document in Mongodb aggregat 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1741964048a2407443.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
发表评论