I am using the following aggregation pipeline in MongoDB to traverse a two-way graph:

{ $match: { from: "some_root" } },
{
  $graphLookup: {
    from: "connections",
    startWith: "$to",
    connectFromField: "to",
    connectToField: "from",
    as: "children",
    depthField: "depth",
    maxDepth: 100
  },
}

With my data structure, this results in zero or more output documents, all of which have the same array of connection documents in their children field. I would like to process those connection documents further without having to handle the duplicates.

How can I select only the first document's children array and promote its elements to output documents for the pipeline's next stages?

Adding the following stages results in what I want but with duplicate documents for each output document of the $graphLookup stage:

{ $unwind: { path: "$children" } },
{ $replaceRoot: { newRoot: "$children" } }

To me, this seems like the semantically correct approach; all that is missing is a preceding stage that "selects" the children field of the first document only.

I am aware that a $group stage might be able to filter out duplicates but that feels like a convoluted solution to a simple problem (i.e., why filter out duplicates if you can avoid having those duplicates in the first place).

Pseudo-example for clarification

The $graphLookup stage results in:

{
  _id: 1,
  children: [
    { _id: 10 },
    { _id: 11 }
  ]
},
{
  _id: 2,
  children: [
    { _id: 10 },
    { _id: 11 }
  ]
}

From this, I want to extract one set of children and promote them to documents for later stages in the pipeline, i.e., to have as input documents:

{ _id: 10 },
{ _id: 11 }

Asked yesterday by basse

Comments:
  • Add { $limit: 1 } before $unwind? But is it possible that an object appears in the children array of other documents and not in that of the first document, and do you want those objects? – Yong Shun, commented yesterday
  • Wow, so simple. How could I not think of this :D Thank you very much! If you want to write that up as an answer, I'll accept. And to answer your question: No, in my case the children arrays will always be identical. (That is probably a sign that I should optimize the pipeline further.) – basse, commented yesterday

1 Answer

Writing up my comment as an answer, as requested.

Add a $limit stage that keeps a single document before the $unwind stage, since the post owner needs only the objects in the children array of the first document returned by the previous stage.

[
  ...,  // Previous stages
  { $limit: 1 },
  { $unwind: { path: "$children" } },
  { $replaceRoot: { newRoot: "$children" } }
]
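
To see the effect on the pseudo-example from the question without a database, here is a minimal plain JavaScript sketch that mimics the three stages on an in-memory array. The helpers limit, unwind and replaceRoot are hypothetical stand-ins written for this illustration, not MongoDB APIs, and they only approximate the real stage semantics.

```javascript
// Hypothetical in-memory stand-ins for the pipeline stages (not MongoDB APIs).

// Like { $limit: n }: keep only the first n documents.
const limit = (docs, n) => docs.slice(0, n);

// Roughly like { $unwind: { path: "$children" } }: one output document per
// array element. Simplification: a missing or non-array field yields no output.
const unwind = (docs, field) =>
  docs.flatMap((doc) =>
    (Array.isArray(doc[field]) ? doc[field] : []).map((item) => ({
      ...doc,
      [field]: item,
    }))
  );

// Like { $replaceRoot: { newRoot: "$children" } }: promote the field to the root.
const replaceRoot = (docs, field) => docs.map((doc) => doc[field]);

// Output of the $graphLookup stage, as given in the question's pseudo-example.
const graphLookupOutput = [
  { _id: 1, children: [{ _id: 10 }, { _id: 11 }] },
  { _id: 2, children: [{ _id: 10 }, { _id: 11 }] },
];

// Without $limit, every parent contributes its own copy of the children.
const withoutLimit = replaceRoot(
  unwind(graphLookupOutput, "children"),
  "children"
);

// With $limit: 1, only the first parent's children are promoted.
const withLimit = replaceRoot(
  unwind(limit(graphLookupOutput, 1), "children"),
  "children"
);

console.log(withoutLimit.length); // 4
console.log(withLimit); // [ { _id: 10 }, { _id: 11 } ]
```

Note that this relies on the children arrays being identical across documents, as confirmed in the comments. If they could differ, limiting to one document would silently drop children that appear only in later documents.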
