admin管理员组

文章数量:1403506

I need to remove everything thing starting with "[@" and ending with "]" in the given XML. The text that falls in-between these sqaure brackers should also be removed. I tried the replace function but it turned the XML into a String and did not remove the full string including [@ and ]. Please see below sample input and expected output;

Sample input. See the [@ inside the < differences> tag:

<?xml version='1.0' encoding='UTF-8'?>
<ns0:trans xmlns:ns0="/1.0" id="1495144" TPId="4aec" change="0" count="1" dateStamp="2024-08-02T03:07:48" effectiveDate="2024-08-01" transId="12447055">
    <data id="FF2AD08D621">
        <StateFlag>0</StateFlag>
        <DCC>B</DCC>
        <LastName>MMNGorthy</LastName>
        <Name>Scott</Name>
        <Structure>NonProfit</Structure>
        <BusinessType>Engineering</BusinessType>
        <meams>AI</meams>
        <Count>0</Count>
        <CTier>000</CTier>
        <CTCond>111</CTCond>
        <CTPl>222</CTPl>
        <DO>1</DO>
        <AuthObject>0</AuthObject>
    </data>
    <differences ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce">
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@dateModified" fullPath="/session[@id='1034215']/properties[@id='p0228B52620A948E480D38667394112C5']/@dateModified" oldValue="2023-09-29" newValue="2023-11-05" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@manuscript" fullPath="/session[@id='1034215']/properties[@id='p0228B52620A948E480D38667394112C5']/@manuscript" oldValue="PGR_CommercialAuto_Pages_US_7_8_0_0" newValue="PGR_TransACTMaster2_10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@caption" fullPath="/session[@id='1034215']/properties[@id='p0228B52620A948E480D38667394112C5']/@caption" oldValue="PGR CommercialAuto Pages US 7.8.0.0" newValue="PGR TransACT Master 2.10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
    </differences>
</ns0:trans>

Expected output. Notice how every text that starts with "[@" has been removed under < differences> tag:

<?xml version='1.0' encoding='UTF-8'?>
<ns0:trans xmlns:ns0="/1.0" id="1495144" TPId="4aec" change="0" count="1" dateStamp="2024-08-02T03:07:48" effectiveDate="2024-08-01" transId="12447055">
    <data id="FF2AD08D621">
        <StateFlag>0</StateFlag>
        <DCC>B</DCC>
        <LastName>MMNGorthy</LastName>
        <Name>Scott</Name>
        <Structure>NonProfit</Structure>
        <BusinessType>Engineering</BusinessType>
        <meams>AI</meams>
        <Count>0</Count>
        <CTier>000</CTier>
        <CTCond>111</CTCond>
        <CTPl>222</CTPl>
        <DO>1</DO>
        <AuthObject>0</AuthObject>
    </data>
    <differences ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce">
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@dateModified" fullPath="/session/properties/@dateModified" oldValue="2023-09-29" newValue="2023-11-05" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@manuscript" fullPath="/session/properties/@manuscript" oldValue="PGR_CommercialAuto_Pages_US_7_8_0_0" newValue="PGR_TransACTMaster2_10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@caption" fullPath="/session/properties/@caption" oldValue="PGR CommercialAuto Pages US 7.8.0.0" newValue="PGR TransACT Master 2.10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
    </differences>
</ns0:trans>

Here is the code that I tried:

%dw 2.0
import * from dw::core::Strings
output text/plain
---
(write(payload, "application/json") replace "[@" with "") replace "]" with ""

Output of my code has made everything Null under < differences> tag:

{
  "trans": {
    "data": {
      "StateFlag": "0",
      "DCC": "B",
      "LastName": "MMNGorthy",
      "Name": "Scott",
      "Structure": "NonProfit",
      "BusinessType": "Engineering",
      "meams": "AI",
      "Count": "0",
      "CTier": "000",
      "CTCond": "111",
      "CTPl": "222",
      "DO": "1",
      "AuthObject": "0"
    },
    "differences": {
      "difference": null,
      "difference": null,
      "difference": null
    }
  }
}

I need to remove everything thing starting with "[@" and ending with "]" in the given XML. The text that falls in-between these sqaure brackers should also be removed. I tried the replace function but it turned the XML into a String and did not remove the full string including [@ and ]. Please see below sample input and expected output;

Sample input. See the [@ inside the < differences> tag:

<?xml version='1.0' encoding='UTF-8'?>
<ns0:trans xmlns:ns0="http://Test.Sample.data/1.0" id="1495144" TPId="4aec" change="0" count="1" dateStamp="2024-08-02T03:07:48" effectiveDate="2024-08-01" transId="12447055">
    <data id="FF2AD08D621">
        <StateFlag>0</StateFlag>
        <DCC>B</DCC>
        <LastName>MMNGorthy</LastName>
        <Name>Scott</Name>
        <Structure>NonProfit</Structure>
        <BusinessType>Engineering</BusinessType>
        <meams>AI</meams>
        <Count>0</Count>
        <CTier>000</CTier>
        <CTCond>111</CTCond>
        <CTPl>222</CTPl>
        <DO>1</DO>
        <AuthObject>0</AuthObject>
    </data>
    <differences ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce">
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@dateModified" fullPath="/session[@id='1034215']/properties[@id='p0228B52620A948E480D38667394112C5']/@dateModified" oldValue="2023-09-29" newValue="2023-11-05" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@manuscript" fullPath="/session[@id='1034215']/properties[@id='p0228B52620A948E480D38667394112C5']/@manuscript" oldValue="PGR_CommercialAuto_Pages_US_7_8_0_0" newValue="PGR_TransACTMaster2_10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@caption" fullPath="/session[@id='1034215']/properties[@id='p0228B52620A948E480D38667394112C5']/@caption" oldValue="PGR CommercialAuto Pages US 7.8.0.0" newValue="PGR TransACT Master 2.10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
    </differences>
</ns0:trans>

Expected output. Notice how every text that starts with "[@" has been removed under < differences> tag:

<?xml version='1.0' encoding='UTF-8'?>
<ns0:trans xmlns:ns0="http://Test.Sample.data/1.0" id="1495144" TPId="4aec" change="0" count="1" dateStamp="2024-08-02T03:07:48" effectiveDate="2024-08-01" transId="12447055">
    <data id="FF2AD08D621">
        <StateFlag>0</StateFlag>
        <DCC>B</DCC>
        <LastName>MMNGorthy</LastName>
        <Name>Scott</Name>
        <Structure>NonProfit</Structure>
        <BusinessType>Engineering</BusinessType>
        <meams>AI</meams>
        <Count>0</Count>
        <CTier>000</CTier>
        <CTCond>111</CTCond>
        <CTPl>222</CTPl>
        <DO>1</DO>
        <AuthObject>0</AuthObject>
    </data>
    <differences ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce">
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@dateModified" fullPath="/session/properties/@dateModified" oldValue="2023-09-29" newValue="2023-11-05" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@manuscript" fullPath="/session/properties/@manuscript" oldValue="PGR_CommercialAuto_Pages_US_7_8_0_0" newValue="PGR_TransACTMaster2_10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@caption" fullPath="/session/properties/@caption" oldValue="PGR CommercialAuto Pages US 7.8.0.0" newValue="PGR TransACT Master 2.10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
    </differences>
</ns0:trans>

Here is the code that I tried:

%dw 2.0
import * from dw::core::Strings
output text/plain
---
(write(payload, "application/json") replace "[@" with "") replace "]" with ""

Output of my code has made everything Null under < differences> tag:

{
  "trans": {
    "data": {
      "StateFlag": "0",
      "DCC": "B",
      "LastName": "MMNGorthy",
      "Name": "Scott",
      "Structure": "NonProfit",
      "BusinessType": "Engineering",
      "meams": "AI",
      "Count": "0",
      "CTier": "000",
      "CTCond": "111",
      "CTPl": "222",
      "DO": "1",
      "AuthObject": "0"
    },
    "differences": {
      "difference": null,
      "difference": null,
      "difference": null
    }
  }
}
Share Improve this question edited Mar 21 at 3:04 Zak01 asked Mar 20 at 21:26 Zak01Zak01 335 bronze badges 5
  • Please share what you tried and the result you got. – aled Commented Mar 20 at 21:34
  • @aled. Thanks for getting back to me. I've updated the question with the code I've tried and the output I got. I used the replace function but it made everything null under < differences> tag. It also converted the XML into String which I am trying to avoid. Any help from you will be greatly appreciated. Thanks! – Zak01 Commented Mar 21 at 3:02
  • The information in your comment should go into the question too, not in a comment. – aled Commented Mar 21 at 3:29
  • Why is the script writing a JSON if the expected output is in XML? – aled Commented Mar 21 at 11:27
  • Does it really have to replace the regex everywhere? or is it only going to be at trans.differences.*difference.@fullPath ? Both are possible, but if you do not need it then it is not worth it to make the code complex just to make it generic – Harshank Bansal Commented Mar 21 at 16:01
Add a comment  | 

1 Answer 1

Reset to default 2

Replacing over the entire XML as a string causes the script to lose the structure provided by the XML object, which would make the transformation more fragile and prone to errors. What you really should do is to use a recursive function to transverse the input object, using pattern-matching to identify the current type and transform the values inside object attributes only. I recommend to get familiar with the basic structure of such functions because they are commonly needed.

fun recursiveFunction(x)=
    x match {
        case o is Object -> o mapObject ((value, key) -> (key): recursiveFunction(value)
        case a is Array -> a map recursiveFunction($)
        case s is String -> recursiveFunction(s)
        case n is Number -> recursiveFunction(n)
        ...
        else -> x 
    }

Note that in XML there is no concept of arrays, like in JSON, however I added a check for them anyway in case the input changes. JSON in the other hand does not has a concept of attributes so it may not be useful for you. You can remove or comment it.

I also added a check for the value of string XML elements. If you only want to transform attributes you can remove it.

To make it more generic we can pass the actual regular expression as a parameter. This is the recursive function

fun removeRegex(x, r: Regex)=
    x match {
        case o is Object -> o mapObject ((value, key) -> (key) @((removeInAttributes(log(key.@), r))): removeRegex(value, r) )
        case s is String -> removeInStrings(s, r) // if needed for element values too
        case a is Array -> a map removeRegex($, r) // XML doesn't has arrays but I'm leaving this branch in case the input format changes
        else -> x
    }

About the actual replacement of the string value, the script in the question only replaces the delimiters. You want to remove also the content between those delimiters. For that we can use a regular expression. We need to be careful to stop at the first ending delimiter or it could consume the text between two separate square brackets sections.

/\[@[^\]]+]/

I moved the actual removal to a separate function for clarity and because I reused it for Strings. In case the input object doesn't has attributes I used a second definition of the function to catch it without having to check for nulls.

fun removeInAttributes(o: Object, r: Regex) = o mapObject ((value, key) -> (key): removeInStrings(value, r) )

fun removeInAttributes(n: Null, r: Regex) = null // if there are no attributes do nothing

fun removeInStrings(s: String, r: Regex) = s replace r with ("")

Finally you just call the main function with the input XML and the regular expression:

---
removeRegex(payload, /\[@[^\]]+]/)

Output:

<?xml version='1.0' encoding='UTF-8'?>
<ns0:trans xmlns:ns0="http://Test.Sample.data/1.0" id="1495144" TPId="4aec" change="0" count="1" dateStamp="2024-08-02T03:07:48" effectiveDate="2024-08-01" transId="12447055">
  <data id="FF2AD08D621">
    <StateFlag>0</StateFlag>
    <DCC>B</DCC>
    <LastName>MMNGorthy</LastName>
    <Name>Scott</Name>
    <Structure>NonProfit</Structure>
    <BusinessType>Engineering</BusinessType>
    <meams>AI</meams>
    <Count>0</Count>
    <CTier>000</CTier>
    <CTCond>111</CTCond>
    <CTPl>222</CTPl>
    <DO>1</DO>
    <AuthObject>0</AuthObject>
  </data>
  <differences ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce">
    <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@dateModified" fullPath="/session/properties/@dateModified" oldValue="2023-09-29" newValue="2023-11-05" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce"/>
    <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@manuscript" fullPath="/session/properties/@manuscript" oldValue="PGR_CommercialAuto_Pages_US_7_8_0_0" newValue="PGR_TransACTMaster2_10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce"/>
    <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@caption" fullPath="/session/properties/@caption" oldValue="PGR CommercialAuto Pages US 7.8.0.0" newValue="PGR TransACT Master 2.10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce"/>
  </differences>
</ns0:trans>

本文标签: muleRemove data inside the arrays of an XMLStack Overflow