Problem

My job is to extract data from an XML document and build an HTML page using that data. I'm using PHP to parse and manipulate the XML document.

One portion of the XML document contains inlined elements used in a fashion similar to this:

<desc>
    These are the <special>&lt;best&gt;</special>
    chocolate chip cookies <special>&lt;EVER&gt;</special>
</desc>

I'd like to convert this into my HTML document like so:

These are the <em>&lt;best&gt;</em> chocolate chip cookies <em>&lt;EVER&gt;</em>

so that it displays in the browser as:

These are the <best> chocolate chip cookies <EVER>

I'm currently using PHP's SimpleXML module. I have no problem parsing the XML document and retrieving the parent element (<desc>).
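The sketch below shows what SimpleXML returns for this `<desc>`, using the sample input above. Casting the element to a string yields only its direct text nodes. Each `<special>` child yields its own text, with the entities already decoded. So the pieces are easy to get but hard to put back in document order.

```php
<?php
// Sample input copied from the question.
$xml = <<<'XML'
<desc>
    These are the <special>&lt;best&gt;</special>
    chocolate chip cookies <special>&lt;EVER&gt;</special>
</desc>
XML;

$desc = new SimpleXMLElement($xml);

// Casting an element to string returns only the text nodes that are
// directly inside it; the contents of the <special> children are missing.
$directText = (string) $desc;

// A <special> child can be read on its own; entities are already decoded,
// so this is the literal "<best>", which must be re-escaped for HTML.
$firstSpecial = (string) $desc->special[0];

var_dump($directText, $firstSpecial, htmlspecialchars($firstSpecial));
```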

My Attempt

I thought about doing a search and replace on the raw XML string to convert the <special> tags to my target tag (<em>). But of course the XML parser would then parse them exactly as before, only as elements named <em>.

I also considered getting the XML of the <desc> node with asXML() at the point of use, doing the search and replace there, and then echoing the raw string into the HTML document. However, at that point the <special> nodes appear to have already been parsed away, and I just get the string:

These are the <best> chocolate chip cookies <EVER>
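The check below inspects what asXML() actually returns, as a raw string rather than a page rendered in a browser. It assumes <desc> sits inside a larger document; the <item> wrapper is hypothetical. A browser ignores unknown tags and renders &lt;best&gt; as <best>, so the rendered page looks like the string above. The string itself should still contain the <special> tags and the escaped entities. The plain str_replace() below also assumes <special> never carries attributes or namespaces.

```php
<?php
// Hypothetical wrapper element; only <desc> comes from the question.
$xml = <<<'XML'
<item>
<desc>
    These are the <special>&lt;best&gt;</special>
    chocolate chip cookies <special>&lt;EVER&gt;</special>
</desc>
</item>
XML;

$root = new SimpleXMLElement($xml);

// asXML() on a non-root element serializes it together with its children,
// re-escaping text, so <special> and &lt;...&gt; are preserved.
$raw = $root->desc->asXML();

// Naive string replacement: assumes <special> has no attributes.
$html = str_replace(
    ['<desc>', '</desc>', '<special>', '</special>'],
    ['', '', '<em>', '</em>'],
    $raw
);

// var_dump (or "view source") shows the real markup, not the rendered text.
var_dump(trim($html));
```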

I've also looked into the XMLReader class, but it appears to read the XML as a forward-only stream, so I can't access arbitrary nodes at the point where I need them.
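XMLReader does stream, but XMLReader::expand() copies the current node and its whole subtree into a DOM tree, which can then be walked freely. The sketch below uses the sample input from the question and passes a fresh DOMDocument as the base node. It rebuilds the HTML by walking the expanded <desc> node's children in document order.

```php
<?php
$xml = <<<'XML'
<desc>
    These are the <special>&lt;best&gt;</special>
    chocolate chip cookies <special>&lt;EVER&gt;</special>
</desc>
XML;

$reader = new XMLReader();
$reader->XML($xml);

$doc = new DOMDocument(); // owner document for the expanded subtree
$html = '';
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'desc') {
        // Copy <desc> and everything under it into a DOM node.
        $desc = $reader->expand($doc);
        foreach ($desc->childNodes as $node) {
            if ($node instanceof DOMElement && $node->tagName === 'special') {
                // textContent is decoded ("<best>"), so escape it again.
                $html .= '<em>' . htmlspecialchars($node->textContent) . '</em>';
            } elseif ($node instanceof DOMText) {
                $html .= htmlspecialchars($node->data);
            }
        }
        break; // only the first <desc> is needed here
    }
}
$reader->close();

echo trim($html), "\n";
```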

I'd appreciate any advice. Thanks.

Asked Nov 23, 2024 at 3:20 by user23062437; edited Nov 25, 2024 at 0:45
  • Use a proper XML parser to get the desc element, get the text pieces, and build the HTML part from them. – LMC Commented Nov 23, 2024 at 3:37
  • @LMC I can't build a new string from the component strings because their original locations in the string are not accessible. Using my example, I could get the string "These are the chocolate chip cookies" and the strings "&lt;best&gt;" and "&lt;EVER&gt;", but there's no reliable way to know how to recombine those strings. – user23062437 Commented Nov 23, 2024 at 3:51
  • “So that it displays in the browser…” - I just have to ask: is the display all you care about? If so, can you attack this with just CSS? Or do you really need to transform it? – Chris Haas Commented Nov 23, 2024 at 4:33
  • “I thought about manipulating the raw XML … but, of course, XML will just parse it just the same” - I’m not clear on what you mean by that. – Chris Haas Commented Nov 23, 2024 at 4:37
  • @ChrisHaas The idea is that the data is stored in a general format using XML. My job is to take that general data and create an HTML page to display it. The point is that I need to get the data in the page in the first place, so CSS isn't even on the table until then. – user23062437 Commented Nov 23, 2024 at 4:43

2 Answers


Here is a solution that creates a DOM object from a SimpleXMLElement, and iterates over its child nodes to build the HTML:

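<?php
$xml = <<<'XML'
<desc>
    These are the <special>&lt;best&gt;</special> chocolate chip cookies <special>&lt;EVER&gt;</special>
</desc>
XML;

$sx = new SimpleXMLElement($xml);
// DOM view of the same node: unlike SimpleXML, it exposes mixed content in order
$dom = dom_import_simplexml($sx);

$html = '';
foreach($dom->childNodes as $node)
{
    switch($node->nodeType)
    {
        case XML_ELEMENT_NODE:
            // <special> becomes <em>; textContent is decoded, so re-escape it.
            // Any other element is silently skipped.
            if($node->tagName === 'special')
                $html .= '<em>'.htmlspecialchars($node->textContent).'</em>';
            break;
        case XML_TEXT_NODE:
            // Plain text between the elements, kept in document order
            $html .= htmlspecialchars($node->data);
            break;
    }
}

echo trim($html);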

Output:

These are the <em>&lt;best&gt;</em> chocolate chip cookies <em>&lt;EVER&gt;</em>

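A variant of the same idea, sketched with DOMDocument alone, is to replace each <special> with an <em> inside the DOM. saveXML() then serializes the children of <desc>, so the serializer does the escaping. Any markup nested inside <special> is kept rather than flattened by textContent. saveXML() emits XML syntax, which is fine for simple inline tags like <em>. DOMDocument::saveHTML() also accepts a node if HTML serialization rules are needed.

```php
<?php
$xml = <<<'XML'
<desc>
    These are the <special>&lt;best&gt;</special>
    chocolate chip cookies <special>&lt;EVER&gt;</special>
</desc>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$desc = $doc->documentElement;

// Snapshot the list first: getElementsByTagName() is live and would
// change while the nodes are being replaced.
$specials = iterator_to_array($desc->getElementsByTagName('special'));
foreach ($specials as $special) {
    $em = $doc->createElement('em');
    // Move every child (text or nested elements) into the new <em>.
    while ($special->firstChild) {
        $em->appendChild($special->firstChild);
    }
    $special->parentNode->replaceChild($em, $special);
}

// Serialize only the children of <desc>; text is re-escaped on output.
$html = '';
foreach ($desc->childNodes as $child) {
    $html .= $doc->saveXML($child);
}

echo trim($html), "\n";
```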

XSLT is a language designed for exactly this: transforming an XML document into another XML document or into HTML. PHP supports XSLT 1.0 through ext/xsl (the XSLTProcessor class).

<?php
$xml = <<<'XML'
<desc>
    These are the <special>&lt;best&gt;</special>
    chocolate chip cookies <special>&lt;EVER&gt;</special>
</desc>
XML;

$xslt = <<<'XSLT'
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/desc">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="special">
    <em><xsl:apply-templates/></em>
  </xsl:template>

</xsl:stylesheet>
XSLT;

// load the content
$content = new DOMDocument();
$content->loadXML($xml);
// load the template
$template = new DOMDocument();
$template->loadXML($xslt);
// bootstrap XSLT
$processor = new XSLTProcessor();
$processor->importStylesheet($template);
// transform and output
echo $processor->transformToXml($content);

Output:

    These are the <em>&lt;best&gt;</em>
    chocolate chip cookies <em>&lt;EVER&gt;</em>

Tags: php | Convert an inlined XML element to HTML - Stack Overflow