I'm updating data in 1000+ WordPress blog posts from two categories. In each post I need to extract specific HTML content from two p tags and replace one of those p tags with new content. The elements I'm targeting are the link and image in one paragraph and the price and discount price in the other. Any assistance would be very welcome.

I am attempting to extract the discount price and the original price from this element. In posts from the other category, the discount price in this element is additionally wrapped in del tags.

<p><span style="font-weight:bold;font-style: italic"><a target="_blank" href="/
" rel="nofollow sponsored noopener">117 EUR</a></span> instead of 296 EUR</p>
<p><span style="font-weight:bold;font-style: italic"><a target="_blank" href="/
" rel="nofollow sponsored noopener"><del>117 EUR</del></a></span> instead of 296 EUR</p>

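A minimal sketch of how both price variants could be read with one query, assuming the paragraph always contains the words "instead of" and the discount always sits in the link inside the span. The function name `extract_prices` and the `https://example.com/` href are placeholders that do not come from the original posts.

```php
<?php
// Hypothetical helper, not part of the original script.
function extract_prices(string $html): array
{
    $prices = ['discount' => '', 'original' => ''];
    if (trim($html) === '') {
        return $prices; // nothing to parse
    }

    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html); // libxml adds <html>/<body> around the fragment
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // First paragraph that carries the "instead of" wording.
    $p = $xpath->query("//p[contains(., 'instead of')]")->item(0);
    if ($p === null) {
        return $prices;
    }

    // string() concatenates all descendant text of the link, so the same
    // expression covers <a>117 EUR</a> and <a><del>117 EUR</del></a>.
    $linkText = $xpath->evaluate('string(.//span/a)', $p);
    if (preg_match('/\d+/', $linkText, $m)) {
        $prices['discount'] = $m[0];
    }
    if (preg_match('/instead of\s+(\d+)/', $p->textContent, $m)) {
        $prices['original'] = $m[1];
    }
    return $prices;
}

// Placeholder markup modelled on the second sample above.
$sample = '<p><span style="font-weight:bold;font-style: italic">'
    . '<a target="_blank" href="https://example.com/" rel="nofollow sponsored noopener">'
    . '<del>117 EUR</del></a></span> instead of 296 EUR</p>';
print_r(extract_prices($sample));
```

Both queries are scoped to the matched paragraph, so a price elsewhere in the post is not picked up by accident.
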
I am also trying to extract the URL and image link from this element and replace the element with a new HTML structure.

<p><a style="font-size: 26px;text-decoration: none" target="_blank" href="/
" rel="nofollow sponsored noopener">Go to: <img decoding="async" width="130" style="border-radius:20px" src=".png"></a> </p>

The new replacement structure for the second HTML element is the one built with createElement() calls in the code below.
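
Before any replacement, the URL and image source have to be read from the old link paragraph. A minimal sketch of that step follows. It assumes the link can be recognised by its visible "Go to:" text alone, so it does not depend on the exact spelling of the style attribute. The function name `extract_link_and_image` and the `example.com` addresses are placeholders, not values from the original posts.

```php
<?php
// Hypothetical helper, not part of the original script.
function extract_link_and_image(string $html): array
{
    $result = ['url' => '', 'image' => ''];
    if (trim($html) === '') {
        return $result; // nothing to parse
    }

    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match on the link text only; attribute order and spacing do not matter.
    $link = $xpath->query("//a[contains(., 'Go to:')]")->item(0);
    if (!$link instanceof DOMElement) {
        return $result;
    }
    $result['url'] = trim($link->getAttribute('href'));

    // Look for the image inside this link, not anywhere in the document.
    $img = $xpath->query('.//img', $link)->item(0);
    if ($img instanceof DOMElement) {
        $result['image'] = trim($img->getAttribute('src'));
    }
    return $result;
}

// Placeholder markup modelled on the sample above.
$sample = '<p><a style="font-size: 26px;text-decoration: none" target="_blank" '
    . 'href="https://example.com/product" rel="nofollow sponsored noopener">Go to: '
    . '<img decoding="async" width="130" style="border-radius:20px" '
    . 'src="https://example.com/logo.png"></a> </p>';
print_r(extract_link_and_image($sample));
```

Reading the image relative to the matched link keeps the two values tied to the same element, which an independent `//p/a[...]/img` query does not guarantee.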

This is my attempt at achieving the desired result. For some reason, the first 6 or 7 posts get replaced properly, and then the other posts are replaced but come out broken. The content is the same in every post except for the prices. I have tried with and without the date_query and tax_query, and the result was the same. I am not very good with regex patterns, so I am not sure whether they or something else is causing this behavior.

    $args = [
        'post_type'      => 'post',
        'posts_per_page' => $limit,
        'offset'         => $offset,
        'date_query'     => [['after' => '1 month ago']],
        'tax_query'      => [[
            'relation' => 'OR',
            ['taxonomy' => 'category', 'field' => 'slug', 'terms' => ['category-1', 'category-2']],
        ]],
    ];

    $query = new WP_Query($args);

    if ($query->have_posts()) {
        $processed_posts = 0;
        while ($query->have_posts()) {
            $query->the_post();
            $post_id = get_the_ID();
            $post_content = get_post_field('post_content', $post_id);

            // Parse the post body as an HTML fragment (no implied <html>/<body>, no doctype).
            $dom = new DOMDocument();
            @$dom->loadHTML(mb_convert_encoding($post_content, 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
            $xpath = new DOMXPath($dom);

            error_log("Processing post ID: " . $post_id);
            // Locate the old "Go to:" link that should be replaced.
            $oldStructure = $xpath->query("//a[contains(text(), 'Go to:') and @target='_blank' and @rel='nofollow sponsored noopener']")->item(0);
            error_log("XPath query result for post ID: " . $post_id . ": " . ($oldStructure ? "Found" : "Not found"));

            if ($oldStructure) {
                error_log("Old structure found!");
                $productUrlNode = $xpath->query('//a[@style="font-size: 26px;text-decoration: none"]/@href')->item(0);
                $productUrl = $productUrlNode ? $productUrlNode->nodeValue : '';


                $imageSrcNode = $xpath->query('//p/a[@style="font-size: 26px;text-decoration: none"]/img/@src')->item(0);
                $imageSrc = $imageSrcNode ? $imageSrcNode->nodeValue : '';


                $discountPriceNodeQuery = "//p/span/a/del/text() | //p/span/a[not(del)]/text()";
                $discountPriceNodes = $xpath->query($discountPriceNodeQuery);
                
                $discountPrice = '';
                foreach ($discountPriceNodes as $node) {
                    $textContent = $node->nodeValue;
                    
                    if (preg_match('/(\d+)/', $textContent, $matches)) {
                        $discountPrice = $matches[1]; 
                        break; 
                    }
                }
                
                $originalPrice = '';
                $pNodeTexts = $xpath->query("//p[contains(., 'instead of')]");
                foreach ($pNodeTexts as $textNode) {
                    if (preg_match('/instead of (\d+)/', $textNode->nodeValue, $matches)) {
                        $originalPrice = $matches[1]; 
                        break; 
                    }
                }

                $pElement = $dom->createElement('p');
                $anchor = $dom->createElement('a');
                $anchor->setAttribute('href', esc_url($productUrl));
                $anchor->setAttribute('class', 'product-link_wrap');
                $anchor->setAttribute('target', '_blank');

                $buttonBlock = $dom->createElement('div');
                $buttonBlock->setAttribute('class', 'button-block');
                $anchor->appendChild($buttonBlock);

                $buttonImage = $dom->createElement('div');
                $buttonImage->setAttribute('class', 'button-image');
                $img = $dom->createElement('img');
                $img->setAttribute('src', esc_url($imageSrc));
                $img->setAttribute('alt', 'Product Image');
                $img->setAttribute('class', 'webpexpress-processed');
                $buttonImage->appendChild($img);
                $buttonBlock->appendChild($buttonImage);

                $productTitle = $dom->createElement('div');
                $productTitle->setAttribute('class', 'product-title');
                $productName = $dom->createElement('span', esc_html(get_the_title($post_id)));
                $productName->setAttribute('class', 'product-name');
                $productTitle->appendChild($productName);
                $buttonBlock->appendChild($productTitle);

                $productPrices = $dom->createElement('div');
                $productPrices->setAttribute('class', 'prices-container');
                $productOriginalPrice = $dom->createElement('span', esc_html($originalPrice));
                $productDiscountPrice = $dom->createElement('span', esc_html($discountPrice));
                $productPrices->appendChild($productOriginalPrice);
                $productPrices->appendChild($productDiscountPrice);
                $buttonBlock->appendChild($productPrices);

                $buttonContainer = $dom->createElement('div');
                $buttonContainer->setAttribute('class', 'button-container');
                $button = $dom->createElement('button');
                $button->setAttribute('class', 'product-button');
                $span = $dom->createElement('span', 'Go to Product');
                $button->appendChild($span);
                $svg = $dom->createElement('svg');
                $svg->setAttribute('xmlns', '');
                $svg->setAttribute('enable-background', 'new 0 0 24 24');
                $svg->setAttribute('viewBox', '0 0 24 24');
                $path = $dom->createElement('path');
                $path->setAttribute('d', 'M15.5,11.3L9.9,5.6c-0.4-0.4-1-0.4-1.4,0s-0.4,1,0,1.4l4.9,4.9l-4.9,4.9c-0.2,0.2-0.3,0.4-0.3,0.7c0,0.6,0.4,1,1,1c0.3,0,0.5-0.1,0.7-0.3l5.7-5.7c0,0,0,0,0,0C15.9,12.3,15.9,11.7,15.5,11.3z');
                $svg->appendChild($path);
                $button->appendChild($svg);
                $buttonContainer->appendChild($button);
                $buttonBlock->appendChild($buttonContainer);

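                $pElement->appendChild($anchor);
                // replaceChild() moves $anchor out of $pElement and into the old link's parent,
                // so $pElement itself never ends up in the document.
                $oldStructure->parentNode->replaceChild($anchor, $oldStructure);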

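                // Serialize the modified DOM and write it back to the post.
                $updated_content = $dom->saveHTML();
                error_log("New content: " . $updated_content);
                $result = wp_update_post([
                    'ID' => $post_id,
                    'post_content' => $updated_content,
                ]);

                // wp_update_post() returns 0 on failure unless its $wp_error argument is true.
                if ($result === 0 || $result === false) {
                    error_log("Failed to update post ID: " . $post_id);
                } else {
                    error_log("Successfully updated post ID: " . $post_id);
                }
            }
        }
        wp_reset_postdata();
    }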

This is the log output for a correctly replaced post:

[29-Feb-2024 11:26:21 UTC] Processing post ID: 56526
[29-Feb-2024 11:26:21 UTC] XPath query result for post ID: 56526: Found
[29-Feb-2024 11:26:21 UTC] Old structure found!
[29-Feb-2024 11:26:21 UTC] Img: .png
[29-Feb-2024 11:26:21 UTC] URL: 
[29-Feb-2024 11:26:21 UTC] discount: 12
[29-Feb-2024 11:26:21 UTC] original: 133

This is the log output for an incorrectly replaced post:

[29-Feb-2024 11:26:21 UTC] Processing post ID: 56510
[29-Feb-2024 11:26:21 UTC] XPath query result for post ID: 56510: Found
[29-Feb-2024 11:26:21 UTC] Old structure found!
[29-Feb-2024 11:26:21 UTC] Img: 
[29-Feb-2024 11:26:21 UTC] URL: 
[29-Feb-2024 11:26:21 UTC] discount: 
[29-Feb-2024 11:26:21 UTC] original: 340
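
The second log shows the script carrying on even though some values came back empty. A minimal sketch of a guard that could skip such posts instead of writing incomplete markup is shown here. The function name `missing_fields` and the array keys are hypothetical, and the sample values only mirror the image, discount and original price lines of the log excerpt above.

```php
<?php
// Hypothetical guard, not part of the original script: returns the names
// of all fields whose extracted value is empty or whitespace only.
function missing_fields(array $fields): array
{
    $missing = [];
    foreach ($fields as $name => $value) {
        if (trim((string) $value) === '') {
            $missing[] = $name;
        }
    }
    return $missing;
}

// Values mirroring the log excerpt for post 56510.
$missing = missing_fields(['image' => '', 'discount' => '', 'original' => '340']);
if ($missing !== []) {
    // In the real loop this is where the post would be skipped (continue).
    echo 'Skipping post, missing: ' . implode(', ', $missing) . PHP_EOL;
}
```

Skipping on any empty value keeps already-correct posts untouched while the extraction queries are still being debugged.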

The incorrectly replaced element is being stripped of its p tag, which I assume is what causes it to break:

<span style="font-weight: bold; font-style: italic;"><a href="; target="_blank" rel="nofollow sponsored noopener">28 EUR</a></span> instead of 340 EUR
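
One thing that might be worth ruling out is the fragment parsing itself. As far as I understand, LIBXML_HTML_NOIMPLIED makes libxml treat the first top-level element as the document root, so a post body with several sibling paragraphs can come back restructured. A minimal sketch of an alternative is to supply a full document skeleton and serialise only the children of body. The function name `round_trip_fragment` and its callback parameter are hypothetical, and whether this is the actual cause of the broken posts is unconfirmed.

```php
<?php
// Hypothetical helper, not part of the original script: parses a post body
// that may contain several top-level elements and returns it re-serialised.
function round_trip_fragment(string $html, ?callable $modify = null): string
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // A complete skeleton is supplied, so libxml never has to promote one of
    // the fragment's own elements to root. The meta tag hints at UTF-8 input.
    $dom->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body>' . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($modify !== null) {
        $modify($dom, new DOMXPath($dom)); // caller edits the tree in place
    }

    // Serialise only the children of <body>, leaving the skeleton out.
    $body = $dom->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Two sibling paragraphs stay siblings after the round trip.
echo round_trip_fragment('<p>one</p><p>two <a href="#">x</a></p>'), PHP_EOL;
```

The optional callback is where the XPath lookups and the replaceChild() call from the script above would go, so that parsing and serialisation stay in one place.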

Tags: Extracting and Replacing HTML Post content with PHP DOM