I'm a web editor who uses WordPress, and my site has a bit of an annoying problem. We have tens of thousands of articles going back about ten years, and we have to delete all of the images we posted in articles from around 2012 to 2018. The reason is that the editor at the time had a bad habit of using Creative Commons images without attributing them correctly, so we're now vulnerable to legal action. I batch-deleted all the actual images from our media library, but that still leaves stray bits of image attribution text sitting in old articles, and it all looks a complete mess. It took me about a day just to go through one month of old articles to correct this.
Anyway, before I'm forced to throw myself out of a window to end the misery, I wondered whether, conceptually speaking, it might be possible to write some Python code to automate this. Basically, what's required is a program that goes through every article we published between 2012 and 2018, identifies the attribution text (all the image attributions start with "Credit:"), and deletes it. I'm a novice with Python and have only just started thinking about this, but I wondered whether anyone with more experience thinks it's at least possible. I honestly think it will take me less time to learn Python and do this than to go through every article by hand, given the volume of content on this site.
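The core step described above can be sketched in Python. This is a minimal sketch, assuming each attribution is either a whole `<p>`, `<figcaption>` or `<span>` element whose text begins with "Credit:", or a bare line starting with "Credit:". The patterns and the helper names `strip_credits` and `in_cleanup_window` are hypothetical and should be checked against a few real posts, and run on a copy, before use.

```python
import re
from datetime import date

# Assumption: a credit is a whole <p>, <figcaption> or <span> element whose text
# starts with "Credit:". Adjust the tag list to match how your posts mark credits.
CREDIT_ELEMENT = re.compile(
    r"<(p|figcaption|span)\b[^>]*>\s*Credit:.*?</\1>",
    re.IGNORECASE | re.DOTALL,
)
# Assumption: a credit may also be a bare line of text starting with "Credit:".
CREDIT_LINE = re.compile(r"^[ \t]*Credit:.*(?:\r?\n|$)", re.IGNORECASE | re.MULTILINE)


def strip_credits(html: str) -> str:
    """Remove attribution text beginning with 'Credit:' from one post's HTML."""
    html = CREDIT_ELEMENT.sub("", html)  # drop whole credit elements first
    return CREDIT_LINE.sub("", html)     # then drop any bare credit lines


def in_cleanup_window(published: date) -> bool:
    """True for posts published from 2012 through 2018 inclusive."""
    return date(2012, 1, 1) <= published <= date(2018, 12, 31)


# Hypothetical sample post mixing both credit styles.
post = (
    "<p>Great day at the park.</p>\n"
    "<p>Credit: Jane Doe / CC BY 2.0</p>\n"
    "Credit: Flickr\n"
    "<p>More text.</p>"
)
print(strip_credits(post))
print(in_cleanup_window(date(2015, 6, 1)))
```

Matching on the element first, then on bare lines, keeps the surrounding paragraphs intact; anything wrapped differently (for example `<p><em>Credit: …</em></p>`) would need its own pattern.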
Asked Sep 18, 2021 at 5:40 by henrikdundarion

1 Answer
There's no need to overcomplicate things by developing a script to do what you want and end your misery. You can do this with SQL, presuming you have access to the database.
If you are using MySQL 8+, you can use REGEXP_REPLACE() in a query, as suggested in this Stack Overflow question: How to remove all tag from column using a SQL query
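```sql
-- Remove <img> tags from posts published 2012-2018 (back up the database first)
UPDATE wp_posts
SET post_content = REGEXP_REPLACE(post_content, '<img[^>]*>', '')
WHERE post_content LIKE '%<img%'
  AND post_date BETWEEN '2012-01-01' AND '2018-12-31 23:59:59';
```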
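Before running an UPDATE like this, it can help to preview what a pattern removes on a sample post. The sketch below uses Python's `re` module, which should behave like MySQL 8's regex engine for these simple patterns (treat that as an assumption and confirm with a SELECT on real rows). It compares the lazy pattern `<img.*?/>` with a stricter `<img[^>]*>` on a made-up post in which the first image tag is not self-closed.

```python
import re

LAZY = re.compile(r"<img.*?/>")     # runs on until the next "/>" anywhere
STRICT = re.compile(r"<img[^>]*>")  # stops at the first ">" of the tag itself

# Hypothetical post: the first <img> is not self-closed, the second one is.
sample = '<p>Intro <img src="a.jpg"> important text</p><img src="b.jpg" /><p>End</p>'

print(LAZY.sub("", sample))    # prints: <p>Intro <p>End</p>
print(STRICT.sub("", sample))  # prints: <p>Intro  important text</p><p>End</p>
```

With the lazy pattern, the unclosed first tag makes the match stretch to the second tag's `/>`, silently deleting " important text</p>" along the way; the stricter pattern removes only the tags. The same preview approach works for any pattern aimed at the "Credit:" text.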
Note: it is strongly recommended to make a backup of your website, or at least of the database, before running any SQL operations.
the_content if the text is in the post content. Alternatively, the content could be updated in the database by running a CLI script. – Linnea Huxford, commented Sep 20, 2021 at 1:36
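The CLI-script route mentioned in the comment could look roughly like this minimal Python sketch. Assumptions: the standard `wp_posts` table (your table prefix may differ), credits on their own lines starting with "Credit:", and Python's built-in `sqlite3` used purely as an in-memory stand-in so the sketch runs anywhere. Against the real MySQL database you would get a cursor from a MySQL driver such as PyMySQL and use `%s` placeholders instead of `?`. `clean_old_posts` is a hypothetical helper name.

```python
import re
import sqlite3

# Assumption: each credit is a line of its own that starts with "Credit:".
CREDIT_LINE = re.compile(r"^[ \t]*Credit:.*(?:\r?\n|$)", re.MULTILINE)


def clean_old_posts(conn: sqlite3.Connection, dry_run: bool = True) -> list[int]:
    """Strip 'Credit:' lines from posts dated 2012-2018; return the IDs that change."""
    rows = conn.execute(
        "SELECT ID, post_content FROM wp_posts "
        "WHERE post_type = 'post' "
        "AND post_date >= '2012-01-01' AND post_date < '2019-01-01' "
        "AND post_content LIKE '%Credit:%'"
    ).fetchall()
    changed = []
    for post_id, content in rows:
        cleaned = CREDIT_LINE.sub("", content)
        if cleaned != content:
            changed.append(post_id)
            if not dry_run:  # only write when explicitly asked to
                conn.execute(
                    "UPDATE wp_posts SET post_content = ? WHERE ID = ?",
                    (cleaned, post_id),
                )
    if not dry_run:
        conn.commit()
    return changed


# In-memory stand-in for the WordPress database, with two hypothetical posts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_date TEXT, "
    "post_type TEXT, post_content TEXT)"
)
conn.executemany(
    "INSERT INTO wp_posts VALUES (?, ?, ?, ?)",
    [
        (1, "2014-03-02 10:00:00", "post", "Story text\nCredit: Flickr user\nMore text"),
        (2, "2020-05-01 09:00:00", "post", "Newer story\nCredit: Staff photo"),
    ],
)
print(clean_old_posts(conn))                 # dry run: prints [1], writes nothing
print(clean_old_posts(conn, dry_run=False))  # prints [1] and updates post 1
```

Running with the default `dry_run=True` first lists the posts that would change without touching them. Restricting to `post_type = 'post'` leaves revisions alone, so the old text stays available as history; drop that condition if revisions should be cleaned as well.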