Is there any technical way to search post titles and return the results of the most used duplicated words in them?

E.g., with three articles in total:

  • The quick brown fox jumps over the lazy dog
  • A brown bag for the summer
  • New record - Athlete jumps higher

Most used words in post titles:

  1. Brown
  2. Jumps

asked Sep 9, 2020 at 12:38 by JohnnyBratsoni
  • Yes - there is always a way. There is no built-in method, however, if that is what you are asking. It would require many lines of custom code to A: build an array of post titles; B: parse individual words from each; C: sort by frequency. Post a new question when you get stuck on a specific step. – jdm2112 Commented Sep 9, 2020 at 12:54

2 Answers


The MySQL query below returns the 10 most common values in the post_title column of the wp_posts table. Note that it groups by the whole title, so it counts repeated titles rather than individual words:

SELECT post_title, COUNT(post_title) AS Appearances FROM wp_posts GROUP BY post_title ORDER BY Appearances DESC LIMIT 10

If you can use a custom (and ugly) SQL query, here is something that might interest you.

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(wp_posts.post_title, ' ', words.n), ' ', -1) word,
       count(SUBSTRING_INDEX(SUBSTRING_INDEX(wp_posts.post_title, ' ', words.n), ' ', -1)) count
FROM (select 1 n union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9 union all select 10) words
INNER JOIN wp_posts on CHAR_LENGTH(wp_posts.post_title) - CHAR_LENGTH(REPLACE(wp_posts.post_title, ' ', '')) >= words.n - 1
WHERE wp_posts.post_type != 'revision' AND wp_posts.post_status = 'publish'
  AND SUBSTRING_INDEX(SUBSTRING_INDEX(wp_posts.post_title, ' ', words.n), ' ', -1) NOT IN ('-', '&', ',', '.')
  AND LENGTH(SUBSTRING_INDEX(SUBSTRING_INDEX(wp_posts.post_title, ' ', words.n), ' ', -1)) > 3
GROUP BY word
ORDER BY count DESC, word ASC;

Explanation:

This query fetches every post_title from the table and splits it on spaces into individual words, so that the words can be grouped and counted.

It keeps only rows where the post is published and is not a revision. It can also exclude specific tokens from the results (the NOT IN clause) and skip short words of three characters or fewer (the LENGTH > 3 clause).
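
To make the splitting step concrete, here is a small walkthrough on a single title taken from the question. It is only an illustration: the string literal stands in for wp_posts.post_title, and the column aliases spaces and second_word are made up for this example.

```sql
-- Hypothetical single-title walkthrough; the literal stands in for wp_posts.post_title.
SELECT
  -- Spaces in the title: 26 characters in total minus 21 without spaces = 5,
  -- so the join condition keeps n = 1..6, one position per word.
  CHAR_LENGTH('A brown bag for the summer')
    - CHAR_LENGTH(REPLACE('A brown bag for the summer', ' ', '')) AS spaces,
  -- The inner call keeps the first 2 words ('A brown');
  -- the outer call with -1 keeps the last of those ('brown').
  SUBSTRING_INDEX(SUBSTRING_INDEX('A brown bag for the summer', ' ', 2), ' ', -1) AS second_word;
```

This should return spaces = 5 and second_word = 'brown'. The join condition matters because SUBSTRING_INDEX returns the whole string when n is larger than the number of words, which would otherwise count the last word of a short title several times.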

NOTE: Only the first 10 words of each title are counted; any words beyond the tenth are ignored. To handle longer titles, extend the select 1 n union all... line with more numbers.
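
Writing out a longer number list by hand gets tedious, so one possible alternative is to generate it by cross joining two digit lists. The sketch below assumes titles stay under 100 words; the names ones, tens and d are illustrative and do not appear in the original query.

```sql
-- Sketch: generates n = 1..100 from two digit lists (0..9 each).
-- "ones", "tens" and "d" are illustrative names, not part of the original query.
SELECT ones.d + 10 * tens.d + 1 AS n
FROM (SELECT 0 d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) ones
CROSS JOIN
     (SELECT 0 d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) tens;
```

To use it, drop the trailing semicolon, wrap the statement in parentheses, and give it the alias words in place of the hand-written number list. On MySQL 8.0 or later a recursive common table expression would be another option.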

The query is largely adapted from this answer: https://stackoverflow/a/17942691/2342137

PS: If it is possible to reuse the word column instead of repeating the SUBSTRING_INDEX expression several times, a rewrite of the SQL would be welcome.
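
One way to avoid the repetition, offered as an untested sketch rather than a verified replacement, is to compute the word once in a derived table and then filter and group on that column in the outer query. The names title_words and occurrences are assumptions introduced here, and the query has not been run against a real WordPress database.

```sql
-- Sketch: compute each word once in a derived table, then filter and count outside.
-- "title_words" and "occurrences" are illustrative names.
SELECT word, COUNT(*) AS occurrences
FROM (
    SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(wp_posts.post_title, ' ', words.n), ' ', -1) AS word
    FROM (SELECT 1 n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
          UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10) words
    INNER JOIN wp_posts
        ON CHAR_LENGTH(wp_posts.post_title) - CHAR_LENGTH(REPLACE(wp_posts.post_title, ' ', '')) >= words.n - 1
    WHERE wp_posts.post_type != 'revision' AND wp_posts.post_status = 'publish'
) AS title_words
-- Same filters as before, now written against the single "word" column.
WHERE word NOT IN ('-', '&', ',', '.')
  AND LENGTH(word) > 3
GROUP BY word
ORDER BY occurrences DESC, word ASC;
```

The filters still run before grouping, so the counts should match the original query; the only intended difference is that the split expression appears once.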
