I need to count words in a string using PHP or JavaScript (preferably PHP). The problem is that the count needs to match the one Microsoft Word gives, because Word is where people assemble their original texts, so it is their frame of reference. PHP has a word-counting function, str_word_count (http://php/manual/en/function.str-word-count.php), but as far as I know it does not give exactly the same results.
Any pointers?
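To see why PHP's built-in function can disagree with a plain whitespace split, here is a rough JavaScript approximation of the definition given in the PHP manual for str_word_count: a word is a string of alphabetic characters that may contain, but not start with, ' and - characters. This sketch is ASCII-only and ignores locale handling and finer edge cases (such as trailing hyphens), so treat it as an illustration, not a port.

```javascript
// Rough, ASCII-only approximation of str_word_count's documented definition:
// a word starts with a letter and may then contain letters, ' and -.
// Digits and other symbols are never part of a word.
function approxStrWordCount(text) {
  const matches = text.match(/[A-Za-z][A-Za-z'-]*/g);
  return matches === null ? 0 : matches.length;
}

// Plain whitespace split, for comparison.
function countBySplit(text) {
  const t = text.trim();
  return t === "" ? 0 : t.split(/\s+/).length;
}

const sample = "Version 2 costs 100% more (see php.net).";
console.log(approxStrWordCount(sample)); // 6: Version, costs, more, see, php, net
console.log(countBySplit(sample));       // 7: "2" and "100%" count, "php.net)." is one token
```

Numbers are dropped and "php.net" is split in two by the letter-based rule, so the two approaches already disagree before Word enters the picture.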
asked Jan 27, 2010 at 9:54 by Maarten
Comments:
- You might want to explain how Microsoft Word counts words, or add a meaningful screenshot. (Gordon, Jan 27, 2010 at 9:56)
- A screenshot is worth a thousand words. (Greg Hewgill, Jan 27, 2010 at 10:05)
- Not being a Microsoft Word expert, I don't know how it calculates it; I just know it comes up with different counts than my code. (Maarten, Jan 27, 2010 at 10:22)
- Well, most of us are not diviners, so why not post your code and explain what it outputs and what you would expect it to output? The question as it is now is just vague. (Gordon, Jan 27, 2010 at 10:37)
5 Answers
The real problem here is that you're trying to develop a solution without really understanding the exact requirements. This isn't so much a coding problem as a problem with the specs.
The crux of the issue is that your word-counting algorithm is different to Word's word-counting algorithm - potentially for good reason, since there are various edge-cases to consider with no obvious answers. Thus your question should really be "What algorithm does Word use to calculate word count?" And if you think about this for a bit, you already know the answer - it's closed-source, proprietary software so no-one can know for sure. And even if you do work it out, this isn't a public interface so it can easily be changed in the next version.
Basically, I think it's fundamentally a bad idea to design your software so that it functions identically to something that you cannot fully understand. Personally, I would concentrate on just developing a sane word-count of your own, documenting the algorithm behind it and justifying why it's a reasonable method of counting words (pointing out that there is no One True Way).
If you must conform to Word's approach for some short-sighted business reason, then the number one task is to work out what methodology it uses, to the point where you can write down an algorithm on paper. But this won't be easy, will be very hard to verify completely, and is liable to change without notice... :-/
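As an illustration of what "writing the rules down" can look like, here is a minimal JavaScript sketch of a self-documented word count. The rules in the comment are hypothetical choices made for this example, not Word's algorithm; the point is that each rule is stated explicitly and can be justified or changed.

```javascript
/**
 * Hypothetical, explicitly documented word-count rules (NOT Word's algorithm):
 *  1. Split the text on runs of whitespace.
 *  2. A token counts as a word only if it contains at least one letter or
 *     digit, so stray punctuation such as "-" or "&" is not counted.
 *  3. As a consequence, hyphenated compounds ("well-known"), numbers ("100%")
 *     and URLs each count as one word.
 */
function countWordsDocumented(text) {
  return text
    .split(/\s+/)                                   // rule 1 (may yield "" at the ends)
    .filter((token) => /[\p{L}\p{N}]/u.test(token)) // rule 2 (also drops the empty strings)
    .length;
}

console.log(countWordsDocumented("Fish & chips - well-known, 100% cheap!")); // 5
```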
It's a bit of a minefield, as MS Word counts are considered wrong and unreliable by professionals who depend on word counts: journalists, translators, and lawyers, who are often involved in legal procedures where motions and submissions must stay under a specific number of words.
Having said that, this article (http://dotnetperls./word-count) describes a pretty good regex algorithm implemented in C#, which should be fairly easy to translate into PHP.
I think the small inaccuracies in his results come from two factors. First, MS Word skips words not contained in "regular" paragraphs, so words in footnotes, text boxes and tables may or may not be counted. Second, I think the EVIL smart quotes feature messing with hyphens may affect the results, so it may be worth changing all en-dash and em-dash characters back to the normal hyphen-minus.
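A minimal JavaScript sketch of the dash normalization suggested above. The word pattern (letters and digits, optionally joined by ASCII hyphens or apostrophes) is an assumption made for this example and is neither the dotnetperls algorithm nor Word's; it shows why a typographic dash can change the count when the counter treats only the ASCII hyphen as part of a word.

```javascript
// Assumed word pattern: runs of letters/digits, optionally joined by an ASCII
// apostrophe or hyphen ("don't", "well-known"). The u flag enables \p{...}.
const WORD = /[\p{L}\p{N}]+(?:['-][\p{L}\p{N}]+)*/gu;

// Map en dash (U+2013) and em dash (U+2014) back to the ASCII hyphen-minus.
function normalizeDashes(text) {
  return text.replace(/[\u2013\u2014]/g, "-");
}

function countWords(text) {
  const matches = normalizeDashes(text).match(WORD);
  return matches === null ? 0 : matches.length;
}

console.log(countWords("well-known"));      // 1
console.log(countWords("well\u2014known")); // 1 (it would be 2 without normalizing the dash)
```

Whether Word itself treats a dash-joined pair as one word or two is exactly the kind of detail you would have to verify against Word before relying on this rule.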
The following JS code gives a word count of 67 for the question text. OpenOffice gives the same number.
const str = "I need to count words in a string using PHP or Javascript (preferably PHP). The problem is that the counting needs to be the same as it works in Microsoft Word, because that is where the people assemble their original texts in so that is their reference frame. PHP has a word counting function (http://php/manual/en/function.str-word-count.php) but that is not 100% the same as far as I know.";
// Trim first so leading/trailing whitespace doesn't produce empty "words"
const trimmed = str.trim();
const wordCount = trimmed === "" ? 0 : trimmed.split(/\s+/).length; // 67
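function countWords($text)
{
    // Drop everything that is not a letter, digit or whitespace (Unicode-aware via /u)
    $text = preg_replace('/[^\pL\pN\s]+/u', '', $text);
    // Collapse runs of whitespace into single spaces and trim the ends
    $text = trim(preg_replace('/\s+/u', ' ', $text));
    // explode() on an empty string returns [''], which would count as one word
    return $text === '' ? 0 : count(explode(' ', $text));
}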
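To see how this approach differs from the whitespace split in the previous answer, here is a rough JavaScript port of the PHP function. It assumes JavaScript's Unicode property escapes (\p{L}, \p{N}) behave like PCRE's \pL and \pN with the u modifier; JavaScript's \s may also match a slightly different set of space characters than PCRE's. The two approaches disagree on standalone symbols:

```javascript
// Whitespace split, as in the JS answer (with trimming).
function countBySplit(text) {
  const t = text.trim();
  return t === "" ? 0 : t.split(/\s+/).length;
}

// Rough port of the PHP countWords(): strip everything that is not a letter,
// digit or whitespace, then count whitespace-separated tokens.
function countByStripping(text) {
  const t = text.replace(/[^\p{L}\p{N}\s]+/gu, "").trim();
  return t === "" ? 0 : t.split(/\s+/).length;
}

const sample = "Fish & chips - cheap!";
console.log(countBySplit(sample));     // 5: "&" and "-" count as words
console.log(countByStripping(sample)); // 3: standalone symbols disappear
```

Neither count is more correct; they simply encode different rules for standalone symbols.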
You can use this code to count words as the user types. It uses MooTools and shows how many words remain out of a limit of 20:
<html>
<head>
<title>Untitled Document</title>
<script type="text/javascript" src="mootools.svn.js"></script>
<script type="text/javascript">
window.addEvent('domready', function()
{
    var max_words = 20; // word limit; matches the initial counter value below
    $('myInput').addEvent('keyup', function()
    {
        var current_value = $('myInput').value.trim();
        // Count runs of non-whitespace; an empty field counts as 0 words
        var word_count = current_value === '' ? 0 : current_value.split(/\s+/).length;
        var remaining_words = max_words - word_count;
        $('counter_number').innerHTML = remaining_words;
        // Highlight the counter once 5 or fewer words remain
        if (remaining_words <= 5)
        {
            $('counter_number').setStyle('color', '#990000');
        } else {
            $('counter_number').setStyle('color', '#666666');
        }
    });
});
</script>
<style type="text/css">
body{
font-family:"Lucida Grande", "Lucida Sans Unicode", Verdana, Arial, Helvetica, sans-serif;
font-size:12px;
color:#000000;
}
a:link, a:visited{color:#0066CC;}
label{display:block;}
.counter{
font-family:Georgia, "Times New Roman", Times, serif;
font-size:16px;
font-weight:bold;
color:#666666
}
</style>
</head>
<body>
<label for="myInput">Write something here:</label>
<input type="text" id="myInput" />
<span id="counter_number" class="counter">20</span>
Remaining words
</body>
</html>
You will also need to download the MooTools library (referenced above as mootools.svn.js).
Source: "php - Count words like Microsoft Word does - Stack Overflow", reposted at http://www.betaflare.com/web/1743866127a2552616.html (user-contributed content; the views are the authors' own).