Problem:

I'd like to be able to count the number of lines in a Google Document. For example, the script must return 6 for the following text.

There doesn't seem to be any reliable method of extracting '\n' or '\r' characters from the text though.

text.findText(/\r/g)  //OR
text.findText(/\n/g)

The second line of code is not supposed to work anyway because, according to the GAS documentation, 'new line characters are automatically converted to \r'.

asked Mar 21, 2018 at 14:49 by Anton Dementiev
  • @SimeonNakov thanks, it works, but it returns the number of paragraphs as I expected. So there doesn't seem to be a way to count lines in Google docs after all :( – Anton Dementiev Commented Mar 21, 2018 at 15:27
  • @tehhowch My intention is to count individual lines as stated (not paragraphs). You could simply count paragraphs by calling body.getParagraphs().length, however, I got confused by the statement in the documentation to the effect that "all new line characters are converted into \r", which indeed seems to be the case, so the question is unresolved. – Anton Dementiev Commented Mar 21, 2018 at 15:29
  • Yes, I know. that's why I linked to the getParagraphs() method. It is unclear what you actually want to count - sentences, paragraphs, or some quantity that is entirely dependent on the current document's formatting settings (how much space is used). – tehhowch Commented Mar 21, 2018 at 15:30
  • Sentences, paragraphs, and lines are distinct entities, so I'm not sure I understand the ambiguity here. As stated, what I'd like to do is to count the number of individual lines in the document. I can count the number of first lines in each paragraph by counting the number of paragraphs, but I also want to count the number of "inner" lines that occupy the entire page width. – Anton Dementiev Commented Mar 21, 2018 at 15:41
  • I think it's pretty clear what is being asked here. The questioner wishes to know if there is an easy way to get the number of lines. Perhaps they are writing documents which are required to be all on one page and the person to whom they are being sent would prefer a smaller type just to keep the document all on one page. Unfortunately, I think the simple answer is no: as far as I know, there is no function that provides the number of lines. – Cooper Commented Mar 21, 2018 at 16:17

4 Answers

If you are still looking for a solution, how about this workaround? Unfortunately, I couldn't find a built-in method for retrieving the number of lines in a Google Document.

If the end of each line can be detected, the number of lines can be counted. So I tried to add end-of-line markers using OCR. There may be several workarounds for your issue, so please think of this as just one of them.

In Google Docs, when a sentence is wider than the page, it wraps to the next line automatically, but that wrap is not stored as \r\n or \n. Only line breaks entered by the user with the Enter key are stored as \r\n or \n. As a result, the text retrieved from the document contains only the user-entered line breaks. In your case, the document seems to have line breaks only after "incididunt" and "consequat.", so counting them does not give 6.
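To illustrate the point above, here is a minimal sketch that counts only the breaks a user actually typed. It assumes that `Body.getText()` separates paragraphs with \n and that manual line breaks inside a paragraph appear as \r; `countExplicitBreaks` and `logExplicitBreaks` are hypothetical names. Wrapped lines add nothing to the count because they carry no character.

```javascript
// Counts explicit breaks in a plain-text string (assumption: "\n" or "\r"
// marks a break the user typed; "\r\n" is treated as a single break).
function countExplicitBreaks(text) {
  return (text.match(/\r\n|\r|\n/g) || []).length;
}

// Apps Script usage (not called here): logs the explicit breaks in the body.
function logExplicitBreaks() {
  var text = DocumentApp.getActiveDocument().getBody().getText();
  Logger.log(countExplicitBreaks(text));
}
```

A long sentence that wraps across several lines on screen still yields 0 here, which is why this count cannot stand in for the number of rendered lines.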

I thought that OCR may be able to be used for this situation. The flow is as follows.

  1. Convert Google Document to PDF.
  2. Convert PDF to text data using OCR.
    • I selected "ocr.space" for OCR.
      • If you already know another OCR API, you can try that instead.
    • When I used the OCR of the Drive API, no \r\n or \n line breaks were added to the converted text. So I used ocr.space, which does add the line breaks.
  3. Count \n in the converted text data.
    • This number means the number of lines.

The sample script for the flow above is as follows. To use it, get an API key from "ocr.space": when you submit your information and email address through the form, you will receive an email containing the API key. Use that key in this sample script, and please check the API quota. I tested this using the Free plan.

Sample script :

var apikey = "### Your API key for using ocr.space ###";

// Export the active document as a PDF.
var id = DocumentApp.getActiveDocument().getId();
var url = "https://docs.google.com/feeds/download/documents/export/Export?id=" + id + "&format=pdf&access_token=" + ScriptApp.getOAuthToken();
var blob = UrlFetchApp.fetch(url).getBlob();

// Send the PDF to ocr.space and parse the JSON response.
var options = {method: "POST", headers: {apikey: apikey}, payload: {file: blob}};
var ocrRes = JSON.parse(UrlFetchApp.fetch("https://api.ocr.space/Parse/Image", options).getContentText());

// Count the line breaks in the first parsed result; guard against no match.
var result = ocrRes.ParsedResults.map(function(e){return (e.ParsedText.match(/\n/g) || []).length})[0];
Logger.log(result);

Result :

When your sample text is used, the script returns 6.

Note :

  • Even if the last line of the document has no \r\n or \n, the converted text has \r\n at the end of every line.
  • In this case, the accuracy of the OCR is not important. What matters is retrieving the line breaks.

I tested this script on several documents. In my environment, the correct number of lines was retrieved, but I'm not sure whether it works in your environment. If it doesn't, I'm sorry.
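As a possible variation on the counting step, the sketch below sums the lines over every entry in the OCR response instead of reading only the first one. It assumes the response has the `ParsedResults` / `ParsedText` shape used in the sample script and that there is one entry per page; `countTextLines` and `countOcrLines` are hypothetical helper names. It counts non-empty lines rather than \n characters, so a missing trailing line break does not change the result.

```javascript
// Counts the non-empty lines in one block of OCR text.
function countTextLines(parsedText) {
  return parsedText.split(/\r\n|\r|\n/).filter(function (line) {
    return line.trim() !== "";
  }).length;
}

// Sums the line counts over every entry of ParsedResults
// (assumption: one entry per page of the PDF).
function countOcrLines(ocrRes) {
  return (ocrRes.ParsedResults || []).reduce(function (total, page) {
    return total + countTextLines(page.ParsedText || "");
  }, 0);
}
```

Blank lines in the document are not counted by this variant, so it suits cases where only lines containing text matter.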

There is no easy way to count the lines in a Google Docs document using Google Apps Script, because the Document Service (Class DocumentApp) doesn't include a method for this. Implementing a custom method is complex because there are many factors to consider, such as the font used, the wrapping strategy, the page size, and the page margins.
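To show why a custom method is hard to get right, here is a deliberately crude sketch. It assumes fixed-width characters and a known number of characters per line, neither of which holds for a typical document; `estimateLineCount`, `logEstimatedLineCount`, and the width of 80 are hypothetical. It ignores fonts, word wrapping, tables, images, and margins, so treat its output as a rough estimate only.

```javascript
// Rough estimate only: assumes every character has the same width and that
// `charsPerLine` characters fit on one line (both are assumptions).
function estimateLineCount(paragraphs, charsPerLine) {
  return paragraphs.reduce(function (total, text) {
    // An empty paragraph still occupies one line.
    return total + Math.max(1, Math.ceil(text.length / charsPerLine));
  }, 0);
}

// Apps Script usage (not called here); 80 is a hypothetical line width.
function logEstimatedLineCount() {
  var paragraphs = DocumentApp.getActiveDocument().getBody().getParagraphs();
  var texts = paragraphs.map(function (p) { return p.getText(); });
  Logger.log(estimateLineCount(texts, 80));
}
```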

Workaround

If you set your document to use the Pages setup instead of Pageless, the most straightforward workaround is to turn on the option "Show Line Numbers".

  1. Click File > Page Setup. This will open the Page setup dialog

  2. If Pageless is selected, click Pages, then OK. Otherwise, click the Close (X) button.

  3. Click Tools > Show Line Numbers. This will open the Show Line Numbers sidebar.

    1. Check the Show Line Numbers checkbox.
    2. In the Line numbering option group, click Continue throughout document. In the Apply to option group, Entire document is selected, and the other option is disabled.
  4. Go to the end of the document. You will see the last line number, which is the number of lines of your document.

    Notes:

    1. The above image shows that the last line is empty.
    2. The horizontal white spaces between paragraphs don't include a number because this is the paragraph spacing.

As you noted in the comments, there is no API to retrieve the number of lines in Google Docs. This is because the document is rendered dynamically on the client side, so the server doesn't know this number.

One possible solution is scraping the HTML of the Google Doc, because each line is rendered in its own div with the "kix-lineview" class. However, you would need to actually open the page in an iframe or headless browser and then scroll page by page to make the lines render before you can count the divs.
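Once the rendered HTML is available as a string, the counting step could look like the sketch below. It assumes the markup matches the description above (one element per line whose class list contains the token "kix-lineview") and that the page has already been rendered and serialized by a browser; `countLineViews` is a hypothetical name. The class list is split into tokens so that other class names that merely start with the same prefix are not counted.

```javascript
// Counts elements whose class attribute contains the exact token
// "kix-lineview" in an already-rendered HTML string.
function countLineViews(renderedHtml) {
  var classAttr = /class="([^"]*)"/g;
  var count = 0;
  var match;
  while ((match = classAttr.exec(renderedHtml)) !== null) {
    if (match[1].split(/\s+/).indexOf("kix-lineview") !== -1) {
      count++;
    }
  }
  return count;
}
```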

After publishing your Google Doc with «Publish to the web» in «File» menu, use the URL in the following script:

var url = "https://docs.google.com/document/d/e/2PACX-1vSElK...iwUhaFo/pub";
var text = UrlFetchApp.fetch(url).getContentText();
var count = (text.match(/<\/br>/g) || []).length;
Logger.log(count.toString());

This only works if every line in your document ends in </br>, although you can add other variants:

var url = "https://docs.google.com/document/d/e/2PACX-1vSElK...iwUhaFo/pub";
var text = UrlFetchApp.fetch(url).getContentText();
var count1 = (text.match(/<\/br>/g) || []).length;
var count2 = (text.match(/<\/p>/g) || []).length;
var count3 = (text.match(/<hr>/g) || []).length;
var count = count1 + count2 + count3;
Logger.log(count);
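The separate counters above could also be folded into one pattern, as in this sketch. `countLineEndingTags` is a hypothetical helper, and including `<br>`, `<br/>`, and `<hr/>` alongside the tags used above is my assumption about what the published HTML might contain. Like the script above, it counts tags, so it reflects paragraphs and explicit breaks rather than wrapped lines.

```javascript
// Counts tags that are taken to end a line in the published HTML:
// <br>, <br/>, </br>, </p>, <hr>, and <hr/> (case-insensitive).
function countLineEndingTags(html) {
  var pattern = /<br\s*\/?>|<\/br>|<\/p>|<hr\s*\/?>/gi;
  return (html.match(pattern) || []).length;
}
```

In Apps Script it would be called on the fetched page text, for example `countLineEndingTags(UrlFetchApp.fetch(url).getContentText())`.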
