I used the code from this tutorial http://ourcodeworld.com/articles/read/405/how-to-convert-pdf-to-text-extract-text-from-pdf-with-javascript to set up the PDF to text conversion.

I looked all over this site https://mozilla.github.io/pdf.js/ for hints on how to format the conversion, but couldn't find anything. Does anyone know how to display line breaks as \n when parsing text with pdf.js?

Thanks in advance.

asked Jun 5, 2017 at 19:36 by Thomas Valadez
  • Have you tried replacing any \r with \\r, and likewise \n with \\n, with something like string.replace('\r','\\r').replace('\n','\\n');? Note: for those who don't know, \r (carriage return) is commonly paired with a newline character in some environments (e.g. Windows). – Patrick Barr, Jun 5, 2017 at 19:50
  • Yeah, I tried, except the \n never exists. I am worried that pdf.js just overlooks newline characters. – Thomas Valadez, Jun 5, 2017 at 19:57
1 Answer

In PDF there is no such thing as controlling layout with control characters such as '\n'; glyphs in a PDF are positioned using exact coordinates. Use the text's y-coordinate (which can be extracted from the transform matrix) to detect a line change.

var url = "https://cdn.mozilla.net/pdfjs/tracemonkey.pdf";
var pageNumber = 2;
// Load document
PDFJS.getDocument(url).then(function (doc) {
  // Get a page
  return doc.getPage(pageNumber);
}).then(function (pdfPage) {
  // Get page text content
  return pdfPage.getTextContent();
}).then(function (textContent) {
  var p = null;
  var lastY = null;
  textContent.items.forEach(function (item) {
    // transform[5] is the item's y-coordinate; when it changes,
    // the text has moved to a new line, so start a new <p> element
    var y = item.transform[5];
    if (p === null || lastY !== y) {
      p = document.createElement("p");
      document.body.appendChild(p);
      lastY = y;
    }
    p.textContent += item.str;
  });
});
<script src="https://npmcdn.com/pdfjs-dist/build/pdf.js"></script>
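
Since the question asks for a string containing \n rather than separate paragraph elements, the same y-coordinate idea can be sketched as a plain function. This is only a minimal sketch: `itemsToText` and its `tolerance` parameter are hypothetical names introduced here, not part of pdf.js, and it assumes each item has the `str` and `transform` fields used in the snippet above. Rotated text or multi-column layouts would likely need more care.

```javascript
// Hypothetical helper (not part of pdf.js): joins text items into one string,
// inserting "\n" whenever the y-coordinate moves by more than `tolerance`.
// Assumes each item looks like { str: "...", transform: [a, b, c, d, x, y] }.
function itemsToText(items, tolerance = 0.5) {
  let text = "";
  let lastY = null;
  for (const item of items) {
    const y = item.transform[5]; // vertical position of this text run
    // Small differences are treated as the same line (assumed threshold).
    if (lastY !== null && Math.abs(y - lastY) > tolerance) {
      text += "\n";
    }
    text += item.str;
    lastY = y;
  }
  return text;
}
```

Inside the last `then` callback above, this could be called as `itemsToText(textContent.items)`. The tolerance is there because runs on the same visual line may not share an exactly identical y value; setting it to 0 reproduces the strict comparison used in the answer.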
