I used the code from this tutorial http://ourcodeworld./articles/read/405/how-to-convert-pdf-to-text-extract-text-from-pdf-with-javascript to set up the PDF-to-text conversion.
I looked all over this site https://mozilla.github.io/pdf.js/ for hints on how to format the conversion, but couldn't find anything. Does anyone have an idea of how to display line breaks as \n when parsing text with pdf.js?
Thanks in advance.
Asked Jun 5, 2017 at 19:36 by Thomas Valadez
Have you tried replacing any \r with \\r, and likewise \n with \\n, using something like string.replace(/\r/g, '\\r').replace(/\n/g, '\\n');? Note, for those who don't know: \r (carriage return) is commonly paired with a newline character in some environments (e.g. Windows). – Patrick Barr, Jun 5, 2017 at 19:50
Yeah, I tried, except the \n never exists. I am worried that pdf.js just ignores newline characters. – Thomas Valadez, Jun 5, 2017 at 19:57
1 Answer
In PDF there is no such thing as controlling layout with control characters such as '\n'; glyphs in a PDF are positioned using exact coordinates. Use the text's y-coordinate (which can be extracted from the transform matrix) to detect a line change.
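// Written against the 2017-era pdf.js build, which exposes a global PDFJS object.
// Newer releases (2.x and later) expose pdfjsLib instead, and getDocument(url)
// returns a loading task whose .promise resolves to the document.
var url = "https://cdn.mozilla/pdfjs/tracemonkey.pdf";
var pageNumber = 2;

// Load document
PDFJS.getDocument(url).then(function (doc) {
  // Get a page
  return doc.getPage(pageNumber);
}).then(function (pdfPage) {
  // Get page text content
  return pdfPage.getTextContent();
}).then(function (textContent) {
  var p = null;
  var lastY = -1;
  textContent.items.forEach(function (i) {
    // i.transform is the item's transform matrix [a, b, c, d, e, f];
    // index 5 (f) is its y-coordinate. When it changes, start a new <p>.
    if (lastY !== i.transform[5]) {
      p = document.createElement("p");
      document.body.appendChild(p);
      lastY = i.transform[5];
    }
    p.textContent += i.str;
  });
});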
<script src="https://npmcdn./pdfjs-dist/build/pdf.js"></script>
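The question asks for \n in a string rather than <p> elements. Below is a minimal sketch, assuming the same textContent object that the answer's last .then receives. The helper textContentToString and its tolerance parameter are hypothetical names for illustration, not pdf.js API. It applies the same y-coordinate idea but concatenates item.str values and inserts "\n" whenever transform[5] moves by more than a small tolerance. Items are taken in the order pdf.js returns them, which follows the PDF's content stream and is not guaranteed to be reading order, so multi-column pages may need extra sorting.

```javascript
// Hypothetical helper, not part of the pdf.js API. It takes the object that
// page.getTextContent() resolves to and returns plain text with "\n" between lines.
function textContentToString(textContent, tolerance) {
  // Maximum y difference (in PDF units) still treated as the same line.
  // The default of 1 is an assumption; tune it for your documents.
  var eps = tolerance === undefined ? 1 : tolerance;
  var out = "";
  var lastY = null;

  textContent.items.forEach(function (item) {
    // Skip entries without text (e.g. marked-content markers).
    if (typeof item.str !== "string") {
      return;
    }
    // transform is [a, b, c, d, e, f]; index 5 (f) is the y-coordinate.
    var y = item.transform[5];
    if (lastY === null) {
      lastY = y; // first text item: remember its line
    } else if (Math.abs(y - lastY) > eps) {
      out += "\n"; // y moved beyond the tolerance: start a new line
      lastY = y;
    }
    out += item.str;
  });

  return out;
}
```

In the answer's promise chain, the final .then could return textContentToString(textContent) instead of appending <p> elements. As in the answer's code, lastY is updated only when a new line starts, so slightly raised or lowered glyphs that stay within the tolerance remain on the current line.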
Source: javascript - Display line breaks as `\n` in pdf to text conversion using pdf.js - Stack Overflow (reposted at http://www.betaflare.com/web/1742224922a2436135.html).