MOTIVATION
I am trying to create HTML files with PDFBox 3 using pdf2dom.
MILESTONES
- I updated the pom dependencies of pdf2dom @ https://github.com/tugalsan/com.tugalsan.api.file.pdf.pdfbox3.pdf2dom
- I updated the pom dependencies of gfxassert @ https://github.com/tugalsan/com.tugalsan.api.file.pdf.pdfbox3.pdf2dom.gfxassert
- I updated the pom dependencies of fontverter @ https://github.com/tugalsan/com.tugalsan.api.file.pdf.pdfbox3.pdf2dom.fontverter
- I removed some code [and deleted the test classes (!)] to get everything to compile.
- After that I was able to convert PDF files to HTML [and it works :)] at https://github.com/tugalsan/com.tugalsan.api.file.pdf.pdfbox3/blob/main/src/main/java/com/tugalsan/api/file/pdf/pdfbox3/server/TS_FilePdfBox3UtilsHtml.java
QUESTION
- After updating fontbox from "fontbox-2.0.27" to "fontbox-3.0.4", the function "Type2CharString.getType2Sequence()" no longer exists.
- What is the new way of implementing the function below?
https://github.com/tugalsan/com.tugalsan.api.file.pdf.pdfbox3.pdf2dom.fontverter/blob/master/src/main/java/org/mabb/fontverter/cff/CffFontAdapter.java
    public List<CffGlyph> getGlyphs() throws IOException {
        List<CffGlyph> glyphs = new ArrayList<CffGlyph>();
        for (GlyphMapReader.GlyphMapping mapOn : getGlyphMaps()) {
            CffGlyph glyph = createGlyph();
            Type2CharString charStr = font.getType2CharString(mapOn.glyphId);
            // glyph.readType2Sequence(charStr.getType2Sequence());
            glyph.map = mapOn;
            glyph.charStr = charStr;
            glyphs.add(glyph);
        }
        return glyphs;
    }
asked Jan 30 at 0:50, edited Jan 30 at 1:41 by Tugalsan Karabacak
5 Comments
- That was removed in issues.apache.org/jira/browse/PDFBOX-5143, probably as part of an optimization effort. Maybe use 2.0.33 instead? – Tilman Hausherr Commented Jan 30 at 4:27
- @Tilman Hausherr, but I thought the PDFBox family should share the same version number for compatibility. Otherwise I will have to keep calling a pdfbox2 application from my pdfbox3 application, which just complicates things. github.com/tugalsan/com.tugalsan.lib.file.pdf.to.html/blob/main/… – Tugalsan Karabacak Commented Jan 30 at 4:41
- I do not really understand enough about this or why you need it. You should either create an enhancement request in JIRA ( issues.apache.org/jira/browse/PDFBOX ) explaining why you need it (if you have to register, write a useful description, and maybe mention this SO question), or copy parts of the old fontbox code to get this information. – Tilman Hausherr Commented Jan 30 at 8:14
- I found implementations of "getType2CharString()" in org.apache.fontbox.cff: CFFType1Font and CFFCIDFont, which construct the Type2CharString class. I will try to find the places where the CFF fonts are created and, if possible, extend them to create a custom Type2CharString. – Tugalsan Karabacak Commented Jan 30 at 10:29
- I found a way with reflection, but have not tested it yet. github/tugalsan/… – Tugalsan Karabacak Commented Jan 30 at 13:09
1 Answer
To solve it:
1- I used reflection to reach the private members of the font classes and lifted the implementation of getType2CharString, which lives inside CFFType1Font and CFFCIDFont, into a new class named CffFontPatchUtils, as below. The new getType2CharString returns "Type2CharString charStr" and "List<Object> type2Sequence" at the same time.
WARNING: This is my first time using reflection. I do not understand why there was no compilation error on the first run.
    package org.mabb.fontverter.cff;

    import com.tugalsan.api.unsafe.client.TGS_UnSafe;
    import java.util.List;
    import java.util.Locale;
    import java.util.Map;
    import org.apache.fontbox.cff.CFFCIDFont;
    import org.apache.fontbox.cff.CFFFont;
    import org.apache.fontbox.cff.CFFType1Font;
    import org.apache.fontbox.cff.CIDKeyedType2CharString;
    import org.apache.fontbox.cff.Type2CharString;
    import org.apache.fontbox.cff.Type2CharStringParser;
    import org.apache.fontbox.type1.Type1CharStringReader;

    public class CffFontPatchUtils {

        // Pairs the charstring with the parsed Type 2 sequence that fontbox 3 no longer exposes
        public static record Result(Type2CharString charStr, List<Object> type2Sequence) {

            public static Result of(Type2CharString charStr, List<Object> type2Sequence) {
                return new Result(charStr, type2Sequence);
            }
        }

        public static Result getType2CharString(CFFFont font, int cidOrGid) {
            if (font instanceof CFFType1Font _font) {
                return CFFType1Font_getType2CharString(_font, cidOrGid);
            }
            if (font instanceof CFFCIDFont _font) {
                return CFFCIDFont_getType2CharString(_font, cidOrGid);
            }
            return null; // unknown CFFFont subclass
        }

        // Reads a non-public field declared on owner from the given font instance
        private static Object field(Class<?> owner, Object font, String name) throws ReflectiveOperationException {
            var f = owner.getDeclaredField(name);
            f.setAccessible(true);
            return f.get(font); // Field.get needs the instance, not the field name
        }

        // Invokes a non-public, no-argument method declared on owner
        private static Object call(Class<?> owner, Object font, String name) throws ReflectiveOperationException {
            var m = owner.getDeclaredMethod(name);
            m.setAccessible(true);
            return m.invoke(font);
        }

        // Invokes a non-public method declared on owner whose only parameter is the int GID
        private static Object call(Class<?> owner, Object font, String name, int gid) throws ReflectiveOperationException {
            var m = owner.getDeclaredMethod(name, int.class);
            m.setAccessible(true);
            return m.invoke(font, gid);
        }

        private static Result CFFType1Font_getType2CharString(CFFType1Font font, int gid) {
            String name = "GID+" + gid; // for debugging only
            return CFFType1Font_getType2CharString(font, gid, name);
        }

        // Returns the Type 2 charstring and its sequence for the given GID; name is for debugging only
        private static Result CFFType1Font_getType2CharString(CFFType1Font font, int gid, String name) {
            return TGS_UnSafe.call(() -> {
                var owner = CFFType1Font.class;
                // charStrings and globalSubrIndex are inherited, so they are looked up on CFFFont
                var charStrings = (byte[][]) field(CFFFont.class, font, "charStrings");
                var globalSubrIndex = (byte[][]) field(CFFFont.class, font, "globalSubrIndex");
                var bytes = gid < charStrings.length ? charStrings[gid] : null;
                if (bytes == null) {
                    bytes = charStrings[0]; // .notdef
                }
                // Parse on every call: a cached Type2CharString cannot hand back its sequence
                var parser = (Type2CharStringParser) call(owner, font, "getParser");
                // In CFFType1Font the subr and width helpers take no arguments
                var localSubrIndex = (byte[][]) call(owner, font, "getLocalSubrIndex");
                List<Object> type2seq = parser.parse(bytes, globalSubrIndex, localSubrIndex, name);
                var charStringCache = (Map<Integer, Type2CharString>) field(owner, font, "charStringCache");
                var type2 = charStringCache.get(gid);
                if (type2 == null) {
                    var reader = (Type1CharStringReader) field(owner, font, "reader");
                    var defaultWidthX = (Integer) call(owner, font, "getDefaultWidthX");
                    var nominalWidthX = (Integer) call(owner, font, "getNominalWidthX");
                    type2 = new Type2CharString(reader, font.getName(), name, gid, type2seq, defaultWidthX, nominalWidthX);
                    charStringCache.put(gid, type2);
                }
                return Result.of(type2, type2seq);
            });
        }

        // Returns the Type 2 charstring and its sequence for the given CID
        private static Result CFFCIDFont_getType2CharString(CFFCIDFont font, int cid) {
            return TGS_UnSafe.call(() -> {
                var owner = CFFCIDFont.class;
                var gid = font.getCharset().getGIDForCID(cid);
                // charStrings and globalSubrIndex are inherited, so they are looked up on CFFFont
                var charStrings = (byte[][]) field(CFFFont.class, font, "charStrings");
                var globalSubrIndex = (byte[][]) field(CFFFont.class, font, "globalSubrIndex");
                var bytes = gid < charStrings.length ? charStrings[gid] : null;
                if (bytes == null) {
                    bytes = charStrings[0]; // .notdef
                }
                // Parse on every call: a cached charstring cannot hand back its sequence
                var parser = (Type2CharStringParser) call(owner, font, "getParser");
                // In CFFCIDFont the subr and width helpers take the GID as an int parameter
                var localSubrIndex = (byte[][]) call(owner, font, "getLocalSubrIndex", gid);
                List<Object> type2seq = parser.parse(bytes, globalSubrIndex, localSubrIndex, String.format(Locale.US, "%04x", cid));
                var charStringCache = (Map<Integer, CIDKeyedType2CharString>) field(owner, font, "charStringCache");
                var type2 = charStringCache.get(cid);
                if (type2 == null) {
                    var reader = (Type1CharStringReader) field(owner, font, "reader");
                    var defaultWidthX = (Integer) call(owner, font, "getDefaultWidthX", gid);
                    var nominalWidthX = (Integer) call(owner, font, "getNominalWidthX", gid);
                    type2 = new CIDKeyedType2CharString(reader, font.getName(), cid, gid, type2seq, defaultWidthX, nominalWidthX);
                    charStringCache.put(cid, type2);
                }
                return Result.of(type2, type2seq);
            });
        }
    }
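On the warning above: reflective code compiles without errors because field and method names are plain strings, and the compiler never checks them against the target class. Mistakes only surface at runtime. The following is a minimal, self-contained sketch of the two most common ones. BaseFont, DerivedFont and ReflectionLookupDemo are hypothetical stand-ins invented for this illustration; they are not fontbox classes.

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for a library base class with a private field.
class BaseFont {
    private final byte[][] charStrings = {{1, 2}, {3}};
}

// Hypothetical stand-in for a subclass that adds its own private field.
class DerivedFont extends BaseFont {
    private final String reader = "derived-reader";
}

class ReflectionLookupDemo {

    // Reads a private field declared on `owner` from the instance `target`.
    static Object readField(Class<?> owner, Object target, String name)
            throws ReflectiveOperationException {
        Field field = owner.getDeclaredField(name); // the name is checked only at runtime
        field.setAccessible(true);
        return field.get(target); // must be the instance that holds the field
    }

    // Returns "ok" if the lookup succeeds, otherwise the simple name of the exception.
    static String outcome(Class<?> owner, Object target, String name) {
        try {
            readField(owner, target, name);
            return "ok";
        } catch (ReflectiveOperationException | IllegalArgumentException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        DerivedFont font = new DerivedFont();
        // Field declared on the class we ask: works.
        System.out.println(outcome(DerivedFont.class, font, "reader"));      // ok
        // getDeclaredField does not see inherited fields.
        System.out.println(outcome(DerivedFont.class, font, "charStrings")); // NoSuchFieldException
        // Asking the declaring superclass works.
        System.out.println(outcome(BaseFont.class, font, "charStrings"));    // ok
        // Passing the field name instead of the instance fails.
        System.out.println(outcome(DerivedFont.class, "reader", "reader"));  // IllegalArgumentException
    }
}
```

All four calls compile. Only running them shows which lookups are valid, so reflective code like CffFontPatchUtils needs a runtime test for each font type.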
2- Then I updated the function CffFontAdapter.getGlyphs(), in package org.mabb.fontverter.cff of the fontverter dependency, as below.
    public List<CffGlyph> getGlyphs() throws IOException {
        List<CffGlyph> glyphs = new ArrayList<CffGlyph>();
        for (GlyphMapReader.GlyphMapping mapOn : getGlyphMaps()) {
            CffGlyph glyph = createGlyph();
            // Type2CharString charStr = font.getType2CharString(mapOn.glyphId);
            var result = CffFontPatchUtils.getType2CharString(font, mapOn.glyphId);
            // glyph.readType2Sequence(charStr.getType2Sequence());
            glyph.readType2Sequence(result.type2Sequence());
            glyph.map = mapOn;
            // glyph.charStr = charStr;
            glyph.charStr = result.charStr();
            glyphs.add(glyph);
        }
        return glyphs;
    }
3- I tested the PDF-to-HTML conversion with the PDF files at https://github.com/py-pdf/sample-files; most of them worked. Some of the ones that failed are 007-imagemagick-images, 008-reportlab-inline-image...
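One way to keep track of which sample files fail is a small batch harness that runs every file and records the exception type rather than stopping at the first error. The sketch below is an assumption and not part of the repository: Converter is a hypothetical hook that would wrap the actual PDF-to-HTML call, and the file names in main are made up.

```java
import java.util.ArrayList;
import java.util.List;

class ConversionBatch {

    // Hypothetical hook; a real implementation would call the PDF-to-HTML conversion.
    interface Converter {
        void convert(String pdfName) throws Exception;
    }

    // Runs the converter on every file and returns "name: ExceptionType" for each failure.
    static List<String> failures(List<String> pdfNames, Converter converter) {
        List<String> failed = new ArrayList<>();
        for (String name : pdfNames) {
            try {
                converter.convert(name);
            } catch (Exception e) {
                failed.add(name + ": " + e.getClass().getSimpleName());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Fake converter for demonstration: fails on names starting with "b".
        Converter fake = name -> {
            if (name.startsWith("b")) {
                throw new IllegalStateException("simulated failure");
            }
        };
        System.out.println(failures(List.of("a.pdf", "b.pdf", "c.pdf"), fake));
    }
}
```

Grouping the failures by exception type makes it easier to tell whether the failing files share one cause, for example a font path that the reflection patch does not cover.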