MOTIVATION
I am trying to convert PDF files to HTML with PDFBox 3 using pdf2dom.

MILESTONES

  • I updated the pom dependencies of pdf2dom @ https://github.com/tugalsan/com.tugalsan.api.file.pdf.pdfbox3.pdf2dom
  • I updated the pom dependencies of gfxassert @ https://github.com/tugalsan/com.tugalsan.api.file.pdf.pdfbox3.pdf2dom.gfxassert
  • I updated the pom dependencies of fontverter @ https://github.com/tugalsan/com.tugalsan.api.file.pdf.pdfbox3.pdf2dom.fontverter
  • I removed some code [and deleted the test classes (!)] to be able to compile.
  • After that I was able to convert PDF files to HTML [and it works :)] at https://github.com/tugalsan/com.tugalsan.api.file.pdf.pdfbox3/blob/main/src/main/java/com/tugalsan/api/file/pdf/pdfbox3/server/TS_FilePdfBox3UtilsHtml.java

QUESTION

  • After updating FontBox from "fontbox-2.0.27" to "fontbox-3.0.4", the method "Type2CharString.getType2Sequence()" no longer exists.
  • What is the new way to implement the function below?

https://github.com/tugalsan/com.tugalsan.api.file.pdf.pdfbox3.pdf2dom.fontverter/blob/master/src/main/java/org/mabb/fontverter/cff/CffFontAdapter.java

    public List<CffGlyph> getGlyphs() throws IOException {
        List<CffGlyph> glyphs = new ArrayList<CffGlyph>();
        for (GlyphMapReader.GlyphMapping mapOn : getGlyphMaps()) {
            CffGlyph glyph = createGlyph();
            Type2CharString charStr = font.getType2CharString(mapOn.glyphId);
//            glyph.readType2Sequence(charStr.getType2Sequence());
            glyph.map = mapOn;
            glyph.charStr = charStr;
            glyphs.add(glyph);
        }

        return glyphs;
    }
Asked Jan 30 at 0:50 by Tugalsan Karabacak (edited Jan 30 at 1:41)

Comments:
  • That was removed in issues.apache.org/jira/browse/PDFBOX-5143, probably as part of an optimization effort. Maybe use 2.0.33 instead? – Tilman Hausherr, Jan 30 at 4:27
  • @Tilman Hausherr, but I thought the PDFBox family of libraries should all have the same version number for compatibility. Otherwise I would have to keep calling a PDFBox 2 application from my PDFBox 3 application, which just complicates things. github.com/tugalsan/com.tugalsan.lib.file.pdf.to.html/blob/main/… – Tugalsan Karabacak, Jan 30 at 4:41
  • I do not really understand enough about this or why you need it. You could create an enhancement request in JIRA ( issues.apache.org/jira/browse/PDFBOX ) explaining why you need it. If you have to register, make a useful description (maybe mention this SO question). Alternatively, you'd have to copy parts of the old FontBox code to get this information. – Tilman Hausherr, Jan 30 at 8:14
  • I found implementations of "getType2CharString()" in org.apache.fontbox.cff: CFFType1Font and CFFCIDFont, which construct the Type2CharString class. I will try to find the places where CFF fonts are created and, if possible, extend them to create a custom Type2CharString. – Tugalsan Karabacak, Jan 30 at 10:29
  • I found a way using reflection, but have not tested it yet. github.com/tugalsan/… – Tugalsan Karabacak, Jan 30 at 13:09

1 Answer

To solve it:

  1. I used reflection to reach the private members of CFFType1Font and CFFCIDFont, and moved the implementation of their getType2CharString function into a new class named CffFontPatchUtils, as below. The new getType2CharString returns the "Type2CharString charStr" and the "List<Object> type2Sequence" at the same time.
WARNING: This is my first time using reflection. I do not understand why there was no compilation error on the first run.

package org.mabb.fontverter.cff;

import com.tugalsan.api.unsafe.client.TGS_UnSafe;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import org.apache.fontbox.cff.CFFCIDFont;
import org.apache.fontbox.cff.CFFFont;
import org.apache.fontbox.cff.CFFType1Font;
import org.apache.fontbox.cff.CIDKeyedType2CharString;
import org.apache.fontbox.cff.Type2CharString;
import org.apache.fontbox.cff.Type2CharStringParser;
import org.apache.fontbox.type1.Type1CharStringReader;

public class CffFontPatchUtils {

    public static record Result(Type2CharString charStr, List<Object> type2Sequence) {

        public static Result of(Type2CharString charStr, List<Object> type2Sequence) {
            return new Result(charStr, type2Sequence);
        }
    }

    public static Result getType2CharString(CFFFont font, int cidOrGid) {
        if (font instanceof CFFType1Font _font) {
            return CFFType1Font_getType2CharString(_font, cidOrGid);
        }
        if (font instanceof CFFCIDFont _font) {
            return CFFCIDFont_getType2CharString(_font, cidOrGid);
        }
        return null;
    }

    private static Result CFFType1Font_getType2CharString(CFFType1Font font, int gid) {
        String name = "GID+" + gid; // for debugging only
        return CFFType1Font_getType2CharString(font, gid, name);
    }

    // Returns the Type 2 charstring for the given GID, with name for debugging
    private static Result CFFType1Font_getType2CharString(CFFType1Font font, int gid, String name) {
        return TGS_UnSafe.call(() -> {
            var field_charStringCache = font.getClass().getDeclaredField("charStringCache");
            field_charStringCache.setAccessible(true);
            // Field.get takes the instance to read from, not the field name
            var charStringCache = (Map<Integer, Type2CharString>) field_charStringCache.get(font);

            var type2 = charStringCache.get(gid);
            List<Object> type2seq = null;
            if (type2 == null) {

                // charStrings is inherited from CFFFont; getDeclaredField does not search superclasses
                var field_charStrings = CFFFont.class.getDeclaredField("charStrings");
                field_charStrings.setAccessible(true);
                var charStrings = (byte[][]) field_charStrings.get(font);

                byte[] bytes = null;
                if (gid < charStrings.length) {
                    bytes = charStrings[gid];
                }
                if (bytes == null) {
                    bytes = charStrings[0]; // .notdef
                }

                var method_getParser = font.getClass().getDeclaredMethod("getParser");
                method_getParser.setAccessible(true);
                var parser = (Type2CharStringParser) method_getParser.invoke(font);

                // globalSubrIndex is inherited from CFFFont as well
                var field_globalSubrIndex = CFFFont.class.getDeclaredField("globalSubrIndex");
                field_globalSubrIndex.setAccessible(true);
                var globalSubrIndex = (byte[][]) field_globalSubrIndex.get(font);

                var method_getLocalSubrIndex = font.getClass().getDeclaredMethod("getLocalSubrIndex");
                method_getLocalSubrIndex.setAccessible(true);
                // looked up without parameter types, so it must be invoked without arguments
                var getLocalSubrIndex = (byte[][]) method_getLocalSubrIndex.invoke(font);

                type2seq = parser.parse(bytes, globalSubrIndex, getLocalSubrIndex, name);

                var field_reader = font.getClass().getDeclaredField("reader");
                field_reader.setAccessible(true);
                var reader = (Type1CharStringReader) field_reader.get(font);

                var method_getDefaultWidthX = font.getClass().getDeclaredMethod("getDefaultWidthX");
                method_getDefaultWidthX.setAccessible(true);
                var getDefaultWidthX = (Integer) method_getDefaultWidthX.invoke(font);

                var method_getNominalWidthX = font.getClass().getDeclaredMethod("getNominalWidthX");
                method_getNominalWidthX.setAccessible(true);
                var getNominalWidthX = (Integer) method_getNominalWidthX.invoke(font);

                type2 = new Type2CharString(reader, font.getName(), name, gid, type2seq, getDefaultWidthX, getNominalWidthX);
                charStringCache.put(gid, type2);
            }
            return Result.of(type2, type2seq);
        });
    }

    private static Result CFFCIDFont_getType2CharString(CFFCIDFont font, int cid) {
        return TGS_UnSafe.call(() -> {
            var field_charStringCache = font.getClass().getDeclaredField("charStringCache");
            field_charStringCache.setAccessible(true);
            // Field.get takes the instance to read from, not the field name
            var charStringCache = (Map<Integer, CIDKeyedType2CharString>) field_charStringCache.get(font);

            var type2 = charStringCache.get(cid);
            List<Object> type2seq = null;
            if (type2 == null) {
                var gid = font.getCharset().getGIDForCID(cid);

                // charStrings is inherited from CFFFont; getDeclaredField does not search superclasses
                var field_charStrings = CFFFont.class.getDeclaredField("charStrings");
                field_charStrings.setAccessible(true);
                var charStrings = (byte[][]) field_charStrings.get(font);

                byte[] bytes = null;
                if (gid < charStrings.length) {
                    bytes = charStrings[gid];
                }
                if (bytes == null) {
                    bytes = charStrings[0]; // .notdef
                }

                var method_getParser = font.getClass().getDeclaredMethod("getParser");
                method_getParser.setAccessible(true);
                var parser = (Type2CharStringParser) method_getParser.invoke(font);

                // globalSubrIndex is inherited from CFFFont as well
                var field_globalSubrIndex = CFFFont.class.getDeclaredField("globalSubrIndex");
                field_globalSubrIndex.setAccessible(true);
                var globalSubrIndex = (byte[][]) field_globalSubrIndex.get(font);

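                // the CID variant is invoked with a GID below, so the lookup must name the int parameter
                var method_getLocalSubrIndex = font.getClass().getDeclaredMethod("getLocalSubrIndex", int.class);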
                method_getLocalSubrIndex.setAccessible(true);
                var getLocalSubrIndex = (byte[][]) method_getLocalSubrIndex.invoke(font, gid);

                type2seq = parser.parse(bytes, globalSubrIndex, getLocalSubrIndex, String.format(Locale.US, "%04x", cid));

                var field_reader = font.getClass().getDeclaredField("reader");
                field_reader.setAccessible(true);
                var reader = (Type1CharStringReader) field_reader.get(font);

                var method_getDefaultWidthX = font.getClass().getDeclaredMethod("getDefaultWidthX", int.class);
                method_getDefaultWidthX.setAccessible(true);
                var getDefaultWidthX = (Integer) method_getDefaultWidthX.invoke(font, gid);

                var method_getNominalWidthX = font.getClass().getDeclaredMethod("getNominalWidthX", int.class);
                method_getNominalWidthX.setAccessible(true);
                var getNominalWidthX = (Integer) method_getNominalWidthX.invoke(font, gid);

                type2 = new CIDKeyedType2CharString(reader, font.getName(), cid, gid, type2seq, getDefaultWidthX, getNominalWidthX);
                charStringCache.put(cid, type2);
            }
            return Result.of(type2, type2seq);
        });
    }
}
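
On the WARNING in step 1: reflection looks members up by string name at run time, so the compiler has nothing to check and mistakes only show up as exceptions when the code runs. The sketch below illustrates this with a hypothetical stand-in class (FontLike and ReflectionSketch are made-up names, not part of FontBox). It also shows two details that are easy to get wrong: getDeclaredField only sees fields declared on the class it is called on, and Field.get expects the instance that owns the field.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a library class with a private cache field.
class FontLike {
    private final Map<Integer, String> cache = new HashMap<>();

    FontLike() {
        cache.put(0, ".notdef");
    }
}

class ReflectionSketch {

    // Reads a private instance field by name. The name is only a string,
    // so a typo compiles fine and fails at run time instead.
    static Object readPrivateField(Object target, Class<?> declaringClass, String fieldName)
            throws ReflectiveOperationException {
        // getDeclaredField does not search superclasses, so pass the declaring class
        Field field = declaringClass.getDeclaredField(fieldName);
        field.setAccessible(true);
        // pass the instance that owns the field, not the field name
        return field.get(target);
    }

    public static void main(String[] args) throws Exception {
        FontLike font = new FontLike();
        System.out.println(readPrivateField(font, FontLike.class, "cache")); // {0=.notdef}

        try {
            readPrivateField(font, FontLike.class, "noSuchField");
        } catch (NoSuchFieldException e) {
            System.out.println("fails only at run time: " + e);
        }
    }
}
```

Because nothing is checked at compile time, a patch like this can break silently when the library renames or moves a private member, so it is worth re-testing after every FontBox upgrade.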
  2. Then I updated the function CffFontAdapter.getGlyphs(), in package org.mabb.fontverter.cff of the fontverter dependency, as below.
    public List<CffGlyph> getGlyphs() throws IOException {
        List<CffGlyph> glyphs = new ArrayList<CffGlyph>();
        for (GlyphMapReader.GlyphMapping mapOn : getGlyphMaps()) {
            CffGlyph glyph = createGlyph();
//            Type2CharString charStr = font.getType2CharString(mapOn.glyphId);
            var result = CffFontPatchUtils.getType2CharString(font, mapOn.glyphId);
//            glyph.readType2Sequence(charStr.getType2Sequence());
            glyph.readType2Sequence(result.type2Sequence());
            glyph.map = mapOn;
//            glyph.charStr = charStr;
            glyph.charStr = result.charStr();
            glyphs.add(glyph);
        }
        return glyphs;
    }
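
One thing to watch in the code above: CffFontPatchUtils only fills type2Sequence when the charstring was not already in the font's charStringCache, so result.type2Sequence() can be null if something else requested that glyph first. A possible way around this, sketched below under my own assumptions, is to keep a side cache that stores the charstring and its sequence together. ParsedGlyph and ParsedGlyphCache are hypothetical names, and a String stands in for Type2CharString to keep the sketch self-contained.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Hypothetical holder for a charstring plus the sequence it was built from.
record ParsedGlyph(String charString, List<Object> sequence) {}

class ParsedGlyphCache {
    private final Map<Integer, ParsedGlyph> cache = new ConcurrentHashMap<>();
    private final IntFunction<ParsedGlyph> parser;

    // parser is whatever produces the pair for a glyph id (assumption: it never returns null)
    ParsedGlyphCache(IntFunction<ParsedGlyph> parser) {
        this.parser = parser;
    }

    // Parses each glyph at most once; cache hits still carry the sequence.
    ParsedGlyph get(int gid) {
        return cache.computeIfAbsent(gid, parser::apply);
    }

    public static void main(String[] args) {
        int[] parseCalls = {0};
        ParsedGlyphCache glyphs = new ParsedGlyphCache(gid -> {
            parseCalls[0]++;
            return new ParsedGlyph("glyph" + gid, List.<Object>of(gid, "endchar"));
        });

        glyphs.get(5);
        ParsedGlyph second = glyphs.get(5); // served from the cache
        System.out.println(second.sequence()); // [5, endchar]
        System.out.println(parseCalls[0]);     // 1
    }
}
```

The trade-off is that this cache lives next to FontBox's own cache rather than replacing it, so the glyph data is held twice.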
  3. I tested the PDF-to-HTML conversion with the PDF files at https://github.com/py-pdf/sample-files; most of them worked. Some of the ones that failed are 007-imagemagick-images, 008-reportlab-inline-image...
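
To see which sample files fail, a small batch check can help. The sketch below walks a directory, runs a converter on every PDF, and collects the files that threw an exception. BatchConversionCheck and its Converter hook are hypothetical names; the actual PDF-to-HTML call would be plugged in where the placeholder lambda is.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class BatchConversionCheck {

    // Hypothetical hook: plug the real PDF-to-HTML call in here.
    @FunctionalInterface
    interface Converter {
        void convert(Path pdf) throws Exception;
    }

    // Runs the converter on every *.pdf under root and returns the files that threw.
    static List<Path> findFailures(Path root, Converter converter) throws IOException {
        List<Path> pdfs;
        try (Stream<Path> paths = Files.walk(root)) {
            pdfs = paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".pdf"))
                    .sorted()
                    .toList();
        }
        List<Path> failures = new ArrayList<>();
        for (Path pdf : pdfs) {
            try {
                converter.convert(pdf);
            } catch (Exception e) {
                System.err.println("FAILED " + pdf + ": " + e);
                failures.add(pdf);
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        // Placeholder converter: only checks that each file can be read.
        List<Path> failures = findFailures(root, pdf -> Files.readAllBytes(pdf));
        System.out.println(failures.size() + " file(s) failed");
    }
}
```

Catching the exception per file keeps one bad PDF from stopping the whole run, and the printed exception gives a starting point for each failure.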
