I'm trying to resolve a problem that another piece of software has with a certain PDF (which I can't upload, sorry). That software is supposed to convert the PDF to PDF/A-2b.

The problem is apparently that the PDF is a rendered PDF created by Ghostscript 9.55.0 (Print to PDF?). Looking at the PDF with Tungsten Power PDF Advanced, I see the following information about the font:

The only font that's contained in the PDF has a blank name and is a Type 3 font.

Here's my attempt at converting the PDF to PDF/A-2b with Stirling PDF (run locally). The output is just gibberish.

I can only show a trivial part of the original document. This is the top:

Looking at the structure with iText RUPS, I see the following:

Here's the font information, but iText RUPS seems to have problems reading the PDF (see error message on the right):

My other software then throws an exception because the font has no name, so I would like to set a dummy font (Helvetica, Arial, whatever...) instead of that empty font that I get.

com.namewithheld.System.Exception: No '' font found!

I tried replacing the font based on answers on Stack Overflow, but they seem to be written for version 2 of PDFBox.

Here's some source code I wrote to read all fonts of a given PDF.

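  import java.io.IOException;
  import java.nio.ByteBuffer;
  import java.util.Objects;
  import java.util.SortedSet;
  import com.google.common.collect.Sets;
  import org.apache.commons.lang3.StringUtils;
  import org.apache.pdfbox.Loader;
  import org.apache.pdfbox.cos.COSName;
  import org.apache.pdfbox.pdmodel.PDDocument;
  import org.apache.pdfbox.pdmodel.PDPage;
  import org.apache.pdfbox.pdmodel.PDResources;
  import org.apache.pdfbox.pdmodel.font.PDFont;

  public static SortedSet<String> getDeclaredFonts(final ByteBuffer pdf) {
    final SortedSet<String> declaredFonts = Sets.newTreeSet();
    try (final PDDocument document = Loader.loadPDF(pdf.array())) {
      for (final PDPage page : document.getPages()) {
        final PDResources resources = page.getResources();
        if (resources == null) {
          continue; // a page without a resource dictionary declares no fonts
        }
        for (final COSName fontName : resources.getFontNames()) {
          final PDFont font = resources.getFont(fontName);
          // A font without a name is recorded as the empty string.
          final String name = Objects.toString(font.getName(), "");
          if (name.contains("+")) {
            // Drop the subset tag, e.g. "ABCDEF+Arial" becomes "Arial".
            declaredFonts.add(StringUtils.substringAfter(name, "+"));
          } else {
            declaredFonts.add(name);
          }
        }
      }
    } catch (final IOException e) {
      throw new IllegalArgumentException("Can't get declared fonts", e);
    }
    return declaredFonts;
  }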
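
The name handling in the method above can be tried on its own, without a PDF. The following is only a sketch using the plain JDK; the class and method names (`FontNames`, `stripSubsetPrefix`, `normalize`) are made up for illustration and are not part of PDFBox. It shows that a font whose name is missing ends up in the set as the empty string, which matches the `''` in the exception message.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class FontNames {

    /** Removes a subset tag such as "ABCDEF+"; a missing name becomes "". */
    public static String stripSubsetPrefix(String name) {
        if (name == null) {
            return "";
        }
        int plus = name.indexOf('+');
        return plus >= 0 ? name.substring(plus + 1) : name;
    }

    /** Collects the normalized names in sorted order, as getDeclaredFonts does. */
    public static SortedSet<String> normalize(List<String> rawNames) {
        SortedSet<String> result = new TreeSet<>();
        for (String raw : rawNames) {
            result.add(stripSubsetPrefix(raw));
        }
        return result;
    }

    public static void main(String[] args) {
        // The null entry stands in for the nameless Type 3 font.
        List<String> raw = Arrays.asList("ABCDEF+Arial", "Helvetica", null);
        System.out.println(normalize(raw)); // prints [, Arial, Helvetica]
    }
}
```

The leading empty entry in the printed set is the nameless font.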

Based on Tilman Hausherr's comment I tried to replace the font name with PDFBox 2:

  private ByteBuffer replaceMissingFonts(final ByteBuffer pdf) {
    // PDFBox 2 API. try-with-resources closes the document even if saving fails.
    try (PDDocument document = PDDocument.load(pdf.array());
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream()) {
      for (final PDPage page : document.getPages()) {
        for (final COSName fontName : page.getResources().getFontNames()) {
          final PDFont font = page.getResources().getFont(fontName);
          if (Strings.isNullOrEmpty(font.getName())) {
            System.out.println("*** Font is null or empty");
            // Writes the /Name entry of the font dictionary.
            font.getCOSObject().setName(COSName.NAME, "Arial");
          }
        }
      }

      document.save(byteArrayOutputStream);
      return ByteBuffer.wrap(byteArrayOutputStream.toByteArray());
    } catch (final IOException e) {
      throw new RuntimeException("Can't set missing fonts", e);
    }
  }

However, veraPDF then shows the following validation errors:

"validationResult":{
  "details" : {
    "passedRules" : 142,
    "failedRules" : 1,
    "passedChecks" : 45995,
    "failedChecks" : 3,
    "ruleSummaries" : [ {
      "ruleStatus" : "FAILED",
      "specification" : "ISO 19005-2:2011",
      "clause" : "6.2.11.8",
      "testNumber" : 1,
      "status" : "failed",
      "failedChecks" : 3,
      "description" : "A PDF/A-2 compliant document shall not contain a reference to the .notdef glyph from any of the text showing operators, regardless of text rendering mode, in any content stream",
      "object" : "Glyph",
      "test" : "name != \".notdef\"",
      "checks" : [ {
        "status" : "failed",
        "context" : "root/document[0]/pages[0](6 0 obj PDPage)/contentStream[0](9 0 obj PDContentStream)/operators[21]/usedGlyphs[38](Arial Arial 17 0  0 false)",
        "errorArguments" : [ ]
      } ]
    } ]
  },
  "jobEndStatus" : "normal",
  "profileName" : "PDF/A-2B validation profile",
  "statement" : "PDF file is not compliant with Validation Profile requirements.",
  "compliant" : false
}

So, can anyone tell me how to do this with PDFBox 3? Thanks!
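
For reference, this is roughly how I would expect the PDFBox 2 attempt above to look when ported to PDFBox 3. It is a sketch only: as far as I can tell, the main API difference is that `PDDocument.load(...)` is replaced by `Loader.loadPDF(...)`. The class name `FontRenamer`, the method name `nameUnnamedFonts`, and the placeholder name are made up for illustration. It only fills in the `/Name` entry, does not look at fonts inside form XObjects, and is not expected to make the veraPDF `.notdef` finding go away.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;

public class FontRenamer {

    /** Gives every page-level font without a name the given placeholder name. */
    public static byte[] nameUnnamedFonts(byte[] pdf, String placeholderName) {
        try (PDDocument document = Loader.loadPDF(pdf);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            for (PDPage page : document.getPages()) {
                PDResources resources = page.getResources();
                if (resources == null) {
                    continue; // page has no resource dictionary
                }
                for (COSName key : resources.getFontNames()) {
                    PDFont font = resources.getFont(key);
                    if (font == null) {
                        continue;
                    }
                    String name = font.getName();
                    if (name == null || name.isEmpty()) {
                        // Assumption: the font is Type 3, whose name is read from /Name.
                        // Other font types take their name from /BaseFont instead.
                        font.getCOSObject().setName(COSName.NAME, placeholderName);
                    }
                }
            }
            document.save(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException("Can't set missing font names", e);
        }
    }
}
```

Calling `FontRenamer.nameUnnamedFonts(pdf.array(), "Arial")` would then stand in for `replaceMissingFonts`.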

Tags: PDFBox 3: How can I replace a font that doesn't exist? (Stack Overflow)