I am currently playing around with Hakyll and Pandoc.

I want to create a static HTML website from Markdown sources including inline maths in LaTeX. Using pandoc-katex I was able to do the conversion with the following command:

$ pandoc -f markdown -t html --filter pandoc-katex --css "https://cdn.jsdelivr.net/npm/katex@$(pandoc-katex --katex-version)/dist/katex.min.css" --css "https://pandoc.org/demo/pandoc.css" --standalone -o output.html input.md

However, I want to use the pandoc-katex filter in Hakyll and obtain the exact same result as with the command above (for now), i.e. I want to use Pandoc's standard HTML template, make it load the two CSS files and process any available metadata in the input.md in exactly the same way as the command above does.

I exported the standard HTML template as follows:

$ pandoc -D html > default-template.html

Using pandocCompilerWithTransformM, I was able to use the pandoc-katex filter:

katexCompiler = pandocCompilerWithTransformM defaultHakyllReaderOptions (defaultHakyllWriterOptions) katexFilter
  where katexFilter = recompilingUnsafeCompiler
      . runIOorExplode
      . applyFilters noEngine def [JSONFilter "pandoc-katex"] []

Using this compiler in Hakyll, I only get the body part of the HTML file though. I searched online for solutions to this, but all the information that I find seems to refer to deprecated versions of Pandoc. Apparently there was a writerStandalone option in earlier versions of Pandoc, but it does not exist anymore (even though the command line tool still has opStandalone and the --standalone parameter used above evidently works).

What I currently do is, I apply the default template with loadAndApplyTemplate "templates/default-template.html" myCtx and then try to manually replicate the default context in myCtx. This is obviously not how it should be done.

Here is a somewhat minimal example of my attempt (sorry that it's still a bit lengthy - exactly that is the problem):

{-# LANGUAGE OverloadedStrings #-}

import Text.Pandoc
import Text.Pandoc.Filter
import Text.Pandoc.Scripting
import Hakyll

css1Item   = Item (fromFilePath "css/katex.min.css") "https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" 
css2Item   = Item (fromFilePath "css/pandoc.css") "https://pandoc.org/demo/pandoc.css" 
authorItem = Item (fromFilePath "general") "Jon Doe"

stylesString = "/* 15 lines of CSS */"

myCtx :: Context String
myCtx = dateField "date" "%B %e, %Y"
       <> constField "pagetitle" "My Title"
       <> constField "styles.html" stylesString
       <> listCtx "author" [authorItem]
       <> listCtx "author-meta" [authorItem]
       <> listCtx "css" [css1Item, css2Item]
       <> listCtx "header-includes" []
       <> listCtx "include-before" []
       <> listCtx "include-after" []
       <> defaultContext

listCtx :: String -> [Item String] -> Context String
listCtx name lst = listField name ctx (return $ lst)
  where ctx = field name (return . itemBody)

katexCompiler = pandocCompilerWithTransformM defaultHakyllReaderOptions (defaultHakyllWriterOptions) katexFilter
  where katexFilter = recompilingUnsafeCompiler
      . runIOorExplode
      . applyFilters noEngine def [JSONFilter "pandoc-katex"] []

main :: IO ()
main = hakyll $ do
  match "templates/default-template.html" $ compile templateBodyCompiler
  match "input.md" $ do
      route   $ setExtension ".html"
      compile $ katexCompiler
                >>= loadAndApplyTemplate "templates/default-template.html" myCtx

I have two concrete questions:

  • The Item data type associates keys of type Identifier with values. The constructors for Identifier suggest that the Identifiers should be file names, but for some of the Items in my context, (e.g. for the author field; see variable authorItem), having a file name as a key does not make sense. I think I misinterpreted the purpose of this type. How should I think of these Items?
  • Is there a way to obtain the Context that the command line tool uses, when making the conversion? The default Context seems to be a lot more involved than my quick draft, e.g. it reads the abstract from the metadata of the Markdown file and puts every paragraph in between separate <p> ... </p> HTML tags. I know there is a metadataField :: Context a, but it does not seem to be what I want.

Apart from these concrete questions, the general question is:

  • Do I do this right at all or would there be a much simpler way of doing what I try to do (i.e. replicating the output of the initial pandoc shell command in Haskell with Hakyll)?

Asked Nov 21, 2024 at 21:06 by user11718766; edited Nov 21, 2024 at 23:05.

  • On writerStandalone no longer existing, you can nonetheless specify a template through the writerTemplate option. It's not clear that would be significantly better than your approach, though. – duplode Commented Nov 21, 2024 at 21:47
  • (It might actually be better, though, as in principle Pandoc would handle all those fields without you having to bother with defining them in a Hakyll context. I say "in principle" because I haven't tried to rely on writerTemplate to this extent in a Hakyll site.) – duplode Commented Nov 21, 2024 at 22:13
  • Thanks for your comments. Yes, I would prefer to just use getDefaultTemplate "html" or something like that to get the template, but I'm still trying to figure out how to do it. My current attempts do not type check yet. – user11718766 Commented Nov 21, 2024 at 22:19
  • ghc says "No instance for (PandocMonad Template) arising from a use of ‘getDefaultTemplate’", when I try setting {writerTemplate = Just (getDefaultTemplate "html")}. I don't see what the problem could be. The type of the writerTemplate field should be Maybe (Template Text) and getDefaultTemplate has type Text -> m Text. Any ideas? – user11718766 Commented Nov 21, 2024 at 22:51
  • The type of getDefaultTemplate is PandocMonad m => Text -> m Text so you need to run that with, for instance, runIOorExplode. – duplode Commented Nov 21, 2024 at 23:14

1 Answer

The nicest way to do that is probably using writerTemplate in Pandoc's WriterOptions to pass the default template, as given by compileDefaultTemplate:

{-# LANGUAGE OverloadedStrings #-}

import Hakyll
import Text.Pandoc

main :: IO ()
main = do
    pandocTmpl <- runIOorExplode $ compileDefaultTemplate "html"
    let katexOpts = defaultHakyllWriterOptions
            { writerTemplate = Just pandocTmpl
            , writerHTMLMathMethod = KaTeX ""
            -- And whatever else you need.
            }
        -- Defining it this way because pandocCompilerWith strips
        -- the metadata block before handing the body to Pandoc.
        --
        -- I'm relying on Pandoc's built-in KaTeX support. If
        -- you'd rather stick with the pandoc-katex filter, you
        -- can use renderPandocWithTransformM to reshape the
        -- compiler you defined in the question in this fashion.
        katexCompiler = do
            fullItem <- getResourceString
            renderPandocWith defaultHakyllReaderOptions katexOpts fullItem

    hakyll $ do
        -- etc.
        match "input.md" $ do
            route $ setExtension ".html"
            compile katexCompiler

See also pandoc issue #10209, which points to a similar approach.
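
For readers who want to keep the pandoc-katex filter, the reshaping mentioned in the code comments above might look roughly like the sketch below. It reuses the katexFilter from the question unchanged and swaps renderPandocWith for renderPandocWithTransformM. The names filterOpts and katexFilterCompiler are made up for illustration, and the sketch assumes a Pandoc 3.x API like the one the question's imports suggest. It does not set the two stylesheets. One option that should work, though it is untested here, is a css list in the metadata block of input.md, since Pandoc's template also reads variables from document metadata.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Hakyll
import Text.Pandoc
import Text.Pandoc.Filter
import Text.Pandoc.Scripting

-- Runs the external pandoc-katex JSON filter over the parsed document,
-- written the same way as katexFilter in the question.
katexFilter :: Pandoc -> Compiler Pandoc
katexFilter = recompilingUnsafeCompiler
    . runIOorExplode
    . applyFilters noEngine def [JSONFilter "pandoc-katex"] []

main :: IO ()
main = do
    pandocTmpl <- runIOorExplode $ compileDefaultTemplate "html"
    let -- filterOpts and katexFilterCompiler are illustrative names.
        filterOpts = defaultHakyllWriterOptions
            { writerTemplate = Just pandocTmpl }
        -- getResourceString keeps the metadata block, so Pandoc itself
        -- sees the title, author and so on when filling its template.
        katexFilterCompiler = do
            fullItem <- getResourceString
            renderPandocWithTransformM
                defaultHakyllReaderOptions filterOpts katexFilter fullItem

    hakyll $
        match "input.md" $ do
            route $ setExtension ".html"
            compile katexFilterCompiler
```

Because the filter replaces the math nodes before the writer runs, writerHTMLMathMethod should not need changing in this variant.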


Side questions:

How should I think of these Items?

Item indeed is primarily meant for things bound to a file path in your site tree. Occasionally, it makes sense to use a fake path for the identifier — for instance, when synthesising some content with a create rule. However, that's not typically something one would want to do for the sake of setting a context field, as there likely are more straightforward ways to do that. (In particular, if, unlike in this answer, you are using Hakyll's templates, you don't have to explicitly define the fields that you include in the metadata headers of your source files, as Hakyll's defaultContext covers that already by including metadataField.)
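
As a rough illustration of one "more straightforward way" when Hakyll's own templates are in use, list values can be wrapped with makeItem instead of being paired with invented file paths. makeItem tags each value with the identifier of the page currently being compiled. The helper stringListField, the context pageCtx and the template path templates/page.html are hypothetical names, not part of Hakyll or of the question.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Hakyll

-- Hypothetical helper: a list field whose entries are plain strings.
-- Inside the $for(name)$ loop, each entry is exposed under the same name.
stringListField :: String -> [String] -> Context a
stringListField name values =
    listField name (field name (return . itemBody)) (mapM makeItem values)

-- Illustrative context; single values need no Item at all (constField).
pageCtx :: Context String
pageCtx =
    constField "pagetitle" "My Title"
        <> stringListField "author" ["Jon Doe"]
        <> stringListField "css" ["https://pandoc.org/demo/pandoc.css"]
        <> defaultContext

main :: IO ()
main = hakyll $ do
    -- templates/page.html is an assumed Hakyll template for this sketch.
    match "templates/page.html" $ compile templateBodyCompiler
    match "input.md" $ do
        route $ setExtension ".html"
        compile $ pandocCompiler
            >>= loadAndApplyTemplate "templates/page.html" pageCtx
```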

Is there a way to obtain the Context that the command line tool uses, when making the conversion?

While Pandoc offers ways to manipulate its own metadata (which I have never used myself; Text.Pandoc.Writers.Shared might be a good place to start browsing), the template systems of Pandoc and Hakyll are similar-looking but distinct, and in particular Hakyll's Context type is not the same as its Pandoc counterpart.


On a final note, it is worth mentioning that if you were completely stuck trying to reproduce Pandoc's output within Hakyll, a last resort would be using unixFilter to set up a compiler that shells out to command-line Pandoc.
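
A minimal sketch of that last resort, assuming pandoc and pandoc-katex are on the PATH when the site is built. The name cliPandocCompiler is made up for this example. unixFilter does not go through a shell, so the $(pandoc-katex --katex-version) substitution from the original command is not available. The KaTeX stylesheet flag is therefore left out here and would have to be added with the version already resolved.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Hakyll

-- Pipes the raw source file (metadata block included) through the pandoc
-- executable. Given no input file and no -o, pandoc reads stdin and
-- writes stdout, which is the interface unixFilter expects.
cliPandocCompiler :: Compiler (Item String)
cliPandocCompiler =
    getResourceString >>= withItemBody (unixFilter "pandoc" args)
  where
    args =
        [ "-f", "markdown"
        , "-t", "html"
        , "--filter", "pandoc-katex"
        , "--css", "https://pandoc.org/demo/pandoc.css"
        -- The KaTeX CSS --css flag is omitted on purpose: its URL needs
        -- the version that the shell substitution used to supply.
        , "--standalone"
        ]

main :: IO ()
main = hakyll $
    match "input.md" $ do
        route $ setExtension ".html"
        compile cliPandocCompiler
```

The trade-off is that every page spawns an external process, and Hakyll no longer sees the parsed document, only the finished HTML string.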
