javascript - How to tokenize markdown using Node.js? - Stack Overflow

I'm building an iOS app with a view whose content comes from markdown.

My idea is to parse markdown stored in MongoDB into a JSON object that looks something like this:

{
    "h1": "This is the heading",
    "p": "Here's the first paragraph",
    "link": {
        "text": "Text for link",
        "url": "http://exampledomain.com"
    }
}

On the server I am running Node.js, and I was looking at the marked module, which seems to be the most popular one out there. It gives me access to the Lexer, which tokenizes the markdown into a custom object. But when I look at that object, the link has not been tokenized. If I go ahead and parse the markdown to HTML, the link is detected and the HTML looks correct.

After trying a few more modules and failing, I thought that maybe I could do this on the client instead and found MMMarkdown, which seemed promising. But again, it worked fine when parsing directly to HTML, whereas when I stopped halfway and parsed the markdown only into the so-called MMDocument, that document did not contain any MMElement of type Link.

So, is there anything fundamental about markdown parsing that I am missing? Is the lexing of inline links supposed to happen in a second pass, or something? I can't get my head around it.

If nothing else works, I might just use a UIWebView filled with the HTML from the parsed markdown, but then we would have to design the whole thing again in CSS, and we are running out of time, so we can't really afford the double work.

asked Feb 26, 2014 at 12:43 by bobmoff

3 Answers

Although this question is already quite a few years old, I wanted to give a little update.

I found the combination of unified and remark-parse a good fit for my situation. After installing those packages (with npm, yarn, pnpm, or your preferred JS package manager), I wrote a small test script:

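// CommonJS form, which works with unified 9 / remark-parse 9 and earlier.
// Later major versions are ESM-only; there, use
// import { unified } from 'unified' and import remarkParse from 'remark-parse'.
const unified = require('unified');
const markdown = require('remark-parse');

// parse() only builds the syntax tree; nothing is rendered to HTML
const tokens = unified()
  .use(markdown)
  .parse('# Hello world');

console.log(tokens);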

This produces a syntax tree (mdast) rather than the flat JSON from the question, so it needs further processing.
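The further processing could look roughly like the sketch below. It is not part of the original answer: `sampleTree` is a hand-written stand-in for the parser output that follows the mdast node shape (`type`, `children`, `depth` on headings, `url` on links, `value` on text), and `textOf` and `flatten` are hypothetical helper names. It also shows why links never appear at the top level of the tree: they are inline nodes nested inside a paragraph.

```javascript
// Hand-written sample in the mdast shape (an assumption for illustration,
// standing in for the result of unified().use(markdown).parse(...)).
const sampleTree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1,
      children: [{ type: 'text', value: 'This is the heading' }],
    },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'See ' },
        {
          type: 'link',
          url: 'http://exampledomain.com',
          title: null,
          children: [{ type: 'text', value: 'Text for link' }],
        },
      ],
    },
  ],
};

// Concatenate the text of a node and all of its descendants.
function textOf(node) {
  if (typeof node.value === 'string') return node.value;
  return (node.children || []).map(textOf).join('');
}

// Hypothetical helper: flatten the tree into a list of simple objects.
function flatten(tree) {
  const out = [];
  for (const node of tree.children) {
    if (node.type === 'heading') {
      out.push({ ['h' + node.depth]: textOf(node) });
    } else if (node.type === 'paragraph') {
      out.push({ p: textOf(node) });
      // Links are inline nodes, so they show up as children of the paragraph.
      for (const child of node.children) {
        if (child.type === 'link') {
          out.push({ link: { text: textOf(child), url: child.url } });
        }
      }
    }
  }
  return out;
}

const flattened = flatten(sampleTree);
console.log(JSON.stringify(flattened, null, 2));
```

The result is an array rather than a single object, because a document can contain several paragraphs or links and an object cannot hold the same key twice.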

Maybe this is useful for someone else who stumbles upon this question.

Did you look at https://github.com/evilstreak/markdown-js ?

It seems to give you access to the syntax tree.

For example:

var md = require( "markdown" ).markdown,
    text = "Header\n---------------\n\n" +
           "This is a paragraph\n\n" +
           "This is [an example](http://example.com/ \"Title\") inline link.";

// parse the markdown into a syntax tree
var tree = md.parse( text );

console.log(JSON.stringify(tree));

produces

[
    "markdown",
    [
        "header",
        {
            "level": 2
        },
        "Header"
    ],
    [
        "para",
        "This is a paragraph"
    ],
    [
        "para",
        "This is ",
        [
            "link",
            {
                "href": "http://example.com/",
                "title": "Title"
            },
            "an example"
        ],
        " inline link."
    ]
]
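To get from this nested array to something closer to the JSON in the question, a small walker is enough. The sketch below is an illustration only: it copies the tree printed above in as a literal so it runs without the library, and `unpack`, `plainText`, and `toBlocks` are hypothetical helper names. It assumes each node is `[tag, optional attributes object, ...children]`, which is what the output above suggests.

```javascript
// The tree printed above, copied in as a literal so the sketch runs on its own.
const jsonml = [
  'markdown',
  ['header', { level: 2 }, 'Header'],
  ['para', 'This is a paragraph'],
  ['para', 'This is ',
    ['link', { href: 'http://example.com/', title: 'Title' }, 'an example'],
    ' inline link.'],
];

// Split a node into tag, attributes (optional second element) and children.
function unpack(node) {
  const [tag, second, ...rest] = node;
  const hasAttrs =
    second !== null && typeof second === 'object' && !Array.isArray(second);
  return hasAttrs
    ? { tag, attrs: second, children: rest }
    : { tag, attrs: {}, children: second === undefined ? [] : [second, ...rest] };
}

// Plain text of a node, including text nested in child nodes.
function plainText(node) {
  if (typeof node === 'string') return node;
  return unpack(node).children.map(plainText).join('');
}

// Hypothetical helper: one output object per block, plus one per inline link.
function toBlocks(tree) {
  const out = [];
  for (const block of unpack(tree).children) {
    if (!Array.isArray(block)) continue; // skip stray top-level strings
    const { tag, attrs, children } = unpack(block);
    if (tag === 'header') {
      out.push({ ['h' + attrs.level]: plainText(block) });
    } else if (tag === 'para') {
      out.push({ p: plainText(block) });
      for (const child of children) {
        if (Array.isArray(child) && child[0] === 'link') {
          out.push({
            link: { text: plainText(child), url: unpack(child).attrs.href },
          });
        }
      }
    }
  }
  return out;
}

const blocks = toBlocks(jsonml);
console.log(JSON.stringify(blocks, null, 2));
```

Other block types (lists, code blocks, block quotes) are ignored here and would need their own branches.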

Here's the code that I ended up using instead.

// Split on both Unix (\n) and Windows (\r\n) line endings
var nodes = markdownText.split(/\r?\n/);
var content = [];

nodes.forEach(function(node) {

    // A line that starts with a link: [text](url). null if there is no match.
    var link = node.match(/^\[(.*?)\]\((.*?)\)/);

    // Heading 2 (checked before Heading 1, because '##' also starts with '#')
    if (node.indexOf('##') === 0) {
        content.push({
            h2: node.replace(/^##\s*/, '')
        });
    }

    // Heading 1
    else if (node.indexOf('#') === 0) {
        content.push({
            h1: node.replace(/^#\s*/, '')
        });
    }

    // Link (Text + URL)
    else if (link) {
        content.push({
            link: {
                text: link[1],
                url: link[2]
            }
        });
    }

    // Paragraph
    else if (node.length > 0) {
        content.push({
            p: node
        });
    }

});

I know this matching is very unforgiving, but in our case it works fine.
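One limitation of the line-based approach is that it only sees a link when the line starts with one. If links in the middle of a paragraph ever matter, a slightly more forgiving variant could split each line into text and link segments. This is only a sketch and not what the answer used: `splitInline` is a hypothetical helper, and its regular expression still ignores link titles, nested brackets, and escaped characters.

```javascript
// Hypothetical helper: split one line into text and link segments.
function splitInline(line) {
  // [text](url), where text has no ']' and url has no ')' or whitespace
  const pattern = /\[([^\]]*)\]\(([^)\s]*)\)/g;
  const segments = [];
  let last = 0;
  let match;
  while ((match = pattern.exec(line)) !== null) {
    // Plain text between the previous link (or line start) and this one
    if (match.index > last) {
      segments.push({ text: line.slice(last, match.index) });
    }
    segments.push({ link: { text: match[1], url: match[2] } });
    last = pattern.lastIndex;
  }
  // Trailing text after the last link
  if (last < line.length) {
    segments.push({ text: line.slice(last) });
  }
  return segments;
}

const segments = splitInline(
  'Read [the docs](http://example.com/docs) or [ask](http://example.com/ask).'
);
console.log(segments);
```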
