I currently have some code that uses marked.js to transform one big Markdown string (read from a .md file) into HTML for display in the browser. 'md' is the Markdown string, and calling 'marked(md)' translates it to HTML.
// React effect wrapper: opening line inferred from the closing '}, [filePath]);'
useEffect(() => {
  getContent(filePath)
    .then(response => {
      if (!response.ok) {
        return Promise.reject(response);
      }
      return response.text().then(md => setContent(marked(md)));
    })
    .catch(e => Dialog.error('Page failed to load!', e));
}, [filePath]);
How can I (either using marked.js, or another solution) parse the markdown/html to get only the text values? Some sample Markdown below.
### HEADER TEXT
---
# Some Page Title
<a href="cafe" target="_blank">Go to Cafe Page</a>
<Cafe host>/portos/cafe
## Links
- ##### [Tacos](#cafe_tacos)
- ##### [Burritos](#cafe_burritos)
- ##### [Bebidas](#cafe_bebidas)
## Overview
This is the overview text for the page. I really like tacos and burritos.
[![Taco Tabs](some/path/to/images/hello.png "Tacos")](some/path/to/images/hello.png)
## Dining <a name="dining"></a>
Dining is foo bar burrito taco mulita.
[![Cafe Overview](some/path/to/images/hello2.png "Cafe Overview")](some/path/to/images/hello2.png)
The cafe has been open since 1661. It has lots of food.
It was declared the top 1 cafe of all time.
### How to order food
You can order food by ordering food.
<div class="alert alert-info">
<strong> Note: </strong> TACOS ARE AMAZING.
</div>
asked Oct 18, 2022 at 19:05 by Barry
Comment: You might investigate mdast, which creates a usable syntax tree from markdown text. You would still need to do the work of pulling the data out of the AST, but that should be an easier task. – Scott Sauyet, Oct 18, 2022 at 20:27
3 Answers
One way to do it is to parse the HTML string with the DOMParser API, which turns the string into a Document object, and then walk through it with a TreeWalker object to get the textContent of each Text node in the HTML. The result should be an array of strings.
function parseTextFromMarkDown(mdString) {
  const htmlString = marked(mdString);
  const parser = new DOMParser();
  const doc = parser.parseFromString(htmlString, 'text/html');
  // Visit only the Text nodes under <body> of the parsed document.
  const walker = doc.createTreeWalker(doc.body, NodeFilter.SHOW_TEXT);
  const textList = [];
  // walker.currentNode starts at the root, which is not a Text node,
  // so advance to the first Text node before collecting.
  let currentNode = walker.nextNode();
  while (currentNode) {
    textList.push(currentNode.textContent);
    currentNode = walker.nextNode();
  }
  return textList;
}
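One practical wrinkle with this approach: the HTML that marked emits usually has newlines between block elements, so the returned array tends to include whitespace-only strings. Below is a minimal post-processing sketch, assuming you want trimmed, non-empty entries. The helper name cleanTextList and the sample input are made up for illustration.

```javascript
// Hypothetical helper: tidy the array returned by parseTextFromMarkDown.
function cleanTextList(textList) {
  return textList
    .filter(text => typeof text === 'string')       // skip null/undefined entries
    .map(text => text.replace(/\s+/g, ' ').trim())  // collapse runs of whitespace
    .filter(text => text.length > 0);               // drop entries that were only whitespace
}

// Example input shaped like what a TreeWalker pass might return.
const sampleTextList = ['Some Page Title', '\n', '  Go to Cafe Page ', null, '\n\n'];
console.log(cleanTextList(sampleTextList));
```

If you need a single string rather than an array, joining the cleaned list with '\n' or ' ' is usually enough.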
While I think Emiel already gave the best answer, another approach would be to use mdast, the abstract syntax tree format produced by Markdown parsers such as mdast-util-from-markdown. Then we can walk the syntax tree, extracting all the text and combining it into reasonable output. One approach looks like this:
const astToText = ((types) => ({type, children = [], ...rest}) =>
  // pass the type along too, so the default handler can report it
  (types [type] || types .default) (children .map (astToText), {type, ...rest})
)(Object .fromEntries (Object .entries ({
  'default': (ns, {type}) => ` *** Missing type: ${type} *** `,
  'root': (ns) => ns .join ('\n'),
  'heading, paragraph': (ns) => ns .join ('') + '\n',
  'text, code': (ns, {value}) => value,
  // Document#textContent is null, so read the text from the parsed <body>
  'html': (ns, {value}) =>
    new DOMParser () .parseFromString (value, 'text/html') .body .textContent,
  'listItem, link, emphasis': (ns) => ns .join (''),
  'list': (ns, {ordered}) => ordered
    ? ns .map ((n, i) => `${i + 1} ${n}`) .join ('\n')
    : ns .map ((n) => `• ${n}`) .join ('\n'),
  'image': (ns, {title, url, alt}) => `Image "${title}" ("${alt}" - ${url})`,
  // ... probably many more
// split comma-separated keys so that each node type gets its own entry
}) .flatMap (([k, v]) => k .split (/,\s*/) .map (n => [n, v]))))
// import {fromMarkdown} from 'mdast-util-from-markdown'
// const ast = fromMarkdown (<your string>)
// dummy version
const ast = {type: "root", children: [{type: "heading", depth:1, children: [{type: "text", value: "Some Page Title", children: []}]}, {type: "paragraph", children: [{type: "html", value: '<a href="cafe" target="_blank">', children: []}, {type: "text", value: "Go to Cafe Page", children: []}, {type: "html", value: "</a>", children: []}]}, {type: "code", lang:null, meta:null, value: "<Cafe host>/portos/cafe", children: []}, {type: "heading", depth:2, children: [{type: "text", value: "Links", children: []}]}, {type: "list", ordered:!1, start:null, spread:!1, children: [{type: "listItem", spread:!1, checked:null, children: [{type: "heading", depth:5, children: [{type: "link", title:null, url: "#cafe_tacos", children: [{type: "text", value: "Tacos", children: []}]}]}]}, {type: "listItem", spread:!1, checked:null, children: [{type: "heading", depth:5, children: [{type: "link", title:null, url: "#cafe_burritos", children: [{type: "text", value: "Burritos", children: []}]}]}]}, {type: "listItem", spread:!1, checked:null, children: [{type: "heading", depth:5, children: [{type: "link", title:null, url: "#cafe_bebidas", children: [{type: "text", value: "Bebidas", children: []}]}]}]}]}, {type: "heading", depth:2, children: [{type: "text", value: "Overview", children: []}]}, {type: "paragraph", children: [{type: "text", value: "This is the overview text for the page. \nI really like tacos and burritos.", children: []}]}, {type: "paragraph", children: [{type: "link", title:null, url: "some/path/to/images/hello.png", children: [{type: "image", title: "Tacos", url: "some/path/to/images/hello.png", alt: "Taco Tabs", children: []}]}]}, {type: "heading", depth:2, children: [{type: "text", value: "Dining ", children: []}, {type: "html", value: '<a name="dining">', children: []}, {type: "html", value: "</a>", children: []}]}, {type: "paragraph", children: [{type: "text", value: "Dining is foo bar burrito taco mulita.", children: []}]}, {type: "paragraph", children: [{type: "link", title:null, url: "some/path/to/images/hello2.png", children: [{type: "image", title: "Cafe Overview", url: "some/path/to/images/hello2.png", alt: "Cafe Overview", children: []}]}]}, {type: "paragraph", children: [{type: "text", value: "The cafe has been open since 1661. It has lots of food.", children: []}]}, {type: "paragraph", children: [{type: "text", value: "It was declared the top 1 cafe of all time.", children: []}]}, {type: "heading", depth:3, children: [{type: "text", value: "How to order food", children: []}]}, {type: "paragraph", children: [{type: "text", value: "You can order food by ordering food.", children: []}]}, {type: "html", value: '<div class="alert alert-info">\n <strong> Note: </strong> TACOS ARE AMAZING.\n</div>', children: []}]}
console .log (astToText (ast))
The advantage of this approach over the plain HTML one is that we can decide how certain nodes are rendered in plain text. For instance, here we choose to render this image markup:
![Taco Tabs](some/path/to/images/hello.png "Tacos")
as
Image "Tacos" ("Taco Tabs" - some/path/to/images/hello.png)
Of course, HTML nodes are still going to be problematic. Here I use DOMParser and .textContent, but you could instead add 'html' to the 'text, code' entry to include the raw HTML text.
Each function passed to the configuration receives a list of already formatted children as well as the remainder of the node.
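To make that handler signature concrete, here is a stripped-down sketch of the same idea, with a few more mdast node types handled ('strong', 'inlineCode', 'blockquote', 'thematicBreak'). The names toText and handlers, the fallback behaviour, and the hand-written tree are illustrative assumptions, not part of the answer's code or of any library.

```javascript
// Each handler receives (kids, node): kids is the array of already rendered
// child strings, node is the mdast node itself.
const handlers = {
  root: (kids) => kids.join('\n'),
  paragraph: (kids) => kids.join(''),
  text: (kids, node) => node.value,
  inlineCode: (kids, node) => node.value,
  strong: (kids) => kids.join(''),
  emphasis: (kids) => kids.join(''),
  blockquote: (kids) => kids.map(k => `> ${k}`).join('\n'),
  thematicBreak: () => '----',
};

// Render the children first, then hand the results to the handler for this
// node type. Unknown types fall back to joining their rendered children.
function toText(node) {
  const kids = (node.children || []).map(child => toText(child));
  const handler = handlers[node.type] || ((ks) => ks.join(''));
  return handler(kids, node);
}

// Hand-built tree, roughly what a parser might produce for a paragraph with
// bold text and inline code, a horizontal rule, and a block quote.
const tree = {
  type: 'root',
  children: [
    {type: 'paragraph', children: [
      {type: 'text', value: 'Tacos are '},
      {type: 'strong', children: [{type: 'text', value: 'amazing'}]},
      {type: 'text', value: ', see '},
      {type: 'inlineCode', value: 'menu.md'},
      {type: 'text', value: '.'},
    ]},
    {type: 'thematicBreak'},
    {type: 'blockquote', children: [
      {type: 'paragraph', children: [{type: 'text', value: 'Order early.'}]},
    ]},
  ],
};

console.log(toText(tree));
```

Because the children are rendered before their parent's handler runs, each handler only decides how to combine or decorate strings, which keeps new node types cheap to add.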
You can also explore https://www.npmjs.com/package/markdown-to-txt.
import markdownToTxt from 'markdown-to-txt';
markdownToTxt('Some *bold text*'); // "Some bold text"