I'm looking for an easy way to test whether a string contains Markdown. Currently I'm thinking of converting the string to HTML and then testing whether the result contains HTML with a simple regex, but I wonder if there is a more succinct way to do it.
Here's what I've got so far:
/<[a-z][\s\S]*>/i.test( markdownToHtml(string) )
asked Jul 10, 2014 at 21:26
jwerre
4 Answers
I think you have to accept that it's impossible to know with certainty. Markdown borrows its syntax from existing customs. For example, underscores for italics were popular on Usenet (though there, single asterisks meant bold rather than italics). And of course, people were using dashes as obvious substitutes for bullet points in plaintext long before Markdown.
Having decided it's subjective though, we may now embark on the task of determining degrees of likelihood that a piece of text contains Markdown. Here are some things I'd consider evidence for Markdown, in order of decreasing strength:
1. Consecutive lines beginning with "1.", e.g.
   (^|[\n\r])\s*1\.\s.*\s+1\.\s
   (See the Markdown behind this answer, for example.) I'd consider this a dead giveaway, because there's even that joke:
   There are only two kinds of people in this world.
   1. Those who understand Markdown.
   1. And those who don't.
2. Link markdown, e.g.
   \[[^\]]+\]\(https?:\/\/\S+\)
3. Double underscores or asterisks when a left-right pair (indicated by whether the whitespace is to the left or right, respectively) can be found, e.g.
   \s(__|\*\*)(?!\s)(.(?!\1))+(?!\s(?=\1))
   Let me know if you want me to explain this one.
And so on. Ultimately, you'll have to come up with your own "scoring" system to determine the weight of each of these things. A good way to go about this would be to gather some sample inputs (if you have real ones, even better), classify them manually as having Markdown or not, and run your regexes and scoring system against them to see which weights sort them most accurately.
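As a rough sketch of such a scoring system: the three patterns below are the ones listed above, while the weights, the threshold, and the names signals, markdownScore and accuracy are made up for illustration and would need tuning against your own samples. Matched weights are combined as if they were independent pieces of evidence, which is itself an assumption.

```typescript
// One piece of evidence: a pattern plus a hypothetical weight in [0, 1].
interface Signal {
  name: string;
  pattern: RegExp;
  weight: number;
}

// Weights are illustrative guesses, ordered by decreasing strength.
const signals: Signal[] = [
  // Consecutive lines beginning with "1."
  { name: "repeated-1.", pattern: /(^|[\n\r])\s*1\.\s.*\s+1\.\s/, weight: 0.9 },
  // Inline link such as [text](https://example.com)
  { name: "link", pattern: /\[[^\]]+\]\(https?:\/\/\S+\)/, weight: 0.7 },
  // Double underscores or asterisks opening an emphasized run
  { name: "strong", pattern: /\s(__|\*\*)(?!\s)(.(?!\1))+(?!\s(?=\1))/, weight: 0.4 },
];

// score = 1 - product of (1 - weight) over all matching signals.
function markdownScore(text: string): number {
  let miss = 1;
  for (const s of signals) {
    if (s.pattern.test(text)) miss *= 1 - s.weight;
  }
  return 1 - miss;
}

// Fraction of manually labeled samples that a given threshold classifies correctly.
function accuracy(
  samples: { text: string; isMarkdown: boolean }[],
  threshold: number,
): number {
  const hits = samples.filter(
    (s) => (markdownScore(s.text) >= threshold) === s.isMarkdown,
  );
  return hits.length / samples.length;
}
```

Running accuracy over the labeled samples for a few candidate weight sets and thresholds is one simple way to pick the combination that sorts them best.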
As @andrew-cheong pointed out, there is no way to know whether a string is a Markdown document or just plaintext with text structured in a Markdown fashion.
If you want to determine the degree of likelihood that a text is supposed to be Markdown, you can use the marked package as an alternative to using the regex approach:
import { marked } from 'marked';

export function isMarkdownValue(value: string): boolean {
  const tokenTypes: string[] = [];

  // Record the type of every token produced while parsing.
  // https://marked.js.org/using_pro#tokenizer
  marked(value, {
    walkTokens: (token) => {
      tokenTypes.push(token.type);
    },
  });

  // The text counts as Markdown if any of these token types was seen.
  const isMarkdown = [
    'space',
    'code',
    'fences',
    'heading',
    'hr',
    'link',
    'blockquote',
    'list',
    'html',
    'def',
    'table',
    'lheading',
    'escape',
    'tag',
    'reflink',
    'strong',
    'codespan',
    'url',
  ].some((tokenType) => tokenTypes.includes(tokenType));

  return isMarkdown;
}
This is just a simple example implementation using the walkTokens option of the marked package: https://marked.js.org/using_pro#tokenizer
This way you can easily implement any kind of detection logic based on the actual parsing of the potential markdown tokens. You could also implement a likelihood score instead of returning true or false.
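A likelihood score along those lines might look like the sketch below. It assumes the tokenTypes array has already been collected by the walkTokens callback shown above; the tokenWeights table and the markdownLikelihood name are hypothetical, and the weights are guesses rather than measured values. The score is simply the weight of the strongest token type seen.

```typescript
// Hypothetical weights per token type; any type not listed counts as 0.
const tokenWeights: Record<string, number> = {
  table: 1.0,
  code: 0.9,
  heading: 0.8,
  link: 0.8,
  blockquote: 0.6,
  list: 0.5,
  strong: 0.5,
  codespan: 0.5,
  hr: 0.3,
};

// `tokenTypes` is the array filled in by the walkTokens callback.
// Returns a value between 0 (no evidence) and 1 (strongest evidence).
function markdownLikelihood(tokenTypes: string[]): number {
  return Math.max(0, ...tokenTypes.map((type) => tokenWeights[type] ?? 0));
}
```

A caller can then compare the result against a threshold of its own choosing instead of relying on a fixed true or false.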
You can get the list of tokens from the marked library and recursively check whether it contains Markdown-related token types (strong, link, etc.). This is similar to derbenoo's answer but more complete:
import * as marked from "marked";

function isMarkdownValue(text: string): boolean {
  function containsNonTextTokens(tokens) {
    return tokens.some(token => {
      if (token.type !== 'text' && token.type !== 'paragraph') { // change this as per your needs
        return true;
      }
      // Check recursively for nested tokens
      if (token.tokens && containsNonTextTokens(token.tokens)) {
        return true;
      }
      return false;
    });
  }

  // Use the lexer to tokenize the input without rendering it to HTML
  const tokens = marked.lexer(text);
  // Check if the tokens contain any Markdown elements
  return containsNonTextTokens(tokens);
}
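If you want to know which Markdown elements were found rather than just whether any were, the same recursion can collect the token types. This is only a sketch: TokenLike is a minimal stand-in type that covers just the type and tokens fields used above, and PLAIN_TYPES and markdownTokenTypes are hypothetical names. Children stored under any property other than tokens are not visited.

```typescript
// Minimal structural type for this sketch; real tokens carry more fields.
interface TokenLike {
  type: string;
  tokens?: TokenLike[];
}

// Token types treated as plain text (an assumption; adjust to your needs).
const PLAIN_TYPES = new Set<string>(["text", "paragraph", "space"]);

// Collect every non-plain token type found anywhere in the token tree.
function markdownTokenTypes(
  tokens: TokenLike[],
  found: Set<string> = new Set<string>(),
): Set<string> {
  for (const token of tokens) {
    if (!PLAIN_TYPES.has(token.type)) found.add(token.type);
    // Descend into nested tokens, if any.
    if (token.tokens) markdownTokenTypes(token.tokens, found);
  }
  return found;
}

// Hand-written tree standing in for lexer output: a paragraph with bold text.
const sampleTree: TokenLike[] = [
  {
    type: "paragraph",
    tokens: [
      { type: "text" },
      { type: "strong", tokens: [{ type: "text" }] },
    ],
  },
];
const foundTypes = markdownTokenTypes(sampleTree);
```

An empty result then means "no Markdown found", and a non-empty one tells you what triggered the detection.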
I've implemented the regular expression approach in very-small-parser; the code looks something like this:
// Headings H1-H6.
const h1 = /(^|\n) {0,3}#{1,6} {1,8}[^\n]{1,64}\r?\n\r?\n\s{0,32}\S/;
// Bold, italic, underline, strikethrough, highlight.
const bold = /(?:\s|^)(_|__|\*|\*\*|~~|==|\+\+)(?!\s).{1,64}(?<!\s)(?=\1)/;
// Basic inline link (also captures images).
const link = /\[[^\]]{1,128}\]\(https?:\/\/\S{1,999}\)/;
// Inline code.
const code = /(?:\s|^)`(?!\s)[^`]{1,48}(?<!\s)`([^\w]|$)/;
// Unordered list.
const ul = /(?:^|\n)\s{0,5}\-\s{1}[^\n]+\n\s{0,15}\-\s/;
// Ordered list.
const ol = /(?:^|\n)\s{0,5}\d+\.\s{1}[^\n]+\n\s{0,15}\d+\.\s/;
// Horizontal rule.
const hr = /\n{2} {0,3}\-{2,48}\n{2}/;
// Fenced code block.
const fences = /(?:\n|^)(```|~~~|\$\$)(?!`|~)[^\s]{0,64} {0,64}[^\n]{0,64}\n[\s\S]{0,9999}?\s*\1 {0,64}(?:\n+|$)/;
// Classical underlined H1 and H2 headings.
const title = /(?:\n|^)(?!\s)\w[^\n]{0,64}\r?\n(\-|=)\1{0,64}\n\n\s{0,64}(\w|$)/;
// Blockquote.
const blockquote = /(?:^|(\r?\n\r?\n))( {0,3}>[^\n]{1,333}\n){1,999}($|(\r?\n))/;
/**
 * Returns `true` if the source text might be a markdown document.
 *
 * @param src Source text to analyze.
 */
export const is = (src: string): boolean =>
  h1.test(src) ||
  bold.test(src) ||
  link.test(src) ||
  code.test(src) ||
  ul.test(src) ||
  ol.test(src) ||
  hr.test(src) ||
  fences.test(src) ||
  title.test(src) ||
  blockquote.test(src);
Call the is function to get the answer:
is('__bold__'); // true
is('hello world'); // false
The advantage of the regular expression approach is that all of the code fits on the screen above, so you don't need to ship a parser weighing tens of kilobytes.
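When tuning patterns like these, it can help to see which rule fired rather than a single boolean. The sketch below reuses three of the patterns above unchanged; the rules table and the matchedRules helper are hypothetical and not part of very-small-parser.

```typescript
// Three of the patterns from above, paired with labels for reporting.
const rules: [string, RegExp][] = [
  ["link", /\[[^\]]{1,128}\]\(https?:\/\/\S{1,999}\)/],
  ["ul", /(?:^|\n)\s{0,5}\-\s{1}[^\n]+\n\s{0,15}\-\s/],
  ["ol", /(?:^|\n)\s{0,5}\d+\.\s{1}[^\n]+\n\s{0,15}\d+\.\s/],
];

// Names of every rule that matches the source text.
function matchedRules(src: string): string[] {
  return rules.filter(([, re]) => re.test(src)).map(([name]) => name);
}

matchedRules("see [docs](https://example.com)"); // ["link"]
matchedRules("- milk\n- eggs\n");                // ["ul"]
matchedRules("hello world");                     // []
```

Note that the dash-prefixed shopping list matches the unordered-list rule even though it may just be plain text, which is the ambiguity described in the first answer.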