javascript - Regex character count, but some count for three - Stack Overflow

I'm trying to build a regular expression that places a limit on the input length, but not all characters count equally toward this length. I'll put the rationale at the bottom of the question. As a simple example, let's limit the maximum length to 12 and allow only a and b, but b counts for 3 characters.

Allowed are:

aa (anything less than 12 is fine).
aaaaaaaaaaaa (exactly 12 is fine).
aaabaaab (6 + 2 * 3 = 12, which is fine).
abaaaaab (still 6 + 2 * 3 = 12).

Disallowed is:

aaaaaaaaaaaaa (13 a's).
bbbba (1 + 4 * 3 = 13, which is too much).
baaaaaaab (7 + 2 * 3 = 13, which is too much).

I've made an attempt that gets fairly close:

^(a{0,3}|b){0,4}$

This matches on up to 4 clusters that may consist of 0-3 a's or one b.

However, it fails to match on my last positive example: abaaaaab, because that forces the first cluster to be the single a at the beginning, consumes a second cluster for the b, then leaves only 2 more clusters for the rest, aaaaab, which is too long.
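To make the counting rule concrete, here is a minimal sketch of a reference check in plain JavaScript, run against the examples above together with the attempted regex. The names `WEIGHTS`, `LIMIT` and `weightedLength` are made up for illustration, and a plain function like this cannot be handed to the regex validator; it only serves as a yardstick.

```javascript
// Hypothetical reference check for the simplified problem:
// 'a' counts as 1, 'b' counts as 3, and the total may not exceed 12.
const WEIGHTS = new Map([['a', 1], ['b', 3]]);
const LIMIT = 12;

// Returns the weighted length, or Infinity if a character is not allowed.
function weightedLength(str) {
  let total = 0;
  for (const ch of str) {
    if (!WEIGHTS.has(ch)) return Infinity;
    total += WEIGHTS.get(ch);
  }
  return total;
}

// The attempt from the question.
const attempt = /^(a{0,3}|b){0,4}$/;

const examples = ['aa', 'aaaaaaaaaaaa', 'aaabaaab', 'abaaaaab',
                  'aaaaaaaaaaaaa', 'bbbba', 'baaaaaaab'];

for (const s of examples) {
  const expected = weightedLength(s) <= LIMIT;
  const actual = attempt.test(s);
  console.log(s, weightedLength(s), expected, actual,
              expected === actual ? '' : '<-- mismatch');
}
```

Of the seven examples, only abaaaaab is flagged as a mismatch: the rule allows it, but the attempted regex rejects it.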

Constraints

Must run in JavaScript. This regex is supplied to Qt, which apparently uses JavaScript's syntax.
Doesn't really need to be fast. In the end it'll only be applied to strings of up to 40 characters. I hope it validates within 50ms or so, but slightly slower is acceptable.

Rationale

Why do I need to do this with a regular expression?

It's for a user interface in Qt via PyQt and QML. The user can type a profile name in a text field. This profile name is URL-encoded (special characters are replaced by %XX) and then saved on the user's file system. We run into problems when the user types a lot of special characters, such as Chinese, which encode to a very long file name. It turns out that at around 17 such characters, the file name becomes too long for some file systems. The URL-encoding encodes as UTF-8, which uses up to 4 bytes per character, and each byte is percent-encoded to 3 characters, so a single typed character can take up to 12 characters in the file name.
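To illustrate how quickly the encoded name grows, the sketch below uses JavaScript's built-in `encodeURIComponent` as a stand-in for the encoder. That is an assumption: the application encodes names on the Python side, and its set of unescaped characters may differ. The helper name `encodedLength` is made up for this example.

```javascript
// Length of a name after percent-encoding its UTF-8 bytes.
// encodeURIComponent is only a stand-in for the application's real encoder.
function encodedLength(name) {
  return encodeURIComponent(name).length;
}

console.log(encodedLength('a'));          // 1  (ASCII letter, left as-is)
console.log(encodedLength('\u00e9'));     // 6  (é: 2 UTF-8 bytes, %C3%A9)
console.log(encodedLength('\u6557'));     // 9  (敗: 3 UTF-8 bytes)
console.log(encodedLength('\u{1F600}'));  // 12 (emoji outside the BMP: 4 UTF-8 bytes)
```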

16 characters is too short for profile names. Even some of our default names exceed that. We need a variable limit based on these special characters.

Qt normally allows you to specify a Validator to determine which values are acceptable in a text box. We tried implementing such a validator, but that resulted in a segfault upstream, due to a bug in PyQt. It can't seem to handle custom Validator implementations at the moment. However, PyQt also exposes three built-in validators. Two apply only to numbers. The third is a regex validator that allows you to put a regular expression that matches all valid strings. Hence the need for this regular expression.

asked Oct 28, 2016 at 1:32 by Ghostkeeper

I can make this regex without much trouble, but I feel dirty doing it. I've made several attempts, but can't make a good, generic solution that can be expanded for longer strings (length 13 for example) or higher values (b=4 for example) – Addison Commented Oct 28, 2016 at 4:29
Could you not check the length of the submitted name (after URL-encoding) and then decide to accept or reject it? Seems the simplest solution. – A. L Commented Oct 28, 2016 at 5:00
I'm bookmarking this question as my point of reference on how to ask a good regex question. Too many regex questions out there are sloppily written, unspecific and unclear. This is perfect. – Tim Pietzcker Commented Oct 28, 2016 at 5:28
@A.Lau That's impossible. I've tried a solution where I could write my own validator via PyQt, but that resulted in a segfault. We traced that to a bug in PyQt and submitted a chreq for Riverbank Solutions. I'm therefore limited to using one of their built-in validators. The only validator that applies to other stuff than numbers is the RegExpValidator. – Ghostkeeper Commented Oct 28, 2016 at 8:04

3 Answers

There is no real straightforward way to do this, given the limitations of regexp. You're going to have to test for all combinations; for a maximum length of 40, that means thirteen b with up to one a, twelve b with up to four a, and so on. We will build a little program to generate these for us. The basic format for testing for up to four a will be

/^(?=([^a]*a){0,4}[^a]*$)/

We'll write a little routine to create these lookaheads for us, given some letter and a minimum and maximum number of occurrences:

function matchLetter(c, m, n) {
  return `(?=([^${c}]*${c}){${m},${n}}[^${c}]*$)`;
}

> matchLetter('a', 0, 4)
< "(?=([^a]*a){0,4}[^a]*$)"

We can combine these to test for three b with up to three a:

/^(?=([^b]*b){3}[^b]*$)(?=([^a]*a){0,3}[^a]*$)/

We will write a function to create such combined lookaheads, which match exactly m occurrences of c1 and up to n occurrences of c2:

function matchTwoLetters(c1, m, c2, n) {
  return matchLetter(c1, m, m) + matchLetter(c2, 0, n);
}

We can use this to match exactly twelve b and up to four a, for a total of forty or less:

> matchTwoLetters('b', 12, 'a', 4)
< "(?=([^b]*b){12,12}[^b]*$)(?=([^a]*a){0,4}[^a]*$)"

It remains to simply create versions of this for each count of b, and glom them together (for the case of a max count of 12):

function makeRegExp() {
  const res = [];
  for (let bs = 0; bs <= 4; bs++)
    res.push(matchTwoLetters('b', bs, 'a', 12 - bs*3));
  return new RegExp(`^(${res.join('|')})`);
}

> makeRegExp().source
< "^((?=([^b]*b){0,0}[^b]*$)(?=([^a]*a){0,12}[^a]*$)|(?=([^b]*b){1,1}[^b]*$)(?=([^a]*a){0,9}[^a]*$)|(?=([^b]*b){2,2}[^b]*$)(?=([^a]*a){0,6}[^a]*$)|(?=([^b]*b){3,3}[^b]*$)(?=([^a]*a){0,3}[^a]*$)|(?=([^b]*b){4,4}[^b]*$)(?=([^a]*a){0,0}[^a]*$))"

Now you can do the test with

makeRegExp().test("baabaaa");

For the case of length=40, the regexp is 679 characters long. A very rough benchmark shows that it executes in under a microsecond.
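Putting the pieces above together, a parameterised variant might look like the sketch below. The function name `makeWeightedRegExp` and its `limit` and `weight` parameters are not part of the answer; they are introduced here for illustration. One further change is an assumption about what a validator needs: a trailing `[ab]*$` is appended so that the pattern consumes the whole string and rejects characters other than a and b, which the lookaheads alone do not do.

```javascript
// Lookahead asserting that the string contains between m and n occurrences of c.
// Assumes c is a plain letter that needs no escaping inside a character class.
function matchLetter(c, m, n) {
  return `(?=([^${c}]*${c}){${m},${n}}[^${c}]*$)`;
}

// Hypothetical generalisation: 'a' counts as 1, 'b' counts as `weight`,
// and the weighted total may not exceed `limit`.
function makeWeightedRegExp(limit, weight) {
  const alternatives = [];
  for (let bs = 0; bs * weight <= limit; bs++) {
    // Exactly `bs` occurrences of b, and as many a's as still fit.
    alternatives.push(
      matchLetter('b', bs, bs) + matchLetter('a', 0, limit - bs * weight));
  }
  // Assumption: only a and b are allowed, so consume the rest with [ab]*$.
  return new RegExp(`^(?:${alternatives.join('|')})[ab]*$`);
}

const re = makeWeightedRegExp(12, 3);
console.log(re.test('abaaaaab'));   // true  (6 + 2 * 3 = 12)
console.log(re.test('baaaaaaab'));  // false (7 + 2 * 3 = 13)
console.log(re.test('aac'));        // false (c is not allowed)
```

Calling `makeWeightedRegExp(40, 3)` would produce the alternatives for zero through thirteen b in the same way.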

If you want to count bytes when multibyte encoding is present, you can use this function:

function bytesLength(str) {
  var s = str.length;
  for (var i = s-1; i > -1; i--) {
    var code = str.charCodeAt(i);
    if (code > 0x7f && code <= 0x7ff) {s++;}
    else if (code > 0x7ff && code <= 0xffff) {s+=2;}
    if (code >= 0xDC00 && code <= 0xDFFF) {i--;}
  }
  return s;
}

console.log(bytesLength('敗')); // length 3
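As a cross-check, the same count can be obtained from the standard `TextEncoder` API, which always encodes to UTF-8. The sketch below repeats `bytesLength` from above so that it is self-contained; `utf8Length` is a name introduced here. It assumes an environment where `TextEncoder` is available as a global, as in current browsers and Node.js.

```javascript
// Manual UTF-8 byte count, as in the answer above.
function bytesLength(str) {
  var s = str.length;
  for (var i = s - 1; i > -1; i--) {
    var code = str.charCodeAt(i);
    if (code > 0x7f && code <= 0x7ff) { s++; }
    else if (code > 0x7ff && code <= 0xffff) { s += 2; }
    // A trailing surrogate means the preceding code unit is its pair; skip it.
    if (code >= 0xDC00 && code <= 0xDFFF) { i--; }
  }
  return s;
}

// Same count via the built-in encoder (assumed to be available as a global).
function utf8Length(str) {
  return new TextEncoder().encode(str).length;
}

for (const s of ['abc', '\u00e9', '\u6557', '\u{1F600}']) {
  console.log(JSON.stringify(s), bytesLength(s), utf8Length(s));
}
```

The two functions can disagree on malformed strings that contain unpaired surrogates, which this sketch does not try to handle.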

Try using something like this:

^((a{1,3}|b){1,4}|(a{1,4}|a?b|ba){1,3}|((a{2,3}|b){2}|aaba|abaa){2})$

Example: https://regex101.com/r/yTTiEX/6

This breaks it up into the logical possibilities:

4 parts, each with a value up to 3.
3 parts, each with a value up to 4.
2 parts, each with a value up to 6.
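One way to gain confidence in a hand-written pattern like this is to compare it against a direct computation of the weighted length. The sketch below does that for the seven examples from the question and also offers a brute-force comparison over all short strings of a and b. The names `weightedLength` and `findMismatches` are made up for this illustration, and no claim is made here about what the brute-force run reports.

```javascript
// The pattern from this answer.
const pattern = /^((a{1,3}|b){1,4}|(a{1,4}|a?b|ba){1,3}|((a{2,3}|b){2}|aaba|abaa){2})$/;

// Weighted length for the simplified alphabet: a counts 1, b counts 3.
function weightedLength(str) {
  let total = 0;
  for (const ch of str) total += ch === 'b' ? 3 : 1;
  return total;
}

// Enumerate every string of a/b with 1..maxChars characters and collect
// those where the pattern disagrees with the weighted-length rule.
function findMismatches(maxChars, limit = 12) {
  const mismatches = [];
  let current = [''];
  for (let len = 1; len <= maxChars; len++) {
    current = current.flatMap(s => [s + 'a', s + 'b']);
    for (const s of current) {
      if (pattern.test(s) !== (weightedLength(s) <= limit)) mismatches.push(s);
    }
  }
  return mismatches;
}

const questionExamples = ['aa', 'aaaaaaaaaaaa', 'aaabaaab', 'abaaaaab',
                          'aaaaaaaaaaaaa', 'bbbba', 'baaaaaaab'];
for (const s of questionExamples) {
  console.log(s, pattern.test(s), weightedLength(s) <= 12);
}

// Any strings listed here are cases the pattern gets wrong.
console.log(findMismatches(13));
```

The pattern agrees with the rule on all seven examples from the question. Note that every alternative requires at least one character, so unlike the attempt in the question, this pattern rejects the empty string.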
