I'm trying to split a string into subsections using std::regex.

I'm using this regular expression, written as a raw string literal:

R"(^([\s\S]+?)\s*(\W)\s*[\r\n]+A\.\s*((?:.+(?:[\r\n](?![B-D]\.))*)+)[\r\n]+B\.\s*((?:.+(?:[\r\n](?![CD]\.))*)+)[\r\n]+C\.\s*((?:.+(?:[\r\n](?!D\.))*)+)[\r\n]+D\.\s*([\s\S]+)$)"

Most of the time it works fine, but on one specific text the program crashes with a stack overflow:

Unhandled exception:
Unhandled exception at 0x00007FF7C93FCF66 in Convert.exe: 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x000000FE83A43F98).

The debugger stops in xtr1common, at:

_EXPORT_STD _NODISCARD constexpr bool is_constant_evaluated() noexcept {
    return __builtin_is_constant_evaluated();
}
#endif // _HAS_CXX20

The string that triggers it:

The portfolio manager from your division thought it might be helpful to the project teams if she delivered a short presentation on the elements in her portfolio. A number of team members, after receiving the e- mail announcement for the presentation, come to you and ask if this meeting is worth their time. After all isn’t a portfolio just a big project? As a Senior Project manager your best response would be:
A.  You’re right. The meeting probably would be a waste of your time
B.  Not really. A portfolio is a group of related projects managed together to achieve synergies between the projects and establish common methods and procedures.
C.  Not really. A portfolio can be a group of programs, projects, or sub-projects designed to help the anization meet specific business goals
 
D.  Not really. A portfolio is a collection of documents, methods, and procedures that help us manage projects

Does this mean std::regex is buggy?

best regards

#include <string>
#include <regex>
#include <iostream>

using std::string;
using std::regex;
using std::smatch;

int main()
{
    string s = R"(The portfolio manager from your division thought it might be helpful to the project teams if she delivered a short presentation on the elements in her portfolio. A number of team members, after receiving the e- mail announcement for the presentation, come to you and ask if this meeting is worth their time. After all isn’t a portfolio just a big project? As a Senior Project manager your best response would be:
A.  You’re right. The meeting probably would be a waste of your time
B.  Not really. A portfolio is a group of related projects managed together to achieve synergies between the projects and establish common methods and procedures.
C.  Not really. A portfolio can be a group of programs, projects, or sub-projects designed to help the anization meet specific business goals
 
D.  Not really. A portfolio is a collection of documents, methods, and procedures that help us manage projects)";
    string pattern = R"(^([\s\S]+?)\s*(\W)\s*[\r\n]+A\.\s*((?:.+(?:[\r\n](?![B-D]\.))*)+)[\r\n]+B\.\s*((?:.+(?:[\r\n](?![CD]\.))*)+)[\r\n]+C\.\s*((?:.+(?:[\r\n](?!D\.))*)+)[\r\n]+D\.\s*([\s\S]+)$)";

    regex questionRegex(pattern, std::regex_constants::ECMAScript /* | std::regex_constants::collate */);
    smatch question_match;
    try {
        if (regex_search(s, question_match, questionRegex)) {
            // Group 2 is the (\W) punctuation that ends the question,
            // so the answers A-D are capture groups 3-6.
            std::cout << "Query: " << question_match[1] << "\n";
            std::cout << "A: " << question_match[3] << "\n";
            std::cout << "B: " << question_match[4] << "\n";
            std::cout << "C: " << question_match[5] << "\n";
            std::cout << "D: " << question_match[6] << "\n";
        }
    }
    catch (const std::regex_error& error)
    {
        std::cout << "error: " << error.what() << "\n";
    }

    return 0;
}

asked Mar 14 at 12:40 by Mike, edited Mar 14 at 13:03
  • The answer is B, right? ;-) – Peter - Reinstate Monica Commented Mar 14 at 12:59
  • Wouldn't surprise me if this monster of a regexp involves too much backtracking. Break it up into a regexp for the question and a regexp for an answer, applied multiple times. – Botje Commented Mar 14 at 12:59
  • Backtracking patterns are inherently sensitive to input string lengths. – Botje Commented Mar 14 at 13:16
  • Obligatory "std::regex is so problematic, the standards committee debated immediately deprecating it" ... What compiler/version are you using? I'm having trouble reproducing your issue. – Drew Dormann Commented Mar 14 at 13:39
  • As shown in Drew Dormann's repro, the result is not what you expect either, whereas regex101 gives the expected result. – Jarod42 Commented Mar 14 at 14:15
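
The suggestion in the comments to break the pattern up can be sketched as follows. This is a minimal sketch, not the asker's code: `ParsedQuestion`, `appendLine` and `parseQuestion` are hypothetical names, and it assumes every answer starts on its own line with `A.` to `D.`. It applies one small regex per line, so there are no nested quantifiers such as `(?:.+(?:...)*)+` for the matcher to backtrack through. Unlike the original pattern, it joins continuation lines with a space, keeps the question's trailing punctuation in the query, and does not enforce the A, B, C, D order.

```cpp
#include <array>
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical result type, not part of the original post.
struct ParsedQuestion {
    std::string query;
    std::array<std::string, 4> answers; // index 0 = A ... index 3 = D
};

// Append a non-blank line to a field, separating lines with one space.
static void appendLine(std::string& field, const std::string& line) {
    if (line.find_first_not_of(" \t\r") == std::string::npos) return; // skip blank lines
    if (!field.empty()) field += ' ';
    field += line;
}

ParsedQuestion parseQuestion(const std::string& text) {
    // Matches a single line that starts an answer, e.g. "B.  Not really."
    static const std::regex answerStart(R"(^([A-D])\.\s*(.*)$)");

    ParsedQuestion result;
    std::string* current = &result.query; // lines before "A." belong to the question
    std::istringstream in(text);
    std::string line;
    std::smatch m;

    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back(); // tolerate CRLF input
        if (std::regex_match(line, m, answerStart)) {
            const auto index = static_cast<std::size_t>(m[1].str()[0] - 'A');
            current = &result.answers[index];
            appendLine(*current, m[2].str());
        } else {
            appendLine(*current, line); // continuation of the current field
        }
    }
    return result;
}
```

Calling `parseQuestion(s)` on the string from the question would fill `query` with the paragraph before `A.` and `answers[0]` to `answers[3]` with the four options. The per-line regex only ever sees one line at a time, so its work grows with the line length rather than with the whole text.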

1 Answer

OK, I found a solution written by user557597. Defining these macros before including <regex>:

#define _REGEX_MAX_STACK_COUNT 200000
#define _REGEX_MAX_COMPLEXITY_COUNT   1000000000L

raises the recursion-depth and complexity limits of the MSVC std::regex matcher.
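
A minimal sketch of where the two macros would go, assuming the Microsoft standard library (other implementations are expected to ignore them). The macro names and values are the ones quoted above; `safeSearch` is a hypothetical helper added only for illustration. It also shows how a search that still exceeds the limits could be caught as a `std::regex_error` instead of ending the program.

```cpp
// MSVC-specific: these must appear before <regex> is included,
// because the header only supplies defaults when they are not yet defined.
#define _REGEX_MAX_STACK_COUNT 200000
#define _REGEX_MAX_COMPLEXITY_COUNT 1000000000L

#include <iostream>
#include <regex>
#include <string>

// Hypothetical helper: runs regex_search and reports limit errors
// (error_stack / error_complexity) as "no match" instead of propagating them.
bool safeSearch(const std::string& text, const std::regex& re, std::smatch& out) {
    try {
        return std::regex_search(text, out, re);
    } catch (const std::regex_error& e) {
        if (e.code() == std::regex_constants::error_stack ||
            e.code() == std::regex_constants::error_complexity) {
            std::cerr << "regex gave up: " << e.what() << '\n';
            return false;
        }
        throw; // any other regex error is unexpected here
    }
}
```

Raising the limits lets the matcher recurse deeper, so it may trade a `regex_error` for a real stack overflow on long inputs. A hardware stack overflow like the 0xC00000FD in the question is not a C++ exception and cannot be caught this way.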

best regards
