regex - Custom String statement to mathematical operation [PYTHON] - Stack Overflow

Updated: 2025-01-127

I am working on a solution that converts a mathematical instruction given as a string into an infix formula. One DataFrame column contains the string instructions and a second column should contain the corresponding formula, e.g. div(mul(mstr,mul(div(baseline_year,transaction_yr),spnd_val)),1000) --> ((mstr*((baseline_year/transaction_yr)*spnd_val))/1000)

To produce this output I am using the Python code below.

import re
import pandas as pd
 
def get_operations(calculation):
    # Use regular expression to find all occurrences of add, mul, div, sub
    operations = re.findall(r'\b(add|mul|div|sub)\b', calculation)
    return operations
 
def sort_operations(operations):
    priority = {'mul': 1, 'div': 2, 'add': 3, 'sub': 4}
    return sorted(operations, key=lambda op: priority[op])
# function to generate the column expression and update the new column name
def replace_operations(expression):
    # Required mathematical operations and theirs corresponding regex
    patterns = {
        'mul': re.compile(r'mul\(([^,]+),([^)]+)\)'),
        'div': re.compile(r'div\(([^,]+),([^)]+)\)'),
        'add': re.compile(r'add\(([^,]+),([^)]+)\)'),
        'sub': re.compile(r'sub\(([^,]+),([^)]+)\)')
    }
   
    # replace function will help to replace the matched expression with the corresponding mathematical operation
    def replace(match):
        op = match.group(0)
        if 'mul' in op:
            return f"({match.group(1)}*{match.group(2)})"
        elif 'div' in op:
            return f"({match.group(1)}/{match.group(2)})"
        elif 'add' in op:
            return f"({match.group(1)}+{match.group(2)})"
        elif 'sub' in op:
            return f"({match.group(1)}-{match.group(2)})"
   
    # Apply patterns in BODMAS order
    priority = {'mul': 1, 'div': 2, 'add': 3, 'sub': 4}
    while any(pattern.search(expression) for pattern in patterns.values()):
        print("pattern", any(pattern.search(expression) for pattern in patterns.values()))
        math_oprtrs = sort_operations(get_operations(expression))
        print("math_operator",math_oprtrs)
        for key in math_oprtrs:
            print(expression,"pattern_for",patterns[key]) # Change the order here
            expression = patterns[key].sub(replace, expression)
   
    return expression
 
# Create a sample DataFrame
data = {'expression': ['div(mul(mul(div(baseline_year,transaction_year),mstr),spnd_val),1000)']}
df = pd.DataFrame(data)
 
# Apply the replace_operations function to the 'expression' column
df['updated_expression'] = df['expression'].apply(replace_operations)
 
display(df)

Issues

When I generalize this to more than one operation I get the wrong output, and sometimes the result does not follow the BODMAS rule.

Could you please take a look?

The code works when a string instruction contains a single operation with two arguments, but when I pass more than two arguments I get the wrong output.

Asked Jan 9 at 6:04 by user8423971; edited 2 days ago by Steven.

The collection of valid string statements does not constitute a regular language. You will be better off writing a parser (e.g. a recursive descent parser, LALR(1) parser, etc.). – Booboo Commented 2 days ago
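The comment's point can be seen with a small experiment. The sketch below reuses the mul pattern from the question on a flat call and on a nested call; the helper name naive_replace is made up for this illustration. On the nested input the first group, [^,]+, swallows "mul(a", so the match straddles the outer function name and the inner arguments and the replacement is malformed.

```python
import re

# The same kind of pattern the question uses for each operation.
mul_pattern = re.compile(r'mul\(([^,]+),([^)]+)\)')

def naive_replace(expression):
    """Rewrite every mul(a,b) match as (a*b) in a single regex pass."""
    return mul_pattern.sub(lambda m: f"({m.group(1)}*{m.group(2)})", expression)

flat = naive_replace("mul(a,b)")
nested = naive_replace("mul(mul(a,b),c)")
print(flat)    # (a*b)
print(nested)  # (mul(a*b),c)
```

A character class such as [^,]+ cannot tell a top-level comma from one inside a nested call, which is why tracking the parenthesis depth (or using a real parser) is needed.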

2 Answers

I propose doing it recursively. The function parse_args takes the inner string of a call such as mul(inner string) and splits it into its top-level arguments. The function parse_expression recursively converts each argument and formats the resulting string.

Solution:

import re

operation_q = re.compile(r"^(?P<operation>(mul|div|add|sub))\((?P<args>.*)\)$")

mapping = {
    "mul": "*",
    "div": "/",
    "add": "+",
    "sub": "-",
}

def parse_args(args_string):
    """Split an argument string on its top-level commas and return the pieces."""
    args = []
    depth = 0  # current parenthesis nesting depth
    start = 0  # index where the current argument begins
    for i, character in enumerate(args_string):
        if character == "(":
            depth += 1
        elif character == ")":
            depth -= 1
        elif character == "," and depth == 0:
            # a comma outside any parentheses separates two arguments
            args.append(args_string[start:i].strip())
            start = i + 1
    last = args_string[start:].strip()
    if last:
        args.append(last)
    return args


def parse_expression(exp):
    """recursively parse operation and its arguments"""
    m = operation_q.match(exp)
    if not m:
        # not an expression
        return exp
    groups = m.groupdict()
    operator = f""" {mapping[groups["operation"]]} """
    args = parse_args(groups["args"])
    return f"({operator.join([parse_expression(arg) for arg in args])})"

print(parse_expression("div(mul(mul(div(baseline_year,transaction_year),mstr),spnd_val),1000)"))
# >> ((((baseline_year / transaction_year) * mstr) * spnd_val) / 1000)
print(parse_expression("add(sub(o,p,q),div(p,q,r),mul(c,d,e))"))
# >> ((o - p - q) + (p / q / r) + (c * d * e))

It does not take operator precedence into account, though: every operation is simply wrapped in parentheses. Dropping the redundant ones would be another exercise.
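One possible way to approach that exercise, sketched here as a different technique from the answer above: the input strings happen to be valid Python call syntax, so the standard ast module can parse them, each call can be rewritten as a chain of binary operations, and ast.unparse (Python 3.9+) emits only the parentheses that precedence requires. The names OPS, CallToInfix and to_infix are hypothetical, and the sketch assumes every identifier in the input is a valid Python name.

```python
import ast
from functools import reduce

# Map the question's function names to Python AST operator classes.
OPS = {"mul": ast.Mult, "div": ast.Div, "add": ast.Add, "sub": ast.Sub}

class CallToInfix(ast.NodeTransformer):
    """Replace each mul/div/add/sub call with a chain of binary operations."""

    def visit_Call(self, node):
        self.generic_visit(node)  # convert nested calls first
        if not isinstance(node.func, ast.Name) or node.func.id not in OPS:
            raise ValueError("unsupported function call")
        if len(node.args) < 2 or node.keywords:
            raise ValueError("expected at least two positional arguments")
        op = OPS[node.func.id]
        # Fold left to right: sub(a, b, c) becomes (a - b) - c.
        return reduce(lambda left, right: ast.BinOp(left=left, op=op(), right=right),
                      node.args)

def to_infix(text):
    """Hypothetical helper: translate prefix calls into an infix string."""
    tree = CallToInfix().visit(ast.parse(text, mode="eval"))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree.body)

print(to_infix("div(x, mul(y, 3))"))          # x / (y * 3)
print(to_infix("sub(5, add(x, div(y, 3)))"))  # 5 - (x + y / 3)
```

Because ast.unparse formats the tree rather than simplifying it, a right-hand operand with the same precedence as its parent keeps its parentheses, so the output mirrors the grouping of the original calls.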

I know I am coming late to the game, but ...

As I mentioned in a comment, the best approach is to use a parser with a simple lexical analyzer. Here the parser uses recursive descent to build a parse tree, from which the final translation is produced without unnecessary parentheses.

"""
goal -> expression EOF
expression -> op ( arg {, arg}+ )
arg -> expression | id | number
op -> mul | div | add | sub
"""

import re
import collections

Token = collections.namedtuple('Token', ['name', 'token_number', 'value'])

# Token numbers:
WHITESPACE = -1 # Not an actual token that gets passed to the parser
EOF = 0
MUL = 1
DIV = 2
ADD = 3
SUB = 4
ID = 5
NUMBER = 6
LPAREN = 7
RPAREN = 8
COMMA = 9
ERROR = 10

class Lexer:
    tokens = (
        ('WHITESPACE', WHITESPACE, r'[ \t\n]'),
        # \b keeps identifiers that merely start with a keyword (e.g. subtotal) as IDs
        ('MUL', MUL, r'mul\b'),
        ('DIV', DIV, r'div\b'),
        ('ADD', ADD, r'add\b'),
        ('SUB', SUB, r'sub\b'),
        ('ID', ID, r'[A-Za-z_]([A-Za-z0-9_])*'),
        ('NUMBER', NUMBER, r'(\.\d+|\d+(\.\d*)?)'),
        ('LPAREN', LPAREN, r'\('),
        ('RPAREN', RPAREN, r'\)'),
        ('COMMA', COMMA, r','),
        ('EOF', EOF, r'\Z'),
        # must be the last token and matches anything the prior expressions do not match:
        ('ERROR', ERROR, r'.')
    )

    regex = re.compile('|'.join('(?P<%s>%s)' % (token[0], token[2]) for token in tokens))
    token_numbers = {token[0]: token[1] for token in tokens}

    def __init__ (self, text):

        def generate_tokens(text):
            scanner = Lexer.regex.finditer(text)
            for m in scanner:
                token_name = m.lastgroup
                token_number = Lexer.token_numbers[m.lastgroup]
                token_value = m.group()
                if token_number == ERROR:
                    raise RuntimeError(f'Invalid input: {token_value!r}')
                if token_number == WHITESPACE:
                    continue # don't generate a token
                yield Token(token_name, token_number, token_value)

        self._token_generator = generate_tokens(text)

    def next_token(self):
        return next(self._token_generator)

class Node:
    def __init__(self, token_number, value):
        self.token_number = token_number
        self.value = value
        self.children = []

class Parser:
    def __init__(self, text):
        self.lexer = Lexer(text)

    def next_token(self):
        self.token = self.lexer.next_token()

    def syntax_error(self):
        raise RuntimeError(f'Unexpected input: {self.token.value}')

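    def parse(self):
        # Read the first token, then parse; errors propagate to the caller.
        self.next_token()
        self.tree = self.goal()

    def goal(self):
        tree = self.expression()
        # expression() has already advanced past the closing parenthesis,
        # so the current token must be EOF.
        if self.token.token_number != EOF:
            self.syntax_error()
        return tree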

    def expression(self):
        if self.token.token_number not in (MUL, DIV, ADD, SUB):
            self.syntax_error()
        node = Node(self.token.token_number, self.token.value)
        self.next_token()
        if self.token.token_number != LPAREN:
            self.syntax_error()
        self.next_token()
        node.children.append(self.arg())
        if self.token.token_number != COMMA:
            self.syntax_error()
        self.next_token()
        node.children.append(self.arg())
        while self.token.token_number == COMMA:
            self.next_token()
            node.children.append(self.arg())
        if self.token.token_number != RPAREN:
            self.syntax_error()
        self.next_token()
        return node

    def arg(self):
        if self.token.token_number in (MUL, DIV, ADD, SUB):
            return self.expression()
        if self.token.token_number in (ID, NUMBER):
            node = Node(self.token.token_number, self.token.value)
            self.next_token()
            return node
        self.syntax_error()

    def output_tree(self):
        def process_node(node, needs_paren):
            if node.token_number in (ID, NUMBER):
                return node.value

            str_node = []

            # Might need parentheses around + and -:
            if node.token_number == ADD:
                if needs_paren:
                    str_node.append('(')
                str_node.append(process_node(node.children[0], False))
                for child in node.children[1:]:
                    str_node.append('+')
                    str_node.append(process_node(child, False))
                if needs_paren:
                    str_node.append(')')
            elif node.token_number == SUB:
                if needs_paren:
                    str_node.append('(')
                str_node.append(process_node(node.children[0], False))
                for child in node.children[1:]:
                    str_node.append('-')
                    str_node.append(process_node(child, True))
                if needs_paren:
                    str_node.append(')')
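            elif node.token_number in (MUL, DIV):
                str_node.append(process_node(node.children[0], True))
                op = '*' if node.token_number == MUL else '/'
                for child in node.children[1:]:
                    str_node.append(op)
                    child_str = process_node(child, True)
                    # A divisor that is itself a product or quotient needs
                    # parentheses: a/(b*c) is not the same as a/b*c.
                    if node.token_number == DIV and child.token_number in (MUL, DIV):
                        child_str = f'({child_str})'
                    str_node.append(child_str)
            else:
                raise RuntimeError(f'Unexpected node: {node.token_number}, {node.value!r}')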

            return ''.join(str_node)

        return process_node(self.tree, False)

expressions = [
    'mul(add(x, sub(y, 3)), 5)',
    'add(mul(x, div(y, 3)), 5)',
    'sub(5, add(x, div(y, 3)))',
    'sub(5, mul(x, div(y, 3)))',
    'sub(add(x, div(y, 3)), 5)',
    'sub(1,2,3,4)',
    'div(mul(mstr,mul(div(baseline_year,transaction_yr),spnd_val)),1000)'
]

for expression in expressions:
    parser = Parser(expression)
    parser.parse()
    print(expression)
    print(parser.output_tree())
    print()

Prints:

mul(add(x, sub(y, 3)), 5)
(x+y-3)*5


add(mul(x, div(y, 3)), 5)
x*y/3+5


sub(5, add(x, div(y, 3)))
5-(x+y/3)


sub(5, mul(x, div(y, 3)))
5-x*y/3


sub(add(x, div(y, 3)), 5)
x+y/3-5


sub(1,2,3,4)
1-2-3-4


div(mul(mstr,mul(div(baseline_year,transaction_yr),spnd_val)),1000)
mstr*baseline_year/transaction_yr*spnd_val/1000
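A quick way to sanity-check a translation like the ones printed above is to evaluate the prefix form and the infix form with the same variable values and compare the results. The following is a minimal sketch under stated assumptions: the helper names fold, PREFIX_FUNCS and same_value are hypothetical, division is taken to be true division, and eval is used only because the inputs here are trusted test strings.

```python
import math
import operator
from functools import reduce

def fold(func):
    """Turn a binary function into one that folds any number of arguments."""
    return lambda *args: reduce(func, args)

# Prefix functions fold left to right: sub(1, 2, 3, 4) is ((1 - 2) - 3) - 4.
PREFIX_FUNCS = {
    "mul": fold(operator.mul),
    "div": fold(operator.truediv),
    "add": fold(operator.add),
    "sub": fold(operator.sub),
}

def same_value(prefix, infix, variables):
    """Evaluate both notations with the same variable values and compare."""
    env = {"__builtins__": {}}  # hide built-in functions from eval
    expected = eval(prefix, env, {**PREFIX_FUNCS, **variables})
    actual = eval(infix, env, dict(variables))
    return math.isclose(expected, actual)

values = {"x": 7, "y": 12}
print(same_value("mul(add(x, sub(y, 3)), 5)", "(x+y-3)*5", values))  # True
print(same_value("sub(5, add(x, div(y, 3)))", "5-(x+y/3)", values))  # True
print(same_value("div(x, mul(y, 3))", "x/y*3", values))              # False
```

The last call shows the kind of mistake such a check catches: a divisor that is itself a product needs its own parentheses, so x/y*3 is not a valid translation of div(x, mul(y, 3)).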

