admin管理员组

文章数量:1287060

I have a text field in my table where SOME of the data ends with one or more characters in parentheses. However, the number of characters is not consistent, the spacing is not consistent, and the number of instances is not consistent. **

Some samples:

Teacher 1 (L)
Teacher 1 (L) (H)
Secretary 5 (MN
Assistant 6(L)
Principal 9 (XYZ)(A)
Custodian 12
Bus Driver 16 (ABC DEF)
Music (TT) 18

No, that's not a typo in Secretary 5, I have multiple jobs where the trailing parenthesis is missing from the name. It's also not a typo in Assistant 6, not all of them have a space between the end of the job name and the parenthesis.

SO FAR, I have not found any jobs that have information in parentheses in the MIDDLE of the name, but I wouldn't put anything past them. The hypothetical for that would be Music (TT) 18, in which case I need to turn that into Music 18.

So, here's my question - I need to remove: all characters that are ENCLOSED in parentheses, all characters that are after an open parenthesis if it's at the end of the line - like in Secretary 5, all parentheses, all multiple spaces changed to one space, and remove all trailing spaces which is currently anywhere from one to three of them after all the removals.

My final desired output based on the above:
Teacher 1
Secretary 5
Assistant 6
Principal 9
Custodian 12
Bus Driver 16
Music 18

I have successfully done a few of the above with reg ex, but right now it's a MESS and is multiple nested expressions that don't catch everything.

**NOTE: I can not change anything about this data. I have no control at all over what I am given, or the total lack of consistency in what is being appended to it. I have to take what I get and then deal with it. So please don't ask me why I'm getting data that looks like this. I am well aware that it makes no sense.

I have a text field in my table where SOME of the data ends with one or more characters in parentheses. However, the number of characters is not consistent, the spacing is not consistent, and the number of instances is not consistent. **

Some samples:

Teacher 1 (L)
Teacher 1 (L) (H)
Secretary 5 (MN
Assistant 6(L)
Principal 9 (XYZ)(A)
Custodian 12
Bus Driver 16 (ABC DEF)
Music (TT) 18

No, that's not a typo in Secretary 5, I have multiple jobs where the trailing parenthesis is missing from the name. It's also not a typo in Assistant 6, not all of them have a space between the end of the job name and the parenthesis.

SO FAR, I have not found any jobs that have information in parentheses in the MIDDLE of the name, but I wouldn't put anything past them. The hypothetical for that would be Music (TT) 18, in which case I need to turn that into Music 18.

So, here's my question - I need to remove: all characters that are ENCLOSED in parentheses, all characters that are after an open parenthesis if it's at the end of the line - like in Secretary 5, all parentheses, all multiple spaces changed to one space, and remove all trailing spaces which is currently anywhere from one to three of them after all the removals.

My final desired output based on the above:
Teacher 1
Secretary 5
Assistant 6
Principal 9
Custodian 12
Bus Driver 16
Music 18

I have successfully done a few of the above with reg ex, but right now it's a MESS and is multiple nested expressions that don't catch everything.

**NOTE: I can not change anything about this data. I have no control at all over what I am given, or the total lack of consistency in what is being appended to it. I have to take what I get and then deal with it. So please don't ask me why I'm getting data that looks like this. I am well aware that it makes no sense.

Share Improve this question edited Jan 23 at 21:32 Barmar 781k56 gold badges545 silver badges659 bronze badges asked Jan 23 at 21:17 Michelle HMichelle H 455 bronze badges 1
  • This looks difficult to do with a regexp. Can you do it outside the database? – Barmar Commented Jan 23 at 21:32
Add a comment  | 

1 Answer 1

Reset to default 2

I need to remove: all characters that are ENCLOSED in parentheses, all characters that are after an open parenthesis if it's at the end of the line - like in Secretary 5, all parentheses, all multiple spaces changed to one space, and remove all trailing spaces which is currently anywhere from one to three of them after all the removals.

You can use two passes of REGEXP_REPLACE:

  • First remove characters from an opening round bracket to either a closing round bracket (or the end of the string), inclusive.
  • Second to remove leading or trailing whitespace or to replace multiple whitespace characters with a single space.

Like this:

SELECT value,
       REGEXP_REPLACE(
         REGEXP_REPLACE(
           value,
           '\(.*?(\)|$)'
         ),
         '^\s+|\s+$|(\s){2,}',
         '\1'
       ) AS replaced_value
FROM   table_name

Which, for the sample data:

CREATE TABLE table_name (value) AS
SELECT 'Teacher 1 (L)' FROM DUAL UNION ALL
SELECT 'Teacher 1 (L) (H)' FROM DUAL UNION ALL
SELECT 'Secretary 5 (MN' FROM DUAL UNION ALL
SELECT 'Assistant 6(L)' FROM DUAL UNION ALL
SELECT 'Principal 9 (XYZ)(A)' FROM DUAL UNION ALL
SELECT 'Custodian 12' FROM DUAL UNION ALL
SELECT 'Bus Driver 16 (ABC DEF)' FROM DUAL UNION ALL
SELECT 'Music (TT) 18' FROM DUAL;

Outputs:

VALUE REPLACED_VALUE
Teacher 1 (L) Teacher 1
Teacher 1 (L) (H) Teacher 1
Secretary 5 (MN Secretary 5
Assistant 6(L) Assistant 6
Principal 9 (XYZ)(A) Principal 9
Custodian 12 Custodian 12
Bus Driver 16 (ABC DEF) Bus Driver 16
Music (TT) 18 Music 18

fiddle

本文标签: regexOracle SQL remove characters and spaces from end of fieldStack Overflow