I need to split the following long strings using Oracle regular expressions:
Row 1 has the following value: 'van dam is brother of Prince Charles(12345). Mathew Perker is son of Prince Charles(12345).'
Row 2: 'Madam Currie is grandmother of Albert Eistine(56789). Pieer Currie is grandfather of Albert Eistine(56789). CV Raman is friend of Albert Eistine(56789).'
After the split, I need the following separate strings:
From row 1: 'van dam is brother of Prince Charles(12345).' 'Mathew Perker is son of Prince Charles(12345).'
From row 2: 'Madam Currie is grandmother of Albert Eistine(56789).' 'Pieer Currie is grandfather of Albert Eistine(56789).' 'CV Raman is friend of Albert Eistine(56789).'
These separate strings can be presented in separate columns. The numbers in brackets are actually IDs stored in the ID field of the table.
Is it possible to achieve such a split using Oracle regexp?
Edited Mar 13 at 8:14 by MT0; asked Mar 13 at 6:36 by Kaustav Nandy.
- This question is similar to: Oracle: Connect by Level & regexp_substr. If you believe it's different, please edit the question, make it clear how it's different and/or how the answers on that question are not helpful for your problem. – p3consulting, Mar 13 at 6:45
- Do you simply want to break the whole string into multiple parts based on the period (.)? – Ankit Bajpai, Mar 13 at 7:06
- In one word, yes. I'm not sure whether we can also use the ID field for each row, since the ID is unique for each row and appears at the end of every part, just before the period (.). – Kaustav Nandy, Mar 13 at 8:55
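The ID-based idea from the last comment could look roughly like this. It is an untested sketch that assumes a hypothetical table `tbl` whose `id` column holds the same number that appears in brackets (12345 and 56789 in the sample), and Oracle 12c or later for CROSS APPLY. The split pattern is built from each row's own ID:

```sql
-- Hypothetical table: id holds the number that appears in brackets.
WITH tbl (id, data) AS (
  SELECT 12345, 'van dam is brother of Prince Charles(12345). Mathew Perker is son of Prince Charles(12345).' FROM dual
  UNION ALL
  SELECT 56789, 'Madam Currie is grandmother of Albert Eistine(56789). Pieer Currie is grandfather of Albert Eistine(56789). CV Raman is friend of Albert Eistine(56789).' FROM dual
)
SELECT t.id,
       s.item,
       -- one part = any run of non-dot characters ending in "(<id>)."
       TRIM(REGEXP_SUBSTR(t.data, '[^.]*\(' || t.id || '\)\.', 1, s.item)) AS part
  FROM tbl t
 CROSS APPLY (
         -- generate one row per occurrence of "(<id>)." in this row's text
         SELECT LEVEL AS item
           FROM dual
        CONNECT BY LEVEL <= REGEXP_COUNT(t.data, '\(' || t.id || '\)\.')
       ) s
 ORDER BY t.id, s.item;
```

A part still may not contain a dot (`[^.]*`), so this behaves much like a plain dot split. The only difference is that it counts and returns just the parts that end in that row's own `(id).`.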
4 Answers
Regular expressions would work, but on large data sets string functions (such as a combination of substr and instr) would perform better. Here's how.
Sample data:
WITH
  test (col)
  AS
    (SELECT 'van dam is brother of Prince Charles(12345). Mathew Perker is son of Prince Charles(12345).'
       FROM DUAL
     UNION ALL
     SELECT 'Madam Currie is grandmother of Albert Eistine(56789). Pieer Currie is grandfather of Albert Eistine(56789). CV Raman is friend of Albert Eistine(56789).'
       FROM DUAL)
The query begins here; it splits the source value on the dot (.) character.
SELECT trim(substr(col, 1, instr(col, '.', 1, 1))) val_1,
       --
       trim(substr(col, instr(col, '.', 1, 1) + 1,
                   instr(col, '.', 1, 2) - instr(col, '.', 1, 1))) val_2,
       --
       trim(substr(col, instr(col, '.', 1, 2) + 1,
                   instr(col, '.', 1, 3) - instr(col, '.', 1, 2))) val_3
  FROM test;
VAL_1                                                 VAL_2                                                 VAL_3
----------------------------------------------------- ----------------------------------------------------- -----------------------------------------------------
van dam is brother of Prince Charles(12345).          Mathew Perker is son of Prince Charles(12345).
Madam Currie is grandmother of Albert Eistine(56789). Pieer Currie is grandfather of Albert Eistine(56789). CV Raman is friend of Albert Eistine(56789).
You'd add as many val_n columns as necessary.
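Every extra column follows the same pattern: take the text after the (n-1)th dot, up to and including the nth dot. Here is a sketch of a fourth column over the same sample data. It assumes every part ends with a dot, and `val_4` is just an illustrative name:

```sql
WITH test (col) AS (
  SELECT 'van dam is brother of Prince Charles(12345). Mathew Perker is son of Prince Charles(12345).' FROM DUAL
  UNION ALL
  SELECT 'Madam Currie is grandmother of Albert Eistine(56789). Pieer Currie is grandfather of Albert Eistine(56789). CV Raman is friend of Albert Eistine(56789).' FROM DUAL
)
-- General form of column n (assumes every part ends with a dot):
--   trim(substr(col, instr(col, '.', 1, n-1) + 1,
--                    instr(col, '.', 1, n) - instr(col, '.', 1, n-1)))
-- If the nth dot does not exist, INSTR returns 0, the length becomes zero or
-- negative, and SUBSTR returns NULL, so missing parts simply come back empty.
SELECT trim(substr(col, instr(col, '.', 1, 3) + 1,
                   instr(col, '.', 1, 4) - instr(col, '.', 1, 3))) val_4
  FROM test;
```

For this sample both rows return NULL in `val_4`, because neither string has a fourth part.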
Can it be dynamic? Not that easily, I think, because you want every value in its own column. If you just want to split the source value into separate rows, that is easy, and regular expressions handle it nicely.
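Here is a minimal sketch of that row-based split. It assumes Oracle 12c or later (for CROSS APPLY) and that every part ends with a dot. The `id` column in the CTE is added purely to tell the source rows apart, and `part_no` is an illustrative name:

```sql
WITH test (id, col) AS (
  SELECT 1, 'van dam is brother of Prince Charles(12345). Mathew Perker is son of Prince Charles(12345).' FROM DUAL
  UNION ALL
  SELECT 2, 'Madam Currie is grandmother of Albert Eistine(56789). Pieer Currie is grandfather of Albert Eistine(56789). CV Raman is friend of Albert Eistine(56789).' FROM DUAL
)
SELECT t.id,
       p.part_no,
       -- [^.]+\. = a run of non-dot characters followed by a dot;
       -- TRIM drops the space that separates consecutive parts
       TRIM(REGEXP_SUBSTR(t.col, '[^.]+\.', 1, p.part_no)) AS part
  FROM test t
 CROSS APPLY (
         -- one generated row per dot-terminated part in this string
         SELECT LEVEL AS part_no
           FROM DUAL
        CONNECT BY LEVEL <= REGEXP_COUNT(t.col, '[^.]+\.')
       ) p
 ORDER BY t.id, p.part_no;
```

The number of output rows per source row follows REGEXP_COUNT, so no query change is needed when a string has more parts.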
You can use:
SELECT REGEXP_SUBSTR(
column_name,
'(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*',
1,
1
) AS relationship1,
REGEXP_SUBSTR(
column_name,
'(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*',
1,
2
) AS relationship2,
REGEXP_SUBSTR(
column_name,
'(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*',
1,
3
) AS relationship3,
REGEXP_SUBSTR(
column_name,
'(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*',
1,
4
) AS relationship4
FROM table_name
Which, for the sample data:
CREATE TABLE table_name(column_name) AS
SELECT 'van dam is brother of Prince Charles(12345). Mathew Perker is son of Prince Charles(12345).' FROM DUAL UNION ALL
SELECT 'Madam Currie is grandmother of Albert Eistine(56789). Pieer Currie is grandfather of Albert Eistine(56789). CV Raman is friend of Albert Eistine(56789).' FROM DUAL;
Outputs:
RELATIONSHIP1 | RELATIONSHIP2 | RELATIONSHIP3 | RELATIONSHIP4 |
---|---|---|---|
van dam is brother of Prince Charles(12345). | Mathew Perker is son of Prince Charles(12345). | null | null |
Madam Currie is grandmother of Albert Eistine(56789). | Pieer Currie is grandfather of Albert Eistine(56789). | CV Raman is friend of Albert Eistine(56789). | null |
If you want a more detailed breakdown, you can extract the sub-groups from the expression:
SELECT REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, 1, NULL, 1) AS from1,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, 1, NULL, 2) AS relationship1,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, 1, NULL, 3) AS to1,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, 1, NULL, 4) AS id1,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, 2, NULL, 1) AS from2,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, 2, NULL, 2) AS relationship2,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, 2, NULL, 3) AS to2,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, 2, NULL, 4) AS id2,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, 3, NULL, 1) AS from3,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, 3, NULL, 2) AS relationship3,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, 3, NULL, 3) AS to3,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, 3, NULL, 4) AS id3,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, 4, NULL, 1) AS from4,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, 4, NULL, 2) AS relationship4,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, 4, NULL, 3) AS to4,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, 4, NULL, 4) AS id4
FROM table_name
Which outputs:
FROM1 | RELATIONSHIP1 | TO1 | ID1 | FROM2 | RELATIONSHIP2 | TO2 | ID2 | FROM3 | RELATIONSHIP3 | TO3 | ID3 | FROM4 | RELATIONSHIP4 | TO4 | ID4 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
van dam | brother | Prince Charles | 12345 | Mathew Perker | son | Prince Charles | 12345 | null | null | null | null | null | null | null | null |
Madam Currie | grandmother | Albert Eistine | 56789 | Pieer Currie | grandfather | Albert Eistine | 56789 | CV Raman | friend | Albert Eistine | 56789 | null | null | null | null |
If you want a dynamic number of matches, then output the data in rows, not columns:
SELECT item,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, item ) AS relationship
FROM table_name
CROSS APPLY (
SELECT LEVEL AS item
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT(column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*')
)
or:
SELECT item,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, item, NULL, 1) AS from_name,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, item, NULL, 2) AS relationship,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, item, NULL, 3) AS to_name,
REGEXP_SUBSTR( column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*', 1, item, NULL, 4) AS id
FROM table_name
CROSS APPLY (
SELECT LEVEL AS item
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT(column_name, '(.*?) is (.*?) of (.*?)\((\d+)\)\.\s*')
)
The latter outputs:
ITEM | FROM_NAME | RELATIONSHIP | TO_NAME | ID |
---|---|---|---|---|
1 | van dam | brother | Prince Charles | 12345 |
2 | Mathew Perker | son | Prince Charles | 12345 |
1 | Madam Currie | grandmother | Albert Eistine | 56789 |
2 | Pieer Currie | grandfather | Albert Eistine | 56789 |
3 | CV Raman | friend | Albert Eistine | 56789 |
Here's another way of thinking about it: use CONNECT BY to traverse the string. The assumption is that the substrings you want are always separated by a period followed by a space. A Common Table Expression (CTE) sets up the test data. This handles a variable number of substrings. Because the terminating period is matched by the second group and only the first group is returned, the period is appended again in the select. This may cause issues if you have a null row, as it will return just the period.
with tbl(id, data) as (
select 1, 'van dam is brother of Prince Charles(12345). Mathew Perker is son of Prince Charles(12345).'
from dual union all
select 2, 'Madam Currie is grandmother of Albert Eistine(56789). Pieer Currie is grandfather of Albert Eistine(56789). CV Raman is friend of Albert Eistine(56789).'
from dual
)
select id,
regexp_substr(data, '(.*?)(\. |\.$)', 1, level, NULL, 1) || '.' substring
from tbl
connect by level <= regexp_count(data, '\. ')+1
and prior id = id
and prior sys_guid() is not null;
ID SUBSTRING
-- ------------------------------------------------------------
1 van dam is brother of Prince Charles(12345).
1 Mathew Perker is son of Prince Charles(12345).
2 Madam Currie is grandmother of Albert Eistine(56789).
2 Pieer Currie is grandfather of Albert Eistine(56789).
2 CV Raman is friend of Albert Eistine(56789).
5 rows selected.
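One possible way around the lone period for a null row, sketched here as an untested variant rather than a tested fix: return the whole match including its trailing period, then strip the separating space. Nothing is appended, so a null input stays null. The row with `id` 3 is hypothetical and only demonstrates the null case:

```sql
with tbl(id, data) as (
  select 1, 'van dam is brother of Prince Charles(12345). Mathew Perker is son of Prince Charles(12345).'
  from dual union all
  select 2, 'Madam Currie is grandmother of Albert Eistine(56789). Pieer Currie is grandfather of Albert Eistine(56789). CV Raman is friend of Albert Eistine(56789).'
  from dual union all
  select 3, null from dual   -- hypothetical null row
)
select id,
       -- match lazily up to a period followed by a space or the end of the string;
       -- rtrim removes the separating space, and a null input yields null
       rtrim(regexp_substr(data, '.*?\.( |$)', 1, level)) substring
  from tbl
connect by level <= regexp_count(data, '\. ') + 1
       and prior id = id
       and prior sys_guid() is not null;
```

For a null `data`, REGEXP_COUNT returns null, so the CONNECT BY condition never adds child levels and the row appears once with a null substring.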
I thought about this further: if you move the select into its own CTE, you can select from that to get more detail if you need it.
with tbl(id, data) as (
select 1, 'van dam is brother of Prince Charles(12345). Mathew Perker is son of Prince Charles(12345).'
from dual union all
select 2, 'Madam Currie is grandmother of Albert Eistine(56789). Pieer Currie is grandfather of Albert Eistine(56789). CV Raman is friend of Albert Eistine(56789).'
from dual
),
tbl_substrings(id, sub_id, substring) as (
select id, level as sub_id,
regexp_substr(data, '(.*?)(\. |\.$)', 1, level, NULL, 1) || '.' substring
from tbl
connect by level <= regexp_count(data, '\. ')+1
and prior id = id
and prior sys_guid() is not null
)
-- Uncomment detail below if needed.
select id, sub_id, substring
--, regexp_replace(substring, '(.*?) is .*$', '\1') rel_person
--, regexp_replace(substring, '.* is (.*?) of .*$', '\1') relation
, regexp_replace(substring, '.* of (.*?)\(.*$', '\1') orig_person
, regexp_replace(substring, '.*\((.*?)\).*$', '\1') orig_person_id
from tbl_substrings
order by id, sub_id;
ID SUB_ID SUBSTRING                                               ORIG_PERSON     ORIG_PERSON_ID
-- ------ ------------------------------------------------------- --------------- --------------
 1      1 van dam is brother of Prince Charles(12345).            Prince Charles  12345
 1      2 Mathew Perker is son of Prince Charles(12345).          Prince Charles  12345
 2      1 Madam Currie is grandmother of Albert Eistine(56789).   Albert Eistine  56789
 2      2 Pieer Currie is grandfather of Albert Eistine(56789).   Albert Eistine  56789
 2      3 CV Raman is friend of Albert Eistine(56789).            Albert Eistine  56789
5 rows selected.
Try this:
WITH data AS (
SELECT 'van dam is brother of Prince Charles(12345). Mathew Perker is son of Prince Charles(12345).' AS row1,
'Madam Currie is grandmother of Albert Eistine(56789). Pieer Currie is grandfather of Albert Eistine(56789). CV Raman is friend of Albert Eistine(56789).' AS row2
FROM dual
)
SELECT
TRIM(REGEXP_SUBSTR(row_value, '[^\.]+(\(\d+\))\.', 1, level)) AS split_string -- TRIM drops the space that precedes every part after the first
FROM (
SELECT row1 AS row_value FROM data
UNION ALL
SELECT row2 FROM data
) t
CONNECT BY REGEXP_SUBSTR(row_value, '[^\.]+(\(\d+\))\.', 1, level) IS NOT NULL
AND PRIOR row_value = row_value
AND PRIOR dbms_random.value IS NOT NULL;
Original title: sql - String separation with oracle regexp - Stack Overflow. Content contributed by users; the views expressed are solely the authors'. To republish, contact the author and cite the source: http://www.betaflare.com/web/1744716924a2621457.html