I have the following string stored in a TEXT column, and I want to extract the values for

Date:  
Queue:  
File Name: 

and return them in their own columns.

STRING:

If you are able to, please correct the issue and resubmit the file.        

Date: 10/8/2024  
Queue: ENTRY 
File Name: TEST_FILE.PDF

Columns:

Date           Queue          File Name
-------------------------------------------
10/8/2024      ENTRY          TEST_FILE.PDF

I have come up with the following code but have been unable to exclude additional information that comes back.

I get the following data returned:

Date                      Queue                  File Name
--------------------------------------------------------------
10/8/2024    Queue:       ENTRY    File Na       TEST_FILE.PDF

SELECT
    SUBSTRING(CAST(em.body AS NVARCHAR(300)), 
              CHARINDEX('Date:', CAST(em.body AS NVARCHAR(300))) + 6, 
              (CHARINDEX('Queue:', CAST(em.body AS NVARCHAR(300))) - CHARINDEX('Date:', CAST(em.body AS NVARCHAR(300))))) 'Date',
    SUBSTRING(CAST(em.body AS NVARCHAR(300)), 
              CHARINDEX('Queue:', CAST(em.body AS NVARCHAR(300))) + 7, 
              (CHARINDEX('File Name:', CAST(em.body AS NVARCHAR(300))) - CHARINDEX('Queue:', CAST(em.body AS NVARCHAR(300))))) 'Queue',
    RIGHT(CAST(em.body AS NVARCHAR(300)), (LEN(CAST(em.body AS NVARCHAR(300))) - 10) - CHARINDEX('File Name:', CAST(em.body AS NVARCHAR(300)))) 'File Name'
FROM
    email em WITH(NOLOCK)

I know I need to decrease the length value for the SUBSTRING calls, but no matter where I put in a value to decrease them, I get the following error:

Msg 537, Level 16, State 3, Line 2
Invalid length parameter passed to the LEFT or SUBSTRING function

Asked Jan 30 at 4:33 by OfficerSpock; edited Jan 30 at 4:45 by Dale K.
  • 4 I really recommend fixing your design here. These values should be in their own strongly typed column, not a text column (which has been deprecated for 20 years). – Thom A Commented Jan 30 at 8:53
  • While asking a question, you need to provide a minimal reproducible example: (1) DDL and sample data population, i.e. CREATE table(s) plus INSERT T-SQL statements. (2) What you need to do, i.e. logic and your code attempt implementation of it in T-SQL. (3) Desired output, based on the sample data in the #1 above. (4) Your SQL Server version (SELECT @@version;). – Yitzhak Khabinsky Commented Jan 30 at 13:01
  • Duly noted for any future posts! Can't do much about the TEXT datatype, it's in a 3rd party database. – OfficerSpock Commented Jan 30 at 19:29
3 Answers

Update for @DaleK observations.

This should parse one or more entries within a text block, even when the block contains extra text. Note that each char(10) (line feed) is replaced with a space so that every token is space-delimited.

Example:

Declare @YourTable Table (id int,[SomeCol] varchar(max))
Insert Into @YourTable Values 
 (1,'This is a long sentence.
  
  Date: 10/8/2024 
  Queue: ENTRY 
  File Name: TEST_FILE.PDF 

  Date: 11/10/2024 
  Queue: ENTRY 
  File Name: SomeOFileName.PDF

  '),
 (2,'
  Date: 11/9/2024 
  Queue: ENTRY 
  File Name: OtherFileName.PDF
 ');


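with cte as (
Select ID
      ,B.* 
      -- LV: the token that follows the current one (i.e. the value after a label)
      ,LV = lead(value,1) over (partition by ID order by try_convert(int,[key]))
      -- Grp: running count of 'Date:' tokens, so each Date/Queue/File Name set gets its own group
      ,Grp= sum(case when value='Date:' then 1 else 0 end) over (partition by ID order by try_convert(int,[key]))
 From @YourTable A
 -- Split the text on spaces by turning it into a JSON array of tokens
 Cross Apply OpenJSON ('["'+replace(string_escape(replace([SomeCol],char(10),' '),'json'),' ','","')+'"]')    B
)
Select ID
      ,Date  = max( case when value='Date:'  then LV end )
      ,Queue = max( case when value='Queue:' then LV end )
      -- 'File Name:' is split into 'File' and 'Name:', so match on 'Name:'
      ,FName = max( case when value='Name:'  then LV end )
 From  cte
 Where Grp>0
 Group By ID,Grp
 Order By ID,Grp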

Results

ID  Date        Queue   FName
1   10/8/2024   ENTRY   TEST_FILE.PDF
1   11/10/2024  ENTRY   SomeOFileName.PDF
2   11/9/2024   ENTRY   OtherFileName.PDF
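
To see why the query above matches on 'Name:' and reads each value with lead(), it can help to look at the tokens the OpenJSON split produces. The following is a minimal sketch, not part of the original answer: `@s` is a made-up sample value, and it assumes SQL Server 2016 or later (database compatibility level 130+), which OPENJSON and STRING_ESCAPE require.

```sql
-- Illustrative only: @s is a hypothetical sample, not data from the question.
DECLARE @s varchar(max) = 'Date: 10/8/2024' + char(10) + 'Queue: ENTRY';

-- Same expression as in the answer: line feeds become spaces, the text is
-- JSON-escaped, and every space becomes an array-element separator.
SELECT [key], value
FROM OpenJSON('["' + replace(string_escape(replace(@s, char(10), ' '), 'json'), ' ', '","') + '"]')
ORDER BY try_convert(int, [key]);
-- key 0 = 'Date:', 1 = '10/8/2024', 2 = 'Queue:', 3 = 'ENTRY'
```

Because the split is on spaces, a value that itself contains a space (for example a file name such as `MY FILE.PDF`) would be broken into several tokens, and only the first would be returned. Carriage returns (char(13)) are not replaced by this expression, so text with CRLF line endings may need an extra replace().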

If you find yourself with no choice but to do messy string extraction, the trick is to build up your logic methodically, testing each piece as you go.

Personally I like to use a DRY approach, even though it's not typical for SQL, because it reduces the chance of the mistakes that creep in when you repeat logic. This can be done with the CROSS APPLY operator.

This is far from the most concise solution, but it is easier (IMO) to build and maintain.

You can see that all I define are the three label strings; everything else is derived from them.

CREATE TABLE Email (Body TEXT);
INSERT INTO Email (Body)
VALUES
('If you are able to, please correct the issue and resubmit the file.        

Date: 10/8/2024  
Queue: ENTRY 
File Name: TEST_FILE.PDF');

SELECT
  -- Extract the string segments we require
  SUBSTRING(em.body, c3.DateEndIdx, c3.QueueStartIdx - c3.DateEndIdx) [Date]
  , SUBSTRING(em.body, c3.QueueEndIdx, c3.FileNameStartIdx - c3.QueueEndIdx) Queue
  , SUBSTRING(em.body, c3.FileNameEndIdx, c3.EndOfText - c3.FileNameEndIdx) FileName
FROM (
  -- Convert to VARCHAR in order to use all string functions
  SELECT CONVERT(VARCHAR(MAX), Body) Body
  FROM Email
) em
-- Capture the strings we are trying to find
CROSS APPLY (
  VALUES (
    'Date:'
    , 'Queue:'
    , 'File Name:'
  )
) c1 (DateLabel, QueueLabel, FileNameLabel)
-- Find where each label starts (CHARINDEX returns 0 if a label is not found)
CROSS APPLY (
  VALUES (
    CHARINDEX(c1.DateLabel, em.body)
    , CHARINDEX(c1.QueueLabel, em.body)
    , CHARINDEX(c1.FileNameLabel, em.body)
  )
) c2 (DateIdx, QueueIdx, FileNameIdx)
-- Derive the start and end of each label; a value runs from one label's end to the next label's start
CROSS APPLY (
  VALUES (
    c2.DateIdx
    , c2.DateIdx + LEN(c1.DateLabel)
    , c2.QueueIdx
    , c2.QueueIdx + LEN(c1.QueueLabel)
    , c2.FileNameIdx
    , c2.FileNameIdx + LEN(c1.FileNameLabel)
    , LEN(em.body) + 1
  )
) c3 (DateStartIdx, DateEndIdx, QueueStartIdx, QueueEndIdx, FileNameStartIdx, FileNameEndIdx, EndOfText);

Date        Queue   FileName
10/8/2024   ENTRY   TEST_FILE.PDF
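
Each value extracted this way spans everything between one label and the next, so it still carries the leading space and any trailing spaces or line breaks from the source text. The sketch below shows one way to tidy that up and to avoid Msg 537 when a label is missing. It is an illustration under stated assumptions rather than part of the original answer: `@body`, the alias `p`, and its column names are hypothetical, and `@body` stands in for a single `CONVERT(VARCHAR(MAX), Body)` value.

```sql
-- Sketch only: @body is a hypothetical stand-in for one converted Body value.
DECLARE @body varchar(max) =
    'Please resubmit.' + char(13) + char(10)
  + 'Date: 10/8/2024  ' + char(13) + char(10)
  + 'Queue: ENTRY ' + char(13) + char(10)
  + 'File Name: TEST_FILE.PDF';

SELECT
    -- Turn CR/LF into spaces, then trim, so only the value itself remains
    LTRIM(RTRIM(REPLACE(REPLACE(SUBSTRING(@body, p.DateEnd, p.QueueStart - p.DateEnd), char(13), ' '), char(10), ' '))) AS [Date]
  , LTRIM(RTRIM(REPLACE(REPLACE(SUBSTRING(@body, p.QueueEnd, p.FileStart - p.QueueEnd), char(13), ' '), char(10), ' '))) AS Queue
  , LTRIM(RTRIM(REPLACE(REPLACE(SUBSTRING(@body, p.FileEnd, LEN(@body) + 1 - p.FileEnd), char(13), ' '), char(10), ' '))) AS FileName
FROM (
    VALUES (
        -- NULLIF turns CHARINDEX's 0 ("not found") into NULL, so a missing
        -- label yields NULL instead of a negative length
        NULLIF(CHARINDEX('Date:', @body), 0) + LEN('Date:')
      , NULLIF(CHARINDEX('Queue:', @body), 0)
      , NULLIF(CHARINDEX('Queue:', @body), 0) + LEN('Queue:')
      , NULLIF(CHARINDEX('File Name:', @body), 0)
      , NULLIF(CHARINDEX('File Name:', @body), 0) + LEN('File Name:')
    )
) p (DateEnd, QueueStart, QueueEnd, FileStart, FileEnd);
```

SUBSTRING returns NULL when its start or length is NULL, which is what makes the NULLIF guard work. It does not help if the labels appear in a different order than expected; in that case the computed length can still be negative.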

You can parse the content with SUBSTRING and CHARINDEX. Apply LTRIM / RTRIM as needed.

DECLARE @data NVARCHAR(100) = N'Date: 10/8/2024 Queue: ENTRY File Name: TEST_FILE.PDF';

SELECT 
    -- Start after the label and its trailing space; RTRIM drops the space left before the next label
    RTRIM(SUBSTRING(@data, CHARINDEX('Date: ', @data) + 6, CHARINDEX('Queue:', @data) - CHARINDEX('Date: ', @data) - 6)) AS [Date],
    RTRIM(SUBSTRING(@data, CHARINDEX('Queue: ', @data) + 7, CHARINDEX('File Name:', @data) - CHARINDEX('Queue:', @data) - 7)) AS [Queue],
    -- 'File Name: ' is 11 characters long, so skip 11 (not 10) to avoid a leading space
    SUBSTRING(@data, CHARINDEX('File Name: ', @data) + 11, LEN(@data)) AS [File_Name];
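
The key difference from the query in the question is the trailing `- 6` and `- 7`. The question's query moves the start forward past the label but still uses the full distance between the two labels as the length, so each value runs on into the next label. A small side-by-side sketch, using hypothetical variables `@d`, `@dateIdx` and `@queueIdx`:

```sql
-- Illustrative only: the variable names are made up for this comparison.
DECLARE @d nvarchar(100) = N'Date: 10/8/2024 Queue: ENTRY File Name: TEST_FILE.PDF';
DECLARE @dateIdx int = CHARINDEX('Date: ', @d), @queueIdx int = CHARINDEX('Queue:', @d);

SELECT
    -- Start is shifted by 6 but the length is not, so 6 extra characters come back
    SUBSTRING(@d, @dateIdx + 6, @queueIdx - @dateIdx)     AS LengthTooLong,
    -- Subtracting the same 6 from the length stops just before 'Queue:'
    SUBSTRING(@d, @dateIdx + 6, @queueIdx - @dateIdx - 6) AS LengthCorrected;
-- LengthTooLong = '10/8/2024 Queue:', LengthCorrected = '10/8/2024 '
```

A likely reason the corrected length then triggers Msg 537 on real data is that some rows do not contain a label at all, or contain it beyond the first 300 characters kept by the `NVARCHAR(300)` cast. CHARINDEX returns 0 for those rows, and subtracting the offset makes the length negative.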

Tags: sql server - Parse TEXT datatype column to grab values based on TAGS - Stack Overflow