I'm working on a Python script that extracts emails, dates, and phone numbers from a text file and saves them to a table or CSV file, without using regex. The extraction itself works, but the extracted values don't line up correctly in the table.

Minimal Reproducible Example:

Input File (text):

Hello, you can reach out to us at [email protected] or [email protected].  
Our customer service is available 24/7. Call us at (123) 456-7890 or 987-654-3210.  
Important Dates:  
- Application Deadline: 12-08-2024  
- Event Date: 2024/11/20

Issue: The emails and dates are extracted correctly, but the phone numbers end up in the wrong rows. The first phone number, (123) 456-7890, is missing from the table, so 987-654-3210 lands in the first row and the second row has a date but no phone number.

What I've Tried: I wrote functions to extract emails, dates, and phone numbers using basic string methods. I also tried aligning the extracted data, but some phone numbers are not properly assigned to the correct rows.

Code Snippet:

def extract_phone_numbers(text):  
    phone_numbers = []  
    for word in text.split():  
        clean_word = word.strip(",.()")  
        if clean_word.isdigit() and len(clean_word) in [10, 11]:  
            phone_numbers.append(clean_word)  
        elif "-" in clean_word:  
            parts = clean_word.split("-")  
            if all(part.isdigit() for part in parts) and len(clean_word.replace("-", "")) in [10, 11]:  
                phone_numbers.append(clean_word)  
        elif clean_word.startswith("(") and ")" in clean_word:  
            phone_numbers.append(clean_word)  
    return phone_numbers  
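
For context, here is a small check of what the tokenizing and cleaning steps in the snippet above produce for the phone-number sentence. It is only a diagnostic sketch. The variable names `sample` and `cleaned` are made up for illustration and are not part of the original script.

```python
# Shortened version of the sentence from the input file
sample = "Call us at (123) 456-7890 or 987-654-3210."

# Same steps as in extract_phone_numbers: split on whitespace, then strip ",.()"
cleaned = [word.strip(",.()") for word in sample.split()]
print(cleaned)

# The parenthesized area code and the rest of the number are separate tokens,
# and neither one has 10 or 11 digits on its own.
digit_counts = [sum(ch.isdigit() for ch in word) for word in cleaned]
print(digit_counts)
```

Because `split()` breaks on the space after `(123)`, and `strip` removes the parentheses before the `startswith("(")` branch is checked, that branch can never match a number written in this format.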

Current Output:

+---------------------+------------+-----------------+  
|   Email Address     |    Date    |  Phone Number   |  
+---------------------+------------+-----------------+  
| [email protected]  | 12-08-2024 | 987-654-3210    |  
| [email protected]  | 2024/11/20 |                 |  
+---------------------+------------+-----------------+  

Expected Output:

+---------------------+------------+-----------------+  
|   Email Address     |    Date    |  Phone Number   |  
+---------------------+------------+-----------------+  
| [email protected]  | 12-08-2024 | (123) 456-7890  |  
| [email protected]  | 2024/11/20 | 987-654-3210    |  
+---------------------+------------+-----------------+  
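
One possible way to get the expected rows, still without regex, is to rejoin a parenthesized area code with the token that follows it before validating. The sketch below is an assumption about what the script could look like, not the original code. The helper `is_phone`, the `PUNCTUATION` constant, and the hard-coded `dates` list are hypothetical, and `itertools.zip_longest` is used here only to show how the columns can be paired into rows.

```python
from itertools import zip_longest

PUNCTUATION = ",.;:"  # assumed set of trailing punctuation to ignore


def is_phone(candidate):
    """Return True if candidate has 10 or 11 digits and only phone separators."""
    digits = [ch for ch in candidate if ch.isdigit()]
    others = [ch for ch in candidate if not ch.isdigit()]
    return len(digits) in (10, 11) and all(ch in "()- " for ch in others)


def extract_phone_numbers(text):
    tokens = text.split()
    phone_numbers = []
    i = 0
    while i < len(tokens):
        token = tokens[i].strip(PUNCTUATION)
        # "(123)" and "456-7890" arrive as two tokens, so try them together
        if token.startswith("(") and token.endswith(")") and i + 1 < len(tokens):
            candidate = token + " " + tokens[i + 1].strip(PUNCTUATION)
            if is_phone(candidate):
                phone_numbers.append(candidate)
                i += 2  # skip the token that was merged
                continue
        if is_phone(token):
            phone_numbers.append(token)
        i += 1
    return phone_numbers


text = "Call us at (123) 456-7890 or 987-654-3210."
phones = extract_phone_numbers(text)

# Dates are hard-coded here only to illustrate the row alignment step
dates = ["12-08-2024", "2024/11/20"]

# zip_longest pads the shorter column with "" instead of dropping rows
rows = list(zip_longest(dates, phones, fillvalue=""))
print(rows)
```

With this approach the dates `12-08-2024` and `2024/11/20` are still rejected as phone numbers, because each contains only 8 digits. Numbers written with spaces between every group, such as `123 456 7890`, would still be missed and would need a similar merge step.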
