admin管理员组

文章数量:1406949

I have been trying to write a tool to look at Word documents in a specified path for broken links.

I gave up on having it search a folder, thinking I just need to get it to do a document first. With my limited skills, reading, and trying some suggestions from copilot. (Breaking it down to individual tasks) I put this together:

import docx
import os
import url

doc = docx.Document('C:/Users/demo.docx')
allText = []

def find_hyperlinks(doc):
    hyperlinks = []
    rels = doc.part.rels
    for rel in rels:
        if "hyperlink" in rels[rel].target_ref:
            hyperlinks.append(rels[rel].target_ref)
    return hyperlinks

def find_broken_links_in_docx(doc):
    broken_links = []
    hyperlinks = find_hyperlinks(doc)
    for url in hyperlinks:
        try:
            response = requests.head(url, allow_redirects=True)
            if response.status_code >= 400:
                broken_links.append(url)
        except requests.RequestException:
            broken_links.append(url)
    return broken_links

def write_report(report, output_file):
    with open(output_file, 'w') as f:
        for file_path, links in report.items():
            f.write(f"File: {file_path}\n")
            for link in links:
                f.write(f"  Broken link: {link}\n")
            f.write("\n")

if __name__ == "__main__":
    
    output_file = "C:/Results/broken_links_report.txt"
    report = find_broken_links_in_docx(doc)
    write_report(report, output_file)
    print(f"Report written to {output_file}")

Here is the error:

 File "c:\Users\Scripts\playground\openinganddocx.py", line 41, in <module>
    write_report(report, output_file)
  File "c:\Users\Scripts\playground\openinganddocx.py", line 31, in write_report
    for file_path, links in report.items():
    AttributeError: 'list' object has no attribute 'items'

For reference:

  • Line 31 f.write(f"File: {file_path}\n")

  • Line 41 print(f"Report written to {output_file}")

I have been trying to write a tool to look at Word documents in a specified path for broken links.

I gave up on having it search a folder, thinking I just need to get it to do a document first. With my limited skills, reading, and trying some suggestions from copilot. (Breaking it down to individual tasks) I put this together:

import docx
import os
import url

doc = docx.Document('C:/Users/demo.docx')
allText = []

def find_hyperlinks(doc):
    hyperlinks = []
    rels = doc.part.rels
    for rel in rels:
        if "hyperlink" in rels[rel].target_ref:
            hyperlinks.append(rels[rel].target_ref)
    return hyperlinks

def find_broken_links_in_docx(doc):
    broken_links = []
    hyperlinks = find_hyperlinks(doc)
    for url in hyperlinks:
        try:
            response = requests.head(url, allow_redirects=True)
            if response.status_code >= 400:
                broken_links.append(url)
        except requests.RequestException:
            broken_links.append(url)
    return broken_links

def write_report(report, output_file):
    with open(output_file, 'w') as f:
        for file_path, links in report.items():
            f.write(f"File: {file_path}\n")
            for link in links:
                f.write(f"  Broken link: {link}\n")
            f.write("\n")

if __name__ == "__main__":
    
    output_file = "C:/Results/broken_links_report.txt"
    report = find_broken_links_in_docx(doc)
    write_report(report, output_file)
    print(f"Report written to {output_file}")

Here is the error:

 File "c:\Users\Scripts\playground\openinganddocx.py", line 41, in <module>
    write_report(report, output_file)
  File "c:\Users\Scripts\playground\openinganddocx.py", line 31, in write_report
    for file_path, links in report.items():
    AttributeError: 'list' object has no attribute 'items'

For reference:

  • Line 31 f.write(f"File: {file_path}\n")

  • Line 41 print(f"Report written to {output_file}")

Share edited Mar 6 at 22:27 Ryan M 20.3k34 gold badges73 silver badges82 bronze badges asked Mar 6 at 21:31 Travis WebbTravis Webb 32 bronze badges 4
  • Can you include the entire traceback? There's critical error information missing – C.Nivs Commented Mar 6 at 21:35
  • report is a list which does not support .items(). I'm assuming the error message says AttributeError: 'list' object has no attribute 'items' – C.Nivs Commented Mar 6 at 21:38
  • It does! Copy paste error on my part. I am not good at this at all. I am pretty sure you are right. I am 100% sure a list would be generated by the find broken hyperlinks to add to the output file. But I am not sure how to capture that. I tried searching and asking copilot, and I am at a loss. I am trying to punch way above my weight class here. – Travis Webb Commented Mar 6 at 21:45
  • We all start somewhere. It looks like what you really want is a dictionary of type {str: list}. Tackle the problem with that in mind. The keys to your dict are filepaths, the values are the urls per file. Don't worry about writing the file just yet, parse the links out and see if you can fill out your data structure correctly. I'll add an answer, but I won't solve the whole problem – C.Nivs Commented Mar 6 at 21:49
Add a comment  | 

1 Answer 1

Reset to default 0

The issue is you are treating a list (report) like a dict, which it is not. That's why you are getting an AttributeError: list has no attribute 'items'.

What you want is a dictionary that has the structure {'filepath': [<urls>]}. So start there:

def find_hyperlinks(doc_path: str):
    doc = docx.Document(doc_path)
    hyperlinks = []
    rels = doc.part.rels
    for rel in rels:
        if "hyperlink" in rels[rel].target_ref:
            hyperlinks.append(rels[rel].target_ref)

    # here is an example of how I might return that value
    return {doc_path: hyperlinks}

# from here, prune the hyperlinks that work
broken_links = {}

for doc_path, links in links_dict.items():
    broken = []
    for link in links:
        if link_works(link):
           continue

        broken.append(link)
    broken_links[doc_path] = broken

# etc

This isn't perfect, but will get you on the path to success

本文标签: pythonAttributeError 39list39 object has no attribute 39items39Stack Overflow