python - Handling complex nested data structures with recursion – performance issues with deep nesting - Stack Overflow

Updated: 2025-01-08

I’m working on a Python project where I need to process a nested data structure. The structure consists of lists and dictionaries, and the nesting level can vary from a few levels to potentially hundreds. I need to flatten this data structure into a single list while preserving the values. However, I am facing performance issues when dealing with deep nesting.

Here is the simplified data structure I’m working with:

data = {
    "name": "John",
    "contacts": [
        {
            "type": "email",
            "value": "[email protected]",
        },
        {
            "type": "phone",
            "value": [
                {
                    "country": "US",
                    "number": "123-456-7890"
                },
                {
                    "country": "UK",
                    "number": "987-654-3210"
                }
            ]
        }
    ],
    "address": {
        "city": "New York",
        "postal_code": "10001",
        "coordinates": [
            {
                "lat": 40.7128,
                "lon": -74.0060
            }
        ]
    }
}

I need to create a function that will flatten this structure such that all values are extracted into a single list. The output for the above input would look something like:

["John", "email", "[email protected]", "phone", "123-456-7890", "US", "987-654-3210", "UK", "New York", "10001", 40.7128, -74.0060]

I’ve tried using recursion, but I’m running into issues with handling very deep structures. Here is my initial attempt:

def flatten(data):
    flat_list = []
    
    if isinstance(data, dict):
        for key, value in data.items():
            flat_list.extend(flatten(value))
    elif isinstance(data, list):
        for item in data:
            flat_list.extend(flatten(item))
    else:
        flat_list.append(data)
    
    return flat_list

flattened_data = flatten(data)
print(flattened_data)

This works fine for small and medium-sized structures, but when the nesting gets deeper (hundreds of levels deep), I run into recursion depth issues and performance bottlenecks.

What I’ve Tried:

Increasing the recursion limit with sys.setrecursionlimit(), but it helps only marginally and doesn’t address the performance concerns.
Optimizing the recursive function by converting it to an iterative approach, but I’m unsure how to manage the recursion manually for deeply nested structures.

Questions:

How can I improve the recursion or refactor this code to handle much deeper structures efficiently?
Is there an iterative way to flatten this data structure without running into recursion depth limitations?
Are there any known libraries or patterns that can handle very deep and complex data structures like this more efficiently?

The structure is dynamic and may not always follow the same pattern (dictionaries may not always contain the same keys, lists may not always contain the same types of data), so the function should be as generic as possible.

Asked Jan 4 at 17:30 by ahmad; edited yesterday by John Kugelman

What is the deepest level of nesting you have? Are you sure your input does not have circular references? – trincot, commented Jan 4 at 18:28
How do you get that recursive structure in the first place? If you read it from a sequential file, it would probably be simpler and less resource consuming to directly read it into the final flat list. – Serge Ballesta, commented Jan 4 at 18:42
Is your data really nested more than hundreds of levels deep? That seems unlikely. – Jeremy Banks, commented yesterday
@Ahmad, how come you edit your question and shout it is not duplicate, but do not answer comments that were made 2 days ago? If you don't react to comments, nothing good will happen with your question. – trincot, commented yesterday
And why do you shout "THERE ARE NO ANSWERS", when there is an answer waiting for you since yesterday? Why is it not an answer to you? – trincot, commented yesterday

2 Answers

First of all, if you have nested lists with mostly 2 entries, and dicts with mostly 2 keys, and your nesting has an average depth of about one hundred levels deep, you have a number of values in the order of 2¹⁰⁰, i.e. ~10³⁰. Even if we only count 4 bytes per collected (leaf) value, that represents more volume than today's computers can store, and even more than the whole internet holds at the time of writing.
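The estimate is easy to check with Python's arbitrary-precision integers. This is only a sketch of the arithmetic, assuming a perfectly balanced structure; the names `branching`, `depth`, and `bytes_per_leaf` are illustrative and not from the question.

```python
branching = 2        # assumed: every container holds two children
depth = 100          # assumed: uniform nesting depth
bytes_per_leaf = 4   # the deliberately low per-value cost used above

leaves = branching ** depth            # number of leaf values
total_bytes = leaves * bytes_per_leaf  # lower bound on storage needed

print(f"{leaves:.3e} leaf values")  # → 1.268e+30 leaf values
print(f"{total_bytes:.3e} bytes")   # → 5.071e+30 bytes
```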

Either your nesting is not really that deep, but your program suffers from infinite recursion because the data has cyclic references, or your hierarchy is really narrow where the number of long root-to-leaf paths is not that huge.
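One way to rule out the cyclic-reference case is to check the data before flattening it. The following is a minimal sketch, not part of the original answer: `has_cycle` is a hypothetical helper that tracks the `id()` of every container on the current root-to-node path and reports a cycle when a container is reached again from inside itself. It uses an explicit stack, so it is not limited by the recursion depth, although it revisits containers that are shared (but not cyclic), which can be slow on heavily shared data.

```python
def has_cycle(data):
    """Return True if a list/dict structure contains itself somewhere below."""
    on_path = set()          # ids of containers on the current root-to-node path
    stack = [(data, False)]  # (node, leaving) pairs; leaving=True marks the exit of a node
    while stack:
        node, leaving = stack.pop()
        if not isinstance(node, (list, dict)):
            continue                   # leaf values cannot form cycles
        if leaving:
            on_path.discard(id(node))  # all descendants of this container are done
            continue
        if id(node) in on_path:
            return True                # reached a container from inside itself
        on_path.add(id(node))
        stack.append((node, True))     # exit marker, processed after all children
        children = node.values() if isinstance(node, dict) else node
        stack.extend((child, False) for child in children)
    return False


shared = [1]
print(has_cycle([shared, shared]))  # → False (shared, but not cyclic)

loop = {"items": []}
loop["items"].append(loop)
print(has_cycle(loop))              # → True
```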

You could avoid some allocation by using generators instead of collecting all values in a list.
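As a rough illustration of the generator idea, here is the question's recursive function rewritten with `yield from`. The name `iter_flatten` and the `sample` data are made up for this sketch. It avoids building and copying intermediate lists, but it still recurses once per nesting level, so it remains subject to the recursion limit.

```python
def iter_flatten(data):
    """Yield leaf values lazily instead of collecting them in lists."""
    if isinstance(data, dict):
        for value in data.values():
            yield from iter_flatten(value)
    elif isinstance(data, list):
        for item in data:
            yield from iter_flatten(item)
    else:
        yield data


sample = {"name": "John", "tags": ["a", {"n": 1}]}
print(list(iter_flatten(sample)))  # → ['John', 'a', 1]
```

Callers that only need to loop over the values can iterate over `iter_flatten(sample)` directly and never materialize the full list.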

If the size of the call stack is still a problem, even when you have ensured your data does not have cyclic references, then you can always go for an iterative version:

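# Helper: return an iterator over a container's children, or None for a leaf value.
def get_iterator(data):
    if isinstance(data, list):
        return iter(data)
    if isinstance(data, dict):
        return iter(data.values())
    return None

def flatten(data):
    # Each stack entry is a partially consumed iterator, one per nesting level.
    stack = [iter([data])]
    while stack:
        try:
            value = next(stack[-1])
        except StopIteration:
            stack.pop()  # this level is exhausted; resume its parent
            continue
        iterator = get_iterator(value)
        if iterator is not None:
            stack.append(iterator)  # descend into the nested container
        else:
            yield value  # leaf value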
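To see the difference on a genuinely deep input, the sketch below builds a hypothetical list nested 10,000 levels deep and runs both styles on it. `flatten_recursive` restates the question's function, and `flatten_iter` is a compact restatement of the explicit-stack idea, renamed so the block is self-contained. It assumes Python's default recursion limit (1000) is in effect.

```python
def flatten_recursive(data):
    """The question's approach: one Python call per nesting level."""
    flat_list = []
    if isinstance(data, dict):
        for value in data.values():
            flat_list.extend(flatten_recursive(value))
    elif isinstance(data, list):
        for item in data:
            flat_list.extend(flatten_recursive(item))
    else:
        flat_list.append(data)
    return flat_list


def flatten_iter(data):
    """Explicit stack of iterators: no Python recursion at all."""
    stack = [iter([data])]
    while stack:
        for value in stack[-1]:
            if isinstance(value, dict):
                stack.append(iter(value.values()))
                break                 # descend; the parent iterator keeps its position
            if isinstance(value, list):
                stack.append(iter(value))
                break
            yield value
        else:
            stack.pop()               # iterator exhausted without descending


# Build a structure nested 10,000 levels deep: [[[... "leaf" ...]]]
deep = "leaf"
for _ in range(10_000):
    deep = [deep]

try:
    flatten_recursive(deep)
except RecursionError:
    print("recursive version exceeded the recursion limit")

print(list(flatten_iter(deep)))  # → ['leaf']
```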

A problem you're running into is that you're repeatedly making recursive calls and copying the results of those recursive calls. You can avoid all that copying:

def flatten(data):
    flat_list = []

    def inner(data):
        if isinstance(data, dict):
            for value in data.values():
                inner(value)
        elif isinstance(data, list):
            for item in data:
                inner(item)
        else:
            flat_list.append(data)
            
    inner(data)
    return flat_list

As you build up the results, they are added once to the outer list.
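Note that `inner` still calls itself once per nesting level, so very deep inputs can still raise `RecursionError`. One possible way to keep the single shared result list while dropping the recursion is an explicit stack of pending nodes. This is a sketch that combines the two ideas, not code from either answer, and `flatten_to_list` is a name chosen for illustration. Children are pushed in reverse so that values come out in their original order.

```python
def flatten_to_list(data):
    """Flatten nested lists/dicts into one list without recursion or copying."""
    flat_list = []
    stack = [data]  # nodes still waiting to be processed
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            # Reverse so the first value ends up on top of the stack.
            stack.extend(reversed(list(node.values())))
        elif isinstance(node, list):
            stack.extend(reversed(node))
        else:
            flat_list.append(node)  # each leaf is appended exactly once
    return flat_list


print(flatten_to_list({"a": 1, "b": [2, {"c": 3}], "d": 4}))  # → [1, 2, 3, 4]
```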
