Description
I am trying to make paginated requests to a public API.
Each request is limited to 100 records, so looped paginated requests are required to pull all records.
The API contains some erroneous records that cause a request to fail if any of them falls within the requested page.
I have written a loop that identifies and skips the erroneous records while keeping the batch limit as large as possible. However, my approach of halving the batch limit feels a bit simplistic. Is there a more efficient approach?

Current Approach

  1. Set the initial parameters for the API request, including limit and offset.
  2. Create a loop that continues until all records are fetched.
  3. In each iteration, make a request to the API with the current limit and offset.
  4. If the request is successful (status code 200), process the data and extend a results list.
  5. If a 500 error occurs, the batch contains an erroneous record; halve the current batch limit until a request succeeds.
  6. If the limit has been reduced to 1 and a 500 error is still received, the erroneous record has been identified. Increment the offset by 1 to skip it and reset the batch limit to the maximum.
  7. Continue until all records are fetched.
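
As a rough illustration of the cost of steps 5 and 6, the sketch below lists the limits that would be tried before a skip, assuming the erroneous record sits exactly at the current offset (so every narrower batch still contains it). `halving_sequence` is a hypothetical helper introduced only for this illustration; it is not part of the code below.

```python
def halving_sequence(max_limit=100):
    """Limits tried, in order, when the record at the current offset is bad."""
    limits = [max_limit]
    while limits[-1] > 1:
        # Same integer halving as the loop in the question.
        limits.append(limits[-1] // 2)
    return limits


print(halving_sequence())  # [100, 50, 25, 12, 6, 3, 1]
```

Under that assumption, each erroneous record costs seven failed requests before it is skipped.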

Code

import requests

# set initial parameters
url = ""  # endpoint URL is missing from the post as extracted
headers = {'accept': 'application/json'}
params = {
    'limit': 100,
    'offset': 0,
    'timescales': 'LONG_TERM',
    'statuses': '',
    'productTypes': 'LT_EXPLICIT_ANNUAL, LT_EXPLICIT_SEASONAL, LT_EXPLICIT_QUARTERLY, LT_EXPLICIT_MONTHLY',
    'sortBy': 'BIDDING_PERIOD_START_ASC'
}

# empty list to store data
all_data = []
# initial request to get the total record count
total_records = requests.get(url=url, headers=headers, params=params).json()['totalCount']

# loop to paginate and get all records
while params['offset'] < total_records:
    
    response = requests.get(url=url, headers=headers, params=params)
    
    # successful request
    if response.status_code == 200:
        print(f"Success: {params['offset']} to {params['offset'] + params['limit']}")
        data = response.json()
        all_data.extend(data['entries'])
        params['offset'] += params['limit']  # Move to the next set of records
        
    # failed request
    elif response.status_code == 500:
        print(f"Fail: {params['offset']} to {params['offset'] + params['limit']}")
        if params['limit'] > 1:
            params['limit'] = params['limit'] // 2  # Halve the limit to narrow down the search
        else:
            # If limit is 1 and we get a 500 error, skip the problematic record
            params['offset'] += 1  # Increment offset to skip the problematic record
            params['limit'] = 100  # Reset limit back to 100 for the next batch

    # any other status would otherwise repeat the same request forever
    else:
        raise RuntimeError(f"Unexpected status code: {response.status_code}")
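
To measure how many requests the loop makes without calling the real API, the same halve-and-skip logic can be run against an in-memory stand-in. This is only a sketch under stated assumptions: `fetch_all`, `make_stub`, `fetch_page` and the bad offsets `{0, 137}` are hypothetical names and values chosen for illustration, and the stub returns `None` where the real API would answer with a 500.

```python
def fetch_all(fetch_page, total_records, max_limit=100):
    """Halve-and-skip loop from the question, with the HTTP call injected.

    fetch_page(offset, limit) is a hypothetical callable that returns a list
    of entries on success and None where the real API would answer 500.
    """
    results, skipped = [], []
    offset, limit = 0, max_limit
    while offset < total_records:
        page = fetch_page(offset, limit)
        if page is not None:
            results.extend(page)
            offset += limit  # move to the next set of records
        elif limit > 1:
            limit //= 2  # narrow the batch around the bad record
        else:
            skipped.append(offset)  # a single-record request failed
            offset += 1
            limit = max_limit
    return results, skipped


def make_stub(total_records, bad_offsets):
    """Build an in-memory stand-in for the API plus a request counter."""
    calls = {"count": 0}

    def fetch_page(offset, limit):
        calls["count"] += 1
        window = range(offset, min(offset + limit, total_records))
        if any(i in bad_offsets for i in window):
            return None  # simulates the 500 response
        return list(window)

    return fetch_page, calls


# Assumed test data: 250 records, two of them erroneous.
fetch_page, calls = make_stub(total_records=250, bad_offsets={0, 137})
results, skipped = fetch_all(fetch_page, total_records=250)
print(len(results), skipped, calls["count"])
```

Comparing `calls["count"]` across different bad-record layouts, or against an alternative search strategy plugged into the same stub, gives a concrete basis for judging whether a different approach is actually more efficient.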
