I am working with pytorch-forecasting to create a TimeSeriesDataSet where I have 30 target variables that I want to predict. However, when I pass this dataset to a DataLoader, I encounter an issue:

Expected Behavior

Since I have 30 target variables, I expect TimeSeriesDataSet to return:

- A batch where the targets are a single torch.Tensor of shape (batch_size, 30).
- A dataset structured so that the DataLoader can package it into mini-batches without issues.

In other words, I expect each batch to contain:

- A dict of inputs with the necessary features.
- A torch.Tensor for the targets, with shape (batch_size, 30).

Actual Behavior

Instead, TimeSeriesDataSet returns a tuple with two elements:

- The first element is a dict containing the input tensors, which is fine.
- The second element is another tuple with two elements:
  - sample[1][0]: a list of 30 tensors instead of a single tensor.
  - sample[1][1]: None, which causes an error when passed to PyTorch's default_collate.
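The sample layout described above can be checked with a small recursive helper. This is only a debugging sketch: `describe` and `FakeTensor` are hypothetical names, and `FakeTensor` is a stand-in for a tensor that carries nothing but a `shape`, so the snippet runs without any dataset.

```python
def describe(obj, indent=0, max_items=2):
    """Return text lines summarising the nested layout of a sample."""
    pad = "  " * indent
    if isinstance(obj, dict):
        lines = [f"{pad}dict({len(obj)} keys)"]
        for key, value in obj.items():
            lines.append(f"{pad}  [{key!r}]")
            lines += describe(value, indent + 2, max_items)
        return lines
    if isinstance(obj, (list, tuple)):
        lines = [f"{pad}{type(obj).__name__}(len={len(obj)})"]
        # Only the first few items are shown, so a list of 30 targets stays readable.
        for item in obj[:max_items]:
            lines += describe(item, indent + 1, max_items)
        if len(obj) > max_items:
            lines.append(f"{pad}  ... {len(obj) - max_items} more")
        return lines
    shape = getattr(obj, "shape", None)
    if shape is not None:
        return [f"{pad}{type(obj).__name__} shape={tuple(shape)}"]
    return [f"{pad}{obj!r}"]


class FakeTensor:
    """Hypothetical stand-in for a tensor: it only has a shape."""

    def __init__(self, *shape):
        self.shape = shape


# Mimics the reported layout: (dict of inputs, (list of 30 targets, None)).
sample = (
    {"encoder_cont": FakeTensor(10, 30)},
    ([FakeTensor(1) for _ in range(30)], None),
)
lines = describe(sample)
print("\n".join(lines))
```

With the real dataset, the same helper could be called as `describe(training[0])`, since it only relies on the `shape` attribute.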

Error Message from DataLoader:

TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts, or lists; found <class 'NoneType'>

This suggests that PyTorch cannot handle the None value being returned by TimeSeriesDataSet.
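To illustrate the mechanism, here is a minimal sketch that is not taken from pytorch-forecasting itself. `ToyMultiTargetDataset` and `collate_with_optional_weight` are hypothetical names. The toy dataset only mimics the layout described above (a dict of inputs, a list of 30 per-target tensors, and a None entry), and the collate function assumes the None is an optional value, apparently a sample weight, that can be passed through unchanged. The library also documents a `TimeSeriesDataSet.to_dataloader()` method that builds the DataLoader with the dataset's own collate function, which may be the more direct route for real data.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyMultiTargetDataset(Dataset):
    """Hypothetical stand-in that mimics the sample layout in the question."""

    def __init__(self, n_samples=8, n_targets=30, prediction_length=1):
        self.n_samples = n_samples
        self.n_targets = n_targets
        self.prediction_length = prediction_length

    def __len__(self):
        return self.n_samples

    def __getitem__(self, idx):
        x = {"encoder_cont": torch.zeros(10, self.n_targets)}
        # One tensor per target, plus a second entry that is None.
        targets = [
            torch.full((self.prediction_length,), float(idx))
            for _ in range(self.n_targets)
        ]
        return x, (targets, None)


def collate_with_optional_weight(batch):
    """Stack inputs and targets; pass a missing weight through as None."""
    xs, ys = zip(*batch)
    inputs = {key: torch.stack([x[key] for x in xs]) for key in xs[0]}
    # Join each sample's per-target tensors on a new last axis, then batch them.
    targets = torch.stack([torch.stack(y[0], dim=-1) for y in ys])
    weights = None if ys[0][1] is None else torch.stack([y[1] for y in ys])
    return inputs, (targets, weights)


dataset = ToyMultiTargetDataset()

# The default collate function is expected to reject the None entry.
try:
    next(iter(DataLoader(dataset, batch_size=4)))
    default_failed = False
except TypeError:
    default_failed = True

loader = DataLoader(dataset, batch_size=4, collate_fn=collate_with_optional_weight)
inputs, (targets, weights) = next(iter(loader))
print(targets.shape, weights)
```

Here the targets come out with shape (batch_size, prediction_length, 30). With a prediction length of 1, `targets.squeeze(1)` would give the (batch_size, 30) shape described in the expected behavior.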

Dataset Information:

- 7061 time steps.
- Each row contains 30 numerical values (representing different features).
- Goal: predict the values for the next step based on the previous 10 time steps.
- The dataset is structured as a single time series (one group).
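As a sanity check on these numbers, the following sketch counts how many samples and batches to expect. It assumes every sample uses the full 10-step encoder window and 1-step prediction window, and that the single series has no gaps; the variable names are illustrative only.

```python
import math

n_time_steps = 7061
encoder_length = 10      # past observations per sample
prediction_length = 1    # future steps per sample
batch_size = 32          # value used in the code below

# Each sample covers encoder_length + prediction_length consecutive steps.
window_length = encoder_length + prediction_length
n_windows = n_time_steps - window_length + 1
n_batches = math.ceil(n_windows / batch_size)

print(n_windows, n_batches)  # 7051 221
```

If `len(training)` differs from this count, the dataset is probably building windows differently from what is assumed here.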

Code:

import pandas as pd
import torch
from pytorch_forecasting.data.encoders import TorchNormalizer
from pytorch_forecasting import TimeSeriesDataSet, MultiNormalizer
from torch.utils.data import DataLoader

# Load dataset
file_name = 'DATA'  # CSV file
data = pd.read_csv(f'{file_name}.CSV')

# Drop unnecessary columns
if "Date" in data.columns:
    data = data.drop(columns=["Date"])

# Add time index
data["time_idx"] = range(len(data))
data["time_idx"] = data["time_idx"].astype(int)

# Add a dummy group column (since all data belongs to one group)
data["group"] = "single_group"

# Rename columns for uniformity
data.columns = ["num_" + str(i+1) for i in range(30)] + ["time_idx", "group"]

# Convert 'group' to category codes
data["group"] = data["group"].astype("category").cat.codes

# Fill any NaN values
if data.isna().sum().sum() > 0:
    print("⚠️ Found NaN values, filling with 0.")
    data.fillna(0, inplace=True)

# TimeSeriesDataSet configuration
max_encoder_length = 10  # Past observations
max_prediction_length = 1  # Future prediction
target_cols = ["num_" + str(i+1) for i in range(30)]

# Create TimeSeriesDataSet
training = TimeSeriesDataSet(
    data=data,
    time_idx="time_idx",
    target=target_cols,  # 30 targets
    group_ids=["group"],
    max_encoder_length=max_encoder_length,
    max_prediction_length=max_prediction_length,
    time_varying_unknown_reals=target_cols,
    target_normalizer=MultiNormalizer([TorchNormalizer(method="identity") for _ in range(30)]),
    add_relative_time_idx=True,
    add_target_scales=False,
    add_encoder_length=True
)

# DataLoader
batch_size = 32
train_dataloader = DataLoader(
    training,
    batch_size=batch_size,
    shuffle=False
)

# DEBUG: Inspect the DataLoader output
for batch in train_dataloader:
    print("

Tags: tensorflow, PyTorch Forecasting TimeSeriesDataSet Returns None in DataLoader Batch, Stack Overflow