python - Tensorflow runs out of range during fitting: Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
I have a dataset containing semantic vectors of length 384. I group them into windows of 100 vectors each, and batch those into batches of size 32. However, I get an error when fitting the model. I suspect the cause is that the 1515 windows yield 48 batches of size 32, the last of which is incomplete (1515/32 = 47.34375), but I don't know whether that should really cause an issue - I never had this problem in the past.
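For reference, a stand-in check of that arithmetic with a toy range dataset (not my real embeddings) - batching 1515 elements by 32 should give 47 full batches plus a final batch of 11:
import tensorflow as tf

# Toy stand-in for the 1515 windows: batching by 32 gives 47 full
# batches plus a final partial batch of 11, since 1515 = 47 * 32 + 11.
toy_windows = tf.data.Dataset.range(1515)
batch_sizes = [int(b.shape[0]) for b in toy_windows.batch(32)]
print(len(batch_sizes), batch_sizes[-1])  # 48 11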
These are my fit parameters:
model.fit(training_data, epochs=50, validation_data=validation_data)
So, explicitly, I don't set steps_per_epoch, which should mean it defaults to None and results in TensorFlow running through the whole training_data on each epoch.
Is the fix for this to somehow drop the incomplete batch, or to pad it with values somehow? If so, could you provide an example of how to drop it?
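For illustration, a minimal sketch of the dropping variant, assuming tf.data's drop_remainder flag is the right tool here (it would apply at the .batch() calls in the code below):
# Sketch: drop_remainder=True discards any final batch smaller than
# BATCH_SIZE, so every batch keeps the full (32, 100, 384) shape.
training_data = training_data.batch(BATCH_SIZE, drop_remainder=True)
validation_data = validation_data.batch(BATCH_SIZE, drop_remainder=True)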
Full error message
2025-04-03 12:20:03 [INFO]: Training (151508, 2)
2025-04-03 12:20:03 [INFO]: Validation (26514, 2)
2025-04-03 12:20:03 [INFO]: Test (22728, 2)
I0000 00:00:1743675604.435011 6880 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-04-03 12:20:04.435109: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2343] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2025-04-03 12:20:04.838231: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
Number of windows created: 1515
Input shape: (32, 100, 384)
Label shape: (32, 100, 384)
2025-04-03 12:20:07.074530: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
Counted training batches: 48
Epoch 1/50
2025-04-03 12:20:09.798596: E tensorflow/core/util/util.cc:131] oneDNN supports DT_INT32 only on platforms with AVX-512. Falling back to the default Eigen-based implementation if present.
47/Unknown 5s 48ms/step - loss: 0.0017
2025-04-03 12:20:12.186355: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
[[{{node IteratorGetNext}}]]
/home/xaver/Documents/Repos/logAnomalyDetection/venv/lib/python3.12/site-packages/keras/src/trainers/epoch_iterator.py:151: UserWarning: Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches. You may need to use the `.repeat()` function when building your dataset.
self._interrupted_warning()
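The warning itself suggests .repeat(); a sketch of that alternative, under my assumption that 47 full batches is the intended epoch length (the steps_per_epoch value is my guess, not confirmed):
# Sketch of the warning's suggestion: repeat the dataset indefinitely
# and pin the number of steps per epoch explicitly instead.
model.fit(
    training_data.repeat(),
    epochs=50,
    steps_per_epoch=1515 // BATCH_SIZE,  # 47 full batches
    validation_data=validation_data,
)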
Code
"""
Semi-supervised log anomaly detection using sentence vector embeddings and a sliding window
"""
import logging
import pickle
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras import layers, models, Input
from sklearn.metrics import (
    confusion_matrix, matthews_corrcoef, precision_score, recall_score,
    f1_score, roc_auc_score, average_precision_score
)
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
WINDOW_SIZE = 100
BATCH_SIZE = 32
# Configure logging
logging.basicConfig(
    format="%(asctime)s [%(levelname)s]: %(message)s",
    level=logging.INFO,
    datefmt="%Y-%m-%d %H:%M:%S"
)
# Step 1: Load and prepare data
input_pickle = "outputs/parsed_openstack_logs_with_embeddings_final.pickle"
with open(input_pickle, "rb") as handle:
    df = pickle.load(handle)
logging.info("DataFrame loaded successfully.")
# embeddings are both input and output, since the autoencoder reconstructs its input from the latent space
# 'label' indicates whether an entry is anomalous (1) or normal (0).
logging.debug(f"{df.columns}")
logging.debug(f"\n{df.head()}")
# split the data so that training and validation contain no anomalies, and testing contains an equal number of normal and abnormal entries
embedding_length = len(df["embeddings"][0])
print("Length of embedding vector:", embedding_length)
normal_data = df[df['label'] == 0]
test_data_abnormal = df[df['label'] == 1]
abnormal_normal_ratio = len(test_data_abnormal) / len(normal_data)
logging.info(f"N normal: {len(normal_data)}; N abnormal: {len(test_data_abnormal)} -- Ratio: {abnormal_normal_ratio}")
training_data, rest_data = train_test_split(normal_data, train_size=0.8, shuffle=False)
validation_data, test_data_normal = train_test_split(rest_data, test_size=0.3, shuffle=False)
test_data_normal = test_data_normal.head(len(test_data_abnormal))
test_data_abnormal = test_data_abnormal.head(len(test_data_normal))
test_data = pd.concat([test_data_normal, test_data_abnormal])
def create_window_tf_dataset(dataset):
    # transforms the embeddings into a windowed dataset
    embeddings_list = dataset["embeddings"].tolist()
    embeddings_array = np.array(embeddings_list)
    embedding_tensor = tf.convert_to_tensor(embeddings_array)
    df_tensor = tf.convert_to_tensor(embedding_tensor)
    tensor_dataset = tf.data.Dataset.from_tensor_slices(df_tensor)
    # drop_remainder=True here drops incomplete *windows*; the later batch() calls have no such flag
    windowed_dataset = tensor_dataset.window(WINDOW_SIZE, shift=WINDOW_SIZE, drop_remainder=True)
    # collapse each window sub-dataset into a single (WINDOW_SIZE, 384) tensor
    windowed_dataset = windowed_dataset.flat_map(lambda window: window.batch(WINDOW_SIZE))
    return windowed_dataset
logging.info(f"Training {training_data.shape}")
logging.info(f"Validation {validation_data.shape}")
logging.info(f"Test {test_data.shape}")
num_windows = sum(1 for _ in create_window_tf_dataset(training_data))
print(f"Number of windows created: {num_windows}") # 1515
training_data = create_window_tf_dataset(training_data).map(lambda window: (window, window))  # autoencoder target = input
validation_data = create_window_tf_dataset(validation_data).map(lambda window: (window, window))
test_data_normal = create_window_tf_dataset(test_data_normal).map(lambda window: (window, window))  # normal test windows
test_data_abnormal = create_window_tf_dataset(test_data_abnormal).map(lambda window: (window, window))  # abnormal test windows
# group normal and abnormal data
test_data = test_data_normal.concatenate(test_data_abnormal)
training_data = training_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(BATCH_SIZE)
test_data = test_data.batch(BATCH_SIZE)
for x, y in training_data.take(1):
    print("Input shape:", x.shape)  # (32, 100, 384)
    print("Label shape:", y.shape)  # (32, 100, 384)
train_batches = sum(1 for _ in training_data)
print(f"Counted training batches: {train_batches}") # 48
# Build the Autoencoder model
model = models.Sequential([
    Input(shape=(WINDOW_SIZE, embedding_length)),
    layers.LSTM(64, return_sequences=True),
    layers.LSTM(32, return_sequences=False),
    layers.RepeatVector(WINDOW_SIZE),
    layers.LSTM(32, return_sequences=True),
    layers.LSTM(64, return_sequences=True),
    layers.TimeDistributed(layers.Dense(embedding_length))
])
# Compile the model
model.compile(optimizer='adam', loss='mse')
# Train the model
model.fit(training_data, epochs=50, validation_data=validation_data) # error occurs here
# [... evaluation ...]
Additional Info
My TensorFlow version is 2.17.0.