I am trying to use TabNetClassifier but my model seems to be overfitting. Here is my current setup:

import pandas as pd
import torch
from sklearn.preprocessing import MinMaxScaler
from imblearn.over_sampling import SMOTE
from pytorch_tabnet.tab_model import TabNetClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load the dataset
data = pd.read_csv('/content/augmented_cleaned_data.csv')

# Define features and target
X = data.drop(columns=['GRADE', 'COURSE ID', 'STUDENT ID', '30', '29'])
y = data['GRADE']

# Normalize features
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Split the dataset before applying SMOTE to avoid data leakage
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Handle class imbalance with SMOTE on training set only
smote = SMOTE(random_state=42)
X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)

# Train TabNet model
device = 'cuda' if torch.cuda.is_available() else 'cpu'

tabnet_model = TabNetClassifier(
    n_d=4, n_a=4, n_steps=2, gamma=1.3,
    lambda_sparse=1e-2,
    optimizer_fn=torch.optim.Adam,
    optimizer_params=dict(lr=1e-3, weight_decay=1e-4),
    mask_type='entmax',
    scheduler_params={"step_size": 50, "gamma": 0.9},
    scheduler_fn=torch.optim.lr_scheduler.StepLR,
    seed=42,
    device_name=device
)

tabnet_model.fit(
    X_train_resampled, y_train_resampled,
    eval_set=[(X_test, y_test)],
    eval_metric=['balanced_accuracy'],
    max_epochs=200,
    patience=30,
    batch_size=256
)

# Predict and evaluate
y_pred = tabnet_model.predict(X_test)
print(classification_report(y_test, y_pred, zero_division=1))
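
One possible contributor to an overly optimistic score in a setup like this is information reaching the model from the evaluation data: the scaler above is fitted on all rows before the split, and the same test set is used both for early stopping and for the final report. The sketch below is only an illustration on synthetic stand-in data, not the real CSV. It splits first, fits the scaler on the training rows only, and keeps a separate validation split. The helper names `split_then_scale` and `count_shared_rows` are hypothetical, and the 60/20/20 proportions are an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def split_then_scale(X, y, seed=42):
    """Split into train/validation/test first, then fit the scaler on train only."""
    # Hold out 20% as a final test set that plays no part in training.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Take 25% of the remainder (20% overall) as a validation set for early stopping.
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.25, random_state=seed
    )
    # The scaler only ever sees training rows; the other splits are just transformed.
    scaler = MinMaxScaler().fit(X_train)
    return (
        (scaler.transform(X_train), y_train),
        (scaler.transform(X_val), y_val),
        (scaler.transform(X_test), y_test),
    )


def count_shared_rows(A, B):
    """Count distinct rows that appear in both arrays (exact matches only)."""
    rows_a = {tuple(row) for row in np.asarray(A)}
    rows_b = {tuple(row) for row in np.asarray(B)}
    return len(rows_a & rows_b)


# Synthetic stand-in for the real features and labels (assumption, not the CSV).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = rng.integers(0, 3, size=200)

train, val, test = split_then_scale(X_demo, y_demo)
print("train/val/test shapes:", train[0].shape, val[0].shape, test[0].shape)
print("rows shared by train and test:", count_shared_rows(train[0], test[0]))
```

With splits like these, SMOTE would be applied to the training split only, the validation split would go to `eval_set`, and the test split would be used once for the final `classification_report`. If the CSV was augmented before splitting, a non-zero count from `count_shared_rows` would suggest that copies of the same record sit on both sides of the split. The check only catches exact duplicates, so near-duplicates produced by augmentation would need a different test.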


Tags: google colaboratory, Why Is My TabNet Model Achieving Unrealistically High Validation Accuracy, Stack Overflow