I am trying to apply target encoding to categorical features using the category_encoders.TargetEncoder in Python. However, I keep getting the following error:

AttributeError: 'numpy.ndarray' object has no attribute 'groupby'

from category_encoders import TargetEncoder
from sklearn.model_selection import train_test_split

# Features for target encoding
encoding_cols = ['grade', 'sub_grade', 'home_ownership', 'verification_status', 
                 'purpose', 'application_type', 'zipcode']

# Train-Test Split
X_train_cv, X_test, y_train_cv, y_test = train_test_split(x, y, test_size=0.25, random_state=1)
X_train, X_test_cv, y_train, y_test_cv = train_test_split(X_train_cv, y_train_cv, test_size=0.25, random_state=1)

# Initialize the Target Encoder
encoder = TargetEncoder()

# Apply Target Encoding
for i in encoding_cols:
    X_train[i] = encoder.fit_transform(X_train[i], y_train)  # **Error occurs here**
    X_test_cv[i] = encoder.transform(X_test_cv[i])
    X_test[i] = encoder.transform(X_test[i])

I want to apply target encoding to the categorical columns without hitting the 'numpy.ndarray' object has no attribute 'groupby' error.

Asked Mar 4 at 8:00 by Ironman; edited Mar 15 at 18:57 by desertnaut.
  • Always include the full error message, because it contains other useful information. – furas Commented Mar 4 at 8:19
  • Maybe it needs a pandas.DataFrame, because that has a groupby method. – furas Commented Mar 4 at 8:19
  • I tried running TargetEncoder with different objects (DataFrame, list, numpy.array) and it always works; I can't reproduce the problem with simple code. Maybe later I will try to run your Colab code. For now, you could use print() to check the type() of your data before fit_transform. That may show what causes the problem. – furas Commented Mar 4 at 16:01
  • (1) Always include the full error message, because it contains other useful information. (2) The code in your Colab is slightly different from the code in your question, and that can make a difference. Always show the code that actually gives you the error. (3) You could add the link to the question itself; it will be more visible there, so more people may help you. – furas Commented Mar 4 at 16:07
  • Please try to provide a minimal reproducible example. When I run most of your code I get the error you report, but when I run just the data import, split, target definition, and encoder fit (without specifying columns) it works fine. – Ben Reiniger Commented Mar 5 at 2:27

2 Answers

This is interesting. I can reproduce your error.

It is related to the dtype of y_train, which is categorical. To solve the issue, rebuild the Series from its list values, setting the name and index explicitly, so that it ends up with a plain integer dtype.

y_train = pd.Series(y_train.tolist(), name='loan_status', index=y_train.index)

This converts the original dtype, CategoricalDtype(categories=[1, 0], ordered=False, categories_dtype=int64), to dtype('int64').
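To see the conversion in isolation, here is a minimal pandas-only sketch. The toy values and index are made up for illustration and are not the data from the question; only the name loan_status and the category order [1, 0] are taken from the answer above. It also shows astype("int64") as an alternative that should give the same result when the categories are integers.

```python
import pandas as pd

# Toy stand-in for the target column (hypothetical values, not the asker's data).
y_train = pd.Series(
    [1, 0, 0, 1, 1], index=[10, 11, 12, 13, 14], name="loan_status"
).astype(pd.CategoricalDtype(categories=[1, 0]))
print(y_train.dtype)  # categorical dtype, like the one reported above

# Conversion from the answer: rebuild the Series from plain Python values,
# keeping the original name and index so it still aligns with X_train.
y_fixed = pd.Series(y_train.tolist(), name=y_train.name, index=y_train.index)
print(y_fixed.dtype)

# Alternative (an assumption that the categories are all integers):
y_alt = y_train.astype("int64")
print(y_alt.dtype)
```

Keeping the index matters: the split produced by train_test_split is shuffled, so a Series rebuilt without index=y_train.index would no longer line up row by row with X_train.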

So your last cell in the Colab is now:

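# Imports used by this cell (already present earlier in the notebook)
import pandas as pd
import category_encoders as ce

# Initialize TargetEncoder
encoder = ce.TargetEncoder(cols=encoding_cols)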

# Here is the list conversion and back to series
y_train = pd.Series(y_train.tolist(), index=y_train.index)

# Fit and transform the training data
X_train = encoder.fit_transform(X_train, y_train)

and this works fine.
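For intuition about why groupby appears in the traceback at all: target encoding replaces each category with a statistic of the target computed per category, which in pandas is a groupby on the target. The sketch below is a deliberately simplified, pandas-only version using a plain per-category mean with no smoothing, so it is not what category_encoders computes exactly. The helper names fit_mean_encoding and apply_mean_encoding and the toy data are hypothetical.

```python
import pandas as pd


def fit_mean_encoding(X, y, cols):
    """Learn per-category target means for each column (hypothetical helper)."""
    # Rebuild y as a plain numeric Series aligned with X, as in the fix above.
    y = pd.Series(y.tolist(), index=X.index)
    prior = y.mean()  # fallback value for categories never seen in training
    mapping = {col: y.groupby(X[col]).mean() for col in cols}
    return mapping, prior


def apply_mean_encoding(X, mapping, prior):
    """Replace each category with its learned mean; unseen categories get the prior."""
    X = X.copy()
    for col, means in mapping.items():
        X[col] = X[col].map(means).fillna(prior)
    return X


# Toy data (made up for illustration); the target is categorical on purpose.
X_train = pd.DataFrame({"grade": ["A", "A", "B", "B", "C"]})
y_train = pd.Series([1, 0, 1, 1, 0]).astype("category")
X_test = pd.DataFrame({"grade": ["A", "C", "D"]})

mapping, prior = fit_mean_encoding(X_train, y_train, ["grade"])
encoded_train = apply_mean_encoding(X_train, mapping, prior)
encoded_test = apply_mean_encoding(X_test, mapping, prior)
print(encoded_test)
```

The statistics are learned once on the training split and then reused for the other splits, which is the same fit-then-transform pattern the question's loop follows with encoder.fit_transform and encoder.transform.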

I'm the maintainer of Category Encoders. There was a problem in the library; I've fixed it in version 2.8.1.
