I found that setting a pandas DataFrame column with a NumPy array whose dtype is object causes a weird error. I wonder why it happens.

The code I ran is as follows:

import numpy as np
import pandas as pd

print(f"numpy version: {np.__version__}")
print(f"pandas version: {pd.__version__}")

data = pd.DataFrame({
    "c1": [1, 2, 3, 4, 5],
})

print("-" * 10)

t1 = np.array([["A"], ["B"], ["C"], ["D"], ["E"]])
data["c1"] = t1 # This works well

print("-" * 10)

t2 = np.array([["A"], ["B"], ["C"], ["D"], ["E"]], dtype=object)
data["c1"] = t2 # This throws an error

print("-" * 10)

The output is:

numpy version: 1.26.4
pandas version: 2.2.2
----------
----------
Traceback (most recent call last):
  File "...\test.py", line 19, in <module>
    data["c1"] = t2 # This throws an error
    ~~~~^^^^^^
  File "...\pandas\core\frame.py", line 4311, in __setitem__
    self._set_item(key, value)
  File "...\pandas\core\frame.py", line 4524, in _set_item
    value, refs = self._sanitize_column(value)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\pandas\core\frame.py", line 5267, in _sanitize_column
    arr = sanitize_array(value, self.index, copy=True, allow_2d=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\pandas\core\construction.py", line 606, in sanitize_array
    subarr = maybe_infer_to_datetimelike(data)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\pandas\core\dtypes\cast.py", line 1182, in maybe_infer_to_datetimelike
    raise ValueError(value.ndim)  # pragma: no cover
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: 2

Asked Feb 21 at 10:15 by Tony Ding

1 Answer

I'm not sure why this raises an error only with dtype=object, but both of your arrays are 2D, with shape (5, 1).

A Series is a 1D object.
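To make the dimensionality point concrete, here is a small sketch that inspects the two arrays from the question next to a Series. It only uses the question's own data; the name `column` is illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# The two arrays from the question: identical contents, different dtypes.
t1 = np.array([["A"], ["B"], ["C"], ["D"], ["E"]])
t2 = np.array([["A"], ["B"], ["C"], ["D"], ["E"]], dtype=object)

# Both are 2D arrays with 5 rows and 1 column.
print(t1.shape, t1.ndim)
print(t2.shape, t2.ndim)

# A Series built from a flat list has a single dimension.
column = pd.Series(["A", "B", "C", "D", "E"])
print(column.shape, column.ndim)
```

So the two arrays differ only in dtype, not in shape, and neither has the 1D shape that a single column has.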

If you convert them to 1D this works fine:

data['c1'] = t2.ravel()    # works fine
data['c1'] = t2.squeeze()  # also works fine
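If the shape of the incoming array is not always known in advance, one option is to check it explicitly before assigning. The sketch below is only an illustration: `as_column` is a hypothetical helper, not part of pandas or NumPy, and it assumes the input is either already 1D or a 2D array with exactly one column.

```python
import numpy as np
import pandas as pd


def as_column(values):
    """Hypothetical helper: return `values` as a 1D array, or raise if it is not a single column."""
    arr = np.asarray(values)
    if arr.ndim == 1:
        return arr
    if arr.ndim == 2 and arr.shape[1] == 1:
        # Select the only column, which yields a 1D array of length n.
        return arr[:, 0]
    raise ValueError(f"expected a single column, got shape {arr.shape}")


data = pd.DataFrame({"c1": [1, 2, 3, 4, 5]})
t2 = np.array([["A"], ["B"], ["C"], ["D"], ["E"]], dtype=object)

data["c1"] = as_column(t2)
print(data["c1"].tolist())
```

Unlike squeeze(), which drops every length-1 axis and so turns a (1, 1) array into a 0-dimensional one, this check always returns a 1D result or fails loudly when the input has more than one column.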

Tags: Setting pandas DataFrame column with numpy object array causes error (Stack Overflow)