The help doc for pandas.Series.nbytes shows the following example:

s = pd.Series(['Ant', 'Bear', 'Cow'])
s  

0     Ant
1    Bear
2     Cow
dtype: object

s.nbytes

24
<< end example >>

How is that 24 bytes?
I tried looking at three different encodings, none of which seems to yield that total.

print(s.str.encode('utf-8').str.len().sum())
print(s.str.encode('utf-16').str.len().sum())
print(s.str.encode('ascii').str.len().sum())

10
26
10

Asked Nov 21, 2024 at 16:43 by MCornejo

1 Answer

Series.nbytes does not measure the bytes needed to store the string data in any particular encoding such as UTF-8, UTF-16, or ASCII. It reports the number of bytes consumed in memory by the array that underlies the Series.

With the object dtype, that underlying array is a NumPy array of pointers to Python objects (here, the three str objects). The string data itself lives elsewhere in memory and is not counted.

On a 64-bit system, each pointer/reference takes 8 bytes.

3 pointers × 8 bytes = 24 bytes.
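The arithmetic above can be checked directly. The following is a minimal sketch, not taken from the pandas documentation: it passes dtype=object explicitly on the assumption that the default dtype for strings may differ between pandas versions, and it reads the pointer size from the struct module rather than hard-coding 8. The variable names are illustrative only.

```python
import struct

import pandas as pd

# Force the object dtype so the Series holds pointers to Python str objects
# (assumption: the default string dtype can vary with the pandas version).
s = pd.Series(['Ant', 'Bear', 'Cow'], dtype=object)

# Size of a C pointer on this Python build (8 on a typical 64-bit system).
pointer_size = struct.calcsize('P')

# The underlying NumPy object array: one pointer-sized slot per element.
values = s.to_numpy()
print(values.dtype, values.itemsize)

# nbytes is simply (number of elements) x (bytes per slot).
print(len(s) * pointer_size)
print(s.nbytes)

# memory_usage(deep=True) additionally counts the Python string objects
# that the pointers refer to, so it is larger than nbytes.
shallow = s.memory_usage(index=False)
deep = s.memory_usage(index=False, deep=True)
print(shallow, deep)
```

If the goal is an estimate that includes the strings themselves, memory_usage(deep=True) is probably the closer fit. It still will not match any of the encoded lengths, because each Python str object carries its own per-object overhead.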

Link: nbytes source code

Link: ndarray documentation
