Understandable and expected (tz-aware):

import datetime
import numpy as np
import pandas as pd

aware = pd.DatetimeIndex(["2024-11-21", "2024-11-21 12:00"], tz="UTC")
eod = datetime.datetime.combine(aware[-1].date(), datetime.time.max, aware.tz)
aware, eod, np.concat([aware, [eod]])

returns

(DatetimeIndex(['2024-11-21 00:00:00+00:00', '2024-11-21 12:00:00+00:00'],
               dtype='datetime64[ns, UTC]', freq=None),
 datetime.datetime(2024, 11, 21, 23, 59, 59, 999999,
                   tzinfo=datetime.timezone.utc),
 array([Timestamp('2024-11-21 00:00:00+0000', tz='UTC'),
        Timestamp('2024-11-21 12:00:00+0000', tz='UTC'),
        datetime.datetime(2024, 11, 21, 23, 59, 59, 999999,
                          tzinfo=datetime.timezone.utc)],
       dtype=object))

Note the Timestamps (and a datetime) in the return value of np.concat.

Unexpected (tz-naive):

naive = pd.DatetimeIndex(["2024-11-21", "2024-11-21 12:00"])
eod = datetime.datetime.combine(naive[-1].date(), datetime.time.max, naive.tz)
naive, eod, np.concat([naive, [eod]])

returns

(DatetimeIndex(['2024-11-21 00:00:00', '2024-11-21 12:00:00'],
               dtype='datetime64[ns]', freq=None),
 datetime.datetime(2024, 11, 21, 23, 59, 59, 999999),
 array([1732147200000000000, 1732190400000000000,
        datetime.datetime(2024, 11, 21, 23, 59, 59, 999999)], dtype=object))

Note the integers (and a datetime) in the return value of np.concat.

  1. Why do I get integers in the concatenated array for a tz-naive index?
  2. How do I avoid it? I.e., how do I append EOD to a tz-naive DatetimeIndex?

PS. Interestingly enough, at the numpy level the indexes are identical:

np.testing.assert_array_equal(aware.values, naive.values)

Asked Nov 21, 2024 at 20:51 by sds; edited Nov 22, 2024 at 8:22 by FObersteiner.
  • My assumption would be for efficiency for deltas between timestamps without going through an additional correction layer. Basically a fast path. Be interesting to see the technical answer – roganjosh Commented Nov 21, 2024 at 20:57
  • @roganjosh: I am not sure what "efficiency" could be accomplished here. Looks like a bug, TBH. – sds Commented Nov 21, 2024 at 21:26
  • Because it's just integer subtraction without having to modify the integers beforehand with some offset. A non-naive date time has to carry metadata that will require a non-vectorized handling before a delta could be calculated – roganjosh Commented Nov 21, 2024 at 21:33
  • I've got a different representation of the concatenated array concat: [1732147200000000000 1732190400000000000 datetime.datetime(2024, 11, 21, 23, 59, 59, 999999, tzinfo=<DstTzInfo 'America/New_York' LMT-1 day, 19:04:00 STD>)]. I think it's for compatibility, it's the only representation that has implicit and explicit tz information. – LMC Commented Nov 21, 2024 at 21:44
  • (additionally, ceil(d) may return the same date as the current date!) --- but thanks for the pointer! – sds Commented Nov 22, 2024 at 18:41

1 Answer

From "Data type promotion in NumPy" (NumPy documentation):

When mixing two different data types, NumPy has to determine the appropriate dtype for the result of the operation. This step is referred to as promotion or finding the common dtype.
In typical cases, the user does not need to worry about the details of promotion, since the promotion step usually ensures that the result will either match or exceed the precision of the input.
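
As a rough illustration of that promotion step (a sketch added for clarity, not part of the original answer): datetime64 and object have no common dtype other than object, so every element is converted to a Python object. Python's datetime.datetime only has microsecond resolution, so NumPy represents nanosecond datetime64 values as plain integers when it does that conversion. The variable names below are illustrative only.

```python
import datetime
import numpy as np

# The common dtype of datetime64[ns] and object is object.
common = np.result_type(np.dtype("datetime64[ns]"), np.dtype(object))
print(common)  # object

ns_values = np.array(["2024-11-21", "2024-11-21T12:00"], dtype="datetime64[ns]")
us_values = ns_values.astype("datetime64[us]")

# Casting to object: nanosecond values become Python ints (nanoseconds since
# the epoch), while microsecond values fit into datetime.datetime.
as_obj_ns = ns_values.astype(object)
as_obj_us = us_values.astype(object)
print(type(as_obj_ns[0]), type(as_obj_us[0]))
print(as_obj_ns[0])  # 1732147200000000000
```

The integers match the ones in the question's tz-naive output.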

np.concat() accepts a casting keyword argument (the default is casting="same_kind").
With casting='no', the call fails:

naive_no = np.concat([naive, [eod]], casting='no')

TypeError: Cannot cast array data from dtype('<M8[ns]') to dtype('O') according to the rule 'no'

See Array-protocol type strings.
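
A minimal sketch of why the tz-aware index behaves differently, assuming pandas 2.x behavior: NumPy has no tz-aware dtype, so when NumPy asks pandas for an array, a tz-aware DatetimeIndex is handed over as an object array of Timestamps, whereas a tz-naive one is handed over as a native datetime64 ('M8') array. Only the latter has to be cast to object during concatenation.

```python
import numpy as np
import pandas as pd

naive = pd.DatetimeIndex(["2024-11-21", "2024-11-21 12:00"])
aware = pd.DatetimeIndex(["2024-11-21", "2024-11-21 12:00"], tz="UTC")

# What NumPy sees when it converts each index to an array.
naive_arr = np.asarray(naive)
aware_arr = np.asarray(aware)

print(naive_arr.dtype.kind)  # 'M', i.e. a native datetime64 array
print(aware_arr.dtype)       # object
print(type(aware_arr[0]))    # a pandas Timestamp, tz preserved

# .values, by contrast, drops the tz and returns datetime64 (UTC-based),
# which is why aware.values and naive.values compare equal.
print(aware.values.dtype.kind)
```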

In both cases (tz-aware and tz-naive) the resulting dtype is object:

naive_sk = np.concat([naive, [eod]], casting='same_kind')
print(naive_sk.dtype, naive_sk)

Result

object [1732147200000000000 1732190400000000000
 datetime.datetime(2024, 11, 21, 23, 59, 59, 999999, tzinfo=<DstTzInfo 'America/New_York' LMT-1 day, 19:04:00 STD>)]

Tested with Python 3.9 and pandas 2.2.2.
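
For the second part of the question, one possible approach (a sketch, not from the original answer) is to stay inside pandas instead of going through NumPy: wrap the extra value in its own DatetimeIndex and use Index.append, so both operands are datetime-typed and no cast to object takes place.

```python
import datetime
import pandas as pd

naive = pd.DatetimeIndex(["2024-11-21", "2024-11-21 12:00"])
eod = datetime.datetime.combine(naive[-1].date(), datetime.time.max, naive.tz)

# Both operands are DatetimeIndex objects, so the result stays a DatetimeIndex.
extended = naive.append(pd.DatetimeIndex([eod]))
print(extended)
```

The result is still a tz-naive DatetimeIndex with three entries, the last one being 2024-11-21 23:59:59.999999.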

Tags: python - Why is tz-naive Timestamp converted to integer while tz-aware is kept as Timestamp - Stack Overflow