pandas - Constant Volume chart in python - Stack Overflow

Updated: 2025-02-22


I want to create a constant volume chart in python. Here is an example with a constant volume of 50 and some sample data:

import pandas as pd
import numpy as np
date_rng = pd.date_range(start='2024-01-01', end='2024-12-31 23:00:00', freq='h')

# Create a dataframe with the date range
df = pd.DataFrame(date_rng, columns=['timestamp'])

# Add the 'price' column with random floating numbers between 70 and 100
df['price'] = np.round(np.random.uniform(70, 100, size=(len(date_rng))), 2)

# Add the 'volume' column with random integers between 1 and 10
df['volume'] = np.random.randint(1, 11, size=(len(date_rng)))

constantvolume = 50
# Running total of volume and the bar number each row falls into
df['cumsum'] = np.cumsum(df['volume'])
df['mod'] = df['cumsum'] / constantvolume
df['whole'] = np.ceil(df['mod'])
# Positive on the last row of each bar, 0 elsewhere
df['next_num'] = df['whole'].shift(-1) - df['whole']
# Volume accumulated in the bar at its last row (0 if the bar is exactly full)
df['mod2'] = df[df['next_num'] > 0]['cumsum'] % constantvolume
df['mod2'] = df['mod2'].fillna(0)
# Aggregate each bar: low/high/close/open, first timestamp, total volume
dfa = df.groupby(df['whole']).agg({'price': ['min', 'max', 'last', 'first'], 'timestamp': 'first', 'volume': 'sum'})
dfa.columns = ['low', 'high', 'close', 'open', 'timestamp', 'volume']
dfa['timestamp'] = pd.to_datetime(dfa['timestamp'])
dfa.set_index('timestamp', inplace=True)
dfa

Now this is very close to what I want to do. The only issue is that the volume in each row is not exactly the defined quantity of 50, because the cumulative sum doesn't always land exactly on a multiple of 50.

So what I would have to do is: wherever next_num > 0, check whether the group's volume equals the defined constant volume. If it does, fine. If not, split the next row into two rows with the same timestamp and the same price, dividing its volume so that the first part fills the current group exactly (the mod is zero), and then move on.

The desired result is a final dataframe in which volume equals constantvolume in every row exactly, with the exception of the last row, where it could be different.

The only way I can think of is a loop, which I don't think is the best approach and which would be very slow, as the actual dataframe has 1 million rows...
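To make the mismatch concrete, here is a small sketch on hypothetical toy data (five rows and a constant volume of 10, both chosen only so the numbers can be checked by hand). It applies the same ceil(cumsum / cv) grouping as the code above and shows that the per-bar volumes do not come out as 10.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, small enough to verify by hand.
toy = pd.DataFrame({
    "price": [81.24, 98.52, 91.96, 87.96, 74.68],
    "volume": [3, 7, 8, 5, 4],
})
cv = 10  # constant volume per bar (assumed value for this sketch)

# Same grouping key as in the question: ceil(cumsum / cv).
# cumsum is 3, 10, 18, 23, 27, so the key is 1, 1, 2, 3, 3.
whole = np.ceil(toy["volume"].cumsum() / cv).rename("whole")
naive_volume = toy.groupby(whole)["volume"].sum()

print(naive_volume.tolist())  # [10, 8, 9]
```

The desired volumes would be 10, 10 and 7: the row with volume 5 would have to give 2 units to the second bar and the remaining 3 to the third.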

Asked Feb 16 at 16:02 by nik; edited Feb 16 at 17:28.

1 Answer

Here's one approach:

Minimal reproducible example

np.random.seed(42) # for reproducibility
date_rng = pd.date_range(start='2024-01-01', periods=5, freq='h')

# ... as above

df['cumsum'] = df['volume'].cumsum()

Data

Using cv = 10 (constantvolume), we want to use index 3 for both the 20 and 27 groups:

            timestamp  price  volume  cumsum
0 2024-01-01 00:00:00  81.24       3       3    # 10
1 2024-01-01 01:00:00  98.52       7      10    # 10
2 2024-01-01 02:00:00  91.96       8      18    # 20
3 2024-01-01 03:00:00  87.96       5      23    # split: 20 | 27
4 2024-01-01 04:00:00  74.68       4      27    # 27
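One possible way to confirm which rows need splitting is sketched below. It is not part of the approach that follows, only an illustration. It assumes volumes are positive integers and flags a row when the first and last unit of volume it contributes fall into different bars. The names prev, first_bar, last_bar and straddles are made up for this sketch.

```python
import pandas as pd

# Volumes copied from the table above.
df = pd.DataFrame({"volume": [3, 7, 8, 5, 4]})
df["cumsum"] = df["volume"].cumsum()
cv = 10

# Cumulative volume before each row; the row covers units prev+1 .. cumsum.
prev = df["cumsum"] - df["volume"]

# Zero-based bar of the row's first and last unit (assumes volume >= 1).
first_bar = prev // cv
last_bar = (df["cumsum"] - 1) // cv

# A row straddles a boundary when those two bars differ.
straddles = first_bar != last_bar

print(df.index[straddles].tolist())  # [3]
```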

Code

cv = 10

cv_rng = range(cv, df['cumsum'].max(), cv)

s_rng = pd.Series(list(set(cv_rng).difference(df['cumsum'])), 
              dtype=df['cumsum'].dtype, 
              name='cumsum')

df = (pd.concat([df, s_rng])
      .sort_values('cumsum')
      .bfill()
      .assign(
          group=lambda x: np.minimum(np.ceil(x['cumsum'] / cv) * cv, 
                                 x['cumsum'].max())
          )
      )

out = (df.groupby('group').agg(
    **{k:('price', v) for k, v in zip(['low', 'high', 'close', 'open'], 
                                      ['min', 'max', 'last', 'first'])},
    volume=('group', 'max'),
    timestamp=('timestamp', 'first')
    )
    .assign(volume=lambda x: np.where(x['volume'] % cv == 0, 
                                      cv, 
                                      x['volume'] % cv)
            )
    .set_index('timestamp')
    )

Output:

                       low   high  close   open  volume
timestamp                                              
2024-01-01 00:00:00  81.24  98.52  98.52  81.24    10.0
2024-01-01 02:00:00  87.96  91.96  87.96  91.96    10.0
2024-01-01 03:00:00  74.68  87.96  74.68  87.96     7.0 # ts/high/open = index `3`

Explanation / intermediates

First, create a range with start=cv, stop=df['cumsum'].max() and step=cv.

list(cv_rng)
[10, 20]

We want to add a row to df for each value in cv_rng that does not already exist in df['cumsum']: turn cv_rng into a set, use set.difference with df['cumsum'], and wrap the result in a pd.Series (s_rng).
Next, use pd.concat to append s_rng to df, df.sort_values to order the rows by 'cumsum', and df.bfill so that each added row takes the values of the row that follows it.
Also add a 'group' column with df.assign: np.ceil rounds each 'cumsum' value up to the next multiple of cv, and np.minimum caps the last group at the max of 'cumsum'.

(pd.concat([df, s_rng], keys=['df', 's_rng'])
 .sort_values('cumsum')
 .bfill()
 .assign(...))
# including `keys` for demonstration purposes only; not needed

                  timestamp  price  volume  cumsum  group
df    0 2024-01-01 00:00:00  81.24     3.0       3   10.0
      1 2024-01-01 01:00:00  98.52     7.0      10   10.0
      2 2024-01-01 02:00:00  91.96     8.0      18   20.0
s_rng 0 2024-01-01 03:00:00  87.96     5.0      20   20.0 # vals from next row
df    3 2024-01-01 03:00:00  87.96     5.0      23   27.0 # note 27 last group
      4 2024-01-01 04:00:00  74.68     4.0      27   27.0
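The same intermediate can be reproduced in a self-contained way. The sketch below is a variant, not the code above: it swaps the set difference for np.setdiff1d (which returns a sorted array) and wraps the missing boundaries in a one-column DataFrame instead of a named Series, so the result does not depend on how pd.concat treats a Series alongside a DataFrame. The names missing, filler and expanded are made up for illustration.

```python
import numpy as np
import pandas as pd

# Data copied from the table above.
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=5, freq="h"),
    "price": [81.24, 98.52, 91.96, 87.96, 74.68],
    "volume": [3, 7, 8, 5, 4],
})
df["cumsum"] = df["volume"].cumsum()
cv = 10

# Multiples of cv below the total that no row lands on exactly.
boundaries = np.arange(cv, df["cumsum"].max(), cv)
missing = np.setdiff1d(boundaries, df["cumsum"].to_numpy())

# One placeholder row per missing boundary; other columns start as NaN/NaT.
filler = pd.DataFrame({"cumsum": missing})

expanded = (
    pd.concat([df, filler], ignore_index=True)
    .sort_values("cumsum", kind="stable")
    .bfill()  # placeholder rows take timestamp and price from the row being split
    .reset_index(drop=True)
)
# Note: 'volume' on a placeholder row is just a copy of the split row's volume,
# which is why the final bar volume is derived from the group labels instead.
print(expanded[["timestamp", "price", "cumsum"]])
```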

Finally, we use df.groupby with 'group' and use named aggregation.
For 'volume', we first take the max of 'group' per group (which is the group label itself), then chain df.assign with np.where to replace it with cv if the remainder modulo cv equals 0, and with the remainder otherwise.
Lastly, use df.set_index to get 'timestamp' as the index.
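For reuse on a larger frame, the steps can be wrapped in a function. What follows is a minimal sketch under stated assumptions, not a definitive implementation: constant_volume_bars is a hypothetical name, the input is assumed to have 'timestamp', 'price' and 'volume' columns with positive integer volumes, and the bar volume is computed as the difference between consecutive group labels rather than with np.where (the two should give the same numbers).

```python
import numpy as np
import pandas as pd


def constant_volume_bars(df: pd.DataFrame, cv: int) -> pd.DataFrame:
    """Aggregate rows into bars of exactly `cv` volume (the last bar may be smaller).

    Hypothetical helper; assumes positive integer volumes.
    """
    work = df[["timestamp", "price"]].copy()
    work["cumsum"] = df["volume"].cumsum()
    total = work["cumsum"].iloc[-1]

    # Placeholder rows at every multiple of cv that no row hits exactly.
    missing = np.setdiff1d(np.arange(cv, total, cv), work["cumsum"].to_numpy())
    filler = pd.DataFrame({"cumsum": missing})

    work = (
        pd.concat([work, filler], ignore_index=True)
        .sort_values("cumsum", kind="stable")
        .bfill()
    )
    # Group label = upper cumulative-volume edge of the bar, capped at the total.
    work["group"] = np.minimum(np.ceil(work["cumsum"] / cv) * cv, total)

    out = work.groupby("group").agg(
        timestamp=("timestamp", "first"),
        open=("price", "first"),
        high=("price", "max"),
        low=("price", "min"),
        close=("price", "last"),
    )
    # Volume per bar is the distance between consecutive upper edges.
    out["volume"] = np.diff(out.index.to_numpy(), prepend=0).astype(int)
    return out.set_index("timestamp")


# Usage on random data shaped like the question's (seed and size are arbitrary).
rng = np.random.default_rng(0)
n = 1_000
ticks = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=n, freq="h"),
    "price": rng.uniform(70, 100, n).round(2),
    "volume": rng.integers(1, 11, n),
})
bars = constant_volume_bars(ticks, cv=50)

# Every bar except possibly the last should hold exactly 50 units,
# and no volume should be lost or duplicated.
print((bars["volume"].iloc[:-1] == 50).all())
print(bars["volume"].sum() == ticks["volume"].sum())
```

Because a split row contributes its timestamp to two bars, the resulting index can contain duplicate timestamps. Whether that is acceptable depends on how the chart is drawn.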
