I want to create a constant volume chart in python. Here is an example with a constant volume of 50 and some sample data:
import pandas as pd
import numpy as np
date_rng = pd.date_range(start='2024-01-01', end='2024-12-31 23:00:00', freq='h')
# Create a dataframe with the date range
df = pd.DataFrame(date_rng, columns=['timestamp'])
# Add the 'price' column with random floating numbers between 70 and 100
df['price'] = np.round(np.random.uniform(70, 100, size=(len(date_rng))), 2)
# Add the 'volume' column with random integers between 1 and 10
df['volume'] = np.random.randint(1, 11, size=(len(date_rng)))
constantvolume = 50
df['cumsum'] = np.cumsum(df['volume'])
df['mod'] = df['cumsum']/ constantvolume
df['whole'] = np.ceil(df['mod'])
df['next_num'] = df['whole'].shift(-1) - df['whole']
df['mod2'] = df[df['next_num'] > 0]['cumsum'] % constantvolume
df['mod2'] = df['mod2'].fillna(0)
dfa = df.groupby(df['whole']).agg({'price': ['min', 'max', 'last', 'first'], 'timestamp': 'first', 'volume': 'sum'})
dfa.columns = ['low', 'high', 'close', 'open', 'timestamp', 'volume']
dfa['timestamp'] = pd.to_datetime(dfa['timestamp'])
dfa.set_index('timestamp', inplace=True)
dfa
This is very close to what I want. The only issue is that the volume in each row is not exactly the defined quantity of 50, because the cumsum doesn't always land exactly on a multiple of 50.
So where next_num > 0, I would have to check whether the volume equals the defined constant volume. If it does, fine; if not, split the next row into two rows with the same timestamp and the same price, dividing its volume so that the mod is zero, and then move on.
The desired result is a final dataframe in which volume = constantvolume exactly in every row, with the exception of the last row, where it could be different.
The only way I can think of is a loop, which I don't think is the best way and which will be very slow, as the actual dataframe has 1 million rows...
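To make the intended split concrete, here is a minimal sketch of the arithmetic on five illustrative volumes (3, 7, 8, 5, 4) with a constant volume of 10. The variable names are hypothetical, and the sketch assumes no single row's volume exceeds the constant volume, so a row can straddle at most two bars.

```python
import numpy as np

cv = 10                                  # illustrative constant volume
volume = np.array([3, 7, 8, 5, 4])       # illustrative per-row volumes

end = volume.cumsum()                    # cumulative volume after each row
start = end - volume                     # cumulative volume before each row

first_bar = start // cv                  # bar that receives the row's first unit
last_bar = (end - 1) // cv               # bar that receives the row's last unit
needs_split = first_bar != last_bar      # True where a row straddles two bars

# Part of the row that tops up the current bar, and the part carried over.
# Assumption: volume <= cv, so a row never spans more than two bars.
head = np.where(needs_split, (first_bar + 1) * cv - start, volume)
tail = volume - head

print(needs_split)   # only the row with volume 5 is split
print(head, tail)    # it contributes 2 to one bar and 3 to the next
```

Only the fourth row (cumulative volume going from 18 to 23) crosses a multiple of 10, so it is the only row that would need to be split.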
Asked Feb 16 at 16:02 by nik; edited Feb 16 at 17:28.
1 Answer
Here's one approach:
Minimal reproducible example
np.random.seed(42) # for reproducibility
date_rng = pd.date_range(start='2024-01-01', periods=5, freq='h')
# ... as above
df['cumsum'] = df['volume'].cumsum()
Data
Using cv = 10 (constantvolume), we want to use index 3 for both the 20 and 27 groups:
timestamp price volume cumsum
0 2024-01-01 00:00:00 81.24 3 3 # 10
1 2024-01-01 01:00:00 98.52 7 10 # 10
2 2024-01-01 02:00:00 91.96 8 18 # 20
3 2024-01-01 03:00:00 87.96 5 23 # split: 20 | 27
4 2024-01-01 04:00:00 74.68 4 27 # 27
Code
cv = 10
cv_rng = range(cv, df['cumsum'].max(), cv)
s_rng = pd.Series(list(set(cv_rng).difference(df['cumsum'])),
                  dtype=df['cumsum'].dtype,
                  name='cumsum')
df = (pd.concat([df, s_rng])
      .sort_values('cumsum')
      .bfill()
      .assign(
          group=lambda x: np.minimum(np.ceil(x['cumsum'] / cv) * cv,
                                     x['cumsum'].max())
      )
      )
out = (df.groupby('group').agg(
           **{k: ('price', v) for k, v in zip(['low', 'high', 'close', 'open'],
                                              ['min', 'max', 'last', 'first'])},
           volume=('group', 'max'),
           timestamp=('timestamp', 'first')
       )
       .assign(volume=lambda x: np.where(x['volume'] % cv == 0,
                                         cv,
                                         x['volume'] % cv)
               )
       .set_index('timestamp')
       )
Output:
low high close open volume
timestamp
2024-01-01 00:00:00 81.24 98.52 98.52 81.24 10.0
2024-01-01 02:00:00 87.96 91.96 87.96 91.96 10.0
2024-01-01 03:00:00 74.68 87.96 74.68 87.96 7.0 # ts/high/open = index `3`
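As a cross-check of this output, here is a sketch of a different technique, not the method used in this answer: expand every row into unit-volume rows with `Index.repeat`, then cut the expanded frame into chunks of `cv` rows. It assumes integer volumes, and it hard-codes the five sample rows shown above instead of regenerating them. The names `ticks` and `check` are hypothetical.

```python
import numpy as np
import pandas as pd

cv = 10
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 02:00",
        "2024-01-01 03:00", "2024-01-01 04:00",
    ]),
    "price": [81.24, 98.52, 91.96, 87.96, 74.68],
    "volume": [3, 7, 8, 5, 4],
})

# One row per unit of volume: a row with volume 5 becomes 5 identical rows.
ticks = df.loc[df.index.repeat(df["volume"])].reset_index(drop=True)

# Every run of cv consecutive unit rows forms one bar.
ticks["bar"] = np.arange(len(ticks)) // cv

check = ticks.groupby("bar").agg(
    timestamp=("timestamp", "first"),
    low=("price", "min"),
    high=("price", "max"),
    close=("price", "last"),
    open=("price", "first"),
    volume=("price", "size"),   # number of unit rows in the bar
).set_index("timestamp")

print(check)
```

The expanded frame has as many rows as the total volume, so this trades memory for simplicity; whether that is acceptable for a frame with a million rows depends on the size of the volumes.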
Explanation / intermediates
- First, create a range with start=cv, stop=df['cumsum'].max() and step=cv.
list(cv_rng)
[10, 20]
- We want to add rows to df for values from cv_rng if they do not already exist in df['cumsum']: use set + set.difference + df['cumsum'] and create a pd.Series (s_rng).
- Now, use pd.concat to add s_rng to df + df.sort_values to order 'cumsum' values + df.bfill to add the rows from s_rng correctly sorted and taking the values of the next row.
- Also add 'group' column with df.assign based on np.ceil or the max from 'cumsum' for the last group (via np.minimum).
(pd.concat([df, s_rng], keys=['df', 's_rng'])
.sort_values('cumsum')
.bfill()
.assign(...))
# including `keys` for demonstration purposes only; not needed
timestamp price volume cumsum group
df 0 2024-01-01 00:00:00 81.24 3.0 3 10.0
1 2024-01-01 01:00:00 98.52 7.0 10 10.0
2 2024-01-01 02:00:00 91.96 8.0 18 20.0
s_rng 0 2024-01-01 03:00:00 87.96 5.0 20 20.0 # vals from next row
df 3 2024-01-01 03:00:00 87.96 5.0 23 27.0 # note 27 last group
4 2024-01-01 04:00:00 74.68 4.0 27 27.0
- Finally, we use df.groupby with 'group' and use named aggregation.
- For 'volume', we chain df.assign to overwrite max for each group with cv if remainder equals 0, else remainder via np.where.
- Lastly, use df.set_index to get 'timestamp' as the index.
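The steps above can be collected into one self-contained sketch. The function name `constant_volume_bars` is hypothetical, the five sample rows are hard-coded from the data shown earlier, and there are two small deviations from the code in this answer, both marked in comments: the boundary rows are built as an explicit DataFrame before concatenation, and 'volume' is aggregated from 'cumsum' instead of from the grouping column (the per-group maximum is the same number either way).

```python
import numpy as np
import pandas as pd


def constant_volume_bars(df, cv):
    """Hypothetical helper: OHLC bars that each hold `cv` units of volume."""
    df = df.assign(cumsum=df["volume"].cumsum())
    total = int(df["cumsum"].max())

    # Multiples of cv that no row's cumulative volume lands on exactly.
    missing = sorted(set(range(cv, total, cv)).difference(df["cumsum"]))
    # Deviation: an explicit one-column frame instead of a named Series.
    boundary_rows = pd.DataFrame(
        {"cumsum": pd.Series(missing, dtype=df["cumsum"].dtype)}
    )

    full = (
        pd.concat([df, boundary_rows], ignore_index=True)
        .sort_values("cumsum")
        .bfill()  # a boundary row takes timestamp/price from the row it splits
    )
    # Upper edge of the bar each row belongs to; the last bar ends at the total.
    full["group"] = np.minimum(np.ceil(full["cumsum"] / cv) * cv, total)

    bars = full.groupby("group").agg(
        low=("price", "min"),
        high=("price", "max"),
        close=("price", "last"),
        open=("price", "first"),
        volume=("cumsum", "max"),  # deviation: equals the group label
        timestamp=("timestamp", "first"),
    )
    # Full bars hold exactly cv; only the last bar may hold a remainder.
    remainder = bars["volume"] % cv
    bars["volume"] = np.where(remainder == 0, cv, remainder)
    return bars.set_index("timestamp")


sample = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 02:00",
        "2024-01-01 03:00", "2024-01-01 04:00",
    ]),
    "price": [81.24, 98.52, 91.96, 87.96, 74.68],
    "volume": [3, 7, 8, 5, 4],
})
out = constant_volume_bars(sample, cv=10)
print(out)
```

On the sample rows this gives three bars with volumes 10, 10 and 7, the same bars as in the output shown earlier.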