I am new to Dash and Plotly (second day), and my Python is still limited.
I have a simple dataframe loaded from a CSV with two columns: date (%d/%m/%Y) and value.
To build some graphs, and because I am new, I added two columns, one with the month and one with the year, both extracted from the date:
df["date_year"] = pd.to_datetime(df['date'], format = '%d/%m/%Y').dt.strftime('%Y')
df["date_months"] = pd.to_datetime(df['date'], format = '%d/%m/%Y').dt.strftime('%B')
With that, I was able to "easily" produce a graph with the sum of value by month for a selected year (the year is selected via a dropdown):
fig = px.histogram(df_filtered[mask], x='date_months', y='value', histfunc='sum')
Probably not pretty, but it works. What I can't get is almost the same thing, but as a line chart with one line per year.
I got something close with:
df_filtered["value_cumsum"] = df_filtered.groupby(['date_months','date_year'])['value'].cumsum()
fig2 = px.line(df_filtered, x="date_months", y="value_cumsum", color='date_year')
But that's not it: every row of each month is plotted, so the lines form a sawtooth. I want only one point per month, the last "value_cumsum".
Does anyone have an idea before I write more spaghetti code?
asked Mar 24 at 16:43 by David, edited Mar 24 at 17:46 by toolic
Comment: Is this a calculation or visualization problem? Did you manage to create a dataframe with the correct data? – Niko Fohr, Mar 28 at 14:16
1 Answer
I think what you want to do is plot a scatter instead of a line.
Not to confuse you, but there is a simpler way to get only the final cumulative value for each month, which is just the monthly sum: group with a pd.Grouper.
You can do it this way (the date column must already be datetime, e.g. converted with pd.to_datetime):
summary = df.groupby(pd.Grouper(key='date', freq='ME'))['value'].sum().reset_index()
Then extract the month and year just like you did before (the grouped date column is already datetime, so it does not need to be parsed again):
summary["date_year"] = summary['date'].dt.strftime('%Y')
summary["date_months"] = summary['date'].dt.strftime('%B')
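For reference, here is a minimal self-contained sketch of those two steps. The sample data is made up. It also uses the `'MS'` (month start) alias rather than `'ME'`, on the assumption that you may be running an older pandas: `'ME'` is only accepted from pandas 2.2 on, while `'MS'` works across versions and yields the same month and year once extracted.

```python
import pandas as pd

# Made-up sample data in the question's day/month/year format
df = pd.DataFrame({
    "date": ["05/11/2023", "20/11/2023", "10/12/2023",
             "03/01/2024", "15/02/2024", "28/02/2024"],
    "value": [10, 5, 7, 2, 4, 6],
})

# pd.Grouper(freq=...) needs a datetime column, so parse the strings first
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# One row per calendar month, holding the sum of `value` for that month
summary = (
    df.groupby(pd.Grouper(key="date", freq="MS"))["value"]
    .sum()
    .reset_index()
)

# Year and month name, extracted from the grouped (already datetime) column
summary["date_year"] = summary["date"].dt.strftime("%Y")
summary["date_months"] = summary["date"].dt.strftime("%B")

print(summary)
```

Note that grouping by frequency produces a row for every month between the first and last date, so a month with no data at all shows up with a sum of 0. Filter those rows out if you don't want them plotted.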
And finally you can plot scatter instead of line:
fig4 = px.scatter(summary, x="date_months", y="value", color="date_year",
labels={"date_months": "Month", "value": "Cumulative Value"})
fig4.show()