I am new to Dash and Plotly (second day), and my Python is still limited.

I have a simple dataframe loaded from a CSV with two columns: date (format %d/%m/%Y) and value.
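A minimal sketch of loading such a file, assuming a CSV with a date column in dd/mm/YYYY form and a value column. The inline CSV text is hypothetical sample data standing in for the real file. Converting the column to datetime once, right after loading, makes later year/month extraction and grouping simpler.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file: date (dd/mm/YYYY), value.
csv_text = """date,value
05/01/2023,10
20/01/2023,5
14/02/2023,7
03/01/2024,4
"""

df = pd.read_csv(io.StringIO(csv_text))
# Parse the dates once, with an explicit day-first format, so they are real datetimes.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

print(df.dtypes)
```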

I wanted to make some graphs. Because I am new, I added two columns to help with that: one with the month and one with the year (both extracted from the date):

import pandas as pd
dates = pd.to_datetime(df['date'], format='%d/%m/%Y')  # parse once, reuse below
df["date_year"] = dates.dt.strftime('%Y')
df["date_months"] = dates.dt.strftime('%B')
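A self-contained version of the column extraction, sketched on hypothetical sample data. The extra month_num column is an addition that is not in the question: month names produced by %B sort alphabetically (April before January), so a numeric month is handy whenever months need to be ordered chronologically.

```python
import pandas as pd

# Hypothetical sample data with dd/mm/YYYY strings, as described in the question.
df = pd.DataFrame({
    "date": ["05/01/2023", "14/02/2023", "03/01/2024"],
    "value": [10, 7, 4],
})

# Parse once and reuse the result instead of calling pd.to_datetime twice.
dates = pd.to_datetime(df["date"], format="%d/%m/%Y")
df["date_year"] = dates.dt.strftime("%Y")    # year as a string, e.g. "2023"
df["date_months"] = dates.dt.strftime("%B")  # month name (depends on the current locale)
df["month_num"] = dates.dt.month             # 1..12, sorts chronologically
```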

With that, I was able to "easily" produce a graph of the sum of value per month for a selected year (the year is chosen from a dropdown):

fig = px.histogram(df_filtered[mask], x='date_months', y='value', histfunc='sum')
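To check what the histogram should display, the same monthly totals can be computed directly in pandas. This is a sketch on hypothetical data; selected_year is a hypothetical stand-in for the dropdown value, and sort=False keeps months in order of first appearance, which assumes the rows are sorted by date.

```python
import pandas as pd

# Hypothetical sample data, already sorted by date.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["05/01/2023", "20/01/2023", "14/02/2023", "03/01/2024"], format="%d/%m/%Y"
    ),
    "value": [10, 5, 7, 4],
})
df["date_year"] = df["date"].dt.strftime("%Y")
df["date_months"] = df["date"].dt.strftime("%B")

selected_year = "2023"  # hypothetical value coming from the dropdown
mask = df["date_year"] == selected_year

# One total per month for the selected year, like histfunc='sum' in the histogram.
monthly = df[mask].groupby("date_months", sort=False)["value"].sum()
print(monthly)
```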

It is probably not pretty, but it works. What I can't get is almost the same thing, but with one line per year.

I have something close with:

df_filtered["value_cumsum"] = df_filtered.groupby(['date_months','date_year'])['value'].cumsum()
fig2 = px.line(df_filtered, x="date_months", y="value_cumsum", color='date_year')

But that's not it: every row of each month is plotted, so the lines form a "saw". I want only one point per month: the last "value_cumsum".
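One way to get a single point per month from this approach is to keep only the last running total in each (year, month) group with groupby(...).last(). The sketch below uses hypothetical sample data and assumes the rows are sorted by date (otherwise "last" is not the month's final value). Since the last running total of a group equals the group's sum, summing value directly gives the same numbers.

```python
import pandas as pd

# Hypothetical sample data, sorted by date.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["05/01/2023", "20/01/2023", "14/02/2023", "03/01/2024", "25/01/2024"],
        format="%d/%m/%Y",
    ),
    "value": [10, 5, 7, 4, 6],
})
df["date_year"] = df["date"].dt.strftime("%Y")
df["date_months"] = df["date"].dt.strftime("%B")

# The running total from the question: still one row per input row (the "saw").
df["value_cumsum"] = df.groupby(["date_months", "date_year"])["value"].cumsum()

# Keep only the final running total of each (year, month): one point per month.
last_points = (
    df.groupby(["date_year", "date_months"], sort=False)["value_cumsum"]
    .last()
    .reset_index()
)

# Equivalent shortcut: the last running total is simply the monthly sum.
monthly_sum = df.groupby(["date_year", "date_months"], sort=False)["value"].sum()
```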

Does anyone have an idea before I write more spaghetti code?

Asked Mar 24 at 16:43 by David; edited Mar 24 at 17:46 by toolic
  • Is this a calculation or visualization problem? Did you manage to create a dataframe with the correct data? – Niko Fohr, Mar 28 at 14:16

1 Answer

I think what you want is to plot a scatter instead of a line.

Not trying to confuse you, but there is a better way to get only the last cumulative value of each month, which is simply that month's total. If you use a pd.Grouper, you can do it this way:

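df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')  # pd.Grouper needs a datetime column
summary = df.groupby(pd.Grouper(key='date', freq='ME'))['value'].sum().reset_index()  # 'ME' = month end (pandas >= 2.2; 'M' on older versions)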
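A self-contained sketch of the Grouper approach on hypothetical data. It uses freq="MS" (month start) instead of the answer's "ME" only so that it runs on both older and newer pandas ("ME" requires pandas 2.2 or later); the monthly totals are the same, only the bin label differs. A time-based Grouper can also produce rows for months with no data (summed as 0), so the check below filters those out.

```python
import pandas as pd

# Hypothetical sample data with dd/mm/YYYY strings.
df = pd.DataFrame({
    "date": ["05/01/2023", "20/01/2023", "14/02/2023", "03/01/2024", "25/01/2024"],
    "value": [10, 5, 7, 4, 6],
})
# pd.Grouper(key=...) with a frequency needs a datetime column, not strings.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# One row per calendar month, labelled by the month's first day ("MS").
summary = df.groupby(pd.Grouper(key="date", freq="MS"))["value"].sum().reset_index()

# Drop empty months, which a time-based Grouper may fill in with a sum of 0.
non_empty = summary[summary["value"] > 0]
print(non_empty)
```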

Then extract the month and year just like you did before

summary["date_year"] = summary['date'].dt.strftime('%Y')  # 'date' is already a datetime after the Grouper
summary["date_months"] = summary['date'].dt.strftime('%B')
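Because the Grouper output already holds real datetimes, the .dt accessor works directly. The sketch below (hypothetical data) also builds a chronological list of month names with the standard calendar module. Passing it as category_orders={"date_months": month_order} to px.scatter or px.line is one way to stop Plotly Express from ordering months by first appearance in the data. month_order is a hypothetical helper name, and both calendar.month_name and %B follow the current locale.

```python
import calendar

import pandas as pd

# Hypothetical monthly totals, shaped like the Grouper output.
summary = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-31", "2023-02-28", "2024-01-31"]),
    "value": [15, 7, 10],
})
summary["date_year"] = summary["date"].dt.strftime("%Y")
summary["date_months"] = summary["date"].dt.strftime("%B")

# January..December in calendar order (index 0 of month_name is an empty string).
month_order = list(calendar.month_name)[1:]
```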

And finally, you can plot a scatter instead of a line:

fig4 = px.scatter(summary, x="date_months", y="value", color="date_year",
                  labels={"date_months": "Month", "value": "Cumulative Value"})
fig4.show()
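If "cumulative" is instead meant as a running total across the months of each year (year-to-date), one option is a per-year cumsum on the monthly totals. This is an assumption about the intent, sketched on hypothetical monthly data; ytd_value is a hypothetical column name.

```python
import pandas as pd

# Hypothetical monthly totals (one row per month), shaped like the Grouper output.
summary = pd.DataFrame({
    "date": pd.to_datetime(
        ["2023-01-31", "2023-02-28", "2023-03-31", "2024-01-31", "2024-02-29"]
    ),
    "value": [15, 7, 3, 10, 2],
})
summary["date_year"] = summary["date"].dt.strftime("%Y")
summary["date_months"] = summary["date"].dt.strftime("%B")

# Running total that restarts at the beginning of each year.
summary = summary.sort_values("date")
summary["ytd_value"] = summary.groupby("date_year")["value"].cumsum()
```

With one row per month, px.line(summary, x="date_months", y="ytd_value", color="date_year") draws one line per year without the saw pattern, since each month contributes a single point.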

Tags: Python plotly express line chart with cumulative sum - Stack Overflow