I've been working with a LOT of data, but for simplicity we can use the sample data below. What I'm trying to do is plot a line chart that lets me see the extremes, compresses the y-axis where no values are present, and expands it where data is present. The problem is shown in the picture below: there is no data between 500 and 3,500, yet that range leaves a huge gap, and the employee lines collapse into an almost solid line at the bottom.

What I'd like is a line chart that still includes the extremes (the totals) without the huge empty gap, so the employee sales stay readable like this, but with the Totals line also shown at the top:

Here's the code I have so far, but I need to apply it to a much larger data set containing hundreds of "employees" across multiple stores. The extreme values would be something like "Store_A": 9560 and "Store_B": 6470, while normal employee sales range only from 0 to about 300 (and 300 is not a fixed ceiling; some weeks we have people who do more than 300).

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Sample data
data = {
    'Date': ['2025-01-10', '2025-01-17', '2025-01-24', '2025-01-31'],
    'Bob': [156, 60, 58, 62],
    'Joe': [37, 40, 139, 42],
    'Sally': [62, 265, 63, 67],
    'Total': [3698, 3750, 3720, 3800]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Melt the DataFrame to long format
df_melted = df.melt(id_vars='Date', var_name='Employee', value_name='Sales')

# Convert Date column to datetime
df_melted['Date'] = pd.to_datetime(df_melted['Date'])

# Create the line plot
plt.figure(figsize=(12, 8))
sns.lineplot(x='Date', y='Sales', hue='Employee', data=df_melted, marker='o')

# Set y-axis to display whole numbers only
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{int(x):,}'))

# Add labels and title
plt.xlabel('Date')
plt.ylabel('Number of Sales')
plt.title('Employee Sales Data Over Time')

# Attempted fix: denser ticks at the low end. Note that set_yticks only chooses where tick marks go;
# it does not rescale the axis, so the empty range between the employees and the totals remains.
custom_ticks = [0, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100] #+ list(range(200, 4001, 1000))
plt.gca().set_yticks(custom_ticks)

# Show the plot
plt.show()
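Because the "normal" ceiling of about 300 changes from week to week, a hard-coded tick list like `custom_ticks` will not scale to hundreds of employees. A minimal sketch, assuming the aggregate columns (here `'Total'`; in the real data something like `'Store_A'`, `'Store_B'`) are known by name, that derives two y-ranges from the data itself: one for individual sales and one for the totals. The helper name `band_limits` and the 5 % padding are hypothetical choices, not part of the original code.

```python
import pandas as pd


def band_limits(df, total_cols, pad=0.05):
    """Return ((low_min, low_max), (high_min, high_max)) y-limits.

    df         -- wide DataFrame: a 'Date' column plus one column per series
    total_cols -- names of the aggregate columns, e.g. ['Total'] or store totals
    pad        -- fractional headroom added around each band (hypothetical default)
    """
    value_cols = [c for c in df.columns if c != 'Date']
    employee_cols = [c for c in value_cols if c not in total_cols]

    # Low band: 0 up to the largest individual value actually present, so a
    # week where someone sells more than 300 is not clipped.
    low = (0.0, float(df[employee_cols].to_numpy().max()) * (1 + pad))

    # High band: fitted tightly around the aggregate series only.
    totals = df[total_cols].to_numpy()
    high = (float(totals.min()) * (1 - pad), float(totals.max()) * (1 + pad))
    return low, high


if __name__ == '__main__':
    df = pd.DataFrame({
        'Date': ['2025-01-10', '2025-01-17', '2025-01-24', '2025-01-31'],
        'Bob': [156, 60, 58, 62],
        'Joe': [37, 40, 139, 42],
        'Sally': [62, 265, 63, 67],
        'Total': [3698, 3750, 3720, 3800],
    })
    low, high = band_limits(df, ['Total'])
    print('employee band:', low)
    print('totals band:', high)
```

The two ranges can be passed to `set_ylim` on separate axes, or used simply to see how much of a single linear axis would be empty space.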

Asked by Ryan Barnes
  • A broken y-axis would remove part of the y range. (If a log scale fits your use case, that would be an easier option.) – JohanC

1 Answer

What about a log scale on the y-axis? It compresses the large empty range between the employee values and the totals:

plt.yscale('log')

Tags: pandas, Python, Seaborn. Question: "Seaborn line chart: make extremes and normal values legible" (Stack Overflow)