I'm taking an online class, and there appears to be a glitch in the coursework that seems to stem from differences between Pandas versions. The code the course provides does not run. The course does supply a patch to repair it, but that patch doesn't appear to be necessary in one place, and it fails in another.
We want to group by planet type and then subdivide further by whether the planet has a magnetic field. That is six groups in theory, but only four groups actually occur in the data. Once we have these groups, we want to perform some sum and agg operations.
The patch was delivered just before this exercise: we are told that, to avoid an error, we need to pass numeric_only=True to the sum() function. My current environment doesn't throw an error without this change, though; it just concatenates the non-numeric values.
The real problem is the code at the bottom of the block, which uses the agg() function. I think the problem is that I'm trying to perform mathematical operations on non-numeric data; specifically, I believe the "rings" column is a bool type. I have been able to adjust the parameters of the mean and max functions individually (so that they only assess numeric columns), but I can't make this adjustment on agg() because it does not have this parameter. Without a numeric-only setting on agg(), the coursework code produces an error.
If I pursue my own fix as outlined above, and separate the mean() and max() operations and perform them individually, I can pass numeric_only=True to each:
print(planets.groupby(['type', 'magnetic_field']).max(numeric_only=True))
print(planets.groupby(['type', 'magnetic_field']).mean(numeric_only=True))
This does produce all the correct data, albeit less efficiently. And shouldn't these two functions have the same parameters as agg(), since they are all part of the pandas aggregate functions?
Aside from that question, there is the issue of reproducing the coursework results: I want all the data in the same output DataFrame. If I separate these functions and adjust the parameters individually, I can collect the data correctly, but much less efficiently, and the coursework wants all the output in the same printout. Any ideas what I'm missing in this syntax to do it in one execution? THANKS!!
import numpy as np
import pandas as pd

data = {'planet': ['Mercury', 'Venus', 'Earth', 'Mars',
                   'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
        'radius_km': [2440, 6052, 6371, 3390, 69911, 58232,
                      25362, 24622],
        'moons': [0, 0, 1, 2, 80, 83, 27, 14],
        'type': ['terrestrial', 'terrestrial', 'terrestrial', 'terrestrial',
                 'gas giant', 'gas giant', 'ice giant', 'ice giant'],
        'rings': ['no', 'no', 'no', 'no', 'yes', 'yes', 'yes', 'yes'],
        'mean_temp_c': [167, 464, 15, -65, -110, -140, -195, -200],
        'magnetic_field': ['yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'yes']}
planets = pd.DataFrame(data)

P = planets.groupby(['type', 'magnetic_field']).agg(['mean', 'max'])
print(P)
Asked yesterday by PleaseBeNice; edited yesterday by BigBen.
2 Comments
- You should check the data types in your DataFrame. The rings column seems to contain "yes"/"no" strings, which may be causing issues during aggregation. If you convert it to bool (True/False), does that help? – bsraskr Commented yesterday
- What is your expected output? – iBeMeltin Commented yesterday
3 Answers
Not sure what the exact expected output is, but you can convert the yes/no values to booleans and select the desired dtypes before aggregation:
planets = (planets
    .replace({'yes': True, 'no': False})
    .convert_dtypes()
)
cols = planets.select_dtypes(['number', 'boolean']).columns
P = (planets.groupby(['type', 'magnetic_field'])[cols]
    .agg(['mean', 'max'])
)
Output:
                                 radius_km     moons        rings   mean_temp_c magnetic_field
                               mean    max  mean max  mean    max    mean   max    mean    max
type        magnetic_field
gas giant   True            64071.5  69911  81.5  83   1.0   True  -125.0  -110     1.0   True
ice giant   True            24992.0  25362  20.5  27   1.0   True  -197.5  -195     1.0   True
terrestrial False            4721.0   6052   1.0   2   0.0  False   199.5   464     0.0  False
            True             4405.5   6371   0.5   1   0.0  False    91.0   167     1.0   True
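A narrower variant of the same idea, as a minimal sketch: the DataFrame-wide replace above touches every column, so one alternative is to map only the two yes/no columns and cast them to plain bool. This assumes every value in those columns is exactly 'yes' or 'no'; the names yes_no and value_cols are only illustrative, and magnetic_field is left out of the aggregated columns here because it is already a grouping key.

```python
import pandas as pd

planets = pd.DataFrame({
    'planet': ['Mercury', 'Venus', 'Earth', 'Mars',
               'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'radius_km': [2440, 6052, 6371, 3390, 69911, 58232, 25362, 24622],
    'moons': [0, 0, 1, 2, 80, 83, 27, 14],
    'type': ['terrestrial'] * 4 + ['gas giant'] * 2 + ['ice giant'] * 2,
    'rings': ['no'] * 4 + ['yes'] * 4,
    'mean_temp_c': [167, 464, 15, -65, -110, -140, -195, -200],
    'magnetic_field': ['yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'yes'],
})

# Assumption: these columns contain only the strings 'yes' and 'no'.
yes_no = {'yes': True, 'no': False}
for col in ['rings', 'magnetic_field']:
    planets[col] = planets[col].map(yes_no).astype(bool)

# Aggregate the numeric columns plus the boolean 'rings' column.
value_cols = ['radius_km', 'moons', 'rings', 'mean_temp_c']
P = planets.groupby(['type', 'magnetic_field'])[value_cols].agg(['mean', 'max'])
print(P)
```

With boolean values, the mean of rings reads as the fraction of planets in each group that have rings.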
You can give specific aggregations for each column, and just leave out the non-numeric columns.
planets.groupby(['type', 'magnetic_field']).agg(
    mean_radius=('radius_km', 'mean'), max_radius=('radius_km', 'max'),
    mean_moons=('moons', 'mean'), max_moons=('moons', 'max'),
    mean_temp=('mean_temp_c', 'mean'), max_temp=('mean_temp_c', 'max')
)
                            mean_radius  max_radius  mean_moons  max_moons  mean_temp  max_temp
type        magnetic_field
gas giant   yes                 64071.5       69911        81.5         83     -125.0      -110
ice giant   yes                 24992.0       25362        20.5         27     -197.5      -195
terrestrial no                   4721.0        6052         1.0          2      199.5       464
            yes                  4405.5        6371         0.5          1       91.0       167
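If typing out every pair gets tedious, the same named aggregation can be generated rather than written by hand. This is only a sketch: the stat_column naming scheme (for example mean_radius_km) is my own choice and differs from the shorter names used above.

```python
import pandas as pd

planets = pd.DataFrame({
    'planet': ['Mercury', 'Venus', 'Earth', 'Mars',
               'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'radius_km': [2440, 6052, 6371, 3390, 69911, 58232, 25362, 24622],
    'moons': [0, 0, 1, 2, 80, 83, 27, 14],
    'type': ['terrestrial'] * 4 + ['gas giant'] * 2 + ['ice giant'] * 2,
    'rings': ['no'] * 4 + ['yes'] * 4,
    'mean_temp_c': [167, 464, 15, -65, -110, -140, -195, -200],
    'magnetic_field': ['yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'yes'],
})

# Only the numeric columns take part in the aggregation.
numeric_cols = planets.select_dtypes('number').columns

# Build {output_name: (source_column, function_name)} for every pair.
# The "<stat>_<column>" naming is an illustrative choice, not a pandas convention.
named = {
    f'{stat}_{col}': (col, stat)
    for col in numeric_cols
    for stat in ('mean', 'max')
}

P = planets.groupby(['type', 'magnetic_field']).agg(**named)
print(P)
```

The result has flat column names instead of a two-level header, which can be easier to work with downstream.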
You can select the numeric columns with .select_dtypes('number') [and .select_dtypes('object') for the categorical ones]:
numeric_cols = planets.select_dtypes('number').columns
P = planets.groupby(['type', 'magnetic_field'])[numeric_cols].agg(['mean', 'max'])
display(P)
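To connect this back to the question, here is a small sketch showing that selecting the numeric columns first and calling agg(['mean', 'max']) once gives the same numbers as the two separate numeric_only=True calls glued together. The variable names combined and separate are just for illustration, and print is used instead of display so it also runs outside a notebook.

```python
import pandas as pd

planets = pd.DataFrame({
    'planet': ['Mercury', 'Venus', 'Earth', 'Mars',
               'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'radius_km': [2440, 6052, 6371, 3390, 69911, 58232, 25362, 24622],
    'moons': [0, 0, 1, 2, 80, 83, 27, 14],
    'type': ['terrestrial'] * 4 + ['gas giant'] * 2 + ['ice giant'] * 2,
    'rings': ['no'] * 4 + ['yes'] * 4,
    'mean_temp_c': [167, 464, 15, -65, -110, -140, -195, -200],
    'magnetic_field': ['yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'yes'],
})

grouped = planets.groupby(['type', 'magnetic_field'])
numeric_cols = planets.select_dtypes('number').columns.tolist()

# One pass: restrict to numeric columns, then apply both aggregations.
combined = grouped[numeric_cols].agg(['mean', 'max'])

# Two passes: the asker's workaround, joined side by side afterwards.
separate = pd.concat(
    {'mean': grouped.mean(numeric_only=True),
     'max': grouped.max(numeric_only=True)},
    axis=1,
)
# concat puts the statistic on the outer column level; swap the levels and
# reorder the columns so the layout matches `combined`.
separate = separate.swaplevel(axis=1).reindex(columns=combined.columns)

print(combined)
print(combined.equals(separate))
```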