How can I extract absolute beta coefficients for all levels of multiple categorical variables in statsmodels?
I’m performing linear regression in Python with statsmodels. I have two categorical predictors:
- sample: a factor with 8 levels
- distractor: a factor with 2 levels
My goal is to determine the “absolute” beta (effect) for each level of each variable. When I fit the model with an intercept using treatment (dummy) coding (the default), statsmodels reports coefficients as differences relative to the reference (baseline) level. For example, consider the following output:
Intercept            5.076e-04
C(sample)[T.1]      -2.333e-18
C(sample)[T.2]      -1.558e-18
C(sample)[T.3]      -7.167e-19
C(sample)[T.4]      -1.402e-18
C(sample)[T.5]       7.694e-04
C(sample)[T.6]       5.478e-19
C(sample)[T.7]       4.516e-03
C(distractor)[T.9]  -1.015e-03
Here, the intercept represents the predicted response when sample is at its reference level (level 0) and distractor is at its reference level (level 8). The coefficient for C(distractor)[T.9] is then the difference between distractor level 9 and level 8. That means that the "absolute" beta for distractor level 8 is just the intercept, and for distractor level 9 it is (Intercept + -1.015e-03).
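To convince myself that this treatment-coding arithmetic is right, here is a minimal numpy-only sketch (made-up numbers, a single factor with two levels, not my real data) showing that the intercept equals the mean of the reference group and that intercept + contrast equals the mean of the other group:

```python
import numpy as np

# Hypothetical one-factor check: a 2-level factor with treatment coding.
y = np.array([1.0, 1.2, 2.0, 2.2])   # made-up responses
level = np.array([0, 0, 1, 1])       # 0 = reference level, 1 = other level

# Design matrix: intercept column + one dummy for level 1 (treatment coding)
X = np.column_stack([np.ones(4), (level == 1).astype(float)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta[0])            # intercept: mean of the reference group (≈ 1.1)
print(beta[0] + beta[1])  # "absolute" beta for level 1: mean of that group (≈ 2.1)
```

So adding the intercept to each reported contrast (and using zero for the reference level) does reconstruct the per-level predicted values, at least in this one-factor case.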
My confusion is:
- Is it expected that the reference level for both factors is represented solely by the intercept (i.e. that the first level of all variables always has the same beta value)?
- How do I extract a full set of betas (i.e. 8 for sample and 2 for distractor) from the model?
I tried removing the intercept (using - 1 in the formula), but then statsmodels still dropped one dummy variable for distractor due to collinearity (even though distractor clearly has two levels when modeled alone, as shown by fitting response ~ C(distractor) - 1 which returns two coefficients). The two factors are independent.
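I suspect the collinearity is structural rather than a statsmodels quirk: the full set of dummy columns for each factor sums to the all-ones vector, so including all levels of both factors necessarily makes the design matrix rank deficient. A small numpy sketch (hypothetical 4-row design, two 2-level factors, no statsmodels involved) illustrates this:

```python
import numpy as np

# Hypothetical design: 4 observations, two factors with 2 levels each.
sample = np.array([0, 0, 1, 1])
distractor = np.array([0, 1, 0, 1])

S = np.eye(2)[sample]       # full one-hot columns for sample
D = np.eye(2)[distractor]   # full one-hot columns for distractor
X = np.hstack([S, D])       # 4 columns, no intercept

# Each factor's dummy columns sum to the all-ones vector, so the
# 4 columns are linearly dependent: the rank is 3, not 4.
print(np.linalg.matrix_rank(X))  # 3
```

This is presumably why statsmodels still drops one distractor dummy even with `- 1` in the formula: removing the intercept frees up only one degree of freedom, which the first factor's full set of dummies already uses.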
What is the proper way to obtain “absolute” beta values for all levels? Is it correct to compute them by adding the intercept to the reported contrasts (using zero for the reference level)? If so, is there any cleaner method in statsmodels to directly return a parameter for each level?
Example dummy code:
import pandas as pd
import statsmodels.formula.api as smf
# Create dummy data
data = pd.DataFrame({
    'response': [0.51, 0.52, 0.53, 0.54, 0.60, 0.61, 0.62, 0.63, 0.55, 0.56],
    'sample': ['0', '1', '2', '3', '4', '5', '6', '7', '0', '1'],      # 8 levels (as strings)
    'distractor': ['8', '8', '8', '8', '9', '9', '9', '9', '8', '9']   # 2 levels
})
# Model with intercept (default treatment coding)
model_with_int = smf.ols('response ~ C(sample) + C(distractor)', data=data).fit()
print("Model with intercept:")
print(model_with_int.params)
# Illustrative output shape (the coefficient values above come from my real
# data, not this dummy data):
# Intercept           (predicted response at sample=0, distractor=8)
# C(sample)[T.1]      (difference between sample 1 and sample 0)
# ...
# C(distractor)[T.9]  (difference between distractor 9 and distractor 8)
# To get the "absolute" beta for each level:
# For sample:
# Level 0 beta = Intercept
# Level 1 beta = Intercept + C(sample)[T.1]
# ... and so on.
# For distractor:
# Level 8 beta = Intercept
# Level 9 beta = Intercept + C(distractor)[T.9]
print("\nAbsolute beta values:")
abs_beta_sample = {}
abs_beta_distractor = {}
intercept = model_with_int.params['Intercept']
# For sample, assume reference level is '0'
abs_beta_sample['0'] = intercept
for lvl in ['1', '2', '3', '4', '5', '6', '7']:
    coef_name = f"C(sample)[T.{lvl}]"
    abs_beta_sample[lvl] = intercept + model_with_int.params.get(coef_name, 0)
# For distractor, assume reference level is '8'
abs_beta_distractor['8'] = intercept
abs_beta_distractor['9'] = intercept + model_with_int.params.get("C(distractor)[T.9]", 0)
print("Sample beta values:", abs_beta_sample)
print("Distractor beta values:", abs_beta_distractor)
I would appreciate any guidance on whether this is the correct approach or if there’s a better way to directly obtain the full set of betas from the model.