admin管理员组

文章数量:1122846

I'm outside my domain expertise and even though I've created an 'answer' (with lots of googling and ChatGPT) I need someone to let me know if I'm doing what I intended to do.

I'm following a tutorial on Creating an Expected Points Model Inspired by Pythagorean Expectation. He uses a solver from Excel to find the optimal exponent coefficient c by looking for the lowest mse. It is my understanding that optimize() does a similar job.

I have come up with the following function but don't understand how the values are getting passed to optimize() and why it looks like the function produces three values predicted_xPts_perc, se, and mean(se). Shouldn't it provide two?

# Objective function
objective_function <- function(c) {
  predicted_xPts_perc <- data$xg_for_per_match^c / (data$xg_for_per_match^c + data$xg_against_per_match^c)
  se <- (predicted_xPts_perc - data$actual_points_perc)^2
  mean(se)
}

# Optimize the objective
result <- optimize(f = objective_function, interval = c(0, 5), lower = 1, upper = 3)
optimal_c <- result$minimum
print(optimal_c)

Here is the data (sample) and full code block:

data <- structure(list(xg_for_per_match = c(2.431, 2.256, 2.399, 1.86777777777778, 
                                    2.56, 1.776, 1.59222222222222, 1.665, 0.907, 1.691, 1.52555555555556, 
                                    0.56, 1.46555555555556, 0.754, 1.955, 1.285, 1.38428571428571, 
                                    1.282, 1.86625, 1.5075), xg_against_per_match = c(0.757, 0.682, 
                                                                                      0.739, 1.40555555555556, 0.849, 1.409, 1.17777777777778, 1.822, 
                                                                                      1.938, 2.489, 1.70666666666667, 3.136, 2.37555555555556, 2.985, 
                                                                                      0.78375, 1.56, 1.43, 0.812, 0.945, 0.94875), actual_points_perc = c(0.866666666666667, 
                                                                                                                                                          0.766666666666667, 0.733333333333333, 0.666666666666667, 0.6, 
                                                                                                                                                          0.6, 0.592592592592593, 0.566666666666667, 0.4, 0.4, 0.37037037037037, 
                                                                                                                                                          0.2, 0.111111111111111, 0, 0.791666666666667, 0.625, 0.619047619047619, 
                                                                                                                                                          0.6, 0.541666666666667, 0.416666666666667), se = c(0.0020194397996264, 
                                                                                                                                                                                                             0.0223794165299103, 0.0323996328399904, 0.000796304748754376, 
                                                                                                                                                                                                             0.0905484326863811, 0.00018817673840017, 0.00288909832711458, 
                                                                                                                                                                                                             0.0124545498241967, 0.0485423096732081, 0.0070889390780061, 0.0054423086667482, 
                                                                                                                                                                                                             0.0285940157162698, 0.027082829359042, 0.00359736139900119, 0.00488179512531288, 
                                                                                                                                                                                                             0.048737642866935, 0.0183025693952632, 0.0129244422758347, 0.0646460987510766, 
                                                                                                                                                                                                             0.0897732500472823)), row.names = c(NA, -20L), class = c("tbl_df", 
                                                                                                                                                                                                                                                                      "tbl", "data.frame"))

# Objective function
objective_function <- function(c) {
  predicted_xPts_perc <- data$xg_for_per_match^c / (data$xg_for_per_match^c + data$xg_against_per_match^c)
  se <- (predicted_xPts_perc - data$actual_points_perc)^2
  mean(se)
}

# Optimize the objective
result <- optimize(f = objective_function, interval = c(0, 5), lower = 1, upper = 3)
optimal_c <- result$minimum
print(optimal_c)

I'm outside my domain expertise and even though I've created an 'answer' (with lots of googling and ChatGPT) I need someone to let me know if I'm doing what I intended to do.

I'm following a tutorial on Creating an Expected Points Model Inspired by Pythagorean Expectation. He uses a solver from Excel to find the optimal exponent coefficient c by looking for the lowest mse. It is my understanding that optimize() does a similar job.

I have come up with the following function but don't understand how the values are getting passed to optimize() and why it looks like the function produces three values predicted_xPts_perc, se, and mean(se). Shouldn't it provide two?

# Objective function
objective_function <- function(c) {
  predicted_xPts_perc <- data$xg_for_per_match^c / (data$xg_for_per_match^c + data$xg_against_per_match^c)
  se <- (predicted_xPts_perc - data$actual_points_perc)^2
  mean(se)
}

# Optimize the objective
result <- optimize(f = objective_function, interval = c(0, 5), lower = 1, upper = 3)
optimal_c <- result$minimum
print(optimal_c)

Here is the data (sample) and full code block:

data <- structure(list(xg_for_per_match = c(2.431, 2.256, 2.399, 1.86777777777778, 
                                    2.56, 1.776, 1.59222222222222, 1.665, 0.907, 1.691, 1.52555555555556, 
                                    0.56, 1.46555555555556, 0.754, 1.955, 1.285, 1.38428571428571, 
                                    1.282, 1.86625, 1.5075), xg_against_per_match = c(0.757, 0.682, 
                                                                                      0.739, 1.40555555555556, 0.849, 1.409, 1.17777777777778, 1.822, 
                                                                                      1.938, 2.489, 1.70666666666667, 3.136, 2.37555555555556, 2.985, 
                                                                                      0.78375, 1.56, 1.43, 0.812, 0.945, 0.94875), actual_points_perc = c(0.866666666666667, 
                                                                                                                                                          0.766666666666667, 0.733333333333333, 0.666666666666667, 0.6, 
                                                                                                                                                          0.6, 0.592592592592593, 0.566666666666667, 0.4, 0.4, 0.37037037037037, 
                                                                                                                                                          0.2, 0.111111111111111, 0, 0.791666666666667, 0.625, 0.619047619047619, 
                                                                                                                                                          0.6, 0.541666666666667, 0.416666666666667), se = c(0.0020194397996264, 
                                                                                                                                                                                                             0.0223794165299103, 0.0323996328399904, 0.000796304748754376, 
                                                                                                                                                                                                             0.0905484326863811, 0.00018817673840017, 0.00288909832711458, 
                                                                                                                                                                                                             0.0124545498241967, 0.0485423096732081, 0.0070889390780061, 0.0054423086667482, 
                                                                                                                                                                                                             0.0285940157162698, 0.027082829359042, 0.00359736139900119, 0.00488179512531288, 
                                                                                                                                                                                                             0.048737642866935, 0.0183025693952632, 0.0129244422758347, 0.0646460987510766, 
                                                                                                                                                                                                             0.0897732500472823)), row.names = c(NA, -20L), class = c("tbl_df", 
                                                                                                                                                                                                                                                                      "tbl", "data.frame"))

# Objective function
objective_function <- function(c) {
  predicted_xPts_perc <- data$xg_for_per_match^c / (data$xg_for_per_match^c + data$xg_against_per_match^c)
  se <- (predicted_xPts_perc - data$actual_points_perc)^2
  mean(se)
}

# Optimize the objective
result <- optimize(f = objective_function, interval = c(0, 5), lower = 1, upper = 3)
optimal_c <- result$minimum
print(optimal_c)
Share Improve this question asked Nov 22, 2024 at 1:50 seansteeleseansteele 6813 silver badges14 bronze badges 1
  • 1 everything you did is correct – Onyambu Commented Nov 22, 2024 at 2:23
Add a comment  | 

1 Answer 1

Reset to default 1

That does work, but R has the nls() function for performing non-linear least square fits.

model <- nls(actual_points_perc ~ xg_for_per_match^c / (xg_for_per_match^c + xg_against_per_match^c), data=data, start=list(c=1))

summary(model)
# Formula: actual_points_perc ~ xg_for_per_match^c/(xg_for_per_match^c + 
#                                                      xg_against_per_match^c)
# 
# Parameters:
#    Estimate Std. Error t value Pr(>|t|)    
# c   1.0249     0.1935   5.297 4.11e-05 ***
#    ---
#    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.1242 on 19 degrees of freedom
# 
# Number of iterations to convergence: 2 
# Achieved convergence tolerance: 9.647e-06

本文标签: optimizationUsing optimize() in R for the first timeam I doing this rightStack Overflow