admin管理员组文章数量:1357712
I am investigating profiles of downwelling light within the ocean, and attempting the 'dark signal identification' process in .1175/JTECH-D-15-0193.1.
I have data that is formatted like this (super simplified - most profiles have around 200-300 observations):
Profile | Depth | light |
---|---|---|
1 | 0.2 | 2 |
1 | 1 | 3 |
2 | 0.1 | 2 |
2 | 1 | 3 |
I am investigating profiles of downwelling light within the ocean, and attempting the 'dark signal identification' process in https://doi./10.1175/JTECH-D-15-0193.1.
I have data that is formatted like this (super simplified - most profiles have around 200-300 observations):
Profile | Depth | light |
---|---|---|
1 | 0.2 | 2 |
1 | 1 | 3 |
2 | 0.1 | 2 |
2 | 1 | 3 |
essentially for each individual profile, I need to test the light values for normal distribution with a Lillifors test.
if it is not normal, the shallowest point (according to depth) is removed and the test repeated, until the light data tests positive for normal distribution. These remaining points would then be labeled 'dark'.
I would like to have an output like this:
Profile | Depth | light | signal |
---|---|---|---|
1 | 0.2 | 2 | light |
1 | 1 | 3 | dark |
2 | 0.1 | 2 | light |
2 | 1 | 3 | dark |
I have tried
library(nortest)
library(dplyr)
# test dataset
profile = c(1)
depth = seq(1,250,1)
PAR = seq(10,1,along.with = depth)
PAR <- log(PAR) # to try and mimic exponential change in light with depth
p1 <- data.frame(profile = profile,
depth = depth,
PAR = PAR)
profile = c(2)
p2 <- data.frame(profile = profile,
depth = depth,
PAR = PAR)
data <- rbind(p1,p2)
p1[ , 'signal'] = NA
on single 'profile': 1 to see if I can get the basics right.
for (i in p1) {
ltest <- lillie.test(p1$PAR)
if (ltest$p.value < 0.01) {
p1 <- mutate(p1$signal == "light")
p1 <- filter(p1, depth != min(depth))
} else if (ltest$p.value > 0.01) {
p1 <- mutate(p1$signal == "dark" )
}
}
I get an error saying:
"Error in UseMethod("mutate") : no applicable method for 'mutate' applied to an object of class "logical"
Share Improve this question edited Mar 31 at 15:41 r2evans 162k7 gold badges86 silver badges168 bronze badges Recognized by R Language Collective asked Mar 31 at 15:15 amazing_zebraamazing_zebra 1 New contributor amazing_zebra is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct. 5 |1 Answer
Reset to default 0Your sample data
(even per profile) is way too normal, I'll fake it in order to produce a failing p-value.
set.seed(42)
depth = seq(1,250,1)
PAR = log(seq(10, 1, along.with = depth))
p1 <- data.frame(profile = 1, depth = depth, PAR = PAR)
p2 <- data.frame(profile = 2, depth = depth, PAR = PAR)
p1$PAR[1:110] <- runif(110)
p2$PAR[1:150] <- runif(150)
data <- rbind(p1, p2)
nortest::lillie.test(p1$PAR)$p.value
# [1] 0.01083541
nortest::lillie.test(p2$PAR)$p.value
# [1] 0.05409578
Here's a helper function that, when given one profile's worth of data at a time, will return "light"
and "dark"
per your spec:
fun <- function(val, dep, cutoff = 0.01) {
ord <- order(dep)
val <- val[ord]
good <- FALSE
while (length(val) > 3) {
if (nortest::lillie.test(val)$p.value < cutoff) {
good <- TRUE
break
}
val <- val[-1]
}
if (good) {
out <- rep("light", length(dep))
out[seq_len(length(dep) - length(val))] <- "dark"
} else {
out <- rep("dark", length(dep))
}
out[order(ord)] # yes, another order
}
This can be done using dplyr
if desired:
data <- data |>
mutate(.by = profile, signal = fun(PAR, depth))
head(data)
# profile depth PAR signal
# 1 1 1 0.9148060 dark
# 2 1 2 0.9370754 dark
# 3 1 3 0.2861395 light
# 4 1 4 0.8304476 light
# 5 1 5 0.6417455 light
# 6 1 6 0.5190959 light
filter(data, profile == 2) |>
head()
# profile depth PAR signal
# 1 2 1 0.6089375 dark
# 2 2 2 0.8368016 dark
# 3 2 3 0.7515226 dark
# 4 2 4 0.4527316 dark
# 5 2 5 0.5357900 dark
# 6 2 6 0.5373767 dark
### profile==2 is light starting at depth==27 with this random data
A base R version could be
data$signal = ave(seq_len(nrow(data)), profile,
FUN = function(i) fun(data$PAR[i], data$depth[i]))
where ave
is being used to do the grouped calculations. Since ave
only works on one "column"/vector at a time, I'm using an index seq_along(PAR)
instead as a cheater to be able to process two (or more) columns per group of profile
.
本文标签:
版权声明:本文标题:loops - in R, How to test a dataset, then remove a point, and repeat the process for grouped data? - Stack Overflow 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1743938642a2565151.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
mutate(.)
is not correct. Its first argument (whether literal or implicitly after a pipe) must be a frame, always, as inmutate(p1, ...)
orp1 |> mutate(...)
. "Almost" never use$
in a pipe, the exceptions when it makes sense are few and far between; at best it is inefficient, next-best is a clear error, but it can easily produce incorrect/corrupted results. Further, the use of==
in your expression suggests you might be intending to filter the data, as infilter(p1, signal == "light")
, though perhaps you are intending to "assign" instead of test for value equality? – r2evans Commented Mar 31 at 15:38profile
column. – r2evans Commented Mar 31 at 16:32set.seed(28931);nortest::lillie.test(runif(250))$p.value
. – jay.sf Commented Mar 31 at 17:05