admin管理员组

文章数量:1357712

I am investigating profiles of downwelling light within the ocean, and attempting the 'dark signal identification' process in .1175/JTECH-D-15-0193.1.

I have data that is formatted like this (super simplified - most profiles have around 200-300 observations):

Profile Depth light
1 0.2 2
1 1 3
2 0.1 2
2 1 3

I am investigating profiles of downwelling light within the ocean, and attempting the 'dark signal identification' process in https://doi./10.1175/JTECH-D-15-0193.1.

I have data that is formatted like this (super simplified - most profiles have around 200-300 observations):

Profile Depth light
1 0.2 2
1 1 3
2 0.1 2
2 1 3

essentially for each individual profile, I need to test the light values for normal distribution with a Lillifors test.

if it is not normal, the shallowest point (according to depth) is removed and the test repeated, until the light data tests positive for normal distribution. These remaining points would then be labeled 'dark'.

I would like to have an output like this:

Profile Depth light signal
1 0.2 2 light
1 1 3 dark
2 0.1 2 light
2 1 3 dark

I have tried

library(nortest)
library(dplyr)

# test dataset
profile = c(1)
depth = seq(1,250,1)
PAR = seq(10,1,along.with = depth)
PAR <- log(PAR) # to try and mimic exponential change in light with depth

p1 <- data.frame(profile = profile,
                   depth = depth,
                   PAR = PAR)

profile = c(2)
p2 <- data.frame(profile = profile,
                 depth = depth,
                 PAR = PAR)

data <- rbind(p1,p2)

p1[ , 'signal'] = NA

on single 'profile': 1 to see if I can get the basics right.

for (i in p1) {
  ltest <- lillie.test(p1$PAR)
  if (ltest$p.value < 0.01) {
    p1 <- mutate(p1$signal == "light")
    p1 <- filter(p1, depth != min(depth))
  } else if (ltest$p.value > 0.01) {
    p1 <- mutate(p1$signal == "dark" )
  }
}

I get an error saying:

"Error in UseMethod("mutate") : no applicable method for 'mutate' applied to an object of class "logical"

Share Improve this question edited Mar 31 at 15:41 r2evans 162k7 gold badges86 silver badges168 bronze badges Recognized by R Language Collective asked Mar 31 at 15:15 amazing_zebraamazing_zebra 1 New contributor amazing_zebra is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct. 5
  • Your use of mutate(.) is not correct. Its first argument (whether literal or implicitly after a pipe) must be a frame, always, as in mutate(p1, ...) or p1 |> mutate(...). "Almost" never use $ in a pipe, the exceptions when it makes sense are few and far between; at best it is inefficient, next-best is a clear error, but it can easily produce incorrect/corrupted results. Further, the use of == in your expression suggests you might be intending to filter the data, as in filter(p1, signal == "light"), though perhaps you are intending to "assign" instead of test for value equality? – r2evans Commented Mar 31 at 15:38
  • I suggest you find some time to look at dplyr.tidyverse./articles/dplyr.html and other articles, including the discussion on Grouped data that is appropriate here with your profile column. – r2evans Commented Mar 31 at 16:32
  • 2 I'm not sure if this is a good paper, check: set.seed(28931);nortest::lillie.test(runif(250))$p.value. – jay.sf Commented Mar 31 at 17:05
  • @jay.sf I agree ... I took the real question here to be about how to do a reduce-until-pval issue regardless of the actual test being used, KS or otherwise. – r2evans Commented Mar 31 at 17:32
  • @jay.sf is the Lilliefors test unreliable? – amazing_zebra Commented Mar 31 at 18:32
Add a comment  | 

1 Answer 1

Reset to default 0

Your sample data (even per profile) is way too normal, I'll fake it in order to produce a failing p-value.

set.seed(42)
depth = seq(1,250,1)
PAR = log(seq(10, 1, along.with = depth))
p1 <- data.frame(profile = 1, depth = depth, PAR = PAR)
p2 <- data.frame(profile = 2, depth = depth, PAR = PAR)
p1$PAR[1:110] <- runif(110)
p2$PAR[1:150] <- runif(150)
data <- rbind(p1, p2)
nortest::lillie.test(p1$PAR)$p.value
# [1] 0.01083541
nortest::lillie.test(p2$PAR)$p.value
# [1] 0.05409578

Here's a helper function that, when given one profile's worth of data at a time, will return "light" and "dark" per your spec:

fun <- function(val, dep, cutoff = 0.01) {
  ord <- order(dep)
  val <- val[ord]
  good <- FALSE
  while (length(val) > 3) {
    if (nortest::lillie.test(val)$p.value < cutoff) {
      good <- TRUE
      break
    }
    val <- val[-1]
  }
  if (good) {
    out <- rep("light", length(dep))
    out[seq_len(length(dep) - length(val))] <- "dark"
  } else {
    out <- rep("dark", length(dep))
  }
  out[order(ord)] # yes, another order
}

This can be done using dplyr if desired:

data <- data |>
  mutate(.by = profile, signal = fun(PAR, depth))
head(data)
#   profile depth       PAR signal
# 1       1     1 0.9148060   dark
# 2       1     2 0.9370754   dark
# 3       1     3 0.2861395  light
# 4       1     4 0.8304476  light
# 5       1     5 0.6417455  light
# 6       1     6 0.5190959  light
filter(data, profile == 2) |>
  head()
#   profile depth       PAR signal
# 1       2     1 0.6089375   dark
# 2       2     2 0.8368016   dark
# 3       2     3 0.7515226   dark
# 4       2     4 0.4527316   dark
# 5       2     5 0.5357900   dark
# 6       2     6 0.5373767   dark
### profile==2 is light starting at depth==27 with this random data

A base R version could be

data$signal = ave(seq_len(nrow(data)), profile,
  FUN = function(i) fun(data$PAR[i], data$depth[i]))

where ave is being used to do the grouped calculations. Since ave only works on one "column"/vector at a time, I'm using an index seq_along(PAR) instead as a cheater to be able to process two (or more) columns per group of profile.

本文标签: