Daniel Linares
2016
Satellite event: SEPEX tutorial
Fit psychometric functions in R (mostly with quickpsy).
Learn some basic modern R (Hadleyverse).
mydat <- 5 # Many languages use "="
mydat # you can also visualize in environment
[1] 5
mydat2 <- c(5, 4, 7) # Beware the "c"
mydat2
[1] 5 4 7
mean(mydat2) # or mean( x = mydat2)
[1] 5.333333
?mean
Autocompletion (for example, use tab after the bracket of a function to display arguments).
If the package has a github associated repository and you want to install the lastest (in development) version:
library(devtools)
install_github('hadley/ggplot2')
Very often we will not use R base functions.
Instead, we will use functions from packages from Hadley Wickham (and other Rstudio members).
With these packages, it is very easy and intuitive to tidy, transform and visualize data.
R for Data Science: http://r4ds.had.co.nz/index.html
Advanced R: http://adv-r.had.co.nz/
R packages: http://r-pkgs.had.co.nz/
It allows you to have what it is called tidy data: one column for each variable and one row for each observation
mydataframe <- data.frame(participant = c('a', 'a','b','b'), rt = c(.32, .41, .53, .37))
To access one variable (one column)
mydataframe$participant
[1] a a b b
Levels: a b
mylist <- list(mynumber = mydat, df = mydataframe)
mylist
$mynumber
[1] 5
$df
participant rt
1 a 0.32
2 a 0.41
3 b 0.53
4 b 0.37
mylist$mynumber # to access a component
Trial 1: intensity = 0.8; response = Yes
Trial 2: intensity = 0.4; response = No
Trial 3: intensity = 1.0; response = Yes
.
.
dat <- read.table('data/detectdat.txt', header = TRUE)
head(dat,4)
i resp
1 0.8 1
2 0.4 0
3 1.0 1
4 0.4 1
library(dplyr)
datByContrast <- group_by(dat, i)
averages <- summarise(datByContrast, nYes = sum(resp), nNo = 100 - nYes, p = nYes / 100)
averages
# A tibble: 6 x 4
i nYes nNo p
<dbl> <int> <dbl> <dbl>
1 0.0 5 95 0.05
2 0.2 10 90 0.10
3 0.4 38 62 0.38
4 0.6 77 23 0.77
5 0.8 94 6 0.94
6 1.0 97 3 0.97
datByContrast <- group_by(dat, i)
averages <- summarise(datByContrast, n = n(), nYes = sum(resp), nNo = n - nYes, p = nYes / n)
averages
# A tibble: 6 x 5
i n nYes nNo p
<dbl> <int> <int> <int> <dbl>
1 0.0 100 5 95 0.05
2 0.2 100 10 90 0.10
3 0.4 100 38 62 0.38
4 0.6 100 77 23 0.77
5 0.8 100 94 6 0.94
6 1.0 100 97 3 0.97
mydat2 %>% mean() # it is the same that mean(mydat2)
[1] 5.333333
averages <- dat %>% group_by(i) %>%
summarise(n = n(), nYes = sum(resp), nNo = n - nYes, p = nYes / n)
library(ggplot2)
pData <- ggplot(averages, aes(x = i, y = p)) +
geom_point()
pData
It turns out that this sigmoidal shape often occurs in classification tasks.
Green and Swets (1966). Signal detection theory and psychophysics.
Gescheider (1998). Psychophysics: the fundamentals.
Prins and Kingdom (2016). Psychophysics: a practical introduction.
Knoblauch and Maloney (2012). Modeling psychophysical data in R.
library(quickpsy)
fit <- quickpsy(averages, i, nYes, n)
By default the shape of the psychometric function is the cumulative normal distribution.
quickpsy fits the psychometric function using Maximum Likelihood Estimation (MLE).
Linares and López-Moliner (in press) quickpsy: An R package to fit psychometric functions for multiple groups. The R Journal.
http://www.dlinares.org/statistics.html
Prins and Kingdom (2016). Psychophysics: a practical introduction.
Knoblauch and Maloney (2012). Modeling psychophysical data in R.
The output of quickpsy a list of components (many dataframes).
The component curves contains the psychometric function
fit$curves %>% head(4)
x y
1 0.000000000 0.02922206
2 0.003344482 0.03015473
3 0.006688963 0.03111194
4 0.010033445 0.03209416
pDataWithPsy <- pData +
geom_line(data= fit$curves, aes(x = x, y = y))
pDataWithPsy
dat <- read.table('data/detectdat.txt', header = TRUE)
fit <- quickpsy(dat, i, resp)
plot(fit)
Knoblauch and Maloney (2012)
model <- glm(cbind(nYes, nNo) ~ i, data= averages, family = binomial(probit))
xseq <- seq(0, 1, .01)
yseq <- predict(model, data.frame(i = xseq), type = 'response')
curve <- data.frame(xseq, yseq)
Also uses MLE.
ggplot() +
geom_point(data=averages, aes(x = i, y = p)) +
geom_line(data = curve,aes(x = xseq, y = yseq))
library(MPDiR) # contains the Vernier data
fitV <- quickpsy(Vernier, Phaseshift, NumUpward, N, grouping = .(Direction, WaveForm, TempFreq))
plot(fitV)
Easily fit or fix asymptotes.
Obtain bootstrap confidence intervals.
Compare parameters and thresholds using bootstrap.
The advantage of glm is that modern statisticals methods can be used to fit a global model to all conditions.
Knoblauch and Maloney (2012). Modeling psychophysical data in R. Moscatelli et al. (2012). Modeling psychophysical data at the population-level: the generalized linear mixed model. JOV.
A part from curves, quickpsy returns other components
?quickpsy
A part from curves, there are other output dataframes (strickly tibbles)
fit$averages
# A tibble: 6 x 4
i n resp prob
<dbl> <int> <dbl> <dbl>
1 0.0 100 5 0.05
2 0.2 100 10 0.10
3 0.4 100 38 0.38
4 0.6 100 77 0.77
5 0.8 100 94 0.94
6 1.0 100 97 0.97
p <- ggplot()+
geom_point(data = fit$averages, aes(x = i, y = prob)) +
geom_line(data = fit$curves, aes(x = x, y = y))
p
fit$thresholds
thre prob threinf thresup
1 0.4577091 0.5 0.4224058 0.4867858
fitV$thresholds
p <- p +
geom_segment(data = fit$thresholds, aes(x = thre, y = 0, xend = thre, yend = prob))
p
plotthresholds(fit)
plotthresholds(fitV)
fit$par
parn par parinf parsup
1 p1 0.4577091 0.4224058 0.4867858
2 p2 0.2418732 0.2117044 0.2695010
plotpar(fit)
plotpar(fitV)
fit$avbootstrap %>% head(9)
Source: local data frame [9 x 5]
Groups: sample [2]
sample i n resp prob
<int> <dbl> <int> <int> <dbl>
1 1 0.0 100 2 0.02
2 1 0.2 100 15 0.15
3 1 0.4 100 50 0.50
4 1 0.6 100 72 0.72
5 1 0.8 100 94 0.94
6 1 1.0 100 97 0.97
7 2 0.0 100 2 0.02
8 2 0.2 100 14 0.14
9 2 0.4 100 45 0.45
pboot <- p +
geom_point(data = fit$avbootstrap, aes(x = i, y = prob, color = factor(sample)))
pboot
pboot4 <- p +
geom_point(data = fit$avbootstrap %>% filter(sample <= 4), aes(x = i, y = prob, color = factor(sample)))
pboot4
pboot <- p +
geom_point(data = fit$avbootstrap, aes(x = i, y = prob,
color = factor(sample))) + guides(color = FALSE)
pboot
fit$curvesbootstrap %>% head(9)
Source: local data frame [9 x 3]
Groups: sample [1]
sample x y
<int> <dbl> <dbl>
1 1 0.000000000 0.03743167
2 1 0.003344482 0.03854383
3 1 0.006688963 0.03968277
4 1 0.010033445 0.04084893
5 1 0.013377926 0.04204273
6 1 0.016722408 0.04326463
7 1 0.020066890 0.04451504
8 1 0.023411371 0.04579440
9 1 0.026755853 0.04710315
pcurves <- pboot4 +
geom_line(data = fit$curvesbootstrap %>% filter(sample <= 4), aes(x = x, y = y,
color = factor(sample)))
pcurves
fit$thresholdsbootstrap %>% head(9)
Source: local data frame [9 x 3]
Groups: sample [9]
sample thre prob
<int> <dbl> <dbl>
1 1 0.4425859 0.5
2 2 0.4537820 0.5
3 3 0.4545385 0.5
4 4 0.4357098 0.5
5 5 0.4514053 0.5
6 6 0.4736884 0.5
7 7 0.4815944 0.5
8 8 0.4720539 0.5
9 9 0.4484391 0.5
pthre <- pcurves +
geom_segment(data = fit$thresholdsbootstrap %>% filter(sample <=4), aes(x = thre, y = 0, xend = thre, yend = prob, color = factor(sample)))
pthre
ggplot(fit$thresholdsbootstrap, aes(x = thre)) +geom_histogram()
This is the distribution to generate the confidence intervals.
quantile(fit$thresholdsbootstrap$thre, .025)
2.5%
0.4224058
quantile(fit$thresholdsbootstrap$thre, .975)
97.5%
0.4867858
quantile(fit$thresholdsbootstrap$thre, c(.025,.975))
2.5% 97.5%
0.4224058 0.4867858
We verify it
fit$thresholds
thre prob threinf thresup
1 0.4577091 0.5 0.4224058 0.4867858
The same is done to obtain the ci for the parameters.
?quickpsy
fit <- quickpsy(dat, i, resp, parini = c(.5,.5))
plot(fit)
fit <- quickpsy(dat, i, resp, parini = list(c(0.1, .3), c(0, 10)))
plot(fit) # poor choice of boundaries
fit <- quickpsy(dat, i, resp, fun = logistic_fun)
plot(fit)
fit <- quickpsy(dat, i, resp, fun = weibull_fun)
plot(fit)
gompertz_fun <- function(x, p) exp(-p[1] * exp(-p[2] * x))
gompertz_fun(.2, c(1,2))
[1] 0.5115448
gompertz_fun(.6, c(1,1))
[1] 0.5776358
fit <- quickpsy(dat, i, resp, fun = gompertz_fun, parini = c(1, 1))
plot(fit)
fit <- quickpsy(dat, i, resp, fun = gompertz_fun, parini = c(1, 1)) #initial parameters are necessary
plot(fit)
fit <- quickpsy(dat, i, resp, prob=.75)
plot(fit)
fit <- quickpsy(dat %>% filter(i != 0), i, resp, log = TRUE)
plot(fit)
fit <- quickpsy(dat, i, resp, , guess = .1)
plot(fit)
Lapses can also be fixed
fit <- quickpsy(dat, i, resp, lapses = TRUE)
plot(fit)
Guesses can also be variable
We verify that lapses are small
fit$par
parn par parinf parsup
1 p1 0.44260784 0.40047926 0.48574079
2 p2 0.22219988 0.18528446 0.25721994
3 p3 0.01929753 -0.01057085 0.06004908
Lapses (and guesses) are coded as the third parameter. In case, both lapses and guesses are fitted, they are coded as the third and forth parameters.
By default bootstrap is parametric: the number of 'Yes' is obtained from a random sample from the binomial distribution with p given by the psychometric function and n by the number of trials.
To do non-parametric bootstrap
fit <- quickpsy(dat, i, resp, bootstrap = 'nonparametric', B = 10)
where B is the number of sample. By default is 100 which is quite small, typically B is 1000 or 10000.
averages <- averages %>% mutate(condition = 'a')
averages2 <- averages %>% mutate(nYes= c(1, 4, 25, 73, 80, 85), condition = 'b')
averages2
# A tibble: 6 x 6
i n nYes nNo p condition
<dbl> <int> <dbl> <int> <dbl> <chr>
1 0.0 100 1 95 0.05 b
2 0.2 100 4 90 0.10 b
3 0.4 100 25 62 0.38 b
4 0.6 100 73 23 0.77 b
5 0.8 100 80 6 0.94 b
6 1.0 100 85 3 0.97 b
averages <- averages %>% mutate(condition = 'a')
averagesAll <- rbind(averages, averages2)
averagesAll
# A tibble: 12 x 6
i n nYes nNo p condition
* <dbl> <int> <dbl> <int> <dbl> <chr>
1 0.0 100 5 95 0.05 a
2 0.2 100 10 90 0.10 a
3 0.4 100 38 62 0.38 a
4 0.6 100 77 23 0.77 a
5 0.8 100 94 6 0.94 a
6 1.0 100 97 3 0.97 a
7 0.0 100 1 95 0.05 b
8 0.2 100 4 90 0.10 b
9 0.4 100 25 62 0.38 b
10 0.6 100 73 23 0.77 b
11 0.8 100 80 6 0.94 b
12 1.0 100 85 3 0.97 b
fitAll <- quickpsy(averagesAll, i, nYes, n, grouping =.(condition))
library(cowplot)
plot_grid(plot(fitAll), plotthresholds(fitAll), labels = c('A', 'B'))
fitAll$thresholdcomparisons
condition condition2 dif difinf difsup signif
1 a b -0.1173913 -0.1697 -0.07518993 *
fitV$thresholdcomparisons
fitV$thresholdcomparisons %>% filter(WaveForm == WaveForm2)
Wichmann and Hill (2001). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception and Psychophysics, 63(8), 1293–1313.
Q <- seq(0, 420, 60)
n <- 20
k <- c(0, 0, 4, 18, 20, 20, 19, 20) # lapse in the second to last intensity
datLapse <- data.frame(Q, k, n)
withoutlapses <- quickpsy(datLapse, Q, k, n, prob = .75) %>% plot() + ggtitle('Without lapses')
withlapses <- quickpsy(datLapse, Q, k, n, prob = .75, lapses = TRUE) %>% plot() + ggtitle('Wit lapses')
plot_grid(withoutlapses, withlapses)
But allowing lapses not always fix the problem:
Prins, N. (2012). The psychometric function: the lapse rate revisited., 12(6), 25–25.
Linares and López-Moliner (in press) quickpsy: An R package to fit psychometric functions for multiple groups. The R Journal.