Fitting psychometric functions in R

Daniel Linares
2016

Satellite event: SEPEX tutorial

Aim

Fit psychometric functions in R (mostly with quickpsy).

Learn some basic modern R (Hadleyverse).

What do you need?

How did I make this presentation?


You can also create simple html pages: http://www.dlinares.org/

The console

mydat <- 5 # Many languages use "="
mydat # you can also visualize in environment
[1] 5
mydat2 <- c(5, 4, 7) #  Beware the "c"
mydat2
[1] 5 4 7
mean(mydat2) # or mean( x = mydat2)
[1] 5.333333

R scripts

Help

?mean



Autocompletion (for example, use tab after the bracket of a function to display arguments).

Packages

If the package has a github associated repository and you want to install the lastest (in development) version:

library(devtools)
install_github('hadley/ggplot2') 

The Hadleyverse

Very often we will not use R base functions.

Instead, we will use functions from packages from Hadley Wickham (and other Rstudio members).

With these packages, it is very easy and intuitive to tidy, transform and visualize data.

R for Data Science: http://r4ds.had.co.nz/index.html

Advanced R: http://adv-r.had.co.nz/

R packages: http://r-pkgs.had.co.nz/

Basic data structure: dataframe

It allows you to have what it is called tidy data: one column for each variable and one row for each observation

mydataframe <- data.frame(participant = c('a', 'a','b','b'), rt = c(.32, .41, .53, .37))

To access one variable (one column)

mydataframe$participant 
[1] a a b b
Levels: a b

Basic data structure: list

mylist <- list(mynumber = mydat, df = mydataframe)
mylist
$mynumber
[1] 5

$df
  participant   rt
1           a 0.32
2           a 0.41
3           b 0.53
4           b 0.37
mylist$mynumber # to access a component 

Dectection experiment

Trial 1: intensity = 0.8; response = Yes

Trial 2: intensity = 0.4; response = No

Trial 3: intensity = 1.0; response = Yes

.

.

New project to analize detection data

Reading the data

dat <- read.table('data/detectdat.txt', header = TRUE)

head(dat,4)
    i resp
1 0.8    1
2 0.4    0
3 1.0    1
4 0.4    1

Summarising the data

library(dplyr)

datByContrast <- group_by(dat, i)

averages <- summarise(datByContrast, nYes = sum(resp), nNo = 100 - nYes, p = nYes / 100)
averages
# A tibble: 6 x 4
      i  nYes   nNo     p
  <dbl> <int> <dbl> <dbl>
1   0.0     5    95  0.05
2   0.2    10    90  0.10
3   0.4    38    62  0.38
4   0.6    77    23  0.77
5   0.8    94     6  0.94
6   1.0    97     3  0.97

Summarising data (better coding)

datByContrast <- group_by(dat, i)

averages <- summarise(datByContrast, n = n(), nYes = sum(resp), nNo = n - nYes, p = nYes / n)

averages
# A tibble: 6 x 5
      i     n  nYes   nNo     p
  <dbl> <int> <int> <int> <dbl>
1   0.0   100     5    95  0.05
2   0.2   100    10    90  0.10
3   0.4   100    38    62  0.38
4   0.6   100    77    23  0.77
5   0.8   100    94     6  0.94
6   1.0   100    97     3  0.97

Summarising data (better coding: pipes)

mydat2 %>% mean() # it is the same that mean(mydat2)
[1] 5.333333
averages <- dat %>% group_by(i) %>% 
  summarise(n = n(), nYes = sum(resp), nNo = n - nYes, p = nYes / n)

Plotting the data

library(ggplot2)

pData <- ggplot(averages, aes(x = i, y = p)) + 
  geom_point()
pData

plot of chunk unnamed-chunk-13

It turns out that this sigmoidal shape often occurs in classification tasks.

Learn more about the psychometric function

Green and Swets (1966). Signal detection theory and psychophysics.

Gescheider (1998). Psychophysics: the fundamentals.

Prins and Kingdom (2016). Psychophysics: a practical introduction.

Knoblauch and Maloney (2012). Modeling psychophysical data in R.

Fitting psychometric function using quickpsy

library(quickpsy)

fit <- quickpsy(averages, i, nYes, n) 

By default the shape of the psychometric function is the cumulative normal distribution.

quickpsy fits the psychometric function using Maximum Likelihood Estimation (MLE).

Learn more about MLE

Linares and López-Moliner (in press) quickpsy: An R package to fit psychometric functions for multiple groups. The R Journal.

http://www.dlinares.org/statistics.html

Prins and Kingdom (2016). Psychophysics: a practical introduction.

Knoblauch and Maloney (2012). Modeling psychophysical data in R.

The component curves of the quickpsy output

The output of quickpsy a list of components (many dataframes).

The component curves contains the psychometric function

fit$curves %>% head(4)
            x          y
1 0.000000000 0.02922206
2 0.003344482 0.03015473
3 0.006688963 0.03111194
4 0.010033445 0.03209416

Plotting the psychometric function

pDataWithPsy <- pData +
  geom_line(data= fit$curves, aes(x = x, y = y))

pDataWithPsy

plot of chunk unnamed-chunk-17

Doing everything more directly

dat <- read.table('data/detectdat.txt', header = TRUE)

fit <- quickpsy(dat, i, resp)

plot(fit) 

plot of chunk unnamed-chunk-19

Fitting psychometric function using generalize linear models

Knoblauch and Maloney (2012)

model <- glm(cbind(nYes, nNo) ~ i, data= averages, family = binomial(probit))

xseq <- seq(0, 1, .01)

yseq <- predict(model, data.frame(i = xseq), type = 'response')

curve <- data.frame(xseq, yseq)

Also uses MLE.

Fitting psychometric function using generalize linear models

ggplot() +
  geom_point(data=averages, aes(x = i, y = p)) +
  geom_line(data = curve,aes(x = xseq, y = yseq))

plot of chunk unnamed-chunk-22

Given glm, why we developed quickpsy?

library(MPDiR) # contains the Vernier data

fitV <- quickpsy(Vernier, Phaseshift, NumUpward, N, grouping = .(Direction, WaveForm, TempFreq))

plot(fitV) 

plot of chunk unnamed-chunk-24

Some other reasons

Easily fit or fix asymptotes.

Obtain bootstrap confidence intervals.

Compare parameters and thresholds using bootstrap.


The advantage of glm is that modern statisticals methods can be used to fit a global model to all conditions.

Knoblauch and Maloney (2012). Modeling psychophysical data in R. Moscatelli et al. (2012). Modeling psychophysical data at the population-level: the generalized linear mixed model. JOV.

The output list of quickpsy

A part from curves, quickpsy returns other components

?quickpsy

Averages

A part from curves, there are other output dataframes (strickly tibbles)

fit$averages
# A tibble: 6 x 4
      i     n  resp  prob
  <dbl> <int> <dbl> <dbl>
1   0.0   100     5  0.05
2   0.2   100    10  0.10
3   0.4   100    38  0.38
4   0.6   100    77  0.77
5   0.8   100    94  0.94
6   1.0   100    97  0.97

Averages: plotting

p <- ggplot()+
  geom_point(data = fit$averages, aes(x = i, y = prob)) +
  geom_line(data = fit$curves, aes(x = x, y = y))
p

plot of chunk unnamed-chunk-28

Thresholds

fit$thresholds
       thre prob   threinf   thresup
1 0.4577091  0.5 0.4224058 0.4867858
fitV$thresholds

Thresholds: plotting

p <- p +
  geom_segment(data = fit$thresholds, aes(x = thre, y = 0, xend = thre, yend = prob))
p

plot of chunk unnamed-chunk-32

Thresholds: plotting

plotthresholds(fit)

plot of chunk unnamed-chunk-34

Thresholds: plotting

plotthresholds(fitV)

plot of chunk unnamed-chunk-36

Parameters

fit$par
  parn       par    parinf    parsup
1   p1 0.4577091 0.4224058 0.4867858
2   p2 0.2418732 0.2117044 0.2695010

Parameters: plotting

plotpar(fit)

plot of chunk unnamed-chunk-39

Parameters: plotting

plotpar(fitV)

plot of chunk unnamed-chunk-41

Bootstrap averages

fit$avbootstrap %>% head(9)
Source: local data frame [9 x 5]
Groups: sample [2]

  sample     i     n  resp  prob
   <int> <dbl> <int> <int> <dbl>
1      1   0.0   100     2  0.02
2      1   0.2   100    15  0.15
3      1   0.4   100    50  0.50
4      1   0.6   100    72  0.72
5      1   0.8   100    94  0.94
6      1   1.0   100    97  0.97
7      2   0.0   100     2  0.02
8      2   0.2   100    14  0.14
9      2   0.4   100    45  0.45

Bootstrap averages: plotting

pboot <-  p +
  geom_point(data = fit$avbootstrap, aes(x = i, y = prob, color = factor(sample))) 
pboot

plot of chunk unnamed-chunk-44

Bootstrap averages: plotting 4 samples

pboot4 <-  p +
  geom_point(data = fit$avbootstrap %>% filter(sample <= 4), aes(x = i, y = prob, color = factor(sample))) 
pboot4

plot of chunk unnamed-chunk-46

Bootstrap averages: plotting

pboot <-  p +
  geom_point(data = fit$avbootstrap, aes(x = i, y = prob, 
                                         color = factor(sample))) + guides(color = FALSE)
pboot

plot of chunk unnamed-chunk-48

Intuition about bootstrap (non-parametric)

Bootstrap curves

fit$curvesbootstrap %>% head(9)
Source: local data frame [9 x 3]
Groups: sample [1]

  sample           x          y
   <int>       <dbl>      <dbl>
1      1 0.000000000 0.03743167
2      1 0.003344482 0.03854383
3      1 0.006688963 0.03968277
4      1 0.010033445 0.04084893
5      1 0.013377926 0.04204273
6      1 0.016722408 0.04326463
7      1 0.020066890 0.04451504
8      1 0.023411371 0.04579440
9      1 0.026755853 0.04710315

Bootstrap curves: plotting 4 samples

pcurves <-  pboot4 +
  geom_line(data = fit$curvesbootstrap %>% filter(sample <= 4), aes(x = x, y = y, 
                                         color = factor(sample))) 
pcurves

plot of chunk unnamed-chunk-51

Bootstrap thresholds

fit$thresholdsbootstrap %>% head(9)
Source: local data frame [9 x 3]
Groups: sample [9]

  sample      thre  prob
   <int>     <dbl> <dbl>
1      1 0.4425859   0.5
2      2 0.4537820   0.5
3      3 0.4545385   0.5
4      4 0.4357098   0.5
5      5 0.4514053   0.5
6      6 0.4736884   0.5
7      7 0.4815944   0.5
8      8 0.4720539   0.5
9      9 0.4484391   0.5

Bootstrap thresholds: plotting 4 samples

pthre <-  pcurves +
   geom_segment(data = fit$thresholdsbootstrap %>% filter(sample <=4), aes(x = thre, y = 0, xend = thre, yend = prob, color = factor(sample))) 
pthre

plot of chunk unnamed-chunk-54

Bootstrap thresholds: plotting all samples

ggplot(fit$thresholdsbootstrap, aes(x = thre)) +geom_histogram()

plot of chunk unnamed-chunk-56

This is the distribution to generate the confidence intervals.

Bootstrap thresholds: confidence intervals

quantile(fit$thresholdsbootstrap$thre, .025)
     2.5% 
0.4224058 
quantile(fit$thresholdsbootstrap$thre, .975)
    97.5% 
0.4867858 

Bootstrap thresholds: confidence intervals

quantile(fit$thresholdsbootstrap$thre, c(.025,.975))
     2.5%     97.5% 
0.4224058 0.4867858 

We verify it

fit$thresholds
       thre prob   threinf   thresup
1 0.4577091  0.5 0.4224058 0.4867858

The same is done to obtain the ci for the parameters.

The input of quickpsy

?quickpsy

Initial parameters

fit <- quickpsy(dat, i, resp, parini = c(.5,.5))
plot(fit) 

plot of chunk unnamed-chunk-62

Initial parameters: bounded

fit <- quickpsy(dat, i, resp, parini = list(c(0.1, .3), c(0, 10))) 
plot(fit) # poor choice of boundaries

plot of chunk unnamed-chunk-64

Function to fit: logistic

fit <- quickpsy(dat, i, resp, fun = logistic_fun)
plot(fit) 

plot of chunk unnamed-chunk-66

Function to fit: Weibull

fit <- quickpsy(dat, i, resp, fun = weibull_fun)

plot(fit) 

plot of chunk unnamed-chunk-68

Function to fit: defined by the user

gompertz_fun <- function(x, p) exp(-p[1] * exp(-p[2] * x))

gompertz_fun(.2, c(1,2))
[1] 0.5115448
gompertz_fun(.6, c(1,1))
[1] 0.5776358

Function to fit: defined by the user

fit <- quickpsy(dat, i, resp, fun = gompertz_fun, parini = c(1, 1))

plot(fit) 

plot of chunk unnamed-chunk-72

Function to fit: defined by the user

fit <- quickpsy(dat, i, resp, fun = gompertz_fun, parini = c(1, 1)) #initial parameters are necessary

plot(fit) 

plot of chunk unnamed-chunk-74

Threshold probability

fit <- quickpsy(dat, i, resp, prob=.75)
plot(fit) 

plot of chunk unnamed-chunk-76

Logarith of the independent variable

fit <- quickpsy(dat %>% filter(i != 0), i, resp, log = TRUE)
plot(fit) 

plot of chunk unnamed-chunk-78

Asymptotes: guesses fixed

fit <- quickpsy(dat, i, resp, , guess = .1)
plot(fit) 

plot of chunk unnamed-chunk-80

Lapses can also be fixed

Asymptotes: lapses variable

fit <- quickpsy(dat, i, resp, lapses = TRUE)
plot(fit)

plot of chunk unnamed-chunk-82

Guesses can also be variable

Asymptotes: lapses variable

We verify that lapses are small

fit$par 
  parn        par      parinf     parsup
1   p1 0.44260784  0.40047926 0.48574079
2   p2 0.22219988  0.18528446 0.25721994
3   p3 0.01929753 -0.01057085 0.06004908

Lapses (and guesses) are coded as the third parameter. In case, both lapses and guesses are fitted, they are coded as the third and forth parameters.

Bootstrap

By default bootstrap is parametric: the number of 'Yes' is obtained from a random sample from the binomial distribution with p given by the psychometric function and n by the number of trials.

To do non-parametric bootstrap

fit <- quickpsy(dat, i, resp, bootstrap = 'nonparametric', B = 10)

where B is the number of sample. By default is 100 which is quite small, typically B is 1000 or 10000.

More on ouput: comparing conditions

averages <- averages %>% mutate(condition = 'a')

averages2 <- averages %>% mutate(nYes= c(1, 4, 25, 73, 80, 85), condition = 'b')
averages2
# A tibble: 6 x 6
      i     n  nYes   nNo     p condition
  <dbl> <int> <dbl> <int> <dbl>     <chr>
1   0.0   100     1    95  0.05         b
2   0.2   100     4    90  0.10         b
3   0.4   100    25    62  0.38         b
4   0.6   100    73    23  0.77         b
5   0.8   100    80     6  0.94         b
6   1.0   100    85     3  0.97         b

More on ouput: comparing conditions

averages <- averages %>% mutate(condition = 'a')
averagesAll <- rbind(averages, averages2)
averagesAll
# A tibble: 12 x 6
       i     n  nYes   nNo     p condition
*  <dbl> <int> <dbl> <int> <dbl>     <chr>
1    0.0   100     5    95  0.05         a
2    0.2   100    10    90  0.10         a
3    0.4   100    38    62  0.38         a
4    0.6   100    77    23  0.77         a
5    0.8   100    94     6  0.94         a
6    1.0   100    97     3  0.97         a
7    0.0   100     1    95  0.05         b
8    0.2   100     4    90  0.10         b
9    0.4   100    25    62  0.38         b
10   0.6   100    73    23  0.77         b
11   0.8   100    80     6  0.94         b
12   1.0   100    85     3  0.97         b

More on ouput: comparing conditions

fitAll <- quickpsy(averagesAll, i, nYes, n, grouping =.(condition))

library(cowplot)
plot_grid(plot(fitAll), plotthresholds(fitAll), labels = c('A', 'B'))

plot of chunk unnamed-chunk-88

More on ouput: comparing conditions

fitAll$thresholdcomparisons
  condition condition2        dif  difinf      difsup signif
1         a          b -0.1173913 -0.1697 -0.07518993      *

More on ouput: comparing conditions

fitV$thresholdcomparisons

fitV$thresholdcomparisons %>% filter(WaveForm == WaveForm2)

Lapses: the problem of non fitting lapses

Wichmann and Hill (2001). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception and Psychophysics, 63(8), 1293–1313.

Q <- seq(0, 420, 60)
n <- 20
k <- c(0, 0, 4, 18, 20, 20, 19, 20) # lapse in the second to last intensity
datLapse <- data.frame(Q, k, n)

Lapses: the problem of non fitting lapses

withoutlapses <- quickpsy(datLapse, Q, k, n, prob = .75) %>% plot() + ggtitle('Without lapses')

withlapses <- quickpsy(datLapse, Q, k, n, prob = .75, lapses = TRUE) %>% plot() + ggtitle('Wit lapses')

plot_grid(withoutlapses, withlapses)

plot of chunk unnamed-chunk-93

Lapses: the problem of non fitting lapses

But allowing lapses not always fix the problem:

Prins, N. (2012). The psychometric function: the lapse rate revisited., 12(6), 25–25.

Linares and López-Moliner (in press) quickpsy: An R package to fit psychometric functions for multiple groups. The R Journal.