Data Analysis


          We are interested in the relationship between the perceived number of dots (the dependent variable) and the actual number of dots (the independent variable) in each of the different conditions of the experiment. We would also like to compute a best fit of a power function to this relationship. Most of the material in this section is concerned with how to use a spreadsheet to accomplish these goals.

The data will be in the following form:

blck block number
mxcnt maximum number of dots
rtyp response type (linear, logarithmic, ratio estimation)
fdbk feedback (yes, no)
stnd number of dots in standard
scrin scroll display (labels, window, both)
dregl dot regularity (low=5, high=10)
trl trial number
stim number of dots in stimulus
resp response (estimated number of dots or estimated ratio)

Here are some typical lines of data:

blck,mxcnt,rtyp,fdbk,stnd,scrin,dregl,trl,stim,resp

   1, 400,Lin,Yes, 100,both,   5,   1,  89,24.00
   1, 400,Lin,Yes, 100,both,   5,   2, 131,96.00
   1, 400,Lin,Yes, 100,both,   5,   3,  30,26.00
   1, 400,Lin,Yes, 100,both,   5,   4, 259,194.00
   1, 400,Lin,Yes, 100,both,   5,   5, 164,163.00

   2, 400,Log,Yes, 100,wind,   5,   1,  23,12.00
   2, 400,Log,Yes, 100,wind,   5,   2, 184,169.00
   2, 400,Log,Yes, 100,wind,   5,   3, 331,240.00
   2, 400,Log,Yes, 100,wind,   5,   4, 120,114.00
   2, 400,Log,Yes, 100,wind,   5,   5, 201,117.00

   3, 400,Rat,Yes, 100,labs,   10,   1, 311,2.30
   3, 400,Rat,Yes, 100,labs,   10,   2, 390,3.60
   3, 400,Rat,Yes, 100,labs,   10,   3, 349,3.80
   3, 400,Rat,Yes, 100,labs,   10,   4, 319,2.70
   3, 400,Rat,Yes, 100,labs,   10,   5,  54,0.60

          In the Lin and Log conditions, the perceived number of dots is the number in the resp column. In the Rat condition, we can infer the perceived number of dots by multiplying the standard (stnd) by the ratio given in the resp column. (Alternatively, we could plot the log of the estimated ratio as a function of the log of the actual ratio, given by stim/stnd.)
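The rule above can be sketched in a few lines of Python for readers who prefer a script to a spreadsheet. This is only an illustration: the names SAMPLE, perceived_dots, and load_trials are made up for this sketch, and the three rows are copied from the sample data (one per condition).

```python
import csv
import io

# Header plus one row per condition, copied from the sample data above.
SAMPLE = """blck,mxcnt,rtyp,fdbk,stnd,scrin,dregl,trl,stim,resp
   1, 400,Lin,Yes, 100,both,   5,   1,  89,24.00
   2, 400,Log,Yes, 100,wind,   5,   1,  23,12.00
   3, 400,Rat,Yes, 100,labs,   10,   1, 311,2.30
"""


def perceived_dots(row):
    """Return the perceived number of dots for one trial (hypothetical helper)."""
    resp = float(row["resp"])
    if row["rtyp"].strip() == "Rat":
        # Ratio estimation: perceived dots = standard * estimated ratio.
        return float(row["stnd"]) * resp
    # Lin and Log conditions: the response is already a number of dots.
    return resp


def load_trials(text):
    """Parse the comma-separated data into (stim, perceived) pairs."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [(int(row["stim"]), perceived_dots(row)) for row in reader]


trials = load_trials(SAMPLE)
for stim, perceived in trials:
    print(stim, perceived)
```

The resulting (stim, perceived) pairs are the x and p values used in the rest of this section.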

          After you have set up the spreadsheet to plot resp as a function of stim, change the scale of both the x and y graph dimensions to logarithmic scales. Then, if there is a power function relationship between x and y, the resultant plot will be a straight line.

That is, let the perceived number of dots=p and the actual number of dots=x.

We assume that there is a power function relationship between p and x:

          p = a x^b          [equation 1]

where p is the perceived magnitude, x is the stimulus intensity, and a and b are the parameters of the power law function. The exponent, b, is an important parameter determined by the particular sensory modality being studied.

Taking the log of each side of equation 1 yields:

          log(p) = log(a x^b) = log(a) + log(x^b)

and then

          log(p) = log(a) + b log(x) = A + b log(x)

where A is a constant equal to log(a).

The point is that by changing the ordinate and abscissa to log scales, we have created the new variables, X and Y, where

          Y = log(p) and X = log(x).

Thus, the logarithmic scale change causes the stimulus-response function to become the equation of the straight line

          Y=A+bX

where b is the slope of the line and A is the intercept.
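As a quick numerical check of this argument, the sketch below generates points from a power function and confirms that, after taking logs, every pair of neighbouring points has the same slope b and every point gives the same intercept log(a). The values of a and b are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical parameters, not taken from any experiment.
a, b = 2.0, 0.85

xs = [10, 50, 100, 200, 400]
ps = [a * x ** b for x in xs]          # equation 1

X = [math.log10(x) for x in xs]        # X = log(x)
Y = [math.log10(p) for p in ps]        # Y = log(p)

# Slope between each pair of neighbouring points on the log-log plot.
slopes = [(Y[i + 1] - Y[i]) / (X[i + 1] - X[i]) for i in range(len(X) - 1)]

# Intercept implied by each point, A = Y - bX.
intercepts = [y - b * x for x, y in zip(X, Y)]

print(slopes)
print(intercepts, math.log10(a))
```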

          In addition to plotting p and x on logarithmically scaled axes, we need to have the spreadsheet compute the logarithms of p and x, because we would like to compute a regression of p on x. Most spreadsheets can't directly compute the regression of a power function, so we need to change p, the apparent number of dots, and x, the actual number of dots, to log(p) and log(x), respectively. Then we can do a regression of log(p) on log(x). This regression will (a) indicate how good a fit the power function is to our data, and (b) give the value of the exponent (and constant term) that produces a best fit of the power function to the data.

          To perform the regression with the spreadsheet, we need to create two new spreadsheet columns, one equal to log(resp) and one equal to log(stim). Then we can do the regression of log(resp) on log(stim).

          Once we have the values of A and b produced by the regression calculation, we wish to calculate values for the predicted perceived number of dots, p'. That is, we could substitute the calculated values of a (where a = 10^A) and b in equation 1 and calculate a predicted value of p for each value of x. The problem is how to get the spreadsheet to perform this calculation. The easiest way is probably the following: First, calculate values for Y', using the obtained values for A and b and the column of values of X. That is, establish a new column for Y' (labeled P' in the table) where the cells are equal to A+bX. Together with the two conversion columns explained after the table, that will produce the following spreadsheet columns.

x     p     X         P         P'                       p'
stim  resp  log(x)    log(p)    A + b log(x)  P'/log(e)  exp[P'/log(e)]
45    43    1.653213  1.633468  1.70053       3.915616   50.17997
72    72    1.857332  1.857332  1.872443      4.31146    74.54925
96    99    1.982271  1.995635  1.977669      4.55375    94.98796
116   110   2.064458  2.041393  2.046887      4.713133   111.4006
132   167   2.120574  2.222716  2.094149      4.821957   124.2079
147   138   2.167317  2.139879  2.133517      4.912605   135.9932
180   187   2.255273  2.271842  2.207594      5.083174   161.2851
235   161   2.371068  2.206826  2.305119      5.307732   201.8919
242   224   2.383815  2.350248  2.315855      5.332453   206.9451
335   234   2.525045  2.369216  2.434801      5.606336   272.1452

[In this example, the regression produced values of 0.30817 and 0.842215 respectively, for A and b.]
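For readers working outside a spreadsheet, the regression of log(p) on log(x) can be sketched as an ordinary least-squares fit in Python. The stim and resp values are the ones in the example table; the function name fit_power_law is made up for this sketch, and r squared is included as one common measure of how good the fit is (an assumption, since the handout does not say which measure the spreadsheet reports). Run on the table data, A and b should come out close to the values quoted above.

```python
import math

# stim (x) and resp (p) columns from the example table.
stim = [45, 72, 96, 116, 132, 147, 180, 235, 242, 335]
resp = [43, 72, 99, 110, 167, 138, 187, 161, 224, 234]


def fit_power_law(x_values, p_values):
    """Least-squares fit of log10(p) = A + b*log10(x).

    Returns (A, b, r_squared). Hypothetical helper for illustration.
    """
    X = [math.log10(x) for x in x_values]
    P = [math.log10(p) for p in p_values]
    n = len(X)
    mean_X = sum(X) / n
    mean_P = sum(P) / n
    sxx = sum((x - mean_X) ** 2 for x in X)
    spp = sum((p - mean_P) ** 2 for p in P)
    sxy = sum((x - mean_X) * (p - mean_P) for x, p in zip(X, P))
    b = sxy / sxx                     # slope = exponent of the power function
    A = mean_P - b * mean_X           # intercept = log10(a)
    r_squared = sxy ** 2 / (sxx * spp)
    return A, b, r_squared


A, b, r_squared = fit_power_law(stim, resp)
print(f"A = {A:.5f}, b = {b:.6f}, r^2 = {r_squared:.3f}")
```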

          Now, we need to convert the Y' values (i.e., the P' values) to p' values. We can do this in two steps: first divide the P' values by log(e) and then exponentiate those values. These steps are accomplished by the following operations:

          P'/log(e) = P'/log(exp(1))

Then,

          e^(P'/log(e)) = exp(P'/log(exp(1))) = p'
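The two-step conversion works because 1/log(e), with log taken to base 10, equals the natural log of 10, so exp(P'/log(e)) is the same as 10 raised to the power P'. A minimal Python sketch of the same two steps, using the first three P' values from the example table (the function name antilog10_via_exp is made up for this sketch):

```python
import math

# First three P' values (A + b*log10(x)) from the example table.
P_pred = [1.70053, 1.872443, 1.977669]

# log(e) in base 10, written the same way as in the spreadsheet formula.
LOG10_E = math.log10(math.exp(1))


def antilog10_via_exp(value):
    """Base-10 antilog computed with exp() only, as in the spreadsheet."""
    return math.exp(value / LOG10_E)


p_pred = [antilog10_via_exp(v) for v in P_pred]
for P, p in zip(P_pred, p_pred):
    print(P, round(p, 4))
```

The printed values should agree with the p' column of the table to within the rounding of the P' entries.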

          Once we have the p' values, we can plot the x, p, and p' values all on the same graph. Plot the p values as markers (symbols) only and the p' values as a line only (no markers) connecting the p' points. Whether or not to use logarithmic scales is up to you.

Graph

          The chart shows the plotted data (markers) and the best fit power function (pink line) for the data in the example. (The pink line is beginning to show some curvature at high numbers of dots.)

          What was the purpose of all this fooling around with log(e) and exp(P'/log(e))? It was simply to get around the problem of computing an inverse log using a spreadsheet program. Most spreadsheets can't directly compute the antilog, or inverse, of log (base 10). However, the spreadsheet does have an exponential function, exp(), so we used that. If you are using a numerical analysis program, you can avoid these shenanigans. Just compute

          p' = 10^P'
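In a general-purpose language the direct route might look like the sketch below, which also shows that it is the same as substituting a = 10^A and b into equation 1. The values of A and b are the ones quoted for the example; the function name predicted_dots is made up for this sketch.

```python
import math

A, b = 0.30817, 0.842215   # regression values quoted for the example
a = 10 ** A                # constant of equation 1, since A = log10(a)


def predicted_dots(x):
    """Predicted perceived number of dots, p' = a * x^b (equation 1)."""
    return a * x ** b


xs = [45, 72, 96, 116, 132, 147, 180, 235, 242, 335]

for x in xs:
    P_pred = A + b * math.log10(x)      # P' = A + b*log10(x)
    direct = 10 ** P_pred               # p' = 10^P'
    print(x, round(direct, 3), round(predicted_dots(x), 3))
```

Both columns of output should match the p' column of the example table up to rounding.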