Neural basis of concurrent deliberation toward a choice and confidence judgment - Nature Neuroscience



While monkeys performed the task, we recorded population spiking activity in the ventral portion of the lateral intraparietal area, LIPv (ref. 31). Previous work has shown that LIPv (hereafter LIP) represents a decision variable (DV) that predicts choice and RT (ref. 32), as well as confidence in an opt-out task (ref. 28). We found that behavior in the peri-dw task is best explained by concurrent evaluation of evidence for choice and confidence, and that neural activity in LIP reflects the requisite parallel process. The findings support a role for posterior parietal cortex in behaviors guided by an online estimate of confidence, and more broadly favor an architecture for visual metacognition that is fundamentally parallel.

We recorded 407 neurons in area LIP in the right hemisphere of two rhesus monkeys (Macaca mulatta; 207 in monkey H, 200 in monkey G) while they performed the peri-dw task (Fig. 1a). Each saccade target corresponds to a motion direction judgment (left or right) and a wager (high or low) on the correctness of that judgment. Although behaviorally the task amounts to a single choice among four options, we refer to the left-right component as 'choice' and the high-low component as 'wager' or 'bet', both for simplicity, and because the results support this interpretation. Monkeys were rewarded or penalized based on the conjunction of accuracy and wager (Fig. 1b) -- a larger drop of juice for high versus low bets when correct, and a time penalty for high bets when incorrect (no penalty for a low-bet error). As in previous work, monkeys showed greater accuracy (Fig. 1c) and faster RTs (Fig. 1d) when the motion was strong compared to weak. Motion strength also influenced wagering behavior in a sensible manner, namely the probability of betting high increased with greater motion strength in each direction (Fig. 1e). Notably, the behavior shows that the low-bet option did not correspond to opting out or failing to engage in the motion decision -- accuracy remained high on low-bet trials, and choice and RT still varied systematically with motion strength in a manner consistent with a deliberative process (see Model fitting below).

As expected for a behavioral assay of confidence, the monkeys' sensitivity was greater when betting high versus low (Fig. 1c, red versus blue; z = 34.41, P = 1.65 × 10, s.e. = 0.2735, logistic regression). This was true even when controlling for variability in motion energy within each coherence level, by leveraging multiple repeats of the same random seed (Supplementary Fig. 1a (monkey H -- P = 1.3376 × 10, z = 3.8194; monkey G -- P = 3.4828 × 10, z = 5.0953); Methods). Both monkeys also showed faster RTs when betting high versus low, for all but the largest motion strengths (Fig. 1d (red versus blue; asterisks indicate P < 0.0045; Wilcoxon rank-sum test, Šidák corrected)). Greater sensitivity and faster RTs for high-bet choices were evident in the large majority of individual sessions, as assessed with logistic regression (Extended Data Fig. 1 (mean difference in sensitivity, monkey H -- 9.2599, P = 8.2899 × 10, z-score = 7.9646; monkey G -- 9.3, P = 9.4419 × 10, z-score = 9.4101; one-tailed Wilcoxon signed-rank test)) and Gaussian fitting (Extended Data Fig. 1 (mean difference in amplitude, monkey H -- -74.4 ms, P = 1.9011 × 10, z-score = -5.8926; monkey G -- -275.4 ms, P = 7.9589 × 10, z-score = -8.5203; one-tailed Wilcoxon signed-rank test)). Finally, we examined wagering behavior as a function of RT quantile, separately for each individual motion strength (Extended Data Fig. 2a,b). For most motion strengths, the monkeys were less likely to bet high on trials with longer RTs (Extended Data Fig. 2b; P < 0.0085 for both monkeys for every coherence except 51.2%, Cochran-Armitage test with Bonferroni correction). This pattern closely resembles human behavior in a comparable task. As in the previous study, the trend remained statistically significant when controlling for variability in motion energy across trials of a given coherence (Extended Data Fig. 2d (monkey H -- P = 3.2351 × 10, F = 5.24; monkey G -- P = 0.0056, F = 3.65; interaction term between motion energy and RT quintile using analysis of covariance)). An inverse relationship between RT and confidence is a classic psychophysical result, replicated in more recent human work. Observing it in our monkeys supports the peri-dw assay as a valid measure of confidence, and it is consistent with a family of accumulator models as described below.
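
The trend test applied to the RT-quantile analysis can be illustrated with a short sketch. This is the textbook Cochran-Armitage statistic with equally spaced scores, not the authors' analysis code; the counts below are hypothetical and chosen only to show a declining proportion of high bets across RT quintiles, and no multiple-comparison correction is applied.

```python
import math

def cochran_armitage(successes, totals, scores=None):
    """Return (z, two-sided p) for a linear trend in proportions across ordered groups."""
    k = len(totals)
    if scores is None:
        scores = list(range(k))  # assumption: equally spaced group scores
    n = sum(totals)
    p_bar = sum(successes) / n  # pooled proportion under the null of no trend
    # Score-weighted deviation of observed from expected successes
    t_stat = sum(s * (r - m * p_bar) for s, r, m in zip(scores, successes, totals))
    s1 = sum(m * s for s, m in zip(scores, totals))
    s2 = sum(m * s * s for s, m in zip(scores, totals))
    var = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n)
    z = t_stat / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

# Hypothetical counts: high bets out of 200 trials in each of five RT quintiles
high_bets = [160, 150, 140, 120, 100]
trials = [200] * 5
z, p = cochran_armitage(high_bets, trials)
print(f"z = {z:.2f}, p = {p:.2e}")
```

A negative z indicates that the proportion of high bets falls as RT increases, which is the direction of the effect reported in the text.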

Although the choice and wager were indicated with a single eye movement, this does not necessitate simultaneity in the processing of evidence. Different temporal windows of the stimulus could covertly be used to support the two elements of the decision, which would then only be reported when both were resolved. To test whether monkeys use a consistent serial strategy (resolving choice first and then confidence, or vice versa), we calculated the influence of stimulus fluctuations on choice and confidence as a function of time (psychophysical kernels). Briefly, we quantified the motion energy for each trial and video frame by convolving the random-dot pattern with two pairs of spatiotemporal filters aligned to leftward and rightward motion. We then partitioned trials by outcome and plotted the average relative motion energy (residuals) for each outcome as a function of time.
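
The partition-and-average step can be sketched as below. The sketch assumes the per-frame motion energy has already been computed (the spatiotemporal filtering is not reproduced here), uses a synthetic observer who integrates the whole stimulus, and all function and variable names are illustrative rather than taken from the paper.

```python
import numpy as np

def psychophysical_kernel(motion_energy, coherence, choice):
    """Mean motion-energy residual over time, split by choice.

    motion_energy: (n_trials, n_frames) signed energy, rightward positive.
    coherence:     (n_trials,) signed motion strength per trial.
    choice:        (n_trials,) 1 for rightward, 0 for leftward.
    """
    residuals = np.empty_like(motion_energy, dtype=float)
    for c in np.unique(coherence):
        idx = coherence == c
        # Subtract the mean time course for this coherence so that only
        # trial-to-trial stimulus fluctuations remain.
        residuals[idx] = motion_energy[idx] - motion_energy[idx].mean(axis=0)
    right = residuals[choice == 1].mean(axis=0)
    left = residuals[choice == 0].mean(axis=0)
    return right, left

# Synthetic demonstration (not real data)
rng = np.random.default_rng(0)
n_trials, n_frames = 4000, 30
coh = rng.choice([-0.128, 0.0, 0.128], size=n_trials)
energy = coh[:, None] + rng.normal(0.0, 1.0, (n_trials, n_frames))
# Toy observer: chooses right when the summed energy is positive
choice = (energy.sum(axis=1) > 0).astype(int)
right_k, left_k = psychophysical_kernel(energy, coh, choice)
print(right_k.mean(), left_k.mean())
```

For this toy observer the two kernels separate across all frames; an observer who used only an early or late window would show separation confined to that window, which is the logic of the test described above.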

Psychophysical kernels for choice are plotted in Fig. 2a,b. Rightward choices were preceded by more rightward motion energy throughout most of the trials (red line), and the same was true for leftward choices and leftward motion (blue). The kernels for right and left choices began to separate about 100 ms after motion onset and remained so until ~100 ms before saccade initiation. This clear separation suggests that the monkeys were using essentially the entire stimulus epoch, on average, to decide motion direction. For confidence, we calculated the kernels by taking the difference between motion energy time series for high and low bets associated with a specific choice. We found that there was an excess of rightward motion energy on rightward high-bet choices compared to rightward low-bet choices (Fig. 2c,d (green traces above zero)), and similarly, there was more leftward motion energy on high-bet versus low-bet leftward choices (Fig. 2c,d, purple traces below zero). This analysis shows that both early and late motion evidence are leveraged to inform confidence in both monkeys. Comparison of the traces in Fig. 2a,b versus 2c,d suggests that the usage of the stimulus for confidence does not identically overlap with choice, especially for monkey H. However, the substantial overlap does appear to rule out a consistent temporal segregation, such as an obligatory postdecision mechanism for confidence.

In previous work in which decisions were reported with an arm movement, participants altered their reach trajectory in a manner that suggests a 'change of mind' (CoM) based on the continued processing of evidence after movement initiation. Saccadic choices, being fast and ballistic, are often assumed to be incompatible with CoMs; nevertheless, we identified a small subset of trials with multiple saccades in quick succession that showed certain characteristic features (Fig. 2e-j). These putative CoMs were more frequent on difficult versus easy trials (Fig. 2g,j (monkey H -- P = 0.007, z = -3.70, n = 52,993; monkey G -- P = 3.1983 × 10, z = -31.15, n = 118,105; Cochran-Armitage test)), and changes from incorrect to correct were more likely when motion strength was high (Fig. 2e,h (light red; monkey H -- P = 5.7576 × 10, z = 10.86, n = 8,579; monkey G -- P = 1.5973 × 10, z = 14.13, n = 24,480; Cochran-Armitage test)). Correct-to-error CoMs, occurring sparingly, were more likely when motion strength was low (Fig. 2e,h (dark red; monkey H -- P = 0.001, z = -5.88, n = 44,414; monkey G -- P = 6.9991 × 10, z = -16.72)). We also observed changes from low to high confidence, which, for both monkeys, were more frequent with greater motion strength (Fig. 2f,i (blue line; monkey H -- P = 4.9597 × 10, z = 11.20; monkey G -- P = 0.01, z = 3.38)), as shown previously in humans. The presence of CoMs and changes in confidence, sometimes both occurring in the same trial, implies that both aspects of the decision were subject to revision at the time of the initial saccade. This is inconsistent with a strictly serial process, although it also reveals a brief window for postdecision processing even for saccadic choices.

Previous studies, with some exceptions, typically assume a particular temporal framework for choice and confidence rather than comparing across model classes. Here we provide a thorough comparison of serial, parallel and hybrid models fitted to the same data from the peri-dw task. The parallel and hybrid models consist of two accumulators that integrate evidence for the two motion directions, differing only in how the accumulated evidence is mapped to confidence. To explain the wager, the confidence mapping is binarized by a single free parameter -- a criterion on log odds correct associated with a high bet versus a low bet (Fig. 1g (right)). The hybrid model adds an additional free parameter for the duration of postdecision accumulation. For the serial model, we made the simplifying assumption that the two accumulators were perfectly anticorrelated, equivalent to a one-dimensional (1D) drift-diffusion model (DDM). After one of the bounds is reached, evidence continues to accumulate toward a second set of bounds dictating the wager (Fig. 1f). The observed RT is the sum of the time taken to reach both bounds, as well as nondecision time.
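
To make the serial architecture concrete, here is a minimal simulation sketch of a two-stage 1D drift-diffusion process. Several details are assumptions made for illustration and are not specified in the text above: the second stage restarts from zero, it accumulates evidence signed relative to the initial choice, and a high bet is assigned when the bound agreeing with the choice is reached. Parameter values are arbitrary.

```python
import numpy as np

def simulate_serial_trial(drift, rng, choice_bound=1.0, wager_bound=0.5,
                          non_decision=0.3, dt=0.001, max_t=5.0):
    """Simulate one trial; returns (choice, wager, rt) with choice in {+1, -1}."""
    sd = np.sqrt(dt)
    # Stage 1: accumulate until either choice bound is reached
    x, t1 = 0.0, 0.0
    while abs(x) < choice_bound and t1 < max_t:
        x += drift * dt + sd * rng.normal()
        t1 += dt
    choice = 1 if x > 0 else -1
    # Stage 2 (assumed form): restart at zero and accumulate evidence
    # signed relative to the choice until a wager bound is reached
    y, t2 = 0.0, 0.0
    while abs(y) < wager_bound and t2 < max_t:
        y += choice * drift * dt + sd * rng.normal()
        t2 += dt
    wager = "high" if y > 0 else "low"
    # Observed RT is the sum of both stage durations plus nondecision time
    return choice, wager, t1 + t2 + non_decision

rng = np.random.default_rng(0)

def run(drift, n=200):
    trials = [simulate_serial_trial(drift, rng) for _ in range(n)]
    acc = np.mean([c == 1 for c, _, _ in trials])  # positive drift: +1 is correct
    return acc, trials

acc_strong, strong_trials = run(2.0)
acc_weak, weak_trials = run(0.2)
print(acc_strong, acc_weak)
```

Under these assumptions, an error on a strong-motion trial is followed by second-stage evidence that tends to contradict the choice, so errors are more likely to end in a low bet as motion strength grows. That is the serial-model signature discussed later in the text.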

The smooth curves in Fig. 3 and Extended Data Fig. 3 are fits to the serial, parallel and hybrid models for both monkeys. All models perform quite well at describing choice, RT and confidence as a function of motion strength when pooled across correct/incorrect and high/low wager trials (Extended Data Fig. 3), a testament to the explanatory power of the bounded accumulation framework. Interestingly, all three models also qualitatively capture the greater choice sensitivity and faster RTs for high versus low wager (Fig. 3 (first and second columns)). This comparison illustrates the difficulty of disambiguating the mechanism(s) underlying choice and confidence using behavior alone. Indeed, quantitative model comparison yielded mixed results for the two monkeys -- hybrid and parallel models were favored over the serial model for monkey G (Bayesian information criterion (BIC), hybrid = 1.1610 × 10, parallel = 1.1614 × 10, serial = 1.1644 × 10, n = 115,811), but the opposite was true for monkey H (BIC, serial = 7.7993 × 10, parallel = 7.8179 × 10, hybrid = 7.8279 × 10, n = 82,449).

Critically, however, the serial model fails in one key aspect -- the pattern of wagering behavior conditioned on accuracy (Fig. 3 (right column)). It is commonly observed that confidence ratings increase as a function of evidence strength on correct trials, but decrease with evidence strength on incorrect trials. This characteristic 'X' pattern (or 'folded-X', if stimulus strength is unsigned) is widely accepted as a signature of confidence in behavior and brain activity, yet it is not universal. Other studies report that confidence increases with evidence strength, even for errors, and this is what we observed as well (Fig. 3 (right column)). It is becoming increasingly clear that these conflicting findings can, in many cases, be explained by a temporal dissociation -- resolving a choice first, followed by confidence later, allows for revision of the confidence judgment upon further deliberation. When the stimulus is strong, incorrect choices are more likely to undergo such revision; hence, confidence decreases (on average) with evidence strength on error trials. Reducing or eliminating the delay between the choice and confidence report tends to flatten or reverse the X pattern. Because the serial and hybrid models impose such a delay (implicitly, in our task), they cannot reproduce the qualitative trend in error-trial confidence we observed empirically (Fig. 3 (right column, green data points)), unless the postdecision epoch is very brief, as it was in the best-fitting hybrid model for monkey G (60 ms; Fig. 3i). This qualitative miss is not reflected in the above BIC results because the model likelihoods were calculated using the unconditioned wager data, meaning the split between correct and error trials (Fig. 3 (right column, green versus purple curves)) is a prediction, not a fit. 
Quantifying the accuracy of this prediction using the error-trial wagers establishes the parallel and serial models as the most and least supported, respectively, for both monkeys (negative log-likelihood, for monkey H -- parallel = 8.4320 × 10, hybrid = 8.4370 × 10, serial = 8.6572 × 10, n = 13,176; for monkey G -- parallel = 1.5636 × 10, hybrid = 1.5699 × 10, serial = 1.5993 × 10, n = 13,755).
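
For readers comparing the two metrics, the standard definition of the BIC is given below, on the assumption that the authors used it in its usual form. Here $k$ is the number of free parameters, $n$ the number of observations (trials) and $\hat{L}$ the maximized likelihood; lower values indicate more support.

```latex
\mathrm{BIC} = k \ln n - 2 \ln \hat{L}
```

Because the hybrid model has one more free parameter than the parallel model, it pays an extra penalty of $\ln n$ under this definition. The negative log-likelihood on error-trial wagers carries no such penalty and simply scores how well each fitted model predicts data that were not used in the fit.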

In summary, although each model variant is flexible enough to capture most behavioral trends, a holistic model comparison favors parallel accumulation of evidence for a decision and associated level of confidence. We next examine whether decision-related activity in parietal cortex is consistent with such a mechanism.

Putative DV representations can be found in several subcortical and cortical areas, characterized by a 'ramping' pattern of neural activity (or decoded proxy thereof) that scales with evidence strength and often converges upon decision termination. Although this pattern does not uniquely identify a process of evidence accumulation, a large body of work supports the assertion that LIP neurons reflect such a process during random-dot motion discrimination. We reasoned that, if choice and confidence were resolved concurrently during motion viewing (parallel model), the ramping activity should begin to predict both dimensions of the eventual saccade simultaneously, classically around 200 ms after motion onset. Alternatively, if choice was deliberated first, followed by confidence (serial model), this temporal separation should be evident in the divergence point of neural activity traces conditioned on the four outcomes.

These traces are shown in Fig. 4a for four example neurons. The highest firing rate (FR) corresponds to choices made into the receptive field (RF) of the neuron, which was almost always in the left (contralateral) hemifield but was equally likely to overlap the high or low wager target. The relative ordering of the remaining three traces differs across neurons, possibly due to idiosyncratic RF properties or nonspatial decision signals. The key observation is that the activity preceding saccades to the preferred wager target (low or high) diverges from the activity for the other wager target (high or low) at about the same time as it diverges from the traces for ipsilateral choice (right-low and right-high). This pattern is present in each example neuron and in the population averages (Fig. 4b,c). There is no evidence that ramping activity consistently predicts the left-right choice sooner than the high-low one (or vice versa), as expected under a serial model. Instead, to the extent the activity reflects accumulation of evidence favoring the target in the RF (see below), the results support a model in which such accumulation underlies concurrent deliberation toward a choice and confidence judgment.

To dig deeper into the nature of the observed ramping signals, we tested for statistical signatures of a bounded accumulation process -- (1) increasing variance of the underlying rate (variance of the conditional expectation, VarCE) followed by a collapse near decision termination, and (2) a characteristic autocorrelation pattern in this latent signal (correlation of conditional expectation, CorCE; Methods). The results supported both sets of predictions. Beginning 200 ms after motion onset, VarCE shows a roughly linear increase for at least the next 400 ms (Extended Data Fig. 4). For CorCE, the results from both monkeys were well-matched to the predictions, namely an increase in the correlation between neighboring time bins as time elapses, and a decrease in correlation between bins as the separation between them increases (Fig. 4d,e (left, monkey G -- coefficient of determination R² = 0.83 and 0.82 for left-high and left-low neurons, respectively; monkey H -- R² = 0.66 and 0.80 for left-high and left-low, respectively)).
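
The first signature can be illustrated with a sketch that follows the usual decomposition of spike-count variance into VarCE plus a point-process term proportional to the mean count. It assumes a proportionality constant `phi` of 1 (pure Poisson spiking), whereas in practice this constant is estimated from the data; the simulated latent rate is a random walk with arbitrary parameters and does not reproduce the authors' procedure.

```python
import numpy as np

def var_ce(counts, phi=1.0):
    """VarCE per time bin: total count variance minus phi times the mean count.

    counts: (n_trials, n_bins) spike counts.
    phi:    assumed ratio of point-process variance to mean count.
    """
    return counts.var(axis=0, ddof=1) - phi * counts.mean(axis=0)

# Synthetic doubly stochastic process (not real data)
rng = np.random.default_rng(1)
n_trials, n_bins, bin_s = 5000, 6, 0.06
# Latent firing rate performs a random walk across bins, standing in for
# the accumulation of noisy evidence
steps = rng.normal(0.0, 8.0, size=(n_trials, n_bins))  # spikes/s per bin
rate = np.clip(40.0 + np.cumsum(steps, axis=1), 0.0, None)
counts = rng.poisson(rate * bin_s)
v = var_ce(counts)
print(np.round(v, 2))
```

Because the variance of a random walk grows in proportion to elapsed time, the estimated VarCE rises roughly linearly across bins, which is the qualitative pattern described for the period after 200 ms.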

These dynamics in variance and autocorrelation are consistent with an underlying neuronal mechanism that implements accumulation of noisy evidence, and are not easily explained by alternative accounts of LIP ramping activity, such as a gradual shift of attention or simple movement preparation. Critically, the patterns were present over the same time window in both the high-bet and low-bet preferring populations. This appears to refute a version of the serial model where choice is initially resolved by considering only one pair of targets, followed by a shift to the other pair after some time has elapsed. We explicitly tested this by computing the expected autocorrelation for a simulated process in which accumulation is delayed by a random amount of time. The delayed process provided a qualitatively inferior account of the empirically derived CorCE values, relative to standard (synchronous) accumulation (Fig. 4e (left versus right; left-high neurons in monkey H -- P = 0.012, n = 15; left-low neurons in monkey H -- P = 4.2725 × 10, n = 15; left-high neurons in monkey G -- P = 6.1035 × 10, n = 15; left-low neurons in monkey G -- P = 0.0015, n = 15; Wilcoxon signed-rank test)). Taken together, the results support a parallel model in which deliberation occurs simultaneously across the high and low target pairs. What remains to be tested is whether and when these accumulation signals are predictive of the monkey's choice and wager on individual trials.

Most of our analyses so far have relied on trial averages, potentially obscuring the dynamics of individual decisions. Therefore, we turned to a population-decoding approach to more directly address the question of parallel versus serial deliberation. We trained two logistic classifiers, one for the binary choice and the other for the binary wager, using the population spike counts (mean = 14 units per session) in the final 200 ms before the saccade. We then extracted a 'neural DV', also referred to as prediction strength or certainty, which is simply the log odds of a particular choice or wager as a function of time based on the decoded population spike counts on a given trial.
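
A minimal sketch of this decoding idea follows: fit a logistic classifier to population spike counts, then read out the log odds as the neural DV. It uses plain gradient descent on synthetic Poisson counts with 14 hypothetical units; the fitting procedure, regularization and cross-validation scheme are assumptions for illustration and may differ from the authors' methods.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """Fit weights by gradient descent on an L2-regularized logistic loss."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * (Xb.T @ (p - y) / len(y) + l2 * w)
    return w

def neural_dv(X, w):
    """Log odds of the positive class on each trial (the 'neural DV')."""
    return X @ w[:-1] + w[-1]

# Synthetic population: 14 units, half of them tuned to the choice
rng = np.random.default_rng(2)
n_trials, n_units = 2000, 14
y = rng.integers(0, 2, n_trials)
tuning = np.r_[np.full(7, 3.0), np.zeros(7)]
counts = rng.poisson(5.0 + y[:, None] * tuning)

train, test = slice(0, 1000), slice(1000, None)
mu, sd = counts[train].mean(axis=0), counts[train].std(axis=0)
X = (counts - mu) / sd  # z-score using training-set statistics only

w = fit_logistic(X[train], y[train])
dv = neural_dv(X[test], w)
accuracy = np.mean((dv > 0) == (y[test] == 1))
print(f"held-out accuracy: {accuracy:.2f}")
```

Applying the same weights to spike counts from successive time bins would yield a DV time course; a second classifier trained on the wager label gives the corresponding confidence DV.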

For both monkeys, the neural DV for choice ramped up starting about 200 ms after motion onset (Fig. 5a). The DV dynamics differed for the two animals, but both showed a ramping slope that depended on motion strength (monkey H, P < 0.001; monkey G, P < 10, linear regression). Cross-validated prediction accuracy also ramped up beginning at this time, simultaneously for both the choice and confidence decoders (Fig. 5b). At their peaks, both decoders performed well above chance on the test set, but a notable difference is the timing of the peaks, which for choice is just before saccade onset and for wager is slightly after the saccade (Fig. 5b). The time course of the choice and confidence DVs (Fig. 5c) mirrored the prediction accuracy traces, ramping in lockstep throughout most of the trials but with a subtle offset near the time of the saccade. This implies the persistence of a confidence-related signal even after the commitment to a wager, possibly reflecting continued deliberation (or a top-down signal) that could drive CoMs or even inform the next decision. However, the temporal offset was absent when using an alternative decoding approach with weights from a fixed (peri-saccadic) window (Supplementary Fig. 2), so this aspect of the results should be interpreted with caution. The details of the decoding method did not affect the main result of temporal congruency in the ramping of choice and wager signals during motion viewing.

Stepping back from the issue of temporal alignment, an important unanswered question is the degree to which choice and confidence signals overlap on a cell-by-cell basis. We computed the correlation between the fitted decoder weights (choice versus wager), after collapsing across time and converting them to an absolute magnitude, and found a modest but highly significant correlation across our sample (Extended Data Fig. 5a (monkey H, r = 0.18; monkey G, r = 0.21; P < 0.001 for both, permutation test)). In addition, the distribution of the difference between choice and confidence weights was unimodal (Extended Data Fig. 5b (Hartigan's dip test, P > 0.9 for both monkeys individually)), suggesting a continuum of contributions to choice and confidence and not two distinct subpopulations. This raises the question of whether the population can disentangle the two signals to prevent interference, especially since the evidence informing choice and confidence comes from a single source (the motion stimulus). To address this, we calculated the angular distance between the decoding vectors for choice versus confidence, separately for each session. During the deliberation period these vectors were approximately orthogonal (Fig. 5d (solid trace)), facilitating the readout of confidence by a downstream region, potentially at any time -- although they were closest to orthogonal around the time of the saccade. Finally, to test whether concurrent choice and confidence signals could be a trivial consequence of motor preparation, we trained a separate decoder to predict the vertical component of the saccade using a set of control trials where only one pair of wager targets (either the high or low pair) was present on a given trial. The resulting decoding vector was nearly orthogonal to the vector for decoding the wager on standard four-target trials (Fig. 5d (dashed trace)), and showed a qualitatively different log odds profile (Fig. 5e), suggesting that the confidence signal is distinct from eye movement preparation in the absence of a wager decision.
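
The angular distance between two decoding vectors reduces to the arccosine of their normalized dot product. The sketch below uses made-up three-unit weight vectors purely to show the computation; real decoders would have one weight per recorded unit.

```python
import numpy as np

def angle_deg(u, v):
    """Angle in degrees between two weight vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against rounding slightly outside [-1, 1]
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Hypothetical weight vectors
choice_w = [1.0, 0.0, 0.0]
wager_w = [0.0, 1.0, 0.0]
mixed_w = [1.0, 1.0, 0.0]
print(angle_deg(choice_w, wager_w))  # orthogonal readouts
print(angle_deg(choice_w, mixed_w))  # partially overlapping readouts
```

An angle near 90 degrees means that projecting population activity onto one vector is insensitive to variation along the other, which is why near-orthogonality allows the two signals to be read out separately.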

Given this multiplexed representation, we wondered whether the strength of choice decoding might predict the binary classification by the wager decoder on a trial-by-trial basis. To test this, we partitioned trials according to whether the wager decoder predicted a high or a low bet (P(high) in the peri-saccade epoch greater or less than 0.5, respectively). We then averaged the DV from the choice decoder, using only 0% coherence trials, and found that it was higher for decoded-high versus decoded-low trials (Fig. 6a). This indicates that the strength with which neural activity predicts the upcoming choice covaries with the probability that the same population predicts a high bet, consistent with a tight functional link between choice and confidence signals in LIP.

Remarkably, this link was only present for leftward (contralateral) and not rightward (ipsilateral) choices (Fig. 6a versus 6b). To further investigate this stark contrast, we performed a trial-by-trial analysis of the neural DVs centered around the saccade epoch (Fig. 6c,d). After separating the data by the monkey's wager on each trial (high, purple; low, gold), we confirmed that the wager decoder strongly predicted the behavioral confidence report, irrespective of choice (Fig. 6c,d (top histograms); P < 10, z = 56.5160, confidence interval (CI) = (3.554 .3809) and P = 1.139 × 10, z = 33, CI = (0.1769 0.1993)). However, the wager prediction was only positively correlated with choice strength for leftward (contralateral) choices (Spearman rank correlation, ρ = 0.22, P = 5.287 × 10 and ρ = -0.093, P = 5.212 × 10 for contra and ipsi choices, respectively). Because LIP RFs are mostly contralateral, this means the neurons that represent the unchosen option (associated with the 'losing accumulator' in the behavioral model) do not show a relationship between choice strength and wager prediction, although they still predict the wager itself (Fig. 6d (top histograms)). We considered an alternative model that reads out confidence from the winning accumulator, but this failed to capture the differences in accuracy and RT conditioned on the wager (Supplementary Fig. 4), unless there was at least ~80 ms of postdecision accumulation for confidence. However, this, in turn, predicted a prominent folded-X pattern (Supplementary Fig. 5f,l) that was absent in the data (see Discussion).

Having established a link between choice decoding strength and wager prediction (at least for contralateral choices), we can now examine the details of this relationship at a finer time scale and revisit the temporal offset shown in Fig. 5b,c. We fit a linear regression model relating choice decoder strength at time t to decoded wager probability at time t + ∆t, where ∆t ranges from -200 to +200 ms. During the deliberation phase (200-600 ms after motion onset and 0-400 ms before saccade onset), the strongest relationship between choice strength and wager probability was at a time lag of zero (Fig. 6e,f (corrected R² = 0.132 and 0.216, respectively)). This result held even when the decoding vectors were realigned to be fully, not just approximately, orthogonal (Supplementary Fig. 3). Interestingly, the period centered around the saccade (-0.2 ↔ 0.2 s; Fig. 6g) gave rise to two peaks, one at zero lag (corrected R² = 0.124) and the other at a lag of -0.2 s (choice preceding wager; corrected R² = 0.125). We speculate that this late peak may indicate a re-evaluation of evidence informing the wager, similar to replaying the last few samples used for the choice as a substitute for external input. Regardless, the main takeaway is the prominent peak at zero lag, which is consistent with the near-simultaneous updating of internal representations guiding a decision and confidence judgment -- a surprising result given the evidence for serial bottlenecks in many cognitive processes.
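
The lag analysis can be sketched as a sweep over time shifts, computing at each shift the R² of a one-predictor linear regression (which equals the squared Pearson correlation). The signals below are synthetic: the choice DV is an autocorrelated series and the wager probability is constructed to depend on the choice DV at the same time bin, so the peak at zero lag is built in by assumption. Lags are in units of time bins, and pooling across trials and bins is a simplification.

```python
import numpy as np

def lagged_r2(choice_dv, wager_p, max_lag):
    """R^2 between choice_dv[:, t] and wager_p[:, t + lag], pooled over trials and bins."""
    n_bins = choice_dv.shape[1]
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        # Keep only time bins for which both t and t + lag are valid
        t0, t1 = max(0, -lag), min(n_bins, n_bins - lag)
        x = choice_dv[:, t0:t1].ravel()
        y = wager_p[:, t0 + lag:t1 + lag].ravel()
        out[lag] = np.corrcoef(x, y)[0, 1] ** 2
    return out

# Synthetic signals (not real data)
rng = np.random.default_rng(3)
n_trials, n_bins = 500, 40
dv = np.zeros((n_trials, n_bins))
for t in range(1, n_bins):
    # First-order autoregressive choice DV
    dv[:, t] = 0.8 * dv[:, t - 1] + rng.normal(0.0, 1.0, n_trials)
# Wager probability driven by the simultaneous choice DV plus noise
wager_p = 1.0 / (1.0 + np.exp(-(dv + rng.normal(0.0, 1.0, dv.shape))))

r2 = lagged_r2(dv, wager_p, max_lag=10)
best_lag = max(r2, key=r2.get)
print(best_lag, round(r2[best_lag], 3))
```

If the wager signal instead followed the choice signal after a fixed delay, the peak would shift away from zero by that delay, which is how a serial process would appear in this analysis.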
