V-Dem Methodology

V-Dem has developed innovative methods for aggregating expert judgments in a way that produces valid and reliable estimates of difficult-to-observe concepts. This aspect of the project is critical because many key features of democracy are not directly observable. We continually review our methodology—and occasionally adjust it—with the goal of improving the quality of V-Dem indicators and indices.

Author: Kyle Marquardt

V-Dem uses innovative methods to aggregate expert judgments and thereby produce estimates of important concepts. We use experts because many key features of democracy are not directly observable. For example, it is easy to observe whether or not a legislature has the legal right to investigate an executive. However, assessing the extent to which the legislature actually does so requires evaluation by experts with extensive conceptual and case knowledge.

V-Dem typically gathers data from five experts per country-year observation, drawing on a pool of over 4,000 country experts who provide judgments on different concepts and cases. Experts hail from almost every country in the world, allowing us to leverage diverse opinions.

Despite their clear value, expert-coded data pose multiple problems. Rating concepts requires judgment, which varies across experts and cases; it may also vary systematically across groups of experts. We address these concerns by aggregating expert-coded data with a measurement model, allowing us to account for uncertainty about estimates and potential biases.

The logic of the V-Dem measurement model is that an unobserved concept exists (e.g. a certain level of academic freedom and freedom of cultural expression) but we only see imperfect manifestations of this concept in the form of the ordinal categories which experts use to code their judgments. Our model converts these manifest items (expert ratings) to a single continuous latent scale and thereby estimates values of the concept.

In the process, the model algorithmically estimates both the degree to which an expert is reliable relative to other experts and the degree to which their perception of the response scale differs from that of other experts. Similarly, we use patterns of overlapping coding, both in the form of experts who code multiple countries and experts who code hypothetical cases (anchoring vignettes), to estimate the degree to which differences in scale perception are systematic across experts who code different sets of cases. Because the estimation process is iterative, these estimates of reliability and scale perception in turn weight each expert's contribution to the estimate of the unobserved concept.
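To make the roles of reliability and scale perception concrete, the following is a minimal sketch of an ordinal probit rating model, not the V-Dem estimation code itself. The parameterization (expert-specific `thresholds` standing in for scale perception, a single `reliability` multiplier, and the function names) is an assumption chosen for illustration; the working papers listed under further reading describe the actual model.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(latent, thresholds, reliability):
    """Probability that an expert assigns each ordinal category.

    latent      -- value of the unobserved concept for one country-year
    thresholds  -- the expert's increasing cut points on the latent scale
                   (hypothetical stand-in for scale perception)
    reliability -- how strongly the expert's ratings track the latent value
                   (hypothetical stand-in for expert reliability)
    """
    # Cumulative probability of rating at or below each category,
    # padded with 0 and 1 so that differences give per-category mass.
    cuts = [0.0]
    cuts += [normal_cdf(t - reliability * latent) for t in thresholds]
    cuts += [1.0]
    return [cuts[k + 1] - cuts[k] for k in range(len(cuts) - 1)]


# Two hypothetical experts rate the same case on a 0-4 scale.
# They share the same reliability but perceive the scale differently.
strict = category_probabilities(0.5, [-1.0, 0.0, 1.0, 2.0], reliability=1.0)
lenient = category_probabilities(0.5, [-2.0, -1.0, 0.0, 1.0], reliability=1.0)

most_likely_strict = strict.index(max(strict))
most_likely_lenient = lenient.index(max(lenient))
```

In this sketch the two experts observe the same latent value, yet the lenient expert's most likely rating is one category higher than the strict expert's. A measurement model that estimates the thresholds can attribute that gap to scale perception rather than to a real difference between cases.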

In the resulting V-Dem dataset, we present users with a best estimate of the value for an observation (the point estimate), as well as an uncertainty estimate (the credible regions, a Bayesian analogue of confidence intervals). More precisely, the output of the measurement model is an interval-level point estimate of the latent trait that typically varies from –5 to 5, along with its associated measurement error. These estimates are the best suited for use in statistical analysis.

However, the interval-level estimates are difficult for some users to interpret substantively. We therefore also provide interval-level point estimates that we have linearly transformed back to the coding scale that experts originally used to code each case. These estimates typically run from 0 to 4; users can refer to the V-Dem codebook to substantively interpret them. Finally, we provide ordinal versions of each variable for applications in which users require ordered categorical values. Each of the latter two data versions is also accompanied by credible regions.

The result of this process is several versions of each indicator of democratic institutions and concepts, which allow academics and policymakers alike to understand the different features of a polity. The table below summarizes the output that we provide to users.

Versions of the V-Dem Indicators

Suffix	Scale	Description	Recommended use
None	Interval	V-Dem measurement model estimates	Regression analysis
_osp	Interval	Linearized transformation of the model estimates on the original scale	Substantive interpretation of graphs and data
_ord	Ordinal	Most likely ordinal value of model estimates on the original scale	Substantive interpretation of graphs and data
_codelow / _codehigh	Interval	One standard deviation above (_codehigh) and below (_codelow) a point estimate	Evaluating differences over time within units
_sd	Interval	Standard deviation of the interval estimate	Creating confidence intervals based on user needs

For more information, download the V-Dem Methodology document.
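As an illustration of the last row, the sketch below builds a wider interval from a point estimate and its `_sd` value. It assumes a normal approximation (estimate plus or minus a multiple of the standard deviation), which is a simplification introduced here; the posterior distribution need not be symmetric. The function name and the numbers are hypothetical.

```python
def interval_from_sd(estimate: float, sd: float, z: float = 1.96):
    """Return (low, high) bounds as estimate +/- z * sd.

    z = 1.0 roughly mirrors the _codelow/_codehigh bounds;
    z = 1.96 gives an approximate 95 percent interval
    under the normal approximation assumed here.
    """
    if sd < 0:
        raise ValueError("standard deviation must be non-negative")
    return estimate - z * sd, estimate + z * sd


# Hypothetical point estimate and _sd value for one country-year.
low, high = interval_from_sd(estimate=0.8, sd=0.25)
```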

KEY TERMS

Point Estimate: A best estimate of a concept’s value.

Confidence Intervals: Credible regions whose upper and lower bounds mark a range of probable values around a point estimate. The bounds enclose the interval in which the measurement model places 68 percent of the probability mass for each score, which is approximately equivalent to one standard deviation above and below the median.
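The relationship between the 68 percent interval and one standard deviation can be checked with a short simulation. The sketch below uses simulated normal draws as a stand-in for posterior draws of a single score. The draws, the seed, and the function name are assumptions for illustration and are not V-Dem output.

```python
import random
import statistics


def central_interval(draws, mass=0.68):
    """Bounds of the central interval holding `mass` of the draws."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - mass) / 2.0
    low_index = int(round(tail * (n - 1)))
    high_index = int(round((1.0 - tail) * (n - 1)))
    return ordered[low_index], ordered[high_index]


# Simulated stand-in for posterior draws of one country-year score.
rng = random.Random(0)
draws = [rng.gauss(1.2, 0.3) for _ in range(20000)]

low, high = central_interval(draws)
median = statistics.median(draws)
sd = statistics.stdev(draws)

# For roughly normal draws, these two pairs should nearly coincide.
interval_bounds = (low, high)
sd_bounds = (median - sd, median + sd)
```

For a skewed posterior the two pairs of bounds can diverge, which is one reason the definition above describes the equivalence as approximate.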

Significant Differences or Changes: When the upper and lower bounds of the confidence intervals for two point estimates do not overlap, we are confident that the difference between them is not a result of measurement error.
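The overlap rule can be written as a small check on the lower and upper bounds of two estimates. This is a minimal sketch with hypothetical bounds and function names; in practice the bounds would come from the `_codelow` and `_codehigh` versions of a variable.

```python
def intervals_overlap(low_a, high_a, low_b, high_b):
    """True if the two closed intervals share at least one point."""
    return low_a <= high_b and low_b <= high_a


def confident_difference(low_a, high_a, low_b, high_b):
    """True if the intervals do not overlap, following the rule above."""
    return not intervals_overlap(low_a, high_a, low_b, high_b)


# Hypothetical bounds for the same country in two different years.
changed = confident_difference(0.2, 0.7, 0.9, 1.4)    # intervals are disjoint
unchanged = confident_difference(0.2, 0.7, 0.6, 1.1)  # intervals overlap
```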

Resources and further reading

Marquardt, Kyle L. and Daniel Pemstein. 2018. IRT Models for Expert-Coded Panel Data. Political Analysis 26(4).
Pemstein, Daniel, Eitan Tzelgov and Yi-ting Wang. 2015. Evaluating and Improving Item Response Theory Models for Cross-National Expert Surveys. Varieties of Democracy Institute: Working Paper No. 1.
Pemstein, Daniel, Kyle L. Marquardt, Eitan Tzelgov, Yi-ting Wang, Juraj Medzihorsky, Joshua Krusell, Farhad Miri, and Johannes von Römer. 2025. The V-Dem Measurement Model: Latent Variable Analysis for Cross-National and Cross-Temporal Expert-Coded Data. V-Dem Working Paper No. 21. 10th edition. University of Gothenburg: Varieties of Democracy Institute.
Marquardt, Kyle L. and Daniel Pemstein. 2021. Estimating Latent Traits from Expert Surveys: An Analysis of Sensitivity to Data-Generating Process. Political Science Research and Methods, 1-10.