
Focus on the propensity score


When you are implementing some of the studies managed by Soladis, you may need to calculate a “propensity score”.

What is this score? When and why is it used?

When testing the efficacy of a new treatment, the aim is to estimate its effect in a way that is as unbiased as possible. To achieve this, the effect should be estimated with “comparable” groups, which differ “only” in terms of the treatment taken, not in terms of the individual characteristics of the study subjects. Let’s consider an example where the groups are not comparable in terms of the stage of disease: if the patients in the treatment group are more severely affected, this may influence the estimated treatment effect, and it could be wrongly concluded that the treatment is not effective. This is why most clinical trials are randomised: random assignment helps ensure comparable groups with regard to all factors (measured and unmeasured).

Unfortunately, randomised clinical trials cannot always be implemented (high cost, ethical problems posed by random treatment assignment, etc.), whereas observational studies can. In this type of study, the investigators do not control treatment assignment, which is often influenced by the baseline characteristics of the patients. This can leave the patient groups unbalanced on both observed and unobserved variables. The groups are then “non-comparable”, which biases the estimate of the treatment effect (selection bias).

The propensity score method can then be used to overcome this problem.

The propensity score is calculated for each individual in the study before the treatment effect is estimated. It represents the patient’s probability of receiving the treatment, given their characteristics at the start of the study. It is most often estimated with a logistic regression model that explains treatment assignment (the variable representing the group) by the baseline characteristics of the patients. However, not all baseline covariates should automatically be included in this model; despite a lack of consensus in the literature, it is advisable to include covariates that are related both to the study endpoint and to treatment assignment (true confounders) and/or those related only to the endpoint.
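As an illustration of this estimation step, here is a minimal sketch in pure Python that fits a logistic regression of treatment assignment on two baseline covariates and returns one score per patient. The toy data, the covariate names (standardised age, advanced stage) and the function names are assumptions made for the example; in practice a statistical package would normally be used instead of hand-written gradient ascent.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_propensity_model(X, treated, lr=0.5, n_iter=5000):
    """Fit a logistic regression of treatment on covariates by gradient ascent.

    Returns the coefficients [intercept, b1, ..., bp].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, treated):
            row = [1.0] + list(xi)
            err = zi - sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j, v in enumerate(row):
                grad[j] += err * v
        # Average log-likelihood gradient, one ascent step.
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Estimated probability of receiving the treatment for each patient."""
    return [sigmoid(sum(b * v for b, v in zip(beta, [1.0] + list(xi))))
            for xi in X]

# Hypothetical baseline data: (standardised age, advanced stage yes/no).
X = [(-1.2, 0), (-0.8, 0), (-0.5, 1), (-0.1, 0), (0.0, 1),
     (0.3, 0), (0.6, 1), (0.9, 1), (1.1, 0), (1.4, 1)]
treated = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]  # 1 = received the treatment

beta = fit_propensity_model(X, treated)
scores = propensity_scores(X, beta)
for xi, zi, e in zip(X, treated, scores):
    print(xi, zi, round(e, 3))
```

Note that the outcome never enters this model: only treatment assignment is explained, which is what allows the score to be computed before any treatment effect is estimated.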

The propensity score method is applicable under certain conditions/assumptions (all true confounders must be measured; treatment assignment is strongly “ignorable” conditional on the baseline covariates; treatment assignment is independent from one patient to the next). When these assumptions are met, then two patients from different groups with the same propensity score are considered “pseudo-randomised”: their treatment assignment resembles a random mechanism.
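These conditions can be written more formally. The notation below is an assumption chosen for this sketch, not taken from the article: $Z$ is the treatment indicator, $X$ the vector of baseline covariates, and $Y(1)$, $Y(0)$ the potential outcomes with and without treatment.

```latex
% Propensity score: probability of treatment given baseline covariates
e(x) = \Pr(Z = 1 \mid X = x)

% Strongly ignorable treatment assignment (Rosenbaum and Rubin, 1983):
% no unmeasured confounding, and every patient could receive either treatment
\bigl(Y(1),\, Y(0)\bigr) \perp Z \mid X,
\qquad 0 < \Pr(Z = 1 \mid X) < 1

% Balancing property: at a given score, covariates are distributed
% identically in treated and untreated patients
X \perp Z \mid e(X)
```

The balancing property is what justifies the “pseudo-randomised” reading: among patients with the same score, the covariates carry no further information about who was treated.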

Once the propensity score has been estimated for each patient, it can be used in various ways: matching, stratification, inverse probability of treatment weighting, and covariate adjustment on the propensity score. The choice of method depends on the objective of the study: to estimate the treatment effect in the treated population (average treatment effect for the treated [ATT]) or in the overall population (average treatment effect [ATE]).
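To make the link between method and estimand concrete, the sketch below computes the weights commonly used for inverse probability of treatment weighting: for the ATE, treated patients are weighted by 1/e and untreated patients by 1/(1 - e); for the ATT, treated patients keep a weight of 1 and untreated patients are weighted by e/(1 - e). The function names and the four-patient data set are hypothetical, and the weighted difference in means is only one simple estimator among several.

```python
def ipw_weights(treated, ps, estimand="ATE"):
    """Inverse probability of treatment weights for the chosen estimand."""
    weights = []
    for z, e in zip(treated, ps):
        if estimand == "ATE":
            w = 1.0 / e if z == 1 else 1.0 / (1.0 - e)
        elif estimand == "ATT":
            w = 1.0 if z == 1 else e / (1.0 - e)
        else:
            raise ValueError("estimand must be 'ATE' or 'ATT'")
        weights.append(w)
    return weights

def weighted_mean_difference(outcome, treated, weights):
    """Difference between the weighted outcome means of the two groups."""
    def wmean(group):
        num = sum(w * y for y, z, w in zip(outcome, treated, weights) if z == group)
        den = sum(w for z, w in zip(treated, weights) if z == group)
        return num / den
    return wmean(1) - wmean(0)

# Hypothetical example: four patients with already-estimated scores.
treated = [1, 0, 1, 0]
ps = [0.5, 0.5, 0.8, 0.2]
outcome = [10.0, 8.0, 12.0, 6.0]

w_ate = ipw_weights(treated, ps, "ATE")
w_att = ipw_weights(treated, ps, "ATT")
print("ATE weights:", w_ate)
print("ATT weights:", w_att)
print("ATE estimate:", weighted_mean_difference(outcome, treated, w_ate))
print("ATT estimate:", weighted_mean_difference(outcome, treated, w_att))
```

Scores very close to 0 or 1 produce very large weights, so the distribution of the weights is worth inspecting before any estimate is interpreted.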

In order to check whether the propensity score correctly balances the patients’ characteristics, the standardised difference comparing the mean or prevalence of each covariate between the two groups is often calculated. However, this measure only compares means, so it is also necessary to study higher-order moments and the full distribution of each covariate (variance ratio, QQ plots, etc.). Standardised differences for the interactions between the covariates can also be studied.
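The balance diagnostics described above can be sketched as follows for an unweighted sample (for example, after matching). The standardised difference divides the difference in means by the square root of the average of the two group variances; the binary version uses the prevalences. The function names and data are hypothetical, and a weighted sample would need weighted means and variances instead.

```python
import math
from statistics import mean, variance

def standardised_difference(x_treated, x_control):
    """Standardised difference in means for a continuous covariate."""
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2.0)
    return (mean(x_treated) - mean(x_control)) / pooled_sd

def standardised_difference_binary(p_treated, p_control):
    """Standardised difference for a binary covariate, from its prevalences."""
    pooled_sd = math.sqrt(
        (p_treated * (1.0 - p_treated) + p_control * (1.0 - p_control)) / 2.0
    )
    return (p_treated - p_control) / pooled_sd

def variance_ratio(x_treated, x_control):
    """Ratio of group variances; values near 1 suggest similar spread."""
    return variance(x_treated) / variance(x_control)

# Hypothetical covariate values (e.g. age in decades) in each group.
age_treated = [1.0, 2.0, 3.0, 4.0, 5.0]
age_control = [2.0, 3.0, 4.0, 5.0, 6.0]

print("SMD:", standardised_difference(age_treated, age_control))
print("Variance ratio:", variance_ratio(age_treated, age_control))
print("SMD (binary):", standardised_difference_binary(0.40, 0.35))
```

A common rule of thumb, not stated in this article, treats an absolute standardised difference below 0.1 as negligible imbalance; the threshold remains a convention rather than a test.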

The most important covariates may not be balanced after the propensity score has been applied. If this is the case, the propensity score can be re-estimated by adding interactions or polynomial terms to the model; this process is iterative and can be repeated until balance is reached. If, despite all these steps, balance is still not achieved, it can be concluded that the groups are simply not comparable.

Before validating these calculations, a sensitivity analysis is recommended to assess how robust the results are to a possible unmeasured confounder.

Once the propensity score has been calculated, it is important to take into account the nature of the method used when estimating the treatment effect via a statistical analysis. Indeed, in the event of matching, for example, the two samples (treatment group and control group) are not independent since one or more treated individuals are associated with one or more untreated individuals based on their similarity in terms of baseline characteristics. A correlation therefore needs to be taken into account when estimating the treatment effect.
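As a sketch of what this means for matching, the example below pairs each treated patient with the nearest available untreated patient on the propensity score, then estimates the effect from the within-pair differences, so that the standard error reflects the pairing rather than treating the two groups as independent. The greedy algorithm, the caliper of 0.1 on the raw score and all data are assumptions chosen for illustration only.

```python
import math
from statistics import mean, stdev

def greedy_match(ps_treated, ps_control, caliper=0.1):
    """1:1 nearest-neighbour matching without replacement.

    Returns a list of (treated_index, control_index) pairs; treated patients
    with no control within the caliper are left unmatched.
    """
    available = set(range(len(ps_control)))
    pairs = []
    for i, e_t in enumerate(ps_treated):
        if not available:
            break
        j = min(sorted(available), key=lambda k: abs(ps_control[k] - e_t))
        if abs(ps_control[j] - e_t) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs

def paired_effect(y_treated, y_control, pairs):
    """Mean within-pair difference and its standard error."""
    diffs = [y_treated[i] - y_control[j] for i, j in pairs]
    return mean(diffs), stdev(diffs) / math.sqrt(len(diffs))

# Hypothetical scores and outcomes.
ps_treated = [0.30, 0.57, 0.90]
ps_control = [0.28, 0.60, 0.50, 0.10]
y_treated = [5.0, 7.0, 9.0]
y_control = [4.0, 4.0, 3.0, 2.0]

pairs = greedy_match(ps_treated, ps_control)
estimate, std_error = paired_effect(y_treated, y_control, pairs)
print("Matched pairs:", pairs)
print("Estimate:", estimate, "SE:", std_error)
```

Because unmatched treated patients are dropped, the estimate applies only to the matched treated patients, which is worth reporting alongside the result.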


P. R. Rosenbaum and D. B. Rubin, “The central role of the propensity score in observational studies for causal effects”, Biometrika, 1983.

P. C. Austin, “An Introduction to Propensity Score Methods for Reducing Effects of Confounding in Observational Studies”, Multivariate Behavioral Research, 2011.

Gayat and Porcher, “Comparaison de l’efficacité de deux thérapeutiques en l’absence de randomisation : intérêts et limites des méthodes utilisant les scores de propension”, Réanimation, 2012.

R. E. Lanehart, P. Rodriguez de Gil, E. S. Kim, A. P. Bellara, J. D. Kromrey and R. S. Lee, “Propensity score analysis and assessment of propensity score approaches using SAS procedures”, SAS Global Forum – Statistics and Data Analysis. Paper 314-2012, 2012.

E. A. Stuart, “Matching methods for causal inference: A review and a look forward”, Statistical Science, vol. 25, no. 1, p. 1, 2010.