Equality of Opportunity in Supervised Learning
Abstract
We propose a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features. Assuming data about the predictor, target, and membership in the protected group are available, we show how to optimally adjust any learned predictor so as to remove discrimination according to our definition. Our framework also improves incentives by shifting the cost of poor classification from disadvantaged groups to the decision maker, who can respond by improving the classification accuracy.
In line with other studies, our notion is oblivious: it depends only on the joint statistics of the predictor, the target and the protected attribute, but not on interpretation of individual features. We study the inherent limits of defining and identifying biases based on such oblivious measures, outlining what can and cannot be inferred from different oblivious tests.
We illustrate our notion using a case study of FICO credit scores.
1 Introduction
As machine learning increasingly affects decisions in domains protected by antidiscrimination law, there is much interest in algorithmically measuring and ensuring fairness in machine learning. In domains such as advertising, credit, employment, education, and criminal justice, machine learning could help obtain more accurate predictions, but its effect on existing biases is not well understood. Although reliance on data and quantitative measures can help quantify and eliminate existing biases, some scholars caution that algorithms can also introduce new biases or perpetuate existing ones [BS16]. In May 2014, the Obama Administration’s Big Data Working Group released a report [PPM14] arguing that discrimination can sometimes “be the inadvertent outcome of the way big data technologies are structured and used” and pointed toward “the potential of encoding discrimination in automated decisions”. A subsequent White House report [Whi16] calls for “equal opportunity by design” as a guiding principle in domains such as credit scoring.
Despite the demand, a vetted methodology for avoiding discrimination against protected attributes in machine learning is lacking. A naïve approach might require that the algorithm should ignore all protected attributes such as race, color, religion, gender, disability, or family status. However, this idea of “fairness through unawareness” is ineffective due to the existence of redundant encodings, ways of predicting protected attributes from other features [PRT08].
Another common conception of nondiscrimination is demographic parity. Demographic parity requires that a decision, such as accepting or denying a loan application, be independent of the protected attribute. In the case of a binary decision $\hat{Y} \in \{0,1\}$ and a binary protected attribute $A \in \{0,1\}$, this constraint can be formalized by asking that $\Pr\{\hat{Y}=1 \mid A=0\} = \Pr\{\hat{Y}=1 \mid A=1\}$. In other words, membership in a protected class should have no correlation with the decision. Through its various equivalent formalizations this idea appears in numerous papers. Unfortunately, as was already argued by Dwork et al. [DHP12], the notion is seriously flawed on two counts. First, it doesn’t ensure fairness. Indeed, the notion permits that we accept qualified applicants in the demographic $A=0$, but unqualified individuals in $A=1$, so long as the percentages of acceptance match. This behavior can arise naturally when there is little or no training data available within $A=1$. Second, demographic parity often cripples the utility that we might hope to achieve. Just imagine the common scenario in which the target variable $Y$ (whether an individual actually defaults or not) is correlated with $A$. Demographic parity would not allow the ideal predictor $\hat{Y}=Y$, which can hardly be considered discriminatory as it represents the actual outcome. As a result, the loss in utility of introducing demographic parity can be substantial.
In this paper, we consider nondiscrimination from the perspective of supervised learning, where the goal is to predict a true outcome $Y$ from features $X$ based on labeled training data, while ensuring that the predictions are “nondiscriminatory” with respect to a specified protected attribute $A$. As in the usual supervised learning setting, we assume that we have access to labeled training data, in our case indicating also the protected attribute $A$. That is, we have access to samples from the joint distribution of $(X, A, Y)$. This data is used to construct a predictor $\hat{Y}(X)$ or $\hat{Y}(X, A)$, and we also use such data to test whether it is unfairly discriminatory.
Unlike demographic parity, our notion always allows for the perfectly accurate solution of $\hat{Y}=Y$. More broadly, our criterion is easier to achieve the more accurate the predictor $\hat{Y}$ is, aligning fairness with the central goal in supervised learning of building more accurate predictors.
The notion we propose is “oblivious”, in that it is based only on the joint distribution, or joint statistics, of the true target $Y$, the predictions $\hat{Y}$, and the protected attribute $A$. In particular, it does not evaluate the features in $X$ nor the functional form of the predictor $\hat{Y}(X)$ nor how it was derived. This matches other tests recently proposed and conducted, including demographic parity and different analyses of common risk scores. In many cases, only oblivious analysis is possible as the functional form of the score and underlying training data are not public. The only information about the score is the score itself, which can then be correlated with the target and protected attribute. Furthermore, even if the features or the functional form are available, going beyond oblivious analysis essentially requires subjective interpretation or causal assumptions about specific features, which we aim to avoid.
1.1 Summary of our contributions
We propose a simple, interpretable, and actionable framework for measuring and removing discrimination based on protected attributes. We argue that, unlike demographic parity, our framework provides a meaningful measure of discrimination, while demonstrating in theory and experiment that we also achieve much higher utility. Our key contributions are as follows:

We propose an easily checkable and interpretable notion of avoiding discrimination based on protected attributes. Our notion enjoys a natural interpretation in terms of graphical dependency models. It can also be viewed as shifting the burden of uncertainty in classification from the protected class to the decision maker. In doing so, our notion helps to incentivize the collection of better features that depend more directly on the target rather than the protected attribute, and of data that allows better prediction for all protected classes.

We give a simple and effective framework for constructing classifiers satisfying our criterion from an arbitrary learned predictor. Rather than changing a possibly complex training pipeline, the result follows via a simple postprocessing step that minimizes the loss in utility.

We show that the Bayes optimal nondiscriminating (according to our definition) classifier is the classifier derived from any Bayes optimal (not necessarily nondiscriminating) regressor using our postprocessing step. Moreover, we quantify the loss that follows from imposing our nondiscrimination condition in case the score we start from deviates from Bayesian optimality. This result helps to justify the approach of deriving a fair classifier via postprocessing rather than changing the original training process.

We capture the inherent limitations of our approach, as well as any other oblivious approach, through a nonidentifiability result showing that different dependency structures with possibly different intuitive notions of fairness cannot be separated based on any oblivious notion or test.
Throughout our work, we assume a source distribution over $(Y, X, A)$, where $Y$ is the target or true outcome (e.g. “default on loan”), $X$ are the available features, and $A$ is the protected attribute. Generally, the features $X$ may be an arbitrary vector or an abstract object, such as an image. Our work does not refer to the particular form $X$ has.
The objective of supervised learning is to construct a (possibly randomized) predictor $\hat{Y} = f(X, A)$ that predicts $Y$, with quality typically measured through a loss function. Furthermore, we would like to require that $\hat{Y}$ does not discriminate with respect to $A$, and the goal of this paper is to formalize this notion.
2 Equalized odds and equal opportunity
We now formally introduce our first criterion.
Definition 2.1 (Equalized odds).
We say that a predictor $\hat{Y}$ satisfies equalized odds with respect to protected attribute $A$ and outcome $Y$ if $\hat{Y}$ and $A$ are independent conditional on $Y$.
Unlike demographic parity, equalized odds allows $\hat{Y}$ to depend on $A$, but only through the target variable $Y$. As such, the definition encourages the use of features that allow us to directly predict $Y$, but prohibits abusing $A$ as a proxy for $Y$.
As stated, equalized odds applies to targets and protected attributes taking values in any space, including binary, multiclass, continuous or structured settings. The case of binary random variables $Y$, $\hat{Y}$ and $A$ is of central importance in many applications, encompassing the main conceptual and technical challenges. As a result, we focus most of our attention on this case, in which equalized odds is equivalent to:
$$\Pr\{\hat{Y}=1 \mid A=0, Y=y\} = \Pr\{\hat{Y}=1 \mid A=1, Y=y\}, \quad y \in \{0,1\}.$$
For the outcome $y=1$, the constraint requires that $\hat{Y}$ has equal true positive rates across the two demographics $A=0$ and $A=1$. For $y=0$, the constraint equalizes false positive rates. The definition aligns nicely with the central goal of building highly accurate classifiers, since $\hat{Y}=Y$ is always an acceptable solution. However, equalized odds enforces that the accuracy is equally high in all demographics, punishing models that perform well only on the majority.
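The binary condition can be checked directly from samples. The following is a minimal sketch, assuming plain Python lists of 0/1 labels, 0/1 predictions and group labels; the function names and the toy data are made up for illustration. It computes the false and true positive rates within each group and shows that the perfect predictor has zero gaps in both rates even though its acceptance rates differ across groups.

```python
from collections import defaultdict

def group_rates(y_true, y_pred, group):
    """Return {a: (false_positive_rate, true_positive_rate)} per group a."""
    # counts[a][y] = [number of samples, number predicted positive]
    counts = defaultdict(lambda: {0: [0, 0], 1: [0, 0]})
    for y, yhat, a in zip(y_true, y_pred, group):
        counts[a][y][0] += 1
        counts[a][y][1] += yhat
    rates = {}
    for a, c in counts.items():
        fpr = c[0][1] / c[0][0]  # assumes the group contains negatives
        tpr = c[1][1] / c[1][0]  # assumes the group contains positives
        rates[a] = (fpr, tpr)
    return rates

def equalized_odds_gaps(rates):
    """Largest between-group difference in FPR and in TPR."""
    fprs = [r[0] for r in rates.values()]
    tprs = [r[1] for r in rates.values()]
    return max(fprs) - min(fprs), max(tprs) - min(tprs)

def acceptance_rate(y_pred, group, a):
    """Fraction of group a that receives a positive prediction."""
    picks = [p for p, g in zip(y_pred, group) if g == a]
    return sum(picks) / len(picks)

# Toy data (made up): the label is correlated with the group,
# and the predictor simply equals the label.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = list(y_true)

rates = group_rates(y_true, y_pred, group)
fpr_gap, tpr_gap = equalized_odds_gaps(rates)
accept_0 = acceptance_rate(y_pred, group, 0)
accept_1 = acceptance_rate(y_pred, group, 1)
```

On real data the gaps would be compared against a tolerance rather than exact zero, since the rates are estimated from finite samples.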
2.1 Equal opportunity
In the binary case, we often think of the outcome $Y=1$ as the “advantaged” outcome, such as “not defaulting on a loan”, “admission to a college” or “receiving a promotion”. A possible relaxation of equalized odds is to require nondiscrimination only within the “advantaged” outcome group. That is, to require that people who pay back their loan have an equal opportunity of getting the loan in the first place (without specifying any requirement for those who will ultimately default). This leads to a relaxation of our notion that we call “equal opportunity”.
Definition 2.2 (Equal opportunity).
We say that a binary predictor $\hat{Y}$ satisfies equal opportunity with respect to $A$ and $Y$ if $\Pr\{\hat{Y}=1 \mid A=0, Y=1\} = \Pr\{\hat{Y}=1 \mid A=1, Y=1\}$.
Equal opportunity is a weaker, though still interesting, notion of nondiscrimination, and thus typically allows for stronger utility as we shall see in our case study.
2.2 Real-valued scores
Even if the target is binary, a real-valued predictive score $R = f(X, A)$ is often used (e.g. FICO scores for predicting loan default), with the interpretation that higher values of $R$ correspond to greater likelihood of $Y=1$ and thus a bias toward predicting $\hat{Y}=1$. A binary classifier can be obtained by thresholding the score, i.e. setting $\hat{Y} = \mathbb{I}\{R > t\}$ for some threshold $t$. Varying this threshold changes the tradeoff between sensitivity (true positive rate) and specificity (true negative rate).
Our definition for equalized odds can be applied also to such score functions: a score $R$ satisfies equalized odds if $R$ is independent of $A$ given $Y$. If a score obeys equalized odds, then any thresholding $\hat{Y} = \mathbb{I}\{R > t\}$ of it also obeys equalized odds (as does any other predictor derived from $R$ alone).
In Section 4, we will consider scores that might not satisfy equalized odds, and see how equalized odds predictors can be derived from them and the protected attribute $A$, by using different (possibly randomized) thresholds depending on the value of $A$. The same is possible for equality of opportunity without the need for randomized thresholds.
2.3 Oblivious measures
As stated before, our notions of nondiscrimination are oblivious in the following formal sense.
Definition 2.3.
A property of a predictor $\hat{Y}$ or score $R$ is said to be oblivious if it only depends on the joint distribution of $(Y, A, \hat{Y})$ or $(Y, A, R)$, respectively.
As a consequence of being oblivious, all the information we need to verify our definitions is contained in the joint distribution of predictor, protected group and outcome, $(\hat{Y}, A, Y)$. In the binary case, when $A$ and $Y$ are reasonably well balanced, the joint distribution of $(\hat{Y}, A, Y)$ is determined by $8$ parameters that can be estimated to very high accuracy from samples. We will therefore ignore the effect of finite sample perturbations and instead assume that we know the joint distribution of $(\hat{Y}, A, Y)$.
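As an illustration of how little an oblivious test needs, the sketch below estimates the joint distribution of prediction, group and outcome in the all-binary case by counting. The function name and the sample data are hypothetical; the result is a table with one probability for each of the $2^3$ combinations.

```python
from collections import Counter
from itertools import product

def empirical_joint(y_pred, group, y_true):
    """Empirical probability of each (prediction, group, label) cell."""
    n = len(y_true)
    counts = Counter(zip(y_pred, group, y_true))
    # Include all eight cells, also those that never occur in the sample.
    return {cell: counts[cell] / n for cell in product((0, 1), repeat=3)}

# Made-up sample of five individuals.
y_pred = [1, 0, 1, 1, 0]
group = [0, 0, 1, 1, 1]
y_true = [1, 0, 0, 1, 0]

joint = empirical_joint(y_pred, group, y_true)
```

Any oblivious property, including the two definitions above, can be evaluated from this table alone, without access to the features or the form of the predictor.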
3 Comparison with related work
There is much work on this topic in the social sciences and legal scholarship; we point the reader to Barocas and Selbst [BS16] for an excellent entry point to this rich literature. See also the survey by Romei and Ruggieri [RR14], and the references at http://www.fatml.org/resources.html.
In its various equivalent notions, demographic parity appears in many papers, such as [CKP09, Zli15, BZVGRG15] to name a few. Zemel et al. [ZWS13] propose an interesting way of achieving demographic parity by aiming to learn a representation of the data that is independent of the protected attribute, while retaining as much information about the features as possible. Louizos et al. [LSL15] extend this approach with deep variational autoencoders. Feldman et al. [FFM15] propose a formalization of “limiting disparate impact”. For binary classifiers, the condition is stated in terms of the ratio of the acceptance probabilities $\Pr\{\hat{Y}=1 \mid A=0\}$ and $\Pr\{\hat{Y}=1 \mid A=1\}$ relative to the constant $0.8$. The authors argue that this corresponds to the “80% rule” in the legal literature. The notion differs from demographic parity mainly in that it compares the probabilities as a ratio rather than additively, and in that it allows a one-sided violation of the constraint.
While simple and seemingly intuitive, demographic parity has serious conceptual limitations as a fairness notion, many of which were pointed out in work of Dwork et al. [DHP12]. In our experiments, we will see that demographic parity also falls short on utility. Dwork et al. [DHP12] argue that a sound notion of fairness must be task-specific, and formalize fairness based on a hypothetical similarity measure requiring similar individuals to receive a similar distribution over outcomes. In practice, however, it can be difficult to come up with a suitable metric. Our notion is task-specific in the sense that it makes critical use of the final outcome $Y$, while avoiding the difficulty of dealing with the features $X$.
In a recent concurrent work, Kleinberg, Mullainathan and Raghavan [KMR16] showed that in general a score that is calibrated within each group does not satisfy a criterion equivalent to equalized odds for binary predictors. This result highlights that calibration alone does not imply nondiscrimination according to our measure. Conversely, achieving equalized odds may in general compromise other desirable properties of a score.
Early work of Pedreshi et al. [PRT08] and several follow-up works explore a logical rule-based approach to nondiscrimination. These approaches don’t easily relate to our statistical approach.
4 Achieving equalized odds and equality of opportunity
We now explain how to find an equalized odds or equal opportunity predictor $\tilde{Y}$ derived from a possibly discriminatory learned binary predictor $\hat{Y}$ or score $R$. We envision that $\hat{Y}$ or $R$ are whatever comes out of the existing training pipeline for the problem at hand. Importantly, we do not require changing the training process, as this might introduce additional complexity, but rather only a post-learning step. In particular, we will construct a nondiscriminating predictor which is derived from $\hat{Y}$ or $R$:
Definition 4.1 (Derived predictor).
A predictor $\tilde{Y}$ is derived from a random variable $R$ and the protected attribute $A$ if it is a possibly randomized function of the random variables $(R, A)$ alone. In particular, $\tilde{Y}$ is independent of $X$ conditional on $(R, A)$.
The definition asks that the value of a derived predictor $\tilde{Y}$ should only depend on $R$ and the protected attribute, though it may introduce additional randomness. But the formulation of $\tilde{Y}$ (that is, the function applied to the values of $R$ and $A$) depends on information about the joint distribution of $(R, A, Y)$. In other words, this joint distribution (or an empirical estimate of it) is required at training time in order to construct the predictor $\tilde{Y}$, but at prediction time we only have access to values of $(R, A)$. No further data about the underlying features $X$, nor their distribution, is required.
Loss minimization.
It is always easy to construct a trivial predictor satisfying equalized odds, by making decisions independent of $X$, $A$ and $R$. For example, using the constant predictor $\hat{Y}=0$ or $\hat{Y}=1$. The goal, of course, is to obtain a good predictor satisfying the condition. To quantify the notion of “good”, we consider a loss function $\ell\colon \{0,1\}^2 \to \mathbb{R}$ that takes a pair of labels and returns a real number $\ell(\hat{y}, y) \in \mathbb{R}$ which indicates the loss (or cost, or undesirability) of predicting $\hat{y}$ when the correct label is $y$. Our goal is then to design derived predictors $\tilde{Y}$ that minimize the expected loss $\mathbb{E}\,\ell(\tilde{Y}, Y)$ subject to one of our definitions.
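To make the objective concrete, here is a small sketch of the empirical expected loss for binary predictions. The loss table and the labels are hypothetical values chosen for illustration; the table is keyed by (prediction, correct label).

```python
def expected_loss(y_pred, y_true, loss):
    """Empirical mean of loss[(prediction, label)] over the samples."""
    return sum(loss[(p, y)] for p, y in zip(y_pred, y_true)) / len(y_true)

# Hypothetical costs: a false negative is twice as costly as a false positive.
loss = {(0, 0): 0.0, (1, 1): 0.0, (1, 0): 1.0, (0, 1): 2.0}
y_true = [1, 0, 1, 0]

# The two trivial constant predictors satisfy equalized odds,
# but they generally pay a high loss.
loss_always_0 = expected_loss([0, 0, 0, 0], y_true, loss)
loss_always_1 = expected_loss([1, 1, 1, 1], y_true, loss)
```

A useful derived predictor should achieve a lower value than both constants while still meeting the chosen fairness constraint.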
4.1 Deriving from a binary predictor
We will first develop an intuitive geometric solution in the case where we adjust a binary predictor $\hat{Y}$ and $A$ is a binary protected attribute. The proof generalizes directly to the case of a discrete protected attribute with more than two values. For convenience, we introduce the notation
$$\gamma_a(\hat{Y}) := \left(\Pr\{\hat{Y}=1 \mid A=a, Y=0\},\ \Pr\{\hat{Y}=1 \mid A=a, Y=1\}\right). \tag{4.1}$$
The first component of $\gamma_a(\hat{Y})$ is the false positive rate of $\hat{Y}$ within the demographic satisfying $A=a$. Similarly, the second component is the true positive rate of $\hat{Y}$ within $A=a$. Observe that we can calculate $\gamma_a(\hat{Y})$ given the joint distribution of $(\hat{Y}, A, Y)$. The definitions of equalized odds and equal opportunity can be expressed in terms of $\gamma(\hat{Y})$, as formalized in the following straightforward lemma:
Lemma 4.2.
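A predictor $\hat{Y}$ satisfies:

equalized odds if and only if $\gamma_0(\hat{Y}) = \gamma_1(\hat{Y})$, and

equal opportunity if and only if $\gamma_0(\hat{Y})$ and $\gamma_1(\hat{Y})$ agree in the second component, i.e., $\gamma_0(\hat{Y})_2 = \gamma_1(\hat{Y})_2$.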
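For $a \in \{0,1\}$, consider the two-dimensional convex polytope defined as the convex hull of four vertices:
$$P_a(\hat{Y}) := \mathrm{convhull}\left\{(0,0),\ \gamma_a(\hat{Y}),\ \gamma_a(1-\hat{Y}),\ (1,1)\right\}. \tag{4.2}$$
Our next lemma shows that $P_0(\hat{Y})$ and $P_1(\hat{Y})$ characterize exactly the tradeoffs between false positives and true positives that we can achieve with any derived classifier. The polytopes are visualized in Figure 1.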
Lemma 4.3.
A predictor $\tilde{Y}$ is derived if and only if for all $a \in \{0,1\}$ we have $\gamma_a(\tilde{Y}) \in P_a(\hat{Y})$.
Proof.
Since a derived predictor $\tilde{Y}$ can only depend on $(\hat{Y}, A)$ and these variables are binary, the predictor $\tilde{Y}$ is completely described by four parameters in $[0,1]$ corresponding to the probabilities $\Pr\{\tilde{Y}=1 \mid \hat{Y}=\hat{y}, A=a\}$ for $\hat{y}, a \in \{0,1\}$. Each of these parameter choices leads to one of the points in $P_a(\hat{Y})$, and every point in the convex hull can be achieved by some parameter setting. ∎
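The parameterization in the proof can be made concrete with a short sketch. Within one group, a derived predictor is specified by two probabilities: the chance of outputting 1 when the original predictor says 0, and the chance of outputting 1 when it says 1. The function below (a hypothetical helper, with made-up rates) maps these two probabilities to the resulting false and true positive rates, assuming the extra randomness is independent of the outcome. The four deterministic choices recover the four vertices of the polytope.

```python
def derived_rates(gamma_a, p_when_0, p_when_1):
    """(FPR, TPR) within one group for a derived predictor.

    gamma_a  -- (FPR, TPR) of the original binary predictor in the group
    p_when_0 -- probability of outputting 1 when the original predicts 0
    p_when_1 -- probability of outputting 1 when the original predicts 1
    """
    fpr, tpr = gamma_a
    new_fpr = p_when_1 * fpr + p_when_0 * (1 - fpr)
    new_tpr = p_when_1 * tpr + p_when_0 * (1 - tpr)
    return new_fpr, new_tpr

gamma_a = (0.2, 0.7)  # made-up rates of the original predictor

corners = {
    "always 0": derived_rates(gamma_a, 0, 0),  # the origin
    "keep": derived_rates(gamma_a, 0, 1),      # the original rates
    "flip": derived_rates(gamma_a, 1, 0),      # rates of the negated predictor
    "always 1": derived_rates(gamma_a, 1, 1),  # the point (1, 1)
}

# A fractional choice lands strictly inside the polytope.
mixed = derived_rates(gamma_a, 0.25, 0.75)
```

Because the map is affine in the two probabilities, sweeping them over the unit square traces out exactly the convex hull of the four corner points.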
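Combining Lemma 4.2 with Lemma 4.3, we see that the following optimization problem gives the optimal derived predictor with equalized odds:
$$\begin{aligned} \min_{\tilde{Y}} \quad & \mathbb{E}\,\ell(\tilde{Y}, Y) \\ \text{s.t.} \quad & \forall a \in \{0,1\}\colon\ \gamma_a(\tilde{Y}) \in P_a(\hat{Y}) && \text{(derived)} \\ & \gamma_0(\tilde{Y}) = \gamma_1(\tilde{Y}) && \text{(equalized odds)} \end{aligned} \tag{4.3}$$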
Figure 1 gives a simple geometric picture for the solution of the linear program whose guarantees are summarized next.
Proposition 4.4.
The optimization problem (4.3) is a linear program in four variables whose coefficients can be computed from the joint distribution of $(\hat{Y}, A, Y)$. Moreover, its solution is an optimal equalized odds predictor derived from $\hat{Y}$ and $A$.
Proof of Proposition 4.4.
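The second claim follows by combining Lemma 4.2 with Lemma 4.3. To argue the first claim, we saw in the proof of Lemma 4.3 that a derived predictor is specified by four parameters and the constraint region is an intersection of two-dimensional linear constraints. It remains to show that the objective function is a linear function in these parameters. Writing out the objective, we have
$$\mathbb{E}\,\ell(\tilde{Y}, Y) = \sum_{y, y' \in \{0,1\}} \ell(y', y)\, \Pr\{\tilde{Y}=y', Y=y\}.$$
Further, since a derived predictor $\tilde{Y}$ depends on $Y$ only through $(\hat{Y}, A)$,
$$\Pr\{\tilde{Y}=y', Y=y\} = \sum_{\hat{y}, a \in \{0,1\}} \Pr\{\tilde{Y}=y' \mid \hat{Y}=\hat{y}, A=a\}\, \Pr\{\hat{Y}=\hat{y}, A=a, Y=y\}.$$
All probabilities in the last line that do not involve $\tilde{Y}$ can be computed from the joint distribution. The probabilities that do involve $\tilde{Y}$ are each a linear function of the parameters that specify $\tilde{Y}$. ∎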
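Since the program has only four bounded variables and two equality constraints, it can be solved exactly by enumerating vertices. The sketch below does this in plain Python; it is an illustrative solver, not the one used by the authors, and a general-purpose LP solver would be the usual choice. The function name, the input format and the example rates are all assumptions. It also assumes that every (group, label) combination has positive probability and that the two equality constraints are linearly independent.

```python
from itertools import combinations, product

def equalized_odds_lp(q, loss, eps=1e-9):
    """Minimize expected loss over derived predictors with equalized odds.

    q[(yhat, a, y)] -- joint probabilities of (prediction, group, label)
    loss[(pred, y)] -- cost of predicting `pred` when the label is `y`
    Returns (p, value), p[(a, yhat)] = Pr{new prediction = 1 | yhat, a}.
    """
    keys = [(a, yh) for a in (0, 1) for yh in (0, 1)]
    # Objective: value = const + sum_k cost[k] * p[k].
    const = sum(q[(yh, a, y)] * loss[(0, y)] for yh, a, y in q)
    cost = [sum(q[(yh, a, y)] * (loss[(1, y)] - loss[(0, y)]) for y in (0, 1))
            for a, yh in keys]
    # One equality per label y: positive rate in group 0 minus group 1 is 0.
    rows = []
    for y in (0, 1):
        row = []
        for a, yh in keys:
            mass = q[(0, a, y)] + q[(1, a, y)]  # Pr{A=a, Y=y}, assumed > 0
            row.append((1.0 if a == 0 else -1.0) * q[(yh, a, y)] / mass)
        rows.append(row)

    def value(p):
        return const + sum(c * x for c, x in zip(cost, p))

    # The constant predictors are always feasible; start from them.
    candidates = [[0.0] * 4, [1.0] * 4]
    # Vertices: two variables sit at a bound, the other two solve the equalities.
    for i, j in combinations(range(4), 2):
        fixed = [k for k in range(4) if k not in (i, j)]
        det = rows[0][i] * rows[1][j] - rows[0][j] * rows[1][i]
        if abs(det) < eps:
            continue
        for bounds in product((0.0, 1.0), repeat=2):
            rhs = [-sum(rows[r][k] * b for k, b in zip(fixed, bounds))
                   for r in (0, 1)]
            xi = (rhs[0] * rows[1][j] - rows[0][j] * rhs[1]) / det
            xj = (rows[0][i] * rhs[1] - rhs[0] * rows[1][i]) / det
            if -eps <= xi <= 1 + eps and -eps <= xj <= 1 + eps:
                p = [0.0] * 4
                p[i], p[j] = min(1.0, max(0.0, xi)), min(1.0, max(0.0, xj))
                for k, b in zip(fixed, bounds):
                    p[k] = b
                candidates.append(p)
    best = min(candidates, key=value)
    return dict(zip(keys, best)), value(best)

# Made-up example: four equally likely (group, label) cells, with the
# original predictor more accurate in group 0 than in group 1.
rates = {0: (0.2, 0.8), 1: (0.4, 0.6)}  # (FPR, TPR) per group
q = {}
for a, (fpr, tpr) in rates.items():
    for y, r in ((0, fpr), (1, tpr)):
        q[(1, a, y)] = 0.25 * r
        q[(0, a, y)] = 0.25 * (1 - r)
loss = {(0, 0): 0.0, (1, 1): 0.0, (1, 0): 1.0, (0, 1): 1.0}
p, value = equalized_odds_lp(q, loss)
```

In this example the polytope of the less accurate group lies inside that of the more accurate group, so the solution keeps the original predictor in group 1 and randomizes in group 0 until its rates match.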
4.2 Deriving from a score function
We now consider deriving nondiscriminating predictors from a real-valued score $R \in [0,1]$. The motivation is that in many realistic scenarios (such as FICO scores), the data are summarized by a one-dimensional score function and a decision is made based on the score, typically by thresholding it. Since a continuous statistic can carry more information than a binary outcome $Y$, we can hope to achieve higher utility when working with $R$ directly, rather than with a binary predictor $\hat{Y}$.
A “protected attribute blind” way of deriving a binary predictor from $R$ would be to threshold it, i.e. using $\hat{Y} = \mathbb{I}\{R > t\}$. If $R$ satisfied equalized odds, then so would such a predictor, and the optimal threshold should be chosen to balance false positives and false negatives so as to minimize the expected loss. When $R$ does not already satisfy equalized odds, we might need to use different thresholds for different values of $A$ (different protected groups), i.e. $\tilde{Y} = \mathbb{I}\{R > t_A\}$. As we will see, even this might not be sufficient, and we might need to introduce additional randomness as in the preceding section.
Central to our study is the ROC (Receiver Operating Characteristic) curve of the score, which captures the false positive and true positive (equivalently, false negative) rates at different thresholds. These are curves in a two-dimensional plane, where the horizontal axis is the false positive rate of a predictor and the vertical axis is the true positive rate. As discussed in the previous section, equalized odds can be stated as requiring that the false positive and true positive rates, $(\Pr\{\hat{Y}=1 \mid Y=0\}, \Pr\{\hat{Y}=1 \mid Y=1\})$, agree between different values $a$ of the protected attribute. That is, for all values of the protected attribute, the conditional behavior of the predictor is at exactly the same point in this space. We will therefore consider the $A$-conditional ROC curves
$$C_a(t) := \left(\Pr\{R > t \mid A=a, Y=0\},\ \Pr\{R > t \mid A=a, Y=1\}\right).$$
Since the ROC curves exactly specify the conditional distributions $R \mid A, Y$, a score function obeys equalized odds if and only if the ROC curves for all values of the protected attribute agree, that is, $C_a(t) = C_{a'}(t)$ for all values of $a$, $a'$ and $t$. In this case, any thresholding of $R$ yields an equalized odds predictor (all protected groups are at the same point on the curve, and the same point in the false/true-positive plane).
When the ROC curves do not agree, we might choose different thresholds $t_a$ for the different protected groups. This yields different points on each $A$-conditional ROC curve. For the resulting predictor to satisfy equalized odds, these must be at the same point in the false/true-positive plane. This is possible only at points where all $A$-conditional ROC curves intersect. But the ROC curves might not all intersect except at the trivial endpoints, and even if they do, their point of intersection might represent a poor tradeoff between false positives and false negatives.
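The group-conditional ROC curves can be estimated from labeled scores. The following sketch, with a hypothetical function name and made-up data, evaluates the rule "score greater than threshold" on a grid of thresholds within one group and returns the resulting (false positive rate, true positive rate) points.

```python
def conditional_roc(scores, y_true, group, a, thresholds):
    """Points (FPR, TPR) of the rule `score > t` within group a."""
    neg = [s for s, y, g in zip(scores, y_true, group) if g == a and y == 0]
    pos = [s for s, y, g in zip(scores, y_true, group) if g == a and y == 1]
    curve = []
    for t in thresholds:
        fpr = sum(s > t for s in neg) / len(neg)  # assumes group a has negatives
        tpr = sum(s > t for s in pos) / len(pos)  # assumes group a has positives
        curve.append((fpr, tpr))
    return curve

# Made-up scores: the score separates the classes well in group 0
# and poorly in group 1.
scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.4, 0.6, 0.1]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
thresholds = [0.0, 0.5, 1.0]

curve_0 = conditional_roc(scores, y_true, group, 0, thresholds)
curve_1 = conditional_roc(scores, y_true, group, 1, thresholds)
# The score satisfies equalized odds on this grid only if the curves coincide.
curves_agree = curve_0 == curve_1
```

Here the two curves share only the trivial endpoints, so a single common threshold does not yield an equalized odds predictor.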
As with the case of correcting a binary predictor, we can use randomization to fill the span of possible derived predictors and allow for significant intersection in the false/true-positive plane. In particular, for every protected group $a$, consider the convex hull of the image of the conditional ROC curve:
$$D_a := \mathrm{convhull}\left\{C_a(t) \colon t \in [0,1]\right\}. \tag{4.4}$$
The definition of $D_a$ is analogous to the polytope $P_a$ in the previous section, except that here we do not consider points below the main diagonal (the line from $(0,0)$ to $(1,1)$), which are worse than “random guessing” and hence never desirable for any reasonable loss function.
Deriving an optimal equalized odds threshold predictor.
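Any point in the convex hull $D_a$ represents the false/true positive rates, conditioned on $A=a$, of a randomized derived predictor based on $R$. In particular, since the space is only two-dimensional, such a predictor $\tilde{Y}$ can always be taken to be a mixture of two threshold predictors (corresponding to the convex hull of two points on the ROC curve). Conditional on $A=a$, the predictor $\tilde{Y}$ behaves as
$$\tilde{Y} = \mathbb{I}\{R > T_a\},$$
where $T_a$ is a randomized threshold assuming the value $\underline{t}_a$ with probability $\underline{p}_a$ and the value $\overline{t}_a$ with probability $\overline{p}_a$. In other words, to construct an equalized odds predictor, we should choose a point in the intersection of these convex hulls, $\gamma = (\gamma_0, \gamma_1) \in \bigcap_a D_a$, and then for each protected group realize the true/false-positive rates $\gamma$ with a (possibly randomized) predictor $\tilde{Y} \mid (A=a) = \mathbb{I}\{R > T_a\}$, resulting in the predictor $\tilde{Y} = \mathbb{I}\{R > T_A\}$. For each group $a$, we either use a fixed threshold $T_a = t_a$ or a mixture of two thresholds $\underline{t}_a < \overline{t}_a$. In the latter case, if $A=a$ and $R < \underline{t}_a$ we always set $\tilde{Y}=0$, if $R > \overline{t}_a$ we always set $\tilde{Y}=1$, but if $\underline{t}_a < R < \overline{t}_a$, we flip a coin and set $\tilde{Y}=1$ with probability $\underline{p}_a$.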
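The randomized threshold rule is short to write down in code. This is a minimal sketch under the assumption that the per-group thresholds and mixing probability have already been chosen; the parameter values below are made up, and the function name and the `params` layout are hypothetical.

```python
import random

def randomized_threshold_predict(r, a, params, rng=random):
    """Predict with a per-group randomized threshold.

    params[a] = (t_low, t_high, p_low): the threshold equals t_low with
    probability p_low and t_high otherwise, where t_low <= t_high.
    """
    t_low, t_high, p_low = params[a]
    threshold = t_low if rng.random() < p_low else t_high
    return int(r > threshold)

# Made-up parameters: group 0 mixes two thresholds, group 1 uses a fixed one
# (both thresholds equal, so the coin flip has no effect).
params = {0: (0.4, 0.6, 0.3), 1: (0.5, 0.5, 1.0)}

rng = random.Random(0)
below = randomized_threshold_predict(0.2, 0, params, rng)  # score under both thresholds
above = randomized_threshold_predict(0.9, 0, params, rng)  # score over both thresholds
# Between the two thresholds the outcome is a coin flip.
draws = [randomized_threshold_predict(0.5, 0, params, rng) for _ in range(10000)]
frequency = sum(draws) / len(draws)
```

A score between the two thresholds is accepted exactly when the lower threshold is drawn, so the acceptance frequency in that band approaches the mixing probability of the lower threshold.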
The feasible set of false/true positive rates of possible equalized odds predictors is thus the intersection of the areas under the $A$-conditional ROC curves, and above the main diagonal (see Figure 2). Since for any loss function the optimal false/true-positive rate will always be on the upper-left boundary of this feasible set, this is effectively the ROC curve of the equalized odds predictors. This ROC curve is the pointwise minimum of all $A$-conditional ROC curves. The performance of an equalized odds predictor is thus determined by the minimum performance among all protected groups. Said differently, requiring equalized odds incentivizes the learner to build good predictors for all classes. For a given loss function, finding the optimal tradeoff amounts to optimizing the expected loss (assuming w.l.o.g. $\ell(0,0) = \ell(1,1) = 0$):
$$\min_{\gamma \in \bigcap_a D_a} \ \Pr\{Y=0\}\,\gamma_0\,\ell(1,0) + \Pr\{Y=1\}\,(1-\gamma_1)\,\ell(0,1). \tag{4.5}$$
This is no longer a linear program, since the $D_a$ are not polytopes, or at least are not specified as such. Nevertheless, (4.5) can be efficiently optimized numerically using ternary search.
Deriving an optimal equal opportunity threshold predictor.
The construction follows the same approach except that there is one fewer constraint. We only need to find points on the conditional ROC curves that have the same true positive rates in both groups. Assuming continuity of the conditional ROC curves, this means we can always find points on the boundary of the conditional ROC curves. In this case, no randomization is necessary. The optimal solution corresponds to two deterministic thresholds, one for each group. As before, the optimization problem can be solved efficiently using ternary search over the target true positive value. Here we use, as Figure 2 illustrates, that the cost of the best solution is convex as a function of its true positive rate.
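The ternary search step can be sketched as follows. The search routine itself is standard; the cost function is a stand-in built from hypothetical ROC curves of the form TPR = FPR^(1/k), with made-up group and label probabilities, so that the false positive rate needed to reach a target true positive rate in each group has a closed form.

```python
def ternary_search_min(f, lo, hi, iters=100):
    """Approximate minimizer of a convex function f on the interval [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2  # for convex f the minimizer cannot lie to the right of m2
        else:
            lo = m1  # otherwise it cannot lie to the left of m1
    return (lo + hi) / 2

# Hypothetical setup: group a has ROC curve tpr = fpr ** (1 / k[a]), so
# reaching a true positive rate `tpr` costs a false positive rate tpr ** k[a].
k = {0: 2, 1: 3}
prob_negative = {0: 0.25, 1: 0.25}  # made-up Pr{A=a, Y=0}
prob_positive = 0.5                 # made-up Pr{Y=1}
cost_fp, cost_fn = 1.0, 1.0         # made-up losses for the two error types

def cost(tpr):
    """Expected loss when both groups are held at the same true positive rate."""
    false_positives = sum(prob_negative[a] * tpr ** k[a] for a in k)
    false_negatives = prob_positive * (1 - tpr)
    return cost_fp * false_positives + cost_fn * false_negatives

best_tpr = ternary_search_min(cost, 0.0, 1.0)
# Each group then uses the threshold that attains best_tpr on its own curve.
fpr_by_group = {a: best_tpr ** k[a] for a in k}
```

With empirical ROC curves, the closed-form inverse would be replaced by a lookup of the threshold in each group that attains the target true positive rate.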
5 Bayes optimal predictors
In this section, we develop a theory for nondiscriminating Bayes optimal classification. We will first show that a Bayes optimal equalized odds predictor can be obtained as a derived threshold predictor of the Bayes optimal regressor. Second, we quantify the loss of deriving an equalized odds predictor based on a regressor that deviates from the Bayes optimal regressor. This can be used to justify the approach of first training classifiers without any fairness constraint, and then deriving an equalized odds predictor in a second step.
Definition 5.1 (Bayes optimal regressor).
Given random variables $(X, A)$ and a target variable $Y$, the Bayes optimal regressor is $R = \arg\min_{r(x,a)} \mathbb{E}\left[(Y - r(X,A))^2\right] = r^*(X, A)$ with $r^*(x,a) = \mathbb{E}[Y \mid X=x, A=a]$.
The Bayes optimal classifier, for any proper loss, is then a threshold predictor of $R$, where the threshold depends on the loss function (see, e.g., [Was10]). We will extend this result to the case where we additionally ask the classifier to satisfy an oblivious property, such as our nondiscrimination properties.
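For a binary target, the dependence of the threshold on the loss can be worked out by comparing the conditional expected loss of the two possible predictions. The sketch below assumes a loss table keyed by (prediction, correct label) in which each error costs strictly more than the corresponding correct decision; the function names and the example losses are hypothetical.

```python
def bayes_threshold(loss):
    """Threshold on Pr{Y=1 | x, a} above which predicting 1 has lower expected loss."""
    fp_cost = loss[(1, 0)] - loss[(0, 0)]  # extra cost of a false positive
    fn_cost = loss[(0, 1)] - loss[(1, 1)]  # extra cost of a false negative
    return fp_cost / (fp_cost + fn_cost)

def bayes_classify(r, loss):
    """Threshold predictor applied to a regressor value r = Pr{Y=1 | x, a}."""
    return int(r > bayes_threshold(loss))

def conditional_loss(pred, r, loss):
    """Expected loss of predicting `pred` when Pr{Y=1 | x, a} = r."""
    return r * loss[(pred, 1)] + (1 - r) * loss[(pred, 0)]

zero_one = {(0, 0): 0.0, (1, 1): 0.0, (1, 0): 1.0, (0, 1): 1.0}
costly_fp = {(0, 0): 0.0, (1, 1): 0.0, (1, 0): 3.0, (0, 1): 1.0}

t_zero_one = bayes_threshold(zero_one)
t_costly_fp = bayes_threshold(costly_fp)
```

Making false positives more expensive raises the threshold, so a higher predicted probability is required before a positive decision is made.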
Proposition 5.2.
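For any source distribution over $(Y, X, A)$ with Bayes optimal regressor $R(X, A)$, any loss function, and any oblivious property $C$, there exists a predictor $Y^*(R, A)$ such that:

$Y^*$ is an optimal predictor satisfying $C$. That is, $\mathbb{E}\,\ell(Y^*, Y) \le \mathbb{E}\,\ell(\hat{Y}, Y)$ for any predictor $\hat{Y}(X, A)$ which satisfies $C$.

$Y^*$ is derived from $(R, A)$.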