Algorithmic Accountability:

A Legal and Economic Framework

Robert P. Bartlett, III, Adair Morse, Nancy Wallace, Richard Stanton

Abstract

Despite the potential for machine learning and artificial intelligence to reduce face-to-face bias in decision-making, a growing chorus of scholars and policymakers have recently voiced concerns that if left unchecked, algorithmic decision-making can also lead to unintentional discrimination against members of historically marginalized groups. These concerns are being expressed through Congressional subpoenas, regulatory investigations, and an increasing number of algorithmic accountability bills pending in both state legislatures and Congress. To date, however, prominent efforts to define policies whereby an algorithm can be considered accountable have tended to focus on output-oriented policies and interventions that either may facilitate illegitimate discrimination or involve fairness corrections unlikely to be legally valid.

We provide a workable definition of algorithmic accountability that is rooted in the caselaw addressing statistical discrimination in the context of Title VII of the Civil Rights Act of 1964. Using instruction from the burden-shifting framework, codified to implement Title VII, we formulate a simple statistical test to apply to the design and review of the inputs used in any algorithmic decision-making processes. Application of the test, which we label the input accountability test, constitutes a legally viable, deployable tool that can prevent an algorithmic model from systematically penalizing members of protected groups who are otherwise qualified in a target characteristic of interest.

Table of Contents

  • I. INTRODUCTION

  • II. Accountability Under U.S. Antidiscrimination Law

    • A. Accountability and the Burden-Shifting Framework of Title VII
    • B. The Input Accountability Test (IAT) Versus Outcome-Oriented Approaches
    • C. The Input Accountability Test Versus the ‘Least Discriminatory Alternative’ Test
    • D. The Input Accountability Test Versus HUD’s Mere Predictive Test

  • III. The Input Accountability Test

    • A. The Test
    • B. The Test in Regression Form
    • C. Challenges in Implementing the Test
      • i. Unobservability of the Target Variable
      • ii. Measurement Error in the Target
      • iii. Testing for ‘Not Statistically Correlated’
      • iv. Nonlinearities or Interactions Among Proxies
    • D. Simulation
      • i. Set Up
      • ii. Applying the Input Accountability Test

  • IV. Applications beyond Employment

    • A. Domains with Court-Defined Business Necessity Targets
    • B. Domains Without Court-Defined Business Necessity Targets
    • C. Self-Determining Business Necessity

  • V. Conclusion

  • Appendix

I. INTRODUCTION

In August 2019, Apple Inc. debuted its much-anticipated Apple Card, a no fee, cash-rewards credit card “designed to help customers lead a healthier financial life.”3 Within weeks of its release, Twitter was abuzz with headlines that the card’s credit approval algorithm was systematically biased against women.4 Even Apple co-founder Steve Wozniak weighed in, tweeting that the card gave him a credit limit that was ten times higher than what it gave his wife, despite the couple sharing all their assets.5 In the days that followed, Goldman Sachs—Apple’s partner in designing the Apple Card—steadfastly defended the algorithm, insisting that “we have not and will not make decisions based on factors on gender.”6 Yet doubts persisted. By November, the New York State Department of Financial Services had announced an investigation into the card’s credit approval practices.7

Around that same time, buzz spread across the media about another algorithm, that of health insurer UnitedHealth.8 The algorithm was used to inform hospitals about patients’ level of sickness so that hospitals could more effectively allocate resources to the sickest patients. However, an article appearing in Science showed that because the company used cost of care as the metric for gauging sickness and because African-American patients historically incurred lower costs for the same illnesses and level of illness, the algorithm caused them to receive substandard care as compared to white patients.9

Despite the potential for algorithmic decision-making to eliminate face-to-face biases, these episodes provide vivid illustrations of the widespread concern that algorithms may nevertheless engage in objectionable discrimination.10 Indeed, a host of regulatory reforms have emerged to contend with this challenge. For example, New York City has enacted an algorithm accountability law, which creates a task force to recommend procedures for determining whether automated decisions by city agencies disproportionately impact protected groups.11 Likewise, the Washington State House of Representatives introduced an algorithm accountability bill, which would require the state’s chief information officer to assess whether any automated decision system used by a state agency “has a known bias, or is untested for bias.”12 Federally, the Algorithmic Accountability Act of 2019, which is currently pending in Congress, would require large companies to audit their algorithms for “risks that [they] may result in or contribute to inaccurate, unfair, biased, or discriminatory decisions impacting consumers.”13

Yet, a notable absence in these legislative efforts is a formal standard for courts or regulators to deploy in evaluating algorithmic decision-making, raising the fundamental question: What exactly does it mean for an algorithm to be accountable? The urgency of this question follows from the meteoric growth in algorithmic decision-making, spawned by the availability of unprecedented data on individuals and the accompanying rise in techniques in machine learning and artificial intelligence.14

In this Article, we provide an answer to the pressing question of what accountability is, and we put forward a workable test that regulators, courts, and data scientists can apply in examining whether an algorithmic decision-making process complies with long-standing antidiscrimination statutes and caselaw. Central to our framework is the recognition that, despite the novelty of artificial intelligence and machine learning, existing U.S. antidiscrimination law has long provided a workable definition of accountability dating back to Title VII of the Civil Rights Act of 1964.15

Title VII and the caselaw interpreting it define what it means for any decision-making process, whether human or machine, to be accountable under U.S. antidiscrimination law. At the core of this caselaw is the burden-shifting framework initially articulated by the Supreme Court in Griggs v. Duke Power Co.16 Under this framework, plaintiffs putting forth a claim of unintentional discrimination under Title VII must demonstrate that a particular decision-making practice (e.g., a hiring practice) lands disparately on members of a protected group.17 If successful, the framework then demands that the burden shift to the defendant to show that the practice is “consistent with business necessity.”18 If the defendant satisfies this requirement, the burden returns to the plaintiff to show that an equally valid and less discriminatory practice was available that the employer refused to use.19 The focus of Title VII is on discrimination in the workplace, but the analytical framework that emerged from the Title VII context now spans other domains and applies directly to the type of unintentional, statistical discrimination utilized in algorithmic decision-making.20

Despite the long tradition of applying this framework to cases of statistical discrimination, it is commonly violated in the context of evaluating the discriminatory impact of algorithmic decision-making. Instead, for many, the legality of any unintentional discrimination resulting from an algorithmic model is presumed to depend simply on the accuracy of the model, that is, the ability of the model to predict a characteristic of interest (e.g., productivity or credit risk) generally referred to as the model’s “target.”21 An especially prominent example of this approach appears in the Department of Housing and Urban Development’s 2019 proposed rule revising the application of the disparate impact framework under the Fair Housing Act (FHA) for algorithmic credit scoring.22 The proposed rule provides that, after a lender shows that the proxy variables used in an algorithm do not substitute for membership in a protected group, the lender may defeat a discrimination claim by showing that the model is “predictive of risk or other valid objective.”23 Yet this focus on predictive accuracy ignores how courts have applied the Griggs framework in the context of statistical discrimination.

To see why, consider the facts of the Supreme Court’s 1977 decision in Dothard v. Rawlinson.24 There, a prison system desired to hire job applicants who possessed a minimum level of strength to perform the job of a prison guard, but the prison could not directly observe which applicants satisfied this requirement.25 Consequently, the prison imposed a minimum height and weight requirement on the assumption that these observable characteristics were correlated with the requisite strength required for the job.26 In so doing, the prison was thus engaging in statistical discrimination: It was basing its hiring decision on the statistical correlation between observable proxies (an applicant’s height and weight) and the unobservable variable of business necessity (an applicant’s job-required strength).

Because this procedure resulted in adverse hiring outcomes for female applicants, a class of female applicants brought suit under Title VII for gender discrimination.27 Deploying the burden-shifting framework, the Supreme Court first concluded that the plaintiffs satisfied the disparate outcome step,28 and it also concluded that the prison had effectively argued that hiring applicants with the requisite strength could constitute a business necessity.29 However, the Court ultimately held that the practice used to discern strength—relying on the proxy variables of height and weight—did not meet the “consistent with business necessity” criteria.30 Rather, absent evidence showing the precise relationship between the height and weight requirements to “the requisite amount of strength thought essential to good job performance,”31 height and weight were noisy estimates of strength that risked penalizing females over-and-above these variables’ relation to the prison’s business necessity goal. In other words, height and weight were likely to be biased estimates of required strength whose use by the prison risked systematically penalizing female applicants who were, in fact, qualified.

The Court thus illustrated that in considering a case of statistical discrimination, the “consistent with business necessity” step requires the assessment of two distinct questions. First, is the unobservable “target” characteristic (e.g., requisite strength) one that can justify disparities in hiring outcomes across members of protected and unprotected groups? Second, even with a legitimate target variable, are the proxy “input” variables used to predict the target noisy estimates that are biased in a fashion that will systematically penalize members of a protected group who are otherwise qualified? In this regard, the Court’s holding echoes the long-standing prohibition against redlining in credit markets. A lender who engages in redlining refuses to lend to residents of a majority-minority neighborhood on the assumption that the average unobservable credit risk of its residents is higher than that of residents of observably similar but non-minority neighborhoods.32 Yet while differences in creditworthiness can be a legitimate basis for racial or ethnic disparities to exist in lending under the FHA,33 courts have consistently held that the mere fact that one’s neighborhood is correlated with predicted credit risk is insufficient to justify redlining.34 By assuming that all residents of minority neighborhoods have low creditworthiness, redlining systematically penalizes minority borrowers who actually have high creditworthiness.

These two insights from Dothard—that statistical discrimination must be grounded in the search for a legitimate target variable and that the input proxy variables for the target cannot systematically discriminate against members of a protected group who are qualified in the target—remain as relevant in today’s world of algorithmic decision-making as they were in 1977. The primary task for courts, regulators, and data scientists is to adhere to them in the use of big data implementations of algorithmic decisions (e.g., in employment, performance assessment, credit, sentencing, insurance, medical treatment, college admissions, advertising, etc.).

Fortunately, the caselaw implementing the Title VII burden-shifting framework, viewed through basic principles of statistics, provides a way forward. This is our central contribution: We recast the logic that informs Dothard and courts’ attitude towards redlining into a formal statistical test that can be widely deployed in the context of algorithmic decision-making. We label it the Input Accountability Test (IAT).

As we show, the IAT provides a simple and direct diagnostic that a data scientist or regulator can apply to determine whether an algorithm is accountable under U.S. antidiscrimination principles. For instance, a statistician seeking to deploy the IAT could do so by turning to the same training data that she used to calibrate the predictive model of a target. In settings such as employment or lending where courts have explicitly articulated a legitimate business target (e.g., a job required skill or creditworthiness),35 the first step would be determining that the target is, in fact, a business necessity variable. Second, taking a proxy variable (e.g., height) that her predictive model utilizes, she would next decompose the proxy’s variation across individuals into that which correlates with the target variable and an error component. Finally, she would test whether that error component remains correlated with the protected category (e.g., gender). If a proxy used to predict a legitimate target variable is unbiased with respect to a protected group, it will pass the IAT, even if the use of the proxy disparately impacts members of protected groups. In this fashion, the test provides a concrete method to harness the benefits of statistical discrimination with regard to predictive accuracy while avoiding the risk that it systematically penalizes members of a protected group who are, in fact, qualified in the target characteristic of interest.
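The three steps just described can be sketched in a few lines of code. The example below is only an illustrative sketch under stated assumptions, not the formal specification of the test: the data are simulated, every coefficient is invented, the helper names (simple_ols, input_accountability_check) are hypothetical, and the conventional 1.96 cutoff on a t-statistic stands in for whatever standard of "not statistically correlated" a court or regulator would actually adopt. It regresses a proxy on the target, keeps the error component, and then checks whether that error is still related to the protected category.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = a + b*x by least squares.

    Returns (intercept, slope, residuals, t-statistic of the slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    t_stat = slope / math.sqrt(sigma2 / sxx)
    return intercept, slope, residuals, t_stat


def input_accountability_check(proxy, target, protected, t_crit=1.96):
    # Step 1: split the proxy into the part explained by the target and an error.
    _, _, error, _ = simple_ols(target, proxy)
    # Step 2: ask whether that error is still correlated with the protected category.
    _, coef, _, t_stat = simple_ols(protected, error)
    return {"coef": coef, "t_stat": t_stat, "passes": abs(t_stat) < t_crit}


# Hypothetical training data: every number below is invented for illustration.
random.seed(0)
n = 800
female = [1.0 if random.random() < 0.5 else 0.0 for _ in range(n)]
# The target (strength) is allowed to differ across groups.
strength = [50 - 8 * f + random.gauss(0, 10) for f in female]
# Height tracks strength but also carries a gender gap unrelated to strength.
height = [150 + 0.4 * s - 10 * f + random.gauss(0, 5)
          for s, f in zip(strength, female)]
# A direct lift test measures strength with noise that ignores gender.
lift_score = [s + random.gauss(0, 5) for s in strength]

height_result = input_accountability_check(height, strength, female)
lift_result = input_accountability_check(lift_score, strength, female)
print("height:", height_result)
print("lift score:", lift_result)
```

In this simulated setting, height fails the check because its error component remains strongly related to gender, whereas a proxy like the lift score, whose noise is unrelated to gender by construction, would be expected to pass in most random draws even though strength itself differs across groups.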

We provide an illustration of the IAT in the Dothard setting, not only to provide a clear depiction of the power of the test, but also to introduce several challenges in implementing it and suggested solutions. These challenges include multiple incarnations of measurement error in the target, as exemplified by the UnitedHealth use of cost as a target, rather than the degree of illness, mentioned previously. These challenges also include understanding what “significantly correlated” means in our era of massive datasets. We offer an approach that may serve as a way forward. Beyond the illustration, we also provide a simulation of the test using a randomly constructed training dataset of 800 prison employees.

Finally, we illustrate how the IAT can be deployed by courts, regulators, and data scientists. In addition to employment, we list a number of other sectors (including credit, parole determination, home insurance, school and scholarship selection, and tenant selection) where either caselaw or statutes have provided explicit instructions regarding what can constitute a legitimate business necessity target.36 We also discuss other domains such as automobile insurance and health care where claims of algorithmic discrimination have recently surfaced, but where existing discrimination laws are less clear whether liability can arise for unintentional discrimination. Businesses in these domains are thus left to self-regulate and have generally professed to adhere to non-discriminatory business necessity targets.37 For firms with an express target delineation (whether court-formalized or self-imposed), our IAT provides a tool to pre-test their models.

We highlight, however, that firm profit margins and legitimate business necessity targets can easily be confounded in the design of machine learning algorithms, especially in the form of exploiting consumer demand elasticities (e.g., profiling consumer shopping behavior).38 In lending, for instance, courts have repeatedly held that creditworthiness is the sole business necessity target that can justify outcomes that differ across protected and unprotected groups.39 Yet, newly-advanced machine learning techniques make it possible to use alternative targets, such as a borrower’s proclivity for comparing loan products, that focus on a lender’s profit margins in addition to credit risk. In other work, we provide empirical evidence consistent with FinTech algorithms’ engaging in such profiling, with the result that minority borrowers face higher priced loans, holding constant the price impact of borrowers’ credit risk.40 As such, these findings illustrate how the incentive of firms to use shopping behavior as a target can lead to discrimination in lending—a practice that could be detected by application of the IAT.41 Profiling for shopping behavior is a subject applicable to many settings beyond the lending context and a leading topic for future research and discourse.

Our approach differs from other approaches to “algorithmic fairness” that focus solely on ensuring fair outcomes across protected and unprotected groups.42 As we show, by failing to distinguish disparities that arise from a biased proxy from those disparities that arise from the distribution of a legitimate target variable, these approaches can themselves run afoul of U.S. antidiscrimination law. In particular, following the Supreme Court’s 2009 decision in Ricci v. DeStefano,43 efforts to calibrate a decision-making process to equalize outcomes across members of protected and unprotected groups (regardless of whether individuals are qualified in a legitimate target of interest) are likely to be deemed impermissible intentional discrimination.44

This Article proceeds as follows. In Part 2, we begin by articulating a definition for algorithmic accountability that is at the core of our input accountability test. As we demonstrate there, our definition of algorithmic accountability is effectively a test for “unbiasedness,” which differs from various proposals for “algorithmic fairness” that are commonly found in the statistics and computer science literatures. Building on this definition of algorithmic accountability, Part 3 formally presents the IAT. The test is designed to provide a workable tool for data scientists and regulators to use to distinguish between legitimate and illegitimate discrimination. The test is directly responsive to the recent regulatory and legislative interest in understanding algorithmic accountability, while being consistent with longstanding U.S. antidiscrimination principles. Part 4 follows by exploring how the IAT can likewise be applied in other settings where algorithmic decision-making has come to play an increasingly important role. Part 5 concludes.

II. Accountability Under U.S. Antidiscrimination Law

A. Accountability and the Burden-Shifting Framework of Title VII

We ground our definition of accountability in the antidiscrimination principles of Title VII of the Civil Rights Act of 1964.45 Title VII, which focuses on the labor market, makes it “an unlawful employment practice for an employer (1) to ... discriminate against any individual with respect to his compensation, terms, conditions, or privileges of employment, because of such individual’s race, color, religion, sex, or national origin; or (2) to limit, segregate, or classify his employees or applicants for employment in any way which would deprive or tend to deprive any individual of employment opportunities ... because of such individual’s race, color, religion, sex, or national origin.”46 Similar conceptualizations of antidiscrimination law were later written to apply to other settings, such as the prohibition of discrimination in mortgage lending under the FHA.47

In practice, Title VII has been interpreted as covering two forms of impermissible discrimination. The first and “the most easily understood type of discrimination”48 falls under the disparate-treatment theory of discrimination and requires that a plaintiff alleging discrimination prove “that an employer had a discriminatory motive for taking a job-related action.”49 Additionally, Title VII also covers practices which “in some cases, ... are not intended to discriminate but in fact have a disproportionately adverse effect on minorities.”50 These cases are usually brought forth under the disparate-impact theory of discrimination and allow for an employer to be liable for “facially neutral practices that, in fact, are ‘discriminatory in operation,’” even if unintentional.51

Critically, in cases where discrimination lacks an intentional motive, an employer can be liable only for disparate outcomes that are unjustified. The process of how disparities across members of protected and unprotected groups might be justified is articulated in the burden-shifting framework initially formulated by the Supreme Court in Griggs v. Duke Power Co.52 and subsequently codified by Congress in 1991.53 This delineation is central to the definition of accountability in today’s era of algorithms.

Under the burden-shifting framework, a plaintiff alleging discrimination under a claim without intentional motive bears the first burden. The plaintiff must identify a specific employment practice that causes “observed statistical disparities”54 across members of protected and unprotected groups.55 If the plaintiff succeeds in establishing this evidence, the burden shifts to the defendant.56 The defendant must then “demonstrate that the challenged practice is job related for the position in question and consistent with business necessity.”57 If the defendant satisfies this requirement, then “the burden shifts back to the plaintiff to show that an equally valid and less discriminatory practice was available that the employer refused to use.”58

This overview highlights two core ideas that inform what it means for a decision-making process to be accountable under U.S. antidiscrimination law. First, in the case of unintentional discrimination, disparate outcomes must be justified by reference to a legitimate “business necessity.”59 In the context of employment hiring, for instance, this is typically understood to be a job-related skill that is required for the position.60 Imagine, for instance, an employer who made all hiring decisions based on a direct measure of each applicant’s level of the job-related skill necessary for the job. Even if this decision-making process results in disparate outcomes across minority and non-minority applicants, these disparities would be justified as nondiscriminatory with respect to a protected characteristic.

Second, in invalidating a decision-making process, U.S. antidiscrimination law does so because of invalid “inputs” rather than invalid “outputs” or results. This feature of U.S. antidiscrimination law is most evident in the case of disparate treatment claims involving the use by a decision-maker of a protected category in making a job-related decision. For instance, Section (m) of the 1991 Civil Rights Act states that “an unlawful employment practice is established when the complaining party demonstrates that race, color, religion, sex, or national origin was a motivating factor for any employment practice, even though other factors also motivated the practice.”61 However, this focus on inputs is also evident in cases alleging disparate impact, notwithstanding the doctrine’s initial requirement that a plaintiff allege disparate outcomes across members of protected and unprotected groups. Recall that even with evidence of disparate outcomes, an employer that seeks to defend against a claim of disparate impact discrimination must demonstrate why these outcomes were the result of a decision-making process based on legitimate business necessity factors (i.e., the disparate outcomes were the result of legitimate decision-making inputs).62 This focus on “inputs” underscores the broader policy objective of ensuring a decision-making process that is not discriminatory.

The practical challenge in implementing this antidiscrimination regime is that the critical decision-making input—an individual’s possession of a job-related skill—cannot be perfectly observed at the moment of a decision, inducing the decision-maker to turn to proxies for it. However, the foregoing discussion highlights that the objective in evaluating these proxy variables should be the same: ensuring that qualified minority applicants are not being systematically passed over for the job or promotion. As summarized by the Supreme Court in Ricci v. DeStefano, “[t]he purpose of Title VII ‘is to promote hiring on the basis of job qualifications, rather than on the basis of race or color.’”63

This objective, of course, is the basis for prohibiting the direct form of statistical discrimination famously examined by economists Kenneth Arrow64 and Edmund Phelps.65 In their models, an employer uses a job applicant’s race as a proxy for the applicant’s expected productivity because the employer assumes that the applicant possesses the average productivity of his or her race. If the employer also assumes the average productivity of minority applicants is lower than non-minorities (e.g., because of long-standing social and racial inequalities), this proxy will ensure that above-average productive minorities will systematically be passed over for the job despite being qualified for it. Because this practice creates a direct and obvious bias against minorities, this practice is typically policed under the disparate treatment theory of discrimination.66
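A toy calculation can make the mechanics of this form of statistical discrimination concrete. The sketch below is not the Arrow or Phelps model itself; it is a deliberately simplified illustration in which the productivity scores, the hiring threshold, and the function name screen_by_group_average are all invented. The employer treats every applicant as having the average productivity of his or her group, so a qualified member of the lower-average group is rejected.

```python
from statistics import mean


def screen_by_group_average(productivity_by_group, threshold):
    """Hire every member of a group if, and only if, the group's mean clears the bar."""
    report = {}
    for group, values in productivity_by_group.items():
        hired = mean(values) >= threshold
        # Applicants whose own productivity meets the threshold.
        qualified = sum(1 for v in values if v >= threshold)
        report[group] = {
            "group_hired": hired,
            "qualified": qualified,
            "qualified_but_rejected": 0 if hired else qualified,
        }
    return report


# Hypothetical productivity scores; the numbers are invented for illustration.
applicants = {
    "non_minority": [4, 6, 8],
    "minority": [2, 3, 9],
}
report = screen_by_group_average(applicants, threshold=5)
print(report)
```

Under these invented numbers, the minority applicant with a score of 9 is the most productive person in the pool yet is passed over, while the non-minority applicant with a score of 4 is hired despite falling below the threshold.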

Beyond this clearly unlawful form of statistical discrimination, a decision-maker can use statistical discrimination to incorporate not just the protected-class variable but also other proxy variables for the business-necessity unobservable attributes. For instance, an employer might seek to predict a job applicant’s productivity based on other observable characteristics that the employer believes are correlated with future productivity, such as an applicant’s level of education or an applicant’s performance on a personality or cognitive ability test.67 Indeed, it is the possibility of using data mining to discern new and unintuitive correlations between an individual’s observable characteristics and a target variable of interest (e.g., productivity or creditworthiness) that has contributed to the dramatic growth in algorithmic decision-making.68 The advent of data mining has meant that thousands of such proxy variables are sometimes used.69

As the UnitedHealth algorithm revealed, however, the use of these proxy variables can result in members of a protected class experiencing disparate outcomes. The problem arises from what researchers call “redundant encodings”—the fact that a proxy variable can be predictive of a legitimate target variable and membership in a protected group.70 Moreover, there are social and economic factors that make one’s group membership correlated with virtually any observable proxy variable. As one proponent of predictive policy declared, “If you wanted to remove everything correlated with race, you couldn’t use anything. That’s the reality of life in America.”71 At the same time certain proxy variables may predict membership in a protected group over-and-above their ability to predict a legitimate target variable; relying on these proxy variables therefore risks penalizing members of the protected group who are otherwise qualified in the legitimate target variable.72 In short, algorithmic accountability requires a method to limit the use of redundantly encoded proxy variables to those that are consistent with the anti-discrimination principles of Title VII of the Civil Rights Act and to prohibit the use of those that are not.73

Our central contribution is in developing accountability input criteria that speak directly to the process demanded by Title VII. Specifically, we use these accountability input criteria to develop a statistical test for whether a proxy variable (or each proxy variable in a set of proxy variables) is being used in a way that causes illegitimate statistical discrimination and should therefore not be used in an algorithmic model. Fundamentally, it is a test for “unbiasedness” designed to ensure that the use of a proxy input variable does not systematically penalize members of a protected group who are otherwise qualified with respect to a legitimate-business-necessity objective. We refer to this test as the input-accountability test. We illustrate the test and its application with a simple pre-employment screening exam designed to infer whether a job applicant possesses sufficient strength to perform a particular job. Before doing so, however, we differentiate the input-accountability test from other approaches to algorithmic accountability.

B. The Input Accountability Test Versus Outcome-Oriented Approaches

Our input-based approach differs significantly from that of other scholars who have advanced outcome-oriented approaches to algorithmic accountability. For instance, Talia Gillis and Jann Spiess have argued that the conventional focus in fair lending on restricting invalid inputs (such as a borrower’s race or ethnicity) is infeasible in the machine-learning context.74 The reason, according to Gillis and Spiess, is that a predictive model of default that excludes a borrower’s race or ethnicity can still penalize minority borrowers if one of the included variables (e.g., borrower education) is correlated with both default and race.75 Gillis and Spiess acknowledge the possibility that one could seek to exclude from the model some of these correlated variables on this basis, but they find this approach infeasible given that “a major challenge of this approach is the required articulation of the conditions under which exclusion of data inputs is necessary.”76 They therefore follow the burgeoning literature within computer science on “algorithmic fairness”75 and advocate evaluating the outcomes from an algorithm against some baseline criteria to determine whether the outcomes are fair.76 As examples, they suggest a regulator might simply examine whether loan prices differ across members of protected or unprotected groups, or a regulator might look at whether “similarly situated” borrowers from the protected and nonprotected groups were treated differently.77

Gillis and Spiess are, of course, correct that simply prohibiting an algorithm from considering a borrower’s race or ethnicity will not eliminate the risk that the algorithm will be biased against minority borrowers in a way that is unrelated to their creditworthiness (which is a legitimate-business-necessity variable).78 Indeed, we share this concern about redundant encodings, and it motivates our empirical test. However, we part ways with these authors in that we do not view as insurmountable the challenge of articulating the conditions for excluding variables that are correlated with a protected classification, as we illustrate in Part 3.

Equally important, it is with an outcome-based approach rather than with an input-based approach where one encounters the greatest conceptual and practical challenges for algorithmic accountability. As Richard Berk and others have noted, efforts to make algorithmic outcomes “fair” pose the challenge that there are multiple definitions of fairness, and many of these

  • 75 For a summary, see Sam Corbett-Davies and Sharad Goel, The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning (arXiv, August 2018), available at https://arxiv/pdf/1808.00023.pdf. In particular, a common approach to algorithmic fairness within computer science is to evaluate the fairness of a predictive algorithm by use of a “confusion matrix.” Id. at 4. A confusion matrix is a cross-tabulation of actual outcomes by the predicted outcome. For instance, the confusion matrix for an algorithm that classified individuals as likely to default on a loan would appear as follows:

                              | Default Predicted                                               | No Default Predicted
    Default Occurs            | # Correctly Classified as Defaulting = Ntp (True Positives)     | # Incorrectly Classified as Non-Defaulting = Nfn (False Negatives)
    Default Does Not Occur    | # Incorrectly Classified as Defaulting = Nfp (False Positives)  | # Correctly Classified as Non-Defaulting = Ntn (True Negatives)

Using this table, one could then evaluate the fairness of the classifier by inquiring whether classification error is equal across members of protected and unprotected groups. Id. at 5. For example, one could use as a baseline fairness criterion a requirement that the classifier have the same false positive rate (i.e., Nfp / (Nfp + Ntn)) for minority borrowers as for non-minority borrowers. Alternatively, one could use as a baseline a requirement of treatment equality (e.g., the ratio of False Positives to False Negatives) across members of protected and unprotected groups.
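The two baseline criteria just described can be computed directly from the four cells of the confusion matrix. The following is a minimal sketch, assuming binary labels (1 = default, 0 = no default); the function names and the two small sets of borrower records are hypothetical and invented purely for illustration.

```python
def confusion_counts(actual, predicted):
    """Tabulate Ntp, Nfn, Nfp, Ntn from parallel lists of 0/1 labels."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),
    }


def false_positive_rate(counts):
    # Nfp / (Nfp + Ntn): share of non-defaulters wrongly flagged as defaulting.
    return counts["fp"] / (counts["fp"] + counts["tn"])


def treatment_ratio(counts):
    # Ratio of False Positives to False Negatives.
    return counts["fp"] / counts["fn"]


# Hypothetical outcomes for two groups of six borrowers each.
minority = confusion_counts(actual=[1, 1, 0, 0, 0, 0],
                            predicted=[1, 0, 1, 1, 0, 0])
non_minority = confusion_counts(actual=[1, 1, 0, 0, 0, 0],
                                predicted=[1, 0, 1, 0, 0, 0])

print("minority FPR:", false_positive_rate(minority))
print("non-minority FPR:", false_positive_rate(non_minority))
print("minority FP/FN:", treatment_ratio(minority))
print("non-minority FP/FN:", treatment_ratio(non_minority))
```

With these invented records, the false positive rate is 0.5 for minority borrowers and 0.25 for non-minority borrowers, so the classifier would fail an equal-false-positive-rate criterion; the ratio of False Positives to False Negatives likewise differs (2.0 versus 1.0).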
