Most of us, unless we’re insurance actuaries or Wall Street quantitative analysts, have only a vague notion of algorithms and how they work. But they affect our daily lives considerably. Algorithms are sets of instructions that computers follow to solve problems. The hidden algorithms of Big Data might connect you with a great music suggestion on Pandora, a job lead on LinkedIn or the love of your life on Match.com.

These mathematical models are supposed to be neutral. But former Wall Street quant Cathy O’Neil, who had an insider’s view of algorithms for years, believes that they are quite the opposite. In her book, *Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy*, O’Neil says these WMDs are ticking time-bombs that are well-intended but ultimately reinforce harmful stereotypes, especially of the poor and minorities, and become “secret models wielding arbitrary punishments.”

**Models and Hunches**

Algorithms are not the exclusive focus of *Weapons of Math Destruction*. The focus is more broadly on mathematical models of the world — and on why some are healthy and useful while others grow toxic. Any model of the world, mathematical or otherwise, begins with a hunch, an instinct about a deeper logic beneath the surface of things. Here is where the human element, and our potential for bias and faulty assumptions, creeps in. To be sure, a hunch or working thesis is part of the scientific method. In this phase of inquiry, human intuition can be fruitful, provided there is a mechanism by which those initial hunches can be tested and, if necessary, corrected.

O’Neil cites the new generation of baseball metrics (a story told in Michael Lewis’s *Moneyball*) as a healthy example of this process. *Moneyball* began with Oakland A’s General Manager Billy Beane’s hunch that traditional performance metrics such as runs batted in (RBIs) were overrated, while other, more obscure measures (like on-base percentage) were better predictors of overall success. Statisticians, building on the work of Bill James, crunched the numbers and put together models that Beane could use in his decisions about which players to acquire and hold onto, and which to let go.

While sports enthusiasts love to debate the issue, this method of evaluating talent is now widely embraced across baseball, and gaining traction in other sports as well. The *Moneyball* model works, O’Neil says, for a few simple reasons. First, it is relatively transparent: Anyone with basic math skills can grasp the inputs and outputs. Second, its objectives (more wins) are clear, and appropriately quantifiable. Third, there is a self-correcting feedback mechanism: a constant stream of new inputs and outputs by which the model can be honed and refined.

> These WMDs are ticking time-bombs that are well-intended but ultimately reinforce harmful stereotypes, especially of the poor and minorities.

Where models go wrong, the author argues, all three healthy attributes are often lacking. The calculations are opaque; the objectives attempt to quantify that which perhaps should not be; and feedback loops, far from being self-correcting, serve only to reinforce faulty assumptions.

**WMDs on Wall Street**

After earning a doctorate in mathematics at Harvard and then teaching at Barnard College, O’Neil got a job at the hedge fund D.E. Shaw. At first, she welcomed the change of pace from academia and viewed hedge funds as “morally neutral — scavengers in the financial system, at worst.” Hedge funds didn’t create markets like those for mortgage-backed securities, in which complicated derivatives played a key part in the financial crisis — they just “played in them.”

But as the subprime mortgage crisis spread, and eventually engulfed Lehman Bros., which owned a 20% stake in D.E. Shaw, the internal mood at the hedge fund “turned fretful.” Concern grew that the scope of the looming crisis might be unprecedented — and something that couldn’t be accounted for by their mathematical models. She eventually realized, as did others, that math was at the center of the problem.

The cutting-edge algorithms used to assess the risk of mortgage-backed securities became a smoke screen. Their “mathematically intimidating” design camouflaged the true level of risk. Not only were these models opaque; they lacked a healthy feedback mechanism. Importantly, the risk assessments were verified by credit-rating agencies that collected fees from the same companies that were peddling those financial products. This was a mathematical model that checked all the boxes of a toxic WMD.

Disenchanted, O’Neil left Shaw in 2009 for RiskMetrics Group, which provides risk analysis for banks and other financial services firms. But she felt that people like her who warned about risk were viewed as a threat to the bottom line. A few years later, she became a data scientist for a startup called Intent Media, analyzing web traffic and designing algorithms to help online companies maximize e-commerce. O’Neil saw disturbing similarities in the use of algorithms in finance and Big Data.

In both worlds, sophisticated mathematical models lacked truly self-correcting feedback. They were driven primarily by the market. So if a model led to maximum profits, it was on the right track. “Otherwise, why would the market reward it?” Yet that reliance on the market had produced disastrous results on Wall Street in 2008. Without countervailing analysis to ensure that efficiency was balanced with concern for fairness and truth, the “misuse of mathematics” would only accelerate in hidden but devastating ways. O’Neil left the company to devote herself to providing that analysis.

**Misadventures in Education**

Ever since the passage of the No Child Left Behind Act in 2002, which mandated expanded use of standardized tests, there has been a market for analytical systems to crunch all the data those tests generate. More often than not, that data has been used to try to identify “underperforming” teachers. However well-intentioned, these models, O’Neil finds, promise a scientific precision they can’t deliver, victimizing good teachers and creating incentives for behavior that does nothing to advance the cause of education.

In 2009, the Washington D.C. school system implemented a teacher assessment tool called IMPACT. Using a complicated algorithm, IMPACT measured the progress of students and attempted to isolate the extent to which their advance (or decline) could be attributed to individual teachers. The lowest-scoring teachers each year were fired — even when the targeted teachers had received excellent evaluations from parents and the principal.

O’Neil examines a similar effort to evaluate teacher performance in New York City. She profiles a veteran teacher who scored a dismal 6 out of 100 on the new test one year, only to rebound the next year to 96. One critic of the evaluations found that, of teachers who had taught the same subject in consecutive years, 1 in 4 registered a 40-point difference from year to year.

> The cutting-edge algorithms used to assess the risk of mortgage-backed securities became a smoke screen.

There is little transparency in these evaluation models, O’Neil writes, making them “arbitrary, unfair, and deaf to appeals.” Whereas a company like Google has the benefit of large sample sizes and constant statistical feedback allowing them to immediately identify and correct errors, teacher evaluation systems attempt to render judgments based on annual tests of just a few dozen students. Moreover, there is no way to assess mistakes. If a good teacher is wrongly fired and goes on to be a great teacher at another school, that “data” is never accounted for.

**In the Workplace**

Teachers are hardly alone. In the face of slow growth, companies are looking everywhere for an edge. Because personnel decisions are among the most significant for a firm, “workforce management” has become big business – in particular, programs that screen potential employees and promise to take “the guesswork” out of hiring. Increasingly, these programs utilize personality tests in an effort to automate the hiring process. Consulting firm Deloitte estimates that such tests are used on 60% to 70% of prospective employees in the U.S., nearly double the figure from five years ago.

The prevalence of personality tests runs counter to research that consistently ranks them as poor predictors of future job performance. Yet they generate raw data that can be plugged into algorithms that provide an illusion of scientific precision, all in the service of an efficient hiring process. But as O’Neil writes, these programs lack transparency: rejected applicants rarely know why they’ve been flagged, or even that they’ve been flagged at all. They also lack a healthy feedback mechanism, a means of identifying errors and using those mistakes to refine the system.

Once on the job, a growing number of workers are subject to another iteration of Big Data, in the form of scheduling software. Constant streams of data — everything from the weather to pedestrian patterns — can be used, for example, to optimize staffing at a Starbucks café. A *New York Times* profile of a single mother working her way through college as a barista explored how the new technology can create chaos, especially in the lives of low-income workers. According to U.S. government data, two-thirds of food service workers consistently get short-term notice of scheduling changes.

This instability can have far-reaching and insidious effects, O’Neil says. Haphazard scheduling can make it difficult to stay in school, keeping vulnerable workers in the oversupplied low-wage labor pool. “It’s almost as if the software were designed expressly to punish low-wage workers and keep them down,” she writes. And chaotic schedules have ripple effects on the next generation as well. “Young children and adolescents of parents working unpredictable schedules,” the Economic Policy Institute finds, “are more likely to have inferior cognition and behavioral outcomes.”

Following the exposé in the *Times*, legislation was introduced in Congress to rein in scheduling software, but didn’t go anywhere.

**Crime and Punishment**

Often, as with both educational reform and new hiring practices, the use of Big Data initially comes with the best of intentions. Recognizing the role of unconscious bias in the criminal justice system, courts in 24 states are using computerized models to help judges assess the risk of recidivism during sentencing. By some measures, according to O’Neil, this represents an improvement. But by attempting to quantify and nail down with precision what are at root messy human realities, she argues, these models create new problems.

> A new, pseudoscientific generation of scoring has proliferated wildly. … Yet unlike FICO scores, they are “arbitrary, unaccountable, unregulated, and often unfair.”

One popular model includes a lengthy questionnaire designed to pinpoint factors related to the risk of recidivism. Questions might inquire about previous police incidents; and, given how much more frequently young black males are stopped by police, such a question can come to be a proxy for race, even while the intention is to reduce prejudice. Additional questions, such as whether the respondent’s friends or relatives have criminal records, would elicit an objection from a defense attorney if raised during a trial, O’Neil points out. But the opaqueness of these complicated risk models shields them from proper scrutiny.

Another trend is the use of crime prediction software to anticipate crime patterns, and adjust police deployment accordingly. But one underlying problem with WMDs, the author argues, is that they essentially become data hungry, confusing more data with better data. And in the case of crime prediction software, even though the stated priority is to prevent violent and serious crime, the data generated by petty “nuisance” crimes can overwhelm and essentially prejudice the system. “Once the nuisance data flows into a predictive model, more police are drawn into those neighborhoods, where they’re more likely to arrest more people.” These increased arrests seem to justify the policy in the first place, and in turn feed back into the recidivism models used in sentencing: a destructive and “pernicious feedback loop,” as O’Neil characterizes it.

**The Cancer of Credit Scores**

In the wake of a financial crisis that was at the very least exacerbated by loose credit, banks are understandably trying to be more rigorous in their assessment of risk. An early risk assessment algorithm, the well-known FICO score, is not without its problems; but for the most part, O’Neil writes, it is an example of a healthy mathematical model. It is relatively transparent; it is regulated; and it has a clear feedback loop. If default rates don’t jibe with what the model predicts, credit agencies can tweak the model.

In recent years, however, a new, pseudoscientific generation of scoring has proliferated wildly. “Today we’re added up in every conceivable way as statisticians and mathematicians patch together a mishmash of data, from our zip codes and internet surfing patterns to our recent purchases.” Crunching this data generates so-called “e-scores” used by countless companies to determine our creditworthiness, among other qualities. Yet unlike FICO scores, they are “arbitrary, unaccountable, unregulated, and often unfair.”

A huge “data marketplace” has emerged in which credit scores and e-scores are used in a variety of applications, from predatory advertising to hiring screening. In this sea of endless data, the author contends, the line between legitimate and specious measures has become hopelessly blurred. As one startup proclaims on its website, “All data is credit data.”

It’s all part of a larger process in which “we’re batched and bucketed according to secret formulas, some of them fed by portfolios loaded with errors.” According to the Consumer Federation of America, e-scores and other data are used to slice and dice consumers into “microsegments” and to target vulnerable groups with predatory pricing for insurance and other financial products.

And as companies gain access to GPS and other mobile data, the possibilities for this kind of micro-targeting will only grow exponentially. As insurance companies and others “scrutinize the patterns of our lives and our bodies, they will sort us into new types of tribes. But these won’t be based on traditional metrics, such as age, gender, net worth, or zip code. Instead, they’ll be behavioral tribes, generated almost entirely by machines.”

**Reforming Big Data**

In her conclusion, O’Neil argues we need to “disarm” the Weapons of Math Destruction, and that the first step for doing so is to conduct “algorithmic audits” to unpack the black boxes of these mathematical models. They are, again, opaque and impenetrable by design, and often protected as proprietary intellectual property.

Toward this end, Princeton University has launched WebTAP, the Web Transparency and Accountability Project. Carnegie Mellon and MIT are home to similar initiatives. In the end, O’Neil writes, we must realize that the mathematical models which have penetrated almost every aspect of our lives “are constructed not just from data but from the choices we make about which data to pay attention to… These choices are not just about logistics, profits, and efficiency. They are fundamentally moral.”

## Join The Discussion

2 Comments So Far

## Roger Bohn

Very interesting. I will check it out for a course I teach.

In many areas of education, the effort to enforce quantitative decisions on things we don’t understand has led to major problems.

Who wrote this review?

## Anumakonda Jagadeesh

Outstanding.

Analysis of algorithms

For looking up a given entry in a given ordered list, both the binary and the linear search algorithm (which ignores ordering) can be used. The analysis of the former and the latter algorithm shows that it takes at most log2(n) and n check steps, respectively, for a list of length n. In an example list of length 33, searching for “Morin, Arthur” takes 5 steps with binary search and 28 with linear search.

In computer science, the analysis of algorithms is the determination of the amount of time, storage and/or other resources necessary to execute them. Usually, this involves determining a function that relates the length of an algorithm’s input to the number of steps it takes (its time complexity) or the number of storage locations it uses (its space complexity). An algorithm is said to be efficient when this function’s values are small. Since different inputs of the same length may cause the algorithm to have different behavior, the function describing its performance is usually an upper bound on the actual performance, determined from the worst case inputs to the algorithm.
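The step-counting idea above can be made concrete with a short sketch. This is not from the quoted article; the function names (`linear_steps`, `binary_steps`) are our own, and the instrumented searches simply report how many comparisons they perform:

```python
def linear_steps(items, target):
    """Count comparisons made by a linear search (ordering ignored)."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            return steps
    return steps  # target absent: all n elements were checked

def binary_steps(items, target):
    """Count comparisons made by a binary search (items must be sorted)."""
    lo, hi, steps = 0, len(items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return steps
        elif items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps  # target absent

data = list(range(33))            # a sorted list of length 33
print(linear_steps(data, 28))     # scans from the front: 29 comparisons
print(binary_steps(data, 28))     # halves the range: at most log2(33)+1 ≈ 6
```

Running both searches over every possible target is one way to see that the worst case for linear search grows with n while binary search stays near log2(n).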

The term “analysis of algorithms” was coined by Donald Knuth. Algorithm analysis is an important part of a broader computational complexity theory, which provides theoretical estimates for the resources needed by any algorithm which solves a given computational problem. These estimates provide an insight into reasonable directions of search for efficient algorithms.

In theoretical analysis of algorithms it is common to estimate their complexity in the asymptotic sense, i.e., to estimate the complexity function for arbitrarily large input. Big O notation, Big-omega notation and Big-theta notation are used to this end. For instance, binary search is said to run in a number of steps proportional to the logarithm of the length of the sorted list being searched, or in O(log(n)), colloquially “in logarithmic time”. Usually asymptotic estimates are used because different implementations of the same algorithm may differ in efficiency. However the efficiencies of any two “reasonable” implementations of a given algorithm are related by a constant multiplicative factor called a hidden constant.

Exact (not asymptotic) measures of efficiency can sometimes be computed but they usually require certain assumptions concerning the particular implementation of the algorithm, called a model of computation. A model of computation may be defined in terms of an abstract computer, e.g., a Turing machine, and/or by postulating that certain operations are executed in unit time. For example, if the sorted list to which we apply binary search has n elements, and we can guarantee that each lookup of an element in the list can be done in unit time, then at most log2 n + 1 time units are needed to return an answer.

Cost models

Time efficiency estimates depend on what we define to be a step. For the analysis to correspond usefully to the actual execution time, the time required to perform a step must be guaranteed to be bounded above by a constant. One must be careful here; for instance, some analyses count an addition of two numbers as one step. This assumption may not be warranted in certain contexts. For example, if the numbers involved in a computation may be arbitrarily large, the time required by a single addition can no longer be assumed to be constant.

Two cost models are generally used:

• the uniform cost model, also called uniform-cost measurement (and similar variations), assigns a constant cost to every machine operation, regardless of the size of the numbers involved

• the logarithmic cost model, also called logarithmic-cost measurement (and similar variations), assigns a cost to every machine operation proportional to the number of bits involved

The latter is more cumbersome to use, so it’s only employed when necessary, for example in the analysis of arbitrary-precision arithmetic algorithms, like those used in cryptography.
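The contrast between the two cost models can be sketched for a single addition. The helper names (`uniform_cost`, `log_cost`) are illustrative assumptions, not a standard API; the point is only that the uniform model charges the same for any operands while the logarithmic model charges by bit length:

```python
def uniform_cost(a, b):
    # Uniform cost model: every machine operation costs 1,
    # regardless of the size of the numbers involved.
    return 1

def log_cost(a, b):
    # Logarithmic cost model: cost proportional to the number
    # of bits involved in the operation.
    return max(a.bit_length(), b.bit_length())

small = (7, 9)
big = (2**1024, 3**700)          # arbitrary-precision operands
print(uniform_cost(*small), uniform_cost(*big))   # both cost 1
print(log_cost(*small), log_cost(*big))           # 4 bits vs. over a thousand
```

Under the uniform model the two additions look identical; under the logarithmic model the big-integer addition is hundreds of times more expensive, which is why the latter model matters for arbitrary-precision arithmetic and cryptography.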

A key point that is often overlooked is that published lower bounds for problems are often given for a model of computation more restricted than the set of operations available in practice, and therefore there are algorithms that are faster than what would naively be thought possible.

Run-time analysis

Run-time analysis is a theoretical classification that estimates and anticipates the increase in running time (or run-time) of an algorithm as its input size (usually denoted as n) increases. Run-time efficiency is a topic of great interest in computer science: A program can take seconds, hours, or even years to finish executing, depending on which algorithm it implements. While software profiling techniques can be used to measure an algorithm’s run-time in practice, they cannot provide timing data for all infinitely many possible inputs; the latter can only be achieved by the theoretical methods of run-time analysis (Wikipedia).

Shortcomings of empirical metrics

Since algorithms are platform-independent (i.e. a given algorithm can be implemented in an arbitrary programming language on an arbitrary computer running an arbitrary operating system), there are additional significant drawbacks to using an empirical approach to gauge the comparative performance of a given set of algorithms.

Take as an example a program that looks up a specific entry in a sorted list of size n. Suppose this program were implemented on Computer A, a state-of-the-art machine, using a linear search algorithm, and on Computer B, a much slower machine, using a binary search algorithm. Benchmark testing on the two computers running their respective programs would show Computer A winning on short lists but Computer B overtaking it as the list grows, because the number of steps grows linearly with n on Computer A and only logarithmically on Computer B.
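The crossover can be simulated with invented numbers. The per-step times below are hypothetical assumptions chosen purely for illustration (a fast machine at 10 ns per step running linear search versus a machine 100 times slower running binary search), not measurements of real hardware:

```python
import math

FAST_NS_PER_STEP = 10      # Computer A: fast machine, linear search
SLOW_NS_PER_STEP = 1000    # Computer B: slow machine, binary search

def linear_time_ns(n):
    # Worst case for linear search: n comparison steps.
    return n * FAST_NS_PER_STEP

def binary_time_ns(n):
    # Worst case for binary search: floor(log2(n)) + 1 comparison steps.
    return (math.floor(math.log2(n)) + 1) * SLOW_NS_PER_STEP

for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, linear_time_ns(n), binary_time_ns(n))
```

With these assumed constants the two machines tie at n = 1,000, but at a million elements the slow machine running binary search is already hundreds of times faster, and the gap keeps widening: constant factors reflect the platform, while growth rates reflect the algorithm.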

“Algorithms are often elegant and incredibly useful tools used to accomplish tasks. They are mostly invisible aids, augmenting human lives in increasingly incredible ways. However, sometimes the application of algorithms created with good intentions leads to unintended consequences. Recent news items tie to these concerns:

The British pound dropped 6.1% in value in seconds on Oct. 7, 2016, partly because of currency trades triggered by algorithms.

Microsoft engineers created a Twitter bot named “Tay” this past spring in an attempt to chat with Millennials by responding to their prompts, but within hours it was spouting racist, sexist, Holocaust-denying tweets based on algorithms that had it “learning” how to respond to others based on what was tweeted at it.

Facebook tried to create a feature to highlight Trending Topics from around the site in people’s feeds. First, it had a team of humans edit the feature, but controversy erupted when some accused the platform of being biased against conservatives. So, Facebook then turned the job over to algorithms only to find that they could not discern real news from fake news.

Cathy O’Neil, author of Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, pointed out that predictive analytics based on algorithms tend to punish the poor, using algorithmic hiring practices as an example.

Well-intentioned algorithms can be sabotaged by bad actors. An internet slowdown swept the East Coast of the U.S. on Oct. 21, 2016, after hackers bombarded Dyn DNS, an internet traffic handler, with information that overloaded its circuits, ushering in a new era of internet attacks powered by internet-connected devices. This after internet security expert Bruce Schneier warned in September that “Someone Is Learning How to Take Down the Internet.” And the abuse of Facebook’s News Feed algorithm and general promulgation of fake news online became controversial as the 2016 U.S. presidential election proceeded.

Researcher Andrew Tutt called for an “FDA for Algorithms,” noting, “The rise of increasingly complex algorithms calls for critical thought about how to best prevent, deter and compensate for the harms that they cause …. Algorithmic regulation will require federal uniformity, expert judgment, political independence and pre-market review to prevent – without stifling innovation – the introduction of unacceptably dangerous algorithms into the market.”

The White House released two reports in October 2016 detailing the advance of algorithms and artificial intelligence and plans to address issues tied to it, and it issued a December report outlining some of the potential effects of AI-driven automation on the U.S. job market and economy.

On January 17, 2017, the Future of Life Institute published a list of 23 Principles for Beneficial Artificial Intelligence, created by a gathering of concerned researchers at a conference at Asilomar, in Pacific Grove, California. The more than 1,600 signatories included Stephen Hawking, Elon Musk, Ray Kurzweil and hundreds of the world’s foremost AI researchers.

The use of algorithms is spreading as massive amounts of data are being created, captured and analyzed by businesses and governments. Some are calling this the Age of Algorithms and predicting that the future of algorithms is tied to machine learning and deep learning that will get better and better at an ever-faster pace.” (Code-Dependent: Pros and Cons of the Algorithm Age, Lee Rainie and Janna Anderson, Pew Research Center)

Dr. A. Jagadeesh, Nellore (AP), India