In 2013, New York City’s Health and Human Services division embarked on a noble project: use data to ensure that homeless families were paired with appropriate public services. To start, the division’s data group was tasked with building a model to predict, based on a family’s characteristics, how long they might stay within the system. The first task? Determine which characteristics to use in their model.
The available 30 years of historical data included variables such as the number and age of family members, previous ZIP code, the number and lengths of previous stays in homeless services—and race.
Including race probably would have made the model more accurate, but in the end the team decided it would be unethical to include it. Here's why: given the history of racism in the U.S., outcomes for homeless families over the past 30 years were likely biased. A model built on that biased data cannot tell which outcomes were due to injustice and which were due to individual choices. Trained on that old data, the model might have concluded that families with a Black head of household were less likely to find employment, and Black homeless families today might receive less job training as a result.
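There is a subtle follow-up to the team's decision: dropping a sensitive column does not guarantee a model is blind to it, because other features, like ZIP code, can act as proxies. The sketch below is purely illustrative, using synthetic data (none of the numbers come from the NYC project); it shows how a strongly correlated proxy can largely reconstruct an attribute that was removed.

```python
# Hypothetical illustration: removing a sensitive column is not enough
# when another feature acts as a proxy for it.
import random

random.seed(0)

# Synthetic records: `race` is the sensitive attribute we drop, while
# `zip_cluster` stands in for a ZIP-code feature correlated with it.
records = []
for _ in range(1000):
    race = random.random() < 0.5
    # In this synthetic world, 90% of each group lives in its own cluster.
    zip_cluster = race if random.random() < 0.9 else not race
    records.append((race, zip_cluster))

# Even after dropping `race`, `zip_cluster` predicts it most of the time.
agreement = sum(r == z for r, z in records) / len(records)
print(f"proxy matches dropped attribute {agreement:.0%} of the time")
```

In a sample like this the proxy agrees with the dropped attribute roughly 90% of the time, so a model trained on it can still encode the bias the team was trying to exclude.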
The way data scientists build models can have real implications for justice, health, and opportunity in people’s lives. And we have an obligation to consider the ethics of our discipline each and every day.
Data Science for the Power of Good
When built correctly, algorithms can have massive power to do good in the world. When we use them for tasks that previously had to be done by a person, the benefits can be huge: cost savings, scalability, speed, accuracy, and consistency. And when a system is more accurate and consistent than a human being, its results tend to be fairer and less subject to social bias.
Crisis Text Line, a free 24/7 support service for people in crisis, uses data to make sure the most at-risk people receive help first. The organization also examines the data for patterns to learn the most effective ways to provide counseling via text and uses what it learns to improve its team's training.
And the Data-Driven Justice Initiative seeks to break the cycle of American incarceration by using data to match people with mental illness, substance abuse, and health problems to the resources they need, helping them stay out of our overcrowded jails.
Solutions like these don't just save money; they save lives.
What Could Go Wrong?
Sadly, if we’re not actively addressing ethics in computer science, data used incorrectly can also cause unintended harm.
Personal data such as passwords, photographs, and location information can fall into the wrong hands. Predictive models used for policing and sentencing can reinforce stereotypes and have adverse racial or socioeconomic implications. Economic opportunity in the form of school admissions, job hiring, and loan approval can be denied. Healthcare decisions can be made incorrectly, compromising a person's health and even their life. And of course, our democratic systems themselves can be undermined when data scientists harness the power of data to sow mistrust and discord.
Even the most kindhearted, well-intentioned data scientist can make unethical decisions. It’s easy if you’re not on guard.
For starters, people tend to view data as objective by its very nature, forgetting that it's only as accurate and objective as the people and processes that generate and collect it in the first place.
Second, modern machine learning tools are so complex that they are difficult for humans to interpret and understand, which makes it hard to choose appropriate inputs and to assess the ethical implications of the results. It's as if the answers come from a Data Deity we don't quite understand but regard with implicit trust and awe.
And finally, most data scientists are trained in disciplines like applied mathematics, computer science, or statistics. In fields like these, data science is used mostly for research and academic theory, rather than to inform real-world behaviors that affect people's lives. At Loyola, we embrace the philosophy of cura personalis, or care for the whole person, which means we're always looking at the implications of our work for our neighbors, communities, and the world.
For a working data scientist, there isn't much opportunity to discover the impact of your work. You probably won't be writing an academic research paper to learn from the experience. You probably won't see how your algorithms are deployed, or learn about their impacts beyond whatever result the model was originally created to produce.
Do social media algorithms encourage addictive behavior among users? Does a predictive policing model improve outcomes for the people it targets? Do teacher assessments actually improve how our children are being taught? In your data science career, you may be paid to create the social media algorithm, policing model, or teacher assessment, but likely won’t have an opportunity to investigate these questions.
So is it too late for big data ethics?
Not if data scientists like us are diligent up front, particularly in terms of preserving privacy, mitigating the risk of attack, and avoiding bias in our models.
After all, it’s our goal to create automated processes that affect people’s lives. Let’s use our power to do some good.