Harvard Researchers Identify Accuracy Concerns in Census Bureau’s New Privacy System

By Delano R. Franklin
By Kate N. Guerin, Crimson Staff Writer

Harvard Government and Statistics researchers found in a study published last month that a new method used by the United States Census Bureau to increase privacy could potentially bias data used for redistricting.

The Census Bureau introduced a new Disclosure Avoidance System for the data from the 2020 Census, which uses differential privacy to increase privacy protections through the addition of “noise” to Census microdata.

The Harvard researchers ran computer simulations using the proposed DAS parameters, which were released in late April, to generate numerous potential redistricting maps from available 2010 Census data. Prior to the 2020 Census, the Census Bureau swapped the data of some households with others to protect privacy.

Government and Statistics professor Kosuke Imai, the study’s corresponding author, said the DAS uses a “very complicated post-processing method” to facilitate the use of the data for redistricting.

“But the problem of that is that no longer the noise that’s added is symmetric, so it adds some bias, but it’s hard to know exactly how those biases are being created,” Imai said.
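The effect Imai describes can be illustrated with a minimal sketch. It is not the Census Bureau's actual DAS, whose post-processing is far more involved; it assumes a single hypothetical block with a true count of 2, Laplace noise with an arbitrarily chosen scale of 5, and one simple post-processing rule (no negative counts). All function names and parameter values here are invented for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two exponential draws with mean `scale`
    # follows a Laplace distribution centered at zero (symmetric noise).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mean_error(true_count, scale, trials, rng, clamp):
    """Average (released - true) over many simulated releases."""
    total = 0.0
    for _ in range(trials):
        noisy = true_count + laplace_noise(scale, rng)
        if clamp:
            # Hypothetical post-processing step: counts cannot be negative.
            noisy = max(0.0, noisy)
        total += noisy - true_count
    return total / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
raw_bias = mean_error(2, 5.0, 20000, rng, clamp=False)
clamped_bias = mean_error(2, 5.0, 20000, rng, clamp=True)

print(f"average error without post-processing: {raw_bias:.2f}")
print(f"average error with non-negativity rule: {clamped_bias:.2f}")
```

Under these assumptions, the raw noisy counts average out close to the true value, while the clamped counts run systematically high: the rule cuts off negative errors but leaves positive ones untouched, so the noise is no longer symmetric.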

Investigating the effects of DAS on redistricting and democratic elections, the study found that DAS would make it “impossible” for map drawers to create precise districts of equal populations at the block level in accordance with the One Person, One Vote principle, which ensures that every person’s vote is equally represented across districts.

“Under the privacy protections of former censuses, block-level populations were exact — so exact meaning whatever the Census Bureau counted and figured was the most likely number is what was released,” co-author and Government Ph.D. student Christopher T. Kenny said.

“Now, we’re under this new system, which will have different populations at the block level from what the census actually believes the total number of people in that block is,” Kenny said. “This kind of puts a new spin on 54, 55 years of Supreme Court precedent here.”

According to the study, under the proposed DAS parameters from April, deviations from truly equal district populations would be underreported several-fold.
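A toy calculation shows how a deviation can go unreported. The numbers below are invented for illustration and do not come from the study or from Census data; the sketch assumes six hypothetical blocks split into two districts, and measures deviation as the largest gap between a district's population and the ideal (equal) population, as a share of the ideal.

```python
def max_deviation(block_pops, assignment, n_districts):
    """Largest relative gap between any district's population and the ideal."""
    totals = [0] * n_districts
    for pop, district in zip(block_pops, assignment):
        totals[district] += pop
    ideal = sum(totals) / n_districts
    return max(abs(t - ideal) for t in totals) / ideal


# Hypothetical plan: blocks 0-2 form district 0, blocks 3-5 form district 1.
assignment = [0, 0, 0, 1, 1, 1]

# Invented block populations as released (with noise) and as actually counted.
released = [100, 50, 50, 120, 40, 40]
counted = [104, 52, 50, 117, 38, 39]

released_dev = max_deviation(released, assignment, 2)
counted_dev = max_deviation(counted, assignment, 2)

print(f"deviation on released data: {released_dev:.1%}")
print(f"deviation on counted data:  {counted_dev:.1%}")
```

In this example a map drawer working from the released figures would see two perfectly equal districts, while the counted populations differ from the ideal by 3 percent.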

The researchers also found that under the then-proposed DAS model, racially or politically heterogeneous areas are undercounted, leading to a potential overestimation of the degree of racial and political segregation across the nation.

“The DAS tends to introduce more error for minority groups than for White voters, and even more error for voters who are in a minority group for their Census block, which is more common for minority voters as well,” the study reads.

According to the researchers, underrepresenting racially and politically heterogeneous areas would make it more challenging to identify partisan gerrymandering, properly allocate federal funds, and conduct meaningful academic research.

The researchers showed that the DAS also does not prevent algorithms from inferring the race of voters from names and addresses. Rather, the researchers were “able to predict the individual race of registered voters at least as accurately using the DAS-protected data as when using the original Census data.”

“So when you start to have a system that’s sometimes doubling or halving the population of small cities and towns — all in the name of stopping people from knowing the race of a respondent— I think it’s very valid to ask, ‘Okay, is that the right cost-benefit trade off?’” said co-author and Statistics Ph.D. candidate Cory W. McCartan.

Last Wednesday, the Census Bureau announced the finalized settings that will be used for the DAS this August to assist with redistricting based on the 2020 Census data. In a press release, the Bureau thanked research groups for providing valuable feedback throughout the development of the DAS algorithm.

“The decisions strike the best balance between the need to release detailed, usable statistics from the 2020 Census with our statutory responsibility to protect the privacy of individuals’ data,” Ron Jarmin, director of the U.S. Census Bureau, said in the press release. “They were made after many years of research and candid feedback from data users and outside experts – whom we thank for their invaluable input.”

The press release noted that the DAS development team addressed concerns over bias toward racially or ethnically homogeneous areas, and those changes were integrated into the new parameters.

Kenny said he was disappointed the Census Bureau did not adopt the study’s recommendation to keep block populations “true to whatever their best account was.”

“In our report, we recommend that they should — if they’re going to use the algorithm that they are currently trying to use — that they should be trying to keep block populations invariant,” Kenny said. “They will not be improving the accuracy of block populations, which to me is a very disappointing result.”

The Census Bureau wrote in the press release that it was unable to implement all feedback on the parameters it put forward in April.

“For example, some data users recommended nearly perfect accuracy in block-level data, which we are unable to achieve because it would undermine the ability to implement a functional disclosure avoidance system,” the Census Bureau wrote. “We are both legally and ethically bound to protect the privacy of the data provided by and on behalf of our respondents.”

Imai commended the Census Bureau’s transparency, even though it only released data parameters late in the process.

“I think the Census did the right thing, which is to release these demonstration datasets and ask people like us to analyze,” Imai said. “In some sense I wish they’d done that earlier — because we [were] also given only one month, so it was pretty hectic for us to put things together. At a distance from the process I think it was a good, very transparent way of making an important public policy decision.”

–Staff writer Kate N. Guerin can be reached at kate.guerin@thecrimson.com.
