Duke professor shows how to identify gerrymandered voting maps
Duke University math professor Jonathan Christopher Mattingly explains tools his research team has developed to help people identify gerrymandering as new congressional and legislative voting districts are drawn this fall.
Welcome to the session. I'm honored to introduce this year's I. E. Block Community Lecture. I am Susanne Brenner, the current SIAM president. The I. E. Block Community Lecture and award was established in 1995 to encourage public appreciation of the excitement and vitality of applied mathematics. This annual lecture is open to the public and is named in honor of I. Edward Block, a founder of SIAM, who served as its first managing director for nearly 20 years. This year's Block Lecture will be given by Jonathan Mattingly, who is currently James B. Duke Professor of Mathematics and Professor of Statistical Science at Duke University. Jonathan received a bachelor's degree from Yale University and a master's degree and PhD in applied and computational mathematics from Princeton University. He held positions at Stanford University and the Institute for Advanced Study in Princeton before moving to Duke in 2003. His honors include a Sloan Fellowship and a PECASE CAREER award. He is a fellow of the Institute of Mathematical Statistics, the IMS, and the American Mathematical Society, the AMS. He is well known for his work on the effects of randomness in complex systems, fluid dynamics, molecular dynamics and biochemical networks. Since 2013, he has also carried out research on quantifying gerrymandering, which is the subject of his talk today, titled "Can You Hear the Will of the People in the Vote? Assessing Fairness in Redistricting through Sampling."
Thank you very much, President, for that introduction. It's my great honor to be here, both because this lecture is named after Ed Block and because this is a community lecture, and I'm a great believer that mathematics needs to talk more publicly about what it does. So before I get going, I want to give a shout-out to Greg Herschlag, my coauthor on much of this, and the rest of the Quantifying Gerrymandering team at Duke.
So this talk is about democracy. It's about how people vote, and how that vote is translated into elected officials who act as our surrogates in this republic we live in. In some sense, I would say the vote is the expression of the people's will, and who becomes elected is how we interpret that will. Right? The day we wake up after an election, we read the newspapers and it says this party's agenda has been confirmed because they won most of the seats in this election. So that's how we interpret the will of the people, which they express at the ballot box. If you're a mathematician, there's somehow a map between how people vote and who gets elected, and I want to talk a little bit about that.
So here I have a delegation from North Carolina to the U.S. Congress. This was the outcome of an election we had, and this was the political makeup of that group. Of course, that was based on some votes, which you see are pretty even. They're not so far off, but that's what they are. And those votes led to these representatives being elected. Now, if I keep the same set of votes and just draw a different map, I get a completely different set of outcomes. And you notice these are exact opposites: a 10-3 split in one direction or a 3-10 split in the other direction. All that I've done here is change which maps I used, which district maps. Right? So we have an election, and everyone votes in a district for a certain person. There's a small fudge here, in that I'm considering people voting for a party, but maybe that's not so wrong, unfortunately, in these polarized times. I can take a different map, and here's a third map that gives another outcome. So the question is: if the choice of maps is so important to how we interpret these votes, which map should we choose, and how should we decide if someone's done a good job of choosing it? All right.
So one way to think about that is you can invoke external fairness principles. You might say proportionality. That's what you'll hear people talk about if you read the press or sit around the coffee shop. You know, so-and-so party got 51% of the votes; don't you think they should get 51% of the seats? Or symmetry. That's very appealing in some ways: whatever number of seats I got when I got 51% of the vote, you should get when you get 51% of the vote. Or, for a while, there was a popular idea about electoral efficiency. Every vote that you use above 50% plus one vote is a wasted vote in some sense, because you didn't need that one to win. So maybe each group should have a similar kind of efficiency in their elections. Those are all really nice ideas. But the problem is that, although they may be desirable, our system wasn't designed to do this. So it's really not fair to judge it against these ideas. You may say, I don't like our system, I want a system that does X, Y or Z, and that's perfectly fine. But it's not a way to decide whether a certain set of maps is an honest broker of translating votes into election outcomes.
All right. So the word gerrymander. Maybe it's worth going back to the definition of gerrymandering. It's to manipulate district boundaries to favor one party (partisan gerrymandering) or one class, usually race (racial gerrymandering). It is to change the outcome of an election. And somehow built into this, implicitly, is the idea that we're comparing to some ideal of what should have happened, and the gerrymandering is to manipulate or change the outcome away from there.
So what people are usually asking when they talk about this is: what would have happened if there had been no political agendas, if no one had put their thumb on the scale to tilt it one way or the other in drawing the maps? All right. So what are the principles that we do use to draw the maps, if it's not proportionality, if it's not symmetry of outcomes? It's compact districts. We have this idea of local representation. Right? That's the debate between national and state governments, between state governments and local governments. We want the districts to be relatively compact for the same reason, that is to say, spatially, you know, not snakes but more like circles. And they should have roughly equal population, sometimes exactly equal population, depending on which types of district you're talking about. That's this idea of one person, one vote that's enshrined in our constitution and our jurisprudence. And there are other important things we have to talk about. We have to talk about the Voting Rights Act and the preservation of groups of interest; that could be, at the simplest version, county preservation or municipality preservation. Or maybe some people like to think we should protect incumbents or the cores of the districts. So these are all things that we could talk about. But these are really the principles we use to draw districts.
Another thing that I have to mention is that people often want to talk about redistricting as being about strange districts. So these are the actual districts used in the 2012 congressional elections in North Carolina. They are very strange-looking beasts. And while these make really nice posters and coffee cups and T-shirts, the truth is that stopping strange geometries will not stop gerrymandering. These two maps look completely different, but their outcome is politically equivalent. They're very strange. All right.
And there's this map, which is the one we actually used in 2020, the one that was changed after a court lawsuit. It looks very similar to this one to the eye, but I can tell you that politically it's extremely different. All right. So what's the idea that I want to talk to you about? The idea is that I want to compare maps. I want to somehow use a comparison to unbiased maps as a benchmark, as a null hypothesis. So here are the four maps I'm going to talk about. Here's the one we used in 2012 in North Carolina. Here's the one we used in 2016, after the first was thrown out as a racial gerrymander. Then this map was thrown out as well, and in 2020 we used this map, the remedial plan; remedial means the court's remedial plan. And then a bipartisan group of retired judges made a map. They were very political about it, but they tried to be nonpartisan, or bipartisan rather.
So if we wanted to talk about which of these maps we like better, we might compare the four to each other, but it might be nice to compare them to a whole bunch of other maps, and maybe even more maps. So an entire collection or, if you will, an ensemble of maps, as my group has come to call it. An ensemble of maps: that's what we're going to compare to. And we're going to use this as some kind of normative standard against which to compare these maps. All right. So let me show you how this works. Here's the 2016 map I showed you. We take a particular set of votes and run this election. So here's this set of votes, which I'm going to keep constant throughout this whole discussion. It's actually the one that elected those three Democrats and those 10 Republicans. And so I'm going to put a little bar here on this bar graph, of height one, saying there was one map that elected three Democrats. And now here's a different map, and it elects eight Democrats.
And here's a different map that elects six. You see, the votes are staying constant, but I'm building up, little by little, what we call a histogram in mathematics. And now I'm going to let this go and keep doing this. You see the numbers are getting bigger and bigger as I get more and more maps. I'm going to keep doing this for how long? Until it stops changing, until this histogram has converged. Okay, now it seems to have converged. So let's take a look at this. It seems to be saying that somewhere around five or six Democrats is pretty typical. I can tell you that this election had over 50% Democratic votes, just barely, 50 point something, but still you don't expect to see over 50% of the seats go to the Democrats. That's just a fact of the structure of our election system. If you don't like it, you should have a different conversation, but that's not gerrymandering.
So here we go. Let's put on some maps. Here's the 2016 map we used in our elections in North Carolina. Here's the map that the judges put in after this map was declared unconstitutional. It's always in yellow in my pictures; sometimes it's called the remedial or the remedy map. And now here are all those maps we talked about: the judges' map, the map from 2016, the map from 2012. What's different across these two plots is that I've changed the set of votes. So you can use different votes, which have different spatial patterns, and see what the outcome is. And now you see that in every one of these cases, the maps we used in our elections were complete outliers. They're way outside the central place where all the mass of this histogram is concentrated. And the judges' map and the remedial map are not so bad. They're pretty typical, you would say. They're not outliers. They don't lie outside of this central region. All right.
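As an illustration of the histogram construction described above, here is a minimal Python sketch. It is not the team's code: the precinct vote counts and the three-map "ensemble" are made-up toy data, and `seats_won` and `seat_histogram` are hypothetical helper names. The point it shows is the one made in the talk: the votes stay fixed, only the map changes, and each map adds one count to the bar for the number of seats it produces.

```python
from collections import Counter

def seats_won(assignment, dem_votes, rep_votes):
    """Count districts where Democrats out-poll Republicans under one map.

    assignment[i] is the district label of precinct i.
    """
    dem = Counter()
    rep = Counter()
    for precinct, district in enumerate(assignment):
        dem[district] += dem_votes[precinct]
        rep[district] += rep_votes[precinct]
    return sum(1 for d in dem if dem[d] > rep[d])

def seat_histogram(ensemble, dem_votes, rep_votes):
    """Map each seat count to the number of maps in the ensemble producing it."""
    return Counter(seats_won(m, dem_votes, rep_votes) for m in ensemble)

# Toy data (hypothetical): 6 precincts, 3 districts of 2 precincts each.
dem_votes = [60, 55, 40, 45, 70, 35]
rep_votes = [40, 45, 60, 55, 30, 65]
ensemble = [
    [0, 0, 1, 1, 2, 2],
    [0, 1, 1, 2, 2, 0],
    [0, 1, 2, 0, 1, 2],
]

# Same votes, three different maps, one histogram of outcomes.
hist = seat_histogram(ensemble, dem_votes, rep_votes)
print(dict(hist))
```

In a real analysis the ensemble would contain many thousands of sampled maps, and an enacted plan is then located on the resulting histogram to see whether it sits in the bulk or in the tail.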
So that's kind of the most basic analysis. But I want to convince you that just counting seats is not the right thing to do either. You need to be a little more nuanced. What I'm in the business of doing is not just convincing someone that something is wrong. I want to give them a microscope with which they can look at a map, understand its properties, and make their own conclusion. So here we've taken a whole bunch of different elections in North Carolina: secretary of state, U.S. House, governor, U.S. Senate, president, president, House, governor. You see they're organized top to bottom from most Republican to most Democratic, based on the statewide vote fraction. And you see an interesting thing. The remedy map stays in the middle of the blue histograms, like you might expect, so it's very central, and this map made by the nonpartisan judges does a pretty good job too. But the map we used in 2016, and the 2012 map is basically identical, barely ever changes its opinion. It barely ever moves.
So let's watch that in a movie, which I think is very convincing. I have that blue histogram here, and if you watch this counter right here, the statewide vote fraction, I'm going to let that tick up. Of course, as more people vote Democratic in these synthetic votes that I'm making, this blue histogram is going to move to the right, and you're going to see this yellow line move with it. But just like here, this purple line won't move. So, first of all, notice, sorry, I got ahead of myself there, and I can't stop this picture, unfortunately. Right around 50% of the vote, you'll see that the histogram is nowhere near 50% of the seats. So it's not about proportionality. But the remedial map does a pretty good job. Right? So what we're doing is using computational mathematics to figure out what we expect to have happen.
And then we can compare that with what happens under a particular map. All right. And so eventually the map does act typical, but not until the vote is around 57% Democratic. And you would have seen, if you watched this again, that around 54% is when it starts to catch up with the blue line.
So let's go into a different chamber in North Carolina. This is for a different court case. I actually showed this video in court, and I can tell you everyone was transfixed. So here we are in the state House, the House of the legislature, and it has 120 seats. And we're going to watch again as this statewide vote fraction ticks its way up. This was using presidential votes. It's actually usually better to use statewide votes instead of particular congressional elections. So we're going to watch, and the blue histogram is going to slowly move to the right as this vote count ticks up. What I want you to watch is the overturned map, the one that was originally used and that led to the state court case, and the final map, the one that was put in place. Let's watch that. You're going to see that this purple one doesn't move very much, or lags behind, little by little. Right now it's only 57% Democratic, and now it's about to get over. Of course, I could just flip this around and do Republican; I do it both ways sometimes. The final map does a pretty good job of staying with the blue histogram. But you'll notice that the overturned map lags significantly behind, and not only that, it seems to lag behind at important moments: as we approach this line, which represents a shift in the supermajority, and as we approach this line, which represents the change of the majority, and the same here. All right. So this main histogram moves across the 50% line much sooner than this purple line did. Let's look at this one more time.
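The movies described above track how seat counts respond as the statewide vote fraction is ticked upward. The talk does not say exactly how the synthetic votes are generated, so the Python sketch below assumes the simplest possibility, a uniform swing that adds the same amount to every district's Democratic vote fraction. The two plans, their district fractions, and the function names are all hypothetical.

```python
def seats_under_swing(district_dem_fractions, swing):
    """Democratic seats after adding the same swing to every district (assumed model)."""
    return sum(1 for f in district_dem_fractions if f + swing > 0.5)

def response_curve(district_dem_fractions, swings):
    """Seat count at each swing value, i.e. one frame of the movie per entry."""
    return [seats_under_swing(district_dem_fractions, s) for s in swings]

# Hypothetical 8-district plans with Democratic vote fractions per district.
responsive = [0.35, 0.42, 0.47, 0.49, 0.51, 0.53, 0.58, 0.65]
# A "firewall" plan: a large gap between the safe seats on either side.
firewall = [0.38, 0.40, 0.41, 0.42, 0.43, 0.44, 0.75, 0.77]

swings = [0.00, 0.02, 0.04, 0.05]
print(response_curve(responsive, swings))
print(response_curve(firewall, swings))
```

Under these toy numbers the first plan gains seats as opinion shifts while the second does not move at all over the same range, which is the lagging behavior of the purple line in the movie.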
So now we're going to have the same video; I'm going to play it again for you. But I have also lined up here a whole bunch of historic elections, and you see the same effect. When we're down here around 50% or below 50%, like 48%, the map's not so atypical. The purple dot lines up very much with the histogram. But as we move upward, you see it lags dramatically behind, and that shows that as the majority is in danger of switching to the other party, this map under-elects Democrats. Let's watch it one more time. So here it goes. And now it's also going to happen here in this other plot. This is a box plot, not a histogram. It is going to shift up, and every time a dot crosses this dotted line, a seat shifts under the purple map. Under the typical map, it's every time a box of this box-and-whisker plot crosses the line. And what you'll see is that there's a whole bunch of districts where this purple dot lags dramatically behind the center of these boxes. That will become important in a second. All right, so let's just watch this thing. You see how much the purple line is lagging behind the blue histogram. All right.
One thing I want to emphasize throughout this lecture is how much this talk is equally about how you describe things to a court and to the public so they better understand the mathematics. A lot of trial and error and discussion went into building these. So what's happening here is something we also saw in Wisconsin. This is an analysis we did of Wisconsin for an amicus brief to the Supreme Court. We see that when the elections are typical, around 50%, the outcomes are very typical. These red dots, which are the map used in Wisconsin at the time, line up pretty much exactly with the fat part of this histogram. However, as the Republicans drop below 50% and are in danger of losing the seats, all of a sudden there's a dramatic underrepresentation in the number of Democrats being elected.
Very dramatic in some cases, and we like to call this a firewall. It prevents the chamber from switching control. All right. And this is the 50% line. It's just under 50; I can't remember right now how many seats there are, but that's where the Republican majority is on this side and the Democratic majority on this side. So what I've tried to tell you is that you shouldn't just listen to analyses that talk about who won how many seats. You have to ask how they won them, and over what types of elections the results are reported. That's really important.
So I'm going to bring you now to the kind of plot that is behind those movies, which was presented in Rucho, Common Cause v. Rucho, the case that ended up going to the U.S. Supreme Court. I can talk a little bit about the court history in a second. What I've done here is take that judges' map, the one we saw was pretty typical, for two different sets of votes, votes from 2016 and votes from 2012, and I've taken the most Republican district and put it on the far left. So there are about 35% Democratic votes in the most Republican district. Then I put the second most Republican next, which is about 40%, which is what you read off when you read across to this axis, and the most Democratic is 65%. And you may ask yourself, is it normal that there's a district that has only 35% Democrats and a district that has over 65% Democrats? Well, that's what this collection of maps that we generate is for. I should also point out that crossing this dotted line is when you switch from Republican to Democratic seats. So all these dots below the line represent Republican seats, and all these dots above the line represent Democratic seats. Now, what I've done is take my collection of maps, about 24,000 of them, and draw where the most Republican district typically fell. It fell right inside this box plot.
So inside this little box here are 50% of the maps, and here it's about 50% of the maps. So actually, yes, typically you see one district around 65%. It's like a bunch of balloons in a box: you move them around, and you can't help but have one district that typically has 65% and one district that is, maybe a shade less strongly, Republican, at just over 31% Democratic. So I'm just laying that out for you. Now let's put on those two maps that I showed you, North Carolina 2012 and North Carolina 2016. What you see is that they're pretty normal down here. But as soon as they get close to the districts that would switch from one side of this dotted line to the other, all of a sudden there are many, many more Democrats in these districts than there should be, and many, many fewer Democrats in these districts. And it really doesn't matter which vote you look at. This is what people call cracking and packing. Democrats have been packed into these districts, so their votes matter less, and they've been cracked out of these districts, removed from them, so that there's less of a chance that they would swing above this line. They've depressed the number of Democrats in the swing districts. And you can see it very easily using mathematics. Right?
And I want to emphasize that this analysis takes in the fact that North Carolina is a strange shape, that North Carolina has big cities and small cities and rural areas, rural areas that vote historically Democratic and rural areas that vote historically Republican. That's all taken into account when we make these boxes. It's all built in. All right. And what you really see here, which is interesting, is that the reason these dots barely move when you switch elections is this huge jump here. This huge jump between these dots essentially means that there's going to be a huge range of public opinion where nothing changes.
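The ordered box plot described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the five-map ensemble and the "enacted" plan are invented numbers, `rank_boxes` and `outside_boxes` are hypothetical names, and the box is taken to be the middle 50% of the ensemble at each rank, as in the talk. For every map the districts are sorted from most Republican to most Democratic, and the quartiles are computed rank by rank.

```python
from statistics import quantiles

def rank_boxes(ensemble_fractions):
    """For each rank (most Republican first), return (q1, median, q3) over the ensemble."""
    ranked = [sorted(plan) for plan in ensemble_fractions]
    boxes = []
    for values_at_rank in zip(*ranked):
        q1, med, q3 = quantiles(values_at_rank, n=4, method="inclusive")
        boxes.append((q1, med, q3))
    return boxes

def outside_boxes(plan_fractions, boxes):
    """Ranks at which a plan's ordered district falls outside the middle 50%."""
    ordered = sorted(plan_fractions)
    return [rank for rank, (f, (q1, _, q3)) in enumerate(zip(ordered, boxes))
            if f < q1 or f > q3]

# Hypothetical ensemble: Democratic vote fraction per district, 4 districts per map.
ensemble = [
    [0.36, 0.46, 0.54, 0.64],
    [0.35, 0.47, 0.53, 0.65],
    [0.37, 0.45, 0.55, 0.63],
    [0.34, 0.48, 0.52, 0.66],
    [0.36, 0.46, 0.54, 0.64],
]
# Hypothetical enacted plan: swing districts depleted, one district packed.
enacted = [0.355, 0.41, 0.44, 0.79]

boxes = rank_boxes(ensemble)
flagged = outside_boxes(enacted, boxes)
print(flagged)
```

With these made-up numbers the most Republican district looks typical, while the middle districts fall below their boxes and the top district sits above its box, the cracking and packing signature described in the talk.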
And so these are unresponsive elections. What I really think we should be talking about is whether elections are responsive or not. So let me give you another example; let me make up some fictitious districts. Sometimes people like to talk about competitiveness. Competitiveness means all the districts are as close as they can possibly be to 50-50. And that's great: right here, they're all right next to 50-50, and we have this huge gap between them and the other districts. But what if public opinion shifts a little bit? That means that this whole set of districts would shift down a little bit, and now all of a sudden none of these districts are competitive. They're all swinging hard to one party. So it's not really clear to me that we should be worrying about competitive districts. I think what we should be worrying about is responsive districting plans. So here's another set of districts, and notice that when I change opinion, one of these dots crosses over, so somebody loses a seat. What we really would like, I think, is to have districting plans that are responsive. When public opinion changes, the elections have consequences. Responsiveness of elections is democracy. That's the core principle in our democracy. And if you draw these maps in a nonpartisan way and look at the properties of this ensemble of maps, drawn in the way I'll describe in a minute, you'll typically have districting plans that are very responsive in the way I've just described.
All right. Let me give you another example. We've done an analysis of Maryland. There are lots of things I could talk about in Maryland, but the thing I'm going to talk about right now is this. Maryland is a quite Democratic state, so it typically has a 7-1 split in its congressional delegation. But what you see has happened here is that they've done the same kind of thing.
There's almost an agreement between the parties that 7-1 is going to be the answer for Maryland. They've made the most Republican district much more Republican than it typically would be, and the next district much more Democratic than it typically would be. So there are elections, like this one, where the Democrats would typically have won all eight seats, and there are elections, like this one, where the Democrats would actually probably have lost the seventh seat. So here the Democrats would have won all of them; here the Democrats would have lost one. But by designing the districts the way they did, they basically locked in a 7-1 split. So everyone in this delegation is happy; everyone gets to keep their job. Right? This is incumbency protection. And you can ask, well, is that big gap very typical? We can ask over how big a range it stays a 7-1 split. And you can look over all the maps we drew, and only a tiny fraction of them have such a big gap. So this really is an incredibly unusual map, in that it protects the 7-1 split rather strongly. This might be called an incumbent gerrymander. Some people try to use words like that; I can barely say it.
All right, so let me take a little breath for a second. I tend to talk fast sometimes. It's hard when you have no audience in front of you, just me and my books and my plan. It's going to be so nice to see everyone at conferences again. It's sad.
So what I want to do now is probe a little the structure of the results, to help explain what's going on. I really don't want to just say good, bad, outlier, not outlier. I want to talk about the structure of the results. That's the most important thing. I'm not a big fan of indices for talking about gerrymandering. I think what you should do is find visualizations that explain what's going on.
So the first thing I want to say, one more time, because people are always talking about proportionality in the media, is that it's not proportional representation. If it were proportionality, the centers of my blue boxes would follow this black line. They clearly do not; they're not on top of this black line. So it's not proportionality, and you can't get upset about that. Our system is designed to favor spatially distributed parties. However, you can still separate out the effect. Here is an analysis of Wisconsin, and we can see that the chamber would split when the Republicans had about 49% of the vote, in this one actually only 46% of the vote, and maybe 47% there. But the map that the legislature drew kept the Republican majority way down here, at a much lower percentage. So it still was a gerrymander, even after you've taken into account this natural structural advantage for a party, like the Wisconsin Republican Party, which is in the more rural areas. All right. You can also do an analysis where you ask where exactly the people are who are put in strange districts. So here's one in North Carolina. I'm not going to go into that because I have a lot to talk about, but I just want to say we do geospatial analysis too, which can be quite interesting, and I think that's still a developing area.
Our team has been lucky enough to be involved very early on, and if I have a little moment at the end, you can ask me in the Q&A and I'll tell the story of how we got here. It was an interesting story, how we ended up doing this. But we've been involved in a number of court cases. The original one was Common Cause v. Rucho, which was about the U.S. congressional delegation, and that was a win. But then eventually the Supreme Court threw that out and said the federal government was not going to get involved in deciding about political gerrymandering.
They were going to throw it back to the states. Along the way, we also wrote an amicus brief. Actually, Eric Lander did, but we provided the analysis that his brief was built on. But then it went back to the state courts. Before that, I should also mention, we did some work in Covington, which was a racial gerrymandering case in North Carolina, about whether the high levels of African Americans in certain districts were typical or not. And then we were involved in a series of cases, Common Cause v. Lewis and Harper v. Lewis, which led to every single map, the state legislature maps for the state House and Senate and the congressional delegation maps, being thrown out in 2020. And we had brand new maps for all of our elections. All right. And there are lots of other people now; this is a growing area. There's a group at Tufts, and there's also the group at Carnegie Mellon, Alan Frieze and Wes Pegden, and they had some really important landmark cases in Pennsylvania, which I'll maybe talk a little bit more about in a moment. And then there's also some analysis in Gill that various people were involved in. There's a group of mathematicians who put a brief in the Supreme Court with this kind of analysis, headed by the group at Tufts.
So now let me talk a little bit about the building of an ensemble, the building of this collection of maps. A lot of this is really machinery that comes from modern Bayesian statistics. You always start with something simple, which is what's called the single node flip or, for the physicists in the room, the Ising move. You pick, appropriately, an edge that goes between two different colors. You pick one of its ends, and then you flip the color randomly across it. That's a simple move, and that's the way to evolve one boundary to another.
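The single node flip described above can be sketched as a proposal step in Python. This is a bare-bones illustration, not the team's implementation: the 2-by-3 grid of precincts, the district labels, and the function names are hypothetical, and the sketch deliberately omits the checks a real sampler needs (contiguity, population balance) as well as any acceptance step.

```python
import random

def boundary_edges(edges, assignment):
    """Edges whose two endpoints currently lie in different districts."""
    return [(u, v) for u, v in edges if assignment[u] != assignment[v]]

def propose_single_node_flip(edges, assignment, rng):
    """Pick a boundary edge, pick one endpoint, and move it to the other side's district.

    Assumption: no contiguity or population check is applied in this sketch.
    """
    u, v = rng.choice(boundary_edges(edges, assignment))
    if rng.random() < 0.5:
        u, v = v, u
    proposal = dict(assignment)  # leave the current map untouched
    proposal[u] = assignment[v]
    return proposal

# Hypothetical 2x3 grid of precincts:
#   0 - 1 - 2
#   |   |   |
#   3 - 4 - 5
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
assignment = {0: "A", 1: "A", 2: "B", 3: "A", 4: "B", 5: "B"}

proposal = propose_single_node_flip(edges, assignment, random.Random(0))
changed = [n for n in assignment if assignment[n] != proposal[n]]
print(changed)
```

Each call changes exactly one precinct on the boundary, which is why the move evolves one district boundary into a neighboring one a single step at a time.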
And that one we actually used in some of our early cases. In big states, you actually have to use simulated annealing. But in those important cases, in Gill, I mean in Common Cause v. Lewis, we were actually able to make it work with single node flip. So that was great. We used real, honest single node flip there, not simulated annealing. But then, about 2018, there was a good idea out of the Tufts group, which was to introduce a more global move, which is what one always wants. They called their move ReCom. The idea is that you have a graph, which is just the adjacency graph of which precincts are next to each other, and you break it up into little graphs based on the districts. Then you merge two districts together and draw a spanning tree, which is just a tree (a tree has no loops) that visits every precinct, in this case, once. And the thing about trees is that it's very easy to walk across one and keep track of how much population is in front of you and how much population is behind you. So if you walk across, you'll find that there are certain edges you can cut and keep that one person, one vote balance that we care so much about intact. And if we cut this orange one, we end up with two new districts, and that works very well. It's a nice way to speed up mixing and handle bigger states.
Now, the problem is that neither of these chains actually does what we want, and when I said we used single node flip, we didn't actually use just single node flip. The reason we don't want to use single node flip just by itself, or ReCom just by itself, is that what we really want are policy-driven distributions on the maps, defined independently of the method that's used to sample them. This helps focus the discussion policy-wise, it allows for verification by other sampling methods, and it also reduces the chance that the map generation method is introducing a bias.
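Returning to the tree-cutting step of the ReCom move described above, the following Python sketch shows one way to draw a spanning tree on a merged pair of districts and find the edges whose removal leaves two population-balanced pieces. It is an assumption-laden toy: the spanning tree is drawn by running Kruskal's algorithm on randomly shuffled edges, which gives a random but not uniformly random tree, and the graph, populations, tolerance, and function names are all hypothetical rather than taken from the Tufts group's software.

```python
import random

def random_spanning_tree(nodes, edges, rng):
    """Kruskal on shuffled edges: a random (not uniform) spanning tree. Assumed method."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    shuffled = list(edges)
    rng.shuffle(shuffled)
    for u, v in shuffled:
        ru, rv = find(u), find(v)
        if ru != rv:  # adding this edge creates no loop
            parent[ru] = rv
            tree.append((u, v))
    return tree

def balanced_cuts(nodes, tree, population, tolerance):
    """Tree edges whose removal leaves both sides within tolerance of half the population."""
    adj = {n: [] for n in nodes}
    for u, v in tree:
        adj[u].append(v)
        adj[v].append(u)
    root = nodes[0]
    parent = {root: None}
    order = []
    stack = [root]
    while stack:  # walk the tree, recording parents
        n = stack.pop()
        order.append(n)
        for m in adj[n]:
            if m not in parent:
                parent[m] = n
                stack.append(m)
    below = dict(population)  # population hanging below each node
    for n in reversed(order):
        if parent[n] is not None:
            below[parent[n]] += below[n]
    half = below[root] / 2
    return [(parent[n], n) for n in order
            if parent[n] is not None and abs(below[n] - half) <= tolerance * half]

# Hypothetical merged region: four precincts in a ring, 100 people each.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
population = {n: 100 for n in nodes}

tree = random_spanning_tree(nodes, edges, random.Random(0))
cuts = balanced_cuts(nodes, tree, population, tolerance=0.0)
print(tree, cuts)
```

On larger graphs a particular tree may have no balanced edge at all, in which case one would draw another tree; that retry logic is left out here.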
And there's some evidence now that if you use just some of these methods from before, you'll introduce biases in a way that wasn't made explicit. By specifying the measure, we make it very explicit what we care about, and we can have an explicit conversation. So I like to think that all of this has three steps. First, we talk about the law and the public policy, and I'll do that in a second, and we encode that in some distribution on the space of redistricting plans. Then we algorithmically sample that distribution using Markov chain Monte Carlo. Monte Carlo means you're sampling from a given distribution that you've decided on in advance; that's what Markov chain Monte Carlo really is about. And then, equally important, you have to engage with the result. You have to analyze it, you have to visualize it, you have to find the right way to communicate it. I can tell you that all those pictures I showed you come after many, many years of conversations with policy advocates and lawyers and reporters and others, and every time they say, ah, no one understands that, I have to explain it in a different way. It has pushed me to engage in a way that I have never done before. And I have come to feel really strongly that I can't just say, I explained it to you, why don't you understand it? It's my fault if they don't understand it. If they're not picking up on what I think is the most important point, I have to find a better way to talk.
All right. So let me tell you about building these distributions, these policy-driven distributions on maps. The first thing we do is create a score function that tells us how well a map encapsulates what we believe in, and the relative weights of these beliefs. We make it a sum of different score functions: one that measures population deviation, one that measures compactness, one that measures county compliance, and one that measures compliance with the Voting Rights Act.
And we purposely put them as a sum here so that, except for maybe geographical constraints, they're basically independent. I mean, they're not really independent, but their dependence is not something we're explicitly putting in; we're not correlating them with each other explicitly. So there are lots of different compactness scores; some I like better than others. You know, there are some issues about how your maps are drawn with this one, but this is probably my favorite nonetheless. It's the inverse of the isoperimetric constant, which is called the Polsby-Popper score because they didn't know about Queen Dido. It was discovered in the seventies by political scientists. Queen Dido, Queen of Carthage. Right. So, yeah, we can just measure the L2 norm, that is to say the squared deviation of the population from some ideal population, which is just the number of people in the state divided by the number of districts. And then we can also keep track of things like people ousted from their city, that is, people living outside the district that contains most of that city, and also ideas of community, I mean county preservation, or the Voting Rights Act. So these are all things that we put into this score function, and we balance it appropriately so that the maps look like historic districts or the kind of districts that the policy makers say that they want. And then we can ask the question afterwards: did they really follow their own prescription, or did they do something different? All right. So I told you that pure single node flip or pure ReCom don't do what we want. They don't do what we want because they can't sample from an arbitrary measure like this, which is adaptable to the policy considerations. All right.
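To make the "sum of score functions" idea concrete, here is a minimal Python sketch. It is not the team's actual score: the function names, the dictionary layout with `pop`, `area`, and `perimeter` keys, and the weights are hypothetical, only the population and compactness terms are included, and turning a score into a weight with `exp(-beta * score)` is an assumption (a common choice in sampling, which the talk itself does not spell out).

```python
import math


def polsby_popper(area, perimeter):
    """Polsby-Popper compactness: 4*pi*area / perimeter^2 (equals 1 for a disk)."""
    return 4 * math.pi * area / perimeter ** 2


def isoperimetric_score(area, perimeter):
    """Perimeter^2 / area: the inverse form up to 4*pi, so lower is more compact."""
    return perimeter ** 2 / area


def population_score(district_pops):
    """L2 norm of the relative deviations from the ideal district population."""
    ideal = sum(district_pops) / len(district_pops)
    return math.sqrt(sum((p / ideal - 1) ** 2 for p in district_pops))


def total_score(districts, weights):
    """Weighted sum of the individual scores (only two of the terms shown)."""
    j_pop = population_score([d["pop"] for d in districts])
    j_compact = sum(isoperimetric_score(d["area"], d["perimeter"]) for d in districts)
    return weights["pop"] * j_pop + weights["compact"] * j_compact


def unnormalized_weight(score, beta=1.0):
    """Assumed form of the target measure: lower score means higher weight."""
    return math.exp(-beta * score)
```

County-preservation and Voting Rights Act terms would enter the sum in the same way, each with its own weight, which is what makes the policy choices explicit and open to discussion.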
So what we want to do is use a really great idea that is really the engine behind modern Bayesian statistics in many ways: the Metropolis-Hastings algorithm. It goes back to Metropolis at Los Alamos and early nuclear simulations in the 50s. And it says the following: you can propose a move according to any algorithm you like, like single node flip or ReCom. And then you can accept that move or not, either make that move or not make that move, according to some special probability that Metropolis-Hastings and other similar algorithms define, and by doing so you then sample from this measure you were given, which is kind of amazing. And the mathematical core of this is that you balance the flux from one configuration to another. So here's one redistricting, A, and another redistricting, B. If you look carefully, right around here it's different. And you can calculate the probability of moving in one direction, that is to say, the probability of finding yourself in configuration A times the probability of moving to B. So that's the flux, kind of at equilibrium, from A to B. And you want it to be the same as the flux from B to A. All right. And if you do that, you're actually guaranteed that this measure here will be the stationary measure, the measure from which this Markov chain Monte Carlo system is sampling. All right. So that's an important math idea. And there'll be a few moments of math in this talk for sure, but this is a really important idea. All right. So there are a lot of algorithms, and, you know, we started in various places.
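The flux-balance condition can be checked numerically on a toy example. The sketch below is an illustration on a three-state chain, not anything from the redistricting code: the function names, the target weights, and the proposal matrix are all made up. It builds the Metropolis-Hastings transition matrix from a target `pi` and an arbitrary proposal `q`, using the standard acceptance probability min(1, pi(B) q(B,A) / (pi(A) q(A,B))), so that the flux from A to B equals the flux from B to A.

```python
def mh_acceptance(pi_a, pi_b, q_ab, q_ba):
    """Standard Metropolis-Hastings acceptance probability for a move A -> B."""
    return min(1.0, (pi_b * q_ba) / (pi_a * q_ab))


def mh_transition_matrix(pi, q):
    """Transition matrix of the chain 'propose with q, then accept or reject'.

    pi: target weights (may be unnormalized); q[a][b]: proposal probability.
    """
    n = len(pi)
    p = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b and q[a][b] > 0:
                p[a][b] = q[a][b] * mh_acceptance(pi[a], pi[b], q[a][b], q[b][a])
        # Rejected proposals (and any self-proposals) leave the chain at a.
        p[a][a] = 1.0 - sum(p[a])
    return p


def flux(pi, p, a, b):
    """Probability of being at a times probability of then moving to b."""
    return pi[a] * p[a][b]
```

Note that the acceptance rule needs the backward proposal probability q(B,A) as well as the forward one, which is why being able to compute the backward probability matters so much for a ReCom-style proposal.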
So, you know, we didn't really know about this work when we first got started, but Jowei Chen and Jonathan Rodden were doing some work where they were creating ensembles of maps and comparing them to the maps in important cases and also in their political science work; they're two political scientists at Stanford and Michigan. And the thing about this algorithm, although it was very nice, is that you really don't know what measure it samples from. It's unknown. And also Wendy Cho did some nice genetic algorithm work, but again, you don't know what measure you're sampling from, so you don't know what the biases are. Then there's been some other work. And I should also say that, at the very beginning, to do big, big graphs, we had to do simulated annealing, which is somehow on the border here, because you know a lot about what measure you're sampling from, you know what the valleys look like, but you don't know how they're relatively weighted as well as you wish you did. And then around the same time we did our work, which is a little bit after Chen and Rodden, there was also some work by Frieze, which is right here, around the same time, and also Kosuke Imai and Ben Fifield and their collaborators at Princeton. And they actually introduced some nice ideas, which we tried also, around parallel tempering and simulated tempering and using Wolff moves, as you do in computational statistical mechanics. But those don't seem to work nearly as well in many ways as ReCom, which does a really nice job of large-scale mixing; it's what you might call a global move, in a nice way. It's nice to mix it with single node flip because that does the local exploration, but it's really nice. But the problem was we couldn't sample from any distribution we wanted to. So then our group has taken that idea and introduced a way to make it reversible, or to make it so you can use Metropolis-Hastings.
And we've actually also introduced a multiscale version of it, which I'll talk a little bit about. All right. But there are lots of other interesting ideas, like sequential Monte Carlo, again from Kosuke Imai's group. So there are a lot of different ideas floating around here and a lot of work to be done. All right. So let me talk a moment about ReCom. Somehow the slide up here got deleted. Oh no, something got deleted. Okay, I apologize. Oh, I know, I simplified my slides. So let me tell you what ReCom does. What ReCom does is, ReCom says, all right, I'm going to merge two... actually, I already mentioned it earlier. So I got myself confused there for a second. That's right, it was earlier. So what our Forest ReCom algorithm does instead is it replaces partitions of graphs with collections of spanning trees. Now that sounds like a trivial thing, but it's important. And the reason that's important is the following: it's very easy to calculate the forward probability in ReCom. The hard thing is to calculate the backward probability. And by switching to spanning trees instead of districts, the short version is that it lets you calculate this backward probability, which is key to doing the Metropolis-Hastings scheme and sampling from an arbitrary measure that's driven by the policy considerations. All right. And so here's a very mathy slide, which I left in just because I couldn't help it. So, for the mathematicians in the audience, the reason this works is that when you go from partitions to partitions, there's a last step where you move from spanning trees to partitions. The problem is that there are many, many spanning trees on a partition. So this is a many-to-one map, and it's very expensive to invert that.
However, if you just keep around the spanning trees, then you don't need to invert it, and it's not very expensive to calculate the backward probability, that is to say, to go backwards in the reverse direction. So that's what that's about. All right. So let me just pause for a second. So the idea of these things is we have these spanning trees: we merge two districts together, we draw a new spanning tree, and we use that to cut it in two. That's the idea of ReCom. The idea of Forest ReCom is that we instead keep these spanning trees around, and that lets us calculate the backward probability. There are still issues with keeping counties whole, keeping communities of interest whole, and actually just scale and efficiency. Like, we might want to go all the way down to census blocks, although we really don't need to because we can do some other analysis, but there are some reasons why practitioners might like to have much finer maps than we have. So we came up with an idea which would be a natural idea for many people, I would say in this room, but in this virtual room, which is to use a multiscale version of it. So here we have North Carolina at the level of precincts. So there are like 3,000 precincts; it's an area of North Carolina. And if you put census blocks in, we have 80,000. Okay. And so instead, what we're going to do is recognize that the districting is actually often made up of whole counties and then some fine-scale modifications like this. So instead of keeping all this detail, we're going to keep a multiscale graph representation where we keep the graph at different scales as needed. And we found a way to then sample from that in an appropriate way. So what we do is we have this multiscale representation, and we take its backbone out at its highest level.
We find a spanning tree after merging this dark grey and this dark green graph together. So we find a spanning tree at the coarse scale, and we cut at this red edge to create two graphs. And then we resolve the finer scales and again put in a spanning tree and cut again, and cut again, and so on and so on. And so we're currently running this at three levels. Okay. And so here's a little movie where we're going to watch this run. And so this goes all the way down to the finest scale, census blocks. And so this is actually running on North Carolina. It doesn't take very long to run, and it does a really nice job of resolving down to fine scales. This both allows us to do, in a very natural way, county preservation and municipal preservation, which is important; it's constitutionally required in North Carolina, for instance, in many cases, and certainly preferred. And it also gives us a much more compressed representation, and compression is always good for computation. Right? All right. So this is running all the way down to the census block level in North Carolina. Now, there are other ideas we've explored, so now I'm going to kind of broaden our conversation. But that's actually what we're planning to use in this cycle. Right? On August 16 the census is released, the 2020 census, after much delay. And so all of us who do this are gearing up to do an analysis of that. So this is an incredibly important moment. We've also looked at some other ideas. So we've looked at ideas of non-reversible Metropolis-Hastings, where we put in some kind of circulating flow, which is like stirring your coffee, which tends to mix the districts around. It has some nice effects. It's still a work in progress, but it's interesting science to try to think about that. So you might only allow transitions like this one that's being shown here if they line up with the flow.
And then if you reject too many times, you say, okay, now I'm going to look at ones that go in the other direction of the flow. And so then you switch the flow to the other direction. You can put all different flows on your space, you can imagine. So we have an interesting paper using non-reversible dynamics to try to look at mixing here. And I should say this really is an evolving subject with lots of work to be done. So let me try to give a really incomplete tour of other work. So here, Maria, Alan, and Wes at Carnegie Mellon introduced this really nice idea. It has a lot of advantages: it doesn't need mixing, and it actually has a theorem backing it up. And it is used to compare a map. Now, it can't tell you what you would typically expect, but it can tell you whether a given map is unusual. And so this was used with great success in the state gerrymandering case in Pennsylvania. And it was also used in the North Carolina cases, the later ones, where Wes testified, and Wes and I joined teams and we did kind of a generalization that's adapted to this case where you have multiscale structures. There it goes; it was used in the landmark Pennsylvania case. Here we have some really nice work of Imai and some computational combinatorialists. And this is kind of setting up the idea of having test cases to use. And I can tell you that in the North Carolina case we actually used their algorithm to completely enumerate part of the state. So we created a complete enumeration. We counted every districting that was possible, and we compared our methods to that enumeration to show that we are doing a good job. And so it's really useful to have these benchmarks, especially in a courtroom, to talk about it. You know, it's not very often you have computational combinatorics enumeration showing up in a court case, but that's what we did in our court case. There are lots of other interesting cases.
So here's sequential Monte Carlo, using kind of particle filter methods. There's some topological data analysis out of Ohio State and the Tufts group. There's some new work thinking about hierarchical partitioning algorithms, and then some work trying to analyze the shape space here using optimal transport ideas. There's some really nice work by people at Tufts, again, thinking about how we might want to improve how we think about the Voting Rights Act and how we would do what's called ecological inference or ecological regression, how better to think about that. There's also some work by that same group thinking about how differential privacy might or might not affect the census. There are lots of interesting opportunities once you start doing this. You start finding things; like in North Carolina, there's a law that said there was an algorithm, essentially a greedy algorithm, for how you have to divide the state up into clumps around its counties. The problem was there was this court case, but the state legislature had no idea how to do this. It wasn't a simple problem, but it wasn't hard. So along with some high school students in North Carolina, we wrote a paper and wrote an open source algorithm, and we released it to the state. It actually never was passed, but there were some bills where they actually said, you know, use this algorithm to figure out this thing you need to figure out for our public policy. And so one thing we're going to do when the new census comes out is tell the state what county clusterings, as they're called, should be used in North Carolina. There are lots of other ideas. So one thing I didn't really talk about is the two things mathematicians do the minute they get to redistricting. They want to either talk about the geometry of it, and they'll talk about compactness. It's really not that important. It plays a role, but it's not central.
And the other thing they want to do is talk about how to draw good maps. I mean, that's a really interesting question, but I think politicians should always be involved. There are always externalities that you can't put in an algorithm. But there is some nice work, and it's useful. I think in North Carolina, in our recent redistricting cycle, we restarted after a court case from a randomly drawn map, and then the state had to carefully change what was going to happen and what wouldn't happen. And that process started from a random map. And, you know, maybe a good random map is a good way to start a redistricting process. Other people have looked at kind of the computational, empirical effectiveness of sampling and some ideas about different indices. All right, so at the very end, I'm almost at the end, on my second-to-last slide, I want to maybe tell the story of how I got here a little bit. It's kind of interesting, and maybe the most striking way to tell it, since this room might have a few people who teach, is this: these are all the collaborators on this project, and there are a lot of them. And it really started with Christy Vaughn Graves, her name changes halfway through. This started as her undergraduate thesis, and it kind of grew out of some summer work starting in 2013, but really picking up in 2014, and we released our first paper, called "Redistricting and the Will of the People," in 2014. And after that, there was a whole bunch; everyone here with "UG" next to the name is an undergrad, so these are undergrads and master's students. And so we did this work. And to be honest, in the beginning very few people cared. Our state NPR didn't really care about it that much. A friend of mine who was at NPR at the time did a public interest story because they thought it was cool that Christy was involved as an undergrad, and she had started working as a high school student.
And by the way, she just finished her PhD at Princeton in computational math, and now she's a data scientist out there in the world. But people really didn't care. They were thinking a lot about the efficiency gap at the time, and there weren't really any other mathematicians that I knew of. I didn't know that Alan Frieze around the same time was thinking about it. And Kosuke is a statistician; I didn't know that he was thinking about it. So we kind of came up with these ideas by ourselves, but no one seemed to really care that much. And then there was an important moment where there was a redistricting simulation in North Carolina with this set of judges, and they were brought together, and a few lawyers came to a talk I gave and they got interested, and that was just around 2016, 2017. And so we did another analysis, and, you know, next thing I knew, they asked me could I give that talk that I just gave in court, and that was really fascinating. And it's really been fascinating over time to watch this community grow. And now there's a whole bunch of different centers in different places where people are thinking about it, and I encourage everyone, if they're at all interested, to become involved and look into this. I think it's a great opportunity for undergraduate projects, but also just generally, it's a really great thing. I should thank my department for its support of my undergraduates in these projects, and the data science center and Duke's provost office and the Center for Politics, POLIS, at Duke and the Information Initiative. They've all been very, very supportive of this work. So let me just summarize, and as I do, if you're bored of me talking, you can go look at our website, which has lots of short little snippets about lots of the things I talked about and lots of the papers.
And, you know, one thing that's interesting in this subject: you should go read the expert reports, because often things arrive in the expert reports three or four years before they end up in a paper anywhere, sometimes because it's hard to get them published. But here you can watch this movie again that I talked about. And let me just summarize my last little bit by saying, you know, quantitative analysis is really poised to have a major effect on this redistricting cycle, and you should really pay attention. You know, more than anything, more than voter suppression, more than stuffing ballot boxes, whichever you believe in, who draws the maps has the biggest effect on how your vote is translated into who wins the seats at the election. So it's really an important moment, and every citizen should be paying attention to their state's redistricting. And quantitative analysis really has a chance to play a major role for the first time, in a way that it hasn't before. Secondly, I just want to say really loudly one last time: it's not about proportionality. So when you read that someone says, "Well, 52% voted for this party and they only got this many seats," that's wrong. You should take that with a grain of salt. That's kind of how I got started. Somebody said that, and it turned on the mathematician in my brain. I said, wait a second. I agree that maybe they should have gotten more seats, but 50%? That seems unlikely given the geographical structure. Which brings me to this point, which is that the geography really matters. This is all about finding the null hypothesis, revealing computationally the information about how the geopolitical landscape interacts with the redistricting criteria. We have decided, through policy and legal means, what our redistricting criteria are. And the question is how they interact with the geography and where people live in our state. And lastly, I think this is really a nonpartisan question.
This is a question about whether elections should lead to different results as the electorate's opinion changes. Should elections be responsive? And I guess I come down quite strongly on the side that elections should have consequences. And when people vote and change their votes, different people should win. And what's been happening on both sides of the aisle for some time is that people have been drawing maps that don't do that. And I think we need to use mathematics and computational tools and quantitative analysis to show when that's happening and call people out on it. So lastly, I'll just point out this website if you want to read more about it, and thank you so very much for your attention. Thanks a lot. Thank you very much, Jonathan. So on behalf of everyone who's been watching, I want to thank you very much for this very interesting presentation. So we have a little bit of time, if you're willing to answer a couple of questions? Of course. Yeah, I understand there's a room afterwards that I'm going to go to to answer more. First of all, by the way, there's a comment, in case people aren't reading the chat. The comment is that the "Fairmandering" paper by Gurnee and Shmoys won the SIAM ACDA21 best paper award, and it's going to be presented at 4:30 this afternoon. So that's a plug for people to go to the award talk this afternoon. Okay. And people are putting in chat that it was a wonderful talk. But here's a question. Someone asked: do you have any insight on the mixing of the ensemble-generating chain, maybe depending on how strongly the compactness, etcetera, are enforced? Yes, we do. So one thing we do is we use kind of classical Gelman-Rubin type statistics. We have a random initial generating technique, and we start a number of chains, sometimes five, sometimes 30 different chains, and we run them in parallel.
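For readers unfamiliar with Gelman-Rubin type statistics, here is a minimal Python sketch of the classical version of the diagnostic for one scalar summary tracked along several parallel chains. It is a textbook form, not the team's code; the function name and the idea of feeding it something like a per-map seat count are assumptions made for illustration.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Classical potential scale reduction factor (R-hat) for one scalar statistic.

    chains: list of equal-length lists, one per independently started chain,
    each holding the tracked statistic at successive steps.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean(variance(c) for c in chains)   # W: average within-chain variance
    between = n * variance(chain_means)          # B: variance of the chain means
    pooled = (n - 1) / n * within + between / n  # pooled variance estimate
    return (pooled / within) ** 0.5
```

Values close to 1 suggest the chains agree on that statistic; values well above 1 suggest that at least one chain is still exploring a different region of the space of maps.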
We run them and we keep track of marginal statistics of importance, I can tell you more about which ones, and when those have converged, then we're pretty happy. We're pretty satisfied that we're getting a fairly representative ensemble, that we're getting good mixing. You know, there are no silver bullets. Single node flip has some issues, so we're glad that we now have the ReCom proposal to use in our Metropolis-Hastings algorithm. And the multiscale one helps a lot with that question; it really helps provide better mixing. But we have pretty good empirical results, and on the scales where we can do enumeration, we also see that it does a good job. Okay. There's a question from Broderick Craig: how did you calculate the responsiveness of the enacted plans? Well, there are lots of answers to that question. I mean, I showed you the answer. I showed you a picture, right? I showed you what happens as you swing, as you change the vote. You can do it with actual different elections; that's one way to do it. Another way is to take a single election and shift it using something called uniform swing analysis, which is not perfect, but it gives a reasonable set of votes at moderate swings. And you shift the statewide vote fraction and you see over what range the election outcome does not change. Right. What I'm really doing is looking back at plots like, just touch it and it goes away, plots like this, no, plots like that. And I'm saying, look at this huge jump here, right? Okay, I can't get my pointer back. Ah, there we go. The pointer is still not coming back, whatever. So if you look at these ones that are way depressed, you look at the jump between the fourth and the third seat; that huge jump is the non-responsiveness. Right?
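A uniform swing analysis of the kind mentioned in this answer can be sketched as follows. This is a simplified illustration, not the team's procedure: it assumes two-party vote counts per district, shifts every district's vote share by the same amount, and ignores clipping of shares at 0 and 1. The function names are hypothetical.

```python
def seats_under_swing(districts, target_share):
    """Seats won by party D after a uniform swing to a target statewide share.

    districts: list of (d_votes, r_votes) pairs from a single election.
    Every district's D share is shifted by the same amount so that the
    turnout-weighted statewide D share equals target_share.
    """
    total_votes = sum(d + r for d, r in districts)
    statewide = sum(d for d, _ in districts) / total_votes
    delta = target_share - statewide
    return sum(1 for d, r in districts if d / (d + r) + delta > 0.5)


def seat_curve(districts, shares):
    """Seats as a function of the statewide vote share (a seats-votes curve)."""
    return [(s, seats_under_swing(districts, s)) for s in shares]
```

With made-up data such as `[(40, 60), (45, 55), (48, 52), (70, 30)]`, party D holds only the last, heavily packed district until its statewide share climbs past roughly 53%, which is the kind of flat, non-responsive stretch followed by a jump that the plots in the talk make visible.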
I mean, I can quantify that, but you can also, you know, look at the image, and people understand very much what that says. I hope that answers your question. I'm not sure quite exactly how you would want to quantify it. Another question: does third-party voting mess with the mathematics in these problems? Not really. In most of the states I'm looking at, the third-party voting is so small that it doesn't really have an effect on these elections in particular. But it is an interesting question. It will be hard to do this when you have a jungle election, like in California, and you don't really have a declared party affiliation, or when voters express some preference, if you have ranked choice voting; that's going to be an interesting thing. I'm itching to get my hands on some of that. And there are lots of great things like multi-person districts; the paper you just mentioned, the one that won the SIAM award, talks about multi-person districts. There are lots of other things you can do. But, you know, at the moment, for better or worse, we're kind of largely in a paradigm of two parties in lots of places. Okay, now, another question in the Q&A is: how can this be applied to other countries? Well, some of my colleagues in Britain have looked at this, to think about the way parliament seats, and I forgot what they're called, all of a sudden it just flew out of my brain, are broken up; the districts are not called districts, they're called something else. And so that could be used there. I know there are some good examples of gerrymandering in Canada. Canada does have a national redistricting commission that does a lot of this work. But at the local level, in BC for instance, there have been some accusations of kind of cracking and packing in the past. So you could use it in that way.
And I know people in Britain have looked at using these methods in various ways. I don't know if that answers it. Is that right? Yes. Okay. And perhaps the last question we can take: is your view that proportionality is not sufficient without responsiveness, or that when you can't have both, responsiveness is more important? Well, first of all, our system doesn't have proportionality. So I don't think proportionality is really part of the conversation unless we want to have a conversation about changing our system, which is fair. But gerrymandering is really about whether the system was implemented fairly. So I think maybe I would say not proportionality, but competitiveness is what I'd put up as the counterpoint to responsiveness. And I think competitiveness is not what we should be going for, because what's competitive in one election is not competitive in the next one as public opinion shifts. So, I mean, we have a belief in our country about local representation, and that's hard to make compatible with proportionality. Yeah, statewide proportionality, right? That's the thing. Local proportionality is exactly what we have: whoever gets the most votes in the election in the district wins it. Now, maybe multi-person districts are a good way to get around that. Okay, now here's the final one. Does the voting system, for example ranked choice, make a difference? Yeah, I think it will. I haven't done all that analysis. I know that there have been some conversations in Massachusetts about that, and it'll be interesting after New York, if we can get the data, to look at that. You know, there are two points of view. You can take the point of view of: how do I fix the system? What should we be doing? What's the ideal voting system? What would happen if I change this? Those are all great questions.
I have largely focused, currently, although I'm interested in the other questions, on this: here's the system we have, how does it act, and how do we know when someone has put their finger on the scale within the confines of the system we have? Right. Yeah. Okay, let me just mention there were a couple of questions about how you generated ensembles, but you did answer those at the top of the hour. But for the people who asked those questions, if it was not answered, please go to the chat room. If you have other questions for Jonathan, if you enter the conference portal, you can click on the chat room in the navigation bar; there's a dedicated room for this talk. So you can ask additional questions there, because we are now officially out of time for this. So thank you again, Jonathan, for a very, very interesting presentation. Thank you so much for this opportunity and for everyone's attention.