Wednesday, June 19, 2013

Data Mining and Manhunts

This is not a post about how America is slowly descending into fascism, or about how Edward Snowden is a traitor who has made it easier for terrorists to murder U.S. citizens.* Although I think the debate over where to draw the line between civil liberties and national security is an important one and, given the inherent murkiness of the topic, one in which intelligent people can reasonably disagree, that is not my purpose here.

Instead, what fascinates me about the NSA's PRISM program is the question of how it purportedly helps to locate and subsequently apprehend individuals deemed a threat to national security. As anybody who has read my book (or this blog, or this interview) knows, I'm a bit of a skeptic when it comes to the role of technology in strategic manhunts. Although intelligence is the critical variable in such campaigns, I argue that history shows that most technologies can be defeated by countermeasures (e.g., stay off the phone; don't go outside in the daytime and wave at the UAV overhead), and that human intelligence drawn from inside the target's network or the local population is more critical to operational success.

That being said, as more histories of the targeted killing campaigns that decimated al-Qa'ida in Iraq and the Jaysh al-Mahdi from 2006 to 2008 become available (e.g., Eric Schmitt and Thom Shanker's Counterstrike and General Stanley McChrystal's memoir), it is clear that the use of metadata compiled from "pocket litter" found on insurgents was a critical factor in hunting those networks' leadership and operatives. So given that I'm always aware (and hopeful) that there may be some facet of manhunting I'm not privy to, for the moment I'm more curious about the mechanics of how metadata in general, and PRISM's data mining in particular, relates to the kinetic aspect of finding/fixing terrorists than I am about the privacy-versus-security debate.

Last week Sean Gallagher of Ars Technica wrote a piece discussing the history and technical aspects of how the NSA collects "big data." Be warned, this being Ars Technica, it gets really technical (e.g., how does one even begin to conceptualize petabytes, exabytes, and zettabytes' worth of information?). Yet even if you are not a "ones and zeros" geek, Gallagher makes a crucial point at the outset, warning:
“One organization’s data centers hold the contents of much of the visible Internet – and much of it that isn’t visible just by clicking your way around. It has satellite imagery of much of the world, and ground-level photography of homes and businesses and government installations tied into a geospatial database that is cross-indexed to petabytes of information about individuals and organizations. And its analytics systems process the Web search requests, e-mail messages, and other electronic activities of hundreds of millions of people.”

Of course, Gallagher says, he is talking about Google.
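For a rough sense of the scale Gallagher is describing, a bit of back-of-the-envelope arithmetic helps. The per-item sizes in the snippet below are my own illustrative assumptions, not figures from his article:

```python
# Back-of-the-envelope arithmetic for conceptualizing petabytes, exabytes,
# and zettabytes. The per-item sizes are illustrative assumptions only.
KB = 1024
PETABYTE, EXABYTE, ZETTABYTE = 1024**5, 1024**6, 1024**7

AVG_EMAIL = 75 * KB        # assumed average e-mail with headers/small attachments
AVG_CALL_RECORD = 200      # assumed size of one call-detail (metadata) record, bytes

for label, size in [("petabyte", PETABYTE), ("exabyte", EXABYTE), ("zettabyte", ZETTABYTE)]:
    print(f"One {label} holds roughly {size // AVG_EMAIL:,} e-mails "
          f"or {size // AVG_CALL_RECORD:,} call-detail records.")
```

Even under these crude assumptions, a single zettabyte works out to trillions upon trillions of individual records, which is why the storage and analytics architecture, not the collection itself, is the hard part of Gallagher's story.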

A more practical explanation of how metadata is used in finding terrorists is provided by J.M. Berger's ForeignPolicy.com essay, "Evil in a Haystack." Berger presents a hypothetical of a captured al-Qa'ida operative's cellphone, which includes the number of a terrorist fundraiser in Yemen, and then takes us step by step through how the fundraiser's social network is reconstructed from metadata. In doing so, Berger sets out both the tactical and strategic challenges that come with uncovering 50,000 second-order contacts (e.g., do you investigate the 79 numbers that called the original number, the 24 mathematically most important members of that set, or all 47,923 numbers that called the 79 numbers?). Similarly, he lays out the ethical gray areas associated with such searches. For example:
“How much contact can an analyst have with a U.S. person’s data before it becomes a troublesome violation of privacy? Is it a violation to load a phone record into a graph if the analyst never looks at it individually? Is it a violation to look at a number individually if you don’t associate a name? Is it a violation to associate a name if you never take any additional investigative steps?”

Berger rightfully concludes: "None of these questions is simple or easy. None of them lends itself to polling or punditry. They aren't easy to discuss in a reasoned and accurate manner during a two-minute TV hit or on the floor of the House of Representatives."
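To make Berger's expansion problem a bit more concrete, here is a minimal sketch of the two-hop contact graph he describes. Everything in it is invented for illustration: the call records are faked, and degree centrality stands in for whatever "mathematically most important" measure an actual analyst would use.

```python
# A toy version of Berger's hypothetical: start from one seed number, pull its
# direct contacts, then their contacts, and decide whom to prioritize.
# All numbers and call records here are fabricated for illustration.
import random
import networkx as nx

random.seed(1)

def fake_call_records(number, n_contacts):
    """Stand-in for a metadata query: who called, or was called by, this number."""
    return [f"{number}/c{i}" for i in range(n_contacts)]

G = nx.Graph()
seed = "yemen-fundraiser"

# First hop: the numbers in direct contact with the seed (Berger's 79 numbers).
first_hop = fake_call_records(seed, 79)
G.add_edges_from((seed, n) for n in first_hop)

# Second hop: everyone who contacted those 79 numbers. This is where the graph
# balloons into the tens of thousands of nodes Berger warns about.
for n in first_hop:
    G.add_edges_from((n, m) for m in fake_call_records(n, random.randint(300, 900)))

print(f"{G.number_of_nodes():,} numbers after only two hops")

# One simple way to pick a handful of "mathematically most important" first-hop
# contacts: rank them by degree centrality and keep the top two dozen.
centrality = nx.degree_centrality(G)
priority = sorted(first_hop, key=lambda n: centrality[n], reverse=True)[:24]
print("Top-priority contacts:", priority[:5])
```

Even this crude version makes Berger's point: two hops out from a single phone already produces a graph far too large to investigate exhaustively, which is why the choice of which two dozen nodes to pursue matters so much.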

Also illuminating is Duke University sociology professor Kieran Healy's blog post, "Using Metadata to Find Paul Revere," which provides a more specific (if somewhat tongue-in-cheek) case study of how some relatively simple mathematics can shed light on a social network and illuminate who its key nodes are . . . you know, if you wanted to eliminate them. Healy allows that relying exclusively on data rather than on the content/context of these connections may create "the prospect of discovering suggestive but ultimately incorrect or misleading patterns"; similarly, Berger notes that targeting American militia groups might just inadvertently create a database of legal sellers. (Deputy Attorney General James Cole admitted as much in congressional testimony yesterday.) But whereas Healy simply states that "this problem would surely be greatly ameliorated by more and better metadata," Berger suggests that increasing the size of a dataset creates "ever-more challenging complexities."
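Healy's exercise is easy to reproduce in miniature. The sketch below uses a made-up person-by-group membership table (the names and groups echo his post, but the membership values are invented, and he uses proper centrality measures rather than this crude row-sum); the point is simply that one matrix multiplication over membership metadata already surfaces the connector:

```python
# A toy version of Healy's exercise: a person-by-group membership table, from
# which a person-to-person "co-membership" network falls out by simple matrix
# multiplication. Memberships below are made up for illustration.
import numpy as np

people = ["Revere", "Adams", "Warren", "Church", "Quincy"]
groups = ["StAndrews", "NorthCaucus", "LongRoom", "TeaParty"]

# membership[i][j] = 1 if person i belongs to group j
membership = np.array([
    [1, 1, 1, 1],   # Revere sits in nearly every group
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
])

# Person x person matrix: entry (i, j) counts the groups shared by i and j.
co_membership = membership @ membership.T
np.fill_diagonal(co_membership, 0)

# A crude "who is the connector?" score: total co-memberships per person.
scores = co_membership.sum(axis=1)
for name, score in sorted(zip(people, scores), key=lambda x: -x[1]):
    print(f"{name:8s} {score}")
```

Note that no content is consulted at any point; the table of who belongs to what is enough to put Revere at the top of the list, which is precisely Healy's (and the NSA's) point.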

Thus, so far as strategic manhunts are concerned, the importance of metadata appears to lie in its social network mapping function: determining which individuals to target or, alternatively, with whom a targeted individual may seek sanctuary. (A similar link analysis was crucial to capturing Saddam Hussein, although it was built from interrogations and photo albums rather than from metadata.) But metadata is likely just one of several tools rather than the panacea that "Big Data's" apostles claim.

Ironically, the one manhunt in which metadata appears not to have helped was the hunt for NSA leaker Edward Snowden himself. After all, if PRISM were a cure-all for manhunts, why wasn't the government able to track him down in the days before his leaks were reported?

* Okay, I can’t resist making three points on Snowden. Even if the privacy advocates are 100% correct about the dangers of PRISM – which I doubt they are – they should be careful about lionizing Snowden given that:

1- Many of his initial claims (e.g., "I could wiretap the President's phone if I wanted") have proven to be false;

2- Although he claims his leak was motivated by idealism to protect Americans' civil liberties, his most recent revelations have been about spying on foreign leaders at the G-8 and G-20 summits. How exactly does such espionage pose a threat to the privacy of U.S. citizens?

3- I have a little trouble taking somebody seriously who wraps himself in a martyr’s cloak, saying “I’m prepared to face the consequences of my action,” and then flees to China (which surely he must know has far less regard for civil liberties than even the most paranoid conception of the United States).

And we won’t even go into the question of trusting the judgment of a man who leaves a pole-dancing girlfriend behind in Hawaii . . .

