Fighting Crime with Big Data
When the terms ‘big data’ and ‘algorithmic analysis’ are used, it sometimes feels like a game of buzzword bingo.
As a lawyer, I always say that it’s important to focus on what is really going on in any particular case – it’s often not possible to generalise. We might be looking at a complex system that compares a number of large databases in order to identify hidden patterns and weak correlations, or merely a simple algorithm that flags up duplicate entries in a database.
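To make the simpler end of that spectrum concrete, here is a minimal sketch of an algorithm that flags duplicate entries. The record fields (`name`, `date_of_birth`) and the function names are assumptions chosen for illustration; they are not taken from any real policing database or product.

```python
from collections import defaultdict

# Hypothetical fields used to decide whether two records describe the same person.
KEY_FIELDS = ("name", "date_of_birth")


def normalise(record):
    """Build a comparison key: lower-case each field and collapse extra spaces."""
    return tuple(" ".join(str(record[field]).lower().split()) for field in KEY_FIELDS)


def flag_duplicates(records):
    """Return lists of record positions that share the same normalised key."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[normalise(record)].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


# Invented example data.
records = [
    {"name": "Alex Example", "date_of_birth": "1980-01-01"},
    {"name": "alex  example", "date_of_birth": "1980-01-01"},
    {"name": "Sam Sample", "date_of_birth": "1975-06-30"},
]

print(flag_duplicates(records))  # [[0, 1]]
```

Even a rule this simple embeds human choices, such as which fields count and how strictly they must match, which is why it matters to look at what a given tool actually does.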
In the policing and crime context, there are systems which use recorded crime data to predict areas where offences are likely to take place, and thus help the police take decisions about where and when to allocate resources. Such systems, for example PredPol (developed by a company in Santa Cruz, California, and used by Kent Police), also consider the demographics of an area: its people, its places and the kinds of buildings found there. PredPol says that the software ‘does not replace, but requires, the insights of veteran officers and crime analysts.’ So in a way, it could be seen as a crime disruption tool, with the aim of preventing crime from happening in the first place.
Other tools can analyse and compare large (bulk) databases, such as telephone or internet connection records, travel data, or databases of people with certain characteristics, for example those with air-side access at an international airport. These tools can identify threats, sometimes from fragments of intelligence, establish links between known subjects of interest (and unknowns), and connect sometimes anonymous online personae to real-world identities.
Other tools again, in use in a number of US states in particular, feed more directly into immediate decisions or judgements about individuals. Software developed by a company called Northpointe, for instance, is used to assess the risk of reoffending and thus inform parole and sentencing decisions. Northpointe say that the tool analyses information such as whether the defendant has a job and their education levels, but the precise details are proprietary.
From a legal perspective, where Big Data or algorithmic analysis results in potentially intrusive action or decisions that might adversely affect individuals, it needs both justification and transparency.
Is it explainable and is it fair? In the language of human rights, is it necessary and proportionate? Relying on the consent of individuals is not a concept that works in the context of policing powers so it’s up to the law to regulate actions that might affect an individual’s privacy or freedom. A bulk dataset is one which may contain a large amount of information about ‘innocent’ individuals, thus raising legitimate privacy concerns.
Under the new Investigatory Powers Bill, the acquisition of such datasets will need the sign-off of a judicial commissioner as well as the Secretary of State. Big Data and algorithmic analysis can also reveal or deduce sensitive information about people that they had thought hidden, or at least obscure, and this information might be inaccurate or might not present the full picture.
Tools that directly inform (or perhaps even dictate) judgements about individuals raise the most concern, I think, such as the algorithms used in the US to assess reoffending risk. It’s a principle of natural justice that a person should be able to understand the process by which significant decisions are made that affect his freedom, and be able to challenge them if he wants. But here even the people using the tools (judges, police) may not know how they work and what factors are being considered by the algorithm. Northpointe’s software has been accused of producing racially biased results; others say that is not the case.
Who do we believe?
How do we find out in circumstances where the software is owned by a private company which refuses to reveal its workings? One could say that using an ‘objective’ algorithm is better than relying on a judge’s or police officer’s gut feeling – an algorithm isn’t affected by a bad day in the office; it doesn’t start from scratch for each person. But is an algorithm really objective? The factors that it considers are decided by humans. They may be applied consistently, but if those factors are flawed, then all of the results could be undesirable, particularly if judges and police officers come under pressure to follow the algorithm’s direction even when it runs contrary to their own insight.
There needs to be a clear ethical and legal framework for the use of such tools, and even a new way of setting standards for how these algorithms work and how they are tested, determined by an impartial body.
We shouldn’t throw the baby out with the bathwater, however. Clever algorithms combined with today’s vast processing power can provide a better way of identifying patterns and correlations than human judgement alone. In its enquiry into intelligence material concerning Jimmy Savile, Her Majesty’s Inspectorate of Constabulary found that ‘the failure to connect the various allegations was critical to the eventual outcome of the investigations. There was intelligence available of four separate investigations which was never linked together and, because of that failure to ‘join the dots’, there was a failure to understand the potential depth of Savile’s criminality.’ (‘Mistakes were made’, March 2013).
Pretty much the same was said almost 10 years earlier by the Bichard report into intelligence failures in connection with the Soham murders committed by Ian Huntley (Bichard Inquiry Report, 2004). In addition, a succession of reports into the abuse or deaths of children or vulnerable adults has concluded that outcomes might have been different if attention had been paid to the patterns developing. (For instance, Daniel Pelka Serious Case Review, 2013, at 69; Winterbourne View Serious Case Review, 2012, at 135).
Big Data analytics could provide additional tools for the analysis of intelligence data in order to proactively identify patterns and connections that may indicate criminal or harmful behaviour (and indeed are so used by intelligence agencies in the protection of national security). This must be worth pursuing.
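As a purely illustrative sketch of what ‘joining the dots’ could mean computationally, the example below groups reports from separate investigations by the person they name and surfaces anyone who appears in more than one. The field names, the `link_reports` function and the data are all invented for this example; they do not describe any force’s actual system, and real intelligence records would need far more careful matching than an exact comparison of names.

```python
from collections import defaultdict


def link_reports(reports, min_sources=2):
    """Return subjects named in at least `min_sources` distinct investigations.

    Each report is a dict with hypothetical keys 'investigation' and 'subject'.
    """
    sources_by_subject = defaultdict(set)
    for report in reports:
        sources_by_subject[report["subject"]].add(report["investigation"])
    return {
        subject: sorted(sources)
        for subject, sources in sources_by_subject.items()
        if len(sources) >= min_sources
    }


# Invented example data: three separate investigations, never cross-referenced.
reports = [
    {"investigation": "Force A, 2003", "subject": "Person X"},
    {"investigation": "Force B, 2007", "subject": "Person X"},
    {"investigation": "Force B, 2007", "subject": "Person Y"},
    {"investigation": "Force C, 2009", "subject": "Person X"},
]

linked = link_reports(reports)
print(linked)
```

Here only Person X is surfaced, because that name appears in three investigations. The threshold `min_sources` is itself a human judgement, which is one reason the oversight questions raised earlier still apply to tools of this kind.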
My colleague Jamie Grace (Sheffield Hallam) and I have recently carried out research covering all UK police forces, in which forces were asked (by way of freedom of information request) about their use of computational or algorithmic data analysis or decision-making in relation to the analysis of intelligence. Intelligence is a rather fluid term – it’s used to describe both a type of information collected and the results of analysing information. It’s not ‘committed crime’ data. An example of information might be the fact that a person lives at a particular address; this information might become intelligence if it becomes known that the address has been the location of criminal activity such as drug dealing.
[Chart: Oswald & Grace (2016); N = 43 (total number of responses)]
The initial results of the research (hot off the press!) show that six forces (14%) indicated that they did use some form of computational/algorithmic data analysis or decision-making software in relation to intelligence analysis.
This means that the majority (86%) of the forces that responded said they did not. The algorithms in use varied in nature, from assessing risks to evaluating intelligence. The low take-up might seem rather surprising, bearing in mind what Big Data and algorithmic analysis can do (and do in other contexts) and the importance given to intelligence-led policing. We could even take the view that there should be a duty on the police generally to use these techniques, with a big proviso that there is considerably more transparency over methods, coupled with impartial oversight as to how algorithms are developed and used.
This is a version of Marion Oswald’s talk at the Big Data and Crime session at Cheltenham Science Festival 2016 with Hannah Fry and Timandra Harkness, chaired by Kat Arney.
About the Author
Solicitor Marion Oswald is a Senior Fellow in Law and Head of the Centre for Information Rights at the University of Winchester.
Posted by: Marion Oswald