Introduction to mobile phone data
The analysis of mobile operator data for social-good purposes is a field that has grown rapidly over the past decade.
The growth has been fuelled by the discovery of a large number of use cases that can be addressed by this type of data, including disaster response, infrastructure planning and poverty mapping, together with the increasing availability of technologies that enable data of this size and complexity to be analysed.
For humanitarian and development applications, much of the value of mobile operator data stems from the ability to extract insights about the mobility and social behaviour of large numbers of people. Since the data are continuously collected in an automated fashion by mobile operators for their own operational purposes, it is not necessary to invest additional time or money in collecting them: the data already exist, and aggregated insights that are not privacy-sensitive can be produced from them.
In settings where it is impossible or difficult to obtain accurate and up-to-date survey data about population movements and behaviours, mobile operator data can provide vital input to decisions that affect the wellbeing of large populations.
About mobile operator data
Mobile network operators (MNOs) collect several types of data in order to record the activities of their customers and networks. For humanitarian and development purposes, the most commonly analysed datasets are Call Detail Records (CDRs) combined with cellular network data. These are described below. Other types of data, such as signalling data, can also be used if available.
Cellular network data
A cellular network is based on a set of base stations, each of which typically has one cell tower. Each cell tower has several radio antennae mounted on it, pointing in different directions. The area within the radio coverage of each antenna is called a cell. Each cell is assigned a unique identifier and is associated with a unique geographical location. The maximum theoretical range of an antenna is a few tens of kilometres, but in practice it is often shorter because of terrain, e.g. mountains or dense vegetation.
In urban areas, towers are typically no more than a few hundred metres apart. In rural areas, towers may be several tens of kilometres apart. A phone will usually connect to the closest antenna, or to the one with the strongest radio coverage.
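The "closest antenna" rule can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: real networks select a cell by signal strength and load rather than by pure distance, and the cell IDs, coordinates and function names below are made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_cell(lat, lon, cells):
    """Return the ID of the cell whose location is closest to (lat, lon).

    `cells` maps a cell ID to a (latitude, longitude) tuple.
    """
    return min(cells, key=lambda cell_id: haversine_km(lat, lon, *cells[cell_id]))


# Hypothetical cell locations (not taken from any real network).
cells = {
    "C001": (27.7172, 85.3240),
    "C002": (27.6710, 85.4298),
    "C003": (28.2096, 83.9856),
}

closest = nearest_cell(27.70, 85.33, cells)
```

Distance is only a rough proxy here: a phone in a valley may well connect to a more distant antenna with a clearer line of sight.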
In a cellular network dataset, the location of each cell is provided. Additional information about the antennae may also be available, such as the directional orientation, service date, and coverage range. An example fake dataset is shown below. The columns of this dataset that are essential for CDR analysis are ID, which is the unique identifier of the cell, and LOCATION, which is the latitude and longitude of the cell. The remaining two columns, which may not always be provided by a mobile operator, are AZIMUTH, which describes the angular orientation of the antenna, and DATE_OF_FIRST_SERVICE, which specifies the date that the cell became operational.
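As a rough sketch of how such a dataset might be read, the Python snippet below parses a small made-up CSV extract. The rows are invented, the file layout is an assumption, and LOCATION is split into separate LATITUDE and LONGITUDE columns purely for convenience; an operator's actual export format may differ.

```python
import csv
import io
from datetime import date

# Made-up sample data. C001 and C002 stand for two antennae on the same tower,
# pointing in different directions; C003 has no azimuth recorded.
SAMPLE_CELLS_CSV = """ID,LATITUDE,LONGITUDE,AZIMUTH,DATE_OF_FIRST_SERVICE
C001,27.7172,85.3240,0,2015-03-01
C002,27.7172,85.3240,120,2015-03-01
C003,27.6710,85.4298,,2018-11-20
"""


def load_cells(csv_text):
    """Parse cell records into a dict keyed by cell ID.

    ID and location are treated as required; AZIMUTH and
    DATE_OF_FIRST_SERVICE are optional and become None when empty.
    """
    cells = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        first_service = row["DATE_OF_FIRST_SERVICE"]
        cells[row["ID"]] = {
            "location": (float(row["LATITUDE"]), float(row["LONGITUDE"])),
            "azimuth": float(row["AZIMUTH"]) if row["AZIMUTH"] else None,
            "date_of_first_service": (
                date.fromisoformat(first_service) if first_service else None
            ),
        }
    return cells


cells = load_cells(SAMPLE_CELLS_CSV)
```

Keying the records by cell ID makes it straightforward to look up a location later, when each CDR event has to be linked to the cell it was routed through.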
Call Detail Records (CDRs)
Call detail records (CDRs) are data that MNOs collect for billing purposes. One record is generated each time a telecommunications event occurs, for example when a call or SMS is made or received, or when a mobile data session is initiated. The record includes an identifier of the SIM card, the timestamp of the event, the ID of the cell through which the event was routed, and details of the event such as the duration of a call. Several other fields also exist. A mapping is provided separately to link each cell ID to the cell’s location.
On the left is an example of what a CDR dataset may look like. MSISDN is the phone number of the calling/sending party, and MSISDN_COUNTERPART is the phone number of the receiving party. CELL_ID is the identifier of the cell that the event was routed through, EVENT_TYPE specifies the type of event, and TIMESTAMP is the time of the event.
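To show how the CDR fields and the separate cell-location mapping fit together, here is a minimal Python sketch that attaches a location to each event and orders the events per subscriber. All records, identifiers and the `locate_events` helper are hypothetical; in practice the MSISDN values would be pseudonymised before analysis.

```python
from datetime import datetime

# Hypothetical mapping from cell ID to (latitude, longitude).
cell_locations = {
    "C001": (27.7172, 85.3240),
    "C002": (27.6710, 85.4298),
}

# Made-up CDR rows using the column names described above.
cdrs = [
    {"MSISDN": "A1", "MSISDN_COUNTERPART": "C3", "CELL_ID": "C002",
     "EVENT_TYPE": "sms", "TIMESTAMP": "2016-01-01 17:40:00"},
    {"MSISDN": "B2", "MSISDN_COUNTERPART": "A1", "CELL_ID": "C002",
     "EVENT_TYPE": "call", "TIMESTAMP": "2016-01-01 07:05:00"},
    {"MSISDN": "A1", "MSISDN_COUNTERPART": "B2", "CELL_ID": "C001",
     "EVENT_TYPE": "call", "TIMESTAMP": "2016-01-01 08:15:00"},
    {"MSISDN": "C3", "MSISDN_COUNTERPART": "A1", "CELL_ID": "C999",
     "EVENT_TYPE": "sms", "TIMESTAMP": "2016-01-01 09:00:00"},
]


def locate_events(cdrs, cell_locations):
    """Return (MSISDN, timestamp, location) tuples sorted by subscriber, then time.

    Events routed through a cell that is missing from the mapping are skipped.
    """
    located = []
    for record in cdrs:
        location = cell_locations.get(record["CELL_ID"])
        if location is None:
            continue  # no known location for this cell ID
        timestamp = datetime.strptime(record["TIMESTAMP"], "%Y-%m-%d %H:%M:%S")
        located.append((record["MSISDN"], timestamp, location))
    return sorted(located, key=lambda event: (event[0], event[1]))


events = locate_events(cdrs, cell_locations)
```

The result is a time-ordered sequence of approximate locations for each subscriber, which is the usual starting point for deriving aggregated mobility indicators.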
Signalling data
Signalling data are generated as a result of the continuous communication between a mobile device and a mobile network, for the purposes of maximising signal quality.
Like CDRs, this dataset includes a timestamp and the cell tower associated with each event. It has the advantage of a much higher temporal resolution, because one data point is generated every few seconds or minutes whilst a mobile phone is switched on and within a signal coverage area. These data are generated for both passive devices (those that are switched on but not in use) and active devices. Signalling data therefore provide very high temporal resolution information about any network subscriber who has their phone switched on.
However, the storage and processing of signalling data are very challenging and expensive for many mobile operators, because of the very large data volumes involved. Consequently, it can be significantly more difficult to gain access to a signalling dataset, and to analyse it, than is the case for a CDR dataset.