CDR aggregates: list and methods

This section presents the CDR aggregates which can be used as the basis of a range of analyses for supporting low- and middle-income countries.

It provides a description of the CDR aggregates, along with method details, that Mobile Network Operators can produce from their raw CDRs to build mobility indicators. Code used to derive these aggregates can be accessed from our GitHub.

Counts of active subscribers

The number of unique subscribers recorded in a given area during a given time interval. This corresponds to the number of active users.

To specify areas, we suggest using administrative units and computing the number of unique subscribers seen in large areas (e.g. administrative level 1 (province), level 2 (district) or level 3) as well as in smaller areas (level 4 or 5 (wards)). Groups of nearby cell towers may also be used to define small areas.

For the time interval, we suggest computing the number of unique subscribers seen in a given area over intervals of 1 hour, 1 day, and 1 week.

These suggestions also apply to all other aggregates below.

The number of unique subscribers seen in different time intervals and in areas of different sizes leads to different insights.

Spatial and temporal resolutions

  • Count_subscribers (hour, local)

  • Count_subscribers (day, local)

  • Count_subscribers (week, local)

  • Count_subscribers (day, regional)

  • Count_subscribers (week, regional)

  • Count_subscribers (15min, urban cluster)

Indicators relying on this aggregate

  • Subscriber presence

  • Population mixing

  • Intra-regional travel

  • Hotspots

Method details

Count of unique subscribers, per region per time interval

For each time interval (hour/day/week), and each region size (cluster, admin 4/3/2/1):

For each region, count the number of unique subscribers that used their phone at any cell tower in that region, at any time during the time interval.

Only output counts > 15.
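The counting step above can be sketched in Python as follows. This is a minimal illustration rather than the code used in production: the event format (subscriber_id, region, timestamp), the function names and the interval labels are assumptions made for this example, and each event is assumed to have already been mapped from its cell tower to a region.

```python
from collections import defaultdict
from datetime import datetime


def interval_key(ts, interval):
    """Label the hour, day or ISO week that a timestamp falls in."""
    if interval == "hour":
        return ts.strftime("%Y-%m-%d %H:00")
    if interval == "day":
        return ts.strftime("%Y-%m-%d")
    if interval == "week":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    raise ValueError(f"unknown interval: {interval}")


def count_subscribers(events, interval="hour", min_count=15):
    """Count unique subscribers per (region, interval); keep counts > min_count."""
    subscribers = defaultdict(set)
    for subscriber_id, region, ts in events:
        subscribers[(region, interval_key(ts, interval))].add(subscriber_id)
    return {key: len(ids) for key, ids in subscribers.items() if len(ids) > min_count}


# Example: 16 subscribers in R1 and 15 in R2 during the same hour.
when = datetime(2020, 3, 2, 9, 30)
events = [(f"s{i}", "R1", when) for i in range(16)]
events += [(f"s{i}", "R1", when) for i in range(16)]  # repeat calls are not double counted
events += [(f"t{i}", "R2", when) for i in range(15)]
hourly = count_subscribers(events, "hour")  # only R1 passes the > 15 threshold
weekly = count_subscribers(events, "week")
```

Using a set per (region, interval) key means a subscriber who makes several calls in the same region and interval is counted once.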

Count of residents (home locations)

The number of subscribers who appear to be residents of each region. The location of residence is updated every week and is the location a subscriber visited on most days in the past 4 weeks. It is a reference location and each subscriber is assigned one.

Spatial and temporal resolutions

  • Count_residents (week, local)

Indicators relying on this aggregate

  • Home location

Method details

Home location counts (number of ‘resident subscribers’) per region

Count the number of subscribers assigned to each region (calculated in ‘Home location of each subscriber’), during the specified period. Only output the results for regions where the count is greater than or equal to 15.
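As a rough sketch, the home location rule described above (the region visited on most days in the past 4 weeks) and the resident count could look like this in Python. The input format (subscriber_id, region, day), the function names and the tie-breaking rule are assumptions made for this example; the text does not say how ties are resolved.

```python
from collections import Counter, defaultdict


def home_locations(visits):
    """visits: iterable of (subscriber_id, region, day) covering the past 4 weeks."""
    days_seen = defaultdict(set)
    for subscriber_id, region, day in visits:
        days_seen[(subscriber_id, region)].add(day)
    best = {}
    for (subscriber_id, region), days in days_seen.items():
        candidate = (len(days), region)
        # Ties are broken by region name: an arbitrary choice made for this sketch.
        if subscriber_id not in best or candidate > best[subscriber_id]:
            best[subscriber_id] = candidate
    return {subscriber_id: region for subscriber_id, (_, region) in best.items()}


def count_residents(homes, min_count=15):
    """Count subscribers per home region; keep counts >= min_count."""
    counts = Counter(homes.values())
    return {region: n for region, n in counts.items() if n >= min_count}


# Example: 15 subscribers seen in R1 on three days and in R2 on one day.
visits = [(f"s{i}", "R1", day) for i in range(15) for day in (1, 2, 3)]
visits += [(f"s{i}", "R2", 1) for i in range(15)]
homes = home_locations(visits)
residents = count_residents(homes)
```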

Count of travellers (origin-destination matrix)

The number of subscribers that travel between any two locations within the time period.

There are two types: all pairs of locations visited in a trip (a subscriber travelling from A to B and then to C is counted for A->B, B->C and also A->C), and consecutive locations only (A->B and B->C).

Spatial and temporal resolutions

  • od_matrix_directed_all_pairs (hour, local)

  • od_matrix_directed_all_pairs (day, regional)

  • od_matrix_directed_consecutive_pairs (hour, local)

  • od_matrix_directed_consecutive_pairs (day, regional)

Indicators relying on this aggregate

  • Inter-regional travel: travel distance, dispersion, mixing factor

  • Regional connectivity

  • Inter-regional travel: flows

Method details (od_matrix_directed_all_pairs)

Number of subscribers that travelled between each directed pair of regions, per time interval

This aggregate counts the number of subscribers that are seen at any pair of locations within the specified time interval. Directional information is included, so A -> B is distinguishable from B -> A.

For each time interval (hour, day) and region size (cluster, admin 4/3/2/1):

  1. For each subscriber, create a list of the unique regions in which they used their phone in that time interval.

  2. From the list created in step (1), compute all possible unique ordered pairs. For each unique pair of regions, count the number of subscribers that have that pair in their list.

  3. Only output counts > 15.
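The steps above can be sketched in Python as follows. The text does not spell out how an ordered pair is derived from a list of unique regions, so this sketch assumes one possible rule: the pair origin -> destination is kept when the subscriber was seen in the origin before at least one event in the destination. That rule, the event format (subscriber_id, region, timestamp) and the function name are assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def od_directed_all_pairs(events, min_count=15):
    """events: iterable of (subscriber_id, region, timestamp) for one time interval."""
    first_seen, last_seen = {}, {}
    regions = defaultdict(set)
    for subscriber_id, region, ts in events:
        key = (subscriber_id, region)
        first_seen[key] = min(ts, first_seen.get(key, ts))
        last_seen[key] = max(ts, last_seen.get(key, ts))
        regions[subscriber_id].add(region)
    counts = Counter()
    for subscriber_id, visited in regions.items():
        for origin, destination in permutations(visited, 2):
            # Assumed rule: keep origin -> destination if the subscriber was
            # seen in the origin before some event in the destination.
            if first_seen[(subscriber_id, origin)] < last_seen[(subscriber_id, destination)]:
                counts[(origin, destination)] += 1
    return {pair: n for pair, n in counts.items() if n > min_count}


# Example: 16 subscribers travel A -> B -> C within the interval.
events = [(f"s{i}", region, t) for i in range(16) for t, region in enumerate("ABC")]
flows = od_directed_all_pairs(events)
```

With this rule a subscriber travelling A -> B -> C contributes to A -> B, B -> C and A -> C, but not to the reverse pairs.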

Method details (od_matrix_directed_consecutive_pairs)

Number of trips / directional connections between each pair of regions, per time interval (origin-destination matrices)

This aggregate differs from the undirected all-pairs aggregate (od_matrix_undirected_all_pairs) in the following respects:

  • It includes the direction of travel, so A -> B and B -> A are counted separately.

  • It includes the number of ‘stays’ (consecutive calls made from the same location, A -> A) and the number of subscribers who ‘stayed’ in a single region, i.e. spent enough time in a single region to use their phone more than once within the specified time interval.

For each time interval (hour, day) and region size (cluster, admin 4/3/2/1):

  1. For each subscriber, list the regions in which they used their phone within the time period (hour or day), ordered by time. Repeated regions are kept, so that ‘stays’ can be detected.

  2. Create pairs of regions by pairing the nth region with the (n+1)th region. For example, the sequence [A, A, B, C, D, D, A] would result in the pairings [AA, AB, BC, CD, DD, DA].

  3. For each pair, count (i) the number of times that pair appears (total ‘number of trips’), and (ii) the number of unique subscribers who have that pair in their list (total ‘number of subscribers making each trip’).

  4. Only output counts > 15, in both cases.
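A minimal Python sketch of these steps is given below. The event format (subscriber_id, region, timestamp) and the function names are assumptions made for this example. The pairing helper reproduces the example sequence from step 2.

```python
from collections import Counter, defaultdict


def consecutive_pairs(sequence):
    """Pair the nth region with the (n+1)th region."""
    return list(zip(sequence, sequence[1:]))


def od_directed_consecutive_pairs(events, min_count=15):
    """events: iterable of (subscriber_id, region, timestamp) for one time interval."""
    by_subscriber = defaultdict(list)
    for subscriber_id, region, ts in events:
        by_subscriber[subscriber_id].append((ts, region))
    trips, travellers = Counter(), Counter()
    for visits in by_subscriber.values():
        sequence = [region for _, region in sorted(visits)]
        pairs = consecutive_pairs(sequence)
        trips.update(pairs)            # (i) every occurrence of the pair
        travellers.update(set(pairs))  # (ii) each subscriber counted once per pair

    def keep(counts):
        return {pair: n for pair, n in counts.items() if n > min_count}

    return keep(trips), keep(travellers)


# The sequence from step 2 of the method.
example = consecutive_pairs(list("AABCDDA"))

# 16 subscribers, each seen in A, B, A, B in that order.
events = [(f"s{i}", region, t) for i in range(16) for t, region in enumerate("ABAB")]
trips, travellers = od_directed_consecutive_pairs(events)
```

In the second example each subscriber makes the A -> B trip twice, so the trip count for that pair is twice the subscriber count.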

Count of travellers (connections triangular matrix)

An alternative to the OD (origin-destination) matrix that carries no direction of movement: the number of subscribers that travel between any two locations within the time period, irrespective of the direction of travel.

Spatial and temporal resolutions

  • od_matrix_undirected_all_pairs (hour, local)

  • od_matrix_undirected_all_pairs (day, regional)

Indicators relying on this aggregate

  • Inter-regional travel: travel distance, dispersion

Method details

Number of subscribers that travelled between each pair of regions, per time interval

This aggregate counts the number of subscribers that are seen at any pair of locations within the specified time interval. Directional information is not included, so A -> B is indistinguishable from B -> A.

For each time interval (hour, day) and region size (cluster, admin 4/3/2/1):

  1. For each subscriber, create a list of the unique regions in which they used their phone in that time interval.

  2. From the list created in step (1), compute all possible unique pairs, ignoring ordering. For example, if a subscriber visited regions [A, B, C] in one day, then the pairs would be [A, B], [A, C], [B, C]. (Because ordering is ignored, [B, A] is identical to [A, B]).

  3. For each unique pair of regions, count the number of subscribers that have that pair in their list.

  4. Only output counts > 15.
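These steps can be sketched in Python using itertools.combinations, which yields each unordered pair once. The event format (subscriber_id, region) and the function name are assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def od_undirected_all_pairs(events, min_count=15):
    """events: iterable of (subscriber_id, region) for one time interval."""
    regions = defaultdict(set)
    for subscriber_id, region in events:
        regions[subscriber_id].add(region)
    counts = Counter()
    for visited in regions.values():
        # Sorting gives each unordered pair a single canonical form,
        # so [B, A] and [A, B] are stored under the same key.
        counts.update(combinations(sorted(visited), 2))
    return {pair: n for pair, n in counts.items() if n > min_count}


# Example: 16 subscribers each seen in regions C, A and B.
events = [(f"s{i}", region) for i in range(16) for region in "CAB"]
connections = od_undirected_all_pairs(events)
```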

Count of visits at home and away (home-away matrix)

The number of subscribers who are residents of region Xi and were recorded in region Xj, for all pairs of regions, within a given time interval.

Spatial and temporal resolutions

  • Count_visits_home_away (hour, local)

  • Count_visits_home_away (day, regional)

Indicators relying on this aggregate

  • Inter-regional travel from home (per subscriber aggregate equivalent)

Method details

Count of ‘home’ and ‘away’ visits (‘home-away matrix’), per time interval

For each pair of regions R1 and R2 (including R1 = R2), count the number of unique subscribers whose home location is R1 and that used their phone in R2 during the specified time interval. Only output the results for pairs of regions where the count is greater than or equal to 15.
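A minimal Python sketch of this count is given below. It assumes that home locations are already available as a mapping from subscriber to home region; that mapping, the event format (subscriber_id, region) and the function name are assumptions made for this example.

```python
from collections import Counter


def home_away_matrix(homes, events, min_count=15):
    """homes: {subscriber_id: home_region}; events: (subscriber_id, region) pairs
    for one time interval. Returns counts keyed by (home_region, visited_region)."""
    # De-duplicate so each subscriber is counted once per visited region.
    seen = {(subscriber_id, region) for subscriber_id, region in events}
    counts = Counter(
        (homes[subscriber_id], region)
        for subscriber_id, region in seen
        if subscriber_id in homes  # subscribers without a home location are skipped
    )
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Example: 15 residents of R1 are seen both at home and in R2.
homes = {f"s{i}": "R1" for i in range(15)}
homes.update({f"t{i}": "R2" for i in range(14)})
events = [(s, region) for s in homes if s.startswith("s") for region in ("R1", "R2", "R1")]
events += [(s, "R2") for s in homes if s.startswith("t")]
events += [("unknown", "R1")]
matrix = home_away_matrix(homes, events)
```

The diagonal entries (R1 = R2) count residents seen at home; off-diagonal entries count residents seen away from home.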

Count of home relocations (home origin-destination matrix)

The number of subscribers that have changed their residence from region Xi to region Xj, for any pair of regions, in the last week.

Spatial and temporal resolutions

  • Count_home_relocations (week, regional)

Indicators relying on this aggregate

  • Home location

Method details

Count of home relocations, per time interval

For each pair of regions R1 and R2 (including R1 = R2), count the number of unique subscribers that were previously assigned to R1 as their home region, and at a later date were reassigned to R2. Only output the results for pairs of regions where the count is greater than or equal to 15.
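As an illustration, the relocation count can be sketched in Python by comparing two home location assignments. The two mappings (previous and current home region per subscriber) and the function name are assumptions made for this example.

```python
from collections import Counter


def count_home_relocations(previous_homes, current_homes, min_count=15):
    """Both arguments map subscriber_id -> home region, for an earlier and a
    later assignment. Returns counts keyed by (previous_region, current_region)."""
    counts = Counter(
        (previous_homes[subscriber_id], region)
        for subscriber_id, region in current_homes.items()
        if subscriber_id in previous_homes  # only subscribers present in both
    )
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Example: 15 subscribers move from R1 to R2, 20 stay in R1, 3 move from R2 to R1.
previous = {f"m{i}": "R1" for i in range(15)}
previous.update({f"s{i}": "R1" for i in range(20)})
previous.update({f"b{i}": "R2" for i in range(3)})
current = {s: ("R2" if s.startswith("m") else "R1") for s in previous}
relocations = count_home_relocations(previous, current)
```

Because R1 = R2 is included, the diagonal of the result counts subscribers whose home region did not change.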

Count of subscribers only seen in one region

The number of subscribers that are only seen within a single region, in a given time interval.

Spatial and temporal resolutions

  • Count_subscribers_single_region (day, regional)

  • Count_subscribers_single_region (week, regional)

Indicators relying on this aggregate

  • Inter-regional travel

Method details

Count of subscribers that are seen only in one region

For each region and specified time period, count the number of subscribers that used their phone in that region, and who only used their phone in that region. Only output the results for regions where the count is greater than or equal to 15.
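A minimal Python sketch of this count follows, assuming events of the form (subscriber_id, region) for one time interval. The event format and the function name are assumptions made for this example.

```python
from collections import Counter, defaultdict


def count_single_region(events, min_count=15):
    """Count, per region, the subscribers seen in that region and nowhere else."""
    regions = defaultdict(set)
    for subscriber_id, region in events:
        regions[subscriber_id].add(region)
    counts = Counter(
        next(iter(visited)) for visited in regions.values() if len(visited) == 1
    )
    return {region: n for region, n in counts.items() if n >= min_count}


# Example: 15 subscribers seen only in R1, 5 more seen in both R1 and R2.
events = [(f"s{i}", "R1") for i in range(15)]
events += [(f"t{i}", region) for i in range(5) for region in ("R1", "R2")]
single = count_single_region(events)
```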

Count of subscribers only seen in home region

The number of subscribers that are only seen within their home region during a given time interval.

Spatial and temporal resolutions

  • Count_subscribers_home_region (day, regional)

  • Count_subscribers_home_region (week, regional)

Indicators relying on this aggregate

  • Inter-regional travel

Method details

Count of ‘static’ residents, per region per time interval

Count the number of unique subscribers that used their phone only within their assigned home region, within the specified time interval. Only output the results for regions where the count is greater than or equal to 15.
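This count can be sketched in Python as below, assuming a mapping from subscriber to home region and events of the form (subscriber_id, region) for one time interval. The mapping, the event format and the function name are assumptions made for this example.

```python
from collections import Counter, defaultdict


def count_static_residents(homes, events, min_count=15):
    """Count, per home region, the subscribers seen only in their home region."""
    regions = defaultdict(set)
    for subscriber_id, region in events:
        regions[subscriber_id].add(region)
    counts = Counter(
        homes[subscriber_id]
        for subscriber_id, visited in regions.items()
        if subscriber_id in homes and visited == {homes[subscriber_id]}
    )
    return {region: n for region, n in counts.items() if n >= min_count}


# Example: 18 residents of R1; 15 stay at home and 3 are also seen in R2.
homes = {f"s{i}": "R1" for i in range(18)}
events = [(s, "R1") for s in homes]
events += [(f"s{i}", "R2") for i in range(15, 18)]
static = count_static_residents(homes, events)
```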

Count of events

The total number of data records during a given time period. Depending on the dataset, this will be equal to some combination of the number of calls sent and received, the number of SMS messages sent and received, and the number of mobile data sessions. This can be used to scale other aggregates.

Spatial and temporal resolutions

  • Count_events (hour, local)

Indicators relying on this aggregate

  • Sample size / data quality indicators

Method details

Number of phone events (calls / SMS), per admin 4 region (or cluster), per hour

It is necessary to count the total number of calls recorded each day in order to check whether any apparent increase/decrease in mobility is actually just due to an increase/decrease in phone usage. This is because we only ‘see’ a subscriber in the dataset when they use their phone. It may be the case that a subscriber normally travels a lot and visits several different regions, but only ever uses their phone when they are at home. Therefore, we would not be able to detect that they have visited other regions. If they start to use their phone more but maintain their normal travel behaviour, then we will start to see them in different regions and may then conclude that they are now travelling more, when in fact they are just using their phone more frequently.

If you observe a significant change in call volumes occurring at the same time that mobility restrictions were introduced, then you should bear this in mind when interpreting any apparent ‘changes’ in mobility behaviour.
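As a rough illustration, event counts per region and hour, and the relative change in volume against a baseline, can be sketched in Python as follows. The event format (region, timestamp), the function names and the idea of comparing against a single baseline value are assumptions made for this example. No minimum-count threshold is applied here because none is stated for this aggregate.

```python
from collections import Counter
from datetime import datetime


def count_events(events):
    """events: iterable of (region, timestamp), one tuple per call, SMS or data record."""
    return Counter((region, ts.strftime("%Y-%m-%d %H:00")) for region, ts in events)


def relative_change(baseline, current):
    """Fractional change in event volume relative to a baseline count."""
    return (current - baseline) / baseline


# Example: three events in R1, two of them in the same hour.
events = [
    ("R1", datetime(2020, 3, 2, 9, 5)),
    ("R1", datetime(2020, 3, 2, 9, 50)),
    ("R1", datetime(2020, 3, 2, 10, 15)),
]
hourly_events = count_events(events)
change = relative_change(baseline=200, current=250)  # a 25% rise in volume
```

A large change in event volume that coincides with an apparent change in mobility suggests the mobility change may partly reflect phone usage rather than travel.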

Count of active residents

The number of subscribers that are recorded within their home region, within an hour or a day. This can be used to scale other aggregates.

Spatial and temporal resolutions

  • Count_active_residents (hour, local)

  • Count_active_residents (day, regional)

Indicators relying on this aggregate

  • Sample size / data quality indicators

Method details

Count of active residents, per region per time interval

Count the number of unique subscribers that used their phone within their assigned home region within the specified time interval. Only output the results for regions where the count is greater than or equal to 15.
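A minimal Python sketch of this count is shown below. Unlike the count of subscribers seen only in their home region, a subscriber counts here even if they were also seen elsewhere. The home location mapping, the event format (subscriber_id, region) and the function name are assumptions made for this example.

```python
from collections import Counter


def count_active_residents(homes, events, min_count=15):
    """Count, per region, the residents seen in their home region at least once."""
    active = {
        (subscriber_id, region)
        for subscriber_id, region in events
        if homes.get(subscriber_id) == region
    }
    counts = Counter(region for _, region in active)
    return {region: n for region, n in counts.items() if n >= min_count}


# Example: 15 residents of R1 seen at home (twice) and in R2; 14 residents of R2 at home.
homes = {f"s{i}": "R1" for i in range(15)}
homes.update({f"t{i}": "R2" for i in range(14)})
events = [(f"s{i}", region) for i in range(15) for region in ("R1", "R1", "R2")]
events += [(f"t{i}", "R2") for i in range(14)]
active_residents = count_active_residents(homes, events)
```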

Integration within FlowKit

The code behind each CDR aggregate has been incorporated into FlowKit. FlowKit is Flowminder's open source suite of software tools that is designed to enable the secure access and analysis of mobile operator data for humanitarian and development purposes. For operators who already have FlowKit installed, this will enable them to very easily produce the listed aggregates.

For information about using FlowKit, please contact flowkit@flowminder.org.