It provides a description of the CDR aggregates, along with method details, that Mobile Network Operators can produce from their raw CDRs to build mobility indicators. Code used to derive these aggregates can be accessed from our GitHub.
The number of unique subscribers recorded in a given area during a given time interval. This corresponds to the number of active users.
To specify areas, we suggest using administrative units and computing the number of unique subscribers seen both in large areas (e.g. administrative level 1 (province), level 2 (district) or level 3) and in smaller areas (level 4 or 5 (wards)). Groups of nearby cell towers may also be used to define small areas.
For the time interval, we suggest computing the number of unique subscribers seen in a given area per hour, per day, and per week.
These suggestions also apply to all the other aggregates below.
Counting unique subscribers over different time intervals and in areas of different sizes leads to different insights.
Count_subscribers (hour, local)
Count_subscribers (day, local)
Count_subscribers (week, local)
Count_subscribers (day, regional)
Count_subscribers (week, regional)
Count_subscribers (15min, urban cluster)
Subscriber presence
Population mixing
Intra-regional travel
Hotspots
Count of unique subscribers, per region per time interval
For each time interval (hour/day/week), and each region size (cluster, admin 4/3/2/1):
For each region, count the number of unique subscribers that used their phone at any cell tower in that region, at any time during the time interval.
Only output counts > 15.
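The steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code published on GitHub: the `(subscriber_id, region, timestamp)` event layout and the function names `interval_label` and `count_unique_subscribers` are assumptions made for the example, and events are assumed to be already mapped to a region of the chosen size.

```python
from collections import defaultdict
from datetime import datetime


def interval_label(timestamp, bucket):
    """Map a timestamp to the label of the hour, day or ISO week it falls in."""
    if bucket == "hour":
        return timestamp.strftime("%Y-%m-%d %H:00")
    if bucket == "day":
        return timestamp.strftime("%Y-%m-%d")
    if bucket == "week":
        iso_year, iso_week, _ = timestamp.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    raise ValueError(f"unknown bucket: {bucket}")


def count_unique_subscribers(events, bucket="hour", threshold=15):
    """Count unique subscribers per (region, interval); keep counts > threshold.

    events: iterable of (subscriber_id, region, timestamp) tuples
    (hypothetical layout, one tuple per phone event).
    """
    seen = defaultdict(set)
    for subscriber_id, region, timestamp in events:
        seen[(region, interval_label(timestamp, bucket))].add(subscriber_id)
    return {key: len(ids) for key, ids in seen.items() if len(ids) > threshold}


# 16 subscribers seen in region "R1" between 09:00 and 10:00, one in "R2".
sample = [(f"s{i}", "R1", datetime(2020, 3, 2, 9, i)) for i in range(16)]
sample.append(("s0", "R2", datetime(2020, 3, 2, 9, 30)))
hourly = count_unique_subscribers(sample, bucket="hour")
# hourly == {("R1", "2020-03-02 09:00"): 16}; the "R2" count of 1 is suppressed
```

Using a set per (region, interval) key means a subscriber who makes several calls in the same region and interval is counted once.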
The number of subscribers who appear to be residents of each region. The location of residence is updated every week and is the location a subscriber visited on most days in the past 4 weeks. It is a reference location and each subscriber is assigned one.
Count_residents (week, local)
Home location
Home location counts (number of ‘resident subscribers’) per region
Count the number of subscribers assigned to each region (calculated in ‘Home location of each subscriber’), during the specified period. Only output the results for regions where the count is greater than or equal to 15.
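As a rough illustration, the home location rule (the region visited on most days in the past 4 weeks) and the resident count could look like the sketch below. The `(subscriber_id, region, day)` input layout, the function names, and the tie-breaking rule (the alphabetically later region wins) are assumptions for the example; the source does not say how ties are resolved.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def home_locations(daily_visits, reference_day, window_days=28):
    """Assign each subscriber the region seen on the most distinct days.

    daily_visits: iterable of (subscriber_id, region, day) tuples
    (hypothetical layout). Only days in the window ending on
    reference_day are considered.
    """
    start = reference_day - timedelta(days=window_days)
    days_in_region = defaultdict(set)
    for subscriber_id, region, day in daily_visits:
        if start < day <= reference_day:
            days_in_region[(subscriber_id, region)].add(day)

    best = {}
    for (subscriber_id, region), days in days_in_region.items():
        candidate = (len(days), region)  # ties: later region name wins (assumption)
        if subscriber_id not in best or candidate > best[subscriber_id]:
            best[subscriber_id] = candidate
    return {subscriber_id: region for subscriber_id, (_, region) in best.items()}


def count_residents(homes, threshold=15):
    """Count subscribers per home region; keep counts >= threshold."""
    counts = Counter(homes.values())
    return {region: n for region, n in counts.items() if n >= threshold}


today = date(2020, 3, 29)
visits = [("s1", "A", today - timedelta(days=d)) for d in range(3)]
visits.append(("s1", "B", today))
homes = home_locations(visits, reference_day=today)  # {"s1": "A"}
```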
The number of subscribers that travel between any two locations within the time period.
There are two types: all locations, which counts every pair of locations visited in a trip (a subscriber travelling from A to B and then to C is counted for A->B and B->C, but also for A->C), and consecutive locations only (A->B and B->C).
od_matrix_directed_all_pairs (hour, local)
od_matrix_directed_all_pairs (day, regional)
od_matrix_directed_consecutive_pairs (hour, local)
od_matrix_directed_consecutive_pairs (day, regional)
Inter-regional travel: travel distance, dispersion, mixing factor
Regional connectivity
Inter-regional travel: flows
Number of subscribers that travelled between each directed pair of regions, per time interval
This aggregate counts the number of subscribers that are seen at any pair of locations within the specified time interval. Directional information is included, so A -> B is distinguishable from B -> A.
For each time interval (hour, day) and region size (cluster, admin 4/3/2/1):
For each subscriber, create a list of the unique regions that they used their phone in that time interval.
From the list created in step (1), compute all possible unique ordered pairs. For each unique pair of regions, count the number of subscribers that have that pair in their list.
Only output counts > 15.
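A minimal sketch of these steps follows. It assumes each subscriber's regions for one time interval are supplied in time order, and it interprets "ordered pairs" as (earlier region, later region) based on the first time each region is seen, which matches the A->B, B->C, A->C example; the function name and input layout are hypothetical.

```python
from collections import Counter
from itertools import combinations


def directed_all_pairs(visits_by_subscriber, threshold=15):
    """Count subscribers per directed region pair; keep counts > threshold.

    visits_by_subscriber: {subscriber_id: [regions in time order]} for a
    single time interval (hypothetical layout).
    """
    counts = Counter()
    for regions in visits_by_subscriber.values():
        # Unique regions, in order of first appearance.
        unique_in_order = list(dict.fromkeys(regions))
        # Every (earlier, later) pair, each counted once per subscriber.
        counts.update(combinations(unique_in_order, 2))
    return {pair: n for pair, n in counts.items() if n > threshold}
```

Under this interpretation a subscriber seen in A, then B, then A again contributes only A->B; counting B->A as well would require pairing individual events rather than unique regions.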
Number of trips / directional connections between each pair of regions, per time interval (origin-destination matrices)
This aggregate contains different information to (i) in the following respects:
Includes the direction of travel, so A -> B and B -> A are counted separately
Includes the number of ‘stays’ (consecutive calls made from the same location A -> A), and the number of subscribers who ‘stayed’ in a single region i.e. spent enough time in a single region to use their phone more than once, within the specified time interval.
For each time interval (hour, day) and region size (cluster, admin 4/3/2/1):
For each subscriber, list the regions that they visited within the time period (hour or day), ordered by time. Repeat visits are kept, so a region can appear more than once.
Create pairs of regions by pairing the nth region with the (n+1)th region. For example, the sequence [A, A, B, C, D, D, A] would result in the pairings [AA, AB, BC, CD, DD, DA].
For each pair, count (i) the number of times that pair appears (total ‘number of trips’), and (ii) the number of unique subscribers who have that pair in their list (total ‘number of subscribers making each trip’).
Only output counts > 15, in both cases.
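The pairing and the two counts can be sketched as below. The input layout (one time-ordered list of regions per subscriber, one entry per phone event) and the function names are assumptions for illustration.

```python
from collections import Counter


def pair_sequence(regions):
    """Pair the nth region with the (n+1)th region."""
    return list(zip(regions, regions[1:]))


def consecutive_pairs(visits_by_subscriber, threshold=15):
    """Return (trip counts, subscriber counts) per directed pair, both > threshold.

    visits_by_subscriber: {subscriber_id: [regions in time order]} for a
    single time interval (hypothetical layout).
    """
    trips = Counter()
    subscribers = Counter()
    for regions in visits_by_subscriber.values():
        pairs = pair_sequence(regions)
        trips.update(pairs)             # every occurrence counts as a trip
        subscribers.update(set(pairs))  # each pair counted once per subscriber

    def released(counter):
        return {pair: n for pair, n in counter.items() if n > threshold}

    return released(trips), released(subscribers)


pair_sequence(["A", "A", "B", "C", "D", "D", "A"])
# [("A", "A"), ("A", "B"), ("B", "C"), ("C", "D"), ("D", "D"), ("D", "A")]
```

The two counts are thresholded separately, so a pair can appear in the trip counts while being suppressed in the subscriber counts.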
Alternative to OD (Origin-Destination) matrix with no direction of movement. The number of subscribers that travel between any two locations within the time period, irrespective of the direction of travel.
od_matrix_undirected_all_pairs (hour, local)
od_matrix_undirected_all_pairs (day, regional)
Inter-regional travel: travel distance, dispersion
Number of subscribers that travelled between each pair of regions, per time interval
This aggregate counts the number of subscribers that are seen at any pair of locations within the specified time interval. Directional information is not included, so A -> B is indistinguishable from B -> A.
For each time interval (hour, day) and region size (cluster, admin 4/3/2/1):
For each subscriber, create a list of the unique regions that they used their phone in that time interval.
From the list created in step (1), compute all possible unique pairs, ignoring ordering. For example, if a subscriber visited regions [A, B, C] in one day, then the pairs would be [A, B], [A, C], [B, C]. (Because ordering is ignored, [B, A] is identical to [A, B].)
For each unique pair of regions, count the number of subscribers that have that pair in their list.
Only output counts > 15.
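A short sketch of the undirected version: sorting each subscriber's unique regions before pairing them makes [B, A] and [A, B] the same key. The function name and the `{subscriber_id: [regions]}` input layout are hypothetical.

```python
from collections import Counter
from itertools import combinations


def undirected_all_pairs(regions_by_subscriber, threshold=15):
    """Count subscribers per unordered region pair; keep counts > threshold.

    regions_by_subscriber: {subscriber_id: [regions]} for a single time
    interval (hypothetical layout); the order of the regions is ignored.
    """
    counts = Counter()
    for regions in regions_by_subscriber.values():
        # Sorting gives each unordered pair one canonical form.
        counts.update(combinations(sorted(set(regions)), 2))
    return {pair: n for pair, n in counts.items() if n > threshold}


list(combinations(sorted({"A", "B", "C"}), 2))
# [("A", "B"), ("A", "C"), ("B", "C")]
```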
The number of subscribers who are residents of region Xi and were recorded in region Xj, for all pairs of regions, within a given time interval.
Count_visits_home_away (hour, local)
Count_visits_home_away (day, regional)
Inter-regional travel from home (per subscriber aggregate equivalent)
Count of ‘home’ and ‘away’ visits (‘home-away matrix’), per time interval
For each pair of regions R1 and R2 (including R1 = R2), count the number of unique subscribers whose home location is R1 and that used their phone in R2 during the specified time interval. Only output the results for pairs of regions where the count is greater than or equal to 15.
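One possible way to build the home-away matrix is sketched here, assuming home regions have already been assigned. The `homes` mapping, the `(subscriber_id, region)` event layout and the function name are illustrative assumptions; subscribers with no assigned home are skipped.

```python
from collections import defaultdict


def home_away_matrix(homes, events, threshold=15):
    """Count unique subscribers per (home region, visited region) pair.

    homes: {subscriber_id: home_region}; events: iterable of
    (subscriber_id, region) tuples for a single time interval
    (both layouts are hypothetical). Keeps counts >= threshold.
    """
    seen = defaultdict(set)
    for subscriber_id, region in events:
        home = homes.get(subscriber_id)
        if home is not None:
            seen[(home, region)].add(subscriber_id)
    return {pair: len(ids) for pair, ids in seen.items() if len(ids) >= threshold}
```

Entries where the two regions are equal (R1 = R2) count residents seen at home; all other entries count residents of R1 seen away in R2.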
The number of subscribers that have changed their residence from region Xi to region Xj, for any pair of regions, in the last week.
Count_home_relocations (week, regional)
Home location
Count of home relocations, per time interval
For each pair of regions R1 and R2 (including R1 = R2), count the number of unique subscribers that were previously assigned to R1 as their home region, and at a later date were reassigned to R2. Only output the results for pairs of regions where the count is greater than or equal to 15.
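A sketch of the relocation count, comparing two home location assignments made at different dates. The two `{subscriber_id: home_region}` mappings and the function name are assumptions for the example; subscribers missing from the later assignment are ignored.

```python
from collections import Counter


def count_home_relocations(previous_homes, current_homes, threshold=15):
    """Count subscribers per (previous home, current home) pair.

    Both arguments are {subscriber_id: home_region} mappings (hypothetical
    layout). Pairs with the same region twice are subscribers who did not
    move. Keeps counts >= threshold.
    """
    counts = Counter()
    for subscriber_id, old_region in previous_homes.items():
        new_region = current_homes.get(subscriber_id)
        if new_region is not None:
            counts[(old_region, new_region)] += 1
    return {pair: n for pair, n in counts.items() if n >= threshold}
```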
The number of subscribers that are only seen within a single region, in a given time interval.
Count_subscribers_single_region (day, regional)
Count_subscribers_single_region (week, regional)
Inter-regional travel
Count of subscribers that are seen only in one region
For each region and specified time period, count the number of subscribers that used their phone in that region, and who only used their phone in that region. Only output the results for regions where the count is greater than or equal to 15.
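As an illustration, subscribers seen in exactly one region can be found by collecting the set of regions per subscriber first. The `(subscriber_id, region)` event layout and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def count_single_region(events, threshold=15):
    """Count subscribers seen in exactly one region; keep counts >= threshold.

    events: iterable of (subscriber_id, region) tuples for a single time
    period (hypothetical layout).
    """
    regions_seen = defaultdict(set)
    for subscriber_id, region in events:
        regions_seen[subscriber_id].add(region)

    counts = Counter(
        next(iter(regions))  # the only region in the set
        for regions in regions_seen.values()
        if len(regions) == 1
    )
    return {region: n for region, n in counts.items() if n >= threshold}
```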
The number of subscribers that are only seen within their home region during a given time interval.
Count_subscribers_home_region (day, regional)
Count_subscribers_home_region (week, regional)
Inter-regional travel
Count of ‘static’ residents, per region per time interval
Count the number of unique subscribers that used their phone only within their assigned home region, within the specified time interval. Only output the results for regions where the count is greater than or equal to 15.
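A sketch of the ‘static’ resident count, assuming home regions are available as a `{subscriber_id: home_region}` mapping. The names and input layouts are assumptions for the example.

```python
from collections import Counter, defaultdict


def count_static_residents(homes, events, threshold=15):
    """Count subscribers seen only in their home region; keep counts >= threshold.

    homes: {subscriber_id: home_region}; events: iterable of
    (subscriber_id, region) tuples for a single time interval
    (both layouts are hypothetical).
    """
    regions_seen = defaultdict(set)
    for subscriber_id, region in events:
        regions_seen[subscriber_id].add(region)

    counts = Counter(
        homes[subscriber_id]
        for subscriber_id, regions in regions_seen.items()
        if subscriber_id in homes and regions == {homes[subscriber_id]}
    )
    return {region: n for region, n in counts.items() if n >= threshold}
```

A subscriber seen in a single region that is not their home region is excluded here, which is what separates this aggregate from the single-region count.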
The total number of data records during a given time period. Depending on the dataset, this will be equal to some combination of the number of calls sent and received, the number of SMS messages sent and received, and the number of mobile data sessions. This can be used to scale other aggregates.
Count_events (hour, local)
Sample size / data quality indicators
Number of phone events (calls / SMS), per admin 4 region (or cluster), per hour
It is necessary to count the total number of calls recorded each day in order to check whether any apparent increase/decrease in mobility is actually just due to an increase/decrease in phone usage. This is because we only ‘see’ a subscriber in the dataset when they use their phone. It may be the case that a subscriber normally travels a lot and visits several different regions, but only ever uses their phone when they are at home. Therefore, we would not be able to detect that they have visited other regions. If they start to use their phone more but maintain their normal travel behaviour, then we will start to see them in different regions and may then conclude that they are now travelling more, when in fact they are just using their phone more frequently.
If you observe a significant change in call volumes occurring at the same time that mobility restrictions were introduced, then you should bear this in mind when interpreting any apparent ‘changes’ in mobility behaviour.
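A minimal sketch of the event count, plus a simple relative-change helper for comparing volumes between two periods. The `(region, timestamp)` record layout and both function names are assumptions, and the relative change is only one possible way to quantify a "significant change"; the source does not prescribe a measure.

```python
from collections import Counter
from datetime import datetime


def count_events(events):
    """Count phone events per (region, hour).

    events: iterable of (region, timestamp) tuples, one per record
    (hypothetical layout).
    """
    counts = Counter()
    for region, timestamp in events:
        counts[(region, timestamp.strftime("%Y-%m-%d %H:00"))] += 1
    return dict(counts)


def relative_change(before, after):
    """Relative change in volume between two periods, e.g. 0.25 for +25%."""
    return (after - before) / before


records = [
    ("R1", datetime(2020, 3, 2, 9, 5)),
    ("R1", datetime(2020, 3, 2, 9, 40)),
    ("R1", datetime(2020, 3, 2, 10, 1)),
]
hourly_events = count_events(records)
# {("R1", "2020-03-02 09:00"): 2, ("R1", "2020-03-02 10:00"): 1}
```

If event volumes change by a similar proportion to a mobility indicator over the same period, part of the apparent mobility change may reflect phone usage rather than travel.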
The number of subscribers that are recorded within their home region, within an hour or a day. This can be used to scale other aggregates.
Count_active_residents (hour, local)
Count_active_residents (day, regional)
Sample size / data quality indicators
Count of active residents, per region per time interval
Count the number of unique subscribers that used their phone within their assigned home region within the specified time interval. Only output the results for regions where the count is greater than or equal to 15.
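The active resident count can be sketched in the same style. Unlike the ‘static’ resident count, a subscriber qualifies as soon as they are seen at home, even if they are also seen elsewhere. The input layouts and the function name are assumptions for the example.

```python
from collections import defaultdict


def count_active_residents(homes, events, threshold=15):
    """Count unique subscribers seen in their home region; keep counts >= threshold.

    homes: {subscriber_id: home_region}; events: iterable of
    (subscriber_id, region) tuples for a single time interval
    (both layouts are hypothetical).
    """
    active = defaultdict(set)
    for subscriber_id, region in events:
        if homes.get(subscriber_id) == region:
            active[region].add(subscriber_id)
    return {region: len(ids) for region, ids in active.items() if len(ids) >= threshold}
```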
The code behind each CDR aggregate has been incorporated into FlowKit. FlowKit is Flowminder's open source suite of software tools that is designed to enable the secure access and analysis of mobile operator data for humanitarian and development purposes. For operators who already have FlowKit installed, this will enable them to very easily produce the listed aggregates.
For information about using FlowKit, please contact flowkit@flowminder.org.