OPAL’s technical components are developed by teams at MIT, Imperial College London, and Orange Labs.
The core of OPAL consists of an open and secured platform and algorithms that can be run on the servers of partner companies behind their firewalls to extract key development indicators of relevance for a wide range of potential users.
OPAL provides a mechanism for using private individual-level data safely. To achieve this, data providers (e.g. telcos) push data to the OPAL platform. OPAL then pseudonymizes the data on the fly and stores it securely. Each OPAL instance uses a single pseudonymization key, which stays the same from one ingestion to the next so that pseudonyms remain consistent. Once the data is in OPAL’s local database, authorized users can run algorithms within OPAL’s trusted compute space through the OPAL SafeAnswers API.
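The role of the single per-instance key can be illustrated with a minimal sketch. The text does not say which pseudonymization scheme OPAL uses; the example below assumes a keyed hash (HMAC-SHA256), and the function name `pseudonymize`, the key value, and the sample identifier are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(subscriber_id: str, key: bytes) -> str:
    """Return a stable pseudonym for subscriber_id under the given secret key.

    Assumption: a keyed hash (HMAC-SHA256) stands in for whatever
    pseudonymization scheme OPAL actually applies.
    """
    digest = hmac.new(key, subscriber_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical per-instance key; a real deployment would generate it
# randomly and keep it in secure storage.
INSTANCE_KEY = b"example-instance-key"

# The same identifier seen in two separate ingestions maps to the same pseudonym.
first_ingestion = pseudonymize("+15550100", INSTANCE_KEY)
second_ingestion = pseudonymize("+15550100", INSTANCE_KEY)
```

Because the key is fixed for the instance, records ingested at different times can still be linked to the same pseudonymous subscriber, while a different key would produce unrelated pseudonyms.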
The OPAL architecture follows a microservice structure, which makes it scalable and modular. Each component of the OPAL system is independently tested using continuous integration. The performance and health of every component are monitored, and alerts are sent directly to system administrators if a component goes offline.
There are a number of required technological components and services for the OPAL platform to be a minimum viable product (MVP).
OPAL Questions and Answers use the MapReduce model
“Questions” are defined by an algorithm in Python that specifies which computation to run on each subscriber’s records and how to aggregate the results. OPAL implements the common MapReduce model to process and aggregate large-scale data sets.
There are three steps when executing a “question”:
1. Pre-processing: Run pre-processing tasks and import labels (e.g. list of all antennas in the data set, geographic areas, gender information).
2. Mapping: Map individuals to labels: for each individual, assign a value to each label (e.g. how many times each subscriber was connected to each antenna).
3. Reduce: Aggregate across individuals and return a single number or a distribution for every label (e.g. sum all counts by antenna and return a density map, with the number of subscribers per antenna). OPAL supports diverse aggregation schemes, such as counting individuals’ values, returning their sum, average, standard deviation, etc.
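The three steps above can be sketched in plain Python. This is an illustration of the MapReduce pattern only, not the OPAL API: the record layout, the function names, and the sample data are assumptions made for the example.

```python
from collections import Counter

# Hypothetical pseudonymized records: (subscriber pseudonym, antenna id).
RECORDS = [
    ("u1", "A"), ("u1", "A"), ("u1", "B"),
    ("u2", "B"),
    ("u3", "A"), ("u3", "C"),
]


def preprocess(records):
    """Step 1: import the labels (here, the antennas) and group records by subscriber."""
    antennas = sorted({antenna for _, antenna in records})
    by_subscriber = {}
    for subscriber, antenna in records:
        by_subscriber.setdefault(subscriber, []).append(antenna)
    return antennas, by_subscriber


def map_subscriber(antenna_visits, antennas):
    """Step 2: for one subscriber, count the connections to each antenna."""
    counts = Counter(antenna_visits)
    return {antenna: counts.get(antenna, 0) for antenna in antennas}


def reduce_counts(mapped, antennas):
    """Step 3: count the subscribers seen at least once at each antenna."""
    return {a: sum(1 for m in mapped if m[a] > 0) for a in antennas}


def run_question(records):
    """Run the three steps in order and return the per-antenna density."""
    antennas, by_subscriber = preprocess(records)
    mapped = [map_subscriber(visits, antennas) for visits in by_subscriber.values()]
    return reduce_counts(mapped, antennas)


density = run_question(RECORDS)
```

With this sample data, `density` holds the number of subscribers per antenna. Swapping `reduce_counts` for a sum or an average over the mapped values would give the other aggregation schemes mentioned above.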
OPAL manages disclosure risks using a combination of server-side security, pseudonymization, fine-grained authorization for algorithms, limiting which private information can be stored locally or exported, and ensuring that an attacker is unlikely to learn information about an individual user.
Data privacy is accomplished in three steps:
Pseudonymization & SafeAnswers : Limiting the information accessed by algorithms and limiting the private information returned by algorithms so that results cannot be used to re-identify individuals.
OPAL API : Controlling user requests while ensuring authentication, authorization and auditing.
Network Security : Use of secure protocols and design
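One way to limit what an algorithm returns is to withhold aggregates that cover too few individuals. The text does not describe OPAL’s actual rules, so the sketch below is a generic disclosure-control technique (small-count suppression) offered as an illustration; the threshold value and the function name are hypothetical.

```python
MIN_GROUP_SIZE = 10  # hypothetical threshold, not taken from OPAL


def suppress_small_groups(counts, min_group_size=MIN_GROUP_SIZE):
    """Replace counts below the threshold with None so small groups are not disclosed."""
    return {
        label: (n if n >= min_group_size else None)
        for label, n in counts.items()
    }


# Hypothetical aggregated answer: subscribers per antenna.
subscribers_per_antenna = {"A": 125, "B": 3, "C": 48}
released = suppress_small_groups(subscribers_per_antenna)
```

Here the count for antenna "B" is withheld because it covers fewer than ten subscribers, while the larger groups are returned unchanged.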
Beyond the MVP phases, other features can increase the value, security, and ease of use of the OPAL platform. Future designs will be based on a safe, stable, and scalable platform.
A techno-institutional innovation based on two components: governance and technology
Developed through participatory design, an orientation committee, and capacity-building activities