Federated Learning


Federated Learning (FL) is a distributed machine learning approach that enables training on a large corpus of decentralised data residing on devices such as mobile phones. FL is one instance of the more general approach of “bringing the code to the data, instead of the data to the code” and addresses the fundamental problems of privacy, ownership, and locality of data. The idea is to train on data held across many computing devices, such as smartphones, instead of on a centralised data source.

A basic design decision for a Federated Learning infrastructure is whether to focus on asynchronous or synchronous training algorithms. While much successful work on deep learning has used asynchronous training, there has been a consistent trend towards synchronous large-batch training, even in data centers. The Federated Averaging algorithm takes a similar synchronous approach. Further, several approaches to enhancing privacy guarantees for FL, including differential privacy and Secure Aggregation, essentially require some notion of synchronisation on a fixed set of devices, so that the server side of the learning algorithm only consumes a simple aggregate of the updates from many users.
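To make the idea of a "simple aggregate" concrete, here is a minimal sketch of the server-side averaging step in the spirit of Federated Averaging. It assumes each device reports a flat list of weights together with the number of local examples it trained on; the function name and the data layout are illustrative choices, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def federated_average(
    updates: Sequence[Tuple[List[float], int]],
) -> List[float]:
    """Average client weight vectors, weighted by local example count.

    Each item in `updates` is (weights, num_examples) from one device.
    The layout is an assumption made for this sketch.
    """
    if not updates:
        raise ValueError("need at least one client update")
    total_examples = sum(n for _, n in updates)
    if total_examples <= 0:
        raise ValueError("total example count must be positive")

    dim = len(updates[0][0])
    averaged = [0.0] * dim
    for weights, n in updates:
        if len(weights) != dim:
            raise ValueError("all weight vectors must have the same length")
        # A device with more examples contributes proportionally more.
        for i, w in enumerate(weights):
            averaged[i] += w * n / total_examples
    return averaged


# Two hypothetical devices: one trained on 10 examples, one on 30.
client_updates = [
    ([1.0, 2.0], 10),
    ([3.0, 4.0], 30),
]
global_weights = federated_average(client_updates)
```

Because the server only needs this weighted sum, it never has to look at any single device's update on its own, which is what makes the privacy techniques mentioned above applicable.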

The training protocol

The system involves devices announcing their availability to the Federated Learning server, and the server selecting a subset of the available devices to run a task. The server tells the selected devices what computation to run by sending a plan, which consists of a TensorFlow graph and instructions for executing it. Each training round has three phases:

  1. Selection of the devices that meet eligibility criteria
  2. Configuring the server with simple or Secure Aggregation
  3. Reporting, where the server waits for updates from the devices; the round counts as complete once enough devices have reported
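The three phases can be sketched as a small simulation. Everything here is hypothetical and only meant to show the control flow: the eligibility criteria (idle, charging, unmetered network), the `goal_count` and `min_reports` thresholds, the dropout model, and all names are assumptions made for the example, and the sketch treats a round as usable only if enough devices report.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    device_id: int
    idle: bool
    charging: bool
    on_unmetered_network: bool

    def eligible(self) -> bool:
        # Assumed eligibility criteria for this sketch.
        return self.idle and self.charging and self.on_unmetered_network


@dataclass
class RoundResult:
    aggregation: str
    reported_ids: List[int]


def run_round(
    devices: List[Device],
    goal_count: int,
    min_reports: int,
    dropout_rate: float,
    rng: random.Random,
    aggregation: str = "simple",
) -> Optional[RoundResult]:
    # Phase 1: selection. Keep only eligible devices and pick a subset.
    candidates = [d for d in devices if d.eligible()]
    if len(candidates) < goal_count:
        return None  # not enough eligible devices are available
    selected = rng.sample(candidates, goal_count)

    # Phase 2: configuration. Fix the aggregation mode for this round.
    if aggregation not in ("simple", "secure"):
        raise ValueError("aggregation must be 'simple' or 'secure'")

    # Phase 3: reporting. Some devices drop out before sending an update.
    reported = [d for d in selected if rng.random() >= dropout_rate]
    if len(reported) < min_reports:
        return None  # too few reports, so the round is discarded
    return RoundResult(aggregation, [d.device_id for d in reported])


# Eight eligible devices and two that are in active use.
fleet = [Device(i, True, True, True) for i in range(8)]
fleet += [Device(8 + i, False, True, True) for i in range(2)]
result = run_round(fleet, goal_count=5, min_reports=4,
                   dropout_rate=0.0, rng=random.Random(0))
```

Selecting more devices than the minimum number of reports leaves room for devices that go offline partway through the round.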

Devices are expected to maintain a local repository of the collected data, and applications are responsible for making that data available to the Federated Learning runtime as an example store. The Federated Learning server is designed to operate at scales spanning many orders of magnitude. In each round, the updates travelling to and from the server can range from kilobytes to tens of megabytes.

Data collection

To make sure that training does not harm the phone’s battery life or performance, various activity and health analytics from the devices are logged to the cloud. The logs don’t contain any personally identifiable information.

Secure aggregation

Secure aggregation uses encryption to make individual device updates uninspectable by the server. The authors of the paper plan to use it for protection against threats within data centers. Secure aggregation would keep the updates encrypted even while they are in memory.
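The text above does not spell out the mechanism. One common way to get this property is pairwise additive masking: every pair of devices shares a random mask that one adds and the other subtracts, so the masks cancel in the sum. The toy sketch below illustrates only that idea and is not the production protocol: a seeded random generator stands in for secrets that each pair of devices would agree on privately, dropped devices are not handled, and all names are hypothetical.

```python
import random
from typing import Dict, List

MODULUS = 2**32  # all arithmetic is done modulo a fixed value


def pairwise_masks(device_ids: List[int], dim: int, seed: int) -> Dict[int, List[int]]:
    """Build one mask per device such that all masks sum to zero mod MODULUS.

    In a real system each pair of devices would derive its shared values
    itself; generating them in one place is a simplification for this toy.
    """
    rng = random.Random(seed)
    masks = {d: [0] * dim for d in device_ids}
    ids = sorted(device_ids)
    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for i in range(dim):
                masks[a][i] = (masks[a][i] + shared[i]) % MODULUS  # a adds
                masks[b][i] = (masks[b][i] - shared[i]) % MODULUS  # b subtracts
    return masks


def mask_update(update: List[int], mask: List[int]) -> List[int]:
    """What a device would send: its update hidden behind its mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates: List[List[int]]) -> List[int]:
    """Server side: sum the masked vectors; the pairwise masks cancel."""
    dim = len(masked_updates[0])
    return [sum(v[i] for v in masked_updates) % MODULUS for i in range(dim)]


# Three hypothetical devices with small integer updates.
raw_updates = {1: [3, 5], 2: [10, 20], 3: [7, 1]}
masks = pairwise_masks(list(raw_updates), dim=2, seed=42)
sent = [mask_update(raw_updates[d], masks[d]) for d in raw_updates]
total = aggregate(sent)
```

Here `total` equals the element-wise sum of the raw updates, while each vector in `sent` looks random on its own. Real updates are floating-point and would have to be quantised to integers before masking.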

Challenges of federated learning

Compared to training on a centralised dataset, federated learning poses a number of challenges. The training data is not inspectable, so tooling is required to work with proxy data. Models cannot be run interactively and must be compiled to be deployed in the Federated Learning server. Model resource consumption and runtime compatibility also come into the picture when working with many devices in real time.

Applications of Federated Learning

Federated Learning works best in cases where the data on devices is more relevant than the data on servers. Examples include ranking items for better navigation, suggestions for the on-device keyboard, and next-word prediction. It has already been implemented on Google Pixel phones and in Gboard.

Future work includes eliminating bias caused by restrictions in device selection, developing algorithms that support better parallelism (more devices in one round), avoiding retraining on devices for tasks that have already been trained, and using compression to save bandwidth.

Federated Computation and edge computing

Federated learning and edge computing are very similar, but there are subtle differences in their purposes. Federated learning is used to solve problems by assigning specific tasks to endpoint smartphones. Edge computing is for predefined tasks that are processed at end nodes, for example, IoT cameras. Federated learning decentralises the data used, while edge computing decentralises the task computation across various devices.

For more details on the architecture and its working, you can check out the research paper.

