SoterOne’s Distributed Management Systems


SoterOne uses a new distributed management system, which we call “SoterOne Service”, to coordinate the federated learning process within the SoterOne ecosystem. In this patent, we introduce the details and innovations of the SoterOne Service.

1. Federated learning

Federated learning is a machine learning technique for training models across multiple parties without exchanging the original data. It lets multiple parties combine the value of their data while preserving privacy.

However, federated learning runtimes are still confined to a few domains and are not fully open to the public.

The challenges of opening federated learning to the public are:

1) Public parties are decentralized and might report incorrect information. There is no good solution for scheduling federated learning jobs on parties whose execution environment is unknown.

2) Public parties face tighter constraints on data usage.

3) Serving more public users requires a decentralized and distributed management system. The malicious behavior detectors need to be decentralized as well, because a single detector would be both a resource bottleneck and a single point of failure.

2. SoterOne Service and its responsibilities

SoterOne Service (SS) is a distributed management system that makes federated learning available to the public and helps explore the power of data across different parties. The management system can penalize malicious or fraudulent public parties that report incorrect information, and it can fulfill different privacy requirements for different public entities.

It introduces a reputation feedback technique to encourage participants to report correct information. The reputation score is defined as follows:

Q is the reputation update function. The Management System's checks, the QC's feedback, and the existing reputation score all contribute to the updated score. The reputation score is shown publicly for QCs' future reference. In addition, nodes whose resource profiles the Management System detects as incorrect are deprioritized when queries are dispatched.
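The exact form of Q is not reproduced in this text, so the following is only a minimal sketch under stated assumptions: scores and both feedback signals lie in [0, 1], and Q is modeled as an exponential moving average. The function name, weights, and learning rate are hypothetical and not taken from SoterOne.

```python
def update_reputation(current: float, system_check: float, qc_feedback: float,
                      system_weight: float = 0.6, learning_rate: float = 0.2) -> float:
    """Hypothetical stand-in for Q: blend the existing score with new evidence.

    current       existing reputation score in [0, 1]
    system_check  Management System verdict in [0, 1] (1.0 = report was correct)
    qc_feedback   QC rating in [0, 1]
    """
    for name, value in (("current", current),
                        ("system_check", system_check),
                        ("qc_feedback", qc_feedback)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Combine the two evidence sources into one observation (assumed weighting).
    observation = system_weight * system_check + (1.0 - system_weight) * qc_feedback
    # Move the existing score a fraction of the way toward the observation.
    return (1.0 - learning_rate) * current + learning_rate * observation
```

With this shape, a node that keeps reporting correct information drifts toward 1.0, while a node caught reporting an incorrect profile loses score gradually rather than all at once.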

It is also a naturally distributed system: each of its components can be split into microservices and scaled horizontally.

The management system has four major responsibilities.

1. Node management

2. Dataset metadata management

3. Query management

4. Resource management

3. Work details of SoterOne Service

As shown in the figure:

A. Node management (Node Manager 9): All of the parties (DO 3, MPC 4, and QC 5) that participate in the federated learning process are managed by the Management System. Their profiles need to be registered and maintained.

QCs are the customers that initiate the federated learning process from the user interface. They use encrypted RESTful API 6 to communicate with the web service that is built on top of the Management System. To initiate a federated learning process, they first choose the datasets and MPC nodes they need from the Dataset/MPC Catalog, then construct a federated learning query by uploading model and data configurations, and finally kick off the learning process.

DOs and MPCs are the owners of the data and computation resources in the federated learning process. They run executable binaries that are provided by the platform and use encrypted RPC 7 to talk to the platform. Those binaries register themselves with the platform and periodically send heartbeats to the Management System, reporting their liveness and resource usage snapshots.
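The registration and heartbeat bookkeeping described above could look roughly like the sketch below. It is an in-memory illustration only: the class names, the 30-second timeout, and the shape of the usage snapshot are assumptions, and the encrypted RPC transport is left out.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NodeRecord:
    node_id: str
    role: str                      # "DO" or "MPC"
    last_heartbeat: float          # timestamp in seconds
    usage: Dict[str, float] = field(default_factory=dict)


class NodeRegistry:
    """Hypothetical registry kept by the Node Manager."""

    def __init__(self, timeout_seconds: float = 30.0) -> None:
        self.timeout_seconds = timeout_seconds
        self._nodes: Dict[str, NodeRecord] = {}

    def register(self, node_id: str, role: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._nodes[node_id] = NodeRecord(node_id, role, now)

    def heartbeat(self, node_id: str, usage: Dict[str, float],
                  now: Optional[float] = None) -> None:
        if node_id not in self._nodes:
            raise KeyError(f"unknown node: {node_id}")
        record = self._nodes[node_id]
        record.last_heartbeat = time.time() if now is None else now
        record.usage = dict(usage)  # keep the latest resource usage snapshot

    def is_alive(self, node_id: str, now: Optional[float] = None) -> bool:
        record = self._nodes.get(node_id)
        if record is None:
            return False
        now = time.time() if now is None else now
        # A node counts as alive if its last heartbeat is recent enough.
        return now - record.last_heartbeat <= self.timeout_seconds

    def unregister(self, node_id: str) -> None:
        # Drop the record entirely so nothing about the node is retained.
        self._nodes.pop(node_id, None)
```

Passing `now` explicitly keeps the liveness check deterministic, which makes the timeout behavior easy to test.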

Once a DO, MPC, or QC has finished all of its work and decides to leave the platform, it can unregister, and all of its sensitive information will be scrubbed or removed.

Due to privacy concerns, entities only have access to the resources that they own. Every request goes through permission validation and is blocked by the Management System if the permissions are not set correctly.
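As a minimal sketch of the ownership rule, assuming the Management System keeps a simple mapping from resource ID to owner ID (the mapping, the IDs, and the exception name are all hypothetical):

```python
class PermissionDenied(Exception):
    """Raised when a requester does not own the resource it asks for."""


def authorize(owners: dict, requester_id: str, resource_id: str) -> None:
    """Allow the request only if the requester owns the resource."""
    owner_id = owners.get(resource_id)
    # Unknown resources are rejected too, so nothing is exposed by default.
    if owner_id is None or owner_id != requester_id:
        raise PermissionDenied(f"{requester_id} may not access {resource_id}")
```

A real deployment would likely need finer-grained permissions than strict ownership; this only illustrates the default-deny check.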

B. Dataset metadata management (Dataset metadata Manager 10): Dataset metadata is uploaded by its Data Owner (DO) through the CLI and aggregated by the Management System. All sensitive data is masked during the upload. Once the metadata is uploaded, the Management System groups and filters it in various ways to help QCs identify the best-fit datasets for their queries. With proper grouping, the platform can also surface more federated learning opportunities and attract more QCs to try the platform.

The Management System also presents the Dataset/MPC Catalog. The catalog shows the dataset metadata and keeps track of usage statistics for datasets and MPCs, including the number of times each was chosen, success/failure rates, and job histories.
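The usage statistics mentioned above could be tracked per dataset or MPC with a small record like the one below. This is a sketch only; the field and method names are assumptions, and job histories are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageStats:
    """Hypothetical per-entry counters for the Dataset/MPC Catalog."""
    times_chosen: int = 0
    successes: int = 0
    failures: int = 0

    def record_chosen(self) -> None:
        # Called when a QC selects this dataset or MPC for a query.
        self.times_chosen += 1

    def record_result(self, succeeded: bool) -> None:
        # Called when a job that used this entry finishes.
        if succeeded:
            self.successes += 1
        else:
            self.failures += 1

    @property
    def success_rate(self) -> Optional[float]:
        finished = self.successes + self.failures
        # No finished jobs yet means the rate is undefined, not zero.
        return self.successes / finished if finished else None
```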

C. Query management (Query Manager 11): A federated learning query is initiated by the QC and then managed by the Management System. The Management System assigns jobs to the selected entities, monitors the whole job cluster, and returns the success/failure status to the QC. If the job fails, the Management System starts a best-effort failure recovery process. If the job is not recoverable, the Management System starts a cleanup process. The query management process is illustrated below:

1. QC submits a query to the Management System.

2. The Management System validates the availability of the entities (MPC, DO, and dataset compatibility) that are included in the query. It then sets the query status to “query dispatched” and broadcasts the message through the broadcast communication channel 8. A typical broadcast communication channel is a P2P network or a blockchain.

3. Once the DOs/MPCs observe the broadcast QueryDispatched message, they connect with the other participants and start the query execution process.

4. The Management System will not be involved in the federated learning process after the query process starts. It will only monitor the activeness of the nodes and wait for the result.

5. Once the final result is received by the Management System, the Management System notifies the QC and, after a delay, starts the cleanup process by broadcasting a “query cleanup” message to all entities. All query-related information is deleted or scrubbed in the DOs/MPCs. Only the Management System stores the query statistics and the final query result.

6. If an MPC crashes during the learning process, the Management System chooses a different MPC and broadcasts a “query MPCUpdated” message containing both the old and the new MPC, so that the DOs can switch their connections.

a. The new MPC is chosen based on the MPC’s liveness, version, and reputation.

7. If a DO crashes during the learning process, the Management System directly broadcasts a “query failed” message through the communication channel. All parties receiving the message start an internal cleanup process.

8. If the Management System crashes during the process, all parties continue to execute the query until it finishes. The party that holds the final result will retry sending it for ? of time, and the internal cleanup process will be launched in all parties if the Management System is still not available after ? +? of time.
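The steps above can be read as a small state machine on the Management System side. The sketch below is one possible encoding, not SoterOne's implementation: the broadcast message strings come from the text, while the status names, event names, and the rule for ranking replacement MPCs (alive, matching version, highest reputation) are assumptions.

```python
from enum import Enum
from typing import List, Optional, Tuple


class QueryStatus(Enum):
    SUBMITTED = "submitted"
    DISPATCHED = "dispatched"
    COMPLETED = "completed"
    FAILED = "failed"
    CLEANED_UP = "cleaned up"


# (current status, event) -> (next status, message to broadcast or None)
TRANSITIONS = {
    (QueryStatus.SUBMITTED, "validated"): (QueryStatus.DISPATCHED, "query dispatched"),
    (QueryStatus.DISPATCHED, "mpc_crashed"): (QueryStatus.DISPATCHED, "query MPCUpdated"),
    (QueryStatus.DISPATCHED, "do_crashed"): (QueryStatus.FAILED, "query failed"),
    (QueryStatus.DISPATCHED, "result_received"): (QueryStatus.COMPLETED, None),
    (QueryStatus.COMPLETED, "cleanup_due"): (QueryStatus.CLEANED_UP, "query cleanup"),
}


def step(status: QueryStatus, event: str) -> Tuple[QueryStatus, Optional[str]]:
    """Apply one event; reject anything the process above does not describe."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in status {status.name}")


def choose_replacement_mpc(candidates: List[dict], failed_id: str,
                           required_version: str) -> Optional[str]:
    """Pick a live MPC on the required version with the best reputation."""
    eligible = [c for c in candidates
                if c["alive"] and c["version"] == required_version
                and c["id"] != failed_id]
    if not eligible:
        return None  # no replacement available; the caller decides what to do
    return max(eligible, key=lambda c: c["reputation"])["id"]
```

An MPC crash keeps the query in the dispatched state because the job continues on the replacement node, whereas a DO crash is terminal, matching steps 6 and 7.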

D. Resource management: The Management System manages the resource usage of the nodes to prevent them from being overloaded. Overloaded computation resources cause resource contention and affect every federated learning job running on that node. Each node reports its resource profile during registration and periodically sends resource usage snapshots through its heartbeats.
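One way to use the registered profile and the latest usage snapshot is an admission check before a job is dispatched to a node. The sketch below assumes both are dictionaries keyed by resource name in the same units, and that a fixed 10% headroom is reserved; the function name, keys, and headroom value are hypothetical.

```python
def can_accept_job(profile: dict, usage: dict, request: dict,
                   headroom: float = 0.1) -> bool:
    """Return True if the node can take the job without being overloaded.

    profile  capacity reported at registration, e.g. {"cpu_cores": 8}
    usage    latest heartbeat snapshot in the same units
    request  resources the new job is expected to need
    """
    for resource, needed in request.items():
        capacity = profile.get(resource, 0.0)
        used = usage.get(resource, 0.0)
        # Keep a safety margin so running jobs do not start contending.
        if used + needed > capacity * (1.0 - headroom):
            return False
    return True
```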

SoterOne makes federated learning technology practical for the public and provides a safe and reliable execution environment for data sharing. At the same time, it protects data privacy and improves the use-value of data by encrypting it. This is also the original intention of SoterOne. On the issue of data privacy protection, SoterOne combines encryption technology with machine learning and has developed an encrypted machine learning algorithm, vertical federated logistic regression, which we will introduce in next week’s patent.
