Summary
The purpose of these updates is to allow Portico to operate in extremely large federations that mix high- and low-capability devices (in both a computational and a network sense).
The ultimate goal is to put in place a structure that can support:
- 1000+ federates
- A blend of server-class and low-power (IoT) devices
- Sub-clusters of high-intensity federates, supported in a way that doesn't degrade the QoS for the rest of the federation
Background
As part of an ongoing process working with the US National Institute of Standards and Technology (NIST), we have been looking at ways to enable Portico to serve both large federations (>100 federates) and those that simultaneously contain federates running on high-powered infrastructure alongside those that may be running on low-powered or bandwidth-constrained devices (such as IoT appliances).
The particular simulations that NIST envisages supporting as part of its UCEF initiative (Universal CPS Environment for Federation, where CPS is Cyber-Physical Systems) bring requirements that are challenging to meet and which the current communications architecture certainly cannot stretch to accommodate. These include:
- extremely large numbers of federates (1000+)
- federates spread geographically
- spread across multiple control domains
- federations containing small sub-clusters of "high-intensity" federates
- devices that exchange considerable data among themselves, but of which only a limited amount is useful outside the sub-cluster
- a mix of high-power (server grade) components and low-power (IoT) devices
Portico requires changes to support these sorts of federations. The updates necessary to support the extreme end of these environments must also be done in a manner that doesn't impact the ease-of-use of the current fully-distributed, serverless model.
While wonderfully simple in many ways, this serverless model has caused a number of problems that we will ultimately be able to address at the same time as this work:
- Federation join process can be unreliable when a number of federates attempt to start simultaneously
- As we are fully decentralized, each federate must track the activities of every other federate
- This creates some memory footprint issues
- This can cause unnecessary CPU consumption just to keep track of the accounting required to participate in a federation
- No easy way to see the federates involved in a federation without joining it
- Multicast-related network configuration issues commonly result in federates not being able to see one another
High-Level Design
The high-level design for this structure can be seen in the following diagram:
Key points to note here are:
- There will be a central RTI server process (with the option to transparently auto-start it inside the first federate)
- There will be separate 'Control' and 'Data' channels for information exchange
- Control will be for Federate<>RTI exchanges. This covers all services except attribute reflections and interactions
- Data will be for "group" data communications. This covers attribute reflections and interactions (see the first sketch after this list)
- The administration and service provision of a number of HLA services will move from decentralized back to a central RTI process ('control' channel)
- The exchange of attribute updates and interactions (the vast majority of traffic in a federation) will remain decentralized ('data' channel)
- Federates will be able to connect directly to an RTI, or through a "Forwarder" (much like the current WAN forwarder)
- The Forwarder will act as both a data router and a firewall (see the second sketch after this list)
- All 'control' messages will be routed back to the RTI
- Only a subset of 'data' messages will be allowed to pass (defined in configuration)
- Federates will still filter messages on the receiver side
- When used with clusters behind forwarders, this may be a reduced set of data
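As a rough illustration of the control/data split described above, the sketch below shows how a connection might decide which channel an outgoing message travels on. The class and type names (ChannelRouter, MessageType) are hypothetical and are not part of Portico's actual API; only the routing rule itself (attribute reflections and interactions on 'data', everything else on 'control') comes from the design points above.

```java
// Hypothetical sketch only -- names do not reflect Portico's real message types.
public class ChannelRouter
{
    public enum MessageType
    {
        CreateFederation,       // control traffic: federation management, sync points, ...
        JoinFederation,
        RegisterSyncPoint,
        UpdateAttributeValues,  // data traffic: attribute reflections
        SendInteraction         // data traffic: interactions
    }

    public enum Channel { CONTROL, DATA }

    // Attribute reflections and interactions stay on the decentralized 'data'
    // channel; every other service call is routed to the central RTI over the
    // 'control' channel.
    public static Channel channelFor( MessageType type )
    {
        switch( type )
        {
            case UpdateAttributeValues:
            case SendInteraction:
                return Channel.DATA;
            default:
                return Channel.CONTROL;
        }
    }

    public static void main( String[] args )
    {
        System.out.println( channelFor(MessageType.JoinFederation) );        // CONTROL
        System.out.println( channelFor(MessageType.UpdateAttributeValues) ); // DATA
    }
}
```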
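Similarly, here is a minimal sketch of the Forwarder's firewall role, assuming the pass-through configuration is expressed as an allow-list of object/interaction class-name prefixes. The class name ForwarderFilter and the allow-list format are illustrative assumptions; the actual configuration mechanism has not yet been defined.

```java
import java.util.List;

// Hypothetical sketch of the Forwarder's filtering behaviour.
public class ForwarderFilter
{
    private final List<String> allowedDataClasses;

    public ForwarderFilter( List<String> allowedDataClasses )
    {
        this.allowedDataClasses = allowedDataClasses;
    }

    // Control messages always pass, because they must reach the central RTI.
    // Data messages pass only if the object/interaction class they relate to
    // matches a prefix on the configured allow-list.
    public boolean shouldForward( boolean isControlMessage, String className )
    {
        if( isControlMessage )
            return true;

        return allowedDataClasses.stream().anyMatch( className::startsWith );
    }

    public static void main( String[] args )
    {
        ForwarderFilter filter = new ForwarderFilter( List.of("HLAobjectRoot.PowerGrid") );
        System.out.println( filter.shouldForward(false, "HLAobjectRoot.PowerGrid.Substation") ); // true
        System.out.println( filter.shouldForward(false, "HLAobjectRoot.LocalSensor") );          // false
    }
}
```

Receiver-side filtering in the federates remains in place; behind a forwarder it simply operates on the already-reduced set of data messages that the forwarder lets through.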
Task List
This work is broken down into three phases:
Phase 1: Create the Infrastructure for Central RTI
- 1a: (Infrastructure to support centralized RTI executive process #219) Create the RTI Server, Message Loading and Communications Infrastructure (control/data)
- 1b: (Infrastructure to support centralized RTI executive process #219) Create updated LRC, Message Loading and Communications Infrastructure (control/data). Exchange PING with Server.
Phase 2: Port HLA Services to Central RTI
- 2a: (Federation Management methods (Big 4) support for centralized RTI #220) Port the Big-4 Federation Management Services
- 2b: (Synchronization point management methods support for centralized RTI #221) Synchronization Points
- 2c: (Pub/Sub Method Support for centralized RTI Exec #222) Publication and Subscription
- 2d: (Attribute Reflection/Interaction methods support for new centralized RTI framework #223) Updates & Interactions
- 2e: (Time management services updates for centralized RTI framework #224) Time Management
- 2f: (Port Ownership Management services to new centralized RTI executive #225) Ownership
- 2g: (Port save/restore services to new centralized RTI executive #226) Save/Restore
- 2h: (Port misc calls to new centralized RTI executive framework #227) Misc Services
Phase 3: Cluster / Forwarder Infrastructure
- 3a: (Add support for Forwarder (multi-network/site router and firewall) #228) Create Forwarder Infrastructure (upstream/downstream connection, local discovery, ...)
- 3b: (Add support for Forwarder (multi-network/site router and firewall) #228) Extend RTI Server to accept connections from Forwarders (TCP)
- 3c: (Add support for Forwarder (multi-network/site router and firewall) #228) Extend RTI Server to accept connections from Forwarders (Multicast)
- 3d: (Add support for Forwarder (multi-network/site router and firewall) #228) Big-4 Testing
- 3e: (Add support for Forwarder (multi-network/site router and firewall) #228) Message Exchange
- 3f: Benchmarking
