Startup Shutdown Synchronization Protocol (SSSP v2.0)

SSSP defines signal handling for modular systems during the startup phase until all modules are fully initialized, during operation phase to synchronize all modules, and during the shutdown phase so that the system turns off in a controlled and safe manner or restarts.
The protocol is deliberately simple and is designed such that modules which do not implement SSSP will not compromise system operation.
Hence, only two GPIO signals are required:

S - synchronize
PD - power down

Both must be designed in a way that they realize a logical OR on activation (one or more nodes are active) and a logical AND on deactivation (all nodes are inactive) respectively.
This can easily be implemented using active-low, open-drain signals with pull-up resistors (wired-AND), with a low level defined as the active state.
A wired-OR topology and active-high signals are equally fine for any specific implementation, of course.
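
The required OR-on-activation behavior can be illustrated with a small logical model. This is only a sketch of the signal semantics, not of any electrical implementation; the class name `SharedLine` and the node names are hypothetical.

```python
class SharedLine:
    """Logical model of one shared SSSP signal (e.g. S or PD).

    Each node either asserts the line (active) or releases it (inactive).
    The line reads active while at least one node asserts it, which is the
    logical view of an active-low, open-drain wire with a pull-up resistor.
    """

    def __init__(self):
        self._asserting = set()  # names of nodes currently asserting the line

    def activate(self, node):
        self._asserting.add(node)

    def deactivate(self, node):
        self._asserting.discard(node)

    @property
    def active(self):
        return bool(self._asserting)


# Hypothetical node names, for illustration only.
s = SharedLine()
s.activate("power_board")
s.activate("sensor_ring")
s.deactivate("power_board")
still_active = s.active   # "sensor_ring" still holds the line
s.deactivate("sensor_ring")
released = s.active       # no node asserts the line any more
```

The line only reads inactive after the last node has released it, which is what lets S act as a synchronization barrier in the phases described below.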

Although these two mandatory signals suffice to implement the protocol, some optional features require further signals and communication interfaces:

N, P - next and previous (GPIOs)
B - a more sophisticated, preferably real-time capable communication bus
C - a dedicated clock signal for system synchronization during the Operation Phase.

N and P must realize a circular daisy-chain, so that each module can communicate with the next module in the system topology, while the corresponding back channel (an additional wire from connector to connector) is simply passed through.
Hence, the output signal N of each module is connected to the input P of its successor and two additional pins are required for the connectors (four in total; two per connector).
If an architecture implements this daisy-chain but a module does not feature any logic to evaluate P or set N, it must keep these signals unconnected (breaking the daisy-chain) rather than connecting P through to N.
Due to the circularity of this signal and for clock synchronization during Operation Phase there must be a single module acting as master node, while all others are slaves.
Although B in general may be any bus signal, examples for recommended interfaces are CAN and FlexRay.

Some stages of SSSP are optional and may or may not be implemented by a module.
Note that a heterogeneous setup, in which some modules support the optional stages and others do not, is fully compatible.
However, these optional features will only apply successfully if all modules support them.
System operation must thus not rely on the additional information (see Startup Phase), but may take advantage of it if available.

In order to make the protocol adaptable to any system, it defines three parameters:

D - delay time
T - timeout period
F - synchronization frequency

D defines a time period, which is used by the protocol for synchronization barriers, while T defines the maximum delay before timeouts are detected.
These two parameters must be identical for all nodes within a system and T, obviously, must be greater than D.
F defines the frequency at which S is toggled during the Operation Phase to synchronize all modules in a system.
Recommended values for these parameters are D = 1ms, T = 10ms, and F = 1Hz.

In some cases it might be necessary to define D and/or T differently for specific stages.
This can be achieved by defining further parameters like T_{startup_3_1}, which would supersede the default in startup stage 3.1, while other stages still use T.
As a result, the basic parameters D and T may only be unspecified if there are custom parameters for all stages of the protocol.
Furthermore, defining a timeout parameter to be infinite is completely valid and will deactivate the timeout functionality.
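
One possible way to hold these parameters in software is sketched below. The class `SsspTiming`, its field names, and the `("T", "startup_3_1")` key format are assumptions made for illustration; SSSP itself does not prescribe any data structure. The sketch always requires the defaults D and T, so it does not cover the case where they are left unspecified.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SsspTiming:
    """SSSP parameters; times in seconds, frequency in hertz."""
    delay: float = 1e-3            # D, recommended 1 ms
    timeout: float = 10e-3         # T, recommended 10 ms
    sync_frequency: float = 1.0    # F, recommended 1 Hz
    # Per-stage overrides, keyed by (parameter, stage); hypothetical format.
    overrides: dict = field(default_factory=dict)

    def __post_init__(self):
        if not self.timeout > self.delay:
            raise ValueError("T must be greater than D")

    def delay_for(self, stage):
        """D for the given stage, falling back to the default D."""
        return self.overrides.get(("D", stage), self.delay)

    def timeout_for(self, stage):
        """T for the given stage; math.inf disables the timeout."""
        return self.overrides.get(("T", stage), self.timeout)


# Disable the timeout in startup stage 3.1 only; other stages keep T.
timing = SsspTiming(overrides={("T", "startup_3_1"): math.inf})
```

Because `math.inf` compares greater than any finite delay, an infinite default timeout also passes the check that T exceeds D.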

Disclaimer: This is a draft version. All information and specifications stated below are subject to change. If you want to use this version of SSSP, please contact a project manager. We will gladly freeze this version and move any further modifications to a new version (e.g. 2.1). You are also invited to propose any modifications to this version.

Startup Phase

All modules must initialize the signals such that S is active and PD, N, and P are inactive.
PD must stay inactive during the startup phase, or the shutdown phase will be initiated either immediately by the bootloader or the operating system as soon as it is active.

Each module executes the following steps:

basic initialization
1. initialization of required signals and voltages
  This first stage primarily affects modules that provide energy.
  These must deactivate S only when the power is up and stable.
  All other modules may set S inactive as soon as they are powered up.
  In order to prevent erroneous behavior due to incorrect signals during the initialization, this stage takes at least one period D (at least one module must delay deactivation accordingly).
2. waiting for synchronization
  Each module waits for S to become inactive (all modules are initialized) as a first synchronization barrier.
3. synchronous start of stage 2
  As soon as S is inactive, the master node activates it again in order to start the next stage.
  To ensure that each module had enough time to detect the inactive state of S, the master node must delay the activation by at least one period D.
operating system initialization
1. complete system startup
  Each module activates S again and fully initializes (e.g. starts the operating system, initializes local hardware, etc.).
  As soon as it is ready, it deactivates S again.
  When a module indicates that it is ready, at least the main communication channel must be fully operational.
  While it will usually act as B, in cases where these are two distinct interfaces, B must be fully operational as well at this point.
  If there is no such communication bus B at all, this requirement does not apply, of course.
  Again, S must be active for at least one period D, so every module can detect the activation.
2. waiting for synchronization
  Each module waits for S to become inactive (all modules are ready).
  Only now it is safe to use the main communication channel (and B) and all modules are able to receive messages correctly.
assigning module identifiers [optional]
This stage is optional and only applies if B exists.
Furthermore, it will only be successful if all modules fully implement N and P and there are no exceptional cases to these signals as described above.
The 'module IDs' assigned in this stage can later be used to represent a hierarchy within the system or to address/identify individual modules.
1. initiation of this stage
  The master node initiates this stage by broadcasting a unique command via B to all modules, so they can interpret the upcoming communication via B, N and P correctly.
  All supporting modules must wait at least one period T for the master's message before skipping this stage (similar to abortion; see below).
  As soon as the initiation command was received, all modules activate S for later detection of failure and set a timer to one period T in order to detect timeouts (which would lead to abortion of this module stack initialization).
2. starting the sequence
  The master module broadcasts its own module identifier (e.g. 1) via B.
  Right after that, it signals the next module to continue by setting N active for at least one period D, but keeps S activated for now.
  Note that an identifier value of 0 is reserved and must not be used by any module.
3. iterating over all modules
  This step is subdivided into two actions, which are triggered on different events and are repeated until one of the termination conditions is fulfilled (see below).
  All modules have to execute this stage.
  - message received via B
    If a message that holds an ID of another module is received via B, the timer as mentioned above is reset to T.
    Moreover, the received module identifier is checked to be greater than the previous one.
    If this rule is violated, an abort message is broadcast via B and the stage is aborted.
  - triggered by P
    When a module is triggered by the activation of P (the preceding module activated N), it broadcasts its own module identifier via B, which is defined to be greater than the last one.
    It then deactivates S and triggers the next module to continue by activating N, again for at least one period D.
    If the module is triggered a second time during this stage, indicating an invalid loop in the system architecture, this module must abort this stage.
4. termination of this stage
  There are two ways this stage can be terminated: either it is completed correctly, or it is aborted.
  While any module can abort this stage, only the master (the initiator) can complete it successfully.
  - completion
    The stage is completed successfully if the signal is propagated all the way through the circular daisy-chain, the master module receives an activation of its P signal, and S becomes inactive as soon as the master deactivates it (all modules have participated in the procedure).
    All modules need to wait one more period T after the deactivation of S to make sure no timeouts occurred and no abort message was emitted.
    In this case, all nodes adopt their ID and can use it for later identification.
    If an abort message was received at any time during this stage, however, the whole procedure is aborted (see below).
  - abortion
    The stage is aborted, whenever an abort message was received or invalid behavior has been detected (see above).
    As a result, a unique abort message is broadcast via B by all modules that detected an issue.
    In this case, all module IDs must be considered unreliable, thus identification is not supported.
    Any modules that still activate S must hence deactivate it and as soon as S becomes inactive, all modules may continue operation.
5. rearranging module identifiers [optional]
  This sub-stage is a further optional extension to the already optional module identifier assignment stage.
  As soon as S has become inactive after a successful completion of the assignment procedure, any module can request an ID swap of any two modules by sending corresponding messages via B.
  The addressed modules both have to confirm (or reject) the request and may adopt the new IDs only when the second confirmation was received.
  On any message, the timeout interval T is restarted.
  Furthermore, there must be no parallel swap requests, hence whenever a request has been sent, no module must send another request until both addressed modules answered the request.
  If any module in the system reads an invalid communication (e.g. a different module confirms than was requested or a parallel request was detected), or detects a timeout, it must send an abort message via B, invalidating all module IDs in the system.
  In any case, this stage has to be implemented with care, since it may result in an invalidation of all already assigned module IDs and ending up in a livelock is possible.
  However, with the default assignment procedure, the hierarchy described by the module IDs will be defined by physical properties like arrangement and wiring (e.g. depth-first or breadth-first hierarchy for tree-like architectures).
  If a different hierarchy is desired, the swapping mechanism makes this possible.
  In order to reduce the risk of errors, it is recommended that modules monitor the back channel of the daisy-chain signal and - if applicable - only superior modules send swap requests to inferior ones.
  Anyway, each swap request procedure is defined as follows:
  1. sending a request
    A module sends a request message (via B), specifying the two modules to swap their IDs by naming their current IDs.
    If the module itself is one of those, it has to confirm the request nevertheless as described in the next step.
  2. confirmation/rejection
    When a module receives a swap request that contains its own ID, it has to confirm or reject the request.
    Even if the other module (the one to swap the ID with) has already rejected, this module must still send its own confirmation or rejection.
  3. swapping IDs
    The IDs are swapped only when the second module confirmed the request.
    In other words, both modules reassign their IDs right after the second confirmation was transmitted via B.
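
The identifier assignment over the daisy-chain can be sketched as a sequential simulation. This is a simplified model under stated assumptions: the function `assign_ids` and the `successor` mapping are hypothetical, timeouts and abort messages are collapsed into a `None` result, and IDs simply increase by one per module, which is only one way to satisfy the "greater than the last one" rule.

```python
def assign_ids(successor, master):
    """Simulate the module ID assignment stage.

    successor maps each module name to the module its N output is wired to,
    or to None if the module leaves N unconnected.
    Returns a dict of module name -> ID, or None if the stage is aborted.
    """
    holding_s = set(successor)      # all modules activated S at initiation
    ids = {master: 1}               # master broadcasts its own ID (0 is reserved)
    last_id = 1
    current = successor[master]     # master activates N
    while current != master:
        if current is None or current in ids:
            # Broken chain (peers would time out) or a module was
            # triggered a second time (invalid loop): abort.
            return None
        last_id += 1                # must be greater than the previous ID
        ids[current] = last_id      # module broadcasts its ID via B
        holding_s.discard(current)  # module deactivates S
        current = successor[current]  # module activates N
    holding_s.discard(master)       # master was triggered via P, releases S
    if holding_s:
        return None                 # S stays active: a module never took part
    return ids


ring = {"a": "b", "b": "c", "c": "a"}
result = assign_ids(ring, "a")
```

A chain that is interrupted, loops back to a non-master module, or bypasses a participating module yields `None` in this model, corresponding to the abortion case in which all module IDs must be considered unreliable.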

At the end of the startup phase all signals - S, PD, N, and P - are inactive.
Note that a module which does not implement the protocol will not interfere and will cause no errors as long as it does not activate S, N or P.
However, such a module might cause errors after the startup phase, if it does not receive crucial information because communication is not set up (e.g. stage 3 might fail).
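
The timing of the two mandatory synchronization barriers (basic initialization and operating system initialization) can be estimated with a few lines of arithmetic. The sketch below is an idealized model: the function `startup_timeline` and its arguments are hypothetical, all times are integers in microseconds, signal detection latency is ignored, and every "at least one period D" is taken at its minimum.

```python
def startup_timeline(power_up_us, init_us, delay_us):
    """Earliest barrier times for the first two startup stages, in microseconds.

    power_up_us: per module, time after power-on at which it deactivates S
    init_us:     per module, duration of its full initialization in stage 2
    delay_us:    period D
    """
    # Stage 1: S becomes inactive once the slowest module has released it,
    # but not before one period D has passed.
    barrier_1 = max(max(power_up_us), delay_us)
    # The master waits at least D before activating S to start stage 2.
    stage_2_start = barrier_1 + delay_us
    # Stage 2: S stays active for at least D and until all modules are ready.
    barrier_2 = stage_2_start + max(max(init_us), delay_us)
    return barrier_1, stage_2_start, barrier_2


# Two hypothetical modules with D = 1 ms.
timeline = startup_timeline([200, 5000], [300_000, 1_200_000], 1000)
```

In this example the slowest module determines both barriers, so the main communication channel is safe to use roughly 1.2 s after power-on.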

Operation Phase

Modules are kept in sync during operation by toggling either S or a dedicated clock signal C (defined by the implementation) at frequency F.
Hence, there may be at most one master node (or none), and all other modules must act as slaves.
Since S gets activated when a shutdown is initiated (see Shutdown Phase), modules must only synchronize at deactivation (falling edge) of S.
Since it is recommended to use a hardware timer to toggle the synchronization signal, a dedicated clock signal C might reduce complexity, but using S for that purpose in general is also possible without compromises.

Note that this whole phase is optional, since there may be no master node at all.
Further note that a module which does not implement the protocol will not interfere and will cause no errors as long as it does not activate S or C respectively.
However, such a module might run out of sync which again may cause errors during operation.
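
Synchronizing only on the deactivation of S amounts to a simple edge detector. The following sketch assumes the signal is polled and represented by its logical state (`True` meaning active); the class name `SyncDetector` is hypothetical, and a real module would more likely use an interrupt on the corresponding pin.

```python
class SyncDetector:
    """Reports a synchronization event on each active -> inactive change of S."""

    def __init__(self, initially_active=False):
        self._previous = initially_active

    def sample(self, s_active):
        """Feed the current logical state of S; returns True on a sync event."""
        event = self._previous and not s_active
        self._previous = s_active
        return event


detector = SyncDetector()
samples = [False, True, True, False, True, False]
events = [detector.sample(state) for state in samples]
```

Activations of S never produce an event here, so an activation caused by the start of a shutdown is not mistaken for a synchronization tick.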

Shutdown Phase

Since the PD signal must not be used during system operation, it is defined to be inactive.
The state of S is undefined, because it may be used for synchronization during operation.
Any module can initiate the shutdown phase by activation of PD.
Via S a regular system shutdown (active) or an emergency stop (inactive) can be selected.
Hence, modules which do not support the protocol but interfere with PD are assumed to be defective and thus will initiate an emergency shutdown rather than a regular one.
All modules (including the initiating one) must then execute the following steps as soon as the activation of PD is detected:

selection of shutdown mode
Obviously, the module which acted as master node during operation must stop toggling S as soon as PD is activated and deactivate it.
Since the value of S is undefined until the master node of the operation phase reacts to the PD signal, all modules must apply a delay of one period D before evaluating S in order to distinguish between a regular and an emergency shutdown.
If an emergency stop was requested, all modules must stop as fast as possible and enter a safe state (e.g. cut supply power).
The following stages thus only apply for the regular, controlled shutdown case.
shutdown of high-level operation
1. shutdown of high-level systems (e.g. applications and operating system)
  After another delay of one period D, all modules activate S again.
  Only now each module stops all computation in a safe manner, so it can be shut down without data loss or other issues.
  In order to ensure that every module had a chance to detect the activation of S, this step must take at least one period D.
  Each module deactivates S again, as soon as high-level shutdown is completed.
  The initiating module can select between system shutdown or restart: Keeping PD active indicates a shutdown request, deactivation of the signal before S is deactivated indicates a restart request.
2. waiting for synchronization
  Each module waits for S to become inactive (all modules are done).
system shutdown or restart
1. evaluation of PD
  When S becomes inactive, the state of PD indicates whether the system shall shut down or restart.
  Hence, the initiating module, which activated PD, must have set it to the according state before it deactivated S.
2. disambiguation procedure
  Since there may be not one, but multiple ways to shut down or restart the system, this ambiguity is resolved in the following procedure.
  The requirement for this to work is that the identifiers, which encode the exact shutdown/restart procedure to be executed (see below), must be non-ambiguous.
  These identifiers, however, are implementation specific and are not defined by SSSP.
  1. serial broadcast of identifier
    The module which initiated the shutdown/restart phase broadcasts an arbitrary number of 'pulses' via S.
    Each pulse starts with S deactivated, activates S for at least one period D, and deactivates it again for at least another period D.
    All modules can count the number of pulses, which encodes the exact shutdown/restart procedure to be used.
    Note that S must be inactive for at least one period D before the first pulse (after PD was evaluated).
  2. termination of the serial broadcast
    The broadcast is terminated by a timeout T since the last change of S from active to inactive state.
    This timeout also applies if no pulse was sent at all, which corresponds to the identifier 0.
    Thus, this identifier is reserved for the special case that the ambiguity is not resolved and all modules shall execute their default shutdown procedure.
3. final shutdown or restart
  Depending on the evaluation of PD and the result of the disambiguation procedure, each module reacts accordingly.
  - shutdown
    Each module completely stops itself and enters low-power mode.
    The details (e.g. which signals and sensors are still active) depend on the result of the disambiguation procedure and are implementation specific.
  - restart
    If a restart was requested, each module starts with the first step of the startup phase.
    The details (e.g. which sensors are kept active) depend on the result of the disambiguation procedure and are implementation specific.
    In order to minimize risk of errors, all modules can power off, except for a master node, which resets the whole system and forces a clean startup.
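
The serial broadcast of the disambiguation procedure can be sketched as a pair of functions that produce and count pulses on S. Both function names and the representation of S as a list of `(time, active)` transitions are assumptions for illustration; times are integers in microseconds, and every "at least one period D" is taken at its minimum.

```python
def encode_identifier(identifier, delay_us):
    """Transitions of S that broadcast `identifier` as a number of pulses."""
    edges = []
    time = delay_us                  # S stays inactive for D before the first pulse
    for _ in range(identifier):
        edges.append((time, True))   # activate S for one period D
        time += delay_us
        edges.append((time, False))  # deactivate S for one period D
        time += delay_us
    return edges


def decode_identifier(edges, timeout_us):
    """Count pulses until timeout T has passed since the last deactivation."""
    pulses = 0
    last_inactive = 0                # the timeout also runs if no pulse is sent
    active = False
    for time, s_active in edges:
        if s_active and not active:
            if time - last_inactive >= timeout_us:
                break                # broadcast was already terminated
            active = True
        elif not s_active and active:
            active = False
            pulses += 1              # a pulse is complete once S is inactive again
            last_inactive = time
    return pulses


edges = encode_identifier(3, 1000)
decoded = decode_identifier(edges, 10_000)
```

An empty list of transitions decodes to 0, the reserved identifier for the default shutdown procedure, and any activity arriving after the timeout is ignored.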

Again, a module which does not implement the protocol will cause no errors as long as it does not activate S or PD.
However, if such a module has its own power supply and does not enter low-power mode, it will unnecessarily draw energy and might not end up in a defined state like the rest of the system.
Most importantly, the latter might result in corruption of system operation if the undefined state of modules that do not implement SSSP causes unwanted side effects like stalled communication buses or duplicate module IDs.
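
The two decisions taken during the shutdown phase, emergency versus regular shutdown and shutdown versus restart, can be summarized in a small helper. The names `ShutdownMode` and `classify` are hypothetical; the sketch only encodes the signal evaluation described above and leaves out all timing.

```python
from enum import Enum


class ShutdownMode(Enum):
    EMERGENCY = "emergency stop"
    SHUTDOWN = "shutdown"
    RESTART = "restart"


def classify(s_after_delay, pd_at_barrier):
    """Derive the requested mode from two samples of the signals.

    s_after_delay: logical state of S one period D after PD was activated
    pd_at_barrier: logical state of PD when S becomes inactive at the end
                   of the high-level shutdown (ignored for emergency stops)
    """
    if not s_after_delay:
        return ShutdownMode.EMERGENCY   # S inactive selects the emergency stop
    if pd_at_barrier:
        return ShutdownMode.SHUTDOWN    # PD kept active: shutdown request
    return ShutdownMode.RESTART         # PD released before S: restart request


mode = classify(s_after_delay=True, pd_at_barrier=False)
```

A module that activates PD without driving S therefore always ends up in the emergency branch, matching the treatment of defective or non-compliant modules.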
