Record and Replay ADR
- Lenny Goodell (Intel)
- Approved (2022-10-10)
Referenced Use Case(s)
This design involves creating a new Application Service that is responsible for the requirements in the above referenced UCR. This document is created as a means of formal design review.
A new Application Service will be created with a RESTful API to handle the Record, Replay, Export and Import capabilities. An Application Service has been chosen since the Record capability requires a service that can connect to the MessageBus and consume Events over a long period of time (just like other App Services). The service will not create or start a Functions Pipeline on start-up as is normally done in Application Services. Instead, it will wait until a Record request has been received. Once the recording is complete, the Functions Pipeline will be stopped.
Application Services do not receive data when the Functions Pipelines are stopped.
POST API will start recording data as specified in the request Data Transfer Object (DTO) defined below. The request handler will validate the DTO, create a new Functions Pipeline, and start it to process incoming data. An error is returned if a recording is already in progress.
The Functions Pipeline will contain the following pipeline functions, in this order:
- Filter functions if filtering is needed, configured based on the DTO parameters.
- Batching pipeline function configured based on the DTO parameters. This will be used to control the record duration/count.
- New pipeline function written to process the batched data once the batching threshold has been exceeded. This function will simply send the recorded data to an async function for processing.
The async function receiving the data will first stop the Functions Pipeline and then save the data for later replay and/or export. It will also determine the list of unique Device Profile and Device Names from the data and store them alongside the recorded data. Since app services can receive Events out of order relative to their timestamps, the saved Event data must be sorted by the Event timestamps. All data will be saved in in-memory storage.
Starting a new recording will overwrite any previous recorded data.
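The save step described above can be sketched as follows. This is a minimal illustration and not the service's actual code: the `Event` and `RecordedData` types and the `buildRecordedData` helper are hypothetical stand-ins, and only the fields needed for sorting and name collection are modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// Event is a simplified, hypothetical stand-in for the EdgeX Event DTO.
type Event struct {
	ProfileName string
	DeviceName  string
	Origin      int64 // Event timestamp in nanoseconds
}

// RecordedData is a hypothetical in-memory holder for one recording.
type RecordedData struct {
	Events       []Event
	ProfileNames []string
	DeviceNames  []string
}

// Summary returns a short description of the stored recording.
func (d RecordedData) Summary() string {
	return fmt.Sprintf("%d events, profiles=%v, devices=%v",
		len(d.Events), d.ProfileNames, d.DeviceNames)
}

// buildRecordedData sorts a copy of the events by timestamp and collects
// the unique Device Profile and Device names they reference.
func buildRecordedData(events []Event) RecordedData {
	sorted := make([]Event, len(events))
	copy(sorted, events)
	// A stable sort keeps arrival order for events with equal timestamps.
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Origin < sorted[j].Origin
	})

	profiles := map[string]struct{}{}
	devices := map[string]struct{}{}
	for _, e := range sorted {
		profiles[e.ProfileName] = struct{}{}
		devices[e.DeviceName] = struct{}{}
	}
	return RecordedData{
		Events:       sorted,
		ProfileNames: sortedKeys(profiles),
		DeviceNames:  sortedKeys(devices),
	}
}

// sortedKeys returns the keys of a set in alphabetical order.
func sortedKeys(set map[string]struct{}) []string {
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	data := buildRecordedData([]Event{
		{ProfileName: "p1", DeviceName: "d2", Origin: 300},
		{ProfileName: "p1", DeviceName: "d1", Origin: 100},
		{ProfileName: "p2", DeviceName: "d2", Origin: 200},
	})
	fmt.Println(data.Summary())
}
```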
Record Request DTO
- Duration: Time duration in which to record data. Required if Event Limit is not specified.
- Event Limit: Number of Events to record. Required if Duration is not specified.
- Include Device Profile Names: Optional list of Device Profile Names to filter for.
- Include Device Names: Optional list of Device Names to filter for.
- Exclude Device Profile Names: Optional list of Device Profile Names to filter out.
- Exclude Device Names: Optional list of Device Names to filter out.
DELETE API will cancel the current in-progress recording. An error is returned if a recording is not in progress.
GET API will return the status of Record. If Record is not active the status will be for the last Record session that was run. The API response will be the following DTO:
Record Status DTO
Boolean indicating if Record is in progress or not.
Count of Events that have been captured. 0 if not running and no past Record has been run.
Duration that the recording has been active. 0 if not running and no past Record has been run.
POST API will start replaying the recorded data as specified in the request Data Transfer Object (DTO) defined below. An error is returned if there is already a replay session in progress. The request handler will validate the DTO and verify that the appropriate Device Profiles and Devices from the data exist. It will then start an async Go function to handle the replay so the request doesn't time out on long replays.
The replay async Go function will use the Background Publishing capability to send the recorded Events to the EdgeX MessageBus using the same publish topic scheme used by Device Services, which is edgex/events/device/<device-profile-name>/<device-name>/<source-name>. The App SDK has the Publish Topic Placeholders capability built-in to facilitate this. The data for these topics is available from the Event DTO. The timestamps in the Events and Readings published will be set to the current date/time. This requires that a copy be made of the Event/Readings as they are published in order to not corrupt the original data.
Once the first Event is published, the replay function will calculate the wait time to use before sending the next Event from the recorded data. This will be the time difference between the original timestamp of the previously published Event and the original timestamp of the next Event, multiplied by the inverse of the Replay Rate specified in the request DTO.
Examples - Replay Rate wait time calculation
Delta time between original Events is 800ms
Replay rate is 2.0 (100% faster) making wait time 400ms (800ms * (1 / 2.0))
Replay rate is 0.5 (half speed, i.e. 50% slower) making wait time 1600ms (800ms * (1 / 0.5))
The replay function will repeat publishing the recorded data per the Repeat Count from the DTO.
Replay Request DTO
- Replay Rate: Required rate at which to replay the data compared to the rate at which it was recorded. Float value greater than 0, where 1 is the same rate, less than 1 is a slower rate, and greater than 1 is a faster rate than the rate at which the data was recorded.
- Repeat Count: Optional number of times to repeat the replay. Defaults to 1 if not specified or set to 0.
DELETE API will cancel the current in-progress replay. An error is returned if a replay is not in progress.
GET API will return the status of Replay. If Replay is not active the status will be for the last Replay that was run. The API response will be the following DTO:
Replay Status DTO
Boolean indicating if a Replay is in progress or not.
Count of Events that have been replayed. 0 if not running and no past Replay has been run.
Duration that the Replay has been active. 0 if not running and no past Replay has been run.
Count of repeats. The value applies to the Replay in progress or, if none is running, the last one completed. 0 if not running and no past Replay has been run.
Download endpoint (Export)
GET API will request that the previously recorded data be exported as a file download. It will accept an optional query parameter to specify compression (NONE, ZIP or GZIP). An error is returned if no data has been recorded or an invalid compression type is requested.
The file content will be the Recorded Data DTO as defined below. The request handler will build the DTO by extracting the recorded Events from in-memory storage and pulling the referenced Device Profiles and Devices from Core Metadata using the names from in-memory storage. The file extension used will depend on the compression selected (for example, .gzip for GZIP).
Recorded Data DTO
Events (with Readings) that were recorded
Device Profiles (complete profiles) that are referenced in the recorded Events
Device definitions that are referenced in the recorded Events
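The export step can be illustrated with Go's standard library. In this sketch the `RecordedData` struct, its JSON field names, and the `exportData` function are assumptions, and only the NONE and GZIP options are handled; ZIP is left out for brevity and would need `archive/zip`.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
)

// RecordedData is a hypothetical shape for the Recorded Data DTO. Generic
// maps are used in place of the real Event, Device Profile, and Device DTOs.
type RecordedData struct {
	Events   []map[string]interface{} `json:"events"`
	Profiles []map[string]interface{} `json:"profiles"`
	Devices  []map[string]interface{} `json:"devices"`
}

// exportData marshals the DTO to JSON and applies the requested compression.
func exportData(data RecordedData, compression string) ([]byte, error) {
	raw, err := json.Marshal(data)
	if err != nil {
		return nil, fmt.Errorf("marshal recorded data: %w", err)
	}
	switch compression {
	case "", "NONE":
		return raw, nil
	case "GZIP":
		var buf bytes.Buffer
		zw := gzip.NewWriter(&buf)
		if _, err := zw.Write(raw); err != nil {
			return nil, err
		}
		// Close flushes the remaining compressed bytes and the gzip footer.
		if err := zw.Close(); err != nil {
			return nil, err
		}
		return buf.Bytes(), nil
	default:
		return nil, fmt.Errorf("compression type %q is not handled in this sketch", compression)
	}
}

func main() {
	data := RecordedData{
		Events: []map[string]interface{}{{"deviceName": "sensor-01"}},
	}
	for _, c := range []string{"NONE", "GZIP"} {
		out, err := exportData(data, c)
		fmt.Println(c, len(out), err)
	}
}
```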
Upload endpoint (Import)
POST API will upload a previously exported recorded data file. It will accept an optional Boolean query parameter to specify that existing Device Profiles and/or Devices should not be overwritten if they already exist. The default is to overwrite existing ones with those captured with the recorded data.
The request handler will receive the file as the Recorded Data DTO described above, detect whether it is compressed, and uncompress the contents if needed before unmarshaling the JSON into the DTO. The compression will be determined based on the Content-Encoding from the request header. The Event data from the DTO will then be saved to the in-memory storage along with the Device Profile and Device Names. The Device Profiles and Devices will be pushed to Core Metadata if they don't exist or if overwrite is enabled.
Import will overwrite any previous recorded data.
- The above design is for a crawl implementation. Walk/run level enhancements can be added once there is usage and feedback. One obvious area would be the storage for the recorded data.
- Only one data set, whether recorded or imported, may be held in memory for replay.
- The whole data set is replayed. It is not possible to replay data for only specific Devices within the larger data set.
- Wait times simulating the rate of Events published will not be exact, since they depend on a non-real-time OS.
- Using a CLI approach rather than a RESTful API has been suggested. A CLI would have to duplicate the long-running service needs in the background, which is not normal for a CLI to do directly.
- The EdgeX CLI could be updated in the future to make the RESTful API calls to this service, as it does for the other EdgeX services.
- The EdgeX UI could be updated in the future to have a tab for controlling this service, as it does for the other EdgeX services.
Implement this design as outlined above using a RESTful API and in-memory storage.
Other Related ADRs