Implementation Logic
Possible Interruptions
Network issues causing repeated LLM API call failures.
Parsing errors during action execution.
Manual interruptions (e.g., Ctrl-C).
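All three kinds of interruption can be caught in one place. The following is a minimal sketch, not the project's actual code: the decorator name `serialize_on_interrupt`, the `serialize` callback, and `run_round` are hypothetical names chosen for illustration. It shows how a wrapper might save state on any failure, including Ctrl-C, and then re-raise.

```python
import functools


def serialize_on_interrupt(serialize):
    """Hypothetical decorator: call `serialize()` if the wrapped function is interrupted."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except (Exception, KeyboardInterrupt):
                # KeyboardInterrupt is not a subclass of Exception, so it is
                # listed explicitly to cover Ctrl-C as well as ordinary errors.
                serialize()
                raise
        return wrapper
    return decorator


saved = []  # stands in for the real serialized state


@serialize_on_interrupt(lambda: saved.append("state"))
def run_round():
    # Simulates a parsing error during action execution.
    raise ValueError("could not parse action output")


try:
    run_round()
except ValueError:
    print("state saved before the error propagated:", saved)
```

Re-raising after serialization keeps the original error visible to the caller while still leaving a recoverable snapshot behind.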
Serialized Storage Structure
All serialized data is stored in a single JSON file, so that later changes to the storage structure stay minimal. Example:
./workspace
    storage
        team
            team.json  # Contains team, environment, roles, actions, etc.
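As an illustration of the single-file layout, here is a minimal sketch that writes and reads a `team.json` under a `storage/team` directory. The helper names `save_team` and `load_team` and the shape of the `state` dictionary are assumptions made for this example; the real schema is not described here.

```python
import json
import tempfile
from pathlib import Path


def save_team(state: dict, storage_dir: Path) -> Path:
    """Write the whole team state to a single team.json file (hypothetical helper)."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    path = storage_dir / "team.json"
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return path


def load_team(storage_dir: Path) -> dict:
    """Read the team state back from team.json (hypothetical helper)."""
    return json.loads((storage_dir / "team.json").read_text(encoding="utf-8"))


# Hypothetical state; the real file holds team, environment, roles, actions, etc.
state = {"environment": {"roles": {"RoleB": {"actions": ["ActionRaise"]}}}}

with tempfile.TemporaryDirectory() as workspace:
    storage_dir = Path(workspace) / "storage" / "team"
    saved_path = save_team(state, storage_dir)
    restored = load_team(storage_dir)
```

Keeping everything in one file means a recovery only has to locate and parse a single path.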
Recovery Scenarios
Role A fails to select an action:
Upon re-execution, Role A observes the unprocessed message and retries the action.
Role B resumes processing based on the updated message and continues.
Role B fails during action execution:
Upon recovery, Role B resumes directly from the failed action.
Key Recovery Steps
Re-execute from the Message before the interruption:
Messages act as communication bridges between roles. To resume execution, interrupted messages must be reloaded and processed.
Re-execute from the Action before the interruption:
Actions are granular execution units. Recovery resumes from the action that failed, so actions completed before the interruption are not re-run.
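The two recovery steps above can be sketched as follows. This is a simplified model under stated assumptions, not the framework's implementation: `RoleState`, its fields, `resume`, and the action name `ActionPass` are hypothetical. A role keeps its unprocessed messages and the index of the next action to run, and both survive serialization.

```python
from dataclasses import dataclass, field


@dataclass
class RoleState:
    """Hypothetical recoverable state of one role."""
    name: str
    actions: list
    next_action: int = 0                         # index of the first action not yet completed
    pending: list = field(default_factory=list)  # messages observed but not fully processed


def resume(role: RoleState, run_action) -> list:
    """Re-process the pending message, starting at the interrupted action."""
    executed = []
    if not role.pending:
        return executed  # nothing was interrupted for this role
    while role.next_action < len(role.actions):
        action = role.actions[role.next_action]
        run_action(role.name, action)
        executed.append(action)
        role.next_action += 1  # advance only after the action succeeds
    role.pending.clear()       # the message is now fully processed
    role.next_action = 0       # ready for the next message
    return executed


# RoleB completed its first action and failed at ActionRaise before the interruption.
role_b = RoleState(
    name="RoleB",
    actions=["ActionPass", "ActionRaise"],
    next_action=1,
    pending=["message from RoleA"],
)
executed = resume(role_b, lambda role, action: None)
```

Because the index advances only after an action succeeds, a failure partway through leaves `next_action` pointing at the action to retry.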
Result: Breakpoint Recovery Workflow
Command to enable recovery:
strataai "xxx" --recover_path "./workspace/storage/team"
Serialized data is stored in ./workspace/storage/team by default.
Example Execution Flow:
A test case simulates an interruption and recovery:
Execution case: test_team_recover_multi_roles_save
- RoleB fails at ActionRaise.
- Upon recovery, RoleA and the completed actions of RoleB are skipped.
- RoleB resumes from ActionRaise.
Recovery Logs:
2023-12-19 10:26:02.476 | DEBUG | strataai.environment:publish_message - publish_message: {...}
2023-12-19 10:26:12.518 | INFO | strataai.roles.role:_act - RoleB ready to ActionRaise
2023-12-19 10:26:12.519 | ERROR | strataai.utils.utils:wrapper - Exception occurs, serialize the project...
2023-12-19 10:26:12.517 | DEBUG | strataai.roles.role:_observe - RoleB resumes execution from ActionRaise