# Evaluate Future ARC Architecture Based on Predecessors

## Context and Problem Statement

This ADR evaluates candidate architectures for the ARC suite of services, based on the knowledge gained during the internal predecessors of this project.

## Glossary
- **Order**: Represents an order of one or more artifacts. Tracks generation numbers of referenced `Endpoints` and `Secrets` to detect changes and trigger `ArtifactWorkflow` reconciliation for idempotency.
- **ArtifactWorkflow**: Represents a single artifact to be processed as part of an `Order`. A minimal resource containing the artifact type reference and source/destination endpoint references. Replaces the legacy `OrderArtifactWorkflow`.
- **ArtifactType**: Defines rules, defaults, and the `WorkflowTemplate` reference for a specific artifact type such as 'OCI'. Replaces the legacy `OrderTypeDefinition`.
- **Endpoint**: General term for a source or destination for artifacts. Includes optional credentials to access it. Changes to an `Endpoint` or its referenced `Secret` trigger `Order` reconciliation.
- **WorkflowTemplate**: Argo Workflows, see https://argo-workflows.readthedocs.io/en/latest/fields/#workflowtemplate
- **Workflow**: Argo Workflows, see https://argo-workflows.readthedocs.io/en/latest/fields/#workflow
- **ARC API Server**: A Kubernetes extension API server which handles storage of the ARC API.
- **ARC Controller Manager**: A set of Kubernetes controllers which reconcile `Orders`, create/update `ArtifactWorkflow` resources from `Order` artifacts, and instantiate `Workflow` resources for each `ArtifactWorkflow`. Tracks `Endpoint` and `Secret` generations to ensure idempotency.
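To make the glossary concrete, an `Order` might look roughly like this. This is a hypothetical sketch only: the API group/version, the endpoint and artifact names, and the exact field layout are assumptions; only the top-level shape (`spec.defaults`, `spec.artifacts[]`, `status.endpointGenerations`, `status.secretGenerations`) follows the API concept diagram below.

```yaml
apiVersion: arc.example.com/v1alpha1   # hypothetical group/version
kind: Order
metadata:
  name: sample-order
  namespace: tenant-a
spec:
  defaults:
    sourceRef: registry-src            # hypothetical Endpoint acting as source
    destinationRef: registry-dst       # hypothetical Endpoint acting as destination
  artifacts:
    - type: oci                        # resolved via the ArtifactType 'oci'
      name: example/app:1.0.0
status:
  endpointGenerations:                 # observed generations for change detection
    registry-src: 3
    registry-dst: 7
  secretGenerations:
    registry-creds: 2
```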
## Considered Options
### Classic Kubernetes Operators
- CRDs are used to interact with ARC via the Kubernetes API Server.
- Several operators come into play which reconcile the different custom resources.
- A sharding mechanism is implemented to be able to scale the workers horizontally and give every worker a given chunk of resources to reconcile.
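A minimal sketch of such a sharding mechanism, assuming a stable hash over the resource key (all names here are illustrative, not ARC code). Note that this naive variant reassigns most keys whenever the worker count changes, which hints at why the cons below call sharding error-prone:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor assigns a resource key to one of n workers using a stable hash.
// Every worker runs the same function and only reconciles resources whose
// shard index matches its own.
func shardFor(resourceKey string, shards uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(resourceKey))
	return h.Sum32() % shards
}

func main() {
	const workers = 4
	for _, key := range []string{"ns-a/order-1", "ns-a/order-2", "ns-b/order-1"} {
		fmt.Printf("%s -> worker %d\n", key, shardFor(key, workers))
	}
}
```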
#### Pros

- CRDs and Kubernetes operators are relatively simple to implement

#### Cons

- Storage may be limited by `etcd` and can get the Kubernetes control plane into trouble if too many resources are present
- Implementing a sharding mechanism may be hard work and error-prone
- As a result, this option may not scale in the way necessary for such a solution
### Extension API Server and CNCF Landscape Tooling

#### Flavor A

This flavor uses several controllers to handle different parts of the API. Orders are reconciled and converted to hydrated Orders which contain the source and destination information along with the artifact to process. This information is published to a message queue that workers subscribe to. Optionally, KEDA is used to scale workers and to handle quotas and fairness. The database used by the API Server is etcd.
#### Flavor B

Same as Flavor A, except the database for the API Server is something like Postgres instead of etcd. Additionally, the option is considered to let the controllers access Postgres directly to reconcile, without going through the Kubernetes API Server.
#### Flavor C

Same as Flavor B, except that no message queue is used; the job queue is stored directly in the Postgres database.
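A Postgres-backed job queue typically relies on row-level locking so that concurrent workers never claim the same job. A minimal sketch of the claim step, with hypothetical table and column names:

```sql
-- Claim the oldest pending job; SKIP LOCKED lets concurrent workers
-- pass over rows already claimed in another open transaction.
BEGIN;
SELECT id, payload
FROM artifact_jobs
WHERE status = 'pending'
ORDER BY created_at
FOR UPDATE SKIP LOCKED
LIMIT 1;
-- ... process the job, then mark it done before committing:
UPDATE artifact_jobs SET status = 'done' WHERE id = $1;
COMMIT;
```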
#### Flavor D

Same as Flavor A, except that no message queue is used.
The Order controller creates hydrated Orders directly in etcd as read-only resources which are then consumed by the workers directly.
This approach needs some kind of sharding mechanism so that the workers know which shard of resources they need to handle.
#### Flavor E

This flavor is the most compelling due to the reduced amount of code necessary to bring this solution to life. etcd is used as storage for the API server. Argo Workflows is used to build the workflows which perform the steps necessary to process one artifact. Kueue can be used to bring fairness, scaling, and quotas into play for these workflows. The Order controller creates "jobs" which are actually Argo Workflows.
This option is described in detail in the following sections.
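Such a "job" created by the Order controller would be an Argo `Workflow` instantiated from a `WorkflowTemplate` via `workflowTemplateRef`. A sketch, where the template name, parameters, and naming scheme are hypothetical (only the `Workflow`/`workflowTemplateRef` mechanism itself is standard Argo):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-oci-      # hypothetical naming scheme per artifact
  namespace: tenant-a
spec:
  workflowTemplateRef:
    name: oci-copy                 # hypothetical WorkflowTemplate for the 'oci' type
  arguments:
    parameters:
      - name: source
        value: registry-src        # hypothetical Endpoint reference
      - name: destination
        value: registry-dst
```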
#### Technology

- Instead of CRDs, ARC uses an Extension API Server via the Kubernetes API Aggregation Layer to handle API requests.
- This gives it the possibility to use a dedicated `etcd`, or an even more suitable storage backend, for the high amount of resources and status information, should this become necessary.
- While `etcd` can still be used as the storage backend, it is separate from the `etcd` used by the Kubernetes control plane, which reduces the risk of getting the whole cluster into trouble.
- As the storage implementation is under our control, we can implement a custom storage interface or use `kind` down the line.
- Additional links
  - https://github.com/kubernetes-sigs/apiserver-runtime
  - https://github.com/kubernetes/sample-apiserver/tree/master
- Utilize Argo Workflows to handle the workflows necessary to process different artifact types
- Optionally use Kueue to handle quotas and enhanced scheduling
- Namespaces are used to separate resources in a multi-tenant environment.
#### Architecture Diagram

Overview Diagram

API Concept Diagram
```mermaid
---
config:
  layout: elk
---
flowchart LR
  subgraph subGraph0["Declarative Layer"]
    Order["Order (User Input)"]
    Spec["spec:<br>- defaults<br>- artifacts[]<br>--- <br>status:<br>- endpointGenerations<br>- secretGenerations"]
  end
  subgraph subGraph1["Generated Layer"]
    ArtifactWorkflow1["ArtifactWorkflow-1<br>(Minimal Params)"]
    ArtifactWorkflow2["ArtifactWorkflow-2<br>(Minimal Params)"]
    ArtifactWorkflowN["ArtifactWorkflow-N<br>(Minimal Params)"]
  end
  subgraph Configuration["Configuration & Secrets"]
    ArtifactTypeDef@{ label: "ArtifactType (e.g., 'oci')" }
    EndpointSrc["Endpoint (Source)"]
    EndpointDst["Endpoint (Destination)"]
    EndpointSecret@{ shape: procs, label: "Secrets (e.g. Credentials)"}
  end
  subgraph Execution["Execution"]
    WorkflowTemplate["WorkflowTemplate (Argo)"]
    WorkflowInstance1["Workflow Instance"]
    WorkflowInstance2["Workflow Instance"]
  end
  Order -- contains --> Spec
  Spec -- generates --> ArtifactWorkflow1 & ArtifactWorkflow2 & ArtifactWorkflowN
  Spec -- uses type --> ArtifactTypeDef
  Spec -- reads & tracks generations of --> EndpointSrc & EndpointDst & EndpointSecret
  ArtifactWorkflow1 -- workflowTemplateRef --> WorkflowTemplate
  ArtifactWorkflow2 -- workflowTemplateRef --> WorkflowTemplate
  ArtifactWorkflow1 -- srcRef --> EndpointSrc
  ArtifactWorkflow2 -- srcRef --> EndpointSrc
  ArtifactWorkflow1 -- dstRef --> EndpointDst
  ArtifactWorkflow2 -- dstRef --> EndpointDst
  EndpointSrc -- secretRef --> EndpointSecret
  EndpointDst -- secretRef --> EndpointSecret
  ArtifactTypeDef -- workflowTemplateRef --> WorkflowTemplate
  ArtifactWorkflow1 -- references --> EndpointSecret
  ArtifactWorkflow2 -- references --> EndpointSecret
  ArtifactWorkflow1 -- instantiates --> WorkflowInstance1
  ArtifactWorkflow2 -- instantiates --> WorkflowInstance2
  WorkflowInstance1 & WorkflowInstance2 -- uses --> WorkflowTemplate
  WorkflowInstance1 -- mounts --> EndpointSecret
  WorkflowInstance2 -- mounts --> EndpointSecret
  ArtifactTypeDef@{ shape: rect}
```
The diagram shows the ARC API Server, which handles storage for the custom resources / API of ARC. etcd is used as the storage solution.

The Order Controller is a classic Kubernetes controller implementation which reconciles Orders and monitors Endpoints and Secrets. In a similar fashion, the ArtifactWorkflow Controller reconciles ArtifactWorkflows and instantiates and tracks Argo Workflows based on the ArtifactType. Both controllers are contained in the ARC Controller Manager.

An Order contains the information about which artifacts should be processed and tracks generation numbers of referenced Endpoints and Secrets to detect changes.

An Endpoint contains the information about a source or destination for artifacts.

The Order Controller creates ArtifactWorkflow resources, which are minimal, idempotent representations of artifacts to be processed.

When an Endpoint or a referenced Secret changes (detected via generation tracking), the controller automatically recreates the rendered configuration and triggers ArtifactWorkflow recreation.
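The generation-tracking check at the heart of this idempotency can be sketched as follows. This is a minimal illustration, not the actual ARC code; the type and field names mirror the status fields in the diagram but are otherwise assumptions:

```go
package main

import "fmt"

// OrderStatus records the generations observed on the last reconcile.
type OrderStatus struct {
	EndpointGenerations map[string]int64
	SecretGenerations   map[string]int64
}

// needsReconcile reports whether any referenced Endpoint or Secret has
// changed since the generations recorded in the Order status.
func needsReconcile(status OrderStatus, endpoints, secrets map[string]int64) bool {
	for name, gen := range endpoints {
		if status.EndpointGenerations[name] != gen {
			return true
		}
	}
	for name, gen := range secrets {
		if status.SecretGenerations[name] != gen {
			return true
		}
	}
	return false
}

func main() {
	status := OrderStatus{
		EndpointGenerations: map[string]int64{"registry-src": 3},
		SecretGenerations:   map[string]int64{"registry-creds": 2},
	}
	// The Secret was bumped from generation 2 to 3, so the controller
	// must recreate the ArtifactWorkflows.
	fmt.Println(needsReconcile(status,
		map[string]int64{"registry-src": 3},
		map[string]int64{"registry-creds": 3}))
}
```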
An ArtifactType specifies the processing rules and workflow templates for artifact types (e.g. oci, helm).
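An ArtifactType might look roughly like this. Again a hypothetical sketch: the group/version and field names are illustrative; only the role of the resource (rules, defaults, and a `WorkflowTemplate` reference per type) comes from the glossary above.

```yaml
apiVersion: arc.example.com/v1alpha1   # hypothetical group/version
kind: ArtifactType
metadata:
  name: oci
spec:
  workflowTemplateRef:
    name: oci-copy                     # Argo WorkflowTemplate performing the processing
  defaults:
    retries: 3                         # illustrative per-type default applied to Orders
```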
#### Pros

- Using a dedicated `etcd` does not clutter the infrastructure `etcd`
- Storage can be changed later on if necessary
- Keeps the declarative style of Kubernetes while having complete freedom on the API implementation
- Argo Workflows allows us to focus on the domain of the product without reinventing the wheel
- Quotas and fairness come easily, without writing code, via Kueue
- Idempotency: generation tracking ensures configuration changes are reliably detected and handled without race conditions
- Minimal resource overhead: small `ArtifactWorkflow` resources with externalized configuration reduce API load and enable efficient scaling
#### Cons
- Building addon apiservers directly on the raw api-machinery libraries requires non-trivial code that must be maintained and rebased as the raw libraries change.
- Steep learning curve when starting the project and steeper learning curve when joining the project.
- Generation tracking adds complexity to the reconciliation logic that must be carefully maintained
## Decision Outcome

Chosen Option: Flavor E.

This flavor provides the most flexibility while minimizing the amount of code we have to write ourselves. The flexibility comes from utilizing the CNCF projects Argo Workflows and Kueue for building the workflow engine. The project itself can then focus on the order process and the handling of endpoints.