Migrate from RunsOn v2 to v3

Breaking changes, replacements, and rollout guidance for upgrading an existing RunsOn v2 installation to v3.

RunsOn v3 is a real breaking release. Treat it like a migration, not a routine stack update.

If you are already on v2, the right way to approach v3 is:

  1. Read the current v3 docs before changing production.
  2. Inventory any v2-only parameters, outputs, labels, and pool config you still rely on.
  3. Stand up a test or parallel v3 environment.
  4. Move representative workflows first.
  5. Keep the curated v2 reference pages open only for historical semantics while you migrate.

Highest-risk changes

These are the changes most likely to break an existing v2 install if you update casually:

  • CloudFormation is now GitHub.com-only. GithubEnterpriseUrl is gone from the built-in template. GHES installs must move to Terraform / OpenTofu and set github_enterprise_url there.
  • CloudFormation no longer supports reusing an external VPC. The built-in template always uses the embedded networking path in v3. If you need an existing VPC, use Terraform / OpenTofu.
  • AppSize replaces old app tuning knobs. AppCPU, AppMemory, and AppEc2QueueSize are gone. Pick a single app-size preset instead.
  • AppBudgetDailyUsd replaces the old daily minutes alarm. Budgeting is now based on AWS daily cost in USD, filtered by your RunsOn cost-allocation tag.
  • EC2InstanceCustomPolicy was renamed to RunnerCustomPolicy.
  • disk=... is deprecated. It no longer maps to stack-level disk presets, so replace it with explicit volume=... values.

Migration matrix

Area | v2 pattern | v3 replacement
CloudFormation sizing | AppCPU, AppMemory, AppEc2QueueSize | AppSize
CloudFormation budget/alarm | AppAlarmDailyMinutes | AppBudgetDailyUsd
CloudFormation runner IAM policy | EC2InstanceCustomPolicy | RunnerCustomPolicy
CloudFormation GHES | GithubEnterpriseUrl | Migrate to Terraform/OpenTofu with github_enterprise_url
CloudFormation networking | NetworkingStack=external plus ExternalVpc* | Use Terraform/OpenTofu for existing VPCs
CloudFormation VPC tuning | VpcCidrBlock, VpcEndpoints, NatGateway*, flow-log params | Built-in networking is now fixed; no per-stack tuning
Stack-level SSH admins | DefaultAdmins | Use SSM for privileged instance access
Legacy disk compatibility | disk=default, disk=large, stack default disk knobs | Explicit volume=...
Prometheus metrics endpoint | ServerPassword and /metrics scraping | OTLP export with OtelExporterEndpoint and OpenTelemetry
Permission boundaries in CloudFormation | DefaultPermissionBoundaryArn | Manage outside CloudFormation or use Terraform permission_boundary_arn

Control-plane cost estimates

The table below estimates the v3 control-plane cost for a few migration scenarios in us-east-1. It is not a full CI cost forecast: runner EC2 instances, cache/storage usage, CloudWatch log ingestion, NAT gateways, WAF, VPC endpoints, and data transfer are excluded.

Assumptions:

  • 30-day month.
  • ~4 GitHub webhooks received per job on average.
  • One always-on Flex ECS/Fargate task (arm64 task).
  • Example app size: small for 1,000 jobs/day, medium for 10,000 jobs/day, and xhigh for 50,000 jobs/day and above.
  • Public ingress uses API Gateway REST API plus the 256 MB Lambda path.
  • Lambda estimate assumes 100 ms average duration per webhook.
  • Queue estimate assumes 3 standard SQS API requests per webhook.
  • Unit prices are current public on-demand prices from the AWS Fargate, API Gateway, Lambda, and SQS pricing pages.

Average jobs/day | Webhooks/month | Example app size | ECS/Fargate | API Gateway | Lambda ingress | SQS | Estimated monthly control plane
1,000 | 120,000 | small | $7.11 | $0.00 | $0.00 | $0.00 | $7.11
10,000 | 1,200,000 | medium | $7.11 | $0.70 | $0.04 | $1.04 | $8.89
50,000 | 6,000,000 | xhigh | $14.22 | $17.50 | $1.00 | $6.80 | $39.52
100,000 | 12,000,000 | xhigh | $14.22 | $38.50 | $2.20 | $14.00 | $68.92

Treat these as planning numbers. The actual bill will move with webhook volume, logging verbosity, runner runtime, cache behavior, and any networking choices you add around the default stack.
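
To adapt the estimate to your own job volume, the request volumes behind the table can be derived from the stated assumptions. The sketch below is illustrative only: the function and constant names are hypothetical, and it deliberately computes volumes rather than dollars, because unit prices and free-tier allowances are not listed on this page and should be read from the AWS pricing pages.

```python
from dataclasses import dataclass

# Assumptions copied from the list above.
DAYS_PER_MONTH = 30            # 30-day month
WEBHOOKS_PER_JOB = 4           # ~4 GitHub webhooks per job
SQS_REQUESTS_PER_WEBHOOK = 3   # 3 standard SQS API requests per webhook
LAMBDA_MS_PER_WEBHOOK = 100    # 100 ms average Lambda duration
LAMBDA_MEMORY_MB = 256         # 256 MB Lambda path


@dataclass(frozen=True)
class MonthlyVolume:
    webhooks: int
    sqs_requests: int
    lambda_gb_seconds: float


def monthly_volume(jobs_per_day: int) -> MonthlyVolume:
    """Derive monthly request volumes from an average jobs/day figure."""
    webhooks = jobs_per_day * WEBHOOKS_PER_JOB * DAYS_PER_MONTH
    # GB-seconds = invocations * seconds * GB of configured memory.
    gb_seconds = webhooks * LAMBDA_MS_PER_WEBHOOK * LAMBDA_MEMORY_MB / (1000 * 1024)
    return MonthlyVolume(
        webhooks=webhooks,
        sqs_requests=webhooks * SQS_REQUESTS_PER_WEBHOOK,
        lambda_gb_seconds=gb_seconds,
    )


def control_plane_total(fargate: float, api_gateway: float,
                        lambda_ingress: float, sqs: float) -> float:
    """Sum the four per-service columns into the monthly total."""
    return round(fargate + api_gateway + lambda_ingress + sqs, 2)


if __name__ == "__main__":
    for jobs in (1_000, 10_000, 50_000, 100_000):
        print(jobs, monthly_volume(jobs))
```

Multiplying these volumes by the current unit prices, after subtracting any free tier that applies to your account, gives the per-service columns; `control_plane_total` then reproduces the last column.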

What changed in CloudFormation

Removed parameters

These CloudFormation parameters were removed in v3 and must be deleted from any saved stack-update workflow:

  • GithubEnterpriseUrl
  • ECInstanceDetailedMonitoring
  • VpcCidrSubnetBits
  • VpcFlowLogFormat
  • VpcFlowLogS3BucketArn
  • VpcFlowLogRetentionInDays
  • RunnerLargeDiskSize
  • RunnerLargeVolumeThroughput
  • RunnerDefaultDiskSize
  • RunnerDefaultVolumeThroughput
  • DefaultAdmins
  • ServerPassword
  • AppEc2QueueSize
  • AppCPU
  • AppMemory
  • NetworkingStack
  • ExternalVpcId
  • ExternalVpcPublicSubnetIds
  • ExternalVpcPrivateSubnetIds
  • ExternalVpcSecurityGroupId
  • VpcCidrBlock
  • DefaultPermissionBoundaryArn
  • AppDebug
  • EnableDashboard
  • Ec2LogRetentionInDays
  • SqsQueueOldestMessageThresholdSeconds
  • AppAlarmDailyMinutes
  • VpcEndpoints
  • NatGatewayAvailability
  • NatGatewayElasticIPCount
  • AlertTopicSubscriptionHttpsEndpoint

Renames and replacements

  • Replace EC2InstanceCustomPolicy with RunnerCustomPolicy.
  • Replace old app CPU/memory/queue tuning with AppSize.
  • Replace AppAlarmDailyMinutes with AppBudgetDailyUsd.
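
If your stack parameters live in a saved file or script, the removed and replaced names above can be checked mechanically before you attempt an update. The following is a minimal sketch, not RunsOn tooling: `audit_parameters` and the advice strings are hypothetical, and the input is assumed to be a plain mapping of parameter names to values.

```python
# Names copied from the "Removed parameters" list above.
REMOVED_V3_PARAMETERS = frozenset("""
GithubEnterpriseUrl ECInstanceDetailedMonitoring VpcCidrSubnetBits
VpcFlowLogFormat VpcFlowLogS3BucketArn VpcFlowLogRetentionInDays
RunnerLargeDiskSize RunnerLargeVolumeThroughput RunnerDefaultDiskSize
RunnerDefaultVolumeThroughput DefaultAdmins ServerPassword AppEc2QueueSize
AppCPU AppMemory NetworkingStack ExternalVpcId ExternalVpcPublicSubnetIds
ExternalVpcPrivateSubnetIds ExternalVpcSecurityGroupId VpcCidrBlock
DefaultPermissionBoundaryArn AppDebug EnableDashboard Ec2LogRetentionInDays
SqsQueueOldestMessageThresholdSeconds AppAlarmDailyMinutes VpcEndpoints
NatGatewayAvailability NatGatewayElasticIPCount
AlertTopicSubscriptionHttpsEndpoint
""".split())

# Replacements from the "Renames and replacements" list above.
REPLACED_BY = {
    "EC2InstanceCustomPolicy": "RunnerCustomPolicy",
    "AppCPU": "AppSize",
    "AppMemory": "AppSize",
    "AppEc2QueueSize": "AppSize",
    "AppAlarmDailyMinutes": "AppBudgetDailyUsd",
}


def audit_parameters(parameters: dict) -> dict:
    """Map each stale v2 parameter name to a short piece of advice."""
    findings = {}
    for name in sorted(parameters):
        if name in REPLACED_BY:
            findings[name] = f"replace with {REPLACED_BY[name]}"
        elif name in REMOVED_V3_PARAMETERS:
            findings[name] = "removed in v3, delete it"
    return findings


if __name__ == "__main__":
    # Placeholder values; only the parameter names matter here.
    saved = {"EmailAddress": "<value>", "AppCPU": "<value>", "VpcEndpoints": "<value>"}
    for name, advice in audit_parameters(saved).items():
        print(f"{name}: {advice}")
```

Note that AppSize and AppBudgetDailyUsd need new values chosen by you; they cannot be derived automatically from the old settings.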

Fixed built-in behavior in v3

If you still need any of the removed tuning surface or infrastructure flexibility below, switch that install to the Terraform / OpenTofu module instead of trying to preserve the old CloudFormation shape.

The built-in CloudFormation path is simpler and less tunable now:

  • Embedded networking is always used.
  • The embedded VPC CIDR is fixed to 10.1.0.0/16.
  • The built-in topology is fixed to two AZs.
  • Only the free S3 gateway VPC endpoint is created by the built-in template.
  • EC2 and ECR interface VPC endpoints are no longer created by CloudFormation. Use Terraform/OpenTofu or external infrastructure if you need those PrivateLink endpoints.
  • Built-in VPC flow-log tuning is gone.
  • Built-in CloudWatch dashboard creation is always on.
  • EC2 instance log retention is fixed at 7 days.
  • Built-in SQS queue-age alarms are gone.
  • The required EmailAddress path is always used directly.
  • Additional HTTPS SNS subscription wiring is no longer created from template input.
  • EnableAdminRoutes controls whether the public admin/setup routes are exposed.

Outputs that disappeared

If you automated around these outputs in v2, update that automation before migrating:

  • RunsOnVpcCidrBlock
  • RunsOnPublicRouteTableId
  • RunsOnPrivateRouteTable1Id
  • RunsOnPrivateRouteTable2Id
  • RunsOnBootstrapTag
  • RunsOnPrivate
  • RunsOnService
  • RunsOnProvisioningTable

EphemeralRegistryUri was renamed to RunsOnEphemeralRegistryUri.
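
One way to find automation that still references these outputs is a plain text scan of your scripts and pipeline definitions. The sketch below is an illustration under that assumption; `scan_for_stale_outputs` is a hypothetical helper, and it only matches literal output names, so references built dynamically will be missed.

```python
import re

# Output names copied from the list above.
REMOVED_OUTPUTS = (
    "RunsOnVpcCidrBlock",
    "RunsOnPublicRouteTableId",
    "RunsOnPrivateRouteTable1Id",
    "RunsOnPrivateRouteTable2Id",
    "RunsOnBootstrapTag",
    "RunsOnPrivate",
    "RunsOnService",
    "RunsOnProvisioningTable",
)

# Match the old name only when it is not already prefixed with "RunsOn".
_OLD_REGISTRY_OUTPUT = re.compile(r"(?<!RunsOn)EphemeralRegistryUri\b")


def scan_for_stale_outputs(text: str) -> list:
    """Return (line number, output name, note) for each stale reference."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name in REMOVED_OUTPUTS:
            # Word boundaries stop RunsOnPrivate from matching inside
            # RunsOnPrivateRouteTable1Id.
            if re.search(rf"\b{name}\b", line):
                findings.append((lineno, name, "removed in v3"))
        if _OLD_REGISTRY_OUTPUT.search(line):
            findings.append(
                (lineno, "EphemeralRegistryUri", "renamed to RunsOnEphemeralRegistryUri")
            )
    return findings


if __name__ == "__main__":
    sample = "table=RunsOnPrivateRouteTable1Id\nregistry=EphemeralRegistryUri\n"
    for finding in scan_for_stale_outputs(sample):
        print(finding)
```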

What changed in Terraform / OpenTofu

Terraform moved to the same simpler model:

  • New module consumption should use the explicit Flex submodule source: runs-on/runs-on/aws//flex.
  • app_size replaces app_cpu, app_memory, and ec2_queue_size.
  • app_budget_daily_usd replaces the daily minutes alarm path.
  • detailed_monitoring_enabled, default_admins, legacy disk defaults, app_debug, enable_dashboard, queue-age alarms, optional HTTPS alert subscription, and s3_encryption_key_id are gone.
  • GHES, existing VPCs, and IAM permission-boundary cases remain supported there, which is why Terraform is now the escape hatch for installs that outgrow the built-in CloudFormation path.
  • public_ingress_web_acl_arn is the Terraform path for a user-managed public ingress Web ACL.
  • Root outputs are grouped by subsystem. Expose values such as module.runs_on_flex.ingress.url and module.runs_on_flex.stack.getting_started from your own root module if you want them printed after terraform apply.
  • Cache storage now uses SSE-KMS with the AWS-managed S3 key. For EBS encryption with customer-managed KMS keys, make sure the key policy trusts the generated RunsOn service role.

Runtime behavior changes

Most v3 runtime behavior is compatible with v2 workflow labels, but a few operational details changed:

  • Completed runner instances are finalized faster, and housekeeping turns runs-on-terminate=true into EC2 termination more quickly.
  • Fresh queued workflow jobs are launchable immediately once RunsOn observes them. Delayed launch timestamps are now reserved for retry and recovery paths.
  • Counted launch retries now use staged backoff: 45s, 2m, 5m, 10m, and 20m, with the sixth counted failure becoming terminal.
  • Manual GitHub reruns no longer force on-demand capacity by themselves. They follow the normal spot/on-demand policy unless RunsOn itself triggered the rerun after a spot interruption.
  • Inline OTEL job summaries graph disk and network counters as per-interval rates, making short spikes easier to see.
  • The scrape-based Prometheus integration is removed. Current v3 installs do not expose /metrics; use OTLP export through OtelExporterEndpoint instead.
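
To reason about how long a job can sit in launch retries, the staged backoff described above can be modeled as a lookup table. This is a sketch of the schedule as described on this page, not RunsOn's implementation: it assumes the first counted failure waits 45s, the fifth waits 20m, and the sixth is terminal. The function name is hypothetical.

```python
from datetime import timedelta
from typing import Optional

# Delays from the staged backoff described above: 45s, 2m, 5m, 10m, 20m.
BACKOFF_SCHEDULE = (
    timedelta(seconds=45),
    timedelta(minutes=2),
    timedelta(minutes=5),
    timedelta(minutes=10),
    timedelta(minutes=20),
)


def retry_delay(counted_failures: int) -> Optional[timedelta]:
    """Return the wait before the next launch attempt, or None when terminal.

    Assumption: the Nth counted failure uses the Nth delay in the schedule,
    and the sixth counted failure ends the retry sequence.
    """
    if counted_failures < 1:
        raise ValueError("counted_failures must be at least 1")
    if counted_failures > len(BACKOFF_SCHEDULE):
        return None
    return BACKOFF_SCHEDULE[counted_failures - 1]


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, retry_delay(n))
    # Upper bound on time spent waiting between attempts.
    print("total backoff:", sum(BACKOFF_SCHEDULE, timedelta()))
```

Under this reading, the backoff waits alone add up to 37 minutes 45 seconds before a launch is declared failed, not counting the time each attempt itself takes.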

Repo config and label changes

Replace disk=... with volume=...

Do not carry forward legacy disk compatibility assumptions into v3.

  • disk=default no longer means “use the stack default disk size”.
  • disk=large no longer means “use the legacy large disk preset”.
  • .github/runs-on.yml may still contain disk, but RunsOn now warns that it is deprecated and ignored.

Use explicit volume values instead:

  • disk=large -> volume=80gb
  • custom size/perf -> volume=80gb:gp3:1000mbps:4000iops
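
For repositories with many workflows, the label replacement can be scripted. The sketch below is a hypothetical helper that rewrites only the mapping given above (disk=large to volume=80gb) and reports every other disk=... occurrence for manual review, since this page does not define a direct replacement for disk=default.

```python
import re

# Only the mapping stated in this guide; anything else needs a human decision.
LEGACY_DISK_REPLACEMENTS = {"large": "volume=80gb"}

_DISK_LABEL = re.compile(r"\bdisk=([A-Za-z0-9_-]+)")


def migrate_disk_labels(text: str):
    """Return (rewritten text, list of disk=... labels left untouched)."""
    unresolved = []

    def _swap(match):
        replacement = LEGACY_DISK_REPLACEMENTS.get(match.group(1))
        if replacement is None:
            # Keep the original label and flag it for manual review.
            unresolved.append(match.group(0))
            return match.group(0)
        return replacement

    return _DISK_LABEL.sub(_swap, text), unresolved


if __name__ == "__main__":
    # Illustrative label strings; the surrounding label names are placeholders.
    sample = "labels: example-runner/disk=large\nlabels: example-runner/disk=default\n"
    new_text, todo = migrate_disk_labels(sample)
    print(new_text)
    print("needs manual review:", todo)
```

Review the output before committing, and pick an explicit size such as volume=80gb:gp3:1000mbps:4000iops wherever a workflow depended on custom disk performance.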

Stack-level SSH admins are gone

If you relied on DefaultAdmins, move your privileged access workflow to SSM. Repository-level admins in .github/runs-on.yml still exist, but they are no longer layered on top of a stack-wide default-admin list.

Suggested rollout

For most teams, a blue-green migration is safer than an in-place update:

  1. Deploy a fresh v3 stack.
  2. Register the new GitHub App and give it access to the repositories you want to test.
  3. Suspend the old GitHub App.
  4. Pause the old App Runner service.
  5. Test representative workflows against the new stack.

If everything works as expected, keep the old stack around briefly as a rollback point, then delete it. If you need to roll back, suspend the new GitHub App, delete the new stack, unsuspend the old GitHub App, and resume the old App Runner service.

Treat RunsOn stacks as cattle, not pets. The clean migration path is to create a new stack, test it, and delete the old one once you trust the replacement.

Use the installation guide, stack configuration, job labels, and repo config pages as the v3 source of truth during that process.

What to verify after cutover

  • Jobs launch on the expected environment and runner family.
  • Any previous disk=... usage has been replaced with volume=....
  • Any CloudFormation automation no longer expects removed parameters or outputs.
  • Custom AMI, registry, EFS, and OTEL behavior still matches your expectations.
  • Cost monitoring now uses AppBudgetDailyUsd and your chosen cost-allocation tag is activated in AWS Billing.

Need the old docs?

The curated RunsOn v2 reference pages stay available for teams that need the historical semantics of v2 parameters, outputs, and labels while they migrate.