We at Togai use GitHub for source code management. We heavily use the Pull Request (PR) feature - PRs were pretty much the only GitHub feature we used. The next feature we have started to use heavily is GitHub Actions :D
We are still at a nascent stage in terms of Continuous Integration (CI) and Continuous Delivery (CD). I'm looking forward to the time when we reach Continuous Deployment for some of our services - where a `git push` pushes the code, triggers the build, runs all the checks (all kinds of tests) and then deploys the code to production, all automatically :D
I'm not going to go into the details of why CI/CD etc. This post is about how we chose GitHub Actions as our CI/CD platform and how we use it as much as possible as a general platform for any code-related automation
In the past I have used quite a few CI/CD systems, not all of them to the same level, but to a good extent, like running basic jobs. Some of the CI/CD systems I have used are GitLab CI/CD, CircleCI, Travis CI, Jenkins, GoCD, Prow CI/CD and many internal build systems, for example a few during my previous stint at VMware
The thing is - I wanted a good CI/CD system - one that provides a good experience and has all the basic features anyone would want, and even a few complex ones. Since we already use GitHub as our source code management platform, I chose GitHub Actions and quickly implemented it - it didn't take much time to write the workflow (pipeline) YAML files (config files), which were mostly the same for most of our services. I later optimized the workflow YAML files to reduce duplication and reuse code, and GitHub has features for that too (Reusing workflows). Anyway, the experience was great - implementing it and using it - it was a breeze until someone sent me this picture on Slack -
The developers were saying that GitHub Actions was not working anymore. I realized that I had to face the problem that I had put in the back seat 💺 long ago. So, at Togai we use the TogaiHQ GitHub Organization and we have many private git repositories under it. GitHub Actions is free and unlimited only for open-source (public) repositories, and not for private repositories under GitHub Organizations (because - of course, they have to earn 😅). We at Togai are still on the GitHub Free plan. This is because we are still small, don't need the paid features for now and can live without them. Also, when you are running a business, a newly funded startup, you have very limited funding, and we all have limited resources (time, money, energy) in general, so you try to optimize and reduce wastage. So, we are also trying to be very frugal, or else we will be burning 🔥 too much money 💰 💴 💵 💶 💷 💸 🤑 when we should be trying to get more customers and make lots of profit.
So yeah, GitHub Free Plan, along with private Git repositories under our GitHub Organization. Based on the current GitHub Actions pricing, we have 2000 free minutes per month across all the private repositories under the GitHub Organization. 2000 free minutes is around ~33.33 free hours. It was good enough for us for some time - even though we were pushing to different branches a lot of times, and GitHub Actions workflows get triggered on `push` to some git branches, like `main`, and on `push` to PR branches. PRs get updated so often, and they can also get overwritten (`git push --force`), all of which trigger runs in GitHub Actions. I remember one of our new joiners, back when he was an intern, saying "Maybe we should not run for PRs. We just have 2000 minutes 😅". And he was right - running for PRs is a high cost, given the number of changes happening in PRs. But the thing is - it's an important thing to do. One cannot always expect developers to run build, test, lint etc. locally before pushing to the PR branch. For the people who do it, great; for the others, their code still has to go through some basic checks, and this also covers people who missed running tests by mistake. Also, running on GitHub Actions managed runners means pristine environments. Okay, we are now entering the territory of why CI/CD systems are a good idea 💡 compared to doing the build, test, release, deploy etc. all on the local system
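To make that concrete, the trigger section of such a workflow looks roughly like this - a generic sketch, not our exact config -

```yaml
on:
  push:
    branches:
      - main        # every push / merge to main triggers a workflow run
  pull_request:     # every update to a PR branch (including force pushes) triggers a run too
```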
Let's get back to the main thing - so, running workflows for PRs was important. And the constraint was that we had a fixed limit on the number of workflow minutes. One optimization that can be done is: just before someone approves a PR, the PR's latest code should have gone through the workflow (test etc.) at least once and passed. Maybe this can be introduced as a manual trigger, rather than an automated trigger after every `git push`, which is what causes a lot of workflows to run. This is a small optimization one can do. The Kubernetes project's GitHub repositories, for example, use the Prow CI/CD system, which accepts commands through GitHub comments - comments with some special characters are treated as commands to Prow - and they use this to run tests through a manual trigger. See this here for an example
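GitHub Actions has a built-in manual trigger too, `workflow_dispatch`, which lets someone start the workflow on a chosen branch from the Actions tab or via the API instead of on every push (and the `issue_comment` event can be used to build Prow-style comment commands on top of it). A minimal sketch of the manual-trigger idea -

```yaml
on:
  workflow_dispatch:   # manual trigger - run from the Actions tab or via the GitHub API on a chosen branch
  push:
    branches:
      - main           # keep automatic runs for main; drop or narrow automatic runs for PR branches
```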
Anyways, so, that's one idea 💡 for the future - to reduce the number of workflow runs, and hence the number of minutes used, so that we run workflows only when needed and avoid unnecessary runs. But it's hard to define and figure out which runs are unnecessary. For example, if I'm sure I'm going to force push to a branch at some point, I would rather wait till I do everything, including the force push, and only then trigger the workflow run
Even if we did the optimizations, we were going to hit the 2000-minute limit at some point. So, leaving the optimizations aside, we wanted to run GitHub Actions beyond the 2000 free minutes we got. We also wanted to use the 2000 free minutes per month, because, why not? :P So, we got into researching what we could do, given we had already decided we would self-host GitHub Actions Runners. I decided to stick with GitHub Actions because
The experience of using it is great! And smooth
It's simple and easy to use and understand
I didn't wanna maintain a complete CI/CD system, which is huge
The last reason is an important point. Why don't I wanna maintain one, you ask? A classic CI/CD system will have servers - the control plane - that assign jobs to workers, and it will have workers that run the jobs. There will be a mechanism to implement and use a cache to speed up builds, test runs etc.; there will be a mechanism to interact with the version control system, which is GitHub in our case; and it will also have artifact storage - S3 is a popular example. There's also the maintenance of logs, and one has to consider data retention - for logs, statuses of very old pipelines, artifacts etc. Or else they will live forever and make the system's storage bigger and bigger and harder to maintain. Also, who will pay for all that storage? There's compute, which we have to pay for too, and then there's network, which is usually priced high in some cases - for example for using the Internet; maybe providers won't charge for private network usage. Also, don't forget that you need a database for this CI/CD system - either you run it, or you get a managed instance. Both are gonna be costly - either self-hosting and management cost, or Database as a Service cost.
So, when you maintain a CI/CD system, you have to do a lot. Even if the system is built by someone else - for example, in my last stint at VMware we tried to self-host and run the Kubernetes-based Prow CI/CD in our infrastructure, and I remember how much of a struggle it was, as Prow was complex. So, even if you have open source and/or free CI/CD software, running it is not gonna be a piece of cake, unless it was built with operations in mind - system operations, maintenance (upgrades), scaling up and down, configuration management etc. - and those operations were made easy for a single human or a small team of 3 or 4 developers
I didn't wanna do so much maintenance work. And I'm the only one working on Infrastructure at Togai. So I chose GitHub Actions, which looked like a good idea
So, now, with 2000 minutes per month, and the plan of self-hosting GitHub Actions Runners, what are the cost implications? And is it better than using the GitHub-managed runners? These are the questions I had in mind
So, I was checking out the cost of Linux GitHub-managed runners, which we were using, and saw that it's $0.008 per minute. At first glance, this may look pretty cheap. For example, if we were to use 4,000 minutes of workflow runs in a month, the first 2,000 minutes would be free, and the remaining 2,000 minutes would cost $16. The calculation looks like this -

2000 x $0.008 = $16
That seems like a small cost for one month. But let's not forget, CI/CD systems are something developers generally use often. As we scale at Togai, code will be pushed more often, and there will probably be more services/microservices in the future - and hence more git repositories, as we don't follow a monolithic repository concept as of now. On the whole, we would have more code and more things happening, leading us to use more CI/CD, meaning more minutes used. Also, in the first month we used GitHub Actions, we reached 2000 minutes in two-thirds of the month, and the next month we reached 2000 minutes in one-third of the month. We can't say the same for the next month - it could be more, or maybe even less. But in the long run, I think it will be more, as I said before - as we scale, more code etc. So, more minutes means more cost for GitHub-managed runners
Now, what's the cost implication of using self-hosted GitHub Actions runners? It's free. That's hard to believe, because nothing is usually free 😅 . So, what is free here? I understand I am running my own self-hosted GitHub Actions runner somewhere - the cloud, my local laptop, some on-premise data center or a private cloud etc. - and I pay the cost of running the machine that powers the GitHub Actions runner software. So I'm paying the compute, network and storage costs for the machine running the self-hosted GitHub Actions runner. What about the GitHub Actions features - the Graphical User Interface (GUI) on the GitHub website, the storing of logs - what about all of that? Looks like it's all free! 😳 So, this means that, in the case of GitHub Actions, the control plane is free, and is hosted and managed by GitHub. The control plane tells runners (basically workers) what to run; if the runners are managed by GitHub, you pay for them. If the runners are yours, you don't pay GitHub a dime. But the GitHub Actions control plane, GUI and other related niceties come along for free, and you just pay to run your self-hosted runners. How cool is that? 😁😄😀 To know more about the exact cost of running GitHub Actions Self-Hosted Runners on your own AWS Infrastructure, check the end of this blog post
Let's see how we have implemented self-hosted GitHub Action runners at Togai and the different iterations that happened
I was thinking about how to go about the implementation. I had a few ideas. I had seen one interesting setup at VMware for an open-source project where we wanted powerful machines, and instead of using GitHub's managed large runners, we were running our self-hosted runners on AWS EC2 instances. GitHub's managed large runners are a pretty new feature and seem a bit costly too 😅. I don't know why GitHub-managed large runners weren't chosen at VMware - maybe they weren't even known/considered, or maybe they weren't available at the time. Anyways, the open source project where we needed powerful machines for CI was Tanzu Community Edition, and one of the release engineers in my team wrote a small service in Golang which would receive webhook calls whenever there was a workflow run; the service would then create self-hosted runners on-demand on AWS using AWS APIs, and kill the runner once the workflow was complete - regardless of whether it succeeded or failed
I loved this idea of on-demand self-hosted runners. On-demand as in - create them only when you need them, and terminate them once you're done with them.
I was planning on using this long-running service behind the scenes at Togai too, but I realized that I'd have to open up an endpoint, write an API, put access control around it (Authentication, and maybe Authorization too) and then have rate limiting and IP blocklisting too, to avoid any spam from the Internet. None of this was done at VMware - there, I think they just used a hardcoded key for authentication between the repository webhook caller and the API
Given the complexity of the above solution, I put it off for later. So, the first thing I did was look at other solutions for a quick implementation. There were some recommendations from GitHub themselves, which included two big projects. Since we aren't using Kubernetes as of now, I chucked out the project based on Kubernetes. We do plan to use Kubernetes shortly, so we might come back to this decision and revisit it to check if we need to change things. The other project was based on Terraform, which we do use, but the architecture seemed pretty complex, with Lambda and a lot of other services involved, and I wasn't sure about the cost of that infrastructure. Something to come back to and revisit. I was looking for a simple solution, as simple as possible. One of the prominent solutions I noticed was - https://github.com/machulav/ec2-github-runner
I quickly used https://github.com/machulav/ec2-github-runner and was able to see that the self-hosted runners were getting created on demand and then running tasks and then getting stopped, similar to the idea 💡 mentioned in the repo - on how it's supposed to be used
Note that when I say on-demand instance, I'm talking about an AWS On-Demand EC2 Instance. Other EC2 purchasing options are Reserved Instances (RI) and Spot Instances. When I say on-demand runner, or creating runners on-demand, I'm referring to the ability to run runners on-demand - whenever you need them and only when you need them - and then destroy them once they are used. I wanted to keep that difference clear, or else it could get confusing
Let me explain how the self-hosted runner works, and the concept behind it. GitHub manages the GitHub Actions control plane and provides an agent software to run on the worker nodes, which run the CI/CD tasks. This agent is available publicly as open-source software over here - https://github.com/actions/runner. This agent is the GitHub Actions Runner software. I'll refer to it as the runner software, and to the machine running it as the runner machine
Initially, since we had used up all the free minutes in a month, we ran a long-running runner machine at the TogaiHQ GitHub Organization level, which could be used by any of the repositories under the TogaiHQ GitHub Organization
The process that started and stopped EC2 instances (servers) with self-hosted runners on-demand was running on this long-running runner machine. In the future though, we expect to use the free 2000 minutes for the process that starts and stops the self-hosted runners, and also for any small automation work that takes a few moments or a minute or two. Once those 2000 minutes are over, we will go back to using long-running self-hosted runners for running small automation work and for starting and stopping self-hosted runners.
Initially, we used EC2 instances in a private network as GitHub runner machines, and these machines had access to the Internet, and hence to the GitHub APIs, to connect with the GitHub Actions control plane. Internet access was possible through an AWS NAT gateway. While discussing this with my CTO, he mentioned that the cost of running an AWS NAT gateway was high - because it's a managed service running NAT software to do NAT (Network Address Translation). Look below for the pricing of the NAT gateway
As of this writing, the NAT gateway pricing is available at https://aws.amazon.com/vpc/pricing/. Screenshot below -
So, for this reason, I removed the usage of NAT gateway from the architecture and instead assigned public IPs to the EC2 instances running the GitHub Actions runner, so that they have access to the Internet.
You can refer to all the Terraform Code and GitHub Actions Workflow YAML code over here - https://github.com/karuppiah7890/github-actions-self-hosted-runner-terraform. It's all standalone and will work by itself ideally
The repository also contains the architecture diagram and sequence diagrams in code form and image form. Yes, code form - diagram as code ;) Let me still put the images over here
AWS Infrastructure setup diagram -
GitHub Actions Self-Hosted Runner Working Sequence Diagram:
The way it works now is it's a three-step process
Step 1: Start AWS EC2 GitHub Self-Hosted Runner instance. This process runs on a GitHub Hosted VM
Step 2: Run the actual CI/CD job on the AWS EC2 GitHub Self-Hosted Runner instance
- Jobs like lint, build, test, release, publish etc
Step 3: Stop the AWS EC2 GitHub Self-Hosted Runner instance. This process runs on a GitHub Hosted VM
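Put together as a workflow, the three steps look roughly like this. It's a sketch based on the usage documented in the ec2-github-runner repo's README - the AMI, subnet, security group, secret names and the build command are placeholders, so check the repo for the exact, up-to-date inputs -

```yaml
name: CI on self-hosted EC2 runner

on:
  pull_request:

jobs:
  start-runner:
    # Step 1: start the EC2 runner instance - runs on a GitHub-hosted VM
    runs-on: ubuntu-latest
    outputs:
      label: ${{ steps.start-ec2-runner.outputs.label }}
      ec2-instance-id: ${{ steps.start-ec2-runner.outputs.ec2-instance-id }}
    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ap-south-1
      - name: Start EC2 runner
        id: start-ec2-runner
        uses: machulav/ec2-github-runner@v2
        with:
          mode: start
          github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
          ec2-image-id: ami-0123456789abcdef0        # placeholder - custom AMI with the runner software
          ec2-instance-type: t3a.large
          subnet-id: subnet-0123456789abcdef0        # placeholder
          security-group-id: sg-0123456789abcdef0    # placeholder

  ci-job:
    # Step 2: run the actual CI/CD job on the self-hosted EC2 runner
    needs: start-runner
    runs-on: ${{ needs.start-runner.outputs.label }}
    steps:
      - uses: actions/checkout@v3
      - name: Build and test
        run: ./gradlew build test   # placeholder for lint / build / test / release / publish

  stop-runner:
    # Step 3: stop the EC2 runner instance - runs on a GitHub-hosted VM,
    # even if the CI job failed or was cancelled
    needs: [start-runner, ci-job]
    if: ${{ always() }}
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ap-south-1
      - name: Stop EC2 runner
        uses: machulav/ec2-github-runner@v2
        with:
          mode: stop
          github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
          label: ${{ needs.start-runner.outputs.label }}
          ec2-instance-id: ${{ needs.start-runner.outputs.ec2-instance-id }}
```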
Now, there were quite a few issues and PRs in the https://github.com/machulav/ec2-github-runner/ repo, and some of them made sense. For example, the ability to use spot EC2 instances - https://github.com/machulav/ec2-github-runner/issues/5 - and there was a PR for it too - github.com/machulav/ec2-github-runner/pull/62. Noticing some of these issues, for example https://github.com/machulav/ec2-github-runner/issues/128 and more, I went ahead and created a fork of the repo here - github.com/karuppiah7890/ec2-github-runner. You can see the different changes I made here - https://github.com/karuppiah7890/ec2-github-runner/commits - like ephemeral runners, assigning a public IP to the runner instances, allowing spot EC2 instances to be used given a parameter, and some more small stuff
It wasn't / isn't all rosy though 😅 Let me talk about the different issues we have faced so far.
One of the prominent issues is the job not getting picked up for a long time. We still haven't found the root cause of this issue. One guess we have is that the runner software installed in the EC2 instance is not compatible with the GitHub Actions Control Plane API. But this issue happens only at times - so, incompatibility issues only at times. I know that sounds weird. I'll post here if and when I find the root cause. One solution to this is to always install the latest GitHub Actions Runner software whenever the EC2 instance is created on demand, using the user data script. The code for getting the latest version can be injected here - https://github.com/karuppiah7890/ec2-github-runner/blob/d4c1e82c47b704ece2166bffa05b6316816962a8/src/aws.js#L15. It hasn't been done yet, as of this writing ✍️
The other issue was that, when using spot instances, sometimes spot capacity was not available, and this was clear from the error shown in the logs. An example -
The solution for this is to retry - and create a spot instance in another region or availability zone
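We haven't automated such a retry yet, but one way to sketch it, inside the same `start-runner` job shown earlier (so AWS credentials are already configured), is a fallback step that only runs when the first spot attempt fails. The `use-spot-instance` input name is only illustrative of the spot parameter in our fork, and the subnet, AMI, security group and secret values are placeholders -

```yaml
    outputs:
      # Take the outputs from whichever attempt succeeded
      label: ${{ steps.spot-1a.outputs.label || steps.spot-1b.outputs.label }}
      ec2-instance-id: ${{ steps.spot-1a.outputs.ec2-instance-id || steps.spot-1b.outputs.ec2-instance-id }}
    steps:
      # ... configure AWS credentials as in the earlier sketch ...
      - name: Start EC2 spot runner in ap-south-1a
        id: spot-1a
        continue-on-error: true        # don't fail the job if spot capacity is unavailable here
        uses: karuppiah7890/ec2-github-runner@main   # our fork; pin to the ref you actually use
        with:
          mode: start
          use-spot-instance: true      # illustrative input name - check the fork for the actual one
          github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
          ec2-image-id: ami-0123456789abcdef0
          ec2-instance-type: t3a.large
          subnet-id: subnet-aaaa1111   # subnet in availability zone ap-south-1a
          security-group-id: sg-0123456789abcdef0

      - name: Start EC2 spot runner in ap-south-1b (fallback)
        id: spot-1b
        if: ${{ steps.spot-1a.outcome == 'failure' }}   # retry only when the first attempt failed
        uses: karuppiah7890/ec2-github-runner@main
        with:
          mode: start
          use-spot-instance: true
          github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
          ec2-image-id: ami-0123456789abcdef0
          ec2-instance-type: t3a.large
          subnet-id: subnet-bbbb2222   # subnet in availability zone ap-south-1b
          security-group-id: sg-0123456789abcdef0
```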
There are also some possible flaws in this system. Since the creator and the destroyer of the on-demand EC2 instance is the workflow itself, it's possible that an instance gets created but doesn't get destroyed for some reason - for example, the destroy task doesn't run because the workflow was canceled or aborted. To avoid zombie EC2 instances hanging around - EC2 instances that keep running, are not being used and are just a waste - we decided to kill EC2 instances that get missed during termination. How did we do this? We simply defined a timeout for any running EC2 instance that's a GitHub Actions runner machine and killed (terminated) it once it went beyond the timeout. For example, the current timeout is 30 minutes. This is all done automatically by a small and simple tool that I wrote - https://github.com/karuppiah7890/ec2-killer - it finds EC2 instances running in a particular VPC with a particular security group and kills only them. We use a separate VPC to run just the GitHub Actions Self-Hosted Runners and keep them separate from other workloads - like our microservices, database (PostgreSQL), message broker (NATS), caching server (Redis) etc
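The actual clean-up is done by the ec2-killer tool linked above, but conceptually it boils down to something like the following scheduled check - a sketch of the idea, not the tool's real implementation; the VPC ID and secret names are placeholders -

```yaml
name: Kill zombie GitHub Actions runner instances

on:
  schedule:
    - cron: '*/15 * * * *'   # check every 15 minutes

jobs:
  kill-zombies:
    runs-on: ubuntu-latest
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.EC2_KILLER_AWS_ACCESS_KEY_ID }}       # placeholder secret names
      AWS_SECRET_ACCESS_KEY: ${{ secrets.EC2_KILLER_AWS_SECRET_ACCESS_KEY }}
      AWS_DEFAULT_REGION: ap-south-1
      RUNNER_VPC_ID: vpc-0123456789abcdef0    # the dedicated VPC for the self-hosted runners (placeholder)
    steps:
      - name: Terminate runner instances older than 30 minutes
        run: |
          # Instances launched before this cutoff are considered zombies
          cutoff=$(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%S)

          # List running instances in the runners' VPC and pick the ones past the timeout
          instance_ids=$(aws ec2 describe-instances \
            --filters "Name=vpc-id,Values=$RUNNER_VPC_ID" "Name=instance-state-name,Values=running" \
            --output json \
            | jq -r --arg cutoff "$cutoff" \
                '.Reservations[].Instances[] | select(.LaunchTime[:19] < $cutoff) | .InstanceId')

          if [ -n "$instance_ids" ]; then
            aws ec2 terminate-instances --instance-ids $instance_ids
          fi
```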
The killer requires some permissions. The IAM policy attached to the IAM user used for killing the zombie EC2 instances is -
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "ec2:TerminateInstances",
      "Resource": "arn:aws:ec2:ap-south-1:032794201922:instance/*"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "ec2:DescribeInstances",
      "Resource": "*"
    }
  ]
}
```
A note for people who are trying to implement this solution - ensure that the permissions for the GitHub token are appropriate and correct. You can find the token and permission details in the repo itself. The same goes for the AWS credentials and their access. Also, use an AMI with the appropriate software installed. Refer to the README here - https://github.com/karuppiah7890/github-actions-self-hosted-runner-terraform/blob/7e5334099e935432f540d4f029e239880293d709/github/README.md
Where is the code for the entire GitHub Actions Self-Hosted Runner implementation at Togai?
As I said, we use a fork of the https://github.com/machulav/ec2-github-runner repo. The fork has some subtle changes - fixes, a few upgrades, and a few new features that we need at Togai. It works well and we have seen it work in practice. The fork is here - https://github.com/karuppiah7890/ec2-github-runner/
For the Amazon Machine Image for the runners / EC2 instances, we have created a custom Amazon Machine Image on top of the Ubuntu 22.04 LTS base image, with `git`, `docker`, `jq`, `unzip` and `zip` installed. It also has the GitHub Actions Runner application installed.

- We are leaving off the `aws` CLI installation because the AWS CLI is a client of AWS and keeps changing to accommodate new AWS APIs, and we can't keep changing the Amazon Machine Image and releasing new images to all workflows for every new version of the AWS CLI. So, we just install the `aws` CLI as part of the workflow (a sketch of that step follows below). `git`, `docker`, `jq`, `unzip` and `zip` are less likely to change - especially for the features we use. But if they do change, we have to create a new image
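For completeness, the AWS CLI installation inside the workflow is just a small step like this - a sketch, assuming an x86_64 Linux runner where `curl`, `unzip` and password-less `sudo` are available -

```yaml
      - name: Install AWS CLI v2
        run: |
          # Download and install the latest AWS CLI v2 at job time, instead of baking it into the AMI
          curl -sSL "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o /tmp/awscliv2.zip
          unzip -q /tmp/awscliv2.zip -d /tmp
          sudo /tmp/aws/install
          aws --version
```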
Cost of running Self-Hosted GitHub Actions Runner on your own AWS Infrastructure? An example with Togai
The cost (with the calculation) of this whole setup at Togai is mentioned below, along with how it compares to the GitHub-hosted runner cost. We chose a particular AWS region and particular instance types to keep the cost low. The cost may vary based on which region and instance types you choose
So, let's answer the question - what’s the average per-day cost of the current setup?
Assuming 20 pipeline runs per day per repo, with each pipeline run 10 minutes long - that is, 200 pipeline minutes per day per microservice repository - and the egress (data out) data transfer estimated in detail below, the cost per day is around $5.22. After February 1, 2024, the cost per day is around $5.31. See below for the complete calculation
Note 1: We have 6 microservice repositories
Note 2: For the number of days per month - we use 30 when using it in the denominator in our calculation. And we use 31 when using it in the numerator in our calculations. This way, we get the upper bound value / higher value
AWS Resources used that have a price (Price as of 24th August 2023) and are not free:
| Instance name (type) | On-Demand hourly rate | On-Demand per-minute rate | vCPU | Memory | Storage | Network performance |
| --- | --- | --- | --- | --- | --- | --- |
| t3a.small | $0.0123 | $0.000205 | 2 | 2 GiB | EBS Only | Up to 5 Gigabit |
| t3a.medium | $0.0246 | $0.00041 | 2 | 4 GiB | EBS Only | Up to 5 Gigabit |
| t3a.large | $0.0493 | $0.000821666666667 | 2 | 8 GiB | EBS Only | Up to 5 Gigabit |
AWS EC2

The price per On-Demand Linux EC2 instance per hour in the Asia Pacific (Mumbai) Region (ap-south-1), along with instance details, for the instance types we currently use, is listed in the table above

- As of today, 24th August 2023, we use `t3a.small` instances for pipelines in 2 service repositories
  - Given 1 service, assuming 20 pipeline runs per day (so 20 EC2 runners), with each pipeline EC2 runner running for 10 minutes (600 seconds), the cost per day is = 20 x 10 x $0.000205 = $0.041. This is the cost for 200 pipeline minutes per day for the 1 service repository
  - We have 2 services here, so the cost per day is = $0.041 x 2 = $0.082
- As of today, 24th August 2023, we use `t3a.medium` instances for pipelines in 1 service repository
  - For the given service, assuming 20 pipeline runs per day (so 20 EC2 runners), with each pipeline EC2 runner running for 10 minutes (600 seconds), the cost per day is = 20 x 10 x $0.00041 = $0.082. This is the cost for 200 pipeline minutes per day for the 1 service repository
- As of today, 24th August 2023, we use `t3a.large` instances for pipelines in all ktor services in private repositories, which is 3 services
  - Given 1 service, assuming 20 pipeline runs per day (so 20 EC2 runners), with each pipeline EC2 runner running for 10 minutes (600 seconds), the cost per day is = 20 x 10 x $0.000821666666667 = $0.164333333333. This is the cost for 200 pipeline minutes per day for 1 service
  - We have 3 services here, so the cost per day is = 3 x $0.164333333333 = $0.493
AWS EBS (pricing from https://aws.amazon.com/ebs/pricing/)

- We use `gp2` General Purpose SSD Volumes
- The price is $0.114 per GB-month of provisioned storage in the `ap-south-1` AWS region
- We use 15GB for the volume that's attached to the instance
  - This 15GB volume is used as the boot disk which contains the Operating System (OS) from which we need to boot, and this volume is also used to store other data - the data of the CI/CD job
- Price we pay per month for 15GB storage = 15 x $0.114 = $1.71
- Price we pay per day for 15GB storage = $1.71 / 30 = $0.057
- Price we pay per minute for 15GB storage = $0.057 / (24 x 60) = $0.0000395833333333
- Price we pay per day for 1200 minutes (200 per repo per day, for 6 repos) of 15GB storage = 1200 x $0.0000395833333333 = $0.0475
AWS VPC

- Public IPv4 Address (with effect from February 1, 2024). From https://aws.amazon.com/vpc/pricing/ > `Public IPv4 Address` tab and from https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html#pricing
  - Hourly charge for an In-use Public IPv4 Address: $0.005
  - Hourly charge for an Idle Public IPv4 Address: $0.005
- We have 1 Public IP per pipeline
- Price we pay per minute for 1 Public IP = $0.005 / 60 = $0.0000833333333333
- Price we pay for 20 pipelines per repo per day, for 6 repos, with 10 minutes per pipeline and 1 Public IP per pipeline = 20 pipelines x 6 repos x (price of 1 Public IP per minute x 10 minutes) = 20 x 6 x $0.0000833333333333 x 10 = $0.1. This is the same as the price we would pay per day for 1200 minutes of using 1 public IP
Network

Accessing the Internet - Data Transfer Charges

- Data Transfer OUT From Amazon EC2 To Internet
  - AWS customers receive 100GB of data transfer out to the Internet free each month, aggregated across all AWS Services and Regions (except China and GovCloud). The 100 GB free tier for data transfer out to the internet is global and does not apply separately or individually to AWS Regions
  - For the first 10 TB / month, the cost is $0.1093 per GB in the `ap-south-1` AWS region
    - This cost is really high compared to the cost per GB in other regions, like the US regions
  - Surely we won't cross 10TB / month for now
- Will we cross the 100GB free limit? Yes. The data transfer out actions / upload actions that we do that cost us are listed below (a sketch of the kind of workflow steps involved follows this list)
  - Upload and store Pipeline Artifacts on GitHub servers
    - Currently, we just have JaCoCo (Java Code Coverage) reports for the ktor service repos, 1 report per pipeline. These reports are less than 100KB in size per pipeline run. We have 3 ktor service repos and on average 20 pipelines running per day in each repo, so that's 3 x 20 x 1 = 60 artifacts per day, each 100KB (max), so that's 60 x 100KB per day = 6000KB per day = 6MB per day. That's 31 x 6MB per month = 186MB per month = 0.186GB per month
  - Upload and store cache (i.e. write to cache)
    - In ktor services
      - Currently, this happens only on the default branch, like `main`
      - As of today, per pipeline, the cache size can be as big as around 700MB, so that can be the max cache size that is written to / restored from
      - We have 3 ktor service repos and on average 20 pipelines running per day in each repo, so, assuming all 20 pipelines ran on `main` and not on PRs, that's 3 x 20 = 60 pipelines per day, so that's 60 x 700MB per day = 42000MB per day = 42GB per day. That's 31 x 42GB per month = 1302GB
    - In Node.js services
      - For one of the services it's ~60MB per pipeline, and with on average 20 pipelines running per day, that's 20 x 60MB = 1200MB = 1.2GB per day. That's 31 x 1.2GB per month = 37.2GB
    - Other services - no additional cost, as they don't save any cache or restore from the cache, based on the pipeline run logs
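As referenced in the list above, here's a rough sketch of the kind of workflow steps behind these numbers - uploading a JaCoCo report as an artifact, and writing the Gradle cache only on the default branch. The paths, cache key and retention values are illustrative, not our exact configuration -

```yaml
      - name: Upload JaCoCo coverage report
        uses: actions/upload-artifact@v3
        with:
          name: jacoco-report
          path: build/reports/jacoco/test/    # illustrative report path
          retention-days: 7                   # keep old reports from piling up in storage

      - name: Save Gradle cache (only on the default branch, to limit cache writes)
        if: github.ref == 'refs/heads/main'
        uses: actions/cache/save@v3
        with:
          path: |
            ~/.gradle/caches
            ~/.gradle/wrapper
          key: gradle-${{ runner.os }}-${{ hashFiles('**/*.gradle.kts', '**/gradle-wrapper.properties') }}
```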
Since we know we will go above the free 100 GB, let’s look at the cost for a month and then for a day. The cost per month is = total GB of data transfer out x $0.1093 = (0.186 GB + 1302GB + 37.2GB - 100GB) x $0.1093 = 1239.386 x $0.1093 = $135.4648898 . The cost per day on average is $135.4648898 / 30 = $4.51549632667
The free AWS resources we use

- Network
  - Accessing the Internet - Data Transfer Charges
    - Data Transfer IN To Amazon EC2 From Internet
      - All data transfer in - $0.00 per GB, no cost
    - Data Transfer OUT From Amazon EC2 To Amazon S3 (i.e. upload to AWS S3)
      - Free of cost
So, the cost per day till February 1 2024 is (without cost for Public IPv4 address) =
EC2 cost for 6 repos for 200 pipeline minutes per day per repo + EBS Cost for 6 repos for 200 pipeline minutes per day per repo + Network Cost per day = ($0.082 + $0.082 + $0.493) + $0.0475 + $4.51549632667 = $5.21999632667
Cost per minute is = $5.21999632667 / (24 x 60) = $0.00362499744908 ~= $0.004
And the cost per day after February 1 2024, including Public IPv4 address cost, is -
EC2 cost for 6 repos for 200 pipeline minutes per day per repo + EBS Cost for 6 repos for 200 pipeline minutes per day per repo + Network Cost per day + Public IPv4 Address Cost for 6 repos for 200 pipeline minutes per day per repo = ($0.082 + $0.082 + $0.493) + $0.0475 + $4.51549632667 + $0.1 = $5.31999632667
Cost per minute is = $5.31999632667 / (24 x 60) = $0.00369444189352 ~= $0.004
Most of the cost comes from the network cost (upload cost)
Comparing this with $0.008 per minute for a Linux machine hosted by GitHub is not an apples-to-apples comparison 🍎 🍏, since GitHub-hosted runner machines come with 7GB RAM, 2 cores, and 14GB SSD. Here we use a 15GB SSD, and the machine sizes vary - 8GB RAM, 2 vCPUs (i.e. 2 cores) for some repos (the ktor service repos), 4GB RAM, 2 vCPUs for one of the services, and 2GB RAM, 2 vCPUs for 2 services
To compare with GitHub Hosted Runners cost, we can consider all EC2 machines of the same configuration - close to GitHub Hosted Runner configuration - 8GB RAM and 2 vCPU
EC2 cost for 6 repos for 200 pipeline minutes per day per repo + EBS Cost for 6 repos for 200 pipeline minutes per day per repo + Network Cost per day = ($0.164333333333 x 6) + $0.0475 + $4.51549632667 = $5.54899632667
Cost per minute is = $5.54899632667 / (24 x 60) = $0.0038534696713 ~= $0.004
And the cost per day after February 1, 2024, including Public IPv4 address cost, is -
EC2 cost for 6 repos for 200 pipeline minutes per day per repo + EBS Cost for 6 repos for 200 pipeline minutes per day per repo + Network Cost per day + Public IPv4 Address Cost for 6 repos for 200 pipeline minutes per day per repo = ($0.164333333333 x 6) + $0.0475 + $4.51549632667 + $0.1 = $5.64899632667
Cost per minute is = $5.64899632667 / (24 x 60) = $0.00392291411574 ~= $0.004
So, the cost per minute in our case is half the cost of a GitHub-hosted runner - $0.004 (our self-hosted runner cost) versus 🆚 $0.008 (GitHub-hosted runner cost)
For the use case we have taken, which is 20 pipeline runs per day per repo and each pipeline run is 10 minutes long, which is 200 pipeline minutes per day per microservice repository, for 6 microservice repositories this would mean = 1200 minutes per day, so using GitHub Hosted Runners, say we have already used the free limit of 2000 minutes (around 33.33hrs), then the cost for 1200 minutes per day would be = 1200 x $0.008 = $9.6 per day
What’s the average number of pushes per month that trigger the CI/CD pipeline?
Around 5-10 on average, per day per repo - so we see around 10 pipeline runs on average, per day per repo. We have 6 microservice repositories
Checking the total cost of running on-demand EC2 instances for GitHub Actions Self-Hosted Runners in your AWS Bill :D
We can add tags to all the EC2 instances we run and also tag other resources used for the GitHub Actions Self Hosted Runner Infrastructure, for example, Internet Gateway
For example, I added the below tags for the EC2 instances, using the `aws-resource-tags` parameter in the `Start EC2 runner` step -

```json
[{"Key":"Feature", "Value":"Continuous Integration"},{"Key":"Repository", "Value":"${{ github.repository }}"}]
```
You can then go to `AWS Billing` > `Billing` > `Cost allocation tags` > `User-defined cost allocation tags` and activate keys 🔑 like `Feature` and `Repository`, and then later use them in your Cost Explorer to filter out the costs of running on-demand EC2 instances for GitHub Actions Self-Hosted Runners and other tagged infrastructure like the Internet Gateway
This way, you can see the cost of your CI/CD setup using GitHub Actions Self-Hosted Runners on a per-repository basis :D Below is an example of what that looks like :)
Look at the `Tag` 🏷️ section on the right, with the `Feature` and `Repository` tags and their values selected :D
You can add other values to your `Feature` key, such as other prominent features of the system provided to external and/or internal users, and check the cost of those :)