Recently, at Ola, the Sentinels team - that is, the security team at Ola - asked us, the Core Infrastructure team, to help with getting logs for many things for a PCI audit
PCI - Payment Card Industry. PCI is a compliance standard, popularly known as PCI DSS, which expands to Payment Card Industry Data Security Standard. You can search online to find out more about it
Now, this was the use case / need. And in our case, we were running Kubernetes Clusters using AWS EKS - Amazon Web Services Elastic Kubernetes Service, which is a managed service providing Kubernetes Clusters as a service. Since it's a managed service, AWS takes care of managing the Kubernetes Cluster - in our case, AWS manages only the control plane, and we run the worker nodes on AWS EC2 - Elastic Compute Cloud. The Kubernetes Control Plane generally consists of many components, the main ones being the Kubernetes API Server, a database like etcd, the Kubernetes Scheduler, the Kubernetes Controller Manager, and the Kubernetes Cloud Controller Manager for any cloud-related stuff - in this case, integrating Kubernetes with AWS Cloud, all through AWS APIs.
The security team wanted the logs for all the workloads (pods) - the software applications running on the worker nodes - and also the logs of the control plane software components
AWS EKS has a feature to ship the AWS EKS Control Plane logs to AWS CloudWatch, a popular AWS service for logs, monitoring and some interesting things around it. It's also a costly service, from what I hear and from what I have seen from afar.
I don't think there's any other integration or method to ship AWS EKS Control Plane logs outside the control plane other than the AWS CloudWatch integration. It's a very smart move by cloud companies, I guess, to force and sell observability features like this without offering other options, and maybe it has more pros than cons, cost aside. Anyway, let's move on to how to do this
So, once you enable CloudWatch logging for the AWS EKS Cluster Control Plane, you can choose to ship logs of the different Control Plane software components to CloudWatch. From the console, and from the current docs - https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html - I can see that you can send the API Server, Controller Manager, Scheduler, Audit and Authenticator logs, and that's all
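In case you want to enable this from the AWS CLI instead of the console, there's an aws eks update-cluster-config command for it. A minimal sketch - the cluster name here is just my guess based on the log group name you will see later in this post, so adjust the name and region for your cluster -

# enable all five control plane log types on the (assumed) cluster
$ aws eks update-cluster-config --region ap-southeast-1 --name prod-ap-southeast-1-cp --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'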
Once you send the logs to CloudWatch, what next? In our case, the security team at Ola uses Sumo Logic to work with this data - the log data shipped by all the different systems at Ola that come under PCI compliance, which is not the entirety of Ola, just a small part of it
So, the plan with the security team was - we send the logs to CloudWatch, then send the logs from CloudWatch to S3, and they will pick them up from there and ingest them into Sumo Logic
So, I got started on this. I asked the security team to create a secure bucket with no access to anyone except just a few people, and to give me access to it too, to put data and get data
Once I got the S3 bucket, I started following these two documents
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/S3ExportTasks.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/S3ExportTasksConsole.html
Now, note that, once you skim through these, you will understand that this is JUST about sending the logs from CloudWatch to S3 once - for a given time frame, between a from and a to timestamp. So, it's NOT streaming the CloudWatch Logs in real time to S3 - NO, it's NOT. For that, we need to think of a different solution!
It took me some time to realize that I was actually doing this across two Ola AWS accounts. One was where the actual stuff was running and producing logs - the CloudWatch in that account had the logs - and the other was where the S3 bucket got created by the Sentinels team, in an AWS account that they manage and own :)
So, basically, I followed the Cross-account export sections. I was trying to do the simplest thing to see if it would work - either using the AWS CLI or using the AWS Console. I ended up using a mix - the AWS Console for one account (the S3 bucket AWS account), and the CLI for the other account (the CloudWatch Logs AWS account). I could have used just one of these, ideally the AWS CLI, but I didn't. In the case of the CloudWatch Logs AWS account, I just had AWS CLI (v2) access with AWS Access Keys. I could have checked if I could create a temporary user with AWS Console access and used that, but I didn't want to spend time on that over there. And I didn't want to create AWS Access Keys for the S3 bucket AWS account, since I didn't want to leak anything even by mistake. I just ended up using the free AWS CloudShell for the S3 bucket AWS account to do anything over there, securely :)
I first figured out what the CloudWatch thing I need to ship to S3 was. After playing around a bit, I noticed there are two AWS CLI (v2) commands around AWS CloudWatch. One is aws cloudwatch, which seems obvious, and the other one is aws logs. You can use aws logs to tail logs, which is something the Sentinels team was asking about - whether they can get the logs in the terminal directly from CloudWatch, using APIs, or say in a CLI. And yes, you can, with the aws logs tail command - https://awscli.amazonaws.com/v2/documentation/api/latest/reference/logs/tail.html
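For example, something like this should tail the EKS Control Plane log group right in the terminal - the log group name is the one you will see below, and --since / --follow are options of aws logs tail -

# tail the last 1 hour of events and keep following new ones
$ aws logs tail /aws/eks/prod-ap-southeast-1-cp/cluster --since 1h --follow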
You can also use the aws logs command to list the "log groups" that one has. I'm yet to read more about these, but given the name, it seems like these are groups of logs? Different groups of logs. So, some of the commands I used are below
To list all log groups -
$ aws logs describe-log-groups
For a nice colored output, and no pagination, I used this -
$ aws logs describe-log-groups | cat | jq
cat gets rid of the pagination - it's the standard cat command - and then jq colors the JSON output. jq docs / documentation - https://jqlang.github.io/jq/
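Also, if you already know roughly what the log group name starts with, you can narrow the listing down. A small sketch using the --log-group-name-prefix option of describe-log-groups, plus a jq filter to print just the names -

# list only the EKS log groups and print just their names
$ aws logs describe-log-groups --log-group-name-prefix /aws/eks | cat | jq '.logGroups[].logGroupName'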
I found the log group I needed to focus on. It was -
{
    "logGroupName": "/aws/eks/prod-ap-southeast-1-cp/cluster",
    "creationTime": 1710998380542,
    "retentionInDays": 90,
    "metricFilterCount": 0,
    "arn": "arn:aws:logs:ap-southeast-1:998877665544:log-group:/aws/eks/prod-ap-southeast-1-cp/cluster:*",
    "storedBytes": 115407242988,
    "logGroupClass": "STANDARD",
    "logGroupArn": "arn:aws:logs:ap-southeast-1:998877665544:log-group:/aws/eks/prod-ap-southeast-1-cp/cluster"
}
Once I got this, I moved on to the next thing. By the way, cp here is short for control plane - it's NOT "copy" like the popular Linux command, haha. It's a convention / short form we use internally in my team :)
Also, note that the retention period is 90 days here. So, I was planning to move only the last 90 days of logs from CloudWatch to S3. I did this activity / task on September 21st 2024, and 90 days before that was June 23rd 2024, I think. I just chose June 24th 2024 as the start date with 12 am UTC as the start time, and September 21st 2024 12 am UTC as the end date and time
I first created a bucket policy on the bucket, in the account where the bucket was created (which was different from the account the logs were being shipped from), like this -
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "s3:GetBucketAcl",
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::sample-audit-logs",
            "Principal": {
                "Service": "logs.ap-southeast-1.amazonaws.com"
            },
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": [
                        "998877665544"
                    ]
                },
                "ArnLike": {
                    "aws:SourceArn": [
                        "arn:aws:logs:ap-southeast-1:998877665544:log-group:*"
                    ]
                }
            }
        },
        {
            "Action": "s3:PutObject",
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::sample-audit-logs/*",
            "Principal": {
                "Service": "logs.ap-southeast-1.amazonaws.com"
            },
            "Condition": {
                "StringEquals": {
                    "s3:x-amz-acl": "bucket-owner-full-control",
                    "aws:SourceAccount": [
                        "998877665544"
                    ]
                },
                "ArnLike": {
                    "aws:SourceArn": [
                        "arn:aws:logs:ap-southeast-1:998877665544:log-group:*"
                    ]
                }
            }
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::998877665544:user/sample-user"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::sample-audit-logs/*",
            "Condition": {
                "StringEquals": {
                    "s3:x-amz-acl": "bucket-owner-full-control"
                }
            }
        }
    ]
}
I applied the above bucket policy JSON using the console, but you can do it from the command line too :)
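If you'd rather use the command line, a minimal sketch, assuming the policy JSON above is saved locally as bucket-policy.json (that file name is just for illustration) -

# run this in the bucket's AWS account, for example from CloudShell
$ aws s3api put-bucket-policy --bucket sample-audit-logs --policy file://bucket-policy.json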
Note that using the exact log group's ARN didn't work out, so chuck it. This is for the aws:SourceArn value under Condition in the JSON. Not sure why that didn't work out. Maybe some issue / bug in what I did. I basically tried to put something like this -
arn:aws:logs:ap-southeast-1:998877665544:log-group:/aws/eks/prod-ap-southeast-1-cp/cluster
That's the log group's ARN. I guess the logs each have their own ARN? Not sure, since there's also this ARN -
arn:aws:logs:ap-southeast-1:998877665544:log-group:/aws/eks/prod-ap-southeast-1-cp/cluster:*
Notice the :* at the end. Maybe I should have used this one, with the :* at the end, and NOT the one without it.
Let's move on now :) Then I created an IAM policy using this policy JSON -
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::sample-audit-logs/*"
        }
    ]
}
And attached this IAM policy to my user - basically to the user or role that's actually going to perform the action, that is, start the export of CloudWatch Logs to S3
So, I did it like this -
$ aws iam create-policy --policy-name cloudwatch-log-exporter --policy-document file:///Users/karuppiah.n/cloudwatch-log-exporter-iam-policy.json
$ aws iam attach-user-policy --user-name karuppiah --policy-arn arn:aws:iam::998877665544:policy/cloudwatch-log-exporter
998877665544 is the ID of the AWS account from which the CloudWatch Logs will be exported to the other account, where the AWS S3 bucket is present
If everything is fine, the commands should run successfully
Weirdly, the following command showed me that my user has no policies, but it all worked fine anyway - the export of CloudWatch Logs to S3
$ aws iam list-user-policies --user-name karuppiah
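My guess as to why - list-user-policies only lists inline policies, and the policy above is a managed policy attached to the user, so it shows up under a different command -

# attached managed policies are listed separately from inline policies
$ aws iam list-attached-user-policies --user-name karuppiah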
Coming back to the export - the AWS CLI (v2) command I ran to get the CloudWatch Logs shipped to S3 is -
$ aws logs create-export-task --task-name "aws-eks-prod-ap-southeast-1-cp-cluster-2024-09-21" --log-group-name "/aws/eks/prod-ap-southeast-1-cp/cluster" --from 1719230400000 --to 1726920000000 --destination "sample-audit-logs" --destination-prefix "cp-cluster-logs"
1719230400000 is June 24, 2024 12:00:00 PM UTC, as an epoch timestamp in milliseconds. Uh, I was aiming for 12 AM, but it looks like I made a mistake, lol. I just realized the mistake while writing this blog! LOL!
1726920000000 is September 21, 2024 12:00:00 PM UTC, as an epoch timestamp in milliseconds. Again, I was aiming for 12 AM, but it looks like I made a mistake, lol. I just realized this one while writing the blog too! LOL!
So, those are the from and to timestamps. Note that they are in milliseconds!! And they are epoch timestamps! You can learn more about epoch timestamps online, say at https://www.epochconverter.com/. Thanks to https://www.epochconverter.com/ for helping me with the conversions from date and time to epoch timestamps
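If you want to do the conversion in the terminal instead, here's a small sketch with GNU date (BSD / macOS date has different flags) - %s prints the epoch in seconds and the appended 000 turns it into milliseconds, assuming you want midnight UTC -

# June 24, 2024 00:00:00 UTC, as epoch milliseconds
$ date -u -d "2024-06-24 00:00:00" +%s000
# September 21, 2024 00:00:00 UTC, as epoch milliseconds
$ date -u -d "2024-09-21 00:00:00" +%s000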
On running the create-export-task command above, it finally gave me a task ID like this -
{
    "taskId": "0aj1e275-c109-4fe6-94c3-cd3d52f9d9988"
}
You can use this task ID to query and understand the status of the task. It's an asynchronous task, I believe, since it can take a lot of time, which makes sense :) Here's how to check the task status, with an example output -
$ aws logs describe-export-tasks --task-id 0aj1e275-c109-4fe6-94c3-cd3d52f9d9988
{
    "exportTasks": [
        {
            "taskId": "0aj1e275-c109-4fe6-94c3-cd3d52f9d9988",
            "taskName": "aws-eks-prod-ap-southeast-1-cp-cluster-2024-09-21",
            "logGroupName": "/aws/eks/prod-ap-southeast-1-cp/cluster",
            "from": 1719230400000,
            "to": 1726920000000,
            "destination": "sample-audit-logs",
            "destinationPrefix": "cp-cluster-logs",
            "status": {
                "code": "RUNNING",
                "message": "Started successfully"
            },
            "executionInfo": {
                "creationTime": 1726862637283
            }
        }
    ]
}
To confirm everything is good - just go to the S3 bucket and check if it has the folder named cp-cluster-logs, and then check if that folder has a file named aws-logs-write-test with the content -
Permission Check Successful
Then you are good to go :)
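You can also check this from the CLI in the bucket's AWS account; a small sketch -

# list what's under the destination prefix
$ aws s3 ls s3://sample-audit-logs/cp-cluster-logs/
# print the permission check file's content to stdout
$ aws s3 cp s3://sample-audit-logs/cp-cluster-logs/aws-logs-write-test -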
Once the task completes, you should see something like this while describing the task -
$ aws logs describe-export-tasks --task-id 0aj1e275-c109-4fe6-94c3-cd3d52f9d9988
{
    "exportTasks": [
        {
            "taskId": "0aj1e275-c109-4fe6-94c3-cd3d52f9d9988",
            "taskName": "aws-eks-prod-ap-southeast-1-cp-cluster-2024-09-21",
            "logGroupName": "/aws/eks/prod-ap-southeast-1-cp/cluster",
            "from": 1719230400000,
            "to": 1726920000000,
            "destination": "sample-audit-logs",
            "destinationPrefix": "cp-cluster-logs",
            "status": {
                "code": "COMPLETED",
                "message": "Completed successfully"
            },
            "executionInfo": {
                "creationTime": 1726862637283,
                "completionTime": 1726870321063
            }
        }
    ]
}
With that, you can say it's done. And then, of course, check the S3 bucket - it will have files in it
The S3 bucket will have a file structure something like this -
.
└── sample-audit-logs/
    └── cp-cluster-logs/
        ├── uuid1/
        │   ├── authenticator-uuid2
        │   ├── authenticator-uuid3
        │   ├── authenticator-uuid4
        │   ├── authenticator-uuid5
        │   ├── ...
        │   ├── kube-apiserver-uuid20
        │   ├── kube-apiserver-uuid21
        │   ├── kube-apiserver-uuid22
        │   ├── kube-apiserver-uuid23
        │   └── ...
        └── aws-logs-write-test
(Thanks to https://tree.nathanfriend.io/ for creating the above tree file structure. And thanks to https://tools.namlabs.com/uuid-validation/ for helping with validating whether a string is a UUID or not :))
Where sample-audit-logs is the S3 bucket
Note that the uuids are, I think, version 4 UUIDs, and they are of course random, unique and different.