Shipping CloudWatch Logs to S3

Recently, at Ola, the Sentinels team (that's the security team at Ola) asked us, the Core Infrastructure team, to help with getting logs πŸͺ΅ for many things for a PCI audit

PCI - Payment Card Industry. PCI is a compliance standard, popularly known as PCI DSS, which expands to Payment Card Industry Data Security Standard. You can search πŸ‘€ πŸ” πŸ”Ž πŸ”¦ online to find out more about it

Now, this was the use case / need. In our case, we were running Kubernetes clusters using AWS EKS - Amazon Web Services Elastic Kubernetes Service, a managed service that provides Kubernetes clusters as a service. Since it's a managed service, AWS takes care of managing the Kubernetes control plane for us. In our case, AWS manages only the control plane, and we run the worker nodes ourselves on AWS EC2 - Elastic Compute Cloud ☁️. The Kubernetes control plane generally consists of many components, the main ones being the Kubernetes API Server, a database like etcd, the Kubernetes Scheduler, the Kubernetes Controller Manager, and the Kubernetes Cloud Controller Manager for any cloud related stuff - in this case, integrating Kubernetes with the AWS Cloud, all through AWS APIs.

The security team wanted the logs of all the workloads (pods) - the software applications running on the worker nodes - and also the logs of the control plane software components

AWS EKS has a feature to ship the AWS EKS Control Plane Logs to AWS CloudWatch, a popular AWS service for logs, monitoring and some interesting things around them. It's also a costly service, from what I hear and from what I have seen from afar.

I don't think there's any other integration or method to ship AWS EKS Control Plane Logs out of the control plane other than the AWS CloudWatch integration. It's a very smart move by cloud companies I guess, to force and sell their observability offerings like this, without giving any other options - though maybe it also has more pros than cons, apart from the cost. Anyways, let's move on to how to do this

So, once you enable the CloudWatch integration for the AWS EKS cluster control plane, you can choose which control plane components' logs to ship to CloudWatch. From the console, and from the current docs πŸ“ƒπŸ“„πŸ“‘ https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html, I can see that you can send API Server logs, Controller Manager logs, Scheduler logs, Audit logs and Authenticator logs - that's all
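By the way, you can also enable / update this from the AWS CLI (v2) instead of the console. A rough sketch, assuming the EKS cluster is named prod-ap-southeast-1-cp and is in the ap-southeast-1 region (just placeholders based on my setup - use your own cluster name and region) -

$ aws eks update-cluster-config \
    --region ap-southeast-1 \
    --name prod-ap-southeast-1-cp \
    --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'

This asks EKS to turn on all five control plane log types - you can trim the types list down to just the ones you want.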

Once you send the logs to CloudWatch, what next? In our case, the security team at Ola uses Sumo Logic to work with this data πŸ“ˆπŸ“‰πŸ“Š - the log data shipped by all the different systems at Ola that come under PCI compliance, which is not the complete entirety of Ola, just a small part of it

So, the plan with the security team was - we send the logs to CloudWatch, then send the logs from CloudWatch to S3, and then they pick them up from there and ingest them into Sumo Logic

So, with this, I got started. I asked the security team to create a secure bucket πŸͺ£ with no access to anyone except just a few, and to give me access to it too, to put and get data

Once I got the S3 bucket, I started following these two documents

Now, note that, once you skim through those, you will understand that this is JUST about sending the logs from CloudWatch to S3 once, for a given time πŸ•°οΈ frame - between a from and a to timestamp. So, it's NOT streaming the CloudWatch Logs πŸͺ΅ to S3 in real time - NO, it's NOT. For that, we need to think of a different solution!

It took me some time to realize that I was actually doing this across two AWS Ola accounts. One was where the actual stuff was running and giving out logs πŸͺ΅, and the CloudWatch in that account had the logs; the other was where the S3 bucket πŸͺ£ got created by the Sentinels team, in an AWS account that they manage and own :)

So, basically, I followed the Cross-account export sections. I was trying to do the simplest thing to see if it would work - either using the AWS CLI or the AWS Console. I ended up using a mix - the AWS Console for one account (the S3 bucket AWS account) and the AWS CLI for the other account (the CloudWatch Logs AWS account). I could have used just one of these, ideally the AWS CLI, but I didn't. In the case of the CloudWatch Logs πŸͺ΅ AWS account, I only had AWS CLI (v2) access with AWS Access Keys πŸ”πŸ”‘. I could have checked if I could create a temporary user with AWS Console access and used that, but I didn't wanna spend time on that over there. And I didn't wanna create AWS Access Keys for the S3 bucket AWS account since I didn't wanna leak anything even by mistake. So I just ended up using the free πŸ†“ AWS CloudShell for the S3 bucket AWS account, to do anything over there, securely :)

I first figured out what CloudWatch data I needed to ship to S3. After some playing around, I noticed there are two AWS CLI (v2) commands around AWS CloudWatch. One is aws cloudwatch, which seems obvious, and the other one is aws logs. You can use aws logs to tail logs, which is something the Sentinels team was asking about - whether they can get the logs in the terminal directly from CloudWatch, using APIs, or say in a CLI. And yes πŸ‘πŸ™Œ, you can!

https://awscli.amazonaws.com/v2/documentation/api/latest/reference/logs/tail.html - aws logs tail command
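For example, something like this should tail a log group live in the terminal - the log group name here is the EKS control plane one I talk about later in this post, and the --since value is just an arbitrary choice -

$ aws logs tail /aws/eks/prod-ap-southeast-1-cp/cluster --follow --since 1h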

You can also use the aws logs command to list the β€œlog groups” that one has. I'm yet to read more about these, but given the name, it seems like these are groups of logs - different groups of logs. So, some of the commands I used are below β¬‡οΈβ¬‡οΈπŸ‘‡πŸ‘‡

To list all log groups -

$ aws logs describe-log-groups

For a nice colored output, and no pagination, I used this -

$ aws logs describe-log-groups | cat | jq

Piping to cat gets rid of the pagination (the AWS CLI's client-side pager) - it's just the standard cat command - and then jq pretty-prints and colors the JSON output πŸ˜πŸ˜€πŸ˜„. jq docs πŸ“ƒπŸ“„πŸ“‘ / documentation - https://jqlang.github.io/jq/
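You can also narrow the list down to just the EKS log groups with a prefix filter, which is handy when the account has a lot of log groups. A small sketch -

$ aws logs describe-log-groups --log-group-name-prefix "/aws/eks/" | cat | jq '.logGroups[].logGroupName'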

I found out the log group I need to focus on. It was -

{
  "logGroupName": "/aws/eks/prod-ap-southeast-1-cp/cluster",
  "creationTime": 1710998380542,
  "retentionInDays": 90,
  "metricFilterCount": 0,
  "arn": "arn:aws:logs:ap-southeast-1:998877665544:log-group:/aws/eks/prod-ap-southeast-1-cp/cluster:*",
  "storedBytes": 115407242988,
  "logGroupClass": "STANDARD",
  "logGroupArn": "arn:aws:logs:ap-southeast-1:998877665544:log-group:/aws/eks/prod-ap-southeast-1-cp/cluster"
}

Once I got this, I moved on to the next thing. By the way, cp is short for control plane, it’s NOT β€œcopy” like the popular Linux command, haha. It’s a convention / short form we use here internally in my team :)

Also, note that the retention period is 90 days here. So, I was planning to move only the last 90 days logs from CloudWatch to S3. I did this activity / task on September 21st 2024, and 90 days before that was June 23rd 2024 I think πŸ€”πŸ’­. I just chose June 24th 2024 for the time being as the start date and 12 am UTC as the start time, and the end date πŸ“…πŸ“† time as September 21st 2024 12 am UTC

I first created a bucket policy on the bucket πŸͺ£, in the account where the bucket was created (which was different from the account the logs were being shipped from), like this -

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "s3:GetBucketAcl",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::sample-audit-logs",
      "Principal": {
        "Service": "logs.ap-southeast-1.amazonaws.com"
      },
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": [
            "998877665544"
          ]
        },
        "ArnLike": {
          "aws:SourceArn": [
            "arn:aws:logs:ap-southeast-1:998877665544:log-group:*"
          ]
        }
      }
    },
    {
      "Action": "s3:PutObject",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::sample-audit-logs/*",
      "Principal": {
        "Service": "logs.ap-southeast-1.amazonaws.com"
      },
      "Condition": {
        "StringEquals": {
          "s3:x-amz-acl": "bucket-owner-full-control",
          "aws:SourceAccount": [
            "998877665544"
          ]
        },
        "ArnLike": {
          "aws:SourceArn": [
            "arn:aws:logs:ap-southeast-1:998877665544:log-group:*"
          ]
        }
      }
    },
    {
        "Effect": "Allow",
        "Principal": {
          "AWS": "arn:aws:iam::998877665544:user/sample-uer"
        },
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::sample-audit-logs/*",
        "Condition": {
          "StringEquals": {
              "s3:x-amz-acl": "bucket-owner-full-control"
          }
        }
     }
  ]
}

I applied the above ⬆️ πŸ‘†β˜οΈ bucket πŸͺ£ policy JSON using the console, but you can do it from the command line too :)
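For example, from AWS CloudShell in the S3 bucket πŸͺ£ AWS account, something like this should apply and then show the bucket policy - assuming the above JSON is saved in a file called bucket-policy.json (just a name I made up) -

$ aws s3api put-bucket-policy --bucket sample-audit-logs --policy file://bucket-policy.json
$ aws s3api get-bucket-policy --bucket sample-audit-logs | cat | jq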

ℹ️ Note πŸ—’οΈπŸ“ that using the exact Log Group's ARN didn't work out, so I dropped that idea. This is for the aws:SourceArn condition in the policy JSON above (the Resource fields hold the S3 bucket ARNs). Not sure why it didn't work out - maybe some issue / bug in what I did. I basically tried to put something like this -

arn:aws:logs:ap-southeast-1:998877665544:log-group:/aws/eks/prod-ap-southeast-1-cp/cluster

That's the log group's ARN. I guess the log streams inside the log group have their own ARNs too? Not sure πŸ€”, since there's also this ARN -

arn:aws:logs:ap-southeast-1:998877665544:log-group:/aws/eks/prod-ap-southeast-1-cp/cluster:*

Notice the :* at the end. Maybe I should have used this one ☝️ 1️⃣ with the :* at the end and NOT the one without it.

Let's move on now :) Then I created an IAM policy using this policy JSON -

{
  "Version": "2012-10-17",
  "Statement": [{
          "Effect": "Allow",
          "Action": "s3:PutObject",
          "Resource": "arn:aws:s3:::sample-audit-logs/*"
      }
  ]
}

And attached this IAM policy to my user - basically to the user or role that's actually gonna perform the action of starting the shipping of CloudWatch Logs to S3

So, I did it like this -

$ aws iam create-policy --policy-name cloudwatch-log-exporter --policy-document file:///Users/karuppiah.n/cloudwatch-log-exporter-iam-policy.json
$ aws iam attach-user-policy --user-name karuppiah --policy-arn arn:aws:iam::998877665544:policy/cloudwatch-log-exporter

998877665544 is the ID πŸ†” πŸͺͺ of the AWS Account from where the CloudWatch logs πŸͺ΅ will be exported to the other account where the AWS S3 bucket πŸͺ£ is present

If everything is fine, both commands should run successfully without errors

Weirdly, the command below showed me that my user has no policies, but the shipping of CloudWatch Logs πŸͺ΅ to S3 still worked fine -

$ aws iam list-user-policies --user-name karuppiah
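I think the reason is that aws iam list-user-policies only lists inline policies, and the cloudwatch-log-exporter policy above is a managed policy attached to the user, so it shows up under a different command instead -

$ aws iam list-attached-user-policies --user-name karuppiah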

The AWS CLI (v2) command I ran to get the CloudWatch Logs shipped to S3 is -

$ aws logs create-export-task --task-name "aws-eks-prod-ap-southeast-1-cp-cluster-2024-09-21" --log-group-name "/aws/eks/prod-ap-southeast-1-cp/cluster" --from 1719230400000 --to 1726920000000 --destination "sample-audit-logs" --destination-prefix "cp-cluster-logs"

1719230400000 is basically June 24, 2024 12:00:00 PM UTC, as an epoch timestamp in milliseconds. Uh, I was aiming for 12 AM, but looks like I made a mistake, lol. I just realized the mistake while writing this blog! LOL!

1726920000000 is basically September 21, 2024 12:00:00 PM UTC, again as an epoch timestamp in milliseconds. And again, I was aiming for 12 AM but ended up with 12 PM here too, lol.

So, those are the from and to timestamps. Note that they are epoch timestamps, in milliseconds!! You can learn more about epoch timestamps online, say at https://www.epochconverter.com/. Thanks to https://www.epochconverter.com/ for helping me with the date and time to epoch timestamp πŸ•°οΈ conversions
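If you want to do the conversion in the terminal itself, something like this works - assuming GNU date (the macOS / BSD date command needs different flags). This is what the 12 AM UTC start timestamp I was actually aiming for would have looked like -

$ date -u -d "2024-06-24 00:00:00" +%s
1719187200
$ echo $(( $(date -u -d "2024-06-24 00:00:00" +%s) * 1000 ))
1719187200000

The second command just multiplies the seconds by 1000 to get the milliseconds that aws logs create-export-task expects for --from and --to.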

On running the above, it finally gave me a task ID like this

{
    "taskId": "0aj1e275-c109-4fe6-94c3-cd3d52f9d9988"
}

You can use this task ID πŸ†” πŸͺͺ to query and check the status of the task. It's an asynchronous task, I believe, as it can take a lot of time, which makes sense :) Example of how to check the task status, along with its output -

$ aws logs describe-export-tasks --task-id 0aj1e275-c109-4fe6-94c3-cd3d52f9d9988
{
    "exportTasks": [
        {
            "taskId": "0aj1e275-c109-4fe6-94c3-cd3d52f9d9988",
            "taskName": "aws-eks-prod-ap-southeast-1-cp-cluster-2024-09-21",
            "logGroupName": "/aws/eks/prod-ap-southeast-1-cp/cluster",
            "from": 1719230400000,
            "to": 1726920000000,
            "destination": "sample-audit-logs",
            "destinationPrefix": "cp-cluster-logs",
            "status": {
                "code": "RUNNING",
                "message": "Started successfully"
            },
            "executionInfo": {
                "creationTime": 1726862637283
            }
        }
    ]
}
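Since it's asynchronous, I could also just poll the status code in a small loop with jq, something like this - the 60 second sleep is just an arbitrary choice, and the loop stops once the task is no longer pending or running -

$ while true; do
    status=$(aws logs describe-export-tasks --task-id 0aj1e275-c109-4fe6-94c3-cd3d52f9d9988 | jq -r '.exportTasks[0].status.code')
    echo "$(date -u) : ${status}"
    if [ "${status}" = "COMPLETED" ] || [ "${status}" = "FAILED" ]; then break; fi
    sleep 60
  done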

To confirm that everything is good, just go to the S3 bucket πŸͺ£ and see if it has the folder πŸ“ πŸ“‚ named cp-cluster-logs, and then check that the folder has a file named aws-logs-write-test with the content -

Permission Check Successful

Then you are good to go :)
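You can also do this check from the command line, from the S3 bucket πŸͺ£ AWS account, with something like -

$ aws s3 ls s3://sample-audit-logs/cp-cluster-logs/
$ aws s3 cp s3://sample-audit-logs/cp-cluster-logs/aws-logs-write-test -
Permission Check Successful

The - at the end of the aws s3 cp command just streams the file's content to the terminal instead of downloading it to a local file.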

Once the task completes, you should see something like this while describing the task -

$ aws logs describe-export-tasks --task-id 0aj1e275-c109-4fe6-94c3-cd3d52f9d9988
{
    "exportTasks": [
        {
            "taskId": "0aj1e275-c109-4fe6-94c3-cd3d52f9d9988",
            "taskName": "aws-eks-prod-ap-southeast-1-cp-cluster-2024-09-21",
            "logGroupName": "/aws/eks/prod-ap-southeast-1-cp/cluster",
            "from": 1719230400000,
            "to": 1726920000000,
            "destination": "sample-audit-logs",
            "destinationPrefix": "cp-cluster-logs",
            "status": {
                "code": "COMPLETED",
                "message": "Completed successfully"
            },
            "executionInfo": {
                "creationTime": 1726862637283,
                "completionTime": 1726870321063
            }
        }
    ]
}

With that, you can say it's done πŸ‘ β˜‘οΈ βœ… βœ”οΈ. And then of course check the S3 bucket πŸͺ£ - it will have the exported files in it

The S3 bucket πŸͺ£ will have a file structure something like this -

.
└── sample-audit-logs/
    └── cp-cluster-logs/
        β”œβ”€β”€ uuid1/
        β”‚   β”œβ”€β”€ authenticator-uuid2
        β”‚   β”œβ”€β”€ authenticator-uuid3
        β”‚   β”œβ”€β”€ authenticator-uuid4
        β”‚   β”œβ”€β”€ authenticator-uuid5
        β”‚   β”œβ”€β”€ ...
        β”‚   β”œβ”€β”€ kube-apiserver-uuid20
        β”‚   β”œβ”€β”€ kube-apiserver-uuid21
        β”‚   β”œβ”€β”€ kube-apiserver-uuid22
        β”‚   β”œβ”€β”€ kube-apiserver-uuid23
        β”‚   └── ...
        └── aws-logs-write-test

(Thanks to https://tree.nathanfriend.io/ for creating the above tree file structure. And Thanks to https://tools.namlabs.com/uuid-validation/ for helping with validating if a string is a UUID or not :))

Where sample-audit-logs is the S3 bucket πŸͺ£

Note that the UUIDs are, I think, version 4 UUIDs, and are of course random, unique and different.
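To take a quick look at the exported objects from the command line, from the S3 bucket πŸͺ£ AWS account, something like this should do -

$ aws s3 ls s3://sample-audit-logs/cp-cluster-logs/ --recursive | head -n 20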
