Monday, August 14, 2023

Transfer OCI Audit logs to SIEM tooling

 

Goal

We want to transfer the audit logs from Oracle Cloud Infrastructure (OCI) to the Logz.io SIEM tool.

OCI audits all CRUD operations on resources in the tenancy. All Create, Update and Delete actions (on any resource) are captured and sent to the logz.io tool; Read operations are ignored for now.

Unexpectedly, this proved to be harder than the documentation suggests. We did this with logz.io as the SIEM target, but the setup can obviously be adapted to other tools.

Technical flow 

  1. Service Connector “Logfile_audit_archive_connector”
    • Searches audit logfiles for any POST, PUT or DELETE action
    • Sends the resulting logfile to a bucket
  2. Bucket “logfile_download_bucket”
    • Receives the audit log files
    • Emits an event on creation of a new object
    • Deletes the logfiles after 3 days
  3. Events Service Rule “Process audit file from audit bucket”
    • Matches “Object - Create” event for the bucket
    • Calls OCI Function
  4. OCI Function “logzio-from-bucket”
    • Created on application “logzio”
    • Custom function that retrieves the audit file from the bucket, transforms it to a usable format and sends it to logzio

Issues found

Several issues were encountered when designing and testing this concept. Some choices in this setup are the result of having to create a workaround.

  • A Service connector directly from Event Logging to a Notification did not work, because the logging from that service did not contain the details of the user modifying the resource.
    • It seems that this service emits a “v1 JSON event”, whereas we need a “v2 JSON event” format (including the modifying user).
  • The Log Search Service Connector puts the logfile in gz format on the bucket
    • The pre-built function “Object Storage File Extractor” to unzip bucket files only operates on zip, not on gz files.
    • We had to write our own gunzip handling in the function.
  • Logs are stored in a minimized JSON format (without the brackets for a collection and without commas between records).
    • Records are not stored one per line: a single record can span multiple lines, and the curly brackets of two adjacent records can end up on the same line. This is a problem.
    • Python's JSON parser does not understand this format, so the data has to be reworked into something Python can load.
  • A simple OCI Notification calling Logz.io directly cannot be used, as the HTTP(S) subscription does not allow passing the required token on the URL.
  • Logz.io expects a minimized JSON format of its own (one record per line, no commas, no brackets for a collection).
    • This is only slightly different from what OCI produces, but converting between the two proved to be a challenge (see the sketch below).
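
To make the difference concrete, here is a minimal sketch of the conversion that the function shown later performs. The sample records are made up for illustration; the point is that OCI delivers concatenated JSON objects (newlines and occasional “}{” boundaries between records), while logz.io wants exactly one JSON object per line.

import json

# Made-up sample: three tiny "audit records". Most records are separated by a
# newline, but two of them ended up on the same line ("}{" boundary), which is
# roughly what the OCI log files look like.
oci_chunk = (
    '{"data": {"request": {"action": "POST"}}, "type": "audit"}\n'
    '{"data": {"request": {"action": "PUT"}}, "type": "audit"}'
    '{"data": {"request": {"action": "DELETE"}}, "type": "audit"}'
)

# Same trick the function below uses: wrap everything in [], turn the record
# boundaries into commas, parse, then re-serialize one record per line.
data_string = '[' + oci_chunk + ']'
data_string = data_string.replace('\n', '\n,')
data_string = data_string.replace('}{', '},{')
records = json.loads(data_string)

logzio_payload = '\n'.join(json.dumps(r) for r in records)
print(logzio_payload)   # three lines, one JSON record per line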

Technical details – Bucket

To store the (temporary) audit log files, we create a bucket. The compartment for this bucket can be any compartment, but we recommend using one that contains generic or maintenance-related resources.

The bucket “logfile_download_bucket” is created as a private bucket in the Standard Storage tier. Important: enable “Emit Object Events” for this bucket.

Optional: Create a lifecycle rule to delete audit logs older than x days.
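
If you prefer to script the bucket creation, below is a minimal sketch using the OCI Python SDK (assuming a configured ~/.oci/config API-key profile; the compartment OCID is a placeholder, and the model names should be double-checked against the oci.object_storage docs). Note that lifecycle rules also require a policy that allows the Object Storage service to manage objects in the compartment.

import oci

config = oci.config.from_file()
client = oci.object_storage.ObjectStorageClient(config)
namespace = client.get_namespace().data
compartment_id = "<yourCompartmentOCID>"   # placeholder

# Private Standard-tier bucket with "Emit Object Events" enabled
client.create_bucket(
    namespace_name=namespace,
    create_bucket_details=oci.object_storage.models.CreateBucketDetails(
        name="logfile_download_bucket",
        compartment_id=compartment_id,
        public_access_type="NoPublicAccess",
        storage_tier="Standard",
        object_events_enabled=True,
    ),
)

# Optional lifecycle rule: delete the audit files after 3 days
client.put_object_lifecycle_policy(
    namespace_name=namespace,
    bucket_name="logfile_download_bucket",
    put_object_lifecycle_policy_details=oci.object_storage.models.PutObjectLifecyclePolicyDetails(
        items=[
            oci.object_storage.models.ObjectLifecycleRule(
                name="delete-old-audit-files",
                target="objects",
                action="DELETE",
                time_amount=3,
                time_unit="DAYS",
                is_enabled=True,
            )
        ]
    ),
)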

Technical details – OCI Application

Create an application to hold your function(s):

  1. From the hamburger menu: Developer Services - Functions - Applications
  2. Select “Create application”
    • name = “logzio”
    • VCN = “<yourVCN>”
    • subnet = “<yourPrivateSubnet>”
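
If you would rather script this than click through the console, a minimal sketch with the OCI Python SDK could look like this (compartment and subnet OCIDs are placeholders; verify the oci.functions model names against the SDK docs):

import oci

config = oci.config.from_file()
fn_mgmt = oci.functions.FunctionsManagementClient(config)

# Create the "logzio" Functions application in your compartment/subnet
app = fn_mgmt.create_application(
    create_application_details=oci.functions.models.CreateApplicationDetails(
        compartment_id="<yourCompartmentOCID>",
        display_name="logzio",
        subnet_ids=["<yourPrivateSubnetOCID>"],
    )
).data
print(app.id)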

 Optional: create a separate user to administer the functions:

  • Create the user
    • From the hamburger menu: Identity & security – Identity – Users
  • Select “Create User”
    • Select “IAM User”
    • Name = "logzio_user"
  • Edit the user after creation
    • Edit User Capabilities
      • Select only "auth token"
  • Select “Auth Tokens” from the menu on the left
    • Select “Generate Token”
    • Save the generated value for later (it will only be displayed once)
  • Create a group
    • From the hamburger menu: Identity & security – Identity – Groups
  • Select “Create Group”
    • Name = “function_admins”
  • Select “Add User to Group”
    • Add the user “logzio_user” to the new group
  • Create a policy
    • From the hamburger menu: Identity & security – Identity – Policies
  • Select “Create Policy”
    • Name = “Functions_Repo_Policy”
    • Compartment = “<root>”
    • Policy statements
      • Allow group function_admins to read objectstorage-namespaces in tenancy
      • Allow group function_admins to manage repos in tenancy

Create the function

Your application “logzio” has an entry “Getting Started” in the menu on the left of the application page. Follow these steps with some modifications:

  1. Launch Cloud shell
  2. Use the context for your region
    • fn list context
    • fn use context eu-amsterdam-1
  3. Update the context with the function's compartment ID
    • fn update context oracle.compartment-id <your Compartment OCID>
  4. Provide a unique repository name prefix
    • fn update context registry <region-key>.ocir.io/<tenancy-namespace>/logzio
  5. Generate an auth token
    • Already done with separate user “logzio_user”
    • If you did not create a separate user in the previous steps, generate an Auth Token for your personal user
  6. Login to the registry
    • docker login -u '<tenancy-namespace>/logzio_user' <region-key>.ocir.io
    • password = Token from earlier step
  7. Verify the setup
    • fn list apps
  8. Generate a 'hello-world' boilerplate function
    • fn init --runtime python logzio-from-bucket
  9. Switch into the generated directory
    • cd logzio-from-bucket

You can now modify the “func.py” and “requirements.txt” files to set up the actual function, or you can continue with deploying this template function to test it out.

  1. Deploy your function
    • fn -v deploy --app logzio
  2. Invoke your function
    • fn invoke logzio logzio-from-bucket
Under "Configuration" in the OCI console for this function, you can add key-value pairs. Add one for the bucket:
  • input-bucket = logfile_download_bucket

Technical details – OCI function

Modify the file “requirements.txt”:

fdk>=0.1.59
oci
requests

Modify the file “func.py” (disclaimer: this is PoC code, so use at your own peril 😉)

import io
import json
import logging
import oci
import gzip
import requests

from fdk import response


def handler(ctx, data: io.BytesIO = None):
    input_bucket = ""
    try:
        cfg = ctx.Config()
        input_bucket = cfg["input-bucket"]
        logzio_url   = cfg["logzio-url"]
        logzio_token = cfg["logzio-token"]
    except Exception as e:
        logging.getLogger().info('Error getting context details: ' + str(e))
        return response.Response(
            ctx, response_data=json.dumps(
                {"message": "Error getting context details: " + str(e)}),
            headers={"Content-Type": "application/json"}
            )
    try:
        body = json.loads(data.getvalue())
        object_name = body["data"]["resourceName"]
        namespace = body["data"]["additionalDetails"]["namespace"]
    except Exception as e:
        return response.Response(
            ctx, response_data=json.dumps(
                {"message": ": ERROR: During get event details: " + str(e)}),
            headers={"Content-Type": "application/json"}
            )
    signer = oci.auth.signers.get_resource_principals_signer()
    client = oci.object_storage.ObjectStorageClient(config={}, signer=signer)
    try:
        audit_data = client.get_object(namespace, input_bucket, object_name)
        audit_bytesio = io.BytesIO(audit_data.data.content)
        z = gzip.GzipFile(fileobj=audit_bytesio, mode='rb')
        audit_data_text = z.read()
        z.close()
    except Exception as e:
        logging.getLogger().info("ERROR: During load data: " + str(e))
        raise
    try:
        url_string = 'https://' + logzio_url + '/?token=' + logzio_token + '&type=http-bulk'
        data_string = '[' + audit_data_text.decode('utf-8') + ']'
        data_string = data_string.replace('\n', '\n,')
        data_string = data_string.replace('}{', '},{')
        json_string = json.loads(data_string)
        logzio_string = ''
        for record in json_string:
            logzio_string += json.dumps(record) + '\n'
    except Exception as e:
        logging.getLogger().info("ERROR: During JSON formatting: " + str(e))
        raise
    try:
        resp = requests.post(url_string, data=logzio_string)
        if resp.status_code != 200:
            logging.getLogger().info(resp.text)
            raise Exception("Unexpected HTTP status code received")
    except Exception as e:
        logging.getLogger().info("ERROR: During LogzIO HTTP call: " + str(e))
        raise
    return response.Response(
        ctx, response_data=json.dumps(
            {"message": "Success"}),
        headers={"Content-Type": "application/json"}
    )

Redeploy the application with “fn -v deploy --app logzio”.

Technical details – Service Connector

With the bucket in place, create a Service Connector to deliver the audit logfiles to it.

  • From the hamburger menu: Observability & Management – Logging – Service Connectors
  • Select “Create Service Connector”
    • Connector Name = “Logfile_audit_archive_connector”
    • Resource compartment = “<yourCompartment>”
    • Source = “Logging”
      • Compartment = “<root>”
      • Log Group = “_Audit”
        • Check “Include _Audit in subcompartments”
      • Query code editor = “search "<tenancyOCID>/_Audit_Include_Subcompartment" | (data.request.action='POST' or data.request.action='PUT' or data.request.action='DELETE')”
    • Target = “Object Storage”
      • Compartment = “<yourCompartment>”
      • Bucket = “logfile_download_bucket”
      • Object Name Prefix = “audit”
The Service Connector will create a policy for itself to access the right resources.
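
For automation, the same connector can in principle be created through the Service Connector Hub API. Below is a rough sketch with the OCI Python SDK; the console route above is what we actually used, so treat the oci.sch model and field names, the tenancy OCID, namespace and the log source/filter values as assumptions to verify against the SDK documentation.

import oci

config = oci.config.from_file()
sch = oci.sch.ServiceConnectorClient(config)

details = oci.sch.models.CreateServiceConnectorDetails(
    display_name="Logfile_audit_archive_connector",
    compartment_id="<yourCompartmentOCID>",
    source=oci.sch.models.LoggingSourceDetails(
        kind="logging",
        log_sources=[oci.sch.models.LogSource(
            compartment_id="<tenancyOCID>",
            log_group_id="_Audit_Include_Subcompartment",  # audit logs incl. subcompartments
        )],
    ),
    # Filter on the modifying actions, mirroring the console query above
    tasks=[oci.sch.models.LogRuleTaskDetails(
        kind="logRule",
        condition="data.request.action='POST' or data.request.action='PUT' "
                  "or data.request.action='DELETE'",
    )],
    target=oci.sch.models.ObjectStorageTargetDetails(
        kind="objectStorage",
        namespace="<tenancy-namespace>",
        bucket_name="logfile_download_bucket",
        object_name_prefix="audit",
    ),
)
resp = sch.create_service_connector(create_service_connector_details=details)
print(resp.status)   # 2xx means the create request was accepted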

Technical details – Rule

The last step is to connect the bucket to the function using an Events Service Rule.

  •  From the hamburger menu: Observability & Management – Events Service – Rules
  • Select “Create Rule”
    • Display Name = “Process audit file from audit bucket”
    • Rule 1
      • Condition = “Event Type”
      • Service Name = “Object Storage”
      • Event Type = “Object - Create”
    • Rule 2
      • Condition = “Attribute”
      • Attribute Name = “CompartmentName”
      • Attribute Values = “<yourCompartment>”
    • Rule 3
      • Condition = “Attribute”
      • Attribute Name = “bucketName”
      • Attribute Values = “logfile_download_bucket”
    • Actions
      • Action Type = “Function”
      • Function Compartment = “<yourCompartment>”
      • Function Application = “logzio”
      • Function = “logzio-from-bucket”
After this step, logs should start flowing from Audit logs to your SIEM tooling.
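
For reference, the “Object - Create” event that this rule delivers to the function looks roughly like the abbreviated sketch below (reconstructed from memory, values made up); it shows where the two fields that func.py reads come from.

# Abbreviated Object Storage "Object - Create" event (illustrative only);
# only the fields the function actually reads are shown.
sample_event = {
    "eventType": "com.oraclecloud.objectstorage.createobject",
    "data": {
        "resourceName": "audit_2023-08-14T09:00:00.123Z",
        "additionalDetails": {
            "namespace": "<tenancy-namespace>",
            "bucketName": "logfile_download_bucket",
        },
    },
}
object_name = sample_event["data"]["resourceName"]                    # object to fetch
namespace   = sample_event["data"]["additionalDetails"]["namespace"]  # object storage namespace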

Flow visualization



Thursday, June 15, 2023

RMAN - Validate plus Preview - the optimal combination?

For a long time, we have used VALIDATE and PREVIEW in RMAN to check the backups we created for completeness and consistency. The problem is that neither of these commands covers the full spectrum of what you want to check. While reviewing this way of working, I (finally) improved the automation of these checks.

Basically, Preview lists all the files needed to get to the specified restore/recover point, whereas Validate actually checks whether files are available and usable. The drawback of Preview is that it does not check whether the files are actually there; the drawback of Validate is that it only checks the files needed for the restore, not the files needed afterwards for the recovery. Only by combining the two outputs can you get a higher degree of confidence that all the necessary files are present and actually usable.

I came across this statement (which can be augmented with a "summary" to shorten the output):

RMAN> RESTORE DATABASE VALIDATE PREVIEW;

It first selects all the files needed to restore and recover, and then checks those files with Validate. That means you don't have to run two commands and compare their output; this single command is enough. This works fine for Level 0 backups, archivelog backups and separate archived logs (those not yet in a backup). The problem is that any Level 1 backups listed by Preview are not actually validated.

Disclaimer: This syntax (validate and preview combined) is not documented like this, but it seems to do the trick. I will (cautiously) use this from now on and see if this holds up in the real world...

Checking the level 1 backups

With the output from the Preview, it is easy to write a script that extracts all the Level 1 backups and generates an RMAN script to validate all backupsets from those Level 1 backups. The output from that check, plus the original output, gives you a higher level of confidence in your backups: you now know that your Level 0, Level 1 and archivelog backups, as well as the archivelogs themselves, are all present and usable in case of a restore/recover scenario.

Checking recoverability - check your SCNs

When using the "preview" command, somewhere at the end of the preview section output, it will say something like this:

recovery will be done up to SCN  2482924
Media recovery start SCN is      2482924
Recovery must be done beyond SCN 2484433 to clear datafile fuzziness

Here you can see that the "recovery will be done up to SCN" value is equal to the "start SCN", while it should be higher than the "recovery must be done beyond SCN" value to reach a consistent state. That's not good. However, this is easily remedied by adding an UNTIL TIME clause: it performs the same file checks, but the values displayed in the summary change, which makes it easier to (automatically) check whether you are getting what you want.

RMAN> RESTORE DATABASE UNTIL TIME "sysdate" VALIDATE PREVIEW SUMMARY;

recovery will be done up to SCN  2641133
Media recovery start SCN is      2482924
Recovery must be done beyond SCN 2484433 to clear datafile fuzziness

Much better. It doesn't change the files it checks, but now the logging is clearer.

The ultimate RMAN check?

You could even add one final touch: check the logical sanity of your files:

RESTORE DATABASE UNTIL TIME "sysdate" VALIDATE CHECK LOGICAL PREVIEW SUMMARY;

Putting it all together - scripting your validation

A rather simple script can be made to give you the validation and some checks. It can probably be done much cleaner and nicer, but I haven't really focused on the scripting syntax yet. You can easily guess the contents of the ".rmn" script used here...
Some assumptions apply, such as the ORACLE_SID being set before the script is run. Be aware that some modifications will probably be needed before using this in your own environment.

This version has some basic checks:
  • RMAN error checking
  • SCN validation (check if restore/recover goes beyond the "clear fuzziness" SCN)
  • Level 1 backup validation 
#!/bin/sh
# -----------------------------------------------
# RMAN validate backups
# -----------------------------------------------
. /home/oracle/.bash_profile
export NLS_DATE_FORMAT="dd-mm-yyyy hh24:mi:ss"
export LOGFILE=rman_validate_preview.log
export BASEDIR=$(dirname "$0")
#
cd $BASEDIR
echo $(date +'%d-%m-%Y %H:%M:%S') " - starting backup validate" > $LOGFILE
echo $(date +'%d-%m-%Y %H:%M:%S') " - creating RMAN output files" >> $LOGFILE
#
# Call the rman script and write output to separate file
#
echo $(date +'%d-%m-%Y %H:%M:%S') " - RMAN validate preview" >> $LOGFILE
rman target / @validate_preview.rmn log=validate_preview.log
#
# Check level 1 backups (if any)
# Also check controlfile restore
# (comes in handy when no level 1 backups exist to prevent an empty run block)
#
LEVEL1CMDFILE=validate_level_1.rmn
echo "run {" > $LEVEL1CMDFILE
echo "restore controlfile validate;" >> $LEVEL1CMDFILE
awk '/ B  1 / {print "validate backupset " $1 ";"}' validate_preview.log >> $LEVEL1CMDFILE
echo "}" >> $LEVEL1CMDFILE
#
echo $(date +'%d-%m-%Y %H:%M:%S') " - RMAN validate controlfile and level 1 backups" >> $LOGFILE
rman target / @$LEVEL1CMDFILE log=validate_level_1.log
#
# check the output for errors
#
echo $(date +'%d-%m-%Y %H:%M:%S') " - Check output for errors" >> $LOGFILE
ERROR_COUNT=$(grep RMAN- validate_*.log | wc -l)
if [ $ERROR_COUNT -eq 0 ]
then
  echo "No errors found in validation" >> $LOGFILE
else
  echo "Errors found in validation" >> $LOGFILE
  echo "--------------------------------" >> $LOGFILE
  grep RMAN- validate_*.log         >> $LOGFILE
  echo "--------------------------------" >> $LOGFILE
fi
echo "######################################################" >> $LOGFILE
#
# Check the validate preview for SCN validity
#
echo $(date +'%d-%m-%Y %H:%M:%S') " - Compare SCNs for recovery" >> $LOGFILE
SCN_START=$(awk '/Media recovery start SCN is/ {print $6}' validate_preview.log)
SCN_RECOV=$(awk '/recovery will be done up to SCN/ {print $8}' validate_preview.log)
SCN_FUZZY=$(awk '/Recovery must be done beyond SCN/ {print $7}' validate_preview.log)
echo "Recovery start at SCN $SCN_START and ends at SCN $SCN_RECOV" >> $LOGFILE
if [ $SCN_RECOV -lt $SCN_FUZZY ]
then
  echo "ERROR: Recovery ends at $SCN_RECOV, but should be done beyond $SCN_FUZZY" >> $LOGFILE
else
  echo "Recovery ends at $SCN_RECOV and this is beyond $SCN_FUZZY, as it should be" >> $LOGFILE
fi
echo "######################################################" >> $LOGFILE
#
# Append all output to global logfile
#
echo $(date +'%d-%m-%Y %H:%M:%S') " - Detailed log output" >> $LOGFILE
echo "######################################################" >> $LOGFILE
cat validate_preview.log >> $LOGFILE
echo "######################################################" >> $LOGFILE
cat validate_level_1.log >> $LOGFILE
echo "######################################################" >> $LOGFILE
echo $(date +'%d-%m-%Y %H:%M:%S') " - End of report" >> $LOGFILE

Monday, March 27, 2023

Configure MFA for OCI / IDCS - how to make sure you have a plan B

When you want to configure MFA on Oracle Cloud Infrastructure (OCI) using IDCS, there is a good starting point in the OCI documentation. That document contains a warning (prerequisite number 5) that could have been emphasized a little more:

"If you don’t register this client application and a Sign-On Policy configuration restricts access to everyone, then all users are locked out of the identity domain until you contact Oracle Support."

I recently had myself locked out of OCI and IDCS (and everyone else along with me), and this prerequisite step helped me through it. There were two challenges, though: the document does not tell you how to actually solve the problem of locking yourself out, and it doesn't really explain why I locked myself out in the first place. Here, I will focus on the first part: how to solve the problem once you have locked yourself out. It is nice to have this Emergency Access application, but how do you use it?

Please note: if you skipped "prerequisite 5" (and did not create an OAuth2 application), this blog will not really help you. You will probably need Oracle Support in that case...

Prerequisite - Register a Client Application

So, a very important step, and I can't repeat it enough: if you want to enable MFA on IDCS, make sure this prerequisite is met. As mentioned in the documentation, use the link to Oracle by Example to set up your Emergency Access application. That's what I call it, because that is what it is for me: should you lock yourself (and everyone else) out of OCI/IDCS, you use this setup to get back in.

The Oracle by Example instructions are pretty clear on how to configure and use Postman. Due to newer versions, some instructions may be a bit outdated, but all in all it works pretty well. Step 1 is done in IDCS and works as described; that is where you configure the Emergency Access application. Note the client ID and secret and store them in a safe location.

Steps 2 and 3 are about getting Postman configured with a (full) Environment and Collection to work with IDCS. Step 4 is all about getting an actual OAuth2 token. Very important: make sure that part works. In my version of the Postman config, I needed to use "On the Collections tab, expand OAuth, then Tokens and select Obtain access_token (client credentials)" and then press "Send" to get the actual token.

Even more important: make sure that with this token you can actually access your IDCS environment. I like to do that with "On the Collections tab, expand Users, then Search and select List all users"; after pressing "Send", you should see all your users in the result. If not: don't proceed, but troubleshoot first. Make sure you have this access working, as it might be your last resort at some point.
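
If you prefer not to depend on Postman alone, the same two calls can be scripted. Below is a minimal sketch using Python and the requests library, assuming the standard IDCS endpoints (/oauth2/v1/token and /admin/v1/Users); the host, client ID and secret are placeholders for your own Emergency Access application values.

import requests

IDCS = "https://<idcs-host>"   # e.g. https://idcs-xxxxxxxx.identity.oraclecloud.com

# Step 4 equivalent: obtain an access token via client credentials
tok = requests.post(
    IDCS + "/oauth2/v1/token",
    auth=("<client-id>", "<client-secret>"),
    data={"grant_type": "client_credentials", "scope": "urn:opc:idm:__myscopes__"},
)
access_token = tok.json()["access_token"]

# Sanity check: list all users (the "List all users" request in Postman)
users = requests.get(
    IDCS + "/admin/v1/Users",
    headers={"Authorization": "Bearer " + access_token},
)
print(users.status_code, len(users.json().get("Resources", [])))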

How does this save you?

Now we know we can access IDCS, even if our MFA policy happens to shut us out with the message "Sign-on policy denies access". If that happens, the steps to solve it are quite easy, as long as you know what to do. First of all, I assume you created a separate policy in IDCS for this and did not re-use the default one. As a matter of fact, I highly recommend not messing with the default one.

Even when you are locked out, you can retrieve the list of policies in IDCS with Postman. The id of your policy can be obtained from "Collections tab, expand Policies, then Search and select List all policies". All policies have an id; find the one for your MFA policy and either note it down somewhere or, better yet, store it in a new custom variable in your Postman Environment. I used "MFA_Policy_id" for this.
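
Scripted, the same lookup is a single call to the (assumed) /admin/v1/Policies endpoint, continuing the Python sketch from above; "My MFA Policy" is a placeholder for whatever display name you gave your own policy.

# Find the id of your MFA policy by display name
policies = requests.get(
    IDCS + "/admin/v1/Policies",
    headers={"Authorization": "Bearer " + access_token},
).json()

mfa_policy_id = next(
    p["id"] for p in policies.get("Resources", [])
    if p.get("displayName") == "My MFA Policy"
)
print(mfa_policy_id)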

With the policy id, you can go to "On the Collections tab, expand Policies, then Policy, Modify" and duplicate the request "Patch Update a Policy". Rename the duplicate to (for example) "Disable MFA Policy".

Replace the "{{id}}" in the URI with "{{MFA_Policy_id}}" (or just paste the id itself, if you didn't create a variable).

In the body of this new entry, paste:

{
  "schemas": [
    "urn:ietf:params:scim:api:messages:2.0:PatchOp"
  ],
  "Operations": [
    {
      "op": "replace",
      "path": "active",
      "value": false
    }
  ]
}

Executing this request disables your new policy (literally setting "active" to "false"), allowing you to log in once again. That is why I leave the default policy as it is: it will now be the only one in effect, and it allows access to verified users.
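
Continuing the Python sketch, the same PATCH can be sent with requests; the mfa_policy_id variable and the /admin/v1/Policies path are the assumptions noted earlier.

# Disable the (assumed) MFA policy by setting "active" to false
patch_body = {
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {"op": "replace", "path": "active", "value": False}
    ],
}
resp = requests.patch(
    IDCS + "/admin/v1/Policies/" + mfa_policy_id,
    headers={"Authorization": "Bearer " + access_token,
             "Content-Type": "application/json"},
    json=patch_body,
)
print(resp.status_code)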

Now, reconfigure the MFA policy, enable it again and see if it works this time. If not: disable the policy and try again. And remember: as long as you stay signed in to IDCS, you can disable the policy from the console for as long as your connection stays valid. And should you get locked out (again), you will always have a plan B!