Directly Automate the Export of PostgreSQL logs to Amazon S3


Now you can automate the export of PostgreSQL logs to Amazon S3 using PostgreSQL extensions.

Amazon RDS is used for the PostgreSQL instance. With access to your AWS account, install and configure the AWS Command Line Interface (AWS CLI). Now let’s create an RDS for PostgreSQL instance; if you need assistance in creating or connecting to a PostgreSQL database with Amazon RDS, click here.
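If you are creating the instance from the AWS CLI, a minimal sketch looks like the following (the instance class, engine version, storage size, and credentials are placeholder values rather than settings from this walkthrough; the identifier matches the log-stream-demo-instance used later in this post):

# Hypothetical example – adjust class, version, storage, and credentials to your needs
aws rds create-db-instance \
    --db-instance-identifier log-stream-demo-instance \
    --engine postgres \
    --engine-version 12.8 \
    --db-instance-class db.t3.medium \
    --allocated-storage 20 \
    --master-username postgres \
    --master-user-password '<your-password>'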

Then launch an Amazon EC2 instance with the psql client installed and configured so that you can access the database instance.
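For example, from the EC2 instance you can connect with psql as follows (substitute your own RDS endpoint, port, user, and database):

# Connect from the EC2 instance to the RDS for PostgreSQL endpoint
psql -h <your-rds-endpoint>.rds.amazonaws.com -p 5432 -U postgres -d postgres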

To store the logs exported from the database, you need to create an S3 bucket.

To provide access to the S3 bucket, create an IAM role and policy and attach the role to your RDS for PostgreSQL instance.

Now let’s create the PostgreSQL extensions in your RDS for PostgreSQL database. You will need the pg_cron, log_fdw, and aws_s3 extensions.

  • Modify the database parameter group and add pg_cron to the shared_preload_libraries parameter, and then restart the database.
  • The log_fdw extension is used to load all the available RDS for PostgreSQL log files as tables.
  • Use aws_s3 to query data from your RDS for PostgreSQL DB instance and export it directly into files stored in an S3 bucket. (Note: ensure you have created the S3 bucket and IAM role, and then attached the role to the DB instance.)

Ensure you have completed the above prerequisites. Now let’s see how it works with the extensions.

Steps:

1) Creating a custom parameter group and allowing extension support

aws rds create-db-parameter-group --db-parameter-group-name log-exporting-pg --db-parameter-group-family postgres12 --description 'Parameter group for allowing automated log exporting to s3 from RDS'

{
  "DBParameterGroup": {
    "DBParameterGroupArn": "arn:aws:rds:us-east-1:<AWS-ACCOUNT-ID>:pg:log-exporting-pg",
    "DBParameterGroupName": "log-exporting-pg",
    "DBParameterGroupFamily": "postgres12",
    "Description": "Parameter group for allowing automated log exporting to s3 from RDS"
  }
}
aws rds modify-db-instance --db-instance-identifier log-stream-demo-instance --db-parameter-group-name log-exporting-pg

"DBParameterGroups": [
  {
   "DBParameterGroupName": "log-exporting-pg",
    "ParameterApplyStatus": "applying"
   }
 ]

To modify the parameters:

aws rds modify-db-parameter-group --db-parameter-group-name log-exporting-pg --parameters "ParameterName=shared_preload_libraries,ParameterValue='pg_cron',ApplyMethod=pending-reboot" "ParameterName=log_destination,ParameterValue='csvlog',ApplyMethod=pending-reboot"

 {
   "DBParameterGroupName": "log-exporting-pg"
 }

Now restart the DB instance:

aws rds reboot-db-instance --db-instance-identifier log-stream-demo-instance

"DBParameterGroups": [
 {
    "DBParameterGroupName": "log-exporting-pg",   
    "ParameterApplyStatus": "pending-reboot"
    }
 ],
DBInstanceStatus":"rebooting"
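Once the instance is back up, you can optionally confirm from a psql session that the new settings took effect (a quick sanity check, not part of the original commands):

postgres=> SHOW shared_preload_libraries;
postgres=> SHOW log_destination;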

2) Creating an IAM role and policy, and attaching the role to your RDS for PostgreSQL instance

First, set up a bucket for exporting the logs:

aws s3api create-bucket --bucket log-export-bucket --region us-east-1
{
  "Location": "/log-export-bucket"
}

Create an IAM policy

Note: the policy should grant access to the Amazon S3 bucket where the exported files will be stored (here, log-export-bucket). Save the following policy document as accessPolicyDocument.txt:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:ListMultipartUploadParts",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::log-export-bucket/*",
        "arn:aws:s3:::log-export-bucket"
      ]
    }
  ]
}
aws iam create-policy --policy-name postgres-s3-access-policy --policy-document "$(cat accessPolicyDocument.txt)"
{
  "Policy": {
    "PolicyName": "postgres-s3-access-policy",
    "PermissionsBoundaryUsageCount": 0,
    "CreateDate": "",
    "AttachmentCount": 0,
    "IsAttachable": true,
    "PolicyId": "",
    "DefaultVersionId": "v1",
    "Path": "/",
    "Arn": "arn:aws:iam::000123456789:policy/postgres-s3-access-policy",
    "UpdateDate": ""
  }
}

Now create an IAM role, postgres-s3-export-role, and set up its trust relationship. Save the following trust policy document as assumeRolePolicyDocument.txt:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "rds.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

aws iam create-role --role-name postgres-s3-export-role --assume-role-policy-document "$(cat assumeRolePolicyDocument.txt)"
{
  "Role": {
    "AssumeRolePolicyDocument": {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Action": "sts:AssumeRole",
          "Effect": "Allow",
          "Principal": {
            "Service": "rds.amazonaws.com"
          }
        }
      ]
    },
    "RoleId": "",
    "CreateDate": "",
    "RoleName": "postgres-s3-export-role",
    "Path": "/",
    "Arn": "arn:aws:iam::000123456789:role/postgres-s3-export-role"
  }
}

aws iam attach-role-policy --role-name postgres-s3-export-role --policy-arn arn:aws:iam::000123456789:policy/postgres-s3-access-policy

<No output indicates success>

Lastly, associate the IAM role with your DB instance.

aws rds add-role-to-db-instance --db-instance-identifier log-stream-demo-instance --feature-name s3Export --role-arn arn:aws:iam::000123456789:role/postgres-s3-export-role

<No output indicates success>
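As an optional verification (not part of the original flow), you can describe the instance and inspect its AssociatedRoles to confirm the role is attached:

aws rds describe-db-instances --db-instance-identifier log-stream-demo-instance --query 'DBInstances[0].AssociatedRoles'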

3) Importing PostgreSQL logs into a table with log_fdw

First, create the log_fdw extension on the database instance before using its functions. After initiating a psql connection to the database, execute the following command:

postgres=> CREATE EXTENSION log_fdw;

CREATE EXTENSION

After loading the extension, you can write a function that loads every available PostgreSQL DB log file as a table in the database. Once the function has been created, execute it to load the PostgreSQL logs into the database. The most recent engine logs are added to the logs.postgres_logs table each time the following command is executed.

postgres=> SELECT public.load_postgres_log_files();
 NOTICE: CSV log format will be used.
 NOTICE: Processing log file - postgresql.log.2021-11-19-21.csv
 NOTICE: Processing log file - postgresql.log.2021-11-19-20.csv
              load_postgres_log_files
 --------------------------------------------------
  Postgres logs loaded to table logs.postgres_logs
 (1 row) 
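If you prefer to work without a wrapper function, log_fdw also exposes lower-level helpers that a loader like this typically builds on. The following is a minimal sketch under the assumption that your engine version provides list_postgres_log_files and create_foreign_table_for_log_file; the server and foreign table names are illustrative, and the log file name is taken from the output above:

postgres=> CREATE SERVER log_server FOREIGN DATA WRAPPER log_fdw;
postgres=> SELECT * FROM list_postgres_log_files() ORDER BY file_name;
postgres=> SELECT create_foreign_table_for_log_file('postgres_log_2021_11_19_21', 'log_server', 'postgresql.log.2021-11-19-21.csv');
postgres=> SELECT * FROM postgres_log_2021_11_19_21 LIMIT 10;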

A SELECT command on the logs.postgres_logs table can be used to inspect the logs. A query can be tailored to a particular use case by limiting the scope of the SELECT operation. The following example only considers engine log messages containing DML statements (INSERT, UPDATE, SELECT, DELETE) that occurred within the previous hour.

SELECT * FROM logs.postgres_logs WHERE message SIMILAR TO '%(INSERT|UPDATE|SELECT|DELETE)%'
AND log_time >= NOW() - INTERVAL '1' HOUR ORDER BY log_time DESC;

4) Exporting PostgreSQL logs from the table to Amazon S3

From the prerequisites, you should already have an S3 bucket and an IAM role attached to the DB instance that allows writing to that bucket. To export the retrieved logs to Amazon S3, use the aws_s3 extension.

Create the aws_s3 extension using the code below:

postgres=> CREATE EXTENSION aws_s3 CASCADE;

CREATE EXTENSION

Execute the aws_s3.query_export_to_s3 function as follows:

postgres=> SELECT aws_s3.query_export_to_s3('SELECT * FROM logs.postgres_logs WHERE message
SIMILAR TO $Q1$%(INSERT|UPDATE|SELECT|DELETE)%$Q1$ AND log_time >= NOW() - INTERVAL $Q2$1$Q2$ HOUR ORDER BY log_time DESC;', 's3-bucket-name', 'exported-file-name', 's3-bucket-region', options := 'format text');

 rows_uploaded | files_uploaded | bytes_uploaded
---------------+----------------+----------------
             5 |              1 |            235
(1 row)
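To confirm the export landed in your bucket (assuming the log-export-bucket created earlier was passed as the bucket name), you can list its contents from the AWS CLI:

aws s3 ls s3://log-export-bucket/ --region us-east-1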

5) Automating log exports with pg_cron

Now you can use pg_cron together with the log_fdw and aws_s3 extensions to automate log uploads to Amazon S3. pg_cron is used to schedule database queries. As part of the prerequisites, pg_cron should already be added to the shared_preload_libraries parameter in the parameter group of your database instance. Once pg_cron is loaded in shared_preload_libraries, run the following command to create the extension:

postgres=> CREATE EXTENSION pg_cron;

CREATE EXTENSION

Use pg_cron to perform PostgreSQL log uploads on a cron-defined schedule. To accomplish this, you must set up a cron job with a name, a schedule, and the log export query you want to run. Now, run the following command to schedule log uploads every hour using the same query as earlier:

postgres=> SELECT cron.schedule('postgres-s3-log-uploads', '0 * * * *', 'do $$ begin
    PERFORM public.load_postgres_log_files();
    PERFORM aws_s3.query_export_to_s3($Q1$SELECT * FROM
      logs.postgres_logs WHERE message SIMILAR TO
      $Q2$%(INSERT|UPDATE|SELECT|DELETE)%$Q2$ AND log_time >= NOW()
      - INTERVAL $Q3$1$Q3$ HOUR ORDER BY log_time DESC;$Q1$,
      $Q4$s3-bucket-name$Q4$, $Q5$exported-file-name$Q5$,
      $Q6$s3-bucket-region$Q6$, options := $Q7$format text$Q7$);
end $$;');
  
  schedule
 ----------
         1
 (1 row) 

Once the uploads are scheduled, check the cron.job table to see the job details:

postgres=# SELECT * FROM cron.job WHERE jobname = 'postgres-s3-log-uploads';
 jobid | schedule  | command | nodename | nodeport | database | username | active | jobname
-------+-----------+---------+----------+----------+----------+----------+--------+-------------------------
     1 | 0 * * * * | do $$ begin PERFORM public.load_postgres_log_files(); PERFORM aws_s3.query_export_to_s3($Q1$SELECT * FROM logs.postgres_logs WHERE message SIMILAR TO $Q2$%(INSERT|UPDATE|SELECT|DELETE)%$Q2$ AND log_time >= NOW() - INTERVAL $Q3$1$Q3$ HOUR ORDER BY log_time DESC;$Q1$, $Q4$s3-bucket-name$Q4$, $Q5$exported-file-name$Q5$, $Q6$s3-bucket-region$Q6$, options := $Q7$format text$Q7$); end $$; | localhost | 5432 | postgres | rdsadmin | t | postgres-s3-log-uploads
(1 row)

By querying the pg_cron audit table, you can also check the run history of this cron job. This information includes the duration of each upload, success or failure, and other details. Check the following code:

postgres=> SELECT * FROM cron.job_run_details WHERE jobid IN (SELECT jobid FROM cron.job WHERE
jobname = 'postgres-s3-log-uploads');
jobid | runid | job_pid | database | username | command | status | return_message | start_time | end_time
 ------+-------+---------+----------+----------+---------+--------+------------------------+------------+---------- 
 (0 rows) 

If you want to cancel these automated log uploads at any time, unschedule the associated cron job by passing its job name. In the following code, the job name is postgres-s3-log-uploads:

postgres=> SELECT cron.unschedule('postgres-s3-log-uploads'); 
 unschedule 
 ------------ 
  t 
 (1 row) 

We hope this helped you directly automate the export of PostgreSQL logs to Amazon S3. If you need any assistance, feel free to reach out to us.

Written by actsupp-r0cks