Guide to RDAF Start, Stop Operations

This section explains how to safely start and stop the RDAF infrastructure, platform, application and worker services.

1. Starting RDAF Services

Log in to the RDAF platform VM as rdauser using an SSH client for CLI access, and start the below RDAF services in the following sequence (a combined quick-reference sketch follows the list).

  • Infrastructure Services
  • Platform Services
  • Worker Services
  • Application Services
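
For quick reference, the individual stages described below can be chained into a single shell sequence. This is a minimal sketch, assuming the 60-second settle time noted in the Info boxes below is sufficient; verify each stage's status output before relying on it.

rdaf infra up && sleep 60
rdaf infra status
rdaf platform up && sleep 60
rdaf platform status
rdaf worker up && sleep 60
rdaf worker status
rdaf app up OIA
rdaf app status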

Starting RDAF infrastructure services:

rdaf infra up

Verify the status of the RDAF infrastructure services and make sure all of them are up & running.

rdaf infra status
+----------------+-----------------+---------------+--------------+--------------------------+
| Name           | Host            | Status        | Container Id | Tag                      |
+----------------+-----------------+---------------+--------------+--------------------------+
| haproxy        | 192.168.125.143 | Up 41 seconds | b68f8335d8ff | 1.0.1                    |
| haproxy        | 192.168.125.144 | Up 41 seconds | 9df14432767c | 1.0.1                    |
| keepalived     | 192.168.125.143 | active        | N/A          | N/A                      |
| keepalived     | 192.168.125.144 | active        | N/A          | N/A                      |
| nats           | 192.168.125.143 | Up 38 seconds | 4f1413239096 | 1.0.1                    |
| nats           | 192.168.125.144 | Up 38 seconds | 0762f5ef3d5e | 1.0.1                    |
| minio          | 192.168.125.143 | Up 37 seconds | c93731b02f95 | RELEASE.2022-05-08T23-50 |
|                |                 |               |              | -31Z                     |
| minio          | 192.168.125.144 | Up 37 seconds | 1b2b545cbd4a | RELEASE.2022-05-08T23-50 |
|                |                 |               |              | -31Z                     |
| minio          | 192.168.125.145 | Up 37 seconds | 289f96a2832e | RELEASE.2022-05-08T23-50 |
|                |                 |               |              | -31Z                     |
| minio          | 192.168.125.146 | Up 36 seconds | f6571bd5e000 | RELEASE.2022-05-08T23-50 |
|                |                 |               |              | -31Z                     |
| mariadb        | 192.168.125.143 | Up 36 seconds | 4e5ca8860c87 | 1.0.1                    |
| mariadb        | 192.168.125.144 | Up 35 seconds | 2c5a4986a6c1 | 1.0.1                    |
| mariadb        | 192.168.125.145 | Up 35 seconds | cf6656241efa | 1.0.1                    |
| opensearch     | 192.168.125.143 | Up 34 seconds | b04ece438490 | 1.0.1                    |
| opensearch     | 192.168.125.144 | Up 34 seconds | ab53cf0abf6d | 1.0.1                    |
| opensearch     | 192.168.125.145 | Up 34 seconds | 7c75c0cffe4a | 1.0.1                    |
| zookeeper      | 192.168.125.143 | Up 33 seconds | 14b23a0ce5d3 | 1.0.1                    |
| zookeeper      | 192.168.125.144 | Up 33 seconds | 51630587c9c2 | 1.0.1                    |
| zookeeper      | 192.168.125.145 | Up 32 seconds | 1eca7a3a0f70 | 1.0.1                    |
| kafka          | 192.168.125.143 | Up 11 seconds | 0278470dd416 | 1.0.1                    |
| kafka          | 192.168.125.144 | Up 12 seconds | ab3e888056a7 | 1.0.1                    |
| kafka          | 192.168.125.145 | Up 30 seconds | 972b78f159c3 | 1.0.1                    |
| redis          | 192.168.125.143 | Up 30 seconds | 4d3dbd1111f7 | 1.0.1                    |
| redis          | 192.168.125.144 | Up 29 seconds | abe4da626997 | 1.0.1                    |
| redis          | 192.168.125.145 | Up 29 seconds | 9fe580fa5e81 | 1.0.1                    |
| redis-sentinel | 192.168.125.143 | Up 28 seconds | c054e0bcf113 | 1.0.1                    |
| redis-sentinel | 192.168.125.144 | Up 28 seconds | a66fe0d2bdf3 | 1.0.1                    |
| redis-sentinel | 192.168.125.145 | Up 27 seconds | ac523a8c6ffb | 1.0.1                    |
+----------------+-----------------+---------------+--------------+--------------------------+

Info

Note: Please wait for 60 seconds before starting RDAF platform services

Starting RDAF platform services:

rdaf platform up

Verify the status of the RDAF platform services and make sure all of them are up & running.

rdaf platform status
+--------------------------+-----------------+---------------+--------------+---------+
| Name                     | Host            | Status        | Container Id | Tag     |
+--------------------------+-----------------+---------------+--------------+---------+
| cfx-rda-access-manager   | 192.168.125.141 | Up 42 seconds | e4f20012a888 | 3.0.5.8 |
| cfx-rda-resource-manager | 192.168.125.141 | Up 41 seconds | 52bd03970a53 | 3.0.5.8 |
| cfx-rda-user-preferences | 192.168.125.141 | Up 41 seconds | 289e90b70b85 | 3.0.5.8 |
| portal-backend           | 192.168.125.141 | Up 41 seconds | 1887eb44d63d | 3.0.5.8 |
| portal-frontend          | 192.168.125.141 | Up 40 seconds | 75fd3f691ad8 | 3.0.5.8 |
| rda_api_server           | 192.168.125.141 | Up 39 seconds | fcbbca53641f | 3.0.5.8 |
| rda_asm                  | 192.168.125.141 | Up 38 seconds | f931d1e748ae | 3.0.5.8 |
| rda_asset_dependency     | 192.168.125.141 | Up 37 seconds | e68e03eabe78 | 3.0.5.8 |
| rda_collector            | 192.168.125.141 | Up 36 seconds | 3c65bad1e013 | 3.0.5.8 |
| rda_identity             | 192.168.125.141 | Up 35 seconds | 94d67dcb82b9 | 3.0.5.8 |
| rda_registry             | 192.168.125.141 | Up 34 seconds | 752a0d8dd352 | 3.0.5.8 |
| rda_sched_admin          | 192.168.125.141 | Up 33 seconds | eabc9a908afb | 3.0.5.8 |
| rda_scheduler            | 192.168.125.141 | Up 32 seconds | 1b136bac290f | 3.0.5.8 |
+--------------------------+-----------------+---------------+--------------+---------+

Info

Note: Please wait for 60 seconds before starting RDAF worker services

Starting RDAF worker services:

rdaf worker up

Verify the status of the RDAF worker services and make sure all of them are up & running.

rdaf worker status
+------------+-----------------+---------------+--------------+---------+
| Name       | Host            | Status        | Container Id | Tag     |
+------------+-----------------+---------------+--------------+---------+
| rda_worker | 192.168.125.149 | Up 30 seconds | 8a933d1b82df | 3.0.5.8 |
| rda_worker | 192.168.125.150 | Up 35 seconds | 2a934r1b52dw | 3.0.5.8 |
+------------+-----------------+---------------+--------------+---------+

Starting RDAF application services:

To start the OIA application services:

rdaf app up OIA

Verify the status of the RDAF application services and make sure all of them are up & running.

rdaf app status
+--------------------------+-----------------+---------------+--------------+---------+
| Name                     | Host            | Status        | Container Id | Tag     |
+--------------------------+-----------------+---------------+--------------+---------+
| all-alerts-cfx-rda-      | 192.168.125.146 | Up 40 seconds | d9aed36ddf4b | 7.0.0.0 |
| dataset-caas             |                 |               |              |         |
| cfx-rda-alert-ingester   | 192.168.125.146 | Up 39 seconds | ef4f031a7b45 | 7.0.0.0 |
| cfx-rda-alert-processor  | 192.168.125.146 | Up 38 seconds | de9de2959dce | 7.0.0.0 |
| cfx-rda-app-builder      | 192.168.125.146 | Up 38 seconds | 438b53f06c61 | 7.0.0.0 |
| cfx-rda-app-controller   | 192.168.125.146 | Up 37 seconds | 2cb10582f881 | 7.0.0.0 |
| cfx-rda-collaboration    | 192.168.125.146 | Up 36 seconds | 407055e4b862 | 7.0.0.0 |
| cfx-rda-configuration-   | 192.168.125.146 | Up 35 seconds | b7b08bcb923e | 7.0.0.0 |
| service                  |                 |               |              |         |
| cfx-rda-event-consumer   | 192.168.125.146 | Up 35 seconds | 73ef798cf0bf | 7.0.0.0 |
| cfx-rda-file-browser     | 192.168.125.146 | Up 34 seconds | 12135eeccb2d | 7.0.0.0 |
| cfx-rda-ingestion-       | 192.168.125.146 | Up 33 seconds | a2010475d060 | 7.0.0.0 |
| tracker                  |                 |               |              |         |
| cfx-rda-irm-service      | 192.168.125.146 | Up 32 seconds | 0e969df37ad0 | 7.0.0.0 |
| cfx-rda-ml-config        | 192.168.125.146 | Up 31 seconds | c907949bff1d | 7.0.0.0 |
| cfx-rda-notification-    | 192.168.125.146 | Up 31 seconds | 215c67affb68 | 7.0.0.0 |
| service                  |                 |               |              |         |
| cfx-rda-reports-registry | 192.168.125.146 | Up 30 seconds | 21828b867a03 | 7.0.0.0 |
| cfx-rda-smtp-server      | 192.168.125.146 | Up 29 seconds | ee6c90d25afe | 7.0.0.0 |
| cfx-rda-webhook-server   | 192.168.125.146 | Up 28 seconds | 4659fe639e3c | 7.0.0.0 |
| current-alerts-cfx-rda-  | 192.168.125.146 | Up 27 seconds | 9c6d30851fe3 | 7.0.0.0 |
| dataset-caas             |                 |               |              |         |
+--------------------------+-----------------+---------------+--------------+---------+

2. Stopping RDAF Services

Log in to the RDAF platform VM as rdauser using an SSH client for CLI access, and stop the below RDAF services in the following sequence.

  • Application Services
  • Worker Services
  • Platform Services
  • Infrastructure Services

To stop RDAF OIA application services, run the below command. Wait until all of the services are stopped.

rdaf app down OIA

To stop RDAF worker services, run the below command. Wait until all of the services are stopped.

rdaf worker down

To stop RDAF platform services, run the below command. Wait until all of the services are stopped.

rdaf platform down

To stop RDAF infrastructure services, run the below command. Wait until all of the services are stopped.

rdaf infra down
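
As a quick reference, the stop sequence can also be chained into a single sketch; it assumes each down command returns after initiating the shutdown, so confirm from the status output that a tier is fully stopped before moving to the next.

rdaf app down OIA && rdaf app status
rdaf worker down && rdaf worker status
rdaf platform down && rdaf platform status
rdaf infra down && rdaf infra status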

3. MariaDB Cluster Service

MariaDB is a relational database service that stores the RDAF platform's user configuration, the platform & application service configuration, and their data. RDAF applications such as OIA and AIA use MariaDB to store alerts, incidents, asset inventory data, etc. MariaDB natively supports high availability and can be deployed in a Master/Slave or Master/Master configuration using the Galera clustering feature. Within the RDAF platform, MariaDB is deployed in a Master/Master (Galera cluster) configuration. The MariaDB service is containerized and configured in a specific way to be compatible with the RDAF platform and its application services.

For detailed general documentation, please refer to About MariaDB and About Galera Cluster.

MariaDB database mount point and log path on each cluster node:

  • Data mount point: /var/mysql

  • DB service logs path: /opt/rdaf/logs/mariadb/mariadb.log

MariaDB Galera Cluster graceful start & stop sequence:

Run the below RDAF CLI command from the VM where the RDAF CLI was installed to start the 3-node MariaDB cluster service.

rdaf infra up --service mariadb

Run the below RDAF CLI command to check that the mariadb service is up.

rdaf infra status | grep mariadb
| mariadb        | 192.168.125.143 | Up 4 weeks | ebcc659a4e07 | 1.0.1              |
| mariadb        | 192.168.125.144 | Up 4 weeks | 89607a3feb76 | 1.0.1              |
| mariadb        | 192.168.125.145 | Up 4 weeks | 482cb9c1e3b3 | 1.0.1              |

Run the below RDAF CLI command to check the mariadb service's functional health status.

rdaf infra healthcheck | grep mariadb
2022-10-28 20:52:31,926 [rdaf.cmd.infra] INFO     - Running Health Check on mariadb on host 192.168.125.143
2022-10-28 20:52:32,313 [rdaf.cmd.infra] INFO     - Running Health Check on mariadb on host 192.168.125.144
2022-10-28 20:52:32,657 [rdaf.cmd.infra] INFO     - Running Health Check on mariadb on host 192.168.125.145
| mariadb        | Port Connection | OK     | N/A             | 192.168.125.143 | ebcc659a4e07 |
| mariadb        | Service Status  | OK     | N/A             | 192.168.125.143 | ebcc659a4e07 |
| mariadb        | Firewall Port   | OK     | N/A             | 192.168.125.143 | ebcc659a4e07 |
| mariadb        | Port Connection | OK     | N/A             | 192.168.125.144 | 89607a3feb76 |
| mariadb        | Service Status  | OK     | N/A             | 192.168.125.144 | 89607a3feb76 |
| mariadb        | Firewall Port   | OK     | N/A             | 192.168.125.144 | 89607a3feb76 |
| mariadb        | Port Connection | OK     | N/A             | 192.168.125.145 | 482cb9c1e3b3 |
| mariadb        | Service Status  | OK     | N/A             | 192.168.125.145 | 482cb9c1e3b3 |
| mariadb        | Firewall Port   | OK     | N/A             | 192.168.125.145 | 482cb9c1e3b3 |

The above command brings up each MariaDB node in sequential order: Node01 first, to bootstrap the cluster, followed by Node02 & Node03, which join the MariaDB Galera cluster.

When Node01 is started first to bootstrap the MariaDB Galera cluster, it starts with the parameter MARIADB_GALERA_CLUSTER_BOOTSTRAP set to yes inside the MariaDB docker-compose YAML file (/opt/rdaf/deployment-scripts/cluster-node-ip/infra.yaml), as shown below.

  mariadb:
    image: 192.168.125.140:5000/rda-platform-mariadb:1.0.1
    restart: 'no'
    network_mode: host
    mem_limit: 8G
    memswap_limit: 8G
    oom_kill_disable: false
    volumes:
    - /var/mysql:/bitnami/mariadb/data/
    - /opt/rdaf/config/mariadb:/opt/bitnami/mariadb/conf/bitnami/
    - /opt/rdaf/logs/mariadb:/opt/rdaf/log/
    logging:
      driver: json-file
      options:
        max-size: 10m
        max-file: '5'
    environment:
    - MARIADB_GALERA_MARIABACKUP_USER=rdaf_backup
    ...
    ...
    ...
    - MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://192.168.125.143,192.168.125.144,192.168.125.145
    - MARIADB_GALERA_CLUSTER_BOOTSTRAP=yes

Note

Once the MariaDB Galera cluster is functionally up and running, the node that should bootstrap the cluster on its next start depends on the shutdown sequence of the cluster nodes. The cluster node which was stopped last should be used to bootstrap the MariaDB Galera cluster next time.
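
To identify that node, inspect the Galera state file on each cluster node; the node whose grastate.dat shows safe_to_bootstrap: 1 is the one that was stopped last. For example:

grep -E 'seqno|safe_to_bootstrap' /var/mysql/grastate.dat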

Run the below rdaf CLI command to stop the MariaDB cluster service on all 3 nodes gracefully.

rdaf infra down --service mariadb

The above command stops Node03 first, Node02 next, and finally Node01. Since Node01 is stopped last in this sequence, it always becomes the bootstrap node on the next start and initializes the Galera cluster appropriately.

Info

A three-node MariaDB Galera cluster provides high availability with a tolerance of 1 node failure.

3.1 MariaDB Galera cluster multi-node recovery on power failure or a full crash

If the MariaDB Galera cluster nodes crash because of a power failure on all servers or some other server hardware failure, the cluster needs to be brought up carefully in a particular order to avoid any data loss.

First, identify which node is eligible to bootstrap the MariaDB Galera cluster. The two available methods are below.

  • Identify the node which has the highest seqno value.

Tip

A cluster node will only have a positive (highest) seqno value when at least one of the nodes was able to shut down gracefully. This is the node that needs to be started first to bootstrap the MariaDB Galera cluster.

OR

  • Identify the node which has recorded the last committed transaction.

3.1.1 Recover MariaDB Galera cluster using a Node which has the highest seqno value:

Log in to MariaDB cluster Node03 using an SSH client to access the CLI. (username: rdauser)

The following shows the content of grastate.dat on Node03. In this example, the node has a negative seqno, which is the case when a node crashes during Data Definition Language (DDL) processing:

cat /var/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 886dd8da-3d07-11e8-a109-8a3c80cebab4
seqno: -1
safe_to_bootstrap: 0

The following is the content of grastate.dat on Node01, which has the highest seqno value:

cat /var/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 886dd8da-3d07-11e8-a109-8a3c80cebab4
seqno: 31929
safe_to_bootstrap: 1

Note

If all of the 3 cluster nodes contain the value of -1 for seqno and 0 for safe_to_bootstrap, that indicates a full cluster crash has occurred. In that case, go to the next section of this document (Recover MariaDB Galera cluster using a Node that has committed the last transaction).

The MariaDB node with the highest seqno value is the appropriate node to bootstrap the MariaDB Galera cluster.

Follow the below steps to bootstrap and bring up the MariaDB cluster:

Step-1: Log in to the MariaDB bootstrap node using an SSH client as the rdauser user. (The bootstrap node is identified using one of the above-mentioned procedures.)

Step-2: Stop the mariadb service.

Change the directory to /opt/rdaf/deployment-scripts/<node_ip_address> and execute the below command.

docker-compose -f infra.yaml --project-name infra rm -fsv mariadb

Step-3: Change the directory to /opt/rdaf/deployment-scripts/<node_ip_address>. Edit the infra.yaml docker-compose file and configure the environment variable as shown below to enable bootstrapping the mariadb cluster.

  mariadb:
    image: 192.168.125.140:5000/rda-platform-mariadb:1.0.1
    restart: 'no'
    network_mode: host
    mem_limit: 8G
    memswap_limit: 8G
    oom_kill_disable: false
    volumes:
    - /var/mysql:/bitnami/mariadb/data/
    - /opt/rdaf/config/mariadb:/opt/bitnami/mariadb/conf/bitnami/
    - /opt/rdaf/logs/mariadb:/opt/rdaf/log/
    logging:
      driver: json-file
      options:
        max-size: 10m
        max-file: '5'
    environment:
    - MARIADB_GALERA_MARIABACKUP_USER=rdaf_backup
    ...
    ...
    ...
    - MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://192.168.125.143,192.168.125.144,192.168.125.145
    - MARIADB_GALERA_CLUSTER_BOOTSTRAP=yes

Step-4: Edit the /var/mysql/grastate.dat file, make sure the safe_to_bootstrap value is set to 1, and save the file.
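
For example, the edit can be scripted with a one-line sed (a sketch; it assumes the file currently contains safe_to_bootstrap: 0 and that rdauser has sudo privileges):

sudo sed -i 's/^safe_to_bootstrap: 0$/safe_to_bootstrap: 1/' /var/mysql/grastate.dat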

Step-5: Start the MariaDB container using the below command.

docker-compose -f infra.yaml --project-name infra up -d mariadb

After starting the MariaDB container, watch the log messages in the below log file

/opt/rdaf/logs/mariadb/mariadb.log

and look for the below log message, which confirms the node is completely up and in the synced state.

WSREP: Server status change joined -> synced
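
The wait can also be scripted; for example, the below sketch blocks until the synced message appears in the log:

tail -f /opt/rdaf/logs/mariadb/mariadb.log | grep -m1 'joined -> synced'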

Additionally, run the below command to verify that the MariaDB cluster bootstrap node is completely up and in the synced state.

mysql -u <username> -p<password> -h <node-ip> -P 3306 -e "show status like 'wsrep_local_state_comment';"
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+

Once the MariaDB bootstrap cluster node is up, continue with the below steps to bring up the remaining 2 nodes.

Step-6: Log in to the rest of the MariaDB nodes (in no specific order) using an SSH client as rdauser.

Step-7: Change the directory to /opt/rdaf/deployment-scripts/<node_ip_address>. Edit the infra.yaml docker-compose file and make sure the below parameter MARIADB_GALERA_CLUSTER_BOOTSTRAP doesn't exist; if it does, remove it to disable bootstrapping the mariadb cluster.

  mariadb:
    image: 192.168.125.140:5000/rda-platform-mariadb:1.0.1
    restart: 'no'
    network_mode: host
    mem_limit: 8G
    memswap_limit: 8G
    oom_kill_disable: false
    volumes:
    - /var/mysql:/bitnami/mariadb/data/
    - /opt/rdaf/config/mariadb:/opt/bitnami/mariadb/conf/bitnami/
    - /opt/rdaf/logs/mariadb:/opt/rdaf/log/
    logging:
      driver: json-file
      options:
        max-size: 10m
        max-file: '5'
    environment:
    - MARIADB_GALERA_MARIABACKUP_USER=rdaf_backup
    ...
    ...
    ...
    - MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://192.168.125.143,192.168.125.144,192.168.125.145
    - MARIADB_GALERA_CLUSTER_BOOTSTRAP=yes

Note

The above parameter is applicable only on the MariaDB cluster's bootstrap node which initializes the Galera cluster.

Step-8: Start the MariaDB container using the below command.

docker-compose -f infra.yaml --project-name infra up -d mariadb

After starting the MariaDB container, watch the log messages in the below log file

/opt/rdaf/logs/mariadb/mariadb.log

and look for the below log message, which confirms the node is completely up and in the synced state.

WSREP: Server status change joined -> synced

Additionally, run the below command to verify that the MariaDB cluster node is completely up and in the synced state.

mysql -u <username> -p<password> -h <node-ip> -P 3306 -e "show status like 'wsrep_local_state_comment';"
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+

Note

When the second or third node comes up after the crash and syncs with the cluster's bootstrap node, it may take a few minutes or longer to be completely up and in the synced state.

Step-9: On the last MariaDB node, please follow the procedure listed in Step-7 and Step-8.

Step-10: Once the MariaDB cluster nodes are completely up and functional, log in to Node01, edit the MariaDB docker-compose file infra.yaml, add the parameter MARIADB_GALERA_CLUSTER_BOOTSTRAP as an environment variable (as shown below), and save the file. (Configuration file location: /opt/rdaf/deployment-scripts/<node-ip-address>/infra.yaml)

  mariadb:
    image: 192.168.125.140:5000/rda-platform-mariadb:1.0.1
    restart: 'no'
    network_mode: host
    mem_limit: 8G
    memswap_limit: 8G
    oom_kill_disable: false
    volumes:
    - /var/mysql:/bitnami/mariadb/data/
    - /opt/rdaf/config/mariadb:/opt/bitnami/mariadb/conf/bitnami/
    - /opt/rdaf/logs/mariadb:/opt/rdaf/log/
    logging:
      driver: json-file
      options:
        max-size: 10m
        max-file: '5'
    environment:
    - MARIADB_GALERA_MARIABACKUP_USER=rdaf_backup
    ...
    ...
    ...
    - MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://192.168.125.143,192.168.125.144,192.168.125.145
    - MARIADB_GALERA_CLUSTER_BOOTSTRAP=yes

On Node02 & Node03, edit the MariaDB docker-compose file infra.yaml and make sure the above environment variable is not set. This ensures the rdaf CLI starts Node01 as the cluster bootstrap node first when it is run to bring up the MariaDB cluster nodes.

Note

MariaDB Galera cluster node order (i.e., Node01, Node02 & Node03) is determined by the order of the comma-separated IP address list provided to the rdaf setup command, which performs the initial configuration of the RDAF platform.

3.1.2 Recover MariaDB Galera cluster using a Node that has committed the last transaction:

Step-1: Log in to MariaDB cluster Node01 using an SSH client to access the CLI. (username: rdauser)

Step-2: Run the below command to find the MariaDB container ID.

docker ps -a | grep mariadb

Note

Please make sure the MariaDB container is in a stopped state or run the below command to stop the MariaDB container.

docker stop -t 120 <mariadb-container-id>

Step-3: Take a backup of the MariaDB configuration file.

cp /opt/rdaf/config/mariadb/my_custom.cnf /opt/rdaf/config/mariadb/my_custom.cnf.bak

Change the directory to /opt/rdaf/deployment-scripts/<node_ip_address>. Edit the infra.yaml docker-compose file and configure the environment variable as shown below to disable bootstrapping the mariadb cluster.

  mariadb:
    image: 192.168.125.140:5000/rda-platform-mariadb:1.0.1
    restart: 'no'
    network_mode: host
    mem_limit: 8G
    memswap_limit: 8G
    oom_kill_disable: false
    volumes:
    - /var/mysql:/bitnami/mariadb/data/
    - /opt/rdaf/config/mariadb:/opt/bitnami/mariadb/conf/bitnami/
    - /opt/rdaf/logs/mariadb:/opt/rdaf/log/
    logging:
      driver: json-file
      options:
        max-size: 10m
        max-file: '5'
    environment:
    - MARIADB_GALERA_MARIABACKUP_USER=rdaf_backup
    ...
    ...
    ...
    - MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://192.168.125.143,192.168.125.144,192.168.125.145
    - MARIADB_GALERA_CLUSTER_BOOTSTRAP=no

Note

The above environment variable MARIADB_GALERA_CLUSTER_BOOTSTRAP is applicable only on the MariaDB cluster's bootstrap node which initializes the Galera cluster.

If MARIADB_GALERA_CLUSTER_BOOTSTRAP is modified in the infra.yaml file, please run the below command to stop the MariaDB service.

docker-compose -f infra.yaml --project-name infra rm -fsv mariadb

Step-4: Edit the MariaDB configuration file and add the option specified below. (Configuration file location: /opt/rdaf/config/mariadb/my_custom.cnf)

wsrep-recover=1
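
For example, the option can be appended with a one-liner (a sketch; it assumes my_custom.cnf accepts the option appended at the end; if the file is organized into sections, place it under the appropriate section instead):

echo 'wsrep-recover=1' | sudo tee -a /opt/rdaf/config/mariadb/my_custom.cnf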

Step-5: Start the MariaDB service and wait for 2 to 3 minutes to allow it to come up completely.

docker-compose -f infra.yaml --project-name infra up -d mariadb

Step-6: Tail the mariadb service log and look for a message similar to the one below. (/opt/rdaf/logs/mariadb/mariadb.log)

2021-06-07 9:50:36 0 [Note] WSREP: Recovered position: afa02221-c422-11eb-8a24-96c95f63c95b:397159

Note down the recovered position value above and follow the same steps from Step-4 through Step-6 for Node02 & Node03.

The MariaDB node with the latest data will have the highest recovered position value, and that is the appropriate node to bootstrap the MariaDB Galera cluster.
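
To compare the three nodes quickly, the sequence number of the recovered position (the value after the last colon) can be extracted from each node's log with a one-liner such as:

grep 'WSREP: Recovered position' /opt/rdaf/logs/mariadb/mariadb.log | tail -1 | awk -F: '{print $NF}'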

Follow the below steps to bring up the MariaDB Galera cluster:

Step-1: Log in to the MariaDB node that was identified as the bootstrap node (the node with the highest recovered position value) using an SSH client as the rdauser user.

Step-2: Change the directory to /opt/rdaf/deployment-scripts/<node_ip_address>. Edit the infra.yaml docker-compose file and configure the environment variable as shown below to enable bootstrapping the mariadb cluster.

  mariadb:
    image: 192.168.125.140:5000/rda-platform-mariadb:1.0.1
    restart: 'no'
    network_mode: host
    mem_limit: 8G
    memswap_limit: 8G
    oom_kill_disable: false
    volumes:
    - /var/mysql:/bitnami/mariadb/data/
    - /opt/rdaf/config/mariadb:/opt/bitnami/mariadb/conf/bitnami/
    - /opt/rdaf/logs/mariadb:/opt/rdaf/log/
    logging:
      driver: json-file
      options:
        max-size: 10m
        max-file: '5'
    environment:
    - MARIADB_GALERA_MARIABACKUP_USER=rdaf_backup
    ...
    ...
    ...
    - MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://192.168.125.143,192.168.125.144,192.168.125.145
    - MARIADB_GALERA_CLUSTER_BOOTSTRAP=yes

Edit the my_custom.cnf configuration file, make sure the below parameter is removed, and save it. (Configuration file location: /opt/rdaf/config/mariadb/my_custom.cnf)

wsrep-recover=1
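
For example, the exact line added earlier can be deleted with a one-line sed:

sudo sed -i '/^wsrep-recover=1$/d' /opt/rdaf/config/mariadb/my_custom.cnf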

Step-3: Edit the /var/mysql/grastate.dat file, set the safe_to_bootstrap value to 1, and save the file.

Step-4: Stop the MariaDB container using the below command. (infra.yaml file is under /opt/rdaf/deployment-scripts/<node_ip_address>)

docker-compose -f infra.yaml --project-name infra rm -fsv mariadb

Step-5: Start the MariaDB service using the below command.

docker-compose -f infra.yaml --project-name infra up -d mariadb

After starting the MariaDB service, watch the log messages in the below log file

/opt/rdaf/logs/mariadb/mariadb.log

and look for the below log message, which confirms the node is completely up and in the synced state.

WSREP: Server status change joined -> synced

Additionally, run the below command to verify that the MariaDB cluster bootstrap node is completely up and in the synced state.

mysql -u <username> -p<password> -h <node-ip> -P 3306 -e "show status like 'wsrep_local_state_comment';"
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+

Once the MariaDB bootstrap cluster node is up, continue with the below steps to bring up the remaining 2 nodes.

Step-6: Log in to the rest of the MariaDB nodes (in no specific order) using an SSH client as rdauser.

Step-7: Change the directory to /opt/rdaf/deployment-scripts/<node_ip_address>. Edit the infra.yaml docker-compose file and make sure the below parameter MARIADB_GALERA_CLUSTER_BOOTSTRAP doesn't exist; if it does, remove it to disable bootstrapping the mariadb cluster.

  mariadb:
    image: 192.168.125.140:5000/rda-platform-mariadb:1.0.1
    restart: 'no'
    network_mode: host
    mem_limit: 8G
    memswap_limit: 8G
    oom_kill_disable: false
    volumes:
    - /var/mysql:/bitnami/mariadb/data/
    - /opt/rdaf/config/mariadb:/opt/bitnami/mariadb/conf/bitnami/
    - /opt/rdaf/logs/mariadb:/opt/rdaf/log/
    logging:
      driver: json-file
      options:
        max-size: 10m
        max-file: '5'
    environment:
    - MARIADB_GALERA_MARIABACKUP_USER=rdaf_backup
    ...
    ...
    ...
    - MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://192.168.125.143,192.168.125.144,192.168.125.145
    - MARIADB_GALERA_CLUSTER_BOOTSTRAP=yes

Note

The above environment variable MARIADB_GALERA_CLUSTER_BOOTSTRAP is applicable only on the MariaDB cluster's bootstrap node which initializes the Galera cluster.

Step-8: Edit the my_custom.cnf configuration file and make sure the below parameter doesn't exist; if it does, remove it. (Configuration file location: /opt/rdaf/config/mariadb/my_custom.cnf)

wsrep-recover=1

Step-9: Change the directory to /opt/rdaf/deployment-scripts/<node_ip_address> and start the MariaDB container using the below command.

docker-compose -f infra.yaml --project-name infra up -d mariadb

After starting the MariaDB container, watch the log messages in the below log file

/opt/rdaf/logs/mariadb/mariadb.log

and look for the below log message, which confirms the node is completely up and in the synced state.

WSREP: Server status change joined -> synced

Additionally, run the below command to verify that the MariaDB cluster node is completely up and in the synced state.

mysql -u <username> -p<password> -h <node-ip> -P 3306 -e "show status like 'wsrep_local_state_comment';"
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+

Note

When the second or third node comes up after the crash and syncs with the cluster's bootstrap node, it may take a few minutes or longer to be completely up and in the synced state.

Step-10: On the last MariaDB node, please follow the procedure listed in Step-7 through Step-9.

Step-11: Once the MariaDB cluster nodes are completely up and functional, log in to Node01, edit the MariaDB docker-compose file infra.yaml, add the parameter MARIADB_GALERA_CLUSTER_BOOTSTRAP as an environment variable (as shown below), and save the file. (Configuration file location: /opt/rdaf/deployment-scripts/<node-ip-address>/infra.yaml)

  mariadb:
    image: 192.168.125.140:5000/rda-platform-mariadb:1.0.1
    restart: 'no'
    network_mode: host
    mem_limit: 8G
    memswap_limit: 8G
    oom_kill_disable: false
    volumes:
    - /var/mysql:/bitnami/mariadb/data/
    - /opt/rdaf/config/mariadb:/opt/bitnami/mariadb/conf/bitnami/
    - /opt/rdaf/logs/mariadb:/opt/rdaf/log/
    logging:
      driver: json-file
      options:
        max-size: 10m
        max-file: '5'
    environment:
    - MARIADB_GALERA_MARIABACKUP_USER=rdaf_backup
    ...
    ...
    ...
    - MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://192.168.125.143,192.168.125.144,192.168.125.145
    - MARIADB_GALERA_CLUSTER_BOOTSTRAP=yes

On Node02 & Node03, edit the MariaDB docker-compose file infra.yaml and make sure the above environment variable is not set (remove it if present). This ensures the rdaf CLI starts Node01 as the cluster bootstrap node first when it is run to bring up the MariaDB cluster nodes.

Note

MariaDB Galera cluster node order (i.e., Node01, Node02 & Node03) is determined by the order of the comma-separated IP address list provided to the rdaf setup command, which performs the initial configuration of the RDAF platform.

4. Install & Configure RDAF Log Streaming

RDAF is built on a cloud-native, distributed microservices architecture. When it is deployed, it installs the below services.

  • Infrastructure Services
  • Core Platform Services
  • Application Services
  • Worker Services

All of these services generate log events that reflect the operational health of the RDAF platform in real time.

Because the RDAF platform has many microservices, it is difficult to monitor and analyze all of their logs for operational analysis or troubleshooting when needed.

To address this challenge, RDAF provides the below log streaming services, which stream the logs of all RDAF platform microservices and ingest them into RDAF pstreams in real time.

  • Logstash: A service that processes the incoming log stream from Fluentbit, normalizes the different RDAF service log formats into a common data model, and ingests them into RDAF's Opensearch index store. Additionally, it supports forwarding the processed logs to external log management tools such as Splunk, Elasticsearch, IBM QRadar, etc.
  • Fluentbit: A very lightweight log shipping agent that monitors the RDAF service logs and forwards them to the Logstash service in real time.

Once the RDAF platform services' logs are ingested into pstreams, they can be visualized and analyzed using RDAF's composable dashboards, or accessed in real time using the rdac pstream query and rdac pstream tail CLI options.

The following sections provide instructions on how to install and configure both the Logstash and Fluentbit log streaming services.

4.1 Logstash Installation & Configuration

Important

Pre-requisites:

  • Install Logstash on the VM where the RDAF CLI was installed and rdaf setup was run.
  • Access to /opt/rdaf/rdaf.cfg configuration file
  • Access to /opt/rdaf/config/network_config/config.json configuration file
  • Access to /opt/rdaf/cert/ca/ca.crt certificate file
  • rdac CLI is installed; please refer to RDAC CLI Installation

Tip

To use rdac.py as a regular command, follow the below steps

sudo cp rdac.py /usr/bin/rdac
sudo chmod +x /usr/bin/rdac
rdac --help

Step-1:

Run the below command to create and save the docker login session into CloudFabrix's secure docker repository.

docker login -u='readonly' -p='readonly' cfxregistry.cloudfabrix.io

Run the below sequence of commands to create the required directory structure and set the permissions.

sudo mkdir -p /opt/logstash/config
sudo mkdir -p /opt/logstash/config/cert
sudo mkdir -p /opt/logstash/pipeline
sudo mkdir -p /opt/logstash/templates
sudo mkdir -p /opt/logstash/data
sudo mkdir -p /opt/logstash/logs
sudo chown -R `id -u`:`id -g` /opt/logstash

Step-2:

Copy the CA certificate to Logstash configuration folder

cp /opt/rdaf/cert/ca/ca.crt /opt/logstash/config/cert/ca.crt

Step-3:

Enable the required firewall ports for Logstash to receive the log events from Fluentbit

sudo ufw allow 5045/tcp
sudo ufw allow 5046/tcp
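
If the host uses firewalld instead of ufw (an assumption about your distribution), the equivalent rules would be:

sudo firewall-cmd --permanent --add-port=5045/tcp
sudo firewall-cmd --permanent --add-port=5046/tcp
sudo firewall-cmd --reload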

Step-4:

Create the required RDAF pstreams to ingest the RDAF service logs.

tenant_id=`cat /opt/rdaf/config/network_config/config.json | grep tenant_id | awk '{print $2}' | cut -f2 -d"\""`

rdac pstream add --name rdaf_services_logs --index $tenant_id-stream-rdaf_services_logs --retention_days 15 --timestamp @timestamp

Step-5:

Run the below command to view and verify that the above RDAF pstream was created.

rdac pstream list

Step-6:

Create the docker-compose file as shown below, then install and bring the service up.

cd /opt/logstash

cat > logstash-docker-compose.yml <<'EOF'
version: '3'

services:
  logstash:
    image: "cfxregistry.cloudfabrix.io/rda-platform-logstash:1.0.2"
    container_name: rda_logstash
    hostname: rda_logstash
    network_mode: host
    restart: always
    oom_kill_disable: false
    user: root
    mem_limit: 6G
    memswap_limit: 6G
    logging:
      driver: "json-file"
      options:
        max-size: "25m"
        max-file: "5"
    volumes:
      - /opt/logstash/config:/usr/share/logstash/config
      - /opt/logstash/pipeline:/usr/share/logstash/pipeline
      - /opt/logstash/templates:/usr/share/logstash/templates
      - /opt/logstash/data:/usr/share/logstash/data
      - /opt/logstash/logs:/usr/share/logstash/logs
    environment:
      LS_JAVA_OPTS: -Xmx4g -Xms4g
    command: logstash

EOF
docker-compose -f logstash-docker-compose.yml up -d
sleep 30

Step-7:

Configure the Logstash service and restart it.

tenant_id=`cat /opt/rdaf/config/network_config/config.json | grep tenant_id | awk '{print $2}' | cut -f2 -d"\""`
opensearch_host=`cat /opt/rdaf/rdaf.cfg | grep -A3 "\[opensearch\]" | grep datadir | awk '{print $3}' | cut -f1 -d"/"`
opensearch_user=`cat /opt/rdaf/rdaf.cfg | grep -A3 "\[opensearch\]" | grep user | awk '{print $3}' | base64 -d`
opensearch_password=`cat /opt/rdaf/rdaf.cfg | grep -A3 "\[opensearch\]" | grep password | awk '{print $3}' | base64 -d`

sed -i "s/TENANT_ID/$tenant_id/g" /opt/logstash/pipeline/rda_services.conf
sed -i "s/localhost/$opensearch_host/g" /opt/logstash/pipeline/rda_services.conf
sed -i "s/OS_USERNAME/$opensearch_user/g" /opt/logstash/pipeline/rda_services.conf
sed -i "s/OS_PASSWORD/$opensearch_password/g" /opt/logstash/pipeline/rda_services.conf

sed -i "s/TENANT_ID/$tenant_id/g" /opt/logstash/pipeline/rda_minio.conf
sed -i "s/localhost/$opensearch_host/g" /opt/logstash/pipeline/rda_minio.conf
sed -i "s/OS_USERNAME/$opensearch_user/g" /opt/logstash/pipeline/rda_minio.conf
sed -i "s/OS_PASSWORD/$opensearch_password/g" /opt/logstash/pipeline/rda_minio.conf

logstash_container_id=`docker ps -a | grep rda-platform-logstash | awk '{print $1}'`

docker restart $logstash_container_id
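
After the restart, confirm that the container stays up and that the pipelines load cleanly. The container name rda_logstash comes from the compose file above.

docker ps | grep rda_logstash
docker logs --tail 50 rda_logstash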

4.2 Fluentbit Installation & Configuration

Important

Pre-requisites:

  • The Logstash service is installed; please refer to the above section for installing the Logstash service.
  • Firewall ports 5045 & 5046 are open on the Logstash host.

Install & configure Fluentbit log shipping agent on all of the RDAF infrastructure, platform, application and worker service VMs.

Step-1:

Run the below command to create and save the docker login session into CloudFabrix's secure docker repository.

docker login -u='readonly' -p='readonly' cfxregistry.cloudfabrix.io

Run the below sequence of commands to create the required directory structure and set the permissions.

sudo mkdir -p /opt/fluent-bit/config
sudo mkdir -p /opt/fluent-bit/logs
sudo mkdir -p /opt/fluent-bit/data
sudo chown -R `id -u`:`id -g` /opt/fluent-bit

Step-2:

Create the docker-compose file as shown below, then install and bring the service up.

cd /opt/fluent-bit

cat > fluentbit-docker-compose.yml <<'EOF'
version: "3"
services:
  fluentbit:
    container_name: rda-platform-fluentbit
    image: cfxregistry.cloudfabrix.io/rda-platform-fluent-bit:1.0.2
    restart: always
    network_mode: host
    oom_kill_disable: false
    mem_limit: 4G
    memswap_limit: 4G
    logging:
      driver: "json-file"
      options:
        max-size: "25m"
        max-file: "5"
    volumes:
      - /opt/fluent-bit/config:/fluent-bit/config
      - /opt/fluent-bit/logs:/fluent-bit/logs
      - /opt/fluent-bit/data:/fluent-bit/data
      - /opt/rdaf/logs:/applogs
      - /var/log:/syslog:ro
    entrypoint: ["/fluent-bit/bin/docker-entry-point.sh"]

EOF
docker-compose -f fluentbit-docker-compose.yml up -d
sleep 5

Step-3:

Configure the Fluentbit log shipping agent and restart it.

Set the Logstash IP address in the below variable.

export logstash_ip=<LOGSTASH_IP_ADDRESS>

Warning

Please make sure to set the correct Logstash host's IP address in the above variable before running the below commands.

sed -i "s/localhost/$logstash_ip/g" /opt/fluent-bit/config/fluent-bit-output.conf

fluentbit_container_id=`docker ps -a | grep rda-platform-fluent-bit | awk '{print $1}'`

docker restart $fluentbit_container_id
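
As with Logstash, confirm the agent stays up after the restart (the container name rda-platform-fluentbit comes from the compose file above):

docker ps | grep rda-platform-fluentbit
docker logs --tail 50 rda-platform-fluentbit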

4.3 Enabling Minio service logs

The Minio object storage service does not write server and audit log messages to disk; instead, it provides an option to configure a webhook endpoint to which it pushes the server and audit log events.

Follow the below steps to enable and stream the Minio logs to the Logstash service.

Important

  • Run the below commands on the VM where rdaf setup was run
  • Access to /opt/rdaf/rdaf.cfg
  • mc CLI (Minio Client)

Run the below commands to configure the Minio service to push the server and audit logs to the Logstash service.

  • Configure Minio service access settings using the mc CLI (Minio Client):

minio_host=`cat /opt/rdaf/rdaf.cfg | grep -A3 "\[minio\]" | grep datadir | awk '{print $3}' | cut -f1 -d"/"`
minio_user=`cat /opt/rdaf/rdaf.cfg | grep -A3 "\[minio\]" | grep user | awk '{print $3}' | base64 -d`
minio_password=`cat /opt/rdaf/rdaf.cfg | grep -A4 "\[minio\]" | grep password | awk '{print $3}' | base64 -d`

mc alias set myminio http://$minio_host:9000 $minio_user $minio_password

  • Set the Logstash IP address in the below variable:

export logstash_ip=<LOGSTASH_IP_ADDRESS>

Warning

Please make sure to set the correct Logstash host's IP address in the above variable before running the below commands.

  • Configure the Minio service to forward both server and audit logs to the Logstash service:

mc admin config set myminio/ logger_webhook:"rdaf_log_streaming" endpoint="http://$logstash_ip:5046"
mc admin config set myminio/ audit_webhook:"rdaf_log_streaming" endpoint="http://$logstash_ip:5046"
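
Minio generally requires a service restart for configuration changes to take effect; with the alias defined above, this can be done using the standard mc admin command:

mc admin service restart myminio/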

4.4 Search and Query RDAF service logs from pstreams

RDAF service logs are ingested into the below pstream, which can be queried or tailed using the rdac CLI.

PStream Name:

  • rdaf_services_logs: Indexes the RDAF service logs from the below sources:
    • OS syslog / messages
    • MariaDB
    • NATs
    • Kafka
    • Zookeeper
    • Opensearch
    • Minio
    • HAProxy Supervisor
    • RDA core platform services
    • OIA & AIA application services
    • RDA Portal backend service
    • RDA Portal frontend service
    • RDA HAProxy Access Logs

Below are the extracted/enriched/normalized attributes that can be used to query the logs from the above pstream.

Attribute Name     Attribute Value
-----------------  -----------------------------------------------------------------
service_name       One of: rda_access_manager, rda_alert_ingester, rda_alert_processor,
                   rda_all_alerts_caas, rda_api_server, rda_app_builder,
                   rda_app_controller, rda_asset_dependency, rda_collaboration,
                   rda_collector, rda_config_service, rda_current_alerts_caas,
                   rda_event_consumer, rda_file_browser, rda_haproxy, rda_identity,
                   rda_ingestion_tracker, rda_irm_service, rda_kafka, rda_mariadb,
                   rda_minio, rda_ml_config, rda_nats, rda_notification_service,
                   rda_opensearch, rda_os_syslog, rda_portal_backend,
                   rda_portal_frontend, rda_registry, rda_reports_registry,
                   rda_resource_manager, rda_schedule_admin, rda_scheduler,
                   rda_smtp_server, rda_user_preferences, rda_webhook_server,
                   rda_worker, rda_zookeeper
service_category   One of: rda_app_svcs, rda_pfm_svcs, rda_infra_svcs
service_host       Hostname of the RDAF VM. Ex: rda-platform-vm01
log_severity       One of: INFO, WARNING, ERROR, DEBUG, TRACE
log_message        Extracted log message. Ex: pod age greater than 60 pod_id: 1091635f, pod_type:cfxdimensions-app-access-manager, inactive_pods: {'d0a17813', '1091635f', '766cbf82'}
process_name       RDAF platform or app service's internal process name captured within the log message. Ex: rda_messaging.nats_client
process_function   RDAF platform or app service's internal process function name captured within the log message. Ex: health_check
thread_id          RDAF platform or app service's internal thread ID captured within the log message. Ex: Thread-9
log                Full raw log message. Ex: 2022-09-22 01:47:59,611 [PID=8:TID=Thread-9:rda_messaging.nats_client:health_check:435] INFO - Sef health check successfull
host               RDAF VM host's IP Address. Ex: 192.168.10.11

Tail logs from pstream:

Example:

Run the below command to tail the pstream rdaf_services_logs for the RDAF platform registry service.

rdac pstream tail --name rdaf_services_logs --ts @timestamp --query "service_name = 'rda_registry'" --out_cols 'log'

--query supports CFXQL queries. However, it doesn't support the get columns option.

--out_cols use this option to get specific attributes from the pstream as shown in the above example.

--json use this option to get the log output in JSON format. However, it doesn't support limiting the output to the attributes listed under the --out_cols option.

Run the below command to tail only ERROR messages across all RDAF platform and application services.

rdac pstream tail --name rdaf_services_logs --ts @timestamp --query "log_severity = 'ERROR'" --out_cols 'service_name,log'

Query logs from pstream:

Example:

Run the below command to query the pstream rdaf_services_logs for ERROR messages from all services within the last 24 hours.

rdac pstream query --name rdaf_services_logs --ts @timestamp --query "\`@timestamp\` is after -1d and log_severity = 'ERROR' get service_name,log" --json

4.5 Add RDAF Log Analytics Dashboard to the portal

Log in to the RDAF portal as a tenant admin user.

Go to the Configuration menu and click on Artifacts.

Under the Dashboards section, click on View Details.

Click on the Add YAML button to add a new RDAF Log Analytics dashboard.

Copy and paste the below content into it and click on Save.

{
    "name": "rdaf-platform-log-analytics",
    "label": "RDAF Platform Logs",
    "description": "RDAF Platform service's log analysis dashboard",
    "version": "22.9.22.1",
    "enabled": true,
    "dashboard_style": "tabbed",
    "status_poller": {
        "stream": "rdaf_services_logs",
        "frequency": 15,
        "columns": [
            "@timestamp"
        ],
        "sorting": [
            {
                "@timestamp": "desc"
            }
        ],
        "query": "`@timestamp` is after '${timestamp}'",
        "defaults": {
            "@timestamp": "$UTCNOW"
        },
        "action": "refresh"
    },
    "dashboard_filters": {
        "time_filter": true,
        "columns_filter": [
            {
                "id": "@timestamp",
                "label": "Timestamp",
                "type": "DATETIME"
            },
            {
                "id": "service_name",
                "label": "Service Name",
                "type": "TEXT"
            },
            {
                "id": "service_category",
                "label": "Service Category",
                "type": "TEXT"
            },
            {
                "id": "log_severity",
                "label": "Log Severity",
                "type": "TEXT"
            },
            {
                "id": "log",
                "label": "Log Message",
                "type": "TEXT"
            },
            {
                "id": "process_name",
                "label": "Process Name",
                "type": "TEXT"
            },
            {
                "id": "process_function",
                "label": "Process Function",
                "type": "TEXT"
            },
            {
                "id": "thread_id",
                "label": "Thread ID",
                "type": "TEXT"
            },
            {
                "id": "service_host",
                "label": "Hostname",
                "type": "TEXT"
            },
            {
                "id": "host",
                "label": "IP Address",
                "type": "TEXT"
            }
        ],
        "group_filters": [
            {
                "stream": "rdaf_services_logs",
                "title": "Log Severity",
                "group_by": [
                    "log_severity"
                ],
                "ts_column": "@timestamp",
                "agg": "value_count",
                "column": "_id",
                "type": "int"
            },
            {
                "stream": "rdaf_services_logs",
                "title": "Service Name",
                "group_by": [
                    "service_name"
                ],
                "ts_column": "@timestamp",
                "agg": "value_count",
                "column": "_id",
                "type": "int"
            },
            {
                "stream": "rdaf_services_logs",
                "title": "Service Category",
                "group_by": [
                    "service_category"
                ],
                "ts_column": "@timestamp",
                "agg": "value_count",
                "column": "_id",
                "type": "int"
            },
            {
                "stream": "rdaf_services_logs",
                "title": "RDA Hostname",
                "group_by": [
                    "service_host"
                ],
                "ts_column": "@timestamp",
                "agg": "value_count",
                "column": "_id",
                "type": "int"
            },
            {
                "stream": "rdaf_services_logs",
                "title": "RDA Host IPAddress",
                "group_by": [
                    "service_host"
                ],
                "ts_column": "@timestamp",
                "agg": "value_count",
                "column": "_id",
                "type": "int"
            }
        ]
    },
    "dashboard_sections": [
        {
            "title": "Overall Summary",
            "show_filter": true,
            "widgets": [
                {
                    "title": "Log Severity Trend",
                    "widget_type": "timeseries",
                    "stream": "rdaf_services_logs",
                    "ts_column": "@timestamp",
                    "max_width": 12,
                    "height": 3,
                    "min_width": 12,
                    "chartProperties": {
                        "yAxisLabel": "Count",
                        "xAxisLabel": null,
                        "legendLocation": "bottom"
                    },
                    "interval": "15Min",
                    "group_by": [
                        "log_severity"
                    ],
                    "series_spec": [
                        {
                            "column": "log_severity",
                            "agg": "value_count",
                            "type": "int"
                        }
                    ],
                    "widget_id": "06413884"
                },
                {
                    "widget_type": "label",
                    "label": "<h3><font style=\"color: #ffffff;\"><table border=0> <tr><td width=\"50%\" align=\"left\" rowspan=1><b>TOTAL Logs:</b></td><td width=\"50%\" align=\"right\" rowspan=1>{{ \"{:,}\".format(TOTAL | int) }}</td></tr> <tr><td height=\"20px\" colspan=1></td></tr> <tr><td width=\"50%\" align=\"left\" rowspan=1><b>INFO Logs:</b></td><td width=\"50%\" align=\"right\" rowspan=1>{{ \"{:,}\".format(INFO | int) }}</td></tr> <tr><td height=\"20px\" colspan=1></td></tr> <tr><td width=\"50%\" align=\"left\" rowspan=1><b>WARN Logs:</b></td><td width=\"50%\" align=\"right\" rowspan=1>{{ \"{:,}\".format(WARNING | int) }}</td></tr> <tr><td height=\"20px\" colspan=1></td></tr> <tr><td width=\"50%\" align=\"left\" rowspan=1><b>ERROR Logs:</b></td><td width=\"50%\" align=\"right\" rowspan=1>{{ \"{:,}\".format(ERROR | int) }}</td></tr> <tr><td height=\"20px\" colspan=1></td></tr> </table></font></h3>",
                    "min_width": 3,
                    "max_width": 4,
                    "height": 4,
                    "style": {
                        "backgroundColor": "#1976d2",
                        "color": "#ffffff"
                    },
                    "segments": [
                        {
                            "variable": "TOTAL",
                            "agg": "value_count",
                            "type": "int",
                            "stream": "rdaf_services_logs",
                            "ts_column": "@timestamp",
                            "extra_filter": "",
                            "column": "service_category.keyword"
                        },
                        {
                            "variable": "INFO",
                            "agg": "value_count",
                            "type": "int",
                            "stream": "rdaf_services_logs",
                            "ts_column": "@timestamp",
                            "extra_filter": "log_severity is 'INFO'",
                            "column": "log_severity.keyword"
                        },
                        {
                            "variable": "WARNING",
                            "agg": "value_count",
                            "type": "int",
                            "stream": "rdaf_services_logs",
                            "ts_column": "@timestamp",
                            "extra_filter": "log_severity is 'WARNING'",
                            "column": "log_severity.keyword"
                        },
                        {
                            "variable": "ERROR",
                            "agg": "value_count",
                            "type": "int",
                            "stream": "rdaf_services_logs",
                            "ts_column": "@timestamp",
                            "extra_filter": "log_severity is 'ERROR'",
                            "column": "log_severity.keyword"
                        },
                        {
                            "variable": "DEBUG",
                            "agg": "value_count",
                            "type": "int",
                            "stream": "rdaf_services_logs",
                            "ts_column": "@timestamp",
                            "extra_filter": "log_severity = 'DEBUG'",
                            "column": "log_severity.keyword"
                        }
                    ],
                    "widget_id": "5ae002f1"
                },
                {
                    "widget_type": "pie_chart",
                    "title": "Logs by Severity",
                    "stream": "rdaf_services_logs",
                    "ts_column": "@timestamp",
                    "column": "_id",
                    "agg": "value_count",
                    "group_by": [
                        "log_severity"
                    ],
                    "type": "str",
                    "style": {
                        "color-map": {
                            "ERROR": [
                                "#ef5350",
                                "#ffffff"
                            ],
                            "WARNING": [
                                "#FFA726",
                                "#ffffff"
                            ],
                            "INFO": [
                                "#388e3c",
                                "#ffffff"
                            ],
                            "DEBUG": [
                                "#000000",
                                "#ffffff"
                            ],
                            "UNKNOWN": [
                                "#bcaaa4",
                                "#ffffff"
                            ]
                        }
                    },
                    "min_width": 4,
                    "height": 4,
                    "max_width": 4,
                    "widget_id": "b2ffa8e9"
                },
                {
                    "widget_type": "pie_chart",
                    "title": "Logs by RDA Host",
                    "stream": "rdaf_services_logs",
                    "ts_column": "@timestamp",
                    "column": "_id",
                    "agg": "value_count",
                    "group_by": [
                        "service_host"
                    ],
                    "type": "str",
                    "min_width": 4,
                    "height": 2,
                    "max_width": 4,
                    "widget_id": "79355cb8"
                },
                {
                    "widget_type": "pie_chart",
                    "title": "Logs by RDA Host IP",
                    "stream": "rdaf_services_logs",
                    "ts_column": "@timestamp",
                    "column": "_id",
                    "agg": "value_count",
                    "group_by": [
                        "host"
                    ],
                    "type": "str",
                    "min_width": 4,
                    "height": 4,
                    "max_width": 4,
                    "widget_id": "a4f2d8bd"
                },
                {
                    "widget_type": "pie_chart",
                    "title": "Logs by Service Category",
                    "stream": "rdaf_services_logs",
                    "ts_column": "@timestamp",
                    "column": "_id",
                    "agg": "value_count",
                    "group_by": [
                        "service_category"
                    ],
                    "type": "str",
                    "min_width": 4,
                    "height": 4,
                    "max_width": 4,
                    "widget_id": "89ac5ce9"
                },
                {
                    "widget_type": "pie_chart",
                    "title": "Logs by Service Name",
                    "stream": "rdaf_services_logs",
                    "ts_column": "@timestamp",
                    "column": "_id",
                    "agg": "value_count",
                    "group_by": [
                        "service_name"
                    ],
                    "type": "str",
                    "min_width": 4,
                    "height": 4,
                    "max_width": 4,
                    "widget_id": "4b267fce"
                }
            ]
        },
        {
            "title": "App Services",
            "show_filter": true,
            "widgets": [
                {
                    "widget_type": "tabular",
                    "title": "Log Messages",
                    "stream": "rdaf_services_logs",
                    "extra_filter": "service_category in ['rda_app_svcs', 'rda_pfm_svcs']",
                    "ts_column": "@timestamp",
                    "sorting": [
                        {
                            "@timestamp": "desc"
                        }
                    ],
                    "columns": {
                        "@timestamp": {
                            "title": "Timestamp",
                            "type": "DATETIME"
                        },
                        "state_color2": {
                            "type": "COLOR-MAP",
                            "source-column": "log_severity",
                            "color-map": {
                                "INFO": "#388e3c",
                                "ERROR": "#ef5350",
                                "WARNING": "#ffa726",
                                "DEBUG": "#000000"
                            }
                        },
                        "log_severity": {
                            "title": "Severity",
                            "htmlTemplateForRow": "<span class='badge' style='background-color: {{ row.state_color2 }}' > {{ row.log_severity }} </span>"
                        },
                        "service_name": "Service Name",
                        "process_name": "Process Name",
                        "process_function": "Process Function",
                        "log": "Message"
                    },
                    "widget_id": "6895c8f0"
                }
            ]
        },
        {
            "title": "Infra Services",
            "show_filter": true,
            "widgets": [
                {
                    "widget_type": "tabular",
                    "title": "Log Messages",
                    "stream": "rdaf_services_logs",
                    "extra_filter": "service_category in ['rda_infra_svcs']",
                    "ts_column": "@timestamp",
                    "sorting": [
                        {
                            "@timestamp": "desc"
                        }
                    ],
                    "columns": {
                        "@timestamp": {
                            "title": "Timestamp",
                            "type": "DATETIME"
                        },
                        "log_severity": {
                            "title": "Severity",
                            "htmlTemplateForRow": "<span class='badge' style='background-color: {{ row.state_color2 }}' > {{ row.log_severity }} </span>"
                        },
                        "state_color2": {
                            "type": "COLOR-MAP",
                            "source-column": "log_severity",
                            "color-map": {
                                "INFO": "#388e3c",
                                "ERROR": "#ef5350",
                                "WARNING": "#ffa726",
                                "DEBUG": "#000000",
                                "UNKNOWN": "#bcaaa4"
                            }
                        },
                        "service_name": "Service Name",
                        "process_name": "Process Name",
                        "log": "Message",
                        "minio_object": "Minio Object"
                    },
                    "widget_id": "98f10587"
                }
            ]
        }
    ],
    "saved_time": "2022-09-30T06:34:19.205249"
}

Click on the added dashboard rdaf-platform-log-analytics to visualize the logs.


4.6 Uninstalling Logstash and Fluentbit

Follow the below steps to uninstall both the Logstash and Fluentbit services.

Uninstalling the Logstash service:

  • Log in as rdauser (using an SSH client) to the RDAF host on which the Logstash service was installed

  • Stop the Logstash service and remove the container

docker rm -f $(docker ps -a | grep rda-platform-logstash | awk '{print $1}')

  • Remove the Logstash docker image

docker rmi $(docker images | grep rda-platform-logstash | awk '{print $3}')

  • Remove the Logstash service configuration

Danger

The below steps will remove all of the existing Logstash configuration data.

rm -rf /opt/logstash/*

Uninstalling the Fluentbit service:

  • Log in as rdauser (using an SSH client) to the RDAF host on which the Fluentbit service was installed

  • Stop the Fluentbit service and remove the container

docker rm -f $(docker ps -a | grep rda-platform-fluent-bit | awk '{print $1}')

  • Remove the Fluentbit docker image

docker rmi $(docker images | grep rda-platform-fluent-bit | awk '{print $3}')

  • Remove the Fluentbit service configuration

Danger

The below steps will remove all of the existing Fluentbit configuration data.

rm -rf /opt/fluent-bit/*