
Installing OIA (Operations Intelligence & Analytics)

This document provides instructions for a fresh installation of OIA (Operations Intelligence & Analytics), which is also referred to as AIOps.

1. Setup & Install

cfxOIA is an application that is installed on top of the RDA Fabric platform.

1.1 Tag Version: 7.0.6

Prerequisites:

The following prerequisites must be in place before installing the OIA (AIOps) application services.

RDAF Deployment CLI Version: 1.1.2

RDAF Infrastructure Services Tag Version: 1.0.2

RDAF Core Platform & Worker Services Tag Version: 3.1.0

RDAF Client (RDAC) Tag Version: 3.1.0

Warning

Please complete all of the above prerequisites before installing the OIA (AIOps) application services.

Log in as the rdauser user to the on-premise docker registry VM or the RDA Fabric Platform VM on which the RDAF deployment CLI was installed, using an SSH client (e.g., PuTTY).

Before installing the OIA (AIOps) application services, please run the below command to update the HAProxy (load balancer) configuration.

rdaf app update-config OIA
rdafk8s app update-config OIA

Run the below rdaf or rdafk8s command to make sure all of the RDAF infrastructure services are up and running.

rdaf infra status
rdafk8s infra status

Run the below rdac pods command to make sure all of the RDAF core platform and worker services are up and running.

rdac pods
+-------+----------------------------------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Host           | ID       | Site        | Age      |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------|
| App   | asset-dependency                       | rda-asset-depe | 090669bf |             | 20:18:21 |      8 |        47.03 |               |              |
| App   | authenticator                          | rda-identity-5 | 57905b20 |             | 20:19:11 |      8 |        47.03 |               |              |
| App   | cfxdimensions-app-access-manager       | rda-access-man | 6338ad29 |             | 20:18:44 |      8 |        47.03 |               |              |
| App   | cfxdimensions-app-notification-service | rda-notificati | bb9e3e7b |             | 20:09:52 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-resource-manager     | rda-resource-m | e5a28e16 |             | 20:18:34 |      8 |        47.03 |               |              |
| App   | user-preferences                       | rda-user-prefe | fd09d3ba |             | 20:18:08 |      8 |        47.03 |               |              |
| Infra | api-server                             | rda-api-server | b1b910d9 |             | 20:19:22 |      8 |        47.03 |               |              |
| Infra | collector                              | rda-collector- | 99553e51 |             | 20:18:17 |      8 |        47.03 |               |              |
| Infra | registry                               | rda-registry-7 | a46cd712 |             | 20:19:15 |      8 |        47.03 |               |              |
| Infra | scheduler                              | rda-scheduler- | d5537051 | *leader*    | 20:18:26 |      8 |        47.03 |               |              |
| Infra | worker                                 | rda-worker-54d | 1f769792 | rda-site-01 | 20:06:48 |      4 |        15.6  | 0             | 0            |
+-------+----------------------------------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+

Run the below rdac healthcheck command to check the health status of all of the RDAF core platform and worker services.

All of the dependency checks should show as ok under the Status column.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                               |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------|
| rda_infra | api-server                             | rda-api-serv | b1b910d9 |             | service-status                                      | ok       |                                                       |
| rda_infra | api-server                             | rda-api-serv | b1b910d9 |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | asset-dependency                       | rda-asset-de | 090669bf |             | service-status                                      | ok       |                                                       |
| rda_app   | asset-dependency                       | rda-asset-de | 090669bf |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | authenticator                          | rda-identity | 57905b20 |             | service-status                                      | ok       |                                                       |
| rda_app   | authenticator                          | rda-identity | 57905b20 |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | authenticator                          | rda-identity | 57905b20 |             | DB-connectivity                                     | ok       |                                                       |
| rda_app   | cfxdimensions-app-access-manager       | rda-access-m | 6338ad29 |             | service-status                                      | ok       |                                                       |
| rda_app   | cfxdimensions-app-access-manager       | rda-access-m | 6338ad29 |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | cfxdimensions-app-access-manager       | rda-access-m | 6338ad29 |             | service-dependency:registry                         | ok       | 1 pod(s) found for registry                           |
| rda_app   | cfxdimensions-app-access-manager       | rda-access-m | 6338ad29 |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | cfxdimensions-app-access-manager       | rda-access-m | 6338ad29 |             | DB-connectivity                                     | ok       |                                                       |
| rda_app   | cfxdimensions-app-notification-service | rda-notifica | bb9e3e7b |             | service-status                                      | ok       |                                                       |
| rda_app   | cfxdimensions-app-notification-service | rda-notifica | bb9e3e7b |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | cfxdimensions-app-notification-service | rda-notifica | bb9e3e7b |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | cfxdimensions-app-notification-service | rda-notifica | bb9e3e7b |             | DB-connectivity                                     | ok       |                                                       |
| rda_app   | cfxdimensions-app-resource-manager     | rda-resource | e5a28e16 |             | service-status                                      | ok       |                                                       |
| rda_app   | cfxdimensions-app-resource-manager     | rda-resource | e5a28e16 |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | cfxdimensions-app-resource-manager     | rda-resource | e5a28e16 |             | service-dependency:registry                         | ok       | 1 pod(s) found for registry                           |
| rda_app   | cfxdimensions-app-resource-manager     | rda-resource | e5a28e16 |             | service-dependency:cfxdimensions-app-access-manager | ok       | 1 pod(s) found for cfxdimensions-app-access-manager   |
| rda_app   | cfxdimensions-app-resource-manager     | rda-resource | e5a28e16 |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | cfxdimensions-app-resource-manager     | rda-resource | e5a28e16 |             | DB-connectivity                                     | ok       |                                                       |
| rda_infra | collector                              | rda-collecto | 99553e51 |             | service-status                                      | ok       |                                                       |
| rda_infra | collector                              | rda-collecto | 99553e51 |             | minio-connectivity                                  | ok       |                                                       |
| rda_infra | collector                              | rda-collecto | 99553e51 |             | opensearch-connectivity:default                     | ok       |                                                       |
| rda_infra | registry                               | rda-registry | a46cd712 |             | service-status                                      | ok       |                                                       |
| rda_infra | registry                               | rda-registry | a46cd712 |             | minio-connectivity                                  | ok       |                                                       |
| rda_infra | scheduler                              | rda-schedule | d5537051 |             | service-status                                      | ok       |                                                       |
| rda_infra | scheduler                              | rda-schedule | d5537051 |             | minio-connectivity                                  | ok       |                                                       |
| rda_infra | scheduler                              | rda-schedule | d5537051 |             | DB-connectivity                                     | ok       |                                                       |
| rda_app   | user-preferences                       | rda-user-pre | fd09d3ba |             | service-status                                      | ok       |                                                       |
| rda_app   | user-preferences                       | rda-user-pre | fd09d3ba |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | user-preferences                       | rda-user-pre | fd09d3ba |             | service-dependency:registry                         | ok       | 1 pod(s) found for registry                           |
| rda_app   | user-preferences                       | rda-user-pre | fd09d3ba |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | user-preferences                       | rda-user-pre | fd09d3ba |             | DB-connectivity                                     | ok       |                                                       |
| rda_infra | worker                                 | rda-worker-5 | 1f769792 | rda-site-01 | service-status                                      | ok       |                                                       |
| rda_infra | worker                                 | rda-worker-5 | 1f769792 | rda-site-01 | minio-connectivity                                  | ok       |                                                       |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
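If the healthcheck output is long, a quick way to spot problems is to filter out the healthy rows. This is a minimal sketch, assuming the Status column prints the literal value ok for healthy checks as shown above:

# prints only the table header/separators and any rows whose Status is not ok
rdac healthcheck | grep -v "| ok "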

Installing OIA (AIOps) Application Services:

Run the below command to deploy the RDAF OIA (AIOps) application services. (Note: The tag shown below is a sample for reference only; for the actual tag, please contact the CloudFabrix support team at support@cloudfabrix.com.)

rdaf app install OIA --tag 7.0.6
rdafk8s app install OIA --tag 7.0.6
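On Kubernetes (rdafk8s) deployments, you can optionally watch the application pods come up while the install runs. This is a generic kubectl sketch, assuming the OIA application pods carry the app_category=rdaf-application label used later in this document:

# watch OIA application pods until they reach Running state (Ctrl+C to exit)
kubectl get pods -n rda-fabric -l app_category=rdaf-application -w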

After installing the OIA (AIOps) application services, run the below command to see the running status of the deployed application services.

rdaf app status
+---------------------------------+----------------+-----------------+--------------+-------+
| Name                            | Host           | Status          | Container Id | Tag   |
+---------------------------------+----------------+-----------------+--------------+-------+
| rda-alert-ingester              | 192.168.125.46 | Up 20 Hours ago | 610bb0e286d6 | 7.0.6 |
| rda-alert-processor             | 192.168.125.46 | Up 20 Hours ago | 79ee6788f73e | 7.0.6 |
| rda-app-controller              | 192.168.125.46 | Up 20 Hours ago | 6c672102d5ff | 7.0.6 |
| rda-collaboration               | 192.168.125.46 | Up 20 Hours ago | 34f25c05afce | 7.0.6 |
| rda-configuration-service       | 192.168.125.46 | Up 20 Hours ago | 112ccaf4b0e6 | 7.0.6 |
| rda-dataset-caas-all-alerts     | 192.168.125.46 | Up 20 Hours ago | 2b48d4dfbfd0 | 7.0.6 |
| rda-dataset-caas-current-alerts | 192.168.125.46 | Up 20 Hours ago | 03cdc77ddf1f | 7.0.6 |
| rda-event-consumer              | 192.168.125.46 | Up 20 Hours ago | 21113ba951a1 | 7.0.6 |
| rda-file-browser                | 192.168.125.46 | Up 20 Hours ago | 425dac228fc9 | 7.0.6 |
| rda-ingestion-tracker           | 192.168.125.46 | Up 20 Hours ago | 8a984a536a97 | 7.0.6 |
| rda-irm-service                 | 192.168.125.46 | Up 20 Hours ago | 258aadc0c1af | 7.0.6 |
| rda-ml-config                   | 192.168.125.46 | Up 20 Hours ago | bf23d58903f7 | 7.0.6 |
| rda-notification-service        | 192.168.125.46 | Up 20 Hours ago | a15c5232b25d | 7.0.6 |
| rda-reports-registry            | 192.168.125.46 | Up 20 Hours ago | 3890b5dfb8ae | 7.0.6 |
| rda-smtp-server                 | 192.168.125.46 | Up 20 Hours ago | 6aadab781947 | 7.0.6 |
| rda-webhook-server              | 192.168.125.46 | Up 20 Hours ago | 6bf555aed18b | 7.0.6 |
+---------------------------------+----------------+-----------------+--------------+-------+
rdafk8s app status
+---------------------------------+----------------+-----------------+--------------+-------+
| Name                            | Host           | Status          | Container Id | Tag   |
+---------------------------------+----------------+-----------------+--------------+-------+
| rda-alert-ingester              | 192.168.125.46 | Up 20 Hours ago | 610bb0e286d6 | 7.0.6 |
| rda-alert-processor             | 192.168.125.46 | Up 20 Hours ago | 79ee6788f73e | 7.0.6 |
| rda-app-controller              | 192.168.125.46 | Up 20 Hours ago | 6c672102d5ff | 7.0.6 |
| rda-collaboration               | 192.168.125.46 | Up 20 Hours ago | 34f25c05afce | 7.0.6 |
| rda-configuration-service       | 192.168.125.46 | Up 20 Hours ago | 112ccaf4b0e6 | 7.0.6 |
| rda-dataset-caas-all-alerts     | 192.168.125.46 | Up 20 Hours ago | 2b48d4dfbfd0 | 7.0.6 |
| rda-dataset-caas-current-alerts | 192.168.125.46 | Up 20 Hours ago | 03cdc77ddf1f | 7.0.6 |
| rda-event-consumer              | 192.168.125.46 | Up 20 Hours ago | 21113ba951a1 | 7.0.6 |
| rda-file-browser                | 192.168.125.46 | Up 20 Hours ago | 425dac228fc9 | 7.0.6 |
| rda-ingestion-tracker           | 192.168.125.46 | Up 20 Hours ago | 8a984a536a97 | 7.0.6 |
| rda-irm-service                 | 192.168.125.46 | Up 20 Hours ago | 258aadc0c1af | 7.0.6 |
| rda-ml-config                   | 192.168.125.46 | Up 20 Hours ago | bf23d58903f7 | 7.0.6 |
| rda-notification-service        | 192.168.125.46 | Up 20 Hours ago | a15c5232b25d | 7.0.6 |
| rda-reports-registry            | 192.168.125.46 | Up 20 Hours ago | 3890b5dfb8ae | 7.0.6 |
| rda-smtp-server                 | 192.168.125.46 | Up 20 Hours ago | 6aadab781947 | 7.0.6 |
| rda-webhook-server              | 192.168.125.46 | Up 20 Hours ago | 6bf555aed18b | 7.0.6 |
+---------------------------------+----------------+-----------------+--------------+-------+

Configuring OIA (AIOps) Application:

Log in to the RDAF portal as the admin@cfx.com user.

Create a new Service Blueprint for OIA (AIOps) application and Machine Learning (ML) application.

For OIA (AIOps) Application: Go to Main Menu --> Configuration --> Artifacts --> Service Blueprints --> View details, click on Add, copy & paste the below configuration, and click on Save.

name: cfxOIA
id: 81a1a2202
version: 2023_02_12_01
category: ITOM
comment: Operations Intelligence & Analytics (AIOps)
enabled: true
type: Service
provider: CloudFabrix Software, Inc.
attrs: {}
apps:
    -   label: cfxOIA
        appType: dimensions
        appName: incident-room-manager
        icon_url: /assets/img/applications/OIA.png
        permission: app:irm:read
service_pipelines: []

For Machine Learning (ML) Application: Go to Main Menu --> Configuration --> Artifacts --> Service Blueprints --> View details, click on Add, copy & paste the below configuration, and click on Save.

name: cfxML
id: 81a1a030
version: 2023_02_12_01
category: ITOM
comment: Machine Learning (ML) Experiments
enabled: true
type: Service
provider: CloudFabrix Software, Inc.
attrs: {}
apps:
    -   label: cfxML
        appType: dimensions
        appName: ml-config
        icon_url: /assets/img/applications/ML.png
        permission: app:irm:read
service_pipelines: []


2. Upgrade

This section provides instructions on how to upgrade an existing deployment of the RDAF platform and its OIA (Operations Intelligence & Analytics) application, which is also referred to as AIOps.

2.1 Upgrade from 7.0.x to 7.0.6

Upgrade Prerequisites

The following prerequisites must be in place before upgrading the OIA (AIOps) application services.

RDAF Deployment CLI Version Upgrade: From 1.0.6 or higher to 1.1.2

RDAF Infrastructure Services Tag Version: From 1.0.1 or higher to 1.0.2 (Note: Not applicable if the services are already running at 1.0.2 version)

RDAF Core Platform & Worker Services Tag Version: From 3.0.9 to 3.1.0

RDAF Client (RDAC) Tag Version: From 3.0.9 to 3.1.0

Warning

Please complete all of the above prerequisites before upgrading the OIA (AIOps) application services.

On-premise docker-registry

Log in to the RDAF on-premise docker-registry VM or the RDAF platform VM (on which the rdaf CLI was installed) as rdauser using an SSH client, and run the below command to verify the status of the docker-registry service.

rdaf status
+-----------------+---------------+------------+--------------+-------+
| Name            | Host          | Status     | Container Id | Tag   |
+-----------------+---------------+------------+--------------+-------+
| docker-registry | 111.92.12.140 | Up 4 weeks | 71b8036fc64f | 1.0.1 |
+-----------------+---------------+------------+--------------+-------+

RDAF Infrastructure, Platform and Application services:

Log in to the RDAF on-premise docker-registry VM or the RDAF platform VM (on which the rdaf CLI was installed) as rdauser using an SSH client, and run the below commands to verify the status of the RDAF platform's infrastructure, core platform, application, and worker services.

rdafk8s infra status
+----------------+--------------+-----------------+--------------+------------------------------+
| Name           | Host         | Status          | Container Id | Tag                          |
+----------------+--------------+-----------------+--------------+------------------------------+
| haproxy        | 111.92.12.41 | Up 6 days       | 245a37201207 | 1.0.2                        |
| keepalived     | 111.92.12.41 | Not Provisioned | N/A          | N/A                          |
| nats           | 111.92.12.41 | Up 6 days       | 15469a93d96f | 1.0.2                        |
| minio          | 111.92.12.41 | Up 6 days       | 3fd3f97bf25b | RELEASE.2022-11-07T23-47-39Z |
| mariadb        | 111.92.12.41 | Up 6 days       | 0fa1a0027993 | 1.0.2                        |
| opensearch     | 111.92.12.41 | Up 6 days       | dae308716400 | 1.0.2                        |
| zookeeper      | 111.92.12.41 | Up 6 days       | 4d8f61b4ab17 | 1.0.2                        |
| kafka          | 111.92.12.41 | Up 6 days       | 0dee08cd9c59 | 1.0.2                        |
| redis          | 111.92.12.41 | Up 6 days       | d1eccf90846e | 1.0.2                        |
| redis-sentinel | 111.92.12.41 | Up 6 days       | 683beb7b913e | 1.0.2                        |
+----------------+--------------+-----------------+--------------+------------------------------+
rdafk8s platform status
+--------------------------+--------------+-----------+--------------+-------+
| Name                     | Host         | Status    | Container Id | Tag   |
+--------------------------+--------------+-----------+--------------+-------+
| cfx-rda-access-manager   | 111.92.12.41 | Up 6 days | e487cdf24b46 | 3.0.9 |
| cfx-rda-resource-manager | 111.92.12.41 | Up 6 days | a7a21a31a26e | 3.0.9 |
| cfx-rda-user-preferences | 111.92.12.41 | Up 6 days | 9306d8da4b5a | 3.0.9 |
| portal-backend           | 111.92.12.41 | Up 6 days | 55df761dad1d | 3.0.9 |
| portal-frontend          | 111.92.12.41 | Up 6 days | 2183f00efa64 | 3.0.9 |
| rda_api_server           | 111.92.12.41 | Up 6 days | 3ba6256d1694 | 3.0.9 |
| rda_asset_dependency     | 111.92.12.41 | Up 6 days | d1a8b76bb114 | 3.0.9 |
| rda_collector            | 111.92.12.41 | Up 6 days | 441427d2bb1e | 3.0.9 |
| rda_identity             | 111.92.12.41 | Up 6 days | 2c1215d9155a | 3.0.9 |
| rda_registry             | 111.92.12.41 | Up 6 days | 7358e6ee6298 | 3.0.9 |
| rda_scheduler            | 111.92.12.41 | Up 6 days | ee72c66f8c80 | 3.0.9 |
+--------------------------+--------------+-----------+--------------+-------+
rdafk8s worker status
+------------+--------------+-----------+--------------+-------+
| Name       | Host         | Status    | Container Id | Tag   |
+------------+--------------+-----------+--------------+-------+
| rda_worker | 111.92.12.43 | Up 6 days | 88f4916ce18e | 3.0.9 |
| rda_worker | 111.92.12.43 | Up 6 days | 88f491612345 | 3.0.9 |
+------------+--------------+-----------+--------------+-------+
rdafk8s app status
+------------------------------+--------------+-----------+--------------+-------+
| Name                         | Host         | Status    | Container Id | Tag   |
+------------------------------+--------------+-----------+--------------+-------+
| all-alerts-cfx-rda-dataset-  | 111.92.12.42 | Up 6 days | 58a75c01c51f | 7.0.5 |
| caas                         |              |           |              |       |
| cfx-rda-alert-ingester       | 111.92.12.42 | Up 6 days | bc9a78953b73 | 7.0.5 |
| cfx-rda-alert-processor      | 111.92.12.42 | Up 6 days | 28401e5c2570 | 7.0.5 |
| cfx-rda-app-builder          | 111.92.12.42 | Up 6 days | be8f100056fd | 7.0.5 |
| cfx-rda-app-controller       | 111.92.12.42 | Up 6 days | a7a4ef35097d | 7.0.5 |
| cfx-rda-collaboration        | 111.92.12.42 | Up 6 days | d9d980b28a2b | 7.0.5 |
| cfx-rda-configuration-       | 111.92.12.42 | Up 6 days | db1a45835e1a | 7.0.5 |
| service                      |              |           |              |       |
| cfx-rda-event-consumer       | 111.92.12.42 | Up 6 days | baf09bad3ce1 | 7.0.5 |
| cfx-rda-file-browser         | 111.92.12.42 | Up 6 days | 32ccdfca8d8f | 7.0.5 |
| cfx-rda-ingestion-tracker    | 111.92.12.42 | Up 6 days | 1030345f2179 | 7.0.5 |
| cfx-rda-irm-service          | 111.92.12.42 | Up 6 days | 89d931f7d7b8 | 7.0.5 |
| cfx-rda-ml-config            | 111.92.12.42 | Up 6 days | 57fc39489a08 | 7.0.5 |
| cfx-rda-notification-service | 111.92.12.42 | Up 6 days | 408dbebb33c5 | 7.0.5 |
| cfx-rda-reports-registry     | 111.92.12.42 | Up 6 days | 3296cba8b3e4 | 7.0.5 |
| cfx-rda-smtp-server          | 111.92.12.42 | Up 6 days | 0f9884b6e7c8 | 7.0.5 |
| cfx-rda-webhook-server       | 111.92.12.42 | Up 6 days | a4403dee414e | 7.0.5 |
| current-alerts-cfx-rda-      | 111.92.12.42 | Up 6 days | d6cc63214103 | 7.0.5 |
| dataset-caas                 |              |           |              |       |
+------------------------------+--------------+-----------+--------------+-------+

Important

Please take a full data backup of the RDAF platform before performing an upgrade. For more information on the RDAF platform's backup and restore commands using the rdaf CLI, please refer to RDAF Platform Backup.

Download RDAF Platform & OIA Images

  • Log in to the on-premise docker registry VM as rdauser using an SSH client and run the below command to download the updated images for the RDAF platform and OIA (AIOps) application services.
rdaf registry fetch --tag 1.0.2,3.1.0,7.0.6
  • Please wait until all of the RDAF platform and OIA (AIOps) application service images are downloaded. Run the below command to verify that the images were downloaded appropriately.
rdaf registry list-tags
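As a quick sanity check, the output can be filtered for the expected tags. A minimal sketch, assuming rdaf registry list-tags prints the tag names in its output:

rdaf registry list-tags | grep -E "1\.0\.2|3\.1\.0|7\.0\.6"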

Upgrade RDAF deployment CLI on RDAF Platform VM

To download and upgrade the rdaf deployment CLI on the RDAF platform VM, repeat the steps outlined under the RDAF CLI Upgrade on On-premise docker registry VM section.

Upgrade RDAF Platform & OIA Services

RDAF Platform Services Upgrade:

Run the below command to upgrade the RDAF platform's services to version 3.1.0.

rdafk8s platform upgrade --tag 3.1.0

Once the above command has completed, run the below command to verify that all of the RDAF platform's services are upgraded to the specified version and all of their corresponding containers are in a running state.

kubectl get pods -n rda-fabric -l app_category=rdaf-platform
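Optionally, instead of polling, kubectl can block until the platform pods report Ready. This is a standard kubectl sketch; the 5-minute timeout is an assumption and can be adjusted for your environment:

kubectl wait --for=condition=Ready pod -n rda-fabric -l app_category=rdaf-platform --timeout=300s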

RDAF Client CLI Upgrade:

Run the below command to upgrade the RDAF client CLI rdac to latest version.

rdafk8s rdac_cli upgrade --tag 3.1.0

After the rdac CLI is upgraded, run the below command to see all of the running RDAF platform service pods.

rdac pods 
+-------+----------------------------------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Host         | ID       | Site        | Age             |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------|
| App   | cfxdimensions-app-access-manager       | d412efb99f2e | ccb83d20 |             | 4:13:45         |      8 |        31.21 |               |              |
| App   | cfxdimensions-app-notification-service | 34c2ea6675d5 | 93ac81de |             | 1 day, 18:33:27 |      8 |        31.21 |               |              |
| App   | cfxdimensions-app-resource-manager     | ec87d2ee6387 | 33ee28ca |             | 4:13:31         |      8 |        31.21 |               |              |
| App   | user-preferences                       | 520bca813ddf | f4ca7d44 |             | 4:13:14         |      8 |        31.21 |               |              |
| Infra | api-server                             | 0656b4230f44 | 6d4d40ab |             | 0:33:06         |      8 |        31.21 |               |              |
| Infra | collector                              | 6336341682ad | 042af0af |             | 4:11:19         |      8 |        31.21 |               |              |
| Infra | registry                               | cae649622fba | 4e4c4a4d |             | 4:11:03         |      8 |        31.21 |               |              |
| Infra | scheduler                              | 3ab379305be1 | b2bb9915 | *leader*    | 4:10:59         |      8 |        31.21 |               |              |
+-------+----------------------------------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+

Run the below command to verify the functional health of each platform service and confirm that all of their statuses are ok.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                               |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------|
| rda_infra | api-server                             | 0656b4230f44 | 6d4d40ab |             | service-status                                      | ok       |                                                       |
| rda_infra | api-server                             | 0656b4230f44 | 6d4d40ab |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | asset-dependency                       | e006dfd39d9b | 9f02a8f1 |             | service-status                                      | ok       |                                                       |
| rda_app   | asset-dependency                       | e006dfd39d9b | 9f02a8f1 |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | authenticator                          | 1782a79e36c5 | adda9bc0 |             | service-status                                      | ok       |                                                       |
| rda_app   | authenticator                          | 1782a79e36c5 | adda9bc0 |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | authenticator                          | 1782a79e36c5 | adda9bc0 |             | DB-connectivity                                     | ok       |                                                       |
| rda_app   | cfxdimensions-app-access-manager       | d412efb99f2e | ccb83d20 |             | service-status                                      | ok       |                                                       |
| rda_app   | cfxdimensions-app-access-manager       | d412efb99f2e | ccb83d20 |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | cfxdimensions-app-access-manager       | d412efb99f2e | ccb83d20 |             | service-dependency:registry                         | ok       | 1 pod(s) found for registry                           |
| rda_app   | cfxdimensions-app-access-manager       | d412efb99f2e | ccb83d20 |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | cfxdimensions-app-access-manager       | d412efb99f2e | ccb83d20 |             | DB-connectivity                                     | ok       |                                                       |                                                  |
| rda_app   | cfxdimensions-app-notification-service | 34c2ea6675d5 | 93ac81de |             | service-status                                      | ok       |                                                       |
| rda_app   | cfxdimensions-app-notification-service | 34c2ea6675d5 | 93ac81de |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | cfxdimensions-app-notification-service | 34c2ea6675d5 | 93ac81de |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | cfxdimensions-app-notification-service | 34c2ea6675d5 | 93ac81de |             | DB-connectivity                                     | ok       |                                                       |
| rda_app   | cfxdimensions-app-resource-manager     | ec87d2ee6387 | 33ee28ca |             | service-status                                      | ok       |                                                       |
| rda_app   | cfxdimensions-app-resource-manager     | ec87d2ee6387 | 33ee28ca |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | cfxdimensions-app-resource-manager     | ec87d2ee6387 | 33ee28ca |             | service-dependency:registry                         | ok       | 1 pod(s) found for registry                           |
| rda_app   | cfxdimensions-app-resource-manager     | ec87d2ee6387 | 33ee28ca |             | service-dependency:cfxdimensions-app-access-manager | ok       | 1 pod(s) found for cfxdimensions-app-access-manager   |
| rda_app   | cfxdimensions-app-resource-manager     | ec87d2ee6387 | 33ee28ca |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | cfxdimensions-app-resource-manager     | ec87d2ee6387 | 33ee28ca |             | DB-connectivity                                     | ok       |                                                       |
| rda_infra | collector                              | 6336341682ad | 042af0af |             | service-status                                      | ok       |                                                       |
| rda_infra | collector                              | 6336341682ad | 042af0af |             | minio-connectivity                                  | ok       |                                                       |
| rda_infra | collector                              | 6336341682ad | 042af0af |             | opensearch-connectivity:default                     | ok       |                                                       |
| rda_infra | scheduler                              | 3ab379305be1 | b2bb9915 |             | service-status                                      | ok       |                                                       |
| rda_infra | scheduler                              | 3ab379305be1 | b2bb9915 |             | minio-connectivity                                  | ok       |                                                       |
| rda_infra | scheduler                              | 3ab379305be1 | b2bb9915 |             | DB-connectivity                                     | ok       |                                                       |
| rda_app   | user-preferences                       | 520bca813ddf | f4ca7d44 |             | service-status                                      | ok       |                                                       |
| rda_app   | user-preferences                       | 520bca813ddf | f4ca7d44 |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | user-preferences                       | 520bca813ddf | f4ca7d44 |             | service-dependency:registry                         | ok       | 1 pod(s) found for registry                           |
| rda_app   | user-preferences                       | 520bca813ddf | f4ca7d44 |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | user-preferences                       | 520bca813ddf | f4ca7d44 |             | DB-connectivity                                     | ok       |                                                       |                                      |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+

RDAF Worker Service Upgrade:

Run the below command to upgrade RDAF worker services to latest version.

rdafk8s worker upgrade --tag 3.1.0

After upgrading the RDAF worker service using the above command, run the below command to verify its running status and version.

kubectl get pods -n rda-fabric -l app_category=rdaf-worker
+------------+--------------+-------------+--------------+-------+
| Name       | Host         | Status      | Container Id | Tag   |
+------------+--------------+-------------+--------------+-------+
| rda_worker | 111.92.12.60 | Up 1 minute | 4ce2a8f13d16 | 3.1.0 |
+------------+--------------+-------------+--------------+-------+
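To confirm the worker pods are running the upgraded image, the image tag can also be read directly from the pod spec. A hedged sketch, using the same label as in the command above:

# prints each worker pod name and the image (including tag) of its first container
kubectl get pods -n rda-fabric -l app_category=rdaf-worker \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'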

Run the below command to verify the functional health of each RDAF worker service and verify that all of their statuses are ok.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                               |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------|
| rda_infra | api-server                             | 0656b4230f44 | 6d4d40ab |             | service-status                                      | ok       |                                                       |
...
...
| rda_infra | worker                                 | 4ce2a8f13d16 | d627124d | rda-site-01 | service-status                                      | ok       |                                                       |
| rda_infra | worker                                 | 4ce2a8f13d16 | d627124d | rda-site-01 | minio-connectivity                                  | ok       |                                                       |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+

Create Kafka Topics for OIA Application Services:

Download the below script and execute it on the VM where the rdafk8s setup was run during the initial RDAF platform setup. Please make sure the file /opt/rdaf/rdaf.cfg exists, as it is required for the below script to execute successfully.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.2/add_kafka_topics.py
python add_kafka_topics.py upgrade
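Because the script depends on /opt/rdaf/rdaf.cfg, a defensive way to run it is to check for the file first. A minimal sketch:

# run the Kafka topic script only if the required config file is present
[ -f /opt/rdaf/rdaf.cfg ] && python add_kafka_topics.py upgrade || echo "/opt/rdaf/rdaf.cfg not found"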

RDAF OIA Application Services Upgrade:

Run the below command to upgrade the RDAF OIA (AIOps) application services to latest version.

rdafk8s app upgrade OIA --tag 7.0.6

Once above command is completed, run the below command to verify all of the RDAF OIA application services are upgraded to the specified version and all of their corresponding containers are in running state.

kubectl get pods -n rda-fabric -l app_category=rdaf-application

Wait for 3 to 5 minutes, then run the below command to verify the functional health of each RDAF OIA application service and confirm that all of their statuses are ok.

rdac healthcheck

2.2. Upgrade from 7.2.0.x to 7.2.1.1

RDAF Platform: From 3.2.0.3 to 3.2.1.3

OIA (AIOps) Application: From 7.2.0.3 to 7.2.1.1/7.2.1.5

RDAF Deployment rdaf & rdafk8s CLI: From 1.1.7 to 1.1.8

RDAF Client rdac CLI: From 3.2.0.3 to 3.2.1.3

2.2.1. Upgrade Prerequisites

Before proceeding with this upgrade, please verify that the below prerequisites are met.

Important

Please make sure a full backup of the RDAF platform system is completed before performing the upgrade.

Please run the below backup command to take the backup of application data.

rdafk8s backup --dest-dir <backup-dir>
Note: Please make sure this backup directory is mounted across all infra and CLI VMs.
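For example, with a shared NFS mount that is available on all of the infrastructure and CLI VMs (the path below is only an illustration, not a required location):

rdafk8s backup --dest-dir /mnt/rdaf-backup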

Run the below commands on the RDAF management system and make sure the Kubernetes PODs are NOT in a restarting state (applicable only to Kubernetes environments). A combined check is sketched after these commands.

kubectl get pods -n rda-fabric -l app_category=rdaf-infra
kubectl get pods -n rda-fabric -l app_category=rdaf-platform
kubectl get pods -n rda-fabric -l app_component=rda-worker 
kubectl get pods -n rda-fabric -l app_name=oia 
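To spot problem pods across all four groups at once, the checks can be combined into a small loop. This is a sketch only; it assumes the default kubectl output columns (STATUS third, RESTARTS fourth), so no output means nothing is restarting:

# flag any pod that is not Running or has restarted at least once
for l in app_category=rdaf-infra app_category=rdaf-platform app_component=rda-worker app_name=oia; do
  kubectl get pods -n rda-fabric -l "$l" --no-headers | awk '$3 != "Running" || $4 > 0'
done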

  • Verify that the RDAF deployment rdaf & rdafk8s CLI version is 1.1.7 on the VM where the CLI was installed for the docker on-prem registry and for managing the Kubernetes or non-Kubernetes deployment.
rdaf --version
rdafk8s --version
  • On-premise docker registry service version is 1.0.2
docker ps | grep docker-registry
  • RDAF Infrastructure services version is 1.0.2 (rda-nats service version is 1.0.2.1 and rda-minio service version is RELEASE.2022-11-11T03-44-20Z)

Run the below command to get RDAF Infra services details

rdafk8s infra status

Run the below command to get RDAF Infra services details

rdaf infra status
  • RDAF Platform services version is 3.2.0.3

Run the below command to get RDAF Platform services details

rdafk8s platform status

Run the below command to get RDAF Platform services details

rdaf platform status
  • RDAF OIA Application services version is 7.2.0.3 (rda-event-consumer service version is 7.2.0.5)

Run the below command to get RDAF App services details

rdafk8s app status

Run the below command to get RDAF App services details

rdaf app status

Log in to the VM where the rdaf & rdafk8s deployment CLI was installed for the docker on-prem registry and for managing the Kubernetes or non-Kubernetes deployment.

  • Download the RDAF Deployment CLI's newer version 1.1.8 bundle.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/rdafcli-1.1.8.tar.gz
  • Upgrade the rdaf & rdafk8s CLI to version 1.1.8
pip install --user rdafcli-1.1.8.tar.gz
  • Verify the installed rdaf & rdafk8s CLI version is upgraded to 1.1.8
rdaf --version
rdafk8s --version

Download the below python script, which is used to identify the K8s POD name for each RDA Fabric service POD Id. Skip this step if the script was already downloaded.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.6/maint_command.py

Download the below upgrade python script.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/rdaf_k8s_upgrade_117_118.py

Please run the below python upgrade script. It creates a Kafka topic called fsm-events, creates the /opt/rdaf/config/network_config/policy.json file, and adds the rda-fsm service to the values.yaml file.

python rdaf_k8s_upgrade_117_118.py
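After the script completes, its effects can be spot-checked. A minimal sketch, assuming policy.json is created at the path mentioned above and values.yaml lives under /opt/rdaf/deployment-scripts/ as referenced later in this document:

# verify the policy file was created and the rda-fsm service was added
ls -l /opt/rdaf/config/network_config/policy.json
grep -n "rda-fsm" /opt/rdaf/deployment-scripts/values.yaml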

Important

Please make sure the above upgrade script is executed before moving to the next step.

  • Download the RDAF Deployment CLI's newer version 1.1.8 bundle and copy it to RDAF management VM on which rdaf & rdafk8s deployment CLI was installed.

For RHEL OS Environment

wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/offline-rhel-1.1.8.tar.gz

For Ubuntu OS Environment

wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/offline-ubuntu-1.1.8.tar.gz

  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.1.8.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.1.8
  • Upgrade the rdaf & rdafk8s CLI to version 1.1.8
pip install --user rdafcli-1.1.8.tar.gz -f ./ --no-index
  • Verify the installed rdaf & rdafk8s CLI version
rdaf --version
rdafk8s --version

Download the below upgrade script and copy it to RDAF management VM on which rdaf & rdafk8s deployment CLI was installed.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/rdaf_k8s_upgrade_117_118.py

Please run the downloaded python upgrade script. It creates a Kafka topic called fsm-events, creates the /opt/rdaf/config/network_config/policy.json file, and adds the rda-fsm service to the values.yaml file.

python rdaf_k8s_upgrade_117_118.py

Important

Please make sure the above upgrade script is executed before moving to the next step.

  • Download the RDAF Deployment CLI's newer version 1.1.8 bundle
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/offline-rhel-1.1.8.tar.gz
  • Upgrade the rdaf CLI to version 1.1.8
pip install --user rdafcli-1.1.8.tar.gz
  • Verify the installed rdaf CLI version is upgraded to 1.1.8
rdaf --version
  • To stop the application services, run the below commands. Wait until all of the services are stopped.
rdaf app down OIA
rdaf app status
  • To stop the RDAF worker services, run the below commands. Wait until all of the services are stopped.
rdaf worker down
rdaf worker status
  • To stop the RDAF platform services, run the below commands. Wait until all of the services are stopped.
rdaf platform down

rdaf platform status
  • Upgrade Kafka using the below command

rdaf infra upgrade --tag 1.0.2 --service kafka

Run the below RDAF command to check infra status

rdaf infra status
+----------------+----------------+-----------------+--------------+-------+
| Name           | Host           | Status          | Container Id | Tag   |
+----------------+----------------+-----------------+--------------+-------+
| haproxy        | 192.168.131.41 | Up 2 weeks      | ee9d25dc2276 | 1.0.2 |
| haproxy        | 192.168.131.42 | Up 2 weeks      | e6ad57ac421d | 1.0.2 |
| keepalived     | 192.168.131.41 | active          | N/A          | N/A   |
| keepalived     | 192.168.131.42 | active          | N/A          | N/A   |
+----------------+----------------+-----------------+--------------+-------+

Run the below RDAF command to check infra healthcheck status

rdaf infra healthcheck
+----------------+-----------------+--------+----------------------+----------------+--------------+
| Name           | Check           | Status | Reason               | Host           | Container Id |
+----------------+-----------------+--------+----------------------+----------------+--------------+
| haproxy        | Port Connection | OK     | N/A                  | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy        | Service Status  | OK     | N/A                  | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy        | Firewall Port   | OK     | N/A                  | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy        | Port Connection | OK     | N/A                  | 192.168.107.64 | 91c361ea0f58 |
| haproxy        | Service Status  | OK     | N/A                  | 192.168.107.64 | 91c361ea0f58 |
| haproxy        | Firewall Port   | OK     | N/A                  | 192.168.107.64 | 91c361ea0f58 |
| keepalived     | Service Status  | OK     | N/A                  | 192.168.107.63 | N/A          |
| keepalived     | Service Status  | OK     | N/A                  | 192.168.107.64 | N/A          |
| nats           | Port Connection | OK     | N/A                  | 192.168.107.63 | f57ed825681b |
| nats           | Service Status  | OK     | N/A                  | 192.168.107.63 | f57ed825681b |
+----------------+-----------------+--------+----------------------+----------------+--------------+

Note

Please take a backup of /opt/rdaf/deployment-scripts/values.yaml

Download the below upgrade python script.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/rdaf_upgrade_116_117_118.py

Please run the below python upgrade script. It creates a few topics for applying config changes to existing topics in HA setups, creates the /opt/rdaf/config/network_config/policy.json file, and adds the rda-fsm service to the values.yaml file.

python rdaf_upgrade_116_117_118.py

Important

Please make sure the above upgrade script is executed before moving to the next step.

  • Download the RDAF Deployment CLI's newer version 1.1.8 bundle and copy it to RDAF management VM on which rdaf deployment CLI was installed.

For RHEL OS Environment

wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/offline-rhel-1.1.8.tar.gz

For Ubuntu OS Environment

wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/offline-ubuntu-1.1.8.tar.gz

  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.1.8.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.1.8
  • Upgrade the rdaf CLI to version 1.1.8
pip install --user rdafcli-1.1.8.tar.gz -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version

Download the below upgrade script and copy it to RDAF management VM on which rdaf deployment CLI was installed.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/rdaf_upgrade_116_117_118.py

Please run the downloaded python upgrade script. It creates a few topics for applying config changes to existing topics in HA setups, creates the /opt/rdaf/config/network_config/policy.json file, and adds the rda-fsm service to the values.yaml file.

python rdaf_upgrade_116_117_118.py

Important

Please make sure the above upgrade script is executed before moving to the next step.

2.2.2. Download the new Docker Images

Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.

rdaf registry fetch --tag 3.2.1.3,7.2.1.1,7.2.1.5,7.2.1.6

Run the below command to verify that the above-mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.

rdaf registry list-tags 

Please make sure the 3.2.1.3 image tag is downloaded for the below RDAF Platform services.

  • rda-client-api-server
  • rda-registry
  • rda-rda-scheduler
  • rda-collector
  • rda-stack-mgr
  • rda-identity
  • rda-fsm
  • rda-access-manager
  • rda-resource-manager
  • rda-user-preferences
  • onprem-portal
  • onprem-portal-nginx
  • rda-worker-all
  • onprem-portal-dbinit
  • cfxdx-nb-nginx-all
  • rda-event-gateway
  • rdac
  • rdac-full

Please make sure the 7.2.1.1 image tag is downloaded for the below RDAF OIA Application services.

  • rda-app-controller
  • rda-alert-processor
  • rda-file-browser
  • rda-smtp-server
  • rda-ingestion-tracker
  • rda-reports-registry
  • rda-ml-config
  • rda-event-consumer
  • rda-webhook-server
  • rda-irm-service
  • rda-alert-ingester
  • rda-collaboration
  • rda-notification-service
  • rda-configuration-service

Please make sure the 7.2.1.5 image tag is downloaded for the below RDAF OIA Application services.

  • rda-smtp-server
  • rda-event-consumer
  • rda-webhook-server
  • rda-collaboration
  • rda-configuration-service
  • rda-alert-ingester

Downloaded Docker images are stored under the below path.

/opt/rdaf/data/docker/registry/v2

Run the below command to check the filesystem's disk usage on which docker images are stored.

df -h /opt

Optionally, if required, older image tags that are no longer used can be deleted to free up disk space using the below command.

rdaf registry delete-images --tag <tag1,tag2>
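For example, to remove the superseded platform and application tags from this upgrade path (the tags below are only an illustration; verify they are no longer in use before deleting):

rdaf registry delete-images --tag 3.2.0.3,7.2.0.3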

2.2.3. Upgrade Steps

2.2.3.1 Upgrade RDAF Platform Services

Step-1: Run the below command to initiate upgrading RDAF Platform services.

rdafk8s platform upgrade --tag 3.2.1.3

As the upgrade procedure is a non-disruptive upgrade, it puts the currently running PODs into Terminating state and newer version PODs into Pending state.

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each Platform service is in Terminating state.

kubectl get pods -n rda-fabric -l app_category=rdaf-platform

Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It lists the POD Ids of all platform services, along with the rdac maintenance command required to put them into maintenance mode.

python maint_command.py

Step-4: Copy and paste the rdac maintenance command as shown below.

rdac maintenance start --ids <comma-separated-list-of-platform-pod-ids>
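For example, using POD Ids taken from the rdac pods output (the Ids below are sample values only, not the ones in your environment):

rdac maintenance start --ids 6d4d40ab,4e4c4a4d,b2bb9915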

Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating RDAF platform service PODs

for i in `kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
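Before moving on, it is worth confirming that no platform service PODs are left stuck in Terminating state. A minimal check (no output means none remain):

kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep Terminating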

Note

Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Platform service PODs.

Please wait until all of the new platform service PODs are in a Running state, then run the below command to verify their status and make sure all of them are running with the 3.2.1.3 version.

rdafk8s platform status
+----------------------+----------------+---------------+--------------+---------+
| Name                 | Host           | Status        | Container Id | Tag     |
+----------------------+----------------+---------------+--------------+---------+
| rda-api-server       | 192.168.131.45 | Up 2 Days ago | dde8ab1f9331 | 3.2.1.3 |
| rda-api-server       | 192.168.131.44 | Up 2 Days ago | e6ece7235e72 | 3.2.1.3 |
| rda-registry         | 192.168.131.45 | Up 2 Days ago | a577766fb8b2 | 3.2.1.3 |
| rda-registry         | 192.168.131.44 | Up 2 Days ago | 1aecc089b0c3 | 3.2.1.3 |
| rda-identity         | 192.168.131.45 | Up 2 Days ago | fea1c0ef7263 | 3.2.1.3 |
| rda-identity         | 192.168.131.44 | Up 2 Days ago | 2a48f402f678 | 3.2.1.3 |
| rda-fsm              | 192.168.131.45 | Up 2 Days ago | 5006c8a6e5f3 | 3.2.1.3 |
| rda-fsm              | 192.168.131.44 | Up 2 Days ago | 199cac791a90 | 3.2.1.3 |
| rda-access-manager   | 192.168.131.44 | Up 2 Days ago | e20495c61be2 | 3.2.1.3 |
| ....                 | ....           | ....          | ....         | ....    |
+----------------------+----------------+---------------+--------------+---------+

Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler services is elected as the leader under the Site column.

rdac pods

Run the below command to check that all services have an ok status and do not show any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | cf4c4f37c47a | 0633b451 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | cf4c4f37c47a | 0633b451 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | cf4c4f37c47a | 0633b451 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | cf4c4f37c47a | 0633b451 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | cf4c4f37c47a | 0633b451 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=3, Brokers=[2, 3, 1] |
| rda_app   | alert-ingester                         | 7b9f1370e018 | f348532b |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 7b9f1370e018 | f348532b |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 7b9f1370e018 | f348532b |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 7b9f1370e018 | f348532b |             | service-initialization-status                       | ok       |                                                             |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.2.3.2 Upgrade rdac CLI

Run the below command to upgrade the rdac CLI

rdafk8s rdac_cli upgrade --tag 3.2.1.3
2.2.3.3 Upgrade RDA Worker Services

Step-1: Please run the below command to initiate upgrading the RDA Worker service PODs.

rdafk8s worker upgrade --tag 3.2.1.3

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each RDA Worker service POD is in Terminating state.

kubectl get pods -n rda-fabric -l app_component=rda-worker

Step-3: Run the below command to put all Terminating RDAF worker service PODs into maintenance mode. It lists the POD Ids of all RDA worker services along with the rdac maintenance command that is required to put them into maintenance mode.

python maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-platform-pod-ids>
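
For illustration only, assuming maint_command.py reported two worker POD Ids such as 1f769792 and 6debca1d (hypothetical values), the pasted command would look like the below.

rdac maintenance start --ids 1f769792,6debca1d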

Step-5: Run the below command to verify the maintenance mode status of the RDAF worker services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating RDAF worker service PODs

for i in `kubectl get pods -n rda-fabric -l app_component=rda-worker | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done

Note

Wait for 120 seconds between each RDAF worker service upgrade and repeat the above steps from Step-2 to Step-6 for the rest of the RDAF worker service PODs.

Step-7: Please wait for 120 seconds to let the newer version of RDA Worker service PODs join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service PODs.

rdac pods | grep rda-worker
rdafk8s worker status
+------------+--------------+---------------+--------------+---------+
| Name       | Host         | Status        | Container Id | Tag     |
+------------+--------------+---------------+--------------+---------+
| rda-worker | 192.168.131.50 | Up 2 Days ago | 497059c45d6e | 3.2.1.3 |
| rda-worker | 192.168.131.49 | Up 2 Days ago | 434b2ca40ed8 | 3.2.1.3 |
| ....       | ....           | ....          | ....         | ....    |
+------------+--------------+---------------+--------------+---------+

Step-8: Run the below command to check that all RDA Worker services have an ok status and do not throw any failure messages.

rdac healthcheck
2.2.3.4 Upgrade OIA Application Services

Step-1: Run the below commands to initiate upgrading the RDAF OIA Application services. The first command upgrades the specified services from 7.2.0.3 to 7.2.1.1, and the following commands upgrade the rest of the services from 7.2.0.3 to 7.2.1.5 and 7.2.1.6 respectively.

rdafk8s app upgrade OIA --tag 7.2.1.1 --service rda-app-controller --service rda-alert-processor --service rda-file-browser --service rda-ingestion-tracker --service rda-reports-registry --service rda-ml-config --service rda-irm-service --service rda-notification-service
rdafk8s app upgrade OIA --tag 7.2.1.5 --service rda-smtp-server --service rda-event-consumer --service rda-webhook-server --service rda-collaboration --service rda-configuration-service
rdafk8s app upgrade OIA --tag 7.2.1.6 --service rda-alert-ingester

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each OIA application service is in Terminating state.

kubectl get pods -n rda-fabric -l app_name=oia

Step-3: Run the below command to put all Terminating OIA application service PODs into maintenance mode. It lists the POD Ids of all OIA application services along with the rdac maintenance command that is required to put them into maintenance mode.

python maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-oia-app-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the OIA application services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating OIA application service PODs

for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
kubectl get pods -n rda-fabric -l app_name=oia

Note

Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the OIA application service PODs.

Please wait till all of the new OIA application service PODs are in Running state and run the below command to verify their status and make sure they are running with the 7.2.1.1, 7.2.1.5, or 7.2.1.6 version as applicable.

rdafk8s app status
+---------------------------+--------------+---------------+--------------+---------+
| Name                      | Host         | Status        | Container Id | Tag     |
+---------------------------+--------------+---------------+--------------+---------+
| rda-alert-ingester        | 192.168.131.49 | Up 5 Days ago | b323998abd15 | 7.2.1.1 |
| rda-alert-ingester        | 192.168.131.50 | Up 5 Days ago | 710f262e27aa | 7.2.1.1 |
| rda-alert-processor       | 192.168.131.47 | Up 5 Days ago | ec1c53d94439 | 7.2.1.1 |
| rda-alert-processor       | 192.168.131.46 | Up 5 Days ago | deee4db62708 | 7.2.1.1 |
| rda-app-controller        | 192.168.131.49 | Up 5 Days ago | ef96deb9adda | 7.2.1.1 |
| rda-app-controller        | 192.168.131.50 | Up 5 Days ago | 6880b5632adb | 7.2.1.1 |
| rda-collaboration         | 192.168.131.49 | Up 2 Days ago | cc1b1c882250 | 7.2.1.5 |
| rda-collaboration         | 192.168.131.50 | Up 2 Days ago | 13be7e8bfa3f | 7.2.1.5 |
+---------------------------+--------------+---------------+--------------+---------+

Step-7: Run the below command to verify that all OIA application services are up and running. Please wait till the cfxdimensions-app-irm_service has leader status under the Site column.

rdac pods
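
To narrow the output down to the leader-election check, the irm_service entry can be filtered (a minimal sketch; the pattern assumes the pod-type name cfxdimensions-app-irm_service shown in the rdac pods output).

rdac pods | grep irm_service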

Run the below command to check that all services have an ok status and do not throw any failure messages.

rdac healthcheck

Warning

For Non-Kubernetes deployment, upgrading RDAF Platform and AIOps application services is a disruptive operation. Please schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.

Run the below command to initiate upgrading RDAF Platform services.

rdaf platform upgrade --tag 3.2.1.3
Please wait till all of the new platform services are in Running state and run the below command to verify their status and make sure all of them are running with the 3.2.1.3 version.
rdaf platform status

+--------------------------+--------------+------------+--------------+---------+
| Name                     | Host         | Status     | Container Id | Tag     |
+--------------------------+--------------+------------+--------------+---------+
| cfx-rda-access-manager   | 192.168.107.60 | Up 6 hours | 80dac9d727a3 | 3.2.1.3 |
| cfx-rda-resource-manager | 192.168.107.60 | Up 6 hours | 68534a5c1d4c | 3.2.1.3 |
| cfx-rda-user-preferences | 192.168.107.60 | Up 6 hours | 78405b639915 | 3.2.1.3 |
| portal-backend           | 192.168.107.60 | Up 6 hours | 636e6968f661 | 3.2.1.3 |
| portal-frontend          | 192.168.107.60 | Up 6 hours | 2fd426bd6aa2 | 3.2.1.3 |
| rda_api_server           | 192.168.107.60 | Up 6 hours | e0994b366f98 | 3.2.1.3 |
| rda_asset_dependency     | 192.168.107.60 | Up 6 hours | 07610621408c | 3.2.1.3 |
| rda_collector            | 192.168.107.60 | Up 6 hours | 467d6b3d13f8 | 3.2.1.3 |
| rda_fsm                  | 192.168.107.60 | Up 6 hours | e32de86fe341 | 3.2.1.3 |
| rda_identity             | 192.168.107.60 | Up 6 hours | 45136d89b2cf | 3.2.1.3 |
| rda_registry             | 192.168.107.60 | Up 6 hours | 334d7d4cfa41 | 3.2.1.3 |
| rda_scheduler            | 192.168.107.60 | Up 6 hours | acf5a9ab556a | 3.2.1.3 |
+--------------------------+--------------+------------+--------------+---------+

Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler services is elected as the leader under the Site column.

rdac pods

Run the below command to check that all services have an ok status and do not throw any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=3, Brokers=[2, 3, 1] |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=1, Brokers=[2, 3, 1] |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
  • Upgrade rdac CLI

Run the below command to upgrade the rdac CLI

rdaf rdac_cli upgrade --tag 3.2.1.3
  • Upgrade RDA Worker Services

Please run the below command to initiate upgrading the RDA Worker service PODs.

rdaf worker upgrade --tag 3.2.1.3
Please wait for 120 seconds to let the newer version of RDA Worker service PODs join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service PODs.

rdac pods | grep worker
rdaf worker status

+------------+--------------+-----------+--------------+---------+
| Name       | Host         | Status    | Container Id | Tag     |
+------------+--------------+-----------+--------------+---------+
| rda_worker | 192.168.107.61 | Up 2 days | d951118ee757 | 3.2.1.3 |
| rda_worker | 192.168.107.62 | Up 2 days | f7033a72f013 | 3.2.1.3 |
+------------+--------------+-----------+--------------+---------+
Run the below command to check that all RDA Worker services have an ok status and do not throw any failure messages.

rdac healthcheck

Run the below commands to initiate upgrading the RDAF OIA Application services. The first command upgrades the specified services from 7.2.0.3 to 7.2.1.1, and the following commands upgrade the rest of the services from 7.2.0.3 to 7.2.1.5 and 7.2.1.6 respectively.

rdaf app upgrade OIA --tag 7.2.1.1 --service cfx-rda-app-controller --service cfx-rda-alert-processor --service cfx-rda-file-browser --service cfx-rda-ingestion-tracker --service cfx-rda-reports-registry --service cfx-rda-ml-config --service cfx-rda-irm-service --service cfx-rda-notification-service
rdaf app upgrade OIA --tag 7.2.1.5 --service cfx-rda-smtp-server --service cfx-rda-event-consumer --service cfx-rda-webhook-server --service cfx-rda-collaboration --service cfx-rda-configuration-service
rdaf app upgrade OIA --tag 7.2.1.6 --service cfx-rda-alert-ingester

Please wait till all of the new OIA application service PODs are in Running state and run the below command to verify their status and make sure they are running with the 7.2.1.1, 7.2.1.5, or 7.2.1.6 version as applicable.

rdaf app status
+-------------------------------+--------------+-----------+--------------+---------+
| Name                          | Host         | Status    | Container Id | Tag     |
+-------------------------------+--------------+-----------+--------------+---------+
| cfx-rda-alert-ingester        | 192.168.107.66 | Up 2 days | 79d6756db639 | 7.2.1.5 |
| cfx-rda-alert-ingester        | 192.168.107.67 | Up 2 days | 9a0775246a0f | 7.2.1.5 |
| cfx-rda-alert-processor       | 192.168.107.66 | Up 2 days | 057552584cfe | 7.2.1.1 |
| cfx-rda-alert-processor       | 192.168.107.67 | Up 2 days | 787f0cb42734 | 7.2.1.1 |
| cfx-rda-app-controller        | 192.168.107.66 | Up 2 days | 07f406e984ad | 7.2.1.1 |
| cfx-rda-app-controller        | 192.168.107.67 | Up 2 days | 0b27802473c1 | 7.2.1.1 |
| cfx-rda-collaboration         | 192.168.107.66 | Up 2 days | 7322550c3cee | 7.2.1.5 |
+-------------------------------+--------------+-----------+--------------+---------+

2.2.4. Post Upgrade Steps

  • (Optional) Deploy the latest l1&l2 bundles. Go to Configuration --> RDA Administration --> Bundles --> select oia_l1_l2_bundle and click on the deploy action.

  • Enable ML experiments manually if any experiments are configured (Organization --> Configuration --> ML Experiments)

  • (Optional) Add the following to All Incident Mappings.

## preferably after the projectId field's json block

{
  "to": "notificationId",
  "from": "notificationId"
}, 
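
Below is a minimal sketch of how this mapping sits inside the All Incident Mappings list, assuming the list already contains a projectId mapping; the surrounding entries are illustrative only.

[
  ...,
  {
    "to": "projectId",
    "from": "projectId"
  },
  {
    "to": "notificationId",
    "from": "notificationId"
  },
  ...
]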
  • (Optional) A new option called skip_retry_on_keywords is added within the Incident mapper, which allows the user to control when to skip a retry attempt while making an API call during create or update ticket operations on an external ITSM system (e.g. ServiceNow).

In the below example, if the API error response contains the serviceIdentifier is not available or Ticket is already in inactive state no update is allowed message, it will skip retrying the API call, as these are expected errors and retrying will not make the API call successful.

{
  "to": "skip_retry_on_keywords",
  "func": {
    "evaluate": {
      "expr": "'[\"serviceIdentifier is not available\",\"Ticket is already in Inactive state no update is allowed\"]'"
    }
  }
}

2.3. Upgrade from 7.2.1.x to 7.2.2

2.3.1. Pre-requisites

Below are the pre-requisites which need to be in place before upgrading the OIA (AIOps) application services.

  • RDAF Deployment CLI Version: 1.1.8

  • RDAF Infrastructure Services Tag Version: 1.0.2,1.0.2.1(nats)

  • RDAF Core Platform & Worker Services Tag Version: 3.2.1 / 3.2.1.x

  • RDAF Client (RDAC) Tag Version: 3.2.1 / 3.2.1.x

  • OIA Services Tag Version: 7.2.1 / 7.2.1.x

  • CloudFabrix recommends taking VMware VM snapshots where AIOps solution is deployed

Important

Applicable only if FSM is configured for ITSM ticketing:

Before proceeding with the upgrade, please make sure to disable the below Service Blueprints.

  • Create Ticket
  • Update Ticket
  • Resolve Ticket
  • Read Alert Stream
  • Read Incident Stream
  • Read ITSM ticketing Inbound Notifications

Warning

Make sure all of the above pre-requisites are met before proceeding with the upgrade process.

Warning

Upgrading RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.

  • Download the RDAF Deployment CLI's newer version 1.1.9 bundle
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9/rdafcli-1.1.9.tar.gz
  • Upgrade the rdaf CLI to version 1.1.9
pip install --user rdafcli-1.1.9.tar.gz
  • Verify the installed rdaf CLI version is upgraded to 1.1.9
rdaf --version
  • Download the RDAF Deployment CLI's newer version 1.1.9 bundle and copy it to RDAF management VM on which rdaf deployment CLI was installed.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9/offline-rhel-1.1.9.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-rhel-1.1.9.tar.gz
  • Change the directory to the extracted directory
cd offline-rhel-1.1.9
  • Upgrade the rdaf CLI to version 1.1.9
pip install --user rdafcli-1.1.9.tar.gz -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
  • Download the RDAF Deployment CLI's newer version 1.1.9 bundle (Ubuntu) and copy it to the RDAF management VM on which the rdaf deployment CLI was installed.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9/offline-ubuntu-1.1.9.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.1.9.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.1.9
  • Upgrade the rdaf CLI to version 1.1.9
pip install --user rdafcli-1.1.9.tar.gz -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
  • To stop OIA (AIOps) application services, run the below command. Wait until all of the services are stopped.
rdaf app down OIA
rdaf app status
  • To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
rdaf worker down
rdaf worker status
  • To stop RDAF platform services, run the below command. Wait until all of the services are stopped.

rdaf platform down
rdaf platform status

  • Upgrade Kafka using the below command
rdaf infra upgrade --tag 1.0.2 --service kafka

Run the below RDAF command to check infra status

rdaf infra status
+----------------+--------------+-----------------+--------------+------------------------------+
| Name           | Host         | Status          | Container Id | Tag                          |
+----------------+--------------+-----------------+--------------+------------------------------+
| haproxy        | 192.168.107.40 | Up 2 weeks      | 92875cebe689 | 1.0.2                        |
| keepalived     | 192.168.107.40 | Not Provisioned | N/A          | N/A                          |
| nats           | 192.168.107.41 | Up 2 weeks      | e365e0b794c7 | 1.0.2.1                      |
| minio          | 192.168.107.41 | Up 2 weeks      | 900c8b078059 | RELEASE.2022-11-11T03-44-20Z |
| mariadb        | 192.168.107.41 | Up 2 weeks      | c549e07c2688 | 1.0.2                        |
| opensearch     | 192.168.107.41 | Up 2 weeks      | 783204d75ba9 | 1.0.2                        |
| zookeeper      | 192.168.107.41 | Up 2 weeks      | f51138ff8a95 | 1.0.2                        |
| kafka          | 192.168.107.41 | Up 4 days       | 255020d998c9 | 1.0.2                        |
| redis          | 192.168.107.41 | Up 2 weeks      | 5d929327121d | 1.0.2                        |
| redis-sentinel | 192.168.107.41 | Up 2 weeks      | 4a5fdde49a21 | 1.0.2                        |
+----------------+--------------+-----------------+--------------+------------------------------+

Run the below RDAF command to check infra healthcheck status

rdaf infra healthcheck
+----------------+-----------------+--------+----------------------+--------------+--------------+
| Name           | Check           | Status | Reason               | Host         | Container Id |
+----------------+-----------------+--------+----------------------+--------------+--------------+
| haproxy        | Port Connection | OK     | N/A                  | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy        | Service Status  | OK     | N/A                  | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy        | Firewall Port   | OK     | N/A                  | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy        | Port Connection | OK     | N/A                  | 192.168.107.64 | 91c361ea0f58 |
| haproxy        | Service Status  | OK     | N/A                  | 192.168.107.64 | 91c361ea0f58 |
| haproxy        | Firewall Port   | OK     | N/A                  | 192.168.107.64 | 91c361ea0f58 |
| keepalived     | Service Status  | OK     | N/A                  | 192.168.107.63 | N/A          |
| keepalived     | Service Status  | OK     | N/A                  | 192.168.107.64 | N/A          |
| nats           | Port Connection | OK     | N/A                  | 192.168.107.63 | f57ed825681b |
| nats           | Service Status  | OK     | N/A                  | 192.168.107.63 | f57ed825681b |
+----------------+-----------------+--------+----------------------+--------------+--------------+
  • Run the below Python upgrade script. It applies the below configuration & settings.

    • Create kafka topics and configure the topic message max size to 8mb
    • Create kafka-external user in config.json.
    • Add new alert-processor companion service settings in values.yaml
    • Configure and apply security index purge policy for Opensearch

Important

Take a backup of /opt/rdaf/deployment-scripts/values.yaml before running the below upgrade script.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9/rdaf_upgrade_118_119.py
python rdaf_upgrade_118_119.py
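
As a quick sanity check after the script completes, the updated values.yaml can be inspected for the new alert-processor companion settings (a hedged sketch; the exact key names written by the script may differ).

grep -n -i "companion" /opt/rdaf/deployment-scripts/values.yaml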

Important

Make sure the above upgrade script is executed before moving to the next step.

2.3.2. Download the new Docker Images

Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.

rdaf registry fetch --tag 3.2.2,7.2.2

Run the below command to verify above mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.

rdaf registry list-tags 

Make sure 3.2.2 image tag is downloaded for the below RDAF Platform services.

  • rda-client-api-server
  • rda-registry
  • rda-rda-scheduler
  • rda-collector
  • rda-stack-mgr
  • rda-identity
  • rda-fsm
  • rda-access-manager
  • rda-resource-manager
  • rda-user-preferences
  • onprem-portal
  • onprem-portal-nginx
  • rda-worker-all
  • onprem-portal-dbinit
  • cfxdx-nb-nginx-all
  • rda-event-gateway
  • rdac
  • rdac-full

Make sure 7.2.2 image tag is downloaded for the below RDAF OIA Application services.

  • rda-app-controller
  • rda-alert-processor
  • rda-file-browser
  • rda-smtp-server
  • rda-ingestion-tracker
  • rda-reports-registry
  • rda-ml-config
  • rda-event-consumer
  • rda-webhook-server
  • rda-irm-service
  • rda-alert-ingester
  • rda-collaboration
  • rda-notification-service
  • rda-configuration-service

Downloaded Docker images are stored under the below path.

/opt/rdaf/data/docker/registry/v2

Run the below command to check the filesystem's disk usage on which docker images are stored.

df -h /opt

Optionally, if required, older image tags which are no longer used can be deleted to free up disk space using the below command.

rdaf registry delete-images --tag <tag1,tag2>
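
For example, the registry's current disk usage can be checked and, purely as an illustration, older tags such as 3.1.0 and 7.0.6 could be removed, assuming they are no longer referenced by any running services (please verify before deleting).

du -sh /opt/rdaf/data/docker/registry/v2
rdaf registry delete-images --tag 3.1.0,7.0.6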

2.3.3. Upgrade Services

2.3.3.1 Upgrade RDAF Platform Services

Run the below command to initiate upgrading RDAF Platform services.

rdaf platform upgrade --tag 3.2.2
Wait till all of the new platform services are in Running state and run the below command to verify their status and make sure all of them are running with the 3.2.2 version.
rdaf platform status

+----------------------+----------------+------------+--------------+-------+
| Name                 | Host           | Status     | Container Id | Tag   |
+----------------------+----------------+------------+--------------+-------+
| rda_api_server       | 192.168.107.60 | Up 4 hours | 0da7ebeadceb | 3.2.2 |
| rda_registry         | 192.168.107.60 | Up 4 hours | 841a4e03447d | 3.2.2 |
| rda_scheduler        | 192.168.107.60 | Up 4 hours | 806af221a299 | 3.2.2 |
| rda_collector        | 192.168.107.60 | Up 4 hours | 9ae8da4d2182 | 3.2.2 |
| rda_asset_dependency | 192.168.107.60 | Up 4 hours | e96cf642b2d6 | 3.2.2 |
| rda_identity         | 192.168.107.60 | Up 4 hours | 2a57ce63a756 | 3.2.2 |
| rda_fsm              | 192.168.107.60 | Up 4 hours | 2b645a75b5f0 | 3.2.2 |
+----------------------+----------------+------------+--------------+-------+
2.3.3.2 Upgrade rdac CLI

Run the below command to upgrade the rdac CLI

rdaf rdac_cli upgrade --tag 3.2.2

Run the below command to verify that one of the scheduler services is elected as the leader under the Site column.

rdac pods
+-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host         | ID       | Site        | Age     |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| App   | fsm                                    | True        | 8b5dfca4cce9 | c0a8bbd7 |             | 7:33:16 |      8 |        31.21 |               |              |
| App   | ingestion-tracker                      | True        | d37e78507693 | e1bd1405 |             | 7:21:16 |      8 |        31.21 |               |              |
| App   | ml-config                              | True        | 0c73604632bc | 65594689 |             | 7:22:02 |      8 |        31.21 |               |              |
| App   | reports-registry                       | True        | be82a9e704a2 | 567f1275 |             | 7:25:23 |      8 |        31.21 |               |              |
| App   | smtp-server                            | True        | 08a8dd347660 | 06242bab |             | 7:23:35 |      8 |        31.21 |               |              |
| App   | user-preferences                       | True        | fc7a4a5a0591 | 53dce7ca |             | 7:32:25 |      8 |        31.21 |               |              |
| App   | webhook-server                         | True        | 20a2afb33b6c | fdb1eb21 |             | 7:23:53 |      8 |        31.21 |               |              |
| Infra | api-server                             | True        | b1e7105b231e | 33f6ed2c |             | 2:04:53 |      8 |        31.21 |               |              |
| Infra | collector                              | True        | f5abb5cac9a5 | eb17ce02 |             | 3:50:51 |      8 |        31.21 |               |              |
| Infra | registry                               | True        | ce73263c7828 | 8cda9974 |             | 7:34:05 |      8 |        31.21 |               |              |
| Infra | scheduler                              | True        | d9d62c1f1bb7 | 96047389 | *leader*    | 7:33:59 |      8 |        31.21 |               |              |
| Infra | worker                                 | True        | ba1198f05f6b | afd229a8 | rda-site-01 | 7:26:20 |      8 |        31.21 | 7             | 109          |
+-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------+

Run the below command to check that all services have an ok status and do not throw any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=3, Brokers=[2, 3, 1] |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=1, Brokers=[2, 3, 1] |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.3.3.3 Upgrade RDA Worker Services

Run the below command to initiate upgrading the RDA worker service(s).

Tip

If the RDA worker is deployed in an HTTP proxy environment, add the required environment variables for the HTTP proxy settings in /opt/rdaf/deployment-scripts/values.yaml under the rda_worker section. Below is a sample HTTP proxy configuration for the worker service.

rda_worker:
    mem_limit: 8G
    memswap_limit: 8G
    privileged: false
    environment:
      RDA_ENABLE_TRACES: 'no'
      RDA_SELF_HEALTH_RESTART_AFTER_FAILURES: 3
      http_proxy: "http://user:password@192.168.122.107:3128"
      https_proxy: "http://user:password@192.168.122.107:3128"
      HTTP_PROXY: "http://user:password@192.168.122.107:3128"
      HTTPS_PROXY: "http://user:password@192.168.122.107:3128
rdaf worker upgrade --tag 3.2.2

Wait for 120 seconds to let the newer version of RDA worker services join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA worker services.

rdac pods | grep worker
rdaf worker status
+------------+--------------+------------+--------------+-------+
| Name       | Host         | Status     | Container Id | Tag   |
+------------+--------------+------------+--------------+-------+
| rda_worker | 192.168.107.60 | Up 4 hours | d968c908d3e3 | 3.2.2 |
+------------+--------------+------------+--------------+-------+
2.3.3.4 Upgrade OIA/AIA Application Services

Run the below commands to initiate upgrading RDAF OIA/AIA Application services

rdaf app upgrade OIA/AIA --tag 7.2.2

Wait till all of the new OIA/AIA application services are in Running state and run the below command to verify their status and make sure they are running with the 7.2.2 version. Check that the new service cfx-rda-alert-processor-companion is deployed, and make sure all OIA/AIA services are up with the new tag.

rdaf app status

+-----------------------------------+--------------+------------+--------------+-------+
| Name                              | Host         | Status     | Container Id | Tag   |
+-----------------------------------+--------------+------------+--------------+-------+
| cfx-rda-app-controller            | 192.168.107.60 | Up 3 hours | 017692a218b8 | 7.2.2 |
| cfx-rda-reports-registry          | 192.168.107.60 | Up 3 hours | be82a9e704a2 | 7.2.2 |
| cfx-rda-notification-service      | 192.168.107.60 | Up 3 hours | 42d3c8c4861c | 7.2.2 |
| cfx-rda-file-browser              | 192.168.107.60 | Up 3 hours | 46b9dedab4b0 | 7.2.2 |
| cfx-rda-configuration-service     | 192.168.107.60 | Up 3 hours | 6bef9741ff46 | 7.2.2 |
| cfx-rda-alert-ingester            | 192.168.107.60 | Up 3 hours | 13975b9efe7d | 7.2.2 |
| cfx-rda-webhook-server            | 192.168.107.60 | Up 3 hours | 20a2afb33b6c | 7.2.2 |
| cfx-rda-smtp-server               | 192.168.107.60 | Up 3 hours | 08a8dd347660 | 7.2.2 |
| cfx-rda-event-consumer            | 192.168.107.60 | Up 3 hours | b0b62c88064a | 7.2.2 |
| cfx-rda-alert-processor           | 192.168.107.60 | Up 3 hours | ab24dcbd6e3a | 7.2.2 |
| cfx-rda-irm-service               | 192.168.107.60 | Up 3 hours | 11c92a206eaa | 7.2.2 |
| cfx-rda-ml-config                 | 192.168.107.60 | Up 3 hours | 0c73604632bc | 7.2.2 |
| cfx-rda-collaboration             | 192.168.107.60 | Up 3 hours | a5cfe5b681bb | 7.2.2 |
| cfx-rda-ingestion-tracker         | 192.168.107.60 | Up 3 hours | d37e78507693 | 7.2.2 |
| cfx-rda-alert-processor-companion | 192.168.107.60 | Up 3 hours | b74d82710af9 | 7.2.2 |
+-----------------------------------+--------------+------------+--------------+-------+
Run the below command to verify that one of the cfxdimensions-app-irm_service instances is elected as the leader under the Site column.

rdac pods

+-------+----------------------------------------+-------------+--------------+----------+-------------+----------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host         | ID       | Site        | Age      |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+--------------+----------+-------------+----------+--------+--------------+---------------+--------------|
| App   | alert-ingester                         | True        | 13975b9efe7d | dd32fdef |             | 12:07:37 |      8 |        31.21 |               |              |
| App   | alert-processor                        | True        | ab24dcbd6e3a | a980d44e |             | 12:06:10 |      8 |        31.21 |               |              |
| App   | alert-processor-companion              | True        | b74d82710af9 | 8f37b360 |             | 12:04:19 |      8 |        31.21 |               |              |
| App   | asset-dependency                       | True        | 83c5d941f3a6 | f17cc305 |             | 12:16:59 |      8 |        31.21 |               |              |
| App   | authenticator                          | True        | fb82e1664219 | b6f19086 |             | 12:16:47 |      8 |        31.21 |               |              |
| App   | cfx-app-controller                     | True        | 017692a218b8 | 55015d69 |             | 12:09:04 |      8 |        31.21 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | 87871b87d45e | b0465aa5 |             | 12:16:19 |      8 |        31.21 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | a5cfe5b681bb | c5b40c98 |             | 12:05:05 |      8 |        31.21 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | 46b9dedab4b0 | 3bcc6bc5 |             | 12:08:13 |      8 |        31.21 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | 11c92a206eaa | 851f07b7 | *leader*    | 12:05:48 |      8 |        31.21 |               |              |
| App   | cfxdimensions-app-notification-service | True        | 42d3c8c4861c | 891ab559 |             | 12:08:31 |      8 |        31.21 |               |              |
| App   | cfxdimensions-app-resource-manager     | True        | a35dd8127434 | 29b57c51 |             | 12:16:08 |      8 |        31.21 |               |              |
+-------+----------------------------------------+-------------+--------------+----------+-------------+----------+--------+--------------+---------------+--------------+
Run the below command to check that all RDA worker services have an ok status and do not throw any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                               |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------|
| rda_app   | alert-ingester                         | 13975b9efe7d | dd32fdef |             | service-status                                      | ok       |                                                       |
| rda_app   | alert-ingester                         | 13975b9efe7d | dd32fdef |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | alert-ingester                         | 13975b9efe7d | dd32fdef |             | service-dependency:configuration-service            | ok       | 1 pod(s) found for configuration-service              |
| rda_app   | alert-ingester                         | 13975b9efe7d | dd32fdef |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | alert-ingester                         | 13975b9efe7d | dd32fdef |             | kafka-connectivity                                  | ok       | Cluster=oDO7X5AZTh-78HgTt0WbrA, Broker=1, Brokers=[1] |
| rda_app   | alert-processor                        | ab24dcbd6e3a | a980d44e |             | service-status                                      | ok       |                                                       |
| rda_app   | alert-processor                        | ab24dcbd6e3a | a980d44e |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | alert-processor                        | ab24dcbd6e3a | a980d44e |             | service-dependency:cfx-app-controller               | ok       | 1 pod(s) found for cfx-app-controller                 |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+

Note

Run the rdaf prune_images command to clean up old docker images.

2.3.4. Post Upgrade Steps

1. Download the script from the below path to migrate the UI-Icon URL from private to public.

Tip

Running this script is an optional step to perform only if (1) white labeling customization was done on the Login page with an uploaded image before the version upgrade, and (2) the custom image is no longer showing up on the Login page after the upgrade.

wget https://macaw-amer.s3.amazonaws.com/releases/RDA/3.2.2/iconlib_migration_script.py
  • Copy the above script to the rda_identity platform service container. Run the below command to get the container-id for rda_identity and the host IP on which it is running.

rdaf platform status
+--------------------------+--------------+------------+--------------+-------+
| Name                     | Host         | Status     | Container Id | Tag   |
+--------------------------+--------------+------------+--------------+-------+
| rda_api_server           | 192.168.107.40 | Up 7 hours | 6540e670fdb9 | 3.2.2 |
| rda_registry             | 192.168.107.40 | Up 7 hours | 98bead8d0599 | 3.2.2 |
....
| rda_identity             | 192.168.107.40 | Up 7 hours | 67940390d61f | 3.2.2 |
| rda_fsm                  | 192.168.107.40 | Up 7 hours | a96870b5f32f | 3.2.2 |
| cfx-rda-access-manager   | 192.168.107.40 | Up 7 hours | 33362030fe52 | 3.2.2 |
| cfx-rda-resource-manager | 192.168.107.40 | Up 7 hours | 54a082b754be | 3.2.2 |
| cfx-rda-user-preferences | 192.168.107.40 | Up 7 hours | 829f95d98741 | 3.2.2 |
+--------------------------+--------------+------------+--------------+-------+

  • Log in as rdauser (using SSH) to the host on which the rda_identity service is running, and run the below command to copy the downloaded script into the container.
docker cp /home/rdauser/iconlib_migration_script.py <rda_identity_container_id>:/tmp
  • Run the below command to switch into rda_identity service's container shell.
docker exec -it <rda_identity_container_id> bash
  • Execute the below command to migrate the customer branding (white labelling) changes.
python /tmp/iconlib_migration_script.py

2. Deploy the latest l1&l2 bundles. Go to Configuration --> RDA Administration --> Bundles --> select oia_l1_l2_bundle and click on the deploy action at the row level.

3. Enable ML experiments manually if any experiments are configured (Organization --> Configuration --> Machine Learning)

4. FSM Installation Steps ( Applicable only for Remedy ITSM ticketing deployment )

a) Update the Team configuration that was created for ITSM ticketing (Team with Source 'Others'). Include the following content in the JSON editor of the Team's configuration. Adjust or add alert sources and execution delay as necessary.

[
  {
    "alert_source": "SNMP",
    "execution_delay": 900,
    "auto_share": {
      "create": true,
      "update": true,
      "close": true,
      "resolved": true,
      "cancel": true,
      "alert_count_changes": true
    }
  },
  {
    "alert_source": "Syslog",
    "execution_delay": 900,
    "auto_share": {
      "create": true,
      "update": true,
      "close": true,
      "resolved": true,
      "cancel": true,
      "alert_count_changes": true
    }
  }
]

b) Download and update the latest FSM model under Configuration -> RDA Administration -> FSM Models

Important

Take a backup of the existing model before the update.

https://macaw-amer.s3.amazonaws.com/releases/RDA/3.2.2/oia_ticketing_with_soothing_interval.yml
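
The model file can be downloaded to a local machine before uploading it through the UI, for example:

wget https://macaw-amer.s3.amazonaws.com/releases/RDA/3.2.2/oia_ticketing_with_soothing_interval.yml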

c) Add the formatting templates under Configuration -> RDA Administration -> Formatting Templates

  • snow-notes-template
{% for r in rows %}
    <b>Message</b> : {{r.a_message}} <br>
    <b>RaisedAt</b> : {{r.a_raised_ts}} <br>
    <b>UpdatedAt</b> : {{r.a_updated_ts}} <br>
    <b>Status</b> : {{r.a_status}} <br>
    <b>AssetName</b> : {{r.a_asset_name}} <br>
    <b>AssetType</b> : {{r.a_asset_type}} <br>
    <b>RepeatCount</b> : {{r.a_repeat_count}} <br>
    <b>Action</b> : {{r.action_name}} <br>
    <br><br>
{%endfor%}
  • snow-description-template
Description : {{i_description}}

d) Deploy FSM bundles

fsm_events_kafka_publisher_bundles, oia_fsm_aots_ticketing_bundle, oia_fsm_common_ticketing_bundles

e) Create 'fsm-debug-outbound-ticketing' and 'aots_ticket_notifications' PStreams from the UI, with the below attributes, if they do not already exist.

{
    "case_insensitive": true,
    "retention_days": 7
}

f) Enable Service Blueprints - Read Alert Stream, Read Incident Stream, Create Ticket, Update Ticket, Resolve Ticket, Read AOTS Inbound Notifications

2.4. Upgrade from 7.2.1.x to 7.2.2.1

2.4.1. Pre-requisites

Below are the pre-requisites which need to be in place before upgrading the OIA (AIOps) application services.

  • RDAF Deployment CLI Version: 1.1.8

  • RDAF Infrastructure Services Tag Version: 1.0.2,1.0.2.1(nats)

  • RDAF Core Platform & Worker Services Tag Version: 3.2.1.3

  • RDAF Client (RDAC) Tag Version: 3.2.1 / 3.2.1.x

  • OIA Services Tag Version: 7.2.1.1/7.2.1.5/7.2.1.6

  • CloudFabrix recommends taking VMware VM snapshots where RDA Fabric platform/applications are deployed

Warning

Make sure all of the above pre-requisites are met before proceeding with the upgrade process.

Warning

Upgrading RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.

Important

Please make sure a full backup of the RDAF platform system is completed before performing the upgrade.

Please take a full backup of the RDA Fabric VMs using 3rd-party backup software, or run the below backup command to take a backup of the application data.

rdafk8s backup --dest-dir <backup-dir>
Note: Please make sure the shared backup-dir is NFS mounted across all RDA Fabric Virtual Machines.
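
For example, assuming a shared NFS mount at /mnt/rdaf-backup (hypothetical path) is available on all RDA Fabric VMs, the backup command would be run as below.

rdafk8s backup --dest-dir /mnt/rdaf-backup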

Run the below K8s commands and make sure the Kubernetes PODs are NOT in restarting mode (applicable only to Kubernetes environments).

kubectl get pods -n rda-fabric -l app_category=rdaf-infra
kubectl get pods -n rda-fabric -l app_category=rdaf-platform
kubectl get pods -n rda-fabric -l app_component=rda-worker 
kubectl get pods -n rda-fabric -l app_name=oia 
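
A minimal sketch to spot any POD with a non-zero restart count across the rda-fabric namespace (assumes the default kubectl column layout, where RESTARTS is the 4th column):

kubectl get pods -n rda-fabric --no-headers | awk '$4 != "0"'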

RDAF Deployment CLI Upgrade:

Please follow the below given steps.

Note

Upgrade RDAF Deployment CLI on both on-premise docker registry VM and RDAF Platform's management VM if provisioned separately.

  • Run the below command to verify that the current version of the RDAF CLI is 1.1.8.
rdafk8s -v
  • Download the RDAF Deployment CLI's newer version 1.1.9.1 bundle
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9/rdafcli-1.1.9.1.tar.gz
  • Upgrade the rdaf CLI to version 1.1.9.1
pip install --user rdafcli-1.1.9.1.tar.gz
  • Verify the installed rdaf CLI version is upgraded to 1.1.9.1
    rdafk8s -v
    rdaf -v
    
  • Download the RDAF Deployment CLI's newer version 1.1.9.1 bundle and copy it to RDAF management VM on which rdaf deployment CLI was installed.
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9/offline-rhel-1.1.9.1.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-rhel-1.1.9.1.tar.gz
  • Change the directory to the extracted directory
cd offline-rhel-1.1.9.1
  • Upgrade the rdaf CLI to version 1.1.9.1
pip install --user rdafcli-1.1.9.1.tar.gz -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version
  • Download the RDAF Deployment CLI's newer version 1.1.9.1 bundle (Ubuntu) and copy it to the RDAF management VM on which the rdaf deployment CLI was installed.
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9/offline-ubuntu-1.1.9.1.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.1.9.1.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.1.9.1
  • Upgrade the rdaf CLI to version 1.1.9.1
pip install --user rdafcli-1.1.9.1.tar.gz -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version

2.4.2. Download the new Docker Images

Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.

rdaf registry fetch --tag 3.2.2.1,7.2.2.1

Run the below command to verify above mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.

rdaf registry list-tags 

Please make sure 3.2.2.1 image tag is downloaded for the below RDAF Platform services.

  • rda-client-api-server
  • rda-registry
  • rda-rda-scheduler
  • rda-collector
  • rda-stack-mgr
  • rda-identity
  • rda-fsm
  • rda-access-manager
  • rda-resource-manager
  • rda-user-preferences
  • onprem-portal
  • onprem-portal-nginx
  • rda-worker-all
  • onprem-portal-dbinit
  • cfxdx-nb-nginx-all
  • rda-event-gateway
  • rdac
  • rdac-full

Please make sure 7.2.2.1 image tag is downloaded for the below RDAF OIA Application services.

  • rda-app-controller
  • rda-alert-processor
  • rda-file-browser
  • rda-smtp-server
  • rda-ingestion-tracker
  • rda-reports-registry
  • rda-ml-config
  • rda-event-consumer
  • rda-webhook-server
  • rda-irm-service
  • rda-alert-ingester
  • rda-collaboration
  • rda-notification-service
  • rda-configuration-service

Downloaded Docker images are stored under the below path.

/opt/rdaf/data/docker/registry/v2

Run the below command to check the filesystem's disk usage on which docker images are stored.

df -h /opt

Optionally, if required, older image tags which are no longer used can be deleted to free up disk space using the below command.

rdaf registry delete-images --tag <tag1,tag2>

2.4.3. Upgrade Services

2.4.3.1 Upgrade RDAF Infra Services

Download the below upgrade script and copy it to RDAF management VM on which rdaf deployment CLI was installed.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9/rdaf_k8s_upgrade_118_119_1.py

Please run the downloaded upgrade script. It configures and applies the below changes.

  • Creates a new Kafka user specifically for Kafka topics that need to be exposed to external systems for publishing data such as events, alerts, or notifications.
  • Updates the /opt/rdaf/config/network_config/config.json file with newly created Kafka user's credentials.
  • Creates and applies lifecycle management policy for Opensearch's default security audit logs index to purge the older data. It is configured to purge the data that is older than 15 days.
  • Updates the /opt/rdaf/deployment-scripts/values.yaml file to add support for the new alert processor companion service. It also updates the rda-worker service configuration to attach a new persistent volume. The persistent volume is created from the local host's directory path /opt/rdaf/config/worker/rda_packages on the host on which the rda-worker service is running.
python rdaf_k8s_upgrade_118_119_1.py
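
As a spot-check that the script updated the deployment values, the new worker persistent-volume path can be searched for (a hedged sketch; the exact key names written by the script may differ).

grep -n "rda_packages" /opt/rdaf/deployment-scripts/values.yaml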

Important

Please make sure the above upgrade script is executed before moving to the next step.

  • Update kafka-values.yaml with below parameters.

Tip

  • The upgrade script generates a kafka-values.yaml.latest file in the /opt/rdaf/deployment-scripts/ directory which will have the updated configuration.
  • Please take a backup of the kafka-values.yaml file before making changes.
    cp /opt/rdaf/deployment-scripts/kafka-values.yaml /opt/rdaf/deployment-scripts/kafka-values.yaml.bak
    
  • Please skip the changes if the current kafka-values.yaml file already has the below mentioned parameters.

Edit kafka-values.yaml file.

vi /opt/rdaf/deployment-scripts/kafka-values.yaml

Find the below parameter and delete it if it exists.

autoCreateTopicsEnable: false

Add the below highlighted parameters. Please skip if these are already configured.

global:
  imagePullSecrets:
  - cfxregistry-cred
image:
  registry: 192.168.10.10:5000
  repository: rda-platform-kafka
  tag: 1.0.2
  pullPolicy: Always
heapOpts: -Xmx2048m -Xms2048m
defaultReplicationFactor: 3
offsetsTopicReplicationFactor: 3
transactionStateLogReplicationFactor: 3
transactionStateLogMinIsr: 2
maxMessageBytes: '8399093'
numPartitions: 15
externalAccess:
  enabled: true
  autoDiscovery:
    enabled: true
  service:
    type: NodePort
    nodePorts:
    - 31252
    - 31533
    - 31964
serviceAccount:
  create: true
rbac:
  create: true
authorizerClassName: kafka.security.authorizer.AclAuthorizer
logRetentionHours: 24
allowEveryoneIfNoAclFound: true
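
Before applying the changes, the edited file can be compared against the script-generated reference file and the backup taken earlier, for example:

diff /opt/rdaf/deployment-scripts/kafka-values.yaml /opt/rdaf/deployment-scripts/kafka-values.yaml.latest
diff /opt/rdaf/deployment-scripts/kafka-values.yaml /opt/rdaf/deployment-scripts/kafka-values.yaml.bak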

Apply above configuration changes to kafka infra service.

rdafk8s infra upgrade --tag 1.0.2 --service kafka
After upgrading the RDAF Kafka service using the above command, run the below command to verify its running status.

kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep -i kafka
  • Please wait till all of the Kafka service pods are in Running state.
rdafk8s infra status
  • Please make sure all infra services are in Running state before moving to next section.
kubectl get pods -n rda-fabric -l app_category=rdaf-infra
  • Additionally, please run the below command to make sure there are no errors with RDA Fabric services.
rdac healthcheck
2.4.3.2 Upgrade RDAF Platform Services

Step-1: Run the below command to initiate upgrading RDAF Platform services.

rdafk8s platform upgrade --tag 3.2.2.1

As the upgrade procedure is a non-disruptive upgrade, it puts the currently running PODs into Terminating state and newer version PODs into Pending state.

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each Platform service is in Terminating state. (Note: Please wait if a POD is in ContainerCreating state until it is transitioned into Terminating state.)

kubectl get pods -n rda-fabric -l app_category=rdaf-platform

Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It lists the POD Ids of all platform services along with the rdac maintenance command that is required to put them into maintenance mode.

python maint_command.py

Note

If maint_command.py script doesn't exist on RDAF deployment CLI VM, it can be downloaded using the below command.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.6/maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-platform-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.

rdac pods --show_maintenance | grep False

Warning

Wait for 120 seconds before executing Step-6.

Step-6: Run the below command to delete the Terminating RDAF platform service PODs

for i in `kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done

Note

Repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Platform service PODs.

Please wait till all of the new platform service PODs are in Running state and run the below command to verify their status and make sure all of them are running with 3.2.2.1 version.

rdafk8s platform status
+----------------------+----------------+-----------------+--------------+-------------+
| Name                 | Host           | Status          | Container Id | Tag         |
+----------------------+----------------+-----------------+--------------+-------------+
| rda-api-server       | 192.168.131.45 | Up 19 Hours ago | 4d5adbbf954b | 3.2.2.1     |
| rda-api-server       | 192.168.131.44 | Up 19 Hours ago | 2c58bccaf38d | 3.2.2.1     |
| rda-registry         | 192.168.131.44 | Up 20 Hours ago | 408a4ddcc685 | 3.2.2.1     |
| rda-registry         | 192.168.131.45 | Up 20 Hours ago | 4f01fc820585 | 3.2.2.1     |
| rda-identity         | 192.168.131.44 | Up 20 Hours ago | bdd1e91f86ec | 3.2.2.1     |
| rda-identity         | 192.168.131.45 | Up 20 Hours ago | e63af9c6e9d9 | 3.2.2.1     |
| rda-fsm              | 192.168.131.45 | Up 20 Hours ago | 3ec246cf7edd | 3.2.2.1     |
+----------------------+----------------+-----------------+--------------+-------------+
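
As an optional cross-check, the container image tags of the platform PODs can also be listed directly from Kubernetes; every image should end with the upgraded 3.2.2.1 tag. The jsonpath expression below is a generic kubectl pattern, not an RDAF-specific command.

# Optional: print each platform POD name with its container image(s); images should carry the 3.2.2.1 tag.
kubectl get pods -n rda-fabric -l app_category=rdaf-platform -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'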

Run the below command to check that one of the rda-scheduler services is elected as a leader (shown under the Site column).

rdac pods
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age      |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------|
| Infra | api-server                             | True        | rda-api-server | 35a17877 |             | 20:15:37 |      8 |        31.33 |               |              |
| Infra | api-server                             | True        | rda-api-server | 8f678e25 |             | 20:14:39 |      8 |        31.33 |               |              |
| Infra | collector                              | True        | rda-collector- | 17ce190d |             | 20:47:41 |      8 |        31.33 |               |              |
| Infra | collector                              | True        | rda-collector- | 6b91bf23 |             | 20:47:22 |      8 |        31.33 |               |              |
| Infra | registry                               | True        | rda-registry-5 | 4ee8ef7d |             | 20:48:20 |      8 |        31.33 |               |              |
| Infra | registry                               | True        | rda-registry-5 | 895b7f5c |             | 20:47:39 |      8 |        31.33 |               |              |
| Infra | scheduler                              | True        | rda-scheduler- | ab79ba8d |             | 20:47:43 |      8 |        31.33 |               |              |
| Infra | scheduler                              | True        | rda-scheduler- | f2cefc92 | *leader*    | 20:47:23 |      8 |        31.33 |               |              |
| Infra | worker                                 | True        | rda-worker-df5 | e2174794 | rda-site-01 | 20:28:50 |      8 |        31.33 | 1             | 97           |
| Infra | worker                                 | True        | rda-worker-df5 | 6debca1d | rda-site-01 | 20:26:08 |      8 |        31.33 | 2             | 91           |
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+

Run the below command to check that all services have an ok status and do not throw any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | rda-alert-in | 1afb8c8d |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 1afb8c8d |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 1afb8c8d |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | rda-alert-in | 1afb8c8d |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 1afb8c8d |             | kafka-connectivity                                  | ok       | Cluster=nzyeX9qkR-ChWXC0fRvSyQ, Broker=0, Brokers=[0, 2, 1] |
| rda_app   | alert-ingester                         | rda-alert-in | 5751f199 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 5751f199 |             | minio-connectivity                                  | ok       |                                                             |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.4.3.3 Upgrade RDAC cli

Run the below command to upgrade the rdac CLI.

rdafk8s rdac_cli upgrade --tag 3.2.2.1
2.4.3.4 Upgrade RDA Worker Services

Step-1: Please run the below command to initiate upgrading the RDA Worker service PODs.

Tip

If the RDA worker is deployed in an HTTP proxy environment, add the required environment variables for the HTTP proxy settings in /opt/rdaf/deployment-scripts/values.yaml under the rda_worker section. Below is a sample HTTP proxy configuration for the worker service.

rda_worker:
    mem_limit: 8G
    memswap_limit: 8G
    privileged: false
    environment:
      RDA_ENABLE_TRACES: 'no'
      RDA_SELF_HEALTH_RESTART_AFTER_FAILURES: 3
      http_proxy: "http://user:password@192.168.122.107:3128"
      https_proxy: "http://user:password@192.168.122.107:3128"
      HTTP_PROXY: "http://user:password@192.168.122.107:3128"
      HTTPS_PROXY: "http://user:password@192.168.122.107:3128"
rdafk8s worker upgrade --tag 3.2.2.1
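
Optionally, the rollout can be observed live using kubectl's watch flag; new worker PODs will appear in Pending/ContainerCreating state while the existing ones move to Terminating. Press Ctrl+C to stop watching.

# Optional: watch the worker PODs while the upgrade proceeds.
kubectl get pods -n rda-fabric -l app_component=rda-worker -w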

Step-2: Run the below command to check the status of the existing and newer worker PODs and make sure at least one instance of each RDA Worker service POD is in Terminating state.

(Note: If a POD is in ContainerCreating state, please wait until it transitions into Terminating state.)

kubectl get pods -n rda-fabric -l app_component=rda-worker

Step-3: Run the below command to put all Terminating RDAF worker service PODs into maintenance mode. It will list all of the POD IDs of the RDA worker services along with the rdac maintenance command required to put them into maintenance mode.

python maint_command.py

Step-4: Copy & paste the rdac maintenance command as shown below.

rdac maintenance start --ids <comma-separated-list-of-worker-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the RDAF worker services.

rdac pods --show_maintenance | grep False

Warning

Wait for 120 seconds before executing Step-6.

Step-6: Run the below command to delete the Terminating RDAF worker service PODs.

for i in `kubectl get pods -n rda-fabric -l app_component=rda-worker | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done

Note

Repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Worker service PODs.

Please wait till all the new worker service pods are in Running state.

Step-7: Please wait for 120 seconds to let the newer version of RDA Worker service PODs join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service PODs.

rdafk8s worker status
+------------+----------------+----------------+--------------+-------------+
| Name       | Host           | Status         | Container Id | Tag         |
+------------+----------------+----------------+--------------+-------------+
| rda-worker | 192.168.131.44 | Up 6 Hours ago | eb679ed8a6c6 | 3.2.2.1     |
| rda-worker | 192.168.131.45 | Up 6 Hours ago | a3356b168c50 | 3.2.2.1     |
+------------+----------------+----------------+--------------+-------------+
rdac pods | grep rda-worker

Step-8: Run the below command to check that all RDA Worker services have an ok status and do not throw any failure messages.

rdac healthcheck
2.4.3.5 Upgrade OIA Application Services

Step-1: Run the below command to initiate upgrading the RDAF OIA Application services.

rdafk8s app upgrade OIA --tag 7.2.2.1

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each OIA application service is in Terminating state.

(Note: If a POD is in ContainerCreating state, please wait until it transitions into Terminating state.)

kubectl get pods -n rda-fabric -l app_name=oia

Step-3: Run the below command to put all Terminating OIA application service PODs into maintenance mode. It will list all of the POD IDs of the OIA application services along with the rdac maintenance command required to put them into maintenance mode.

python maint_command.py

Step-4: Copy & paste the rdac maintenance command as shown below.

rdac maintenance start --ids <comma-separated-list-of-oia-app-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the OIA application services.

rdac pods --show_maintenance | grep False

Warning

Wait for 120 seconds before executing Step-6.

Step-6: Run the below command to delete the Terminating OIA application service PODs.

for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
kubectl get pods -n rda-fabric -l app_name=oia

Note

Repeat the above steps from Step-2 to Step-6 for the rest of the OIA application service PODs.

Please wait till all of the new OIA application service PODs are in Running state, then run the below command to verify their status and make sure they are running with the 7.2.2.1 version.

rdafk8s app status
+-------------------------------+----------------+-----------------+--------------+-----------------+
| Name                          | Host           | Status          | Container Id | Tag             |
+-------------------------------+--------------+-----------------+--------------+-------------------+
| rda-alert-ingester            | 192.168.131.50 | Up 1 Days ago   | a400c11be238 | 7.2.2.1     |
| rda-alert-ingester            | 192.168.131.49 | Up 1 Days ago   | 5187d5a093a5 | 7.2.2.1     |
| rda-alert-processor           | 192.168.131.46 | Up 1 Days ago   | 34901aba5e7d | 7.2.2.1     |
| rda-alert-processor           | 192.168.131.47 | Up 1 Days ago   | e6fe0aa7ffe4 | 7.2.2.1     |
| rda-alert-processor-companion | 192.168.131.50 | Up 1 Days ago   | 8e3cc2f3b252 | 7.2.2.1     |
| rda-alert-processor-companion | 192.168.131.49 | Up 1 Days ago   | 4237fb52031c | 7.2.2.1     |
| rda-app-controller            | 192.168.131.47 | Up 1 Days ago   | fbe360d13fa3 | 7.2.2.1     |
| rda-app-controller            | 192.168.131.46 | Up 1 Days ago   | 8346f5c69e7b | 7.2.2.1     |
+-------------------------------+----------------+-----------------+--------------+-----------------+
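
The same optional kubectl cross-check used earlier for the platform services can be applied here with the OIA label selector; every listed image should end with the 7.2.2.1 tag.

# Optional: print each OIA application POD name with its container image(s); images should carry the 7.2.2.1 tag.
kubectl get pods -n rda-fabric -l app_name=oia -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'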

Step-7: Run the below command to verify all OIA application services are up and running. Please wait till one of the cfxdimensions-app-irm_service instances shows leader status under the Site column.

rdac pods

+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age      |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------|
| App   | cfxdimensions-app-collaboration        | True        | rda-collaborat | ba007878 |             | 22:57:58 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | rda-file-brows | bf349af7 |             | 23:00:54 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | rda-file-brows | 46c7c2dc |             | 22:52:17 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | rda-irm-servic | 34698062 |             | 23:00:23 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | rda-irm-servic | b824b35b | *leader*    | 22:50:33 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | rda-notificati | 73d2c7f9 |             | 23:01:23 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | rda-notificati | bac009ba |             | 22:59:05 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-resource-manager     | True        | rda-resource-m | 3e164b71 |             | 23:25:24 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-resource-manager     | True        | rda-resource-m | dba599c6 |             | 23:25:00 |      8 |        31.33 |               |              |
| App   | configuration-service                  | True        | rda-configurat | dd7ec9d9 |             | 5:46:22  |      8 |        31.33 |               |              |
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+

Run the below command to check that all OIA application services have an ok status and do not throw any failure messages.

rdac healthcheck

2.4.4 Post Installation Steps

  • Deploy the latest l1 & l2 bundles. Go to Configuration --> RDA Administration --> Bundles --> select oia_l1_l2_bundle and click on the deploy action.

  • Download the script from the below path to migrate the UI-Icon URL from private to public.

Tip

Running this script is an optional step; perform it only if (1) white labeling customization was done on the Login page with an uploaded image before the version upgrade, and (2) the custom image is no longer showing up on the Login page after the upgrade.

wget https://macaw-amer.s3.amazonaws.com/releases/RDA/3.2.2/iconlib_migration_script.py
  • Copy the above script to the rda_identity platform service container. Run the below command to get the container ID of rda_identity and the host IP on which it is running.
rdafk8s platform status
rdaf platform status
+--------------------------+----------------+------------+--------------+---------+
| Name                     | Host           | Status     | Container Id | Tag     |
+--------------------------+----------------+------------+--------------+---------+
| rda_api_server           | 192.168.107.40 | Up 7 hours | 6540e670fdb9 | 3.2.2.1 |
| rda_registry             | 192.168.107.40 | Up 7 hours | 98bead8d0599 | 3.2.2.1 |
....
| rda_identity             | 192.168.107.40 | Up 7 hours | 67940390d61f | 3.2.2.1 |
| rda_fsm                  | 192.168.107.40 | Up 7 hours | a96870b5f32f | 3.2.2.1 |
| cfx-rda-access-manager   | 192.168.107.40 | Up 7 hours | 33362030fe52 | 3.2.2.1 |
| cfx-rda-resource-manager | 192.168.107.40 | Up 7 hours | 54a082b754be | 3.2.2.1 |
| cfx-rda-user-preferences | 192.168.107.40 | Up 7 hours | 829f95d98741 | 3.2.2.1 |
+--------------------------+----------------+------------+--------------+---------+
  • Log in as rdauser (via SSH) to the host on which the rda_identity service is running and run the below command to copy the downloaded script.
docker cp /home/rdauser/iconlib_migration_script.py <rda_identity_container_id>:/tmp
  • Run the below command to switch into the rda_identity service's container shell.
docker exec -it <rda_identity_container_id> bash
  • Execute the below command to migrate the customer branding (white labelling) changes.
python /tmp/iconlib_migration_script.py
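The copy-and-run steps above can also be combined into a short shell sequence. The sketch below assumes the container name contains the string rda_identity (verify with docker ps and adjust the filter if your container naming differs); it runs the script through a single docker exec call instead of an interactive shell.

# Convenience sketch: locate the rda_identity container, copy the script into it, and run it.
# Assumes the container name matches 'rda_identity'; confirm with 'docker ps' first.
IDENTITY_CID=$(docker ps --filter "name=rda_identity" --format '{{.ID}}' | head -n 1)
docker cp /home/rdauser/iconlib_migration_script.py ${IDENTITY_CID}:/tmp
docker exec -it ${IDENTITY_CID} python /tmp/iconlib_migration_script.py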
  • In this new version (7.2.2.1), the suppression policy added support to read data from a pstream to suppress alerts. As a pre-requisite for this feature to work, the pstream that is going to be used in a suppression policy should be configured with attr_name and its value, using which it can filter the alerts to apply the suppression policy. Additionally, the attributes start_time_utc and end_time_utc should be in ISO datetime format.
{
  "attr_name": "ci_name"
}
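As an illustration only (confirm the exact column names against your pstream definition), a row in such a suppression pstream would then carry the ci_name value to match against incoming alerts, plus the suppression window expressed in ISO datetime format; the values below are hypothetical.

{
  "ci_name": "router-nyc-01",
  "start_time_utc": "2024-05-01T00:00:00Z",
  "end_time_utc": "2024-05-01T04:00:00Z"
}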
  • This new version also added a new feature to enrich the incoming alerts using either a dataset, a pstream, or both, within each alert's source mapper configuration. Below is a sample configuration for reference on how to use the dataset_enrich and stream_enrich functions within the alert mapper.

Dataset based enrichment:

  • name: Dataset name
  • condition: A CFXQL-based condition, which can combine one or more conditions with AND and OR between them. Each condition is evaluated in the specified order, and the enrichment value(s) are picked from whichever condition matches.
  • enriched_columns: Specify one or more attributes to be selected as enriched attributes when the above condition matches. When no attribute is specified, all of the available attributes are picked.
{
  "func": {
    "dataset_enrich": {
      "name": "nagios-host-group-members",
      "condition": "host_name is '$assetName'",
      "enriched_columns": "group_id,hostgroup_name"
    }
  }
}

Pstream based enrichment:

  • name: Pstream name
  • condition: A CFXQL-based condition, which can combine one or more conditions with AND and OR between them. Each condition is evaluated in the specified order, and the enrichment value(s) are picked from whichever condition matches.
  • enriched_columns: Specify one or more attributes to be selected as enriched attributes when the above condition matches. When no attribute is specified, all of the available attributes are picked.
{
  "func": {
    "stream_enrich": {
      "name": "nagios-host-group-members",
      "condition": "host_name is '$assetName'",
      "enriched_columns": {
        "group_id": "stream_id",
        "hostgroup_name": "stream_hostgroup"
      }
    }
  }
}