
Installing OIA (Operations Intelligence & Analytics)

This document provides instructions for a fresh installation of OIA (Operations Intelligence & Analytics), which is also referred to as AIOps.

1. Setup & Install

cfxOIA is an application that is installed on top of the RDA Fabric platform.

1.1 Tag Version: 7.4.1

Pre-requisites:

Below are the pre-requisites which need to be in place before installing the OIA (AIOps) application services.

RDAF Deployment CLI Version: 1.2.1

RDAF Infrastructure Services Tag Version: 1.0.3

RDAF Core Platform & Worker Services Tag Version: 3.4.1

RDAF Client (RDAC) Tag Version: 3.4.1

Warning

Please complete all of the above pre-requisites before installing the OIA (AIOps) application services.

Log in as the rdauser user to the on-premise docker registry VM or the RDA Fabric Platform VM on which the RDAF deployment CLI was installed, using an SSH client (for example, PuTTY).

Before installing the OIA (AIOps) application services, please run the below command to update the HAProxy (load balancer) configuration. Use the rdafk8s command for Kubernetes deployments or the rdaf command for non-Kubernetes deployments.

rdafk8s app update-config OIA
rdaf app update-config OIA

Run the below rdaf or rdafk8s command to make sure all of the RDAF infrastructure services are up and running.

rdafk8s infra status
rdaf infra status

Run the below rdac pods command to make sure all of the RDAF core platform and worker services are up and running.

rdac pods
+-------+----------------------------------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Host           | ID       | Site        | Age      |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------|
| App   | asset-dependency                       | rda-asset-depe | 090669bf |             | 20:18:21 |      8 |        47.03 |               |              |
| App   | authenticator                          | rda-identity-5 | 57905b20 |             | 20:19:11 |      8 |        47.03 |               |              |
| App   | cfxdimensions-app-access-manager       | rda-access-man | 6338ad29 |             | 20:18:44 |      8 |        47.03 |               |              |
| App   | cfxdimensions-app-notification-service | rda-notificati | bb9e3e7b |             | 20:09:52 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-resource-manager     | rda-resource-m | e5a28e16 |             | 20:18:34 |      8 |        47.03 |               |              |
| App   | user-preferences                       | rda-user-prefe | fd09d3ba |             | 20:18:08 |      8 |        47.03 |               |              |
| Infra | api-server                             | rda-api-server | b1b910d9 |             | 20:19:22 |      8 |        47.03 |               |              |
| Infra | collector                              | rda-collector- | 99553e51 |             | 20:18:17 |      8 |        47.03 |               |              |
| Infra | registry                               | rda-registry-7 | a46cd712 |             | 20:19:15 |      8 |        47.03 |               |              |
| Infra | scheduler                              | rda-scheduler- | d5537051 | *leader*    | 20:18:26 |      8 |        47.03 |               |              |
| Infra | worker                                 | rda-worker-54d | 1f769792 | rda-site-01 | 20:06:48 |      4 |        15.6  | 0             | 0            |
+-------+----------------------------------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+

Run the below rdac healthcheck command to check the health status of all of the RDAF core platform and worker services.

All of the dependency checks should show as ok under the Status column.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                               |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------|
| rda_infra | api-server                             | rda-api-serv | b1b910d9 |             | service-status                                      | ok       |                                                       |
| rda_infra | api-server                             | rda-api-serv | b1b910d9 |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | asset-dependency                       | rda-asset-de | 090669bf |             | service-status                                      | ok       |                                                       |
| rda_app   | asset-dependency                       | rda-asset-de | 090669bf |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | authenticator                          | rda-identity | 57905b20 |             | service-status                                      | ok       |                                                       |
| rda_app   | authenticator                          | rda-identity | 57905b20 |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | authenticator                          | rda-identity | 57905b20 |             | DB-connectivity                                     | ok       |                                                       |
| rda_app   | cfxdimensions-app-access-manager       | rda-access-m | 6338ad29 |             | service-status                                      | ok       |                                                       |
| rda_app   | cfxdimensions-app-access-manager       | rda-access-m | 6338ad29 |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | cfxdimensions-app-access-manager       | rda-access-m | 6338ad29 |             | service-dependency:registry                         | ok       | 1 pod(s) found for registry                           |
| rda_app   | cfxdimensions-app-access-manager       | rda-access-m | 6338ad29 |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | cfxdimensions-app-access-manager       | rda-access-m | 6338ad29 |             | DB-connectivity                                     | ok       |                                                       |
| rda_app   | cfxdimensions-app-notification-service | rda-notifica | bb9e3e7b |             | service-status                                      | ok       |                                                       |
| rda_app   | cfxdimensions-app-notification-service | rda-notifica | bb9e3e7b |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | cfxdimensions-app-notification-service | rda-notifica | bb9e3e7b |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | cfxdimensions-app-notification-service | rda-notifica | bb9e3e7b |             | DB-connectivity                                     | ok       |                                                       |
| rda_app   | cfxdimensions-app-resource-manager     | rda-resource | e5a28e16 |             | service-status                                      | ok       |                                                       |
| rda_app   | cfxdimensions-app-resource-manager     | rda-resource | e5a28e16 |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | cfxdimensions-app-resource-manager     | rda-resource | e5a28e16 |             | service-dependency:registry                         | ok       | 1 pod(s) found for registry                           |
| rda_app   | cfxdimensions-app-resource-manager     | rda-resource | e5a28e16 |             | service-dependency:cfxdimensions-app-access-manager | ok       | 1 pod(s) found for cfxdimensions-app-access-manager   |
| rda_app   | cfxdimensions-app-resource-manager     | rda-resource | e5a28e16 |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | cfxdimensions-app-resource-manager     | rda-resource | e5a28e16 |             | DB-connectivity                                     | ok       |                                                       |
| rda_infra | collector                              | rda-collecto | 99553e51 |             | service-status                                      | ok       |                                                       |
| rda_infra | collector                              | rda-collecto | 99553e51 |             | minio-connectivity                                  | ok       |                                                       |
| rda_infra | collector                              | rda-collecto | 99553e51 |             | opensearch-connectivity:default                     | ok       |                                                       |
| rda_infra | registry                               | rda-registry | a46cd712 |             | service-status                                      | ok       |                                                       |
| rda_infra | registry                               | rda-registry | a46cd712 |             | minio-connectivity                                  | ok       |                                                       |
| rda_infra | scheduler                              | rda-schedule | d5537051 |             | service-status                                      | ok       |                                                       |
| rda_infra | scheduler                              | rda-schedule | d5537051 |             | minio-connectivity                                  | ok       |                                                       |
| rda_infra | scheduler                              | rda-schedule | d5537051 |             | DB-connectivity                                     | ok       |                                                       |
| rda_app   | user-preferences                       | rda-user-pre | fd09d3ba |             | service-status                                      | ok       |                                                       |
| rda_app   | user-preferences                       | rda-user-pre | fd09d3ba |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | user-preferences                       | rda-user-pre | fd09d3ba |             | service-dependency:registry                         | ok       | 1 pod(s) found for registry                           |
| rda_app   | user-preferences                       | rda-user-pre | fd09d3ba |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | user-preferences                       | rda-user-pre | fd09d3ba |             | DB-connectivity                                     | ok       |                                                       |
| rda_infra | worker                                 | rda-worker-5 | 1f769792 | rda-site-01 | service-status                                      | ok       |                                                       |
| rda_infra | worker                                 | rda-worker-5 | 1f769792 | rda-site-01 | minio-connectivity                                  | ok       |                                                       |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+

Installing OIA (AIOps) Application Services:

Set the RDA Fabric platform's application configuration to aiops using the below command.

rdac rda-app-configure --type aiops

Note

Other supported options for the above command are listed below:

  • rda: Choose this option when only the RDA Fabric platform needs to be installed, along with the RDA Worker and RDA Event Gateway services, without the AIOps (OIA) or Asset Intelligence (AIA) applications.

  • aiops: Choose this option when the Operations Intelligence (OIA, a.k.a. AIOps) application needs to be installed.

  • asset: Choose this option when the Asset Intelligence (AIA) application needs to be installed. (Note: The AIA application type is deprecated and all of its capabilities are available through the base RDA Fabric platform itself. For more information, please contact cfx-support@cloudfabric.com)

  • all: Choose this option when all of the supported applications need to be installed.
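
For reference, the same command accepts each of the above types. For example (illustrative invocations only; aiops is the type used in this guide):

rdac rda-app-configure --type rda
rdac rda-app-configure --type all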

Run the below command to deploy the RDAF OIA (AIOps) application services. (Note: The tag shown below is a sample for reference only; for the actual tag, please contact the CloudFabrix support team at support@cloudfabrix.com)

rdaf app install OIA --tag 7.4.1
rdafk8s app install OIA --tag 7.4.1

After installing the OIA (AIOps) application services, run the below command to see the running status of the deployed application services.

rdaf app status
+---------------------------------+----------------+-----------------+--------------+-------+
| Name                            | Host           | Status          | Container Id | Tag   |
+---------------------------------+----------------+-----------------+--------------+-------+
| rda-alert-ingester              | 192.168.125.46 | Up 20 Hours ago | 610bb0e286d6 | 7.4.1 |
| rda-alert-processor             | 192.168.125.46 | Up 20 Hours ago | 79ee6788f73e | 7.4.1 |
| rda-app-controller              | 192.168.125.46 | Up 20 Hours ago | 6c672102d5ff | 7.4.1 |
| rda-collaboration               | 192.168.125.46 | Up 20 Hours ago | 34f25c05afce | 7.4.1 |
| rda-configuration-service       | 192.168.125.46 | Up 20 Hours ago | 112ccaf4b0e6 | 7.4.1 |
| rda-dataset-caas-all-alerts     | 192.168.125.46 | Up 20 Hours ago | 2b48d4dfbfd0 | 7.4.1 |
| rda-dataset-caas-current-alerts | 192.168.125.46 | Up 20 Hours ago | 03cdc77ddf1f | 7.4.1 |
| rda-event-consumer              | 192.168.125.46 | Up 20 Hours ago | 21113ba951a1 | 7.4.1 |
| rda-file-browser                | 192.168.125.46 | Up 20 Hours ago | 425dac228fc9 | 7.4.1 |
| rda-ingestion-tracker           | 192.168.125.46 | Up 20 Hours ago | 8a984a536a97 | 7.4.1 |
| rda-irm-service                 | 192.168.125.46 | Up 20 Hours ago | 258aadc0c1af | 7.4.1 |
| rda-ml-config                   | 192.168.125.46 | Up 20 Hours ago | bf23d58903f7 | 7.4.1 |
| rda-notification-service        | 192.168.125.46 | Up 20 Hours ago | a15c5232b25d | 7.4.1 |
| rda-reports-registry            | 192.168.125.46 | Up 20 Hours ago | 3890b5dfb8ae | 7.4.1 |
| rda-smtp-server                 | 192.168.125.46 | Up 20 Hours ago | 6aadab781947 | 7.4.1 |
| rda-webhook-server              | 192.168.125.46 | Up 20 Hours ago | 6bf555aed18b | 7.4.1 |
+---------------------------------+----------------+-----------------+--------------+-------+
rdafk8s app status
+---------------------------------+----------------+-----------------+--------------+-------+
| Name                            | Host           | Status          | Container Id | Tag   |
+---------------------------------+----------------+-----------------+--------------+-------+
| rda-alert-ingester              | 192.168.125.46 | Up 20 Hours ago | 610bb0e286d6 | 7.4.1 |
| rda-alert-processor             | 192.168.125.46 | Up 20 Hours ago | 79ee6788f73e | 7.4.1 |
| rda-app-controller              | 192.168.125.46 | Up 20 Hours ago | 6c672102d5ff | 7.4.1 |
| rda-collaboration               | 192.168.125.46 | Up 20 Hours ago | 34f25c05afce | 7.4.1 |
| rda-configuration-service       | 192.168.125.46 | Up 20 Hours ago | 112ccaf4b0e6 | 7.4.1 |
| rda-dataset-caas-all-alerts     | 192.168.125.46 | Up 20 Hours ago | 2b48d4dfbfd0 | 7.4.1 |
| rda-dataset-caas-current-alerts | 192.168.125.46 | Up 20 Hours ago | 03cdc77ddf1f | 7.4.1 |
| rda-event-consumer              | 192.168.125.46 | Up 20 Hours ago | 21113ba951a1 | 7.4.1 |
| rda-file-browser                | 192.168.125.46 | Up 20 Hours ago | 425dac228fc9 | 7.4.1 |
| rda-ingestion-tracker           | 192.168.125.46 | Up 20 Hours ago | 8a984a536a97 | 7.4.1 |
| rda-irm-service                 | 192.168.125.46 | Up 20 Hours ago | 258aadc0c1af | 7.4.1 |
| rda-ml-config                   | 192.168.125.46 | Up 20 Hours ago | bf23d58903f7 | 7.4.1 |
| rda-notification-service        | 192.168.125.46 | Up 20 Hours ago | a15c5232b25d | 7.4.1 |
| rda-reports-registry            | 192.168.125.46 | Up 20 Hours ago | 3890b5dfb8ae | 7.4.1 |
| rda-smtp-server                 | 192.168.125.46 | Up 20 Hours ago | 6aadab781947 | 7.4.1 |
| rda-webhook-server              | 192.168.125.46 | Up 20 Hours ago | 6bf555aed18b | 7.4.1 |
+---------------------------------+----------------+-----------------+--------------+-------+

Configuring OIA (AIOps) Application:

Log in to the RDAF portal as the admin@cfx.com user.

Create a new Service Blueprint for the OIA (AIOps) application and one for the Machine Learning (ML) application.

For the OIA (AIOps) Application: Go to Main Menu --> Configuration --> Artifacts --> Service Blueprints --> View details --> Click on Add, copy & paste the below configuration, and click on Save.

name: cfxOIA
id: 81a1a2202
version: 2023_02_12_01
category: ITOM
comment: Operations Intelligence & Analytics (AIOps)
enabled: true
type: Service
provider: CloudFabrix Software, Inc.
attrs: {}
apps:
    -   label: cfxOIA
        appType: dimensions
        appName: incident-room-manager
        icon_url: /assets/img/applications/OIA.png
        permission: app:irm:read
service_pipelines: []

For the Machine Learning (ML) Application: Go to Main Menu --> Configuration --> Artifacts --> Service Blueprints --> View details --> Click on Add, copy & paste the below configuration, and click on Save.

name: cfxML
id: 81a1a030
version: 2023_02_12_01
category: ITOM
comment: Machine Learning (ML) Experiments
enabled: true
type: Service
provider: CloudFabrix Software, Inc.
attrs: {}
apps:
    -   label: cfxML
        appType: dimensions
        appName: ml-config
        icon_url: /assets/img/applications/ML.png
        permission: app:irm:read
service_pipelines: []


2. Upgrade

This section provides instructions on how to upgrade an existing deployment of the RDAF platform and its OIA (Operations Intelligence & Analytics) application, which is also referred to as AIOps.

2.1 Upgrade from 7.0.x to 7.0.6

Upgrade Prerequisites

Below are the pre-requisites which need to be in place before upgrading the OIA (AIOps) application services.

RDAF Deployment CLI Version Upgrade: From 1.0.6 or higher to 1.1.2

RDAF Infrastructure Services Tag Version: From 1.0.1 or higher to 1.0.2 (Note: Not applicable if the services are already running at 1.0.2 version)

RDAF Core Platform & Worker Services Tag Version: From 3.0.9 to 3.1.0

RDAF Client (RDAC) Tag Version: From 3.0.9 to 3.1.0

Warning

Please complete all of the above pre-requisites before upgrading the OIA (AIOps) application services.

On-premise docker-registry

Log in to the RDAF on-premise docker-registry VM or the RDAF platform VM (on which the rdaf CLI was installed) as rdauser using an SSH client, and run the below command to verify the status of the docker-registry service.

rdaf status
+-----------------+---------------+------------+--------------+-------+
| Name            | Host          | Status     | Container Id | Tag   |
+-----------------+---------------+------------+--------------+-------+
| docker-registry | 111.92.12.140 | Up 4 weeks | 71b8036fc64f | 1.0.1 |
+-----------------+---------------+------------+--------------+-------+

RDAF Infrastructure, Platform and Application services:

Log in to the RDAF on-premise docker-registry VM or the RDAF platform VM (on which the rdaf CLI was installed) as rdauser using an SSH client, and run the below commands to verify the status of the RDAF platform's infrastructure, core platform, application and worker services.

rdafk8s infra status
+----------------+--------------+-----------------+--------------+------------------------------+
| Name           | Host         | Status          | Container Id | Tag                          |
+----------------+--------------+-----------------+--------------+------------------------------+
| haproxy        | 111.92.12.41 | Up 6 days       | 245a37201207 | 1.0.2                        |
| keepalived     | 111.92.12.41 | Not Provisioned | N/A          | N/A                          |
| nats           | 111.92.12.41 | Up 6 days       | 15469a93d96f | 1.0.2                        |
| minio          | 111.92.12.41 | Up 6 days       | 3fd3f97bf25b | RELEASE.2022-11-07T23-47-39Z |
| mariadb        | 111.92.12.41 | Up 6 days       | 0fa1a0027993 | 1.0.2                        |
| opensearch     | 111.92.12.41 | Up 6 days       | dae308716400 | 1.0.2                        |
| zookeeper      | 111.92.12.41 | Up 6 days       | 4d8f61b4ab17 | 1.0.2                        |
| kafka          | 111.92.12.41 | Up 6 days       | 0dee08cd9c59 | 1.0.2                        |
| redis          | 111.92.12.41 | Up 6 days       | d1eccf90846e | 1.0.2                        |
| redis-sentinel | 111.92.12.41 | Up 6 days       | 683beb7b913e | 1.0.2                        |
+----------------+--------------+-----------------+--------------+------------------------------+
rdafk8s platform status
+--------------------------+--------------+-----------+--------------+-------+
| Name                     | Host         | Status    | Container Id | Tag   |
+--------------------------+--------------+-----------+--------------+-------+
| cfx-rda-access-manager   | 111.92.12.41 | Up 6 days | e487cdf24b46 | 3.0.9 |
| cfx-rda-resource-manager | 111.92.12.41 | Up 6 days | a7a21a31a26e | 3.0.9 |
| cfx-rda-user-preferences | 111.92.12.41 | Up 6 days | 9306d8da4b5a | 3.0.9 |
| portal-backend           | 111.92.12.41 | Up 6 days | 55df761dad1d | 3.0.9 |
| portal-frontend          | 111.92.12.41 | Up 6 days | 2183f00efa64 | 3.0.9 |
| rda_api_server           | 111.92.12.41 | Up 6 days | 3ba6256d1694 | 3.0.9 |
| rda_asset_dependency     | 111.92.12.41 | Up 6 days | d1a8b76bb114 | 3.0.9 |
| rda_collector            | 111.92.12.41 | Up 6 days | 441427d2bb1e | 3.0.9 |
| rda_identity             | 111.92.12.41 | Up 6 days | 2c1215d9155a | 3.0.9 |
| rda_registry             | 111.92.12.41 | Up 6 days | 7358e6ee6298 | 3.0.9 |
| rda_scheduler            | 111.92.12.41 | Up 6 days | ee72c66f8c80 | 3.0.9 |
+--------------------------+--------------+-----------+--------------+-------+
rdafk8s worker status
+------------+--------------+-----------+--------------+-------+
| Name       | Host         | Status    | Container Id | Tag   |
+------------+--------------+-----------+--------------+-------+
| rda_worker | 111.92.12.43 | Up 6 days | 88f4916ce18e | 3.0.9 |
| rda_worker | 111.92.12.43 | Up 6 days | 88f491612345 | 3.0.9 |
+------------+--------------+-----------+--------------+-------+
rdafk8s app status
+------------------------------+--------------+-----------+--------------+-------+
| Name                         | Host         | Status    | Container Id | Tag   |
+------------------------------+--------------+-----------+--------------+-------+
| all-alerts-cfx-rda-dataset-  | 111.92.12.42 | Up 6 days | 58a75c01c51f | 7.0.5 |
| caas                         |              |           |              |       |
| cfx-rda-alert-ingester       | 111.92.12.42 | Up 6 days | bc9a78953b73 | 7.0.5 |
| cfx-rda-alert-processor      | 111.92.12.42 | Up 6 days | 28401e5c2570 | 7.0.5 |
| cfx-rda-app-builder          | 111.92.12.42 | Up 6 days | be8f100056fd | 7.0.5 |
| cfx-rda-app-controller       | 111.92.12.42 | Up 6 days | a7a4ef35097d | 7.0.5 |
| cfx-rda-collaboration        | 111.92.12.42 | Up 6 days | d9d980b28a2b | 7.0.5 |
| cfx-rda-configuration-       | 111.92.12.42 | Up 6 days | db1a45835e1a | 7.0.5 |
| service                      |              |           |              |       |
| cfx-rda-event-consumer       | 111.92.12.42 | Up 6 days | baf09bad3ce1 | 7.0.5 |
| cfx-rda-file-browser         | 111.92.12.42 | Up 6 days | 32ccdfca8d8f | 7.0.5 |
| cfx-rda-ingestion-tracker    | 111.92.12.42 | Up 6 days | 1030345f2179 | 7.0.5 |
| cfx-rda-irm-service          | 111.92.12.42 | Up 6 days | 89d931f7d7b8 | 7.0.5 |
| cfx-rda-ml-config            | 111.92.12.42 | Up 6 days | 57fc39489a08 | 7.0.5 |
| cfx-rda-notification-service | 111.92.12.42 | Up 6 days | 408dbebb33c5 | 7.0.5 |
| cfx-rda-reports-registry     | 111.92.12.42 | Up 6 days | 3296cba8b3e4 | 7.0.5 |
| cfx-rda-smtp-server          | 111.92.12.42 | Up 6 days | 0f9884b6e7c8 | 7.0.5 |
| cfx-rda-webhook-server       | 111.92.12.42 | Up 6 days | a4403dee414e | 7.0.5 |
| current-alerts-cfx-rda-      | 111.92.12.42 | Up 6 days | d6cc63214103 | 7.0.5 |
| dataset-caas                 |              |           |              |       |
+------------------------------+--------------+-----------+--------------+-------+

Important

Please take a full data backup of the RDAF platform before performing an upgrade. For more information on the RDAF platform's backup and restore commands using the rdaf CLI, please refer to RDAF Platform Backup.

Download RDAF Platform & OIA Images

  • Log in to the on-premise docker registry VM as rdauser using an SSH client and run the below command to download the updated images for the RDAF platform and OIA (AIOps) application services.
rdaf registry fetch --tag 1.0.2,3.1.0,7.0.6
  • Please wait until all of the RDAF platform and OIA (AIOps) application service images are downloaded. Run the below command to verify that the images were downloaded appropriately.
rdaf registry list-tags
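
Optionally, the listing can be filtered to quickly confirm the newly fetched tags are present (a simple grep sketch; adjust the tag values if your release tags differ):

rdaf registry list-tags | grep -E '1.0.2|3.1.0|7.0.6'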

Upgrade RDAF deployment CLI on RDAF Platform VM

To download and upgrade the rdaf deployment CLI on the RDAF platform VM, repeat the steps outlined under the RDAF CLI Upgrade on On-premise docker registry VM section.

Upgrade RDAF Platform & OIA Services

RDAF Platform Services Upgrade:

Run the below command to upgrade the RDAF platform's services to version 3.1.0.

rdafk8s platform upgrade --tag 3.1.0

Once the above command has completed, run the below command to verify that all of the RDAF platform's services are upgraded to the specified version and that all of their corresponding containers are in a running state.

kubectl get pods -n rda-fabric -l app_category=rdaf-platform
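
If desired, the rollout can be watched until all platform PODs reach the Running state by adding kubectl's watch flag (optional):

kubectl get pods -n rda-fabric -l app_category=rdaf-platform -w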

RDAF Client CLI Upgrade:

Run the below command to upgrade the RDAF client CLI rdac to latest version.

rdafk8s rdac_cli upgrade --tag 3.1.0

After the rdac CLI is upgraded, run the below command to see all of the running RDAF platform service pods.

rdac pods 
+-------+----------------------------------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Host         | ID       | Site        | Age             |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------|
| App   | cfxdimensions-app-access-manager       | d412efb99f2e | ccb83d20 |             | 4:13:45         |      8 |        31.21 |               |              |
| App   | cfxdimensions-app-notification-service | 34c2ea6675d5 | 93ac81de |             | 1 day, 18:33:27 |      8 |        31.21 |               |              |
| App   | cfxdimensions-app-resource-manager     | ec87d2ee6387 | 33ee28ca |             | 4:13:31         |      8 |        31.21 |               |              |
| App   | user-preferences                       | 520bca813ddf | f4ca7d44 |             | 4:13:14         |      8 |        31.21 |               |              |
| Infra | api-server                             | 0656b4230f44 | 6d4d40ab |             | 0:33:06         |      8 |        31.21 |               |              |
| Infra | collector                              | 6336341682ad | 042af0af |             | 4:11:19         |      8 |        31.21 |               |              |
| Infra | registry                               | cae649622fba | 4e4c4a4d |             | 4:11:03         |      8 |        31.21 |               |              |
| Infra | scheduler                              | 3ab379305be1 | b2bb9915 | *leader*    | 4:10:59         |      8 |        31.21 |               |              |
+-------+----------------------------------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+

Run the below command to verify the functional health of each platform service and confirm that all of their statuses are in the OK state.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                               |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------|
| rda_infra | api-server                             | 0656b4230f44 | 6d4d40ab |             | service-status                                      | ok       |                                                       |
| rda_infra | api-server                             | 0656b4230f44 | 6d4d40ab |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | asset-dependency                       | e006dfd39d9b | 9f02a8f1 |             | service-status                                      | ok       |                                                       |
| rda_app   | asset-dependency                       | e006dfd39d9b | 9f02a8f1 |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | authenticator                          | 1782a79e36c5 | adda9bc0 |             | service-status                                      | ok       |                                                       |
| rda_app   | authenticator                          | 1782a79e36c5 | adda9bc0 |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | authenticator                          | 1782a79e36c5 | adda9bc0 |             | DB-connectivity                                     | ok       |                                                       |
| rda_app   | cfxdimensions-app-access-manager       | d412efb99f2e | ccb83d20 |             | service-status                                      | ok       |                                                       |
| rda_app   | cfxdimensions-app-access-manager       | d412efb99f2e | ccb83d20 |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | cfxdimensions-app-access-manager       | d412efb99f2e | ccb83d20 |             | service-dependency:registry                         | ok       | 1 pod(s) found for registry                           |
| rda_app   | cfxdimensions-app-access-manager       | d412efb99f2e | ccb83d20 |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | cfxdimensions-app-access-manager       | d412efb99f2e | ccb83d20 |             | DB-connectivity                                     | ok       |                                                       |
| rda_app   | cfxdimensions-app-notification-service | 34c2ea6675d5 | 93ac81de |             | service-status                                      | ok       |                                                       |
| rda_app   | cfxdimensions-app-notification-service | 34c2ea6675d5 | 93ac81de |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | cfxdimensions-app-notification-service | 34c2ea6675d5 | 93ac81de |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | cfxdimensions-app-notification-service | 34c2ea6675d5 | 93ac81de |             | DB-connectivity                                     | ok       |                                                       |
| rda_app   | cfxdimensions-app-resource-manager     | ec87d2ee6387 | 33ee28ca |             | service-status                                      | ok       |                                                       |
| rda_app   | cfxdimensions-app-resource-manager     | ec87d2ee6387 | 33ee28ca |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | cfxdimensions-app-resource-manager     | ec87d2ee6387 | 33ee28ca |             | service-dependency:registry                         | ok       | 1 pod(s) found for registry                           |
| rda_app   | cfxdimensions-app-resource-manager     | ec87d2ee6387 | 33ee28ca |             | service-dependency:cfxdimensions-app-access-manager | ok       | 1 pod(s) found for cfxdimensions-app-access-manager   |
| rda_app   | cfxdimensions-app-resource-manager     | ec87d2ee6387 | 33ee28ca |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | cfxdimensions-app-resource-manager     | ec87d2ee6387 | 33ee28ca |             | DB-connectivity                                     | ok       |                                                       |
| rda_infra | collector                              | 6336341682ad | 042af0af |             | service-status                                      | ok       |                                                       |
| rda_infra | collector                              | 6336341682ad | 042af0af |             | minio-connectivity                                  | ok       |                                                       |
| rda_infra | collector                              | 6336341682ad | 042af0af |             | opensearch-connectivity:default                     | ok       |                                                       |
| rda_infra | scheduler                              | 3ab379305be1 | b2bb9915 |             | service-status                                      | ok       |                                                       |
| rda_infra | scheduler                              | 3ab379305be1 | b2bb9915 |             | minio-connectivity                                  | ok       |                                                       |
| rda_infra | scheduler                              | 3ab379305be1 | b2bb9915 |             | DB-connectivity                                     | ok       |                                                       |
| rda_app   | user-preferences                       | 520bca813ddf | f4ca7d44 |             | service-status                                      | ok       |                                                       |
| rda_app   | user-preferences                       | 520bca813ddf | f4ca7d44 |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | user-preferences                       | 520bca813ddf | f4ca7d44 |             | service-dependency:registry                         | ok       | 1 pod(s) found for registry                           |
| rda_app   | user-preferences                       | 520bca813ddf | f4ca7d44 |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | user-preferences                       | 520bca813ddf | f4ca7d44 |             | DB-connectivity                                     | ok       |                                                       |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+

RDAF Worker Service Upgrade:

Run the below command to upgrade RDAF worker services to latest version.

rdafk8s worker upgrade --tag 3.1.0

After upgrading the RDAF worker service using the above command, run the below command to verify its running status and version.

kubectl get pods -n rda-fabric -l app_category=rdaf-worker
+------------+--------------+-------------+--------------+-------+
| Name       | Host         | Status      | Container Id | Tag   |
+------------+--------------+-------------+--------------+-------+
| rda_worker | 111.92.12.60 | Up 1 minute | 4ce2a8f13d16 | 3.1.0 |
+------------+--------------+-------------+--------------+-------+

Run the below command to verify the functional health of each RDAF worker service and verify that all of their statuses are in the OK state.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                               |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------|
| rda_infra | api-server                             | 0656b4230f44 | 6d4d40ab |             | service-status                                      | ok       |                                                       |
...
...
| rda_infra | worker                                 | 4ce2a8f13d16 | d627124d | rda-site-01 | service-status                                      | ok       |                                                       |
| rda_infra | worker                                 | 4ce2a8f13d16 | d627124d | rda-site-01 | minio-connectivity                                  | ok       |                                                       |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+

Create Kafka Topics for OIA Application Services:

Download the below script and execute it on the VM where the rdafk8s setup was run during the initial RDAF platform setup. Please make sure the file /opt/rdaf/rdaf.cfg exists, as it is required for the below script to execute successfully.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.2/add_kafka_topics.py
python add_kafka_topics.py upgrade
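
If the script fails to find its configuration, the presence of the required file can be confirmed with a simple check before re-running it:

ls -l /opt/rdaf/rdaf.cfg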

RDAF OIA Application Services Upgrade:

Run the below command to upgrade the RDAF OIA (AIOps) application services to latest version.

rdafk8s app upgrade OIA --tag 7.0.6

Once the above command has completed, run the below command to verify that all of the RDAF OIA application services are upgraded to the specified version and that all of their corresponding containers are in a running state.

kubectl get pods -n rda-fabric -l app_category=rdaf-application

Wait for 3 to 5 minutes and run the below command to verify the functional health of each RDAF OIA application service and verify that all of their statuses are in the OK state.

rdac healthcheck

2.2. Upgrade from 7.2.0.x to 7.2.1.1

RDAF Platform: From 3.2.0.3 to 3.2.1.3

OIA (AIOps) Application: From 7.2.0.3 to 7.2.1.1/7.2.1.5

RDAF Deployment rdaf & rdafk8s CLI: From 1.1.7 to 1.1.8

RDAF Client rdac CLI: From 3.2.0.3 to 3.2.1.3

2.2.1. Upgrade Prerequisites

Before proceeding with this upgrade, please make sure the below prerequisites are met.

Important

Please make sure a full backup of the RDAF platform system is completed before performing the upgrade.

Kubernetes: Please run the below backup command to take a backup of the application data.

rdafk8s backup --dest-dir <backup-dir>

Non-Kubernetes: Please run the below backup command to take a backup of the application data.

rdaf backup --dest-dir <backup-dir>

Note: Please make sure this backup-dir is mounted across all infra and CLI VMs.
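
For example, with a shared mount visible from all infra and CLI VMs (the path below is purely illustrative):

rdafk8s backup --dest-dir /mnt/rdaf-backup
rdaf backup --dest-dir /mnt/rdaf-backup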

Run the below commands on the RDAF Management system and make sure the Kubernetes PODs are NOT in a restarting state (applicable only to Kubernetes environments).

kubectl get pods -n rda-fabric -l app_category=rdaf-infra
kubectl get pods -n rda-fabric -l app_category=rdaf-platform
kubectl get pods -n rda-fabric -l app_component=rda-worker 
kubectl get pods -n rda-fabric -l app_name=oia 
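
As an optional aid, PODs with restarts can be spotted by checking the RESTARTS column of the standard kubectl output (a rough sketch, assuming the default NAME, READY, STATUS, RESTARTS, AGE column layout):

kubectl get pods -n rda-fabric | awk 'NR>1 && $4+0 > 0 {print $1, "restart count:", $4}'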

  • Verify that the RDAF deployment rdaf & rdafk8s CLI version is 1.1.7 on the VM where the CLI was installed for the docker on-prem registry and for managing the Kubernetes or non-Kubernetes deployment.
rdaf --version
rdafk8s --version
  • On-premise docker registry service version is 1.0.2
docker ps | grep docker-registry
  • RDAF Infrastructure services version is 1.0.2 (rda-nats service version is 1.0.2.1 and rda-minio service version is RELEASE.2022-11-11T03-44-20Z)

Run the below command to get RDAF Infra services details

rdafk8s infra status

Run the below command to get RDAF Infra services details

rdaf infra status
  • RDAF Platform services version is 3.2.0.3

Run the below command to get RDAF Platform services details

rdafk8s platform status

Run the below command to get RDAF Platform services details

rdaf platform status
  • RDAF OIA Application services version is 7.2.0.3 (rda-event-consumer service version is 7.2.0.5)

Run the below command to get RDAF App services details

rdafk8s app status

Run the below command to get RDAF App services details

rdaf app status

Log in to the VM where the rdaf & rdafk8s deployment CLI was installed for the docker on-prem registry and for managing the Kubernetes or non-Kubernetes deployment.

  • Download the RDAF Deployment CLI's newer version 1.1.8 bundle.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/rdafcli-1.1.8.tar.gz
  • Upgrade the rdaf & rdafk8s CLI to version 1.1.8
pip install --user rdafcli-1.1.8.tar.gz
  • Verify the installed rdaf & rdafk8s CLI version is upgraded to 1.1.8
rdaf --version
rdafk8s --version

Download the below python script, which is used to identify the K8s POD name for each RDA Fabric service POD Id. Skip this step if this script was already downloaded.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.6/maint_command.py

Download the below upgrade python script.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/rdaf_k8s_upgrade_117_118.py

Please run the below python upgrade script. It creates a kafka topic called fsm-events, creates the /opt/rdaf/config/network_config/policy.json file, and adds the rda-fsm service to the values.yaml file.

python rdaf_k8s_upgrade_117_118.py
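
After the script completes, a quick way to confirm it did its work is to check that the policy file it creates now exists (a simple sanity-check sketch):

ls -l /opt/rdaf/config/network_config/policy.json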

Important

Please make sure the above upgrade script is executed before moving to the next step.

  • Download the RDAF Deployment CLI's newer version 1.1.8 bundle and copy it to the RDAF management VM on which the rdaf & rdafk8s deployment CLI was installed.

For RHEL OS Environment

wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/offline-rhel-1.1.8.tar.gz

For Ubuntu OS Environment

wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/offline-ubuntu-1.1.8.tar.gz

  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.1.8.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.1.8
  • Upgrade the rdaf & rdafk8s CLI to version 1.1.8
pip install --user rdafcli-1.1.8.tar.gz -f ./ --no-index
  • Verify the installed rdaf & rdafk8s CLI version
rdaf --version
rdafk8s --version

Download the below upgrade script and copy it to RDAF management VM on which rdaf & rdafk8s deployment CLI was installed.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/rdaf_k8s_upgrade_117_118.py

Please run the downloaded python upgrade script. It creates a kafka topic called fsm-events, creates the /opt/rdaf/config/network_config/policy.json file, and adds the rda-fsm service to the values.yaml file.

python rdaf_k8s_upgrade_117_118.py

Important

Please make sure the above upgrade script is executed before moving to the next step.

  • Download the RDAF Deployment CLI's newer version 1.1.8 bundle
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/offline-rhel-1.1.8.tar.gz
  • Upgrade the rdaf CLI to version 1.1.8
pip install --user rdafcli-1.1.8.tar.gz
  • Verify the installed rdaf CLI version is upgraded to 1.1.8
rdaf --version
  • To stop application services, run the below command. Wait until all of the services are stopped.
rdaf app down OIA
rdaf app status
  • To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
rdaf worker down
rdaf worker status
  • To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
rdaf platform down

rdaf platform status
  • Upgrade Kafka using the below command

rdaf infra upgrade --tag 1.0.2 --service kafka

Run the below RDAF command to check infra status

rdaf infra status
+----------------+----------------+-----------------+--------------+-------+
| Name           | Host           | Status          | Container Id | Tag   |
+----------------+----------------+-----------------+--------------+-------+
| haproxy        | 192.168.131.41 | Up 2 weeks      | ee9d25dc2276 | 1.0.2 |
| haproxy        | 192.168.131.42 | Up 2 weeks      | e6ad57ac421d | 1.0.2 |
| keepalived     | 192.168.131.41 | active          | N/A          | N/A   |
| keepalived     | 192.168.131.42 | active          | N/A          | N/A   |
+----------------+----------------+-----------------+--------------+-------+

Run the below RDAF command to check infra healthcheck status

rdaf infra healthcheck
+----------------+-----------------+--------+----------------------+----------------+--------------+
| Name           | Check           | Status | Reason               | Host           | Container Id |
+----------------+-----------------+--------+----------------------+----------------+--------------+
| haproxy        | Port Connection | OK     | N/A                  | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy        | Service Status  | OK     | N/A                  | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy        | Firewall Port   | OK     | N/A                  | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy        | Port Connection | OK     | N/A                  | 192.168.107.64 | 91c361ea0f58 |
| haproxy        | Service Status  | OK     | N/A                  | 192.168.107.64 | 91c361ea0f58 |
| haproxy        | Firewall Port   | OK     | N/A                  | 192.168.107.64 | 91c361ea0f58 |
| keepalived     | Service Status  | OK     | N/A                  | 192.168.107.63 | N/A          |
| keepalived     | Service Status  | OK     | N/A                  | 192.168.107.64 | N/A          |
| nats           | Port Connection | OK     | N/A                  | 192.168.107.63 | f57ed825681b |
| nats           | Service Status  | OK     | N/A                  | 192.168.107.63 | f57ed825681b |
+----------------+-----------------+--------+----------------------+----------------+--------------+

Note

Please take a backup of /opt/rdaf/deployment-scripts/values.yaml
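
For example, a plain copy keeps the previous version alongside the original (the .bak suffix is just an illustration):

cp /opt/rdaf/deployment-scripts/values.yaml /opt/rdaf/deployment-scripts/values.yaml.bak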

Download the below upgrade python script.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/rdaf_upgrade_116_117_118.py

Please run the below python upgrade script. It creates a few topics and applies config changes to existing topics in HA setups, creates the /opt/rdaf/config/network_config/policy.json file, and adds the rda-fsm service to the values.yaml file.

python rdaf_upgrade_116_117_118.py

Important

Please make sure the above upgrade script is executed before moving to the next step.

  • Download the RDAF Deployment CLI's newer version 1.1.8 bundle and copy it to the RDAF management VM on which the rdaf deployment CLI was installed.

For RHEL OS Environment

wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/offline-rhel-1.1.8.tar.gz

For Ubuntu OS Environment

wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/offline-ubuntu-1.1.8.tar.gz

  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.1.8.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.1.8
  • Upgrade the rdaf CLI to version 1.1.8
pip install --user rdafcli-1.1.8.tar.gz -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version

Download the below upgrade script and copy it to RDAF management VM on which rdaf deployment CLI was installed.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.8/rdaf_upgrade_116_117_118.py

Please run the below python upgrade script. It creates a few topics and applies config changes to existing topics in HA setups, creates the /opt/rdaf/config/network_config/policy.json file, and adds the rda-fsm service to the values.yaml file.

python rdaf_upgrade_116_117_118.py

Important

Please make sure the above upgrade script is executed before moving to the next step.

2.2.2. Download the new Docker Images

Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.

rdaf registry fetch --tag 3.2.1.3,7.2.1.1,7.2.1.5,7.2.1.6

Run the below command to verify above mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.

rdaf registry list-tags 
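
As with the earlier fetch, the listing can be filtered to confirm the new tags are present (a simple grep sketch; adjust the tag values if yours differ):

rdaf registry list-tags | grep -E '3.2.1.3|7.2.1.1|7.2.1.5|7.2.1.6'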

Please make sure 3.2.1.3 image tag is downloaded for the below RDAF Platform services.

  • rda-client-api-server
  • rda-registry
  • rda-rda-scheduler
  • rda-collector
  • rda-stack-mgr
  • rda-identity
  • rda-fsm
  • rda-access-manager
  • rda-resource-manager
  • rda-user-preferences
  • onprem-portal
  • onprem-portal-nginx
  • rda-worker-all
  • onprem-portal-dbinit
  • cfxdx-nb-nginx-all
  • rda-event-gateway
  • rdac
  • rdac-full

Please make sure 7.2.1.1 image tag is downloaded for the below RDAF OIA Application services.

  • rda-app-controller
  • rda-alert-processor
  • rda-file-browser
  • rda-smtp-server
  • rda-ingestion-tracker
  • rda-reports-registry
  • rda-ml-config
  • rda-event-consumer
  • rda-webhook-server
  • rda-irm-service
  • rda-alert-ingester
  • rda-collaboration
  • rda-notification-service
  • rda-configuration-service

Please make sure 7.2.1.5 image tag is downloaded for the below RDAF OIA Application services.

  • rda-smtp-server
  • rda-event-consumer
  • rda-webhook-server
  • rda-collaboration
  • rda-configuration-service
  • rda-alert-ingester

Downloaded Docker images are stored under the below path.

/opt/rdaf/data/docker/registry/v2

Run the below command to check the filesystem's disk usage on which docker images are stored.

df -h /opt

Optionally, if required, older image tags which are no longer used can be deleted to free up disk space using the below command.

rdaf registry delete-images --tag <tag1,tag2>
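
For example, to remove images for the previous 3.2.0.3 and 7.2.0.3 releases once they are no longer needed (the tags shown are illustrative; verify before deleting):

rdaf registry delete-images --tag 3.2.0.3,7.2.0.3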

2.2.3. Upgrade Steps

2.2.3.1 Upgrade RDAF Platform Services

Step-1: Run the below command to initiate upgrading RDAF Platform services.

rdafk8s platform upgrade --tag 3.2.1.3

As the upgrade procedure is a non-disruptive upgrade, it puts the currently running PODs into Terminating state and newer version PODs into Pending state.

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each Platform service is in a Terminating state.

kubectl get pods -n rda-fabric -l app_category=rdaf-platform

Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It lists all of the platform service POD Ids along with the rdac maintenance command that is required to put them into maintenance mode.

python maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-platform-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating RDAF platform service PODs

for i in `kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done

Note

Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Platform service PODs.

Please wait until all of the new platform service PODs are in the Running state, then run the below command to verify their status and make sure all of them are running with the 3.2.1.3 version.

rdafk8s platform status
+----------------------+----------------+---------------+--------------+---------+
| Name                 | Host           | Status        | Container Id | Tag     |
+----------------------+----------------+---------------+--------------+---------+
| rda-api-server       | 192.168.131.45 | Up 2 Days ago | dde8ab1f9331 | 3.2.1.3 |
| rda-api-server       | 192.168.131.44 | Up 2 Days ago | e6ece7235e72 | 3.2.1.3 |
| rda-registry         | 192.168.131.45 | Up 2 Days ago | a577766fb8b2 | 3.2.1.3 |
| rda-registry         | 192.168.131.44 | Up 2 Days ago | 1aecc089b0c3 | 3.2.1.3 |
| rda-identity         | 192.168.131.45 | Up 2 Days ago | fea1c0ef7263 | 3.2.1.3 |
| rda-identity         | 192.168.131.44 | Up 2 Days ago | 2a48f402f678 | 3.2.1.3 |
| rda-fsm              | 192.168.131.45 | Up 2 Days ago | 5006c8a6e5f3 | 3.2.1.3 |
| rda-fsm              | 192.168.131.44 | Up 2 Days ago | 199cac791a90 | 3.2.1.3 |
| rda-access-manager   | 192.168.131.44 | Up 2 Days ago | e20495c61be2 | 3.2.1.3 |
| ....                 | ....           | ....          | ....         | ....    |
+----------------------+----------------+---------------+--------------+---------+

Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler services is elected as the leader under the Site column.

rdac pods

Run the below command to check that all services have an ok status and do not show any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | cf4c4f37c47a | 0633b451 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | cf4c4f37c47a | 0633b451 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | cf4c4f37c47a | 0633b451 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | cf4c4f37c47a | 0633b451 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | cf4c4f37c47a | 0633b451 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=3, Brokers=[2, 3, 1] |
| rda_app   | alert-ingester                         | 7b9f1370e018 | f348532b |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 7b9f1370e018 | f348532b |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 7b9f1370e018 | f348532b |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 7b9f1370e018 | f348532b |             | service-initialization-status                       | ok       |                                                             |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.2.3.2 Upgrade rdac CLI

Run the below command to upgrade the rdac CLI

rdafk8s rdac_cli upgrade --tag 3.2.1.3
2.2.3.3 Upgrade RDA Worker Services

Step-1: Please run the below command to initiate upgrading the RDA Worker service PODs.

rdafk8s worker upgrade --tag 3.2.1.3

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each RDA Worker service POD is in the Terminating state.

kubectl get pods -n rda-fabric -l app_component=rda-worker

Step-3: Run the below command to put all Terminating RDAF worker service PODs into maintenance mode. It lists the POD IDs of the RDA worker services along with the rdac maintenance command that needs to be run to put them into maintenance mode.

python maint_command.py

Step-4: Copy and paste the rdac maintenance command as shown below.

rdac maintenance start --ids <comma-separated-list-of-worker-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the RDAF worker services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating RDAF worker service PODs

for i in `kubectl get pods -n rda-fabric -l app_component=rda-worker | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done

Note

Wait for 120 seconds between each RDAF worker service upgrade, repeating the above steps (Step-2 to Step-6) for the rest of the RDAF worker service PODs.

Step-7: Please wait for 120 seconds to let the newer version of the RDA Worker service PODs join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service PODs.

rdac pods | grep rda-worker
rdafk8s worker status
+------------+--------------+---------------+--------------+---------+
| Name       | Host         | Status        | Container Id | Tag     |
+------------+--------------+---------------+--------------+---------+
| rda-worker | 192.168.131.50 | Up 2 Days ago | 497059c45d6e | 3.2.1.3 |
| rda-worker | 192.168.131.49 | Up 2 Days ago | 434b2ca40ed8 | 3.2.1.3 |
| ....       | ....           | ....          | ....         | ....    |
+------------+--------------+---------------+--------------+---------+

Step-8: Run the below command to check that all RDA Worker services have an ok status and do not report any failure messages.

rdac healthcheck
2.2.3.4 Upgrade OIA Application Services

Step-1: Run the below commands to initiate upgrading the RDAF OIA Application services. The first command upgrades the specified services from 7.2.0.3 to 7.2.1.1, and the following commands upgrade the rest of the services from 7.2.0.3 to 7.2.1.5 and 7.2.1.6 respectively.

rdafk8s app upgrade OIA --tag 7.2.1.1 --service rda-app-controller --service rda-alert-processor --service rda-file-browser --service rda-ingestion-tracker --service rda-reports-registry --service rda-ml-config --service rda-irm-service --service rda-notification-service
rdafk8s app upgrade OIA --tag 7.2.1.5 --service rda-smtp-server --service rda-event-consumer --service rda-webhook-server --service rda-collaboration --service rda-configuration-service
rdafk8s app upgrade OIA --tag 7.2.1.6 --service rda-alert-ingester

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each OIA application service is in the Terminating state.

kubectl get pods -n rda-fabric -l app_name=oia

Step-3: Run the below command to put all Terminating OIA application service PODs into maintenance mode. It lists the POD IDs of the OIA application services along with the rdac maintenance command that needs to be run to put them into maintenance mode.

python maint_command.py

Step-4: Copy and paste the rdac maintenance command as shown below.

rdac maintenance start --ids <comma-separated-list-of-oia-app-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the OIA application services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating OIA application service PODs

for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
kubectl get pods -n rda-fabric -l app_name=oia

Note

Wait for 120 seconds and repeat the above steps (Step-2 to Step-6) for the rest of the OIA application service PODs.

Please wait till all of the new OIA application service PODs are in the Running state, then run the below command to verify their status and make sure they are running with the 7.2.1.1, 7.2.1.5, or 7.2.1.6 version as applicable.

rdafk8s app status
+---------------------------+--------------+---------------+--------------+---------+
| Name                      | Host         | Status        | Container Id | Tag     |
+---------------------------+--------------+---------------+--------------+---------+
| rda-alert-ingester        | 192.168.131.49 | Up 5 Days ago | b323998abd15 | 7.2.1.1 |
| rda-alert-ingester        | 192.168.131.50 | Up 5 Days ago | 710f262e27aa | 7.2.1.1 |
| rda-alert-processor       | 192.168.131.47 | Up 5 Days ago | ec1c53d94439 | 7.2.1.1 |
| rda-alert-processor       | 192.168.131.46 | Up 5 Days ago | deee4db62708 | 7.2.1.1 |
| rda-app-controller        | 192.168.131.49 | Up 5 Days ago | ef96deb9adda | 7.2.1.1 |
| rda-app-controller        | 192.168.131.50 | Up 5 Days ago | 6880b5632adb | 7.2.1.1 |
| rda-collaboration         | 192.168.131.49 | Up 2 Days ago | cc1b1c882250 | 7.2.1.5 |
| rda-collaboration         | 192.168.131.50 | Up 2 Days ago | 13be7e8bfa3f | 7.2.1.5 |
+---------------------------+--------------+---------------+--------------+---------+

Step-7: Run the below command to verify that all OIA application services are up and running. Please wait till the cfxdimensions-app-irm_service shows leader status under the Site column.

rdac pods

Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck

Warning

For a Non-Kubernetes deployment, upgrading RDAF Platform and AIOps application services is a disruptive operation. Please schedule a maintenance window before upgrading RDAF Platform and AIOps services to the newer version.

Run the below command to initiate upgrading RDAF Platform services.

rdaf platform upgrade --tag 3.2.1.3

Please wait till all of the new platform services are in the Running state, then run the below command to verify their status and make sure all of them are running with the 3.2.1.3 version.

rdaf platform status

+--------------------------+--------------+------------+--------------+---------+
| Name                     | Host         | Status     | Container Id | Tag     |
+--------------------------+--------------+------------+--------------+---------+
| cfx-rda-access-manager   | 192.168.107.60 | Up 6 hours | 80dac9d727a3 | 3.2.1.3 |
| cfx-rda-resource-manager | 192.168.107.60 | Up 6 hours | 68534a5c1d4c | 3.2.1.3 |
| cfx-rda-user-preferences | 192.168.107.60 | Up 6 hours | 78405b639915 | 3.2.1.3 |
| portal-backend           | 192.168.107.60 | Up 6 hours | 636e6968f661 | 3.2.1.3 |
| portal-frontend          | 192.168.107.60 | Up 6 hours | 2fd426bd6aa2 | 3.2.1.3 |
| rda_api_server           | 192.168.107.60 | Up 6 hours | e0994b366f98 | 3.2.1.3 |
| rda_asset_dependency     | 192.168.107.60 | Up 6 hours | 07610621408c | 3.2.1.3 |
| rda_collector            | 192.168.107.60 | Up 6 hours | 467d6b3d13f8 | 3.2.1.3 |
| rda_fsm                  | 192.168.107.60 | Up 6 hours | e32de86fe341 | 3.2.1.3 |
| rda_identity             | 192.168.107.60 | Up 6 hours | 45136d89b2cf | 3.2.1.3 |
| rda_registry             | 192.168.107.60 | Up 6 hours | 334d7d4cfa41 | 3.2.1.3 |
| rda_scheduler            | 192.168.107.60 | Up 6 hours | acf5a9ab556a | 3.2.1.3 |
+--------------------------+--------------+------------+--------------+---------+

Run the below command to check that the rda-fsm service is up and running and verify that one of the rda-scheduler services is elected as a leader under the Site column.

rdac pods

Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=3, Brokers=[2, 3, 1] |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=1, Brokers=[2, 3, 1] |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
  • Upgrade rdac CLI

Run the below command to upgrade the rdac CLI

rdaf rdac_cli upgrade --tag 3.2.1.3
  • Upgrade RDA Worker Services

Please run the below command to initiate upgrading the RDA Worker service PODs.

rdaf worker upgrade --tag 3.2.1.3

Please wait for 120 seconds to let the newer version of the RDA Worker service PODs join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service PODs.

rdac pods | grep worker
rdaf worker status

+------------+--------------+-----------+--------------+---------+
| Name       | Host         | Status    | Container Id | Tag     |
+------------+--------------+-----------+--------------+---------+
| rda_worker | 192.168.107.61 | Up 2 days | d951118ee757 | 3.2.1.3 |
| rda_worker | 192.168.107.62 | Up 2 days | f7033a72f013 | 3.2.1.3 |
+------------+--------------+-----------+--------------+---------+

Run the below command to check that all RDA Worker services have an ok status and do not report any failure messages.

rdac healthcheck

Run the below commands to initiate upgrading the RDAF OIA Application services. The first command upgrades the specified services from 7.2.0.3 to 7.2.1.1, and the following commands upgrade the rest of the services from 7.2.0.3 to 7.2.1.5 and 7.2.1.6 respectively.

rdaf app upgrade OIA --tag 7.2.1.1 --service cfx-rda-app-controller --service cfx-rda-alert-processor --service cfx-rda-file-browser --service cfx-rda-ingestion-tracker --service cfx-rda-reports-registry --service cfx-rda-ml-config --service cfx-rda-irm-service --service cfx-rda-notification-service
rdaf app upgrade OIA --tag 7.2.1.5 --service cfx-rda-smtp-server --service cfx-rda-event-consumer --service cfx-rda-webhook-server --service cfx-rda-collaboration --service cfx-rda-configuration-service
rdaf app upgrade OIA --tag 7.2.1.6 --service cfx-rda-alert-ingester

Please wait till all of the new OIA application service PODs are in the Running state, then run the below command to verify their status and make sure they are running with the 7.2.1.1, 7.2.1.5, or 7.2.1.6 version as applicable.

rdaf app status
+-------------------------------+--------------+-----------+--------------+---------+
| Name                          | Host         | Status    | Container Id | Tag     |
+-------------------------------+--------------+-----------+--------------+---------+
| cfx-rda-alert-ingester        | 192.168.107.66 | Up 2 days | 79d6756db639 | 7.2.1.5 |
| cfx-rda-alert-ingester        | 192.168.107.67 | Up 2 days | 9a0775246a0f | 7.2.1.5 |
| cfx-rda-alert-processor       | 192.168.107.66 | Up 2 days | 057552584cfe | 7.2.1.1 |
| cfx-rda-alert-processor       | 192.168.107.67 | Up 2 days | 787f0cb42734 | 7.2.1.1 |
| cfx-rda-app-controller        | 192.168.107.66 | Up 2 days | 07f406e984ad | 7.2.1.1 |
| cfx-rda-app-controller        | 192.168.107.67 | Up 2 days | 0b27802473c1 | 7.2.1.1 |
| cfx-rda-collaboration         | 192.168.107.66 | Up 2 days | 7322550c3cee | 7.2.1.5 |
+-------------------------------+--------------+-----------+--------------+---------+

2.2.4. Post Upgrade Steps

  • (Optional) Deploy the latest l1&l2 bundles. Go to Configuration --> RDA Administration --> Bundles --> select oia_l1_l2_bundle and click on the deploy action.

  • Enable ML experiments manually if any experiments are configured (Organization --> Configuration --> ML Experiments)

  • (Optional) Add the following to All Incident Mappings.

## preferably after the projectId field's JSON block

{
  "to": "notificationId",
  "from": "notificationId"
}, 
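
For illustration only, assuming the existing mapping contains a simple projectId block of the same to/from form (your actual projectId mapping may differ), the result would look like:

{
  "to": "projectId",
  "from": "projectId"
},
{
  "to": "notificationId",
  "from": "notificationId"
},
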
  • (Optional) A new option called skip_retry_on_keywords is added within the Incident mapper, which allows the user to control when to skip the retry attempt while making an API call during create or update ticket operations on an external ITSM system (e.g. ServiceNow).

In the below example, if the API error response contains the message serviceIdentifier is not available or Ticket is already in inactive state no update is allowed, it will skip retrying the API call, as these are expected errors and retrying will not make the API call successful.

{
  "to": "skip_retry_on_keywords",
  "func": {
    "evaluate": {
      "expr": "'[\"serviceIdentifier is not available\",\"Ticket is already in Inactive state no update is allowed\"]'"
    }
  }
}

2.3. Upgrade from 7.2.1.x to 7.2.2

2.3.1. Pre-requisites

Below are the pre-requisites which need to be in place before upgrading the OIA (AIOps) application services.

  • RDAF Deployment CLI Version: 1.1.8

  • RDAF Infrastructure Services Tag Version: 1.0.2,1.0.2.1(nats)

  • RDAF Core Platform & Worker Services Tag Version: 3.2.1 / 3.2.1.x

  • RDAF Client (RDAC) Tag Version: 3.2.1 / 3.2.1.x

  • OIA Services Tag Version: 7.2.1 / 7.2.1.x

  • CloudFabrix recommends taking VMware VM snapshots where AIOps solution is deployed

Important

Applicable only if FSM is configured for ITSM ticketing:

Before proceeding with the upgrade, please make sure to disable the below Service Blueprints.

  • Create Ticket
  • Update Ticket
  • Resolve Ticket
  • Read Alert Stream
  • Read Incident Stream
  • Read ITSM ticketing Inbound Notifications

Warning

Make sure all of the above pre-requisites are met before proceeding with the upgrade process.

Warning

Upgrading RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading RDAF Platform and AIOps services to the newer version.

  • Download the RDAF Deployment CLI's newer version 1.1.9 bundle
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9/rdafcli-1.1.9.tar.gz
  • Upgrade the rdaf CLI to version 1.1.9
pip install --user rdafcli-1.1.9.tar.gz
  • Verify the installed rdaf CLI version is upgraded to 1.1.9
rdaf --version
  • Download the RDAF Deployment CLI's newer version 1.1.9 bundle and copy it to RDAF management VM on which rdaf deployment CLI was installed.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9/offline-rhel-1.1.9.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-rhel-1.1.9.tar.gz
  • Change the directory to the extracted directory
cd offline-rhel-1.1.9
  • Upgrade the rdafCLI to version 1.1.9
pip install --user rdafcli-1.1.9.tar.gz -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
  • Download the RDAF Deployment CLI's newer version 1.1.9 bundle and copy it to RDAF management VM on which rdaf deployment CLI was installed.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9/offline-ubuntu-1.1.9.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.1.9.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.1.9
  • Upgrade the rdafCLI to version 1.1.9
pip install --user rdafcli-1.1.9.tar.gz -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
  • To stop OIA (AIOps) application services, run the below command. Wait until all of the services are stopped.
rdaf app down OIA
rdaf app status
  • To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
rdaf worker down
rdaf worker status
  • To stop RDAF platform services, run the below command. Wait until all of the services are stopped.

rdaf platform down
rdaf platform status

  • Upgrade kafka using below command
rdaf infra upgrade --tag 1.0.2 --service kafka

Run the below RDAF command to check infra status

rdaf infra status
+----------------+--------------+-----------------+--------------+------------------------------+
| Name           | Host         | Status          | Container Id | Tag                          |
+----------------+--------------+-----------------+--------------+------------------------------+
| haproxy        | 192.168.107.40 | Up 2 weeks      | 92875cebe689 | 1.0.2                        |
| keepalived     | 192.168.107.40 | Not Provisioned | N/A          | N/A                          |
| nats           | 192.168.107.41 | Up 2 weeks      | e365e0b794c7 | 1.0.2.1                      |
| minio          | 192.168.107.41 | Up 2 weeks      | 900c8b078059 | RELEASE.2022-11-11T03-44-20Z |
| mariadb        | 192.168.107.41 | Up 2 weeks      | c549e07c2688 | 1.0.2                        |
| opensearch     | 192.168.107.41 | Up 2 weeks      | 783204d75ba9 | 1.0.2                        |
| zookeeper      | 192.168.107.41 | Up 2 weeks      | f51138ff8a95 | 1.0.2                        |
| kafka          | 192.168.107.41 | Up 4 days       | 255020d998c9 | 1.0.2                        |
| redis          | 192.168.107.41 | Up 2 weeks      | 5d929327121d | 1.0.2                        |
| redis-sentinel | 192.168.107.41 | Up 2 weeks      | 4a5fdde49a21 | 1.0.2                        |
+----------------+--------------+-----------------+--------------+------------------------------+

Run the below RDAF command to check infra healthcheck status

rdaf infra healthcheck
+----------------+-----------------+--------+----------------------+--------------+--------------+
| Name           | Check           | Status | Reason               | Host         | Container Id |
+----------------+-----------------+--------+----------------------+--------------+--------------+
| haproxy        | Port Connection | OK     | N/A                  | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy        | Service Status  | OK     | N/A                  | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy        | Firewall Port   | OK     | N/A                  | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy        | Port Connection | OK     | N/A                  | 192.168.107.64 | 91c361ea0f58 |
| haproxy        | Service Status  | OK     | N/A                  | 192.168.107.64 | 91c361ea0f58 |
| haproxy        | Firewall Port   | OK     | N/A                  | 192.168.107.64 | 91c361ea0f58 |
| keepalived     | Service Status  | OK     | N/A                  | 192.168.107.63 | N/A          |
| keepalived     | Service Status  | OK     | N/A                  | 192.168.107.64 | N/A          |
| nats           | Port Connection | OK     | N/A                  | 192.168.107.63 | f57ed825681b |
| nats           | Service Status  | OK     | N/A                  | 192.168.107.63 | f57ed825681b |
+----------------+-----------------+--------+----------------------+--------------+--------------+
  • Run the below python upgrade script. It applies the below configuration & settings.

    • Create kafka topics and configure the topic message max size to 8mb
    • Create kafka-external user in config.json.
    • Add new alert-processor companion service settings in values.yaml
    • Configure and apply security index purge policy for Opensearch

Important

Take a backup of /opt/rdaf/deployment-scripts/values.yaml before running the below upgrade script.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9/rdaf_upgrade_118_119.py
python rdaf_upgrade_118_119.py

Important

Make sure above upgrade script is executed before moving to next step.

2.3.2. Download the new Docker Images

Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.

rdaf registry fetch --tag 3.2.2,7.2.2

Run the below command to verify above mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.

rdaf registry list-tags 

Make sure 3.2.2 image tag is downloaded for the below RDAF Platform services.

  • rda-client-api-server
  • rda-registry
  • rda-rda-scheduler
  • rda-collector
  • rda-stack-mgr
  • rda-identity
  • rda-fsm
  • rda-access-manager
  • rda-resource-manager
  • rda-user-preferences
  • onprem-portal
  • onprem-portal-nginx
  • rda-worker-all
  • onprem-portal-dbinit
  • cfxdx-nb-nginx-all
  • rda-event-gateway
  • rdac
  • rdac-full

Make sure 7.2.2 image tag is downloaded for the below RDAF OIA Application services.

  • rda-app-controller
  • rda-alert-processor
  • rda-file-browser
  • rda-smtp-server
  • rda-ingestion-tracker
  • rda-reports-registry
  • rda-ml-config
  • rda-event-consumer
  • rda-webhook-server
  • rda-irm-service
  • rda-alert-ingester
  • rda-collaboration
  • rda-notification-service
  • rda-configuration-service

Downloaded Docker images are stored under the below path.

/opt/rdaf/data/docker/registry/v2

Run the below command to check the filesystem's disk usage on which docker images are stored.

df -h /opt

Optionally, if required, older image tags that are no longer used can be deleted to free up disk space using the below command.

rdaf registry delete-images --tag <tag1,tag2>
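
For example, to remove two hypothetical older tags that are no longer in use (substitute the tags that actually apply to your deployment):

rdaf registry delete-images --tag 3.2.0,7.2.0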

2.3.3. Upgrade Services

2.3.3.1 Upgrade RDAF Platform Services

Run the below command to initiate upgrading RDAF Platform services.

rdaf platform upgrade --tag 3.2.2

Wait till all of the new platform services are in the Running state, then run the below command to verify their status and make sure all of them are running with the 3.2.2 version.

rdaf platform status

+----------------------+----------------+------------+--------------+-------+
| Name                 | Host           | Status     | Container Id | Tag   |
+----------------------+----------------+------------+--------------+-------+
| rda_api_server       | 192.168.107.60 | Up 4 hours | 0da7ebeadceb | 3.2.2 |
| rda_registry         | 192.168.107.60 | Up 4 hours | 841a4e03447d | 3.2.2 |
| rda_scheduler        | 192.168.107.60 | Up 4 hours | 806af221a299 | 3.2.2 |
| rda_collector        | 192.168.107.60 | Up 4 hours | 9ae8da4d2182 | 3.2.2 |
| rda_asset_dependency | 192.168.107.60 | Up 4 hours | e96cf642b2d6 | 3.2.2 |
| rda_identity         | 192.168.107.60 | Up 4 hours | 2a57ce63a756 | 3.2.2 |
| rda_fsm              | 192.168.107.60 | Up 4 hours | 2b645a75b5f0 | 3.2.2 |
+----------------------+----------------+------------+--------------+-------+
2.3.3.2 Upgrade rdac CLI

Run the below command to upgrade the rdac CLI

rdaf rdac_cli upgrade --tag 3.2.2

Run the below command to verify that one of the scheduler services is elected as a leader under the Site column.

rdac pods
+-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host         | ID       | Site        | Age     |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| App   | fsm                                    | True        | 8b5dfca4cce9 | c0a8bbd7 |             | 7:33:16 |      8 |        31.21 |               |              |
| App   | ingestion-tracker                      | True        | d37e78507693 | e1bd1405 |             | 7:21:16 |      8 |        31.21 |               |              |
| App   | ml-config                              | True        | 0c73604632bc | 65594689 |             | 7:22:02 |      8 |        31.21 |               |              |
| App   | reports-registry                       | True        | be82a9e704a2 | 567f1275 |             | 7:25:23 |      8 |        31.21 |               |              |
| App   | smtp-server                            | True        | 08a8dd347660 | 06242bab |             | 7:23:35 |      8 |        31.21 |               |              |
| App   | user-preferences                       | True        | fc7a4a5a0591 | 53dce7ca |             | 7:32:25 |      8 |        31.21 |               |              |
| App   | webhook-server                         | True        | 20a2afb33b6c | fdb1eb21 |             | 7:23:53 |      8 |        31.21 |               |              |
| Infra | api-server                             | True        | b1e7105b231e | 33f6ed2c |             | 2:04:53 |      8 |        31.21 |               |              |
| Infra | collector                              | True        | f5abb5cac9a5 | eb17ce02 |             | 3:50:51 |      8 |        31.21 |               |              |
| Infra | registry                               | True        | ce73263c7828 | 8cda9974 |             | 7:34:05 |      8 |        31.21 |               |              |
| Infra | scheduler                              | True        | d9d62c1f1bb7 | 96047389 | *leader*    | 7:33:59 |      8 |        31.21 |               |              |
| Infra | worker                                 | True        | ba1198f05f6b | afd229a8 | rda-site-01 | 7:26:20 |      8 |        31.21 | 7             | 109          |
+-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------+

Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=3, Brokers=[2, 3, 1] |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=1, Brokers=[2, 3, 1] |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.3.3.3 Upgrade RDA Worker Services

Run the below command to initiate upgrading the RDA worker service(s).

Tip

If the RDA worker is deployed in an HTTP proxy environment, add the required environment variables for the HTTP proxy settings in /opt/rdaf/deployment-scripts/values.yaml under the rda_worker section. Below is a sample HTTP proxy configuration for the worker service.

rda_worker:
    mem_limit: 8G
    memswap_limit: 8G
    privileged: false
    environment:
      RDA_ENABLE_TRACES: 'no'
      RDA_SELF_HEALTH_RESTART_AFTER_FAILURES: 3
      http_proxy: "http://user:password@192.168.122.107:3128"
      https_proxy: "http://user:password@192.168.122.107:3128"
      HTTP_PROXY: "http://user:password@192.168.122.107:3128"
      HTTPS_PROXY: "http://user:password@192.168.122.107:3128
rdaf worker upgrade --tag 3.2.2

Wait for 120 seconds to let the newer version of RDA worker services join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA worker services.

rdac pods | grep worker
rdaf worker status
+------------+--------------+------------+--------------+-------+
| Name       | Host         | Status     | Container Id | Tag   |
+------------+--------------+------------+--------------+-------+
| rda_worker | 192.168.107.60 | Up 4 hours | d968c908d3e3 | 3.2.2 |
+------------+--------------+------------+--------------+-------+
2.3.3.4 Upgrade OIA/AIA Application Services

Run the below commands to initiate upgrading RDAF OIA/AIA Application services

rdaf app upgrade OIA/AIA --tag 7.2.2

Wait till all of the new OIA/AIA application services are in the Running state, then run the below command to verify their status and make sure they are running with the 7.2.2 version. Check that the new service cfx-rda-alert-processor-companion is deployed and that all OIA/AIA services are up with the new tag.

rdaf app status

+-----------------------------------+--------------+------------+--------------+-------+
| Name                              | Host         | Status     | Container Id | Tag   |
+-----------------------------------+--------------+------------+--------------+-------+
| cfx-rda-app-controller            | 192.168.107.60 | Up 3 hours | 017692a218b8 | 7.2.2 |
| cfx-rda-reports-registry          | 192.168.107.60 | Up 3 hours | be82a9e704a2 | 7.2.2 |
| cfx-rda-notification-service      | 192.168.107.60 | Up 3 hours | 42d3c8c4861c | 7.2.2 |
| cfx-rda-file-browser              | 192.168.107.60 | Up 3 hours | 46b9dedab4b0 | 7.2.2 |
| cfx-rda-configuration-service     | 192.168.107.60 | Up 3 hours | 6bef9741ff46 | 7.2.2 |
| cfx-rda-alert-ingester            | 192.168.107.60 | Up 3 hours | 13975b9efe7d | 7.2.2 |
| cfx-rda-webhook-server            | 192.168.107.60 | Up 3 hours | 20a2afb33b6c | 7.2.2 |
| cfx-rda-smtp-server               | 192.168.107.60 | Up 3 hours | 08a8dd347660 | 7.2.2 |
| cfx-rda-event-consumer            | 192.168.107.60 | Up 3 hours | b0b62c88064a | 7.2.2 |
| cfx-rda-alert-processor           | 192.168.107.60 | Up 3 hours | ab24dcbd6e3a | 7.2.2 |
| cfx-rda-irm-service               | 192.168.107.60 | Up 3 hours | 11c92a206eaa | 7.2.2 |
| cfx-rda-ml-config                 | 192.168.107.60 | Up 3 hours | 0c73604632bc | 7.2.2 |
| cfx-rda-collaboration             | 192.168.107.60 | Up 3 hours | a5cfe5b681bb | 7.2.2 |
| cfx-rda-ingestion-tracker         | 192.168.107.60 | Up 3 hours | d37e78507693 | 7.2.2 |
| cfx-rda-alert-processor-companion | 192.168.107.60 | Up 3 hours | b74d82710af9 | 7.2.2 |
+-----------------------------------+--------------+------------+--------------+-------+

Run the below command to verify that one of the cfxdimensions-app-irm_service instances is elected as a leader under the Site column.

rdac pods

+-------+----------------------------------------+-------------+--------------+----------+-------------+----------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host         | ID       | Site        | Age      |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+--------------+----------+-------------+----------+--------+--------------+---------------+--------------|
| App   | alert-ingester                         | True        | 13975b9efe7d | dd32fdef |             | 12:07:37 |      8 |        31.21 |               |              |
| App   | alert-processor                        | True        | ab24dcbd6e3a | a980d44e |             | 12:06:10 |      8 |        31.21 |               |              |
| App   | alert-processor-companion              | True        | b74d82710af9 | 8f37b360 |             | 12:04:19 |      8 |        31.21 |               |              |
| App   | asset-dependency                       | True        | 83c5d941f3a6 | f17cc305 |             | 12:16:59 |      8 |        31.21 |               |              |
| App   | authenticator                          | True        | fb82e1664219 | b6f19086 |             | 12:16:47 |      8 |        31.21 |               |              |
| App   | cfx-app-controller                     | True        | 017692a218b8 | 55015d69 |             | 12:09:04 |      8 |        31.21 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | 87871b87d45e | b0465aa5 |             | 12:16:19 |      8 |        31.21 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | a5cfe5b681bb | c5b40c98 |             | 12:05:05 |      8 |        31.21 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | 46b9dedab4b0 | 3bcc6bc5 |             | 12:08:13 |      8 |        31.21 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | 11c92a206eaa | 851f07b7 | *leader*    | 12:05:48 |      8 |        31.21 |               |              |
| App   | cfxdimensions-app-notification-service | True        | 42d3c8c4861c | 891ab559 |             | 12:08:31 |      8 |        31.21 |               |              |
| App   | cfxdimensions-app-resource-manager     | True        | a35dd8127434 | 29b57c51 |             | 12:16:08 |      8 |        31.21 |               |              |
+-------+----------------------------------------+-------------+--------------+----------+-------------+----------+--------+--------------+---------------+--------------+

Run the below command to check that all RDA worker services have an ok status and do not report any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                               |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------|
| rda_app   | alert-ingester                         | 13975b9efe7d | dd32fdef |             | service-status                                      | ok       |                                                       |
| rda_app   | alert-ingester                         | 13975b9efe7d | dd32fdef |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | alert-ingester                         | 13975b9efe7d | dd32fdef |             | service-dependency:configuration-service            | ok       | 1 pod(s) found for configuration-service              |
| rda_app   | alert-ingester                         | 13975b9efe7d | dd32fdef |             | service-initialization-status                       | ok       |                                                       |
| rda_app   | alert-ingester                         | 13975b9efe7d | dd32fdef |             | kafka-connectivity                                  | ok       | Cluster=oDO7X5AZTh-78HgTt0WbrA, Broker=1, Brokers=[1] |
| rda_app   | alert-processor                        | ab24dcbd6e3a | a980d44e |             | service-status                                      | ok       |                                                       |
| rda_app   | alert-processor                        | ab24dcbd6e3a | a980d44e |             | minio-connectivity                                  | ok       |                                                       |
| rda_app   | alert-processor                        | ab24dcbd6e3a | a980d44e |             | service-dependency:cfx-app-controller               | ok       | 1 pod(s) found for cfx-app-controller                 |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+

Note

Run the rdaf prune_images command to clean up old docker images.

2.3.4. Post Upgrade Steps

1. Download the script from the below path to migrate the UI-Icon URL from private to public.

Tip

Running this script is an optional step to perform only if (1) white labeling customization was done on the Login page with an uploaded image before the version upgrade, and (2) the custom image is no longer showing up on the Login page after the upgrade.

wget https://macaw-amer.s3.amazonaws.com/releases/RDA/3.2.2/iconlib_migration_script.py
  • Copy the above script to the rda_identity platform service container. Run the below command to get the container-id for rda_identity and the host IP on which it is running.

rdaf platform status
+--------------------------+--------------+------------+--------------+-------+
| Name                     | Host         | Status     | Container Id | Tag   |
+--------------------------+--------------+------------+--------------+-------+
| rda_api_server           | 192.168.107.40 | Up 7 hours | 6540e670fdb9 | 3.2.2 |
| rda_registry             | 192.168.107.40 | Up 7 hours | 98bead8d0599 | 3.2.2 |
....
| rda_identity             | 192.168.107.40 | Up 7 hours | 67940390d61f | 3.2.2 |
| rda_fsm                  | 192.168.107.40 | Up 7 hours | a96870b5f32f | 3.2.2 |
| cfx-rda-access-manager   | 192.168.107.40 | Up 7 hours | 33362030fe52 | 3.2.2 |
| cfx-rda-resource-manager | 192.168.107.40 | Up 7 hours | 54a082b754be | 3.2.2 |
| cfx-rda-user-preferences | 192.168.107.40 | Up 7 hours | 829f95d98741 | 3.2.2 |
+--------------------------+--------------+------------+--------------+-------+

  • Log in as rdauser (using SSH) to the host on which the rda_identity service is running and run the below command to copy the downloaded script.
docker cp /home/rdauser/iconlib_migration_script.py <rda_identity_container_id>:/tmp
  • Run the below command to switch into rda_identity service's container shell.
docker exec -it <rda_identity_container_id> bash
  • Execute below command to migrate the customer branding (white labelling) changes.
python /tmp/iconlib_migration_script.py

2. Deploy the latest l1&l2 bundles. Go to Configuration --> RDA Administration --> Bundles --> select oia_l1_l2_bundle and click on the deploy action at the row level.

3. Enable ML experiments manually if any experiments are configured (Organization --> Configuration --> Machine Learning)

4. FSM Installation Steps (applicable only for the Remedy ITSM ticketing deployment)

a) Update the Team configuration that was created for ITSM ticketing (Team with Source 'Others'). Include the following content in the JSON editor of the Team's configuration. Adjust or add alert sources and execution delay as necessary.

[
  {
    "alert_source": "SNMP",
    "execution_delay": 900,
    "auto_share": {
      "create": true,
      "update": true,
      "close": true,
      "resolved": true,
      "cancel": true,
      "alert_count_changes": true
    }
  },
  {
    "alert_source": "Syslog",
    "execution_delay": 900,
    "auto_share": {
      "create": true,
      "update": true,
      "close": true,
      "resolved": true,
      "cancel": true,
      "alert_count_changes": true
    }
  }
]

b) Download and update the latest FSM model: Configuration --> RDA Administration --> FSM Models

Important

Take a backup of the existing model before updating.

https://macaw-amer.s3.amazonaws.com/releases/RDA/3.2.2/oia_ticketing_with_soothing_interval.yml

c) Add formatting templates: Configuration --> RDA Administration --> Formatting Templates

  • snow-notes-template
{% for r in rows %}
    <b>Message</b> : {{r.a_message}} <br>
    <b>RaisedAt</b> : {{r.a_raised_ts}} <br>
    <b>UpdatedAt</b> : {{r.a_updated_ts}} <br>
    <b>Status</b> : {{r.a_status}} <br>
    <b>AssetName</b> : {{r.a_asset_name}} <br>
    <b>AssetType</b> : {{r.a_asset_type}} <br>
    <b>RepeatCount</b> : {{r.a_repeat_count}} <br>
    <b>Action</b> : {{r.action_name}} <br>
    <br><br>
{%endfor%}
  • snow-description-template
Description : {{i_description}}

d) Deploy FSM bundles

fsm_events_kafka_publisher_bundles, oia_fsm_aots_ticketing_bundle, oia_fsm_common_ticketing_bundles

e) Create 'fsm-debug-outbound-ticketing' and 'aots_ticket_notifications' PStreams from the UI if they do not already exist, using the below attributes.

{
    "case_insensitive": true,
    "retention_days": 7
}

f) Enable Service Blueprints - Read Alert Stream, Read Incident Stream, Create Ticket, Update Ticket, Resolve Ticket, Read AOTS Inbound Notifications

2.4. Upgrade from 7.2.1.x to 7.2.2.1

2.4.1. Pre-requisites

Below are the pre-requisites which need to be in place before upgrading the OIA (AIOps) application services.

  • RDAF Deployment CLI Version: 1.1.8

  • RDAF Infrastructure Services Tag Version: 1.0.2,1.0.2.1(nats)

  • RDAF Core Platform & Worker Services Tag Version: 3.2.1.3

  • RDAF Client (RDAC) Tag Version: 3.2.1 / 3.2.1.x

  • OIA Services Tag Version: 7.2.1.1/7.2.1.5/7.2.1.6

  • CloudFabrix recommends taking VMware VM snapshots where RDA Fabric platform/applications are deployed

Warning

Make sure all of the above pre-requisites are met before proceeding with the upgrade process.

Warning

Upgrading RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading RDAF Platform and AIOps services to the newer version.

Important

Please make sure full backup of the RDAF platform system is completed before performing the upgrade.

Kubernetes: Please run the below backup command to take the backup of application data.

rdafk8s backup --dest-dir <backup-dir>

Non-Kubernetes: Please run the below backup command to take the backup of application data.

rdaf backup --dest-dir <backup-dir>

Note: Please make sure the shared backup-dir is NFS mounted across all RDA Fabric Virtual Machines.
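
For example, assuming a hypothetical NFS share mounted at /mnt/rdaf-backup on all RDA Fabric VMs, the Kubernetes variant of the backup command would be:

rdafk8s backup --dest-dir /mnt/rdaf-backup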

Run the below K8s commands and make sure the Kubernetes PODs are NOT in restarting mode (applicable only to Kubernetes environments).

kubectl get pods -n rda-fabric -l app_category=rdaf-infra
kubectl get pods -n rda-fabric -l app_category=rdaf-platform
kubectl get pods -n rda-fabric -l app_component=rda-worker 
kubectl get pods -n rda-fabric -l app_name=oia 

RDAF Deployment CLI Upgrade:

Please follow the below given steps.

Note

Upgrade RDAF Deployment CLI on both on-premise docker registry VM and RDAF Platform's management VM if provisioned separately.

  • Run the below command to verify that the current version of the RDAF CLI is 1.1.8.
rdafk8s -v
  • Download the RDAF Deployment CLI's newer version 1.1.9.1 bundle
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9/rdafcli-1.1.9.1.tar.gz
  • Upgrade the rdaf CLI to version 1.1.9.1
pip install --user rdafcli-1.1.9.1.tar.gz
  • Verify the installed rdaf CLI version is upgraded to 1.1.9.1
    rdafk8s -v
    rdaf -v
    
  • Download the RDAF Deployment CLI's newer version 1.1.9.1 bundle and copy it to RDAF management VM on which rdaf deployment CLI was installed.
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9/offline-rhel-1.1.9.1.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-rhel-1.1.9.1.tar.gz
  • Change the directory to the extracted directory
cd offline-rhel-1.1.9.1
  • Upgrade the rdafCLI to version 1.1.9.1
pip install --user rdafcli-1.1.9.1.tar.gz -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version
  • Download the RDAF Deployment CLI's newer version 1.1.9.1 bundle and copy it to RDAF management VM on which rdaf deployment CLI was installed.
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9/offline-ubuntu-1.1.9.1.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.1.9.1.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.1.9.1
  • Upgrade the rdafCLI to version 1.1.9.1
pip install --user rdafcli-1.1.9.1.tar.gz -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version

2.4.2. Download the new Docker Images

Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.

rdaf registry fetch --tag 3.2.2.1,7.2.2.1

Run the below command to verify above mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.

rdaf registry list-tags 

Please make sure 3.2.2.1 image tag is downloaded for the below RDAF Platform services.

  • rda-client-api-server
  • rda-registry
  • rda-rda-scheduler
  • rda-collector
  • rda-stack-mgr
  • rda-identity
  • rda-fsm
  • rda-access-manager
  • rda-resource-manager
  • rda-user-preferences
  • onprem-portal
  • onprem-portal-nginx
  • rda-worker-all
  • onprem-portal-dbinit
  • cfxdx-nb-nginx-all
  • rda-event-gateway
  • rdac
  • rdac-full

Please make sure 7.2.2.1 image tag is downloaded for the below RDAF OIA Application services.

  • rda-app-controller
  • rda-alert-processor
  • rda-file-browser
  • rda-smtp-server
  • rda-ingestion-tracker
  • rda-reports-registry
  • rda-ml-config
  • rda-event-consumer
  • rda-webhook-server
  • rda-irm-service
  • rda-alert-ingester
  • rda-collaboration
  • rda-notification-service
  • rda-configuration-service

Downloaded Docker images are stored under the below path.

/opt/rdaf/data/docker/registry/v2

Run the below command to check the filesystem's disk usage on which docker images are stored.

df -h /opt

Optionally, if required, older image tags that are no longer used can be deleted to free up disk space using the below command.

rdaf registry delete-images --tag <tag1,tag2>

2.4.3. Upgrade Services

2.4.3.1 Upgrade RDAF Infra Services

Download the below upgrade script and copy it to RDAF management VM on which rdaf deployment CLI was installed.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9/rdaf_k8s_upgrade_118_119_1.py

Please run the downloaded upgrade script. It configures and applies the below changes.

  • Creates a new Kafka user specifically for Kafka topics that need to be exposed to external systems to publish data such as events, alerts, or notifications.
  • Updates the /opt/rdaf/config/network_config/config.json file with newly created Kafka user's credentials.
  • Creates and applies lifecycle management policy for Opensearch's default security audit logs index to purge the older data. It is configured to purge the data that is older than 15 days.
  • Updates the /opt/rdaf/deployment-scripts/values.yaml file to add support for the new alert processor companion service. It also updates the rda-worker service configuration to attach a new persistent volume. The persistent volume is created from the local host's directory path /opt/rdaf/config/worker/rda_packages on the host on which the rda-worker service is running.

python rdaf_k8s_upgrade_118_119_1.py

Important

Please make sure above upgrade script is executed before moving to next step.

  • Update kafka-values.yaml with below parameters.

Tip

  • The upgrade script generates a kafka-values.yaml.latest file in the /opt/rdaf/deployment-scripts/ directory which has the updated configuration.
  • Please take a backup of the kafka-values.yaml file before making changes.
    cp /opt/rdaf/deployment-scripts/kafka-values.yaml /opt/rdaf/deployment-scripts/kafka-values.yaml.bak
    
  • Please skip the changes if the current kafka-values.yaml file already has below mentioned parameters.

Edit kafka-values.yaml file.

vi /opt/rdaf/deployment-scripts/kafka-values.yaml

Find the below parameter and delete it if it exists.

autoCreateTopicsEnable: false
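
If preferred, the line can also be removed non-interactively with a standard sed one-liner (shown only as a convenience; review the file afterwards):

sed -i '/autoCreateTopicsEnable/d' /opt/rdaf/deployment-scripts/kafka-values.yaml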

Add the below highlighted parameters. Please skip if these are already configured.

global:
  imagePullSecrets:
  - cfxregistry-cred
image:
  registry: 192.168.10.10:5000
  repository: rda-platform-kafka
  tag: 1.0.2
  pullPolicy: Always
heapOpts: -Xmx2048m -Xms2048m
defaultReplicationFactor: 3
offsetsTopicReplicationFactor: 3
transactionStateLogReplicationFactor: 3
transactionStateLogMinIsr: 2
maxMessageBytes: '8399093'
numPartitions: 15
externalAccess:
  enabled: true
  autoDiscovery:
    enabled: true
  service:
    type: NodePort
    nodePorts:
    - 31252
    - 31533
    - 31964
serviceAccount:
  create: true
rbac:
  create: true
authorizerClassName: kafka.security.authorizer.AclAuthorizer
logRetentionHours: 24
allowEveryoneIfNoAclFound: true

Apply above configuration changes to kafka infra service.

rdafk8s infra upgrade --tag 1.0.2 --service kafka

After upgrading the RDAF Kafka service using the above command, run the below command to verify its running status.

kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep -i kafka
  • Please wait till all of the Kafka service pods are in Running state.
rdafk8s infra status
  • Please make sure all infra services are in Running state before moving to next section.
kubectl get pods -n rda-fabric -l app_category=rdaf-infra
  • Additionally, please run the below command to make sure there are no errors with RDA Fabric services.
rdac healthcheck
2.4.3.2 Upgrade RDAF Platform Services

Step-1: Run the below command to initiate upgrading RDAF Platform services.

rdafk8s platform upgrade --tag 3.2.2.1

As the upgrade procedure is non-disruptive, it puts the currently running PODs into the Terminating state and the newer version PODs into the Pending state.

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each Platform service is in the Terminating state. (Note: If a POD is in the ContainerCreating state, please wait until it has transitioned into the Terminating state.)

kubectl get pods -n rda-fabric -l app_category=rdaf-platform

Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It lists the POD IDs of the platform services along with the rdac maintenance command that needs to be run to put them into maintenance mode.

python maint_command.py

Note

If the maint_command.py script doesn't exist on the RDAF deployment CLI VM, it can be downloaded using the below command.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.6/maint_command.py

Step-4: Copy and paste the rdac maintenance command as shown below.

rdac maintenance start --ids <comma-separated-list-of-platform-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.

rdac pods --show_maintenance | grep False

Warning

Wait for 120 seconds before executing Step-6.

Step-6: Run the below command to delete the Terminating RDAF platform service PODs

for i in `kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done

Note

Repeat the above steps (Step-2 to Step-6) for the rest of the RDAF Platform service PODs.

Please wait till all of the new platform service PODs are in the Running state, then run the below command to verify their status and make sure all of them are running with the 3.2.2.1 version.

rdafk8s platform status
+----------------------+----------------+-----------------+--------------+-------------+
| Name                 | Host           | Status          | Container Id | Tag         |
+----------------------+----------------+-----------------+--------------+-------------+
| rda-api-server       | 192.168.131.45 | Up 19 Hours ago | 4d5adbbf954b | 3.2.2.1     |
| rda-api-server       | 192.168.131.44 | Up 19 Hours ago | 2c58bccaf38d | 3.2.2.1     |
| rda-registry         | 192.168.131.44 | Up 20 Hours ago | 408a4ddcc685 | 3.2.2.1     |
| rda-registry         | 192.168.131.45 | Up 20 Hours ago | 4f01fc820585 | 3.2.2.1     |
| rda-identity         | 192.168.131.44 | Up 20 Hours ago | bdd1e91f86ec | 3.2.2.1     |
| rda-identity         | 192.168.131.45 | Up 20 Hours ago | e63af9c6e9d9 | 3.2.2.1     |
| rda-fsm              | 192.168.131.45 | Up 20 Hours ago | 3ec246cf7edd | 3.2.2.1     |
+----------------------+----------------+-----------------+--------------+-------------+

Run the below command to check that one of the rda-scheduler services is elected as a leader under the Site column.

rdac pods
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age      |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------|
| Infra | api-server                             | True        | rda-api-server | 35a17877 |             | 20:15:37 |      8 |        31.33 |               |              |
| Infra | api-server                             | True        | rda-api-server | 8f678e25 |             | 20:14:39 |      8 |        31.33 |               |              |
| Infra | collector                              | True        | rda-collector- | 17ce190d |             | 20:47:41 |      8 |        31.33 |               |              |
| Infra | collector                              | True        | rda-collector- | 6b91bf23 |             | 20:47:22 |      8 |        31.33 |               |              |
| Infra | registry                               | True        | rda-registry-5 | 4ee8ef7d |             | 20:48:20 |      8 |        31.33 |               |              |
| Infra | registry                               | True        | rda-registry-5 | 895b7f5c |             | 20:47:39 |      8 |        31.33 |               |              |
| Infra | scheduler                              | True        | rda-scheduler- | ab79ba8d |             | 20:47:43 |      8 |        31.33 |               |              |
| Infra | scheduler                              | True        | rda-scheduler- | f2cefc92 | *leader*    | 20:47:23 |      8 |        31.33 |               |              |
| Infra | worker                                 | True        | rda-worker-df5 | e2174794 | rda-site-01 | 20:28:50 |      8 |        31.33 | 1             | 97           |
| Infra | worker                                 | True        | rda-worker-df5 | 6debca1d | rda-site-01 | 20:26:08 |      8 |        31.33 | 2             | 91           |
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+

Run the below command to check that all services have an ok status and do not throw any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | rda-alert-in | 1afb8c8d |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 1afb8c8d |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 1afb8c8d |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | rda-alert-in | 1afb8c8d |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 1afb8c8d |             | kafka-connectivity                                  | ok       | Cluster=nzyeX9qkR-ChWXC0fRvSyQ, Broker=0, Brokers=[0, 2, 1] |
| rda_app   | alert-ingester                         | rda-alert-in | 5751f199 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 5751f199 |             | minio-connectivity                                  | ok       |                                                             |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.4.3.3 Upgrade RDAC cli

Run the below command to upgrade the rdac CLI

rdafk8s rdac_cli upgrade --tag 3.2.2.1
2.4.3.4 Upgrade RDA Worker Services

Step-1: Please run the below command to initiate upgrading the RDA Worker service PODs.

Tip

If the RDA worker is deployed in an http proxy environment, add the required environment variables for http proxy settings in /opt/rdaf/deployment-scripts/values.yaml under the rda_worker section. Below is a sample http proxy configuration for the worker service.

rda_worker:
    mem_limit: 8G
    memswap_limit: 8G
    privileged: false
    environment:
      RDA_ENABLE_TRACES: 'no'
      RDA_SELF_HEALTH_RESTART_AFTER_FAILURES: 3
    extraEnvs:
      - name: http_proxy
        value: "http://user:password@192.168.122.107:3128"
      - name: https_proxy
        value: "http://user:password@192.168.122.107:3128"
      - name: HTTP_PROXY
        value: "http://user:password@192.168.122.107:3128"
      - name: HTTPS_PROXY
        value: "http://user:password@192.168.122.107:3128"
rdafk8s worker upgrade --tag 3.2.2.1

Step-2: Run the below command to check the status of the existing and newer worker PODs and make sure at least one instance of each RDA Worker service POD is in Terminating state.

(Note: Please wait if a POD is in ContainerCreating state until it is transitioned into Terminating state.)

kubectl get pods -n rda-fabric -l app_component=rda-worker

Step-3: Run the below command to put all Terminating RDAF worker service PODs into maintenance mode. It lists the POD IDs of the RDA worker services that need to be put into maintenance mode, along with the rdac maintenance command to run.

python maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-worker-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the RDAF worker services.

rdac pods --show_maintenance | grep False

Warning

Wait for 120 seconds before executing Step-6.

Step-6: Run the below command to delete the Terminating RDAF worker service PODs

for i in `kubectl get pods -n rda-fabric -l app_component=rda-worker | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done

Note

Repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Worker service PODs.

Please wait till all the new worker service pods are in Running state.

Step-7: Please wait for 120 seconds to let the newer version of RDA Worker service PODs join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service PODs.

rdafk8s worker status
+------------+----------------+----------------+--------------+-------------+
| Name       | Host           | Status         | Container Id | Tag         |
+------------+----------------+----------------+--------------+-------------+
| rda-worker | 192.168.131.44 | Up 6 Hours ago | eb679ed8a6c6 | 3.2.2.1 |
| rda-worker | 192.168.131.45 | Up 6 Hours ago | a3356b168c50 | 3.2.2.1 |
|            |                |                |              |             |
+------------+----------------+----------------+--------------+-------------+
rdac pods | grep rda-worker
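
If the http proxy variables from the earlier Tip were added, you can optionally confirm that they are present inside one of the new worker pods. This is a sketch; <rda-worker-pod-name> is a placeholder for one of the pod names listed by the kubectl command above.

# Print only the proxy-related environment variables from a worker pod
kubectl exec -n rda-fabric <rda-worker-pod-name> -- env | grep -i proxy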

Step-8: Run the below command to check that all RDA Worker services have an ok status and do not throw any failure messages.

rdac healthcheck
2.4.3.5 Upgrade OIA Application Services

Step-1: Run the below commands to initiate upgrading RDAF OIA Application services.

rdafk8s app upgrade OIA --tag 7.2.2.1

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each OIA application service is in Terminating state.

(Note: Please wait if a POD is in ContainerCreating state until it is transitioned into Terminating state.)

kubectl get pods -n rda-fabric -l app_name=oia

Step-3: Run the below command to put all Terminating OIA application service PODs into maintenance mode. It lists the POD IDs of the OIA application services that need to be put into maintenance mode, along with the rdac maintenance command to run.

python maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-oia-app-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the OIA application services.

rdac pods --show_maintenance | grep False

Warning

Wait for 120 seconds before executing Step-6.

Step-6: Run the below command to delete the Terminating OIA application service PODs

for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
kubectl get pods -n rda-fabric -l app_name=oia

Note

Repeat the above steps from Step-2 to Step-6 for the rest of the OIA application service PODs.

Please wait till all of the new OIA application service PODs are in Running state and run the below command to verify their status and make sure they are running with 7.2.2.1 version.

rdafk8s app status
+-------------------------------+----------------+-----------------+--------------+-----------------+
| Name                          | Host           | Status          | Container Id | Tag             |
+-------------------------------+--------------+-----------------+--------------+-------------------+
| rda-alert-ingester            | 192.168.131.50 | Up 1 Days ago   | a400c11be238 | 7.2.2.1     |
| rda-alert-ingester            | 192.168.131.49 | Up 1 Days ago   | 5187d5a093a5 | 7.2.2.1     |
| rda-alert-processor           | 192.168.131.46 | Up 1 Days ago   | 34901aba5e7d | 7.2.2.1     |
| rda-alert-processor           | 192.168.131.47 | Up 1 Days ago   | e6fe0aa7ffe4 | 7.2.2.1     |
| rda-alert-processor-companion | 192.168.131.50 | Up 1 Days ago   | 8e3cc2f3b252 | 7.2.2.1     |
| rda-alert-processor-companion | 192.168.131.49 | Up 1 Days ago   | 4237fb52031c | 7.2.2.1     |
| rda-app-controller            | 192.168.131.47 | Up 1 Days ago   | fbe360d13fa3 | 7.2.2.1     |
| rda-app-controller            | 192.168.131.46 | Up 1 Days ago   | 8346f5c69e7b | 7.2.2.1     |
+-------------------------------+----------------+-----------------+--------------+-----------------+

Step-7: Run the below command to verify all OIA application services are up and running. Please wait till the cfxdimensions-app-irm_service has leader status under Site column.

rdac pods

Run the below command to check that all services have an ok status and do not throw any failure messages.

rdac healthcheck
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age      |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------|
| App   | cfxdimensions-app-collaboration        | True        | rda-collaborat | ba007878 |             | 22:57:58 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | rda-file-brows | bf349af7 |             | 23:00:54 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | rda-file-brows | 46c7c2dc |             | 22:52:17 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | rda-irm-servic | 34698062 |             | 23:00:23 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | rda-irm-servic | b824b35b | *leader*    | 22:50:33 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | rda-notificati | 73d2c7f9 |             | 23:01:23 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | rda-notificati | bac009ba |             | 22:59:05 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-resource-manager     | True        | rda-resource-m | 3e164b71 |             | 23:25:24 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-resource-manager     | True        | rda-resource-m | dba599c6 |             | 23:25:00 |      8 |        31.33 |               |              |
| App   | configuration-service                  | True        | rda-configurat | dd7ec9d9 |             | 5:46:22  |      8 |        31.33 |               |              |
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+

2.4.4 Post Installation Steps

  • Deploy the latest L1 & L2 bundles. Go to Configuration --> RDA Administration --> Bundles, select oia_l1_l2_bundle and click on the Deploy action.

  • Download the script from the below path to migrate the UI icon URL from private to public.

Tip

Running this script is an optional step, needed only if (1) white-labeling customization was done on the Login page with an uploaded image before the version upgrade, and (2) the custom image is no longer showing up on the Login page after the upgrade.

wget https://macaw-amer.s3.amazonaws.com/releases/RDA/3.2.2/iconlib_migration_script.py
  • Copy the above script to the rda_identity platform service container. Run the below command to get the container-id for rda_identity and the host IP on which it is running.
rdafk8s platform status
rdaf platform status
+--------------------------+----------------+------------+--------------+---------+
| Name                     | Host           | Status     | Container Id | Tag     |
+--------------------------+----------------+------------+--------------+---------+
| rda_api_server           | 192.168.107.40 | Up 7 hours | 6540e670fdb9 | 3.2.2.1 |
| rda_registry             | 192.168.107.40 | Up 7 hours | 98bead8d0599 | 3.2.2.1 |
....
| rda_identity             | 192.168.107.40 | Up 7 hours | 67940390d61f | 3.2.2.1 |
| rda_fsm                  | 192.168.107.40 | Up 7 hours | a96870b5f32f | 3.2.2.1 |
| cfx-rda-access-manager   | 192.168.107.40 | Up 7 hours | 33362030fe52 | 3.2.2.1 |
| cfx-rda-resource-manager | 192.168.107.40 | Up 7 hours | 54a082b754be | 3.2.2.1 |
| cfx-rda-user-preferences | 192.168.107.40 | Up 7 hours | 829f95d98741 | 3.2.2.1 |
+--------------------------+----------------+------------+--------------+---------+
  • Login as rdauser (using SSH) to the host on which the rda_identity service is running and run the below command to copy the downloaded script.
docker cp /home/rdauser/iconlib_migration_script.py <rda_identity_container_id>:/tmp
  • Run the below command to switch into rda_identity service's container shell.
docker exec -it <rda_identity_container_id> bash
  • Execute below command to migrate the customer branding (white labelling) changes.
python /tmp/iconlib_migration_script.py
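
Equivalently, the copy and the script execution can be combined without opening an interactive shell. This is a sketch only; <rda_identity_container_id> is the same placeholder used in the steps above.

# Copy the migration script into the container and run it in one shot
docker cp /home/rdauser/iconlib_migration_script.py <rda_identity_container_id>:/tmp
docker exec <rda_identity_container_id> python /tmp/iconlib_migration_script.py
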
  • In this new version (7.2.2.1), the suppression policy added support to read data from a pstream to suppress alerts. As a prerequisite for this feature to work, the pstream that is going to be used in a suppression policy should be configured with an attr_name and its value, which is used to filter the alerts to which the suppression policy applies. Additionally, the attributes start_time_utc and end_time_utc should be in ISO datetime format (for example, 2024-06-01T10:30:00Z).
{
  "attr_name": "ci_name"
}
  • This new version also added a new feature to enrich the incoming alerts using a dataset, a pstream, or both, within each alert's source mapper configuration. Below are sample configurations for reference on how to use the dataset_enrich and stream_enrich functions within the alert mapper.

Dataset based enrichment:

  • name: Dataset name
  • condition: CFXQL based condition which can be defined with one or more conditions with AND and OR between each condition. Each condition is evaluated in the specified order and it picks the enrichment value(s) for whichever condition matches.
  • enriched_columns: Specify one or more attributes to be selected as enriched attributes on above condition match. When no attribute is specified, it will pick all of the available attributes.
{
  "func": {
    "dataset_enrich": {
      "name": "nagios-host-group-members",
      "condition": "host_name is '$assetName'",
      "enriched_columns": "group_id,hostgroup_name"
    }
  }
}

Pstream based enrichment:

  • name: Pstream name
  • condition: CFXQL based condition which can be defined with one or more conditions with AND and OR between each condition. Each condition is evaluated in the specified order and it picks the enrichment value(s) for whichever condition matches.
  • enriched_columns: Specify one or more attributes to be selected as enriched attributes on above condition match. When no attribute is specified, it will pick all of the available attributes.
{
  "func": {
    "stream_enrich": {
      "name": "nagios-host-group-members",
      "condition": "host_name is '$assetName'",
      "enriched_columns": {
        "group_id": "stream_id",
        "hostgroup_name": "stream_hostgroup"
      }
    }
  }
}

2.5. Upgrade from 7.2.2.1 to 7.2.2.2

RDAF Platform: From 3.2.2.x to 3.2.2.2

OIA (AIOps) Application: From 7.2.2.x to 7.2.2.2

RDAF Deployment rdaf & rdafk8s CLI: From 1.1.9.x to 1.1.9.2

RDAF Client rdac CLI: From 3.2.2.x to 3.2.2.2

2.5.1. Prerequisites

Before proceeding with this upgrade, please verify that the below prerequisites are met.

Kubernetes (rdafk8s) deployment:

  • RDAF Deployment CLI version: 1.1.9.1

  • Infra Services tag: 1.0.2, 1.0.2.1(nats)

  • Platform Services and RDA Worker tag: 3.2.2.1

  • OIA Application Services tag: 7.2.2.1

  • CloudFabrix recommends taking VMware VM snapshots where RDA Fabric infra/platform/applications are deployed

Non-Kubernetes (rdaf) deployment:

  • RDAF Deployment CLI version: 1.1.9

  • Infra Services tag: 1.0.2,1.0.2.1(nats)

  • Platform Services and RDA Worker tag: 3.2.2

  • OIA Application Services tag: 7.2.2

  • CloudFabrix recommends taking VMware VM snapshots where RDA Fabric infra/platform/applications are deployed

Warning

Make sure all of the above pre-requisites are met before proceeding with the upgrade process.

Warning

Kubernetes: Though Kubernetes based RDA Fabric deployment supports zero downtime upgrade, it is recommended to schedule a maintenance window for upgrading RDAF Platform and AIOps services to newer version.

Non-Kubernetes: Upgrading RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.

Important

Please make sure full backup of the RDAF platform system is completed before performing the upgrade.

Kubernetes: Please run the below backup command to take the backup of application data.

rdafk8s backup --dest-dir <backup-dir>

Non-Kubernetes: Please run the below backup command to take the backup of application data.

rdaf backup --dest-dir <backup-dir>

Note: Please make sure this backup-dir is mounted across all infra and CLI VMs.
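
To confirm the shared backup directory is actually mounted before starting, a quick check like the below can be run on each infra and CLI VM. This is a sketch; replace <backup-dir> with the same directory used in the backup command.

# Verify the backup directory is a mounted filesystem with free space
df -h <backup-dir>
mount | grep <backup-dir>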

Run the below command on the RDAF Management system and make sure the Kubernetes PODs are NOT in restarting mode (applicable only to Kubernetes environments).

kubectl get pods -n rda-fabric -l app_category=rdaf-infra
kubectl get pods -n rda-fabric -l app_category=rdaf-platform
kubectl get pods -n rda-fabric -l app_component=rda-worker 
kubectl get pods -n rda-fabric -l app_name=oia 
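
Pods stuck in a restart loop show a non-zero and increasing RESTARTS count. Sorting by restart count makes such pods easy to spot; this is a sketch covering all RDA Fabric pods in the namespace.

# List pods ordered by container restart count (highest last)
kubectl get pods -n rda-fabric --sort-by='.status.containerStatuses[0].restartCount'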

  • Verify that the RDAF deployment rdaf CLI version is 1.1.9 (or the rdafk8s CLI version is 1.1.9.1) on the VM where the CLI was installed for the docker on-prem registry and for managing Kubernetes or Non-Kubernetes deployments.
rdaf --version
rdafk8s --version
  • On-premise docker registry service version is 1.0.2
docker ps | grep docker-registry
  • RDAF Infrastructure services version is 1.0.2 (rda-nats service version is 1.0.2.1 and rda-minio service version is RELEASE.2022-11-11T03-44-20Z)

Run the below command to get RDAF Infra services details

rdafk8s infra status
  • RDAF Platform services version is 3.2.2.1

Run the below command to get RDAF Platform services details

rdafk8s platform status
  • RDAF OIA Application services version is 7.2.2.1

Run the below command to get RDAF App services details

rdafk8s app status

Run the below command to get RDAF Infra services details

rdaf infra status
  • RDAF Platform services version is 3.2.2

Run the below command to get RDAF Platform services details

rdaf platform status
  • RDAF OIA Application services version is 7.2.2

Run the below command to get RDAF App services details

rdaf app status

RDAF Deployment CLI Upgrade:

Please follow the below given steps.

Note

Upgrade RDAF Deployment CLI on both on-premise docker registry VM and RDAF Platform's management VM if provisioned separately.

Login into the VM where rdaf & rdafk8s deployment CLI was installed for docker on-prem registry and managing Kubernetes or Non-kubernetes deployment.

  • Download the RDAF Deployment CLI's newer version 1.1.9.2 bundle.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9.2/rdafcli-1.1.9.2.tar.gz
  • Upgrade the rdaf & rdafk8s CLI to version 1.1.9.2
pip install --user rdafcli-1.1.9.2.tar.gz
  • Verify the installed rdaf & rdafk8s CLI version is upgraded to 1.1.9.2
rdaf --version
rdafk8s --version
  • Download the RDAF Deployment CLI's newer version 1.1.9.2 bundle and copy it to RDAF management VM on which rdaf & rdafk8s deployment CLI was installed.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9.2/offline-rhel-1.1.9.2.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-rhel-1.1.9.2.tar.gz
  • Change the directory to the extracted directory
cd offline-rhel-1.1.9.2
  • Upgrade the rdaf CLI to version 1.1.9.2
pip install --user rdafcli-1.1.9.2.tar.gz  -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9.2/offline-ubuntu-1.1.9.2.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.1.9.2.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.1.9.2
  • Upgrade the rdaf CLI to version 1.1.9.2
pip install --user rdafcli-1.1.9.2.tar.gz  -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version
  • Download the RDAF Deployment CLI's newer version 1.1.9.2 bundle
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9.2/rdafcli-1.1.9.2.tar.gz
  • Upgrade the rdaf CLI to version 1.1.9.2
pip install --user rdafcli-1.1.9.2.tar.gz
  • Verify the installed rdaf CLI version is upgraded to 1.1.9.2
rdaf --version
  • To stop application services, run the below command. Wait until all of the services are stopped.
rdaf app down OIA
rdaf app status
  • To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
rdaf worker down
rdaf worker status
  • To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
rdaf platform down
rdaf platform status
  • Download the RDAF Deployment CLI's newer version 1.1.9.2 bundle and copy it to RDAF management VM on which rdaf & rdafk8s deployment CLI was installed.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9.2/offline-rhel-1.1.9.2.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-rhel-1.1.9.2.tar.gz
  • Change the directory to the extracted directory
cd offline-rhel-1.1.9.2
  • Upgrade the rdaf CLI to version 1.1.9.2
pip install --user rdafcli-1.1.9.2.tar.gz  -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9.2/offline-ubuntu-1.1.9.2.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.1.9.2.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.1.9.2
  • Upgrade the rdaf CLI to version 1.1.9.2
pip install --user rdafcli-1.1.9.2.tar.gz  -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version

2.5.2. Download the new Docker Images

Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.

rdaf registry fetch --tag 1.0.2.1,3.2.2.2,3.2.2.3,7.2.2.2,7.2.2.3

Run the below command to verify above mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.

rdaf registry list-tags 
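
If the tag listing is long, it can be filtered for just the versions fetched above. This is a sketch and assumes the tags appear verbatim in the list-tags output.

# Show only the image tags relevant to this upgrade
rdaf registry list-tags | grep -E '1\.0\.2\.1|3\.2\.2\.2|3\.2\.2\.3|7\.2\.2\.2|7\.2\.2\.3'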

Please make sure 1.0.2.1 image tag is downloaded for the below RDAF Infra services.

  • rda-platform-haproxy

Please make sure 3.2.2.2 image tag is downloaded for the below RDAF Platform services.

  • rda-client-api-server
  • rda-registry
  • rda-rda-scheduler
  • rda-collector
  • rda-stack-mgr
  • rda-identity
  • rda-fsm
  • rda-access-manager
  • rda-resource-manager
  • rda-user-preferences
  • onprem-portal
  • onprem-portal-nginx
  • rda-worker-all
  • onprem-portal-dbinit
  • cfxdx-nb-nginx-all
  • rda-event-gateway
  • rdac
  • rdac-full

Please make sure 3.2.2.3 image tag is downloaded for the below RDAF Platform services.

  • rda-client-api-server
  • rda-worker-all
  • cfxdx-nb-nginx-all

Please make sure 7.2.2.2 image tag is downloaded for the below RDAF OIA Application services.

  • rda-app-controller
  • rda-alert-processor
  • rda-file-browser
  • rda-smtp-server
  • rda-ingestion-tracker
  • rda-reports-registry
  • rda-ml-config
  • rda-event-consumer
  • rda-webhook-server
  • rda-irm-service
  • rda-alert-ingester
  • rda-collaboration
  • rda-notification-service
  • rda-configuration-service

Please make sure 7.2.2.3 image tag is downloaded for the below RDAF OIA Application services.

  • rda-irm-service

Downloaded Docker images are stored under the below path.

/opt/rdaf/data/docker/registry/v2

Run the below command to check the filesystem's disk usage on which docker images are stored.

df -h /opt
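
To see how much of that space is consumed by the registry data itself, a simple check is shown below.

# Summarize the size of the on-prem docker registry data
du -sh /opt/rdaf/data/docker/registry/v2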

Optionally, if required, older image tags which are no longer used can be deleted to free up disk space using the below command.

rdaf registry delete-images --tag <tag1,tag2>
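
For example, once the upgrade has been verified, the previous 3.2.2 and 7.2.2 tags could be removed. The tags below are for illustration only; delete a tag only after confirming no service is still running it.

# Example only: remove older platform and OIA application image tags
rdaf registry delete-images --tag 3.2.2,7.2.2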

2.5.3. Upgrade Steps

2.5.3.1 Upgrade RDAF Infra Services

Download the below python script (rdaf_upgrade_119_119_1_to_119_2.py)

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9.2/rdaf_upgrade_119_119_1_to_119_2.py

The downloaded python script generates a new values.yaml.latest file with new environment variables for the HAProxy infrastructure service and the rda-portal (front-end) platform service.

These environment variables need to be configured with appropriate values when the CFX RDA Fabric portal needs to be integrated and cross-launched from a 3rd-party end-user UI portal.

Note

The below mentioned environment variables are mandatory; however, their values can be left empty if integration with a 3rd-party external UI portal is not required.

  • HAProxy environment variables

EXTERNAL_PORTAL_URL: 3rd party UI portal URL (ex: https://external-portal.acme.com)

CFX_IP_ADDRESS: RDA Fabric platform's load balancer virtual IP address (when configured in HA) or the load balancer's IP address used to access the UI portal.

  • rda-portal (front-end) environment variable

CFX_URL_PREFIX: Specify custom base URI string which can be used within the 3rd party end user UI portal to redirect the requests to RDA Fabric platform.

Please run the downloaded python upgrade script.

python rdaf_upgrade_119_119_1_to_119_2.py

Once the script is executed it will create /opt/rdaf/deployment-scripts/values.yaml.latest file.

Note

Please take a backup of /opt/rdaf/deployment-scripts/values.yaml file.

cp /opt/rdaf/deployment-scripts/values.yaml /opt/rdaf/deployment-scripts/values.yaml.backup
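
Before editing, it can help to review exactly what the upgrade script added by comparing the current file with the generated one; a simple diff is shown below.

# Review the new HAProxy and rda_portal environment variables added by the script
diff -u /opt/rdaf/deployment-scripts/values.yaml /opt/rdaf/deployment-scripts/values.yaml.latest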

Edit /opt/rdaf/deployment-scripts/values.yaml and apply the below changes for haproxy and rda_portal services

vi /opt/rdaf/deployment-scripts/values.yaml

Under the haproxy service configuration, set the environment variable EXTERNAL_PORTAL_URL with the external portal URL. Note: https://external-portal.acme.com is used as a reference only. Also, set the environment variable CFX_IP_ADDRESS with the RDA Fabric load balancer's IP address (non-HA configuration) or virtual IP address when configured in HA.

haproxy:
  mem_limit: 2G
  memswap_limit: 2G
  environment:
    EXTERNAL_PORTAL_URL: "https://external-portal.acme.com"
    CFX_IP_ADDRESS: "<rda-fabric-ui-portal-ip>"

Under the rda_portal service configuration, set the environment variable CFX_URL_PREFIX with a custom URI string as shown below. Note: aiops is used as a reference only. When configured, all requests that hit the https://external-portal.acme.com/aiops URI path on the 3rd-party UI portal are forwarded to the RDA Fabric platform and vice-versa.

rda_portal:
  ...
  ...
    portal_frontend:
      resources:
        requests:
          memory: 100Mi
      limits:
          memory: 2Gi
      env:
        CFX_URL_PREFIX: "aiops"

Configure the environment variables with empty values when 3rd party external portal integration is NOT needed.

haproxy:
  mem_limit: 2G
  memswap_limit: 2G
  environment:
    EXTERNAL_PORTAL_URL: ""
    CFX_IP_ADDRESS: ""
rda_portal:
  ...
  ...
    portal_frontend:
      resources:
        requests:
          memory: 100Mi
      limits:
          memory: 2Gi
      env:
        CFX_URL_PREFIX: ""
  • Upgrade HAProxy service using below command

    rdafk8s infra upgrade --tag 1.0.2.1 --service haproxy
    

    Run the below RDAF command to check infra status

    rdafk8s infra status
    
    +----------------+----------------+-------------+--------------+---------+
    | Name           | Host           | Status      | Container Id | Tag     |
    +----------------+----------------+-------------+--------------+---------+
    | haproxy        | 192.168.131.41 | Up 25 hours | 21ce252eec14 | 1.0.2.1 |
    | haproxy        | 192.168.131.42 | Up 25 hours | 329a6aa40e40 | 1.0.2.1 |
    | keepalived     | 192.168.131.41 | active      | N/A          | N/A     |
    | keepalived     | 192.168.131.42 | active      | N/A          | N/A     |
    | nats           | 192.168.131.41 | Up 2 months | 7b7a15f7d742 | 1.0.2.1 |
    | nats           | 192.168.131.42 | Up 2 months | a92cd1df2cbf | 1.0.2.1 |
    +----------------+----------------+-------------+--------------+---------+
    

For the Non-Kubernetes deployment, before initiating the upgrade steps, the RDA Fabric platform, worker and application services need to be stopped.

  • To stop OIA application services, run the below command. Wait until all of the services are stopped.
rdaf app down OIA
rdaf app status
  • To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
rdaf worker down
rdaf worker status
  • To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
rdaf platform down
rdaf platform status
  • Upgrade HAProxy using below command
rdaf infra upgrade --tag 1.0.2.1 --service haproxy

Run the below RDAF command to check infra status

rdaf infra status
+----------------+----------------+-------------+--------------+---------+
| Name           | Host           | Status      | Container Id | Tag     |
+----------------+----------------+-------------+--------------+---------+
| haproxy        | 192.168.107.63 | Up 25 hours | 21ce252eec14 | 1.0.2.1 |
| haproxy        | 192.168.107.64 | Up 25 hours | 329a6aa40e40 | 1.0.2.1 |
| keepalived     | 192.168.107.63 | active      | N/A          | N/A     |
| keepalived     | 192.168.107.64 | active      | N/A          | N/A     |
| nats           | 192.168.107.63 | Up 2 months | 7b7a15f7d742 | 1.0.2.1 |
| nats           | 192.168.107.64 | Up 2 months | a92cd1df2cbf | 1.0.2.1 |
+----------------+----------------+-------------+--------------+---------+

Run the below RDAF command to check infra healthcheck status

rdaf infra healthcheck
+----------------+-----------------+--------+----------------------+--------------+-----------------+
| Name           | Check           | Status | Reason               | Host          | Container Id   |
+----------------+-----------------+--------+----------------------+--------------+-----------------+
| haproxy        | Port Connection | OK     | N/A                  | 192.168.107.63 | ed0e8a4f95d6  |
| haproxy        | Service Status  | OK     | N/A                  | 192.168.107.63 | ed0e8a4f95d6  |
| haproxy        | Firewall Port   | OK     | N/A                  | 192.168.107.63 | ed0e8a4f95d6  |
| haproxy        | Port Connection | OK     | N/A                  | 192.168.107.64 | 91c361ea0f58  |
| haproxy        | Service Status  | OK     | N/A                  | 192.168.107.64 | 91c361ea0f58  |
| haproxy        | Firewall Port   | OK     | N/A                  | 192.168.107.64 | 91c361ea0f58  |
| keepalived     | Service Status  | OK     | N/A                  | 192.168.107.63 | N/A           |
| keepalived     | Service Status  | OK     | N/A                  | 192.168.107.64 | N/A           |
| nats           | Port Connection | OK     | N/A                  | 192.168.107.63 | f57ed825681b  |
| nats           | Service Status  | OK     | N/A                  | 192.168.107.63 | f57ed825681b  |
+----------------+-----------------+--------+----------------------+--------------+-----------------+
2.5.3.2 Upgrade RDAF Platform Services

Step-1: Run the below command to initiate upgrading RDAF Platform services.

rdafk8s platform upgrade --tag 3.2.2.2

As this is a non-disruptive upgrade, it puts the currently running PODs into Terminating state and the newer version PODs into Pending state.

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each Platform service is in Terminating state.

kubectl get pods -n rda-fabric -l app_category=rdaf-platform

Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It lists the POD IDs of the platform services that need to be put into maintenance mode, along with the rdac maintenance command to run.

python maint_command.py

Note

If maint_command.py script doesn't exist on RDAF deployment CLI VM, it can be downloaded using the below command.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.6/maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-platform-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating RDAF platform service PODs

for i in `kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done

Note

Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Platform service PODs.

Please wait till all of the new platform service PODs are in Running state and run the below command to verify their status and make sure all of them are running with 3.2.2.2 version.

rdafk8s platform status
+----------------------+----------------+----------------+--------------+---------+
| Name                 | Host           | Status         | Container Id | Tag     |
+----------------------+----------------+----------------+--------------+---------+
| rda-api-server       | 192.168.131.44 | Up 1 Hours ago | f97c1658a0b7 | 3.2.2.2 |
| rda-api-server       | 192.168.131.44 | Up 1 Days ago  | 99cc29596560 | 3.2.2.2 |
| rda-registry         | 192.168.131.44 | Up 1 Days ago  | ee2d72396575 | 3.2.2.2 |
| rda-registry         | 192.168.131.44 | Up 2 Hours ago | 95c36fc91800 | 3.2.2.2 |
| rda-identity         | 192.168.131.44 | Up 1 Days ago  | 3d6aeb4c6c53 | 3.2.2.2 |
| rda-identity         | 192.168.131.44 | Up 2 Hours ago | 9303f3d0e7ed | 3.2.2.2 |
| rda-fsm              | 192.168.131.44 | Up 2 Hours ago | 342cbfe89b78 | 3.2.2.2 |
| rda-fsm              | 192.168.131.44 | Up 1 Days ago  | 5e77c12fc920 | 3.2.2.2 |
| rda-access-manager   | 192.168.131.44 | Up 2 Hours ago | b218a44f022c | 3.2.2.2 |
| rda-access-manager   | 192.168.131.44 | Up 1 Days ago  | 70ed48e783b9 | 3.2.2.2 |
+----------------------+----------------+----------------+--------------+---------+

Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler services is elected as a leader under the Site column.

rdac pods
+-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host         | ID       | Site        | Age             |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------|
| Infra | api-server                             | True        | 40d242cf70f5 | 6f7ecfe2 |             | 2 days, 7:40:27 |      8 |        31.33 |               |              |
| Infra | api-server                             | True        | 9145166d798b | 6114b271 |             | 2 days, 7:40:52 |      8 |        31.33 |               |              |
| Infra | collector                              | True        | a450b3da5188 | 1a86bf07 |             | 2 days, 7:39:59 |      8 |        31.33 |               |              |
| Infra | collector                              | True        | 82ccb77d84e7 | 46c83c44 |             | 2 days, 7:39:44 |      8 |        31.33 |               |              |
| Infra | registry                               | True        | c93e2eff7c37 | 30ad85d6 |             | 2 days, 7:40:32 |      8 |        31.33 |               |              |
| Infra | registry                               | True        | 44d01548a49c | 0bb96897 |             | 2 days, 7:40:26 |      8 |        31.33 |               |              |
| Infra | scheduler                              | True        | 159d453aad50 | 2cb4831c | *leader*    | 2 days, 7:40:20 |      8 |        31.33 |               |              |
| Infra | scheduler                              | True        | 0682962441e4 | d6b1fb3b |             | 2 days, 7:40:12 |      8 |        31.33 |               |              |
|-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------|

Run the below command to check that all services have an ok status and do not throw any failure messages.

rdac healthcheck

Warning

For Non-Kubernetes deployment, upgrading RDAF Platform and AIOps application services is a disruptive operation. Please schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.

Run the below command to initiate upgrading RDAF Platform services.

rdaf platform upgrade --tag 3.2.2.2

Please wait till all of the new platform services are in Up state and run the below command to verify their status and make sure all of them are running with the 3.2.2.2 version.

rdaf platform status
+--------------------------+----------------+---------------+--------------+---------+
| Name                     | Host           | Status        | Container Id | Tag     |
+--------------------------+----------------+---------------+--------------+---------+
| rda_api_server           | 192.168.107.61 | Up 58 minutes | 9145166d798b | 3.2.2.2 |
| rda_api_server           | 192.168.107.62 | Up 57 minutes | 40d242cf70f5 | 3.2.2.2 |
| rda_registry             | 192.168.107.61 | Up 57 minutes | c93e2eff7c37 | 3.2.2.2 |
| rda_registry             | 192.168.107.62 | Up 57 minutes | 44d01548a49c | 3.2.2.2 |
| rda_scheduler            | 192.168.107.61 | Up 57 minutes | 159d453aad50 | 3.2.2.2 |
| rda_scheduler            | 192.168.107.62 | Up 57 minutes | 0682962441e4 | 3.2.2.2 |
| rda_collector            | 192.168.107.61 | Up 56 minutes | a450b3da5188 | 3.2.2.2 |
| rda_collector            | 192.168.107.62 | Up 56 minutes | 82ccb77d84e7 | 3.2.2.2 |
+--------------------------+----------------+---------------+--------------+---------+

Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler services is elected as a leader under the Site column.

rdac pods

Run the below command to check that all services have an ok status and do not throw any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 9a0775246a0f | 8f538695 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=3, Brokers=[2, 3, 1] |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 79d6756db639 | 95921403 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=1, Brokers=[2, 3, 1] |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.5.3.3 Upgrade rdac CLI

Run the below command to upgrade the rdac CLI

rdafk8s rdac_cli upgrade --tag 3.2.2.2

Run the below command to upgrade the rdac CLI

rdaf rdac_cli upgrade --tag 3.2.2.2
2.5.3.4 Upgrade RDA Worker Services

Step-1: Please run the below command to initiate upgrading the RDA Worker service PODs.

rdafk8s worker upgrade --tag 3.2.2.3

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each RDA Worker service POD is in Terminating state.

kubectl get pods -n rda-fabric -l app_component=rda-worker

Step-3: Run the below command to put all Terminating RDAF worker service PODs into maintenance mode. It lists the POD IDs of the RDA worker services that need to be put into maintenance mode, along with the rdac maintenance command to run.

python maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-worker-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the RDAF worker services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating RDAF worker service PODs

for i in `kubectl get pods -n rda-fabric -l app_component=rda-worker | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done

Note

Wait for 120 seconds between each RDAF worker service upgrade when repeating the above steps from Step-2 to Step-6 for the rest of the RDAF worker service PODs.

Step-7: Please wait for 120 seconds to let the newer version of RDA Worker service PODs join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service PODs.

rdac pods | grep rda-worker
rdafk8s worker status
+------------+----------------+---------------+--------------+-----------+
| Name       | Host           | Status        | Container Id | Tag       |
+------------+----------------+---------------+--------------+-----------+
| rda-worker | 192.168.131.49 | Up 2 Days ago | 7f5cc2a6ff82 | 3.2.2.3   |
| rda-worker | 192.168.131.50 | Up 2 Days ago | 17e06d02128d | 3.2.2.3   |
+------------+----------------+---------------+--------------+-----------+

Step-8: Run the below command to check that all RDA Worker services have an ok status and do not throw any failure messages.

rdac healthcheck
  • Upgrade RDA Worker Services

Please run the below command to initiate upgrading the RDA Worker service PODs.

rdaf worker upgrade --tag 3.2.2.3

Please wait for 120 seconds to let the newer version of RDA Worker service containers join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service containers.

rdac pods | grep worker
rdaf worker status

+------------+----------------+------------+--------------+---------+
| Name       | Host           | Status     | Container Id | Tag     |
+------------+----------------+------------+--------------+---------+
| rda_worker | 192.168.107.61 | Up 2 hours | aa8319a88bc1 | 3.2.2.3 |
| rda_worker | 192.168.107.62 | Up 2 hours | 56e78986283f | 3.2.2.3 |
+------------+----------------+------------+--------------+---------+
Run the below command to check that all RDA Worker services have an ok status and do not throw any failure messages.

rdac healthcheck
2.5.3.5 Upgrade OIA Application Services

Step-1: Run the below commands to initiate upgrading RDAF OIA Application services

rdafk8s app upgrade OIA --tag 7.2.2.2

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each OIA application service is in Terminating state.

kubectl get pods -n rda-fabric -l app_name=oia

Step-3: Run the below command to put all Terminating OIA application service PODs into maintenance mode. It lists the POD IDs of the OIA application services that need to be put into maintenance mode, along with the rdac maintenance command to run.

python maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-oia-app-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the OIA application services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating OIA application service PODs

for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
kubectl get pods -n rda-fabric -l app_name=oia

Note

Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the OIA application service PODs.

Please wait till all of the new OIA application service PODs are in Running state and run the below command to verify their status and make sure they are running with 7.2.2.2 version.

rdafk8s app status
+-------------------------------+----------------+-----------------+--------------+-----------+
| Name                          | Host           | Status          | Container Id | Tag       |
+-------------------------------+----------------+-----------------+--------------+-----------+
| rda-alert-ingester            | 192.168.131.46 | Up 1 Days ago   | f546428c2a1a | 7.2.2.2   |
| rda-alert-ingester            | 192.168.131.46 | Up 1 Days ago   | 88a68aa40a9a | 7.2.2.2   |
| rda-alert-processor           | 192.168.131.46 | Up 1 Days ago   | 5d958ce95d4c | 7.2.2.2   |
| rda-alert-processor           | 192.168.131.46 | Up 1 Days ago   | cddbfed7dbba | 7.2.2.2   |
| rda-alert-processor-companion | 192.168.131.46 | Up 1 Days ago   | 127cd9e895a1 | 7.2.2.2   |
| rda-alert-processor-companion | 192.168.131.46 | Up 1 Days ago   | 1ac3ae88d16f | 7.2.2.2   |
| rda-app-controller            | 192.168.131.46 | Up 1 Days ago   | cf7d126099a6 | 7.2.2.2   |
| rda-app-controller            | 192.168.131.46 | Up 1 Days ago   | fcd5bb29c429 | 7.2.2.2   |
| rda-collaboration             | 192.168.131.46 | Up 1 Days ago   | 9c3243fb3094 | 7.2.2.2   |
+-------------------------------+----------------+-----------------+--------------+-----------+

Step-7: Run the below command to verify all OIA application services are up and running. Please wait till the cfxdimensions-app-irm_service has leader status under Site column.

rdac pods
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age            |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------|
| App   | cfxdimensions-app-access-manager       | True        | 3a164c761ac7 | 6f02493c |             | 2 days, 7:38:22 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | d56b629c2c3b | e5ff5696 |             | 2 days, 7:38:05 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | 8aafda236efe | 126203ec |             | 2 days, 7:11:18 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | 3ea382fdc6af | 618a650b |             | 2 days, 7:10:58 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | d6f0d127ab06 | deb9c0c4 |             | 2 days, 7:17:45 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | 2b9851b95094 | 013f5b00 |             | 2 days, 7:17:25 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | 8361c0008d18 | a9fe343e | *leader*    | 2 days, 7:12:36 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | ca8a2cbdca81 | 8f497bb7 |             | 2 days, 7:12:14 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | dfbbcdcddafc | 8d0425ec |             | 2 days, 7:18:24 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | 753472f0a9be | 485800b5 |             | 2 days, 7:18:06 |      8 |        31.33 |               |              |
+-------+----------------------------------------+-------------+----------------+----------+-------------+-------------------+--------+-----------------------------+--------------+

Run the below command to check that all services have an ok status and do not throw any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | rda-alert-in | 47518623 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 47518623 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 47518623 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | rda-alert-in | 47518623 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 47518623 |             | kafka-connectivity                                  | ok       | Cluster=nzyeX9qkR-ChWXC0fRvSyQ, Broker=1, Brokers=[0, 2, 1] |
| rda_app   | alert-ingester                         | rda-alert-in | 82bcaa7c |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 82bcaa7c |             | minio-connectivity                                  | ok       |                          
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+

Upgrade rda-irm-service to 7.2.2.3:

Step-1: Run the below command to initiate upgrading the rda-irm-service.

rdafk8s app upgrade OIA --tag 7.2.2.3 --service rda-irm-service

Step-2: Run the below command to check the status of the existing rda-irm-service PODs and make sure at least one instance of the rda-irm-service is in Terminating state.

kubectl get pods -n rda-fabric -l app_name=oia | grep irm

Step-3: Run the below command to put the rda-irm-service PODs that are in Terminating state into maintenance mode. It lists all of the rda-irm-service POD IDs along with the rdac maintenance command that is required to put them into maintenance mode.

python maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-irm-service-pod-ids>
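For illustration only, assuming maint_command.py reported the Terminating rda-irm-service PODs with IDs a9fe343e and 8f497bb7 (use the IDs printed in your environment), the pasted command would look like:

# Example only; substitute the POD IDs printed by maint_command.py
rdac maintenance start --ids a9fe343e,8f497bb7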

Step-5: Run the below command to verify the maintenance mode status of the rda-irm-service

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating rda-irm-service PODs

for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
kubectl get pods -n rda-fabric -l app_name=oia

Note

Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the rda-irm-service PODs.

Please wait till all of the new rda-irm-service PODs are in Running state and run the below command to verify their status and make sure they are running with the 7.2.2.3 version.

rdafk8s app status
+-------------------------------+----------------+-----------------+--------------+-----------+
| Name                          | Host           | Status          | Container Id | Tag       |
+-------------------------------+----------------+-----------------+--------------+-----------+
| rda-irm-service               | 192.168.131.46 | Up 1 Days ago   | f546428c2a1a | 7.2.2.3   |
| rda-irm-service               | 192.168.131.46 | Up 1 Days ago   | 88a68aa40a9a | 7.2.2.3   |
| rda-alert-processor           | 192.168.131.46 | Up 1 Days ago   | 5d958ce95d4c | 7.2.2.2   |
| rda-alert-processor           | 192.168.131.46 | Up 1 Days ago   | cddbfed7dbba | 7.2.2.2   |
+-------------------------------+----------------+-----------------+--------------+-----------+

Step-7: Run the below command to verify all rda-irm-service PODs are up and running. Please wait till the cfxdimensions-app-irm_service has leader status under Site column.

rdac pods
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age            |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------|
| App   | cfxdimensions-app-access-manager       | True        | 3a164c761ac7 | 6f02493c |             | 2 days, 7:38:22 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | d56b629c2c3b | e5ff5696 |             | 2 days, 7:38:05 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | 8aafda236efe | 126203ec |             | 2 days, 7:11:18 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | 3ea382fdc6af | 618a650b |             | 2 days, 7:10:58 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | d6f0d127ab06 | deb9c0c4 |             | 2 days, 7:17:45 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | 2b9851b95094 | 013f5b00 |             | 2 days, 7:17:25 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | 8361c0008d18 | a9fe343e | *leader*    | 2 days, 7:12:36 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | ca8a2cbdca81 | 8f497bb7 |             | 2 days, 7:12:14 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | dfbbcdcddafc | 8d0425ec |             | 2 days, 7:18:24 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | 753472f0a9be | 485800b5 |             | 2 days, 7:18:06 |      8 |        31.33 |               |              |
+-------+----------------------------------------+-------------+----------------+----------+-------------+-------------------+--------+-----------------------------+--------------+

Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck
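If the healthcheck output is long, an optional convenience filter (not an official rdac option) is to hide the rows whose Status is ok so that only potential problems remain visible:

# Show only healthcheck rows that are not reporting an ok status
rdac healthcheck | grep -v ' ok '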

Run the below commands to initiate upgrading the RDA Fabric OIA Application services.

rdaf app upgrade OIA --tag 7.2.2.2

Please wait till all of the new OIA application service containers are in Up state and run the below command to verify their status and make sure they are running with 7.2.2.2 version.

rdaf app status
+-------------------------------+----------------+-----------------+--------------+-----------+
| Name                          | Host           | Status          | Container Id | Tag       |
+-------------------------------+----------------+-----------------+--------------+-----------+
| rda-alert-ingester            | 192.168.131.46 | Up 1 Days ago   | f546428c2a1a | 7.2.2.2   |
| rda-alert-ingester            | 192.168.131.46 | Up 1 Days ago   | 88a68aa40a9a | 7.2.2.2   |
| rda-alert-processor           | 192.168.131.46 | Up 1 Days ago   | 5d958ce95d4c | 7.2.2.2   |
| rda-alert-processor           | 192.168.131.46 | Up 1 Days ago   | cddbfed7dbba | 7.2.2.2   |
| rda-alert-processor-companion | 192.168.131.46 | Up 1 Days ago   | 127cd9e895a1 | 7.2.2.2   |
| rda-alert-processor-companion | 192.168.131.46 | Up 1 Days ago   | 1ac3ae88d16f | 7.2.2.2   |
| rda-app-controller            | 192.168.131.46 | Up 1 Days ago   | cf7d126099a6 | 7.2.2.2   |
| rda-app-controller            | 192.168.131.46 | Up 1 Days ago   | fcd5bb29c429 | 7.2.2.2   |
| rda-collaboration             | 192.168.131.46 | Up 1 Days ago   | 9c3243fb3094 | 7.2.2.2   |
+-------------------------------+----------------+-----------------+--------------+-----------+

Upgrade rda-irm-service to 7.2.2.3:

Run the below commands to initiate upgrading the rda-irm-service service to 7.2.2.3 version.

rdaf app upgrade OIA --tag 7.2.2.3 --service rda-irm-service

Please wait till all of the rda-irm-service containers are in Up state and run the below command to verify their status and make sure they are running with 7.2.2.3 version.

rdaf app status

2.5.4. Post Upgrade Steps

2.5.4.1 OIA

1. Deploy the latest L1 & L2 bundles. Go to Configuration --> RDA Administration --> Bundles, select oia_l1_l2_bundle and click on the deploy action.

2. Enable ML experiments manually, if any experiments are configured, under Organization --> Configuration --> ML Experiments.

3. By default, resizableColumns is set to false for the alerts and incidents tabular reports. If you want resizable columns for the alerts and incidents tabular reports, set it to true. Go to Configuration --> RDA Administration --> User Dashboards, search for each of the below dashboards and change resizableColumns from false to true:

a) oia-alert-group-view-alerts-os

b) oia-alert-group-view-details-os

c) oia-alert-groups-os

d) oia-alert-tracking-os

e) oia-alerts-os

f) oia-event-tracking-os

g) oia-event-tracking-view-alerts

h) oia-incident-alerts-os

i) oia-view-alerts-policy

j) oia-view-groups-policy

k) incident-collaboration

l) oia-incidents-os-template

m) oia-incidents-os

n) oia-incidents

o) oia-my-incidents

(Screenshot: Resizable Columns setting)

2.5.4.2 DNAC

1. Make sure Prime credentials are added under Configuration --> RDA Integrations --> Credentials

Note

Make sure the credential names match the bot names specified below in Point No. 4

2. Deploy the latest dna_center_bundle from Configuration --> RDA Integrations --> Bundles and click the row-level deploy action for dna_center_bundle.

3. Run the dnac_create_pstreams pipeline from Configuration --> RDA Integrations --> Pipelines --> Published Pipelines: search for dnac_create_pstreams and click on Run in the action menu.

4. In the same Published Pipelines view, search for prime_clients_report, click on Edit Pipeline in Plain Text, uncomment the lines as shown below, change the pipeline version, check the publish pipeline checkbox, and click on Save.

(Screenshot: prime_clients_report pipeline edit)

5. Download the latest DNAC template from the below link to the platform VM (where rdac is installed) and execute the commands given below. The template is downloaded to /tmp so that the path matches the rdac object add command.

wget -O /tmp/dynamic_dnac_template.html https://macaw-amer.s3.amazonaws.com/test/dynamic_dnac_template.html
rdac object add --name "dynamic_dnac_template.html" --folder widget_labels --file /tmp/dynamic_dnac_template.html

2.6. Upgrade from 7.2.2.2 to 7.3

RDAF Platform: From 3.2.2.2 to 3.3

OIA (AIOps) Application: From 7.2.2.2 to 7.3

RDAF Deployment rdaf & rdafk8s CLI: From 1.1.9.2 to 1.1.10

RDAF Client rdac CLI: From 3.2.2.2 to 3.3

2.6.1. Prerequisites

Before proceeding with this upgrade, please verify that the below prerequisites are met.

  • RDAF Deployment CLI version: 1.1.9.2

  • Infra Services tag: 1.0.2, 1.0.2.1 (nats, haproxy)

  • Platform Services and RDA Worker tag: 3.2.2.2/3.2.2.3

  • OIA Application Services tag: 7.2.2.2

  • CloudFabrix recommends taking VMware VM snapshots where RDA Fabric infra/platform/applications are deployed

For FSM Pre-Upgrade & Post-Upgrade steps Click Here

Warning

Make sure all of the above pre-requisites are met before proceeding with the upgrade process.

Warning

Kubernetes: Though Kubernetes based RDA Fabric deployment supports zero downtime upgrade, it is recommended to schedule a maintenance window for upgrading RDAF Platform and AIOps services to newer version.

Non-Kubernetes: Upgrading RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.

Important

Please make sure full backup of the RDAF platform system is completed before performing the upgrade.

Kubernetes: Please run the below backup command to take the backup of application data.

rdafk8s backup --dest-dir <backup-dir>

Non-Kubernetes: Please run the below backup command to take the backup of application data.

rdaf backup --dest-dir <backup-dir>

Note: Please make sure this backup-dir is mounted across all infra and CLI VMs.
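A simple optional check (not part of the official procedure) to confirm the shared backup directory is mounted on each infra and CLI VM:

# Run on every infra and CLI VM; replace <backup-dir> with the actual path used above
df -h <backup-dir>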

Run the below command on the RDAF Management system and make sure the Kubernetes PODs are NOT in a restarting state (applicable only to Kubernetes environments).

kubectl get pods -n rda-fabric -l app_category=rdaf-infra
kubectl get pods -n rda-fabric -l app_category=rdaf-platform
kubectl get pods -n rda-fabric -l app_component=rda-worker 
kubectl get pods -n rda-fabric -l app_name=oia 
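As an optional convenience (not an official check), the following filter surfaces any PODs that are not in Running state, which also catches PODs that are stuck restarting:

# List PODs in the rda-fabric namespace that are not Running (e.g. CrashLoopBackOff, Pending, Error)
kubectl get pods -n rda-fabric | grep -Ev 'Running|Completed'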

  • Verify that the RDAF deployment rdaf CLI version is 1.1.9.2 (or the rdafk8s CLI version is 1.1.9.2) on the VM where the CLI was installed for the docker on-prem registry and for managing Kubernetes or Non-Kubernetes deployments.
rdaf --version
rdafk8s --version
  • On-premise docker registry service version is 1.0.2
docker ps | grep docker-registry
  • RDAF Infrastructure services version is 1.0.2 (rda-nats service version is 1.0.2.1 and rda-minio service version is RELEASE.2022-11-11T03-44-20Z)

Run the below command to get RDAF Infra services details

rdafk8s infra status
  • RDAF Platform services version is 3.2.2.2 / 3.2.2.3

Run the below command to get RDAF Platform services details

rdafk8s platform status
  • RDAF OIA Application services version is 7.2.2.2

Run the below command to get RDAF App services details

rdafk8s app status

Run the below command to get RDAF Infra services details

rdaf infra status
  • RDAF Platform services version is 3.2.2.2 / 3.2.2.3

Run the below command to get RDAF Platform services details

rdaf platform status
  • RDAF OIA Application services version is 7.2.2.2

Run the below command to get RDAF App services details

rdaf app status

RDAF Deployment CLI Upgrade:

Please follow the below given steps.

Note

Upgrade RDAF Deployment CLI on both on-premise docker registry VM and RDAF Platform's management VM if provisioned separately.

Login into the VM where rdaf & rdafk8s deployment CLI was installed for docker on-prem registry and managing Kubernetes or Non-kubernetes deployment.

  • Download the RDAF Deployment CLI's newer version 1.1.10 bundle.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.10/rdafcli-1.1.10.tar.gz
  • Upgrade the rdaf & rdafk8s CLI to version 1.1.10
pip install --user rdafcli-1.1.10.tar.gz
  • Verify the installed rdaf & rdafk8s CLI version is upgraded to 1.1.10
rdaf --version
rdafk8s --version
  • Download the RDAF Deployment CLI's newer version 1.1.10 bundle and copy it to RDAF management VM on which rdaf & rdafk8s deployment CLI was installed.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.10/offline-rhel-1.1.10.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-rhel-1.1.10.tar.gz
  • Change the directory to the extracted directory
cd offline-rhel-1.1.10
  • Upgrade the rdafCLI to version 1.1.10
pip install --user rdafcli-1.1.10.tar.gz  -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version
  • Download the RDAF Deployment CLI's newer version 1.1.10 bundle (Ubuntu) and copy it to the RDAF management VM on which rdaf & rdafk8s deployment CLI was installed.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.10/offline-ubuntu-1.1.10.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.1.10.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.1.10
  • Upgrade the rdafCLI to version 1.1.10
pip install --user rdafcli-1.1.10.tar.gz  -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version
  • Download the RDAF Deployment CLI's newer version 1.1.10 bundle
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.10/rdafcli-1.1.10.tar.gz
  • Upgrade the rdaf CLI to version 1.1.10
pip install --user rdafcli-1.1.10.tar.gz
  • Verify the installed rdaf CLI version is upgraded to 1.1.10
rdaf --version
  • To stop application services, run the below command. Wait until all of the services are stopped.
rdaf app down OIA
rdaf app status
  • To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
rdaf worker down
rdaf worker status
  • To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
rdaf platform down
rdaf platform status
  • Download the RDAF Deployment CLI's newer version 1.1.10 bundle and copy it to RDAF management VM on which rdaf & rdafk8s deployment CLI was installed.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.10/offline-rhel-1.1.10.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-rhel-1.1.10.tar.gz
  • Change the directory to the extracted directory
cd offline-rhel-1.1.10
  • Upgrade the rdafCLI to version 1.1.10
pip install --user rdafcli-1.1.10.tar.gz  -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version
  • Download the RDAF Deployment CLI's newer version 1.1.10 bundle (Ubuntu) and copy it to the RDAF management VM on which rdaf & rdafk8s deployment CLI was installed.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.10/offline-ubuntu-1.1.10.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.1.10.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.1.10
  • Upgrade the rdafCLI to version 1.1.10
pip install --user rdafcli-1.1.10.tar.gz  -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version

2.6.2. Download the new Docker Images

Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.

rdaf registry fetch --tag 3.3,7.3,7.3.0.1,7.3.2

Note

The Neo4j graphdb service is optional; please skip this step if this service is not needed.

rdaf registry fetch --neo4j-tag 5.11.0

Run the below command to verify above mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.

rdaf registry list-tags 

Please make sure 3.3 image tag is downloaded for the below RDAF Platform services.

  • rda-client-api-server
  • rda-registry
  • rda-rda-scheduler
  • rda-collector
  • rda-identity
  • rda-fsm
  • rda-access-manager
  • rda-resource-manager
  • rda-user-preferences
  • onprem-portal
  • onprem-portal-nginx
  • rda-worker-all
  • onprem-portal-dbinit
  • cfxdx-nb-nginx-all
  • rda-event-gateway
  • rda-chat-helper
  • rdac
  • rdac-full

Please make sure 7.3 image tag is downloaded for the below RDAF OIA Application services.

  • rda-app-controller
  • rda-alert-processor
  • rda-file-browser
  • rda-smtp-server
  • rda-ingestion-tracker
  • rda-reports-registry
  • rda-ml-config
  • rda-event-consumer
  • rda-webhook-server
  • rda-irm-service
  • rda-alert-ingester
  • rda-collaboration
  • rda-notification-service
  • rda-configuration-service
  • rda-alert-processor-companion

Please make sure 7.3.0.1 image tag is downloaded for the below RDAF OIA Application services.

  • rda-event-consumer

Please make sure 7.3.2 image tag is downloaded for the below RDAF OIA Application services.

  • rda-alert-ingester

Downloaded Docker images are stored under the below path.

/opt/rdaf/data/docker/registry/v2

Run the below command to check the filesystem's disk usage on which docker images are stored.

df -h /opt

Optionally, if required, older image tags that are no longer used can be deleted to free up disk space using the below command.

rdaf registry delete-images --tag <tag1,tag2>
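For example, assuming the previously deployed 3.2.2.2 and 7.2.2.2 image tags are no longer needed after the upgrade (verify this in your environment before deleting anything):

# Example only; delete old platform (3.2.2.2) and OIA (7.2.2.2) image tags to reclaim disk space
rdaf registry delete-images --tag 3.2.2.2,7.2.2.2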

2.6.3. Upgrade Steps

2.6.3.1 Upgrade RDAF Infra Services

RDA Fabric platform has introduced supporting GraphDB service in 3.3 release. It is an optional service and it can be skipped during the upgrade process.

Download the python script (rdaf_upgrade_1192_1110_without_graphdb.py) if GraphDB service is NOT going to be installed.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.10/rdaf_upgrade_1192_1110_without_graphdb.py

Please run the downloaded python upgrade script.

python rdaf_upgrade_1192_1110_without_graphdb.py

It generates a new values.yaml.latest file with new environment variables for the rda_scheduler infrastructure service.

Tip

Please skip the below step if GraphDB service is NOT going to be installed.

Warning

For installing the neo4j GraphDB service, please add an additional disk to the RDA Fabric Infrastructure VM by Clicking Here.

It is a pre-requisite and this step needs to be completed before installing the neo4j GraphDB service.

Download the python script (rdaf_upgrade_1192_1110.py) if GraphDB service is going to be installed.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.10/rdaf_upgrade_1192_1110.py

Please run the downloaded python upgrade script.

python rdaf_upgrade_1192_1110.py

It generates a new values.yaml.latest file with new environment variables for the rda_scheduler infrastructure service, and the /opt/rdaf/config/network_config/config.json file is appended with the neo4j GraphDB infra service.

Once the above python script (with or without GraphDB configuration) is executed, it creates the /opt/rdaf/deployment-scripts/values.yaml.latest file.
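To review what the script added before editing, you can optionally compare the generated file with the current one (a convenience step, not part of the official procedure):

# Show the differences between the current values.yaml and the newly generated values.yaml.latest
diff /opt/rdaf/deployment-scripts/values.yaml /opt/rdaf/deployment-scripts/values.yaml.latest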

Note

Please take a backup of /opt/rdaf/deployment-scripts/values.yaml file.

cp /opt/rdaf/deployment-scripts/values.yaml /opt/rdaf/deployment-scripts/values.yaml.backup

Edit /opt/rdaf/deployment-scripts/values.yaml and apply the below changes for rda_scheduler service.

vi /opt/rdaf/deployment-scripts/values.yaml

Under rda_scheduler service configuration, set the below environment variables

Note

When integrating CFX RDA Fabric portal with GitHub, configure the following environment variables with appropriate values. However, these variables can be left empty if integration with GitHub is NOT required.

RDA_GIT_ACCESS_TOKEN: ''
RDA_GIT_URL: ''
RDA_GITHUB_ORG: ''
RDA_GITHUB_REPO: ''
RDA_GITHUB_BRANCH_PREFIX: ''

Note

For reference, please see the configuration of the rda_scheduler service mentioned below.

rda_scheduler:
    mem_limit: 2G
    memswap_limit: 2G
    privileged: false
    environment:
      RDA_GIT_ACCESS_TOKEN: "<github-personal-access-token>"
      RDA_GIT_URL: "https://api.github.com"
      RDA_GITHUB_ORG: "Organization Name"
      RDA_GITHUB_REPO: "test-playground"
      RDA_GITHUB_BRANCH_PREFIX: "main"
      RDA_ENABLE_TRACES: "no"
      DISABLE_REMOTE_LOGGING_CONTROL: "no"
      RDA_SELF_HEALTH_RESTART_AFTER_FAILURES: 3

Tip

  • Please skip the below step of installing neo4j GraphDB service if it is not needed.
rdafk8s infra install --tag 1.0.2 --service neo4j
  • Please use the below mentioned command and wait till all of the neo4j pods are in Running state.
kubectl get pods -n rda-fabric -l app_category=rdaf-infra

Run the below RDAF command to check infra status

rdafk8s infra status
+----------------+----------------+-------------+--------------+---------+
| Name           | Host           | Status      | Container Id | Tag     |
+----------------+----------------+-------------+--------------+---------+
| haproxy        | 192.168.131.41 | Up 25 hours | 21ce252eec14 | 1.0.2.1 |
| haproxy        | 192.168.131.42 | Up 25 hours | 329a6aa40e40 | 1.0.2.1 |
| keepalived     | 192.168.131.41 | active      | N/A          | N/A     |
| keepalived     | 192.168.131.42 | active      | N/A          | N/A     |
| nats           | 192.168.131.41 | Up 2 months | 7b7a15f7d742 | 1.0.2.1 |
| nats           | 192.168.131.42 | Up 2 months | a92cd1df2cbf | 1.0.2.1 |
| rda-neo4j      | 192.168.109.65 | Up 23 Hours | 7e533c138867 | 5.11.0  |
+----------------+----------------+-------------+--------------+---------+

Tip

  • Please skip the below step of installing neo4j GraphDB service if it is not needed.
  • Install neo4j service using below command
rdaf infra install --tag 5.11.0 --service neo4j

Run the below RDAF command to check infra status

rdaf infra status
+----------------+----------------+-------------+--------------+---------+
| Name           | Host           | Status      | Container Id | Tag     |
+----------------+----------------+-------------+--------------+---------+
| haproxy        | 192.168.107.63 | Up 25 hours | 21ce252eec14 | 1.0.2.1 |
| haproxy        | 192.168.107.64 | Up 25 hours | 329a6aa40e40 | 1.0.2.1 |
| keepalived     | 192.168.107.63 | active      | N/A          | N/A     |
| keepalived     | 192.168.107.64 | active      | N/A          | N/A     |
| nats           | 192.168.107.63 | Up 2 months | 7b7a15f7d742 | 1.0.2.1 |
| nats           | 192.168.107.64 | Up 2 months | a92cd1df2cbf | 1.0.2.1 |
| neo4j          | 192.168.107.63 | Up 42 hours | ee7e26cecb82 | 5.11.0  |  
+----------------+----------------+-------------+--------------+---------+

Run the below RDAF command to check infra healthcheck status

rdaf infra healthcheck
+----------------+-----------------+--------+-----------------+----------------+--------------+
| Name           | Check           | Status | Reason          | Host           | Container Id |
+----------------+-----------------+--------+-----------------+----------------+--------------+
| haproxy        | Port Connection | OK     | N/A             | 192.168.107.63 | 21ce252eec14 |
| haproxy        | Service Status  | OK     | N/A             | 192.168.107.63 | 21ce252eec14 |
| haproxy        | Firewall Port   | OK     | N/A             | 192.168.107.64 | 329a6aa40e40 |
| keepalived     | Service Status  | OK     | N/A             | 192.168.107.63 | N/A          |
| keepalived     | Service Status  | OK     | N/A             | 192.168.107.64 | N/A          |
| nats           | Port Connection | OK     | N/A             | 192.168.107.63 | 7b7a15f7d742 |
| nats           | Service Status  | OK     | N/A             | 192.168.107.63 | 7b7a15f7d742 |
| nats           | Firewall Port   | OK     | N/A             | 192.168.107.64 | a92cd1df2cbf |
| minio          | Port Connection | OK     | N/A             | 192.168.107.62 | cb4b5f67dfc8 |
| minio          | Service Status  | OK     | N/A             | 192.168.107.62 | cb4b5f67dfc8 |
| mariadb        | Port Connection | OK     | N/A             | 192.168.107.63 | 717b2b539a95 |
| mariadb        | Service Status  | OK     | N/A             | 192.168.107.63 | 717b2b539a95 |
| opensearch     | Firewall Port   | OK     | N/A             | 192.168.107.65 | 193de5b9d521 |
| zookeeper      | Service Status  | OK     | N/A             | 192.168.107.63 | 9df371735ec2 |
| kafka          | Port Connection | OK     | N/A             | 192.168.107.65 | 8c5acc5d3073 |
| kafka          | Service Status  | OK     | N/A             | 192.168.107.65 | 8c5acc5d3073 |
| kafka          | Firewall Port   | OK     | N/A             | 192.168.107.65 | 8c5acc5d3073 |
| redis          | Service Status  | OK     | Redis Slave     | 192.168.107.65 | 0db5415aacee |
| redis          | Firewall Port   | OK     | N/A             | 192.168.107.65 | 0db5415aacee |
| redis-sentinel | Port Connection | OK     | N/A             | 192.168.107.63 | 66cc0ff7d29e |
| redis-sentinel | Service Status  | OK     | N/A             | 192.168.107.63 | 66cc0ff7d29e |
| neo4j          | Service Status  | OK     | N/A             | 192.168.107.63 | ee7e26cecb82 |
| neo4j          | Firewall Port   | OK     | N/A             | 192.168.107.63 | ee7e26cecb82 |
| portal         | Service Status  | OK     | N/A             | 192.168.107.62 | d6c9b498227e |
| portal         | Firewall Port   | OK     | N/A             | 192.168.107.62 | d6c9b498227e |
+----------------+-----------------+--------+-----------------+----------------+--------------+

Before initiating the upgrade steps, RDA Fabric's platform, worker and application services need to be stopped.

  • To stop OIA application services, run the below command. Wait until all of the services are stopped.
rdaf app down OIA
rdaf app status
  • To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
rdaf worker down
rdaf worker status
  • To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
rdaf platform down
rdaf platform status
2.6.3.2 Upgrade RDAF Platform Services

Step-1: Run the below command to initiate upgrading RDAF Platform services.

rdafk8s platform upgrade --tag 3.3

As the upgrade procedure is a non-disruptive upgrade, it puts the currently running PODs into Terminating state and newer version PODs into Pending state.

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each Platform service is in Terminating state.

kubectl get pods -n rda-fabric -l app_category=rdaf-platform

Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It lists all of the POD IDs of the platform services along with the rdac maintenance command that is required to put them into maintenance mode.

python maint_command.py

Note

If maint_command.py script doesn't exist on RDAF deployment CLI VM, it can be downloaded using the below command.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.6/maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-platform-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating RDAF platform service PODs

for i in `kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done

Note

Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Platform service PODs.

Please wait till all of the new platform service PODs are in Running state and run the below command to verify their status and make sure all of them are running with 3.3 version.

rdafk8s platform status
+---------------------+----------------+---------------+----------------+-------+
| Name                | Host           | Status        | Container Id   | Tag   |
+---------------------+----------------+---------------+----------------+-------+
| rda-api-server      | 192.168.131.46 | Up 1 Days ago | faf4cdd79dd4   | 3.3   |
| rda-api-server      | 192.168.131.44 | Up 1 Days ago | 409c81c1000d   | 3.3   |
| rda-registry        | 192.168.131.46 | Up 1 Days ago | fa2682e9f7bb   | 3.3   |
| rda-registry        | 192.168.131.45 | Up 1 Days ago | 91eca9476848   | 3.3   |
| rda-identity        | 192.168.131.46 | Up 1 Days ago | 4e5e337eabe7   | 3.3   |
| rda-identity        | 192.168.131.44 | Up 1 Days ago | b10571cfa217   | 3.3   |
| rda-fsm             | 192.168.131.44 | Up 1 Days ago | 1cea17b4d5e0   | 3.3   |
| rda-fsm             | 192.168.131.46 | Up 1 Days ago | ac34fce6b2aa   | 3.3   |
| rda-chat-helper     | 192.168.131.45 | Up 1 Days ago | ea083e20a082   | 3.3   |
+---------------------+----------------+---------------+----------------+-------+

Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler services is elected as the leader under the Site column.

rdac pods
+-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host         | ID       | Site        | Age             |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------|
| Infra | api-server                             | True        | rda-api-server | b52f3919 |             | 1 day, 3:43:49  |      8 |        31.33 |               |              |
| Infra | api-server                             | True        | rda-api-server | 4fe976c4 |             | 1 day, 3:42:42  |      8 |        31.33 |               |              |
| Infra | collector                              | True        | rda-collector- | 50ba4175 |             | 1 day, 23:01:14 |      8 |        31.33 |               |              |
| Infra | collector                              | True        | rda-collector- | e8d040a0 |             | 1 day, 23:01:33 |      8 |        31.33 |               |              |
| Infra | registry                               | True        | rda-registry-5 | 4b220140 |             | 1 day, 23:00:29 |      8 |        31.33 |               |              |
| Infra | registry                               | True        | rda-registry-5 | 711afddf |             | 1 day, 23:01:37 |      8 |        31.33 |               |              |
| Infra | scheduler                              | True        | rda-scheduler- | 21bbd0a9 | *leader*    | 1 day, 23:01:15 |      8 |        31.33 |               |              |
| Infra | scheduler                              | True        | rda-scheduler- | ff2700dd |             | 1 day, 22:59:38 |      8 |        31.33 |               |              |
| Infra | worker                                 | True        | rda-worker-59b | 94f56928 | rda-site-01 | 1 day, 22:36:25 |      8 |        31.33 | 3             | 95           |
| Infra | worker                                 | True        | rda-worker-59b | 786e86c2 | rda-site-01 | 1 day, 21:00:51 |      8 |        31.33 | 0             | 108          |
+-------+----------------------------------------+-------------+----------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+

Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck

Warning

For Non-Kubernetes deployment, upgrading RDAF Platform and AIOps application services is a disruptive operation. Please schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.

Run the below command to initiate upgrading RDAF Platform services.

rdaf platform upgrade --tag 3.3

Please wait till all of the new platform services are in Up state and run the below command to verify their status and make sure all of them are running with the 3.3 version.

rdaf platform status
+--------------------------+----------------+------------+--------------+------+
| Name                     | Host           | Status     | Container Id | Tag  |
+--------------------------+----------------+------------+--------------+------+
| rda_api_server           | 192.168.107.61 | Up 5 hours | 6fc70d6b82aa | 3.3  |
| rda_api_server           | 192.168.107.62 | Up 5 hours | afa31a2c614b | 3.3  |
| rda_registry             | 192.168.107.61 | Up 5 hours | 9f8adbb08b95 | 3.3  |
| rda_registry             | 192.168.107.62 | Up 5 hours | cc8e5d27eb0a | 3.3  |
| rda_scheduler            | 192.168.107.61 | Up 5 hours | f501e240e7a3 | 3.3  |
| rda_scheduler            | 192.168.107.62 | Up 5 hours | c5b2b258efe1 | 3.3  |
| rda_collector            | 192.168.107.61 | Up 5 hours | 2260fc37ebe5 | 3.3  |
| rda_collector            | 192.168.107.62 | Up 5 hours | 3e7ab4518394 | 3.3  |
+--------------------------+----------------+------------+--------------+------+

Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler services is elected as the leader under the Site column.

rdac pods

Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=2, Brokers=[2, 3, 1] |
| rda_app   | alert-ingester                         | 795652ebd914 | 91c603f4 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 795652ebd914 | 91c603f4 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 795652ebd914 | 91c603f4 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 795652ebd914 | 91c603f4 |             | service-initialization-status                       | ok       |                                                             |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.6.3.3 Upgrade rdac CLI

Run the below command to upgrade the rdac CLI

rdafk8s rdac_cli upgrade --tag 3.3

Run the below command to upgrade the rdac CLI

rdaf rdac_cli upgrade --tag 3.3
2.6.3.4 Upgrade RDA Worker Services

Step-1: Please run the below command to initiate upgrading the RDA Worker service PODs.

rdafk8s worker upgrade --tag 3.3

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each RDA Worker service POD is in Terminating state.

kubectl get pods -n rda-fabric -l app_component=rda-worker

Step-3: Run the below command to put all Terminating RDAF worker service PODs into maintenance mode. It will list all of the POD Ids of RDA worker services along with rdac maintenance command that is required to be put in maintenance mode.

python maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-worker-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the RDAF worker services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating RDAF worker service PODs

for i in `kubectl get pods -n rda-fabric -l app_component=rda-worker | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done

Note

Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the RDAF worker service PODs.

Step-7: Please wait for 120 seconds to let the newer version of RDA Worker service PODs join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service PODs.

rdac pods | grep rda-worker
rdafk8s worker status
+------------+----------------+------------------+--------------+---------+
| Name       | Host           | Status           | Container Id | Tag     |
+------------+----------------+------------------+--------------+---------+
| rda-worker | 192.168.131.45 | Up 1 Days ago    | afa217d2335a | 3.3     |
| rda-worker | 192.168.131.49 | Up 1 Days ago    | e114872efc30 | 3.3     |
| rda-worker | 192.168.131.44 | Up 1 Minutes ago | 0787bdb1cfc1 | 3.3     |
| rda-worker | 192.168.131.50 | Up 3 Minutes ago | 185d3a08fa9c | 3.3     |
+------------+----------------+------------------+--------------+---------+

Step-8: Run the below command to check that all RDA Worker services have an ok status and do not report any failure messages.

rdac healthcheck
  • Upgrade RDA Worker Services

Please run the below command to initiate upgrading the RDA Worker service PODs.

rdaf worker upgrade --tag 3.3

Please wait for 120 seconds to let the newer version of RDA Worker service containers join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service containers.

rdac pods | grep worker
rdaf worker status

+------------+----------------+------------+--------------+---------+
| Name       | Host           | Status     | Container Id | Tag     |
+------------+----------------+------------+--------------+---------+
| rda_worker | 192.168.107.61 | Up 3 hours | 4fa9c94ffe3c | 3.3     |
| rda_worker | 192.168.107.62 | Up 3 hours | c0684c26c606 | 3.3     |
+------------+----------------+------------+--------------+---------+
Run the below command to check that all RDA Worker services have an ok status and do not report any failure messages.

rdac healthcheck
2.6.3.5 Upgrade OIA Application Services

Step-1: Run the below commands to initiate upgrading RDAF OIA Application services

rdafk8s app upgrade OIA --tag 7.3

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each OIA application service is in Terminating state.

kubectl get pods -n rda-fabric -l app_name=oia

Step-3: Run the below command to put all Terminating OIA application service PODs into maintenance mode. It will list all of the POD Ids of OIA application services along with rdac maintenance command that are required to be put in maintenance mode.

python maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-oia-app-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the OIA application services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating OIA application service PODs

for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
kubectl get pods -n rda-fabric -l app_name=oia

Note

Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the OIA application service PODs.

Please wait till all of the new OIA application service PODs are in Running state and run the below command to verify their status and make sure they are running with 7.3 version.

rdafk8s app status
+--------------------------------+----------------+----------------+----------------+-------+
| Name                           | Host           | Status         | Container Id   | Tag   |
+--------------------------------+----------------+----------------+----------------+-------+
| rda-alert-ingester             | 192.168.131.47 | Up 1 Days ago  | 653220e94e6b   | 7.3   |
| rda-alert-ingester             | 192.168.131.46 | Up 1 Days ago  | b15255a3efcd   | 7.3   |
| rda-alert-processor            | 192.168.131.46 | Up 3 Hours ago | f5d6f91ceb37   | 7.3   |
| rda-alert-processor            | 192.168.131.47 | Up 1 Days ago  | 48a28bcff96e   | 7.3   |
| rda-alert-processor-companion  | 192.168.131.46 | Up 1 Days ago  | 86e83ef2afa3   | 7.3   |
| rda-alert-processor-companion  | 192.168.131.47 | Up 1 Days ago  | ee74d9227837   | 7.3   |
| rda-app-controller             | 192.168.131.47 | Up 1 Days ago  | 9efeddfb6b65   | 7.3   |
+--------------------------------+----------------+----------------+----------------+-------+

Step-7: Run the below command to verify all OIA application services are up and running. Please wait till the cfxdimensions-app-irm_service has leader status under Site column.

rdac pods
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age            |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------|
| App   | cfxdimensions-app-access-manager       | True        | 3a164c761ac7 | 6f02493c |             | 2 days, 7:38:22 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | d56b629c2c3b | e5ff5696 |             | 2 days, 7:38:05 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | 8aafda236efe | 126203ec |             | 2 days, 7:11:18 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | 3ea382fdc6af | 618a650b |             | 2 days, 7:10:58 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | d6f0d127ab06 | deb9c0c4 |             | 2 days, 7:17:45 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | 2b9851b95094 | 013f5b00 |             | 2 days, 7:17:25 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | 8361c0008d18 | a9fe343e | *leader*    | 2 days, 7:12:36 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | ca8a2cbdca81 | 8f497bb7 |             | 2 days, 7:12:14 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | dfbbcdcddafc | 8d0425ec |             | 2 days, 7:18:24 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | 753472f0a9be | 485800b5 |             | 2 days, 7:18:06 |      8 |        31.33 |               |              |
+-------+----------------------------------------+-------------+----------------+----------+-------------+-------------------+--------+-----------------------------+--------------+

Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=3, Brokers=[2, 3, 1] |
| rda_app   | alert-ingester                         | 795652ebd914 | 91c603f4 |             | service-status                                      | ok       |                                                             |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+

Upgrade Event Consumer Service to 7.3.0.1:

Step-1: Run the below commands to initiate upgrading rda-event-consumer services to 7.3.0.1 version

rdafk8s app upgrade OIA --tag 7.3.0.1 --service rda-event-consumer

Step-2: Run the below command to check the status of the rda-event-consumer service PODs and make sure at least one instance of the service is in Terminating state.

kubectl get pods -n rda-fabric -l app_name=oia | grep rda-event-consumer

Step-3: Run the below command to put all Terminating OIA application rda-event-consumer service PODs into maintenance mode. It will list all of the POD Ids of rda-event-consumer service along with rdac maintenance command that are required to be put in maintenance mode.

python maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-event-consumer-service-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the rda-event-consumer service.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating rda-event-consumer service PODs

for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
kubectl get pods -n rda-fabric -l app_name=oia | grep rda-event-consumer

Note

Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the rda-event-consumer service PODs.

Please wait till all of the new rda-event-consumer service PODs are in Running state and run the below command to verify their status and make sure they are running with 7.3.0.1 version.

rdafk8s app status

Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck

Upgrade Alert Ingester Service to 7.3.2:

Step-1: Run the below commands to initiate upgrading rda-alert-ingester services to 7.3.2 version

rdafk8s app upgrade OIA --tag 7.3.2 --service rda-alert-ingester

Step-2: Run the below command to check the status of the rda-alert-ingester service PODs and make sure at least one instance of the service is in Terminating state.

kubectl get pods -n rda-fabric -l app_name=oia | grep rda-alert-ingester

Step-3: Run the below command to put all Terminating OIA application rda-alert-ingester service PODs into maintenance mode. It will list all of the POD Ids of rda-alert-ingester service along with rdac maintenance command that are required to be put in maintenance mode.

python maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-rda-alert-ingester-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the rda-alert-ingester service.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating rda-alert-ingester service PODs

for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
kubectl get pods -n rda-fabric -l app_name=oia | grep rda-alert-ingester

Note

Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the rda-alert-ingester service PODs.

Please wait till all of the new rda-alert-ingester service PODs are in Running state and run the below command to verify their status and make sure they are running with 7.3.2 version.

rdafk8s app status

Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck

Run the below commands to initiate upgrading the RDA Fabric OIA Application services.

rdaf app upgrade OIA --tag 7.3

Please wait till all of the new OIA application service containers are in Up state and run the below command to verify their status and make sure they are running with 7.3 version.

rdaf app status

+-----------------------------------+----------------+------------+--------------+-----+
| Name                              | Host           | Status     | Container Id | Tag |
+-----------------------------------+----------------+------------+--------------+-----+
| cfx-rda-irm-service               | 192.168.107.66 | Up 5 hours | a53da18e68e8 | 7.3 |
| cfx-rda-irm-service               | 192.168.107.67 | Up 5 hours | ae42ce5f7c5a | 7.3 |
| cfx-rda-ml-config                 | 192.168.107.66 | Up 5 hours | 5942676cea00 | 7.3 |
| cfx-rda-ml-config                 | 192.168.107.67 | Up 5 hours | a59e44cb9950 | 7.3 |
| cfx-rda-collaboration             | 192.168.107.66 | Up 5 hours | 8465a6e01886 | 7.3 |
| cfx-rda-collaboration             | 192.168.107.67 | Up 5 hours | 610a07bd2893 | 7.3 |
| cfx-rda-ingestion-tracker         | 192.168.107.66 | Up 5 hours | fbc1c8d940ea | 7.3 |
| cfx-rda-ingestion-tracker         | 192.168.107.67 | Up 5 hours | 607212ea01e9 | 7.3 |
| cfx-rda-alert-processor-companion | 192.168.107.66 | Up 5 hours | 6cb93d1bdda0 | 7.3 |
| cfx-rda-alert-processor-companion | 192.168.107.67 | Up 5 hours | 3f8bf14adb34 | 7.3 |
+-----------------------------------+----------------+------------+--------------+-----+
Run the below command to verify all OIA application services are up and running. Please wait till the cfxdimensions-app-irm_service has leader status under Site column.

rdac pods

+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age            |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------|
| App   | cfxdimensions-app-access-manager       | True        | bd9e264212b5 | 68f9c494 |             | 22:52:26 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | 5695b14a7743 | 9499b9f8 |             | 22:50:52 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | 8465a6e01886 | cefbcfaa |             | 22:23:26 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | 610a07bd2893 | d33b198b |             | 22:23:05 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | 88352870e685 | e6ca73b0 |             | 22:31:19 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | 18cdb22d4439 | 56e874fd |             | 22:30:57 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | a53da18e68e8 | cdaf8950 | *leader*    | 22:25:01 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | ae42ce5f7c5a | 472c324a |             | 22:24:39 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | a11edf83127d | ba7d0978 |             | 22:32:15 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | 458a0b43be9f | 2289a696 |             | 22:31:53 |      8 |        31.33 |               |              |
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=2, Brokers=[2, 3, 1] |
| rda_app   | alert-ingester                         | 795652ebd914 | 91c603f4 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 795652ebd914 | 91c603f4 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 795652ebd914 | 91c603f4 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+

Upgrade Event Consumer Service to 7.3.0.1:

Run the below commands to initiate upgrading the cfx-rda-event-consumer services to 7.3.0.1 version.

rdaf app upgrade OIA --tag 7.3.0.1 --service cfx-rda-event-consumer

Please wait till all of the new OIA application cfx-rda-event-consumer service containers are in the Up state, then run the below command to verify their status and make sure they are running with the 7.3.0.1 tag.

rdaf app status

Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck

Upgrade Alert Ingester Service to 7.3.2:

Run the below commands to initiate upgrading the cfx-rda-alert-ingester services to 7.3.2 version.

rdaf app upgrade OIA --tag 7.3.2 --service cfx-rda-alert-ingester

Please wait till all of the new OIA application cfx-rda-alert-ingester service containers are in the Up state, then run the below command to verify their status and make sure they are running with the 7.3.2 tag.

rdaf app status

Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck

2.6.4. Post Upgrade Steps

2.6.4.1 OIA

1. Deploy the latest l1 & l2 bundles. Go to Configuration --> RDA Administration --> Bundles --> select oia_l1_l2_bundle and click on the deploy action

2. Enable ML experiments manually if any experiments are configured: Organization --> Configuration --> ML Experiments

3. By default, resizableColumns: false is set for the alerts and incidents tabular reports. If you want resizable columns in these reports, set it to true. Go to Configuration -> RDA Administration -> User Dashboards and search for the dashboards listed below.

a) oia-alert-group-view-alerts-os

b) oia-alert-group-view-details-os

c) oia-alert-groups-os

d) oia-alert-tracking-os

e) oia-alerts-os

f) oia-event-tracking-os

g) oia-event-tracking-view-alerts

h) oia-incident-alerts-os

i) oia-view-alerts-policy

j) oia-view-groups-policy

k) incident-collaboration

l) oia-incidents-os-template

m) oia-incidents-os

n) oia-incidents

o) oia-my-incidents

Images_resizable_columns

4. Update the oia-alerts-stream pstream definition to set the default value of a_ticket_id to Not Available. (RDA Administration → Persistent Stream → oia-alerts stream → Edit)

Images_a_ticket_id

Images_oia_alerts_stream

1. Deploy the latest l1 & l2 bundles. Go to Configuration --> RDA Administration --> Bundles --> select oia_l1_l2_bundle and click on the deploy action

2. Enable ML experiments manually if any experiments are configured: Organization --> Configuration --> ML Experiments

3. Update the oia-alerts-stream pstream definition to set the default value of a_ticket_id to Not Available. (RDA Administration → Persistent Stream → oia-alerts stream → Edit)

Images_a_ticket_id

4. By default, resizableColumns: false is set for the alerts and incidents tabular reports. If you want resizable columns in these reports, set it to true. Go to Configuration -> RDA Administration -> User Dashboards and search for the dashboards listed below.

a) oia-alert-group-view-alerts-os

b) oia-alert-group-view-details-os

c) oia-alert-groups-os

d) oia-alert-tracking-os

e) oia-alerts-os

f) oia-event-tracking-os

g) oia-event-tracking-view-alerts

h) oia-incident-alerts-os

i) oia-view-alerts-policy

j) oia-view-groups-policy

k) incident-collaboration

l) oia-incidents-os-template

m) oia-incidents-os

n) oia-incidents

o) oia-my-incidents

Images_resizable_columns

2.6.4.2 DNAC

1. Deploy the latest dna_center_bundle from Configuration → RDA Integrations → Bundles → click the row-level deploy action for dna_center_bundle.

2. Upload the latest device_family_alias and dnac_building dictionaries

wget https://macaw-amer.s3.amazonaws.com/releases/OIA/7.3/dnac_3_3_dictionaries.tar.gz

3. Run the 4 Historical Data pipelines that are available under Published Pipelines. Each of these pipelines needs to be executed based on the data, changing the query to filter a specific set of data and executing the pipeline on those specific rows.

4. Once the Historical Data pipelines have completed successfully (which might take a couple of hours), delete all 4 pipelines as shown in the screenshot.

Images_historical_Pipeline

Note

Update the schedule timings in Service Blueprints after the deployment as per the requirement.

2.7. Upgrade from 7.3 to 7.4

Note

This is a Non-K8s cli Upgrade Document

RDAF Infra Upgrade: From 1.0.2 to 1.0.3

RDAF Platform: From 3.3 to 3.4

OIA (AIOps) Application: From 7.3 to 7.4

RDAF Deployment rdaf CLI: From 1.1.10 to 1.2.0

RDAF Client rdac CLI: From 3.3 to 3.4

2.7.1. Prerequisites

Before proceeding with this upgrade, please make sure and verify the below prerequisites are met.

  • RDAF Deployment CLI version: 1.1.10

  • Infra Services tag: 1.0.2,1.0.2.1(nats, haproxy)

  • Platform Services and RDA Worker tag: 3.3

  • OIA Application Services tag: 7.3,7.3.0.1

  • AIA Application Services tag: 7.3

  • Delete the “alert-model” dataset from the Datasets reports on the UI before starting the upgrade

  • Check that all MariaDB nodes are in sync on an HA setup using the below commands before starting the upgrade

  • mysql -u<mysql username> -p<mysql password> -h <host IP> -P3307 -e "show status like 'wsrep_local_state_comment';"

    +---------------------------+--------+
    | Variable_name             | Value  |
    +---------------------------+--------+
    | wsrep_local_state_comment | Synced |
    +---------------------------+--------+
    
  • mysql -u<mysql username> -p<mysql password> -h <host IP> -P3307 -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'";

    +--------------------+-------+
    | Variable_name      | Value |
    +--------------------+-------+
    | wsrep_cluster_size | 3     |
    +--------------------+-------+
    
  • CloudFabrix recommends taking VMware VM snapshots where RDA Fabric infra/platform/applications are deployed

Warning

Make sure all of the above pre-requisites are met before proceeding with the upgrade process.

Warning

Non-Kubernetes: Upgrading RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.

Important

Please make sure full backup of the RDAF platform system is completed before performing the upgrade.

Non-Kubernetes: Please run the below backup command to take the backup of application data.

rdaf backup --dest-dir <backup-dir>
Note: Please make sure this backup-dir is mounted across all infra and CLI VMs.
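
To confirm the backup directory is actually a shared mount on each infra/CLI VM, a generic check such as the below can be run on every VM (substitute your actual backup directory for the placeholder):

df -h <backup-dir>
mount | grep <backup-dir>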

  • Verify that RDAF deployment rdaf cli version is 1.1.10 on the VM where CLI was installed for docker on-prem registry and managing Non-kubernetes deployments.
rdaf --version
  • On-premise docker registry service version is 1.0.2
docker ps | grep docker-registry
  • RDAF Infrastructure services version is 1.0.2 (rda-nats service version is 1.0.2.1 and rda-minio service version is RELEASE.2022-11-11T03-44-20Z)

Run the below command to get RDAF Infra services details

rdaf infra status
  • RDAF Platform services version is 3.3

Run the below command to get RDAF Platform services details

rdaf platform status
  • RDAF OIA Application services version is 7.3 / 7.3.0.1

Run the below command to get RDAF App services details

rdaf app status
RDAF Deployment CLI Upgrade:

Please follow the below given steps.

Note

Upgrade RDAF Deployment CLI on both on-premise docker registry VM and RDAF Platform's management VM if provisioned separately.

Login into the VM where rdaf deployment CLI was installed for docker on-prem registry and managing Non-kubernetes deployment.

  • To stop application services, run the below command. Wait until all of the services are stopped.
rdaf app down OIA
rdaf app status
  • To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
rdaf worker down
rdaf worker status
  • To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
rdaf platform down
rdaf platform status

Note

Go to each MariaDB node and stop its container with docker stop, e.g.: docker stop --time 120 infra-mariadb-1

If the setup is standalone, go to the MariaDB node and run docker stop --time 120 <db container ID>

If it is a cluster, stop the nodes in reverse order (node3, node2 & node1), as shown in the sketch below.
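
For an HA cluster, a minimal sketch of the reverse-order stop sequence (the infra-mariadb-* container names follow the example above; adjust them to your environment):

docker stop --time 120 infra-mariadb-3
docker stop --time 120 infra-mariadb-2
docker stop --time 120 infra-mariadb-1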

  • To stop RDAF Infra services, run the below command. Wait until all of the services are stopped.

rdaf infra down
rdaf infra status

  • Download the RDAF Deployment CLI's newer version 1.2.0 bundle
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.0/rdafcli-1.2.0.tar.gz
  • Upgrade the rdaf CLI to version 1.2.0
pip install --user rdafcli-1.2.0.tar.gz
  • Verify the installed rdaf CLI version is upgraded to 1.2.0
rdaf --version
  • Download the RDAF Deployment CLI's newer version 1.2.0 bundle and copy it to the RDAF management VM on which the rdaf deployment CLI was installed.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.0/offline-rhel-1.2.0.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-rhel-1.2.0.tar.gz
  • Change the directory to the extracted directory
cd offline-rhel-1.2.0
  • Upgrade the rdafCLI to version 1.2.0
pip install --user rdafcli-1.2.0.tar.gz  -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.0/offline-ubuntu-1.2.0.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.2.0.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.2.0
  • Upgrade the rdafCLI to version 1.2.0
pip install --user rdafcli-1.2.0.tar.gz  -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
  • To stop application services, run the below command. Wait until all of the services are stopped.
rdaf app down OIA
rdaf app status
  • To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
rdaf worker down
rdaf worker status
  • To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
rdaf platform down
rdaf platform status

Note

Go to each MariaDB node and stop its container with docker stop, e.g.: docker stop --time 120 infra-mariadb-1

If the setup is standalone, go to the MariaDB node and run docker stop --time 120 <db container ID>

If it is a cluster, stop the nodes in reverse order (node3, node2 & node1).

  • To stop RDAF Infra services, run the below command. Wait until all of the services are stopped.

rdaf infra down
rdaf infra status

2.7.2. Download the new Docker Images

Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.

rdaf registry fetch --tag 1.0.3,3.4,7.4
rdaf registry fetch --minio-tag RELEASE.2023-09-30T07-02-29Z

Note

Run the below command only when the graphdb service is to be installed. It is an optional service.

rdaf registry fetch --graphdb-tag 1.0.3

Run the below command to verify above mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.

rdaf registry list-tags 
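
To quickly confirm that a specific tag is present, the list-tags output can be filtered with grep (standard shell filtering; 7.4 is used as an example tag here):

rdaf registry list-tags | grep 7.4
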
Please make sure 1.0.3 image tag is downloaded for the below RDAF Infra services.

  • haproxy
  • nats
  • mariadb
  • opensearch
  • kafka
  • redis
  • redis-sentinel

Please make sure 3.4 image tag is downloaded for the below RDAF Platform services.

  • rda-client-api-server
  • rda-registry
  • rda-rda-scheduler
  • rda-collector
  • rda-identity
  • rda-fsm
  • rda-access-manager
  • rda-resource-manager
  • rda-user-preferences
  • onprem-portal
  • onprem-portal-nginx
  • rda-worker-all
  • onprem-portal-dbinit
  • cfxdx-nb-nginx-all
  • rda-event-gateway
  • rda-chat-helper
  • rdac
  • rdac-full

Please make sure 7.4 image tag is downloaded for the below RDAF OIA Application services.

  • rda-app-controller
  • rda-alert-processor
  • rda-file-browser
  • rda-smtp-server
  • rda-ingestion-tracker
  • rda-reports-registry
  • rda-ml-config
  • rda-event-consumer
  • rda-webhook-server
  • rda-irm-service
  • rda-alert-ingester
  • rda-collaboration
  • rda-notification-service
  • rda-configuration-service
  • rda-alert-processor-companion

Downloaded Docker images are stored under the below path.

/opt/rdaf/data/docker/registry/v2

Run the below command to check the filesystem's disk usage on which docker images are stored.

df -h /opt
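
To see how much of that space is used by the registry data itself, standard du can be run on the path documented above:

du -sh /opt/rdaf/data/docker/registry/v2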

Optionally, if required, older image tags which are no longer used can be deleted to free up disk space using the below command.

rdaf registry delete-images --tag <tag1,tag2>

2.7.3. Upgrade Steps

2.7.3.1 Upgrade RDAF Infra Services

The RDA Fabric platform introduced support for the GraphDB service in the 3.4 release. It is an optional service and can be skipped during the upgrade process.

Download the python script (rdaf_upgrade_1110_120.py)

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.0/rdaf_upgrade_1110_120.py

Please run the downloaded python upgrade script.

python rdaf_upgrade_1110_120.py upgrade

It generates a new values.yaml.latest file with new environment variables for the rda_scheduler infrastructure service.

  • After running the upgrade script, verify that it has cleared the data in the /kafka-logs and /zookeeper mount points, and that it has removed the zookeeper entries from the /opt/rdaf/rdaf.cfg and infra.yaml files.

  • Open the /opt/rdaf/rdaf.cfg file and search for kraft_cluster_id in the kafka section; it should have been updated by the script.

  • Once the above python script is executed, it creates the /opt/rdaf/deployment-scripts/values.yaml.latest file. A quick verification sketch follows this list.
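
A minimal verification sketch using standard grep/ls commands (paths and keys as documented above):

grep -i kraft_cluster_id /opt/rdaf/rdaf.cfg
grep -i zookeeper /opt/rdaf/rdaf.cfg
ls -l /opt/rdaf/deployment-scripts/values.yaml.latest

The first command should show the new KRaft cluster id in the kafka section, the second should return no matching zookeeper entries, and the third should show the newly generated file.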

Note

Please take a backup of /opt/rdaf/deployment-scripts/values.yaml file.

cp /opt/rdaf/deployment-scripts/values.yaml /opt/rdaf/deployment-scripts/values.yaml.backup

Edit /opt/rdaf/deployment-scripts/values.yaml and apply the below changes for rda_scheduler service.

vi /opt/rdaf/deployment-scripts/values.yaml
  • Look for the scheduler env section in values.yaml.latest, copy NUM_SERVER_PROCESSES: 4, and add it under the scheduler section in values.yaml, as shown in the below example.
    rda_scheduler:
      mem_limit: 2G
      memswap_limit: 2G
      privileged: false
      environment:
        NUM_SERVER_PROCESSES: 4
        RDA_GIT_ACCESS_TOKEN: ''
        RDA_GIT_URL: https://api.github.com
        RDA_GITHUB_ORG: ''
        RDA_GITHUB_REPO: ''
        RDA_GITHUB_BRANCH_PREFIX: main
        RDA_ENABLE_TRACES: 'no'
        DISABLE_REMOTE_LOGGING_CONTROL: 'no'
        RDA_SELF_HEALTH_RESTART_AFTER_FAILURES: 3
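
A quick check that the new variable landed under the scheduler section (standard grep on the file edited above):

grep -A 10 'rda_scheduler' /opt/rdaf/deployment-scripts/values.yaml | grep NUM_SERVER_PROCESSES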

Tip

Please skip the below step if GraphDB service is NOT going to be installed.

Warning

For installing the GraphDB service, please add an additional disk to the RDA Fabric Infrastructure VM by following the referenced disk-addition instructions.

It is a pre-requisite and this step needs to be completed before installing the GraphDB service.

  • Upgrade kafka infra Service using below command

rdaf infra install --tag 1.0.3 --service kafka
Run the below RDAF command to check infra status

rdaf infra status
Run the below RDAF command to check infra healthcheck status

rdaf infra healthcheck
Run the below RDAF command to upgrade infra services

rdaf infra upgrade --tag 1.0.3 

Run the below RDAF command to check infra status

rdaf infra status

+----------------------+----------------+-----------------+--------------+--------------+
| Name                 | Host           | Status          | Container Id | Tag          |
+----------------------+----------------+-----------------+--------------+--------------+
| haproxy              | 192.168.107.63 | Up 20 hours     | a78256a09ee6 | 1.0.3        |
| haproxy              | 192.168.107.64 | Up 20 hours     | 968fe5c56865 | 1.0.3        |
| keepalived           | 192.168.107.63 | active          | N/A          | N/A          |
| keepalived           | 192.168.107.64 | active          | N/A          | N/A          |
| nats                 | 192.168.107.63 | Up 20 hours     | ca708ba9a4ae | 1.0.3        |
| nats                 | 192.168.107.64 | Up 20 hours     | 0755f1107200 | 1.0.3        |
| mariadb              | 192.168.107.63 | Up 20 hours     | f83efc183641 | 1.0.3        |
| mariadb              | 192.168.107.64 | Up 20 hours     | 6d9fb5d84d7c | 1.0.3        |
| mariadb              | 192.168.107.65 | Up 13 hours     | 014fd3e72f0a | 1.0.3        |
| opensearch           | 192.168.107.63 | Up 20 hours     | ffebb31f79ab | 1.0.3        |
| opensearch           | 192.168.107.64 | Up 20 hours     | e539c56b2ff8 | 1.0.3        | 
| opensearch           | 192.168.107.65 | Up 13 hours     | 3f29d7388301 | 1.0.3        |
| kafka                | 192.168.107.63 | Up 20 hours     | cb15f52eb5d2 | 1.0.3        |   
+----------------------+----------------+-----------------+--------------+--------------+
Run the below RDAF command to check infra healthcheck status

rdaf infra healthcheck
+----------------+-----------------+--------+------------------------------+----------------+--------------+
| Name           | Check           | Status | Reason                       | Host           | Container Id |
+----------------+-----------------+--------+------------------------------+----------------+--------------+
| haproxy        | Port Connection | OK     | N/A                          | 192.168.107.63 | a78256a09ee6 |
| haproxy        | Service Status  | OK     | N/A                          | 192.168.107.63 | a78256a09ee6 |
| haproxy        | Firewall Port   | OK     | N/A                          | 192.168.107.63 | a78256a09ee6 |
| haproxy        | Port Connection | OK     | N/A                          | 192.168.107.64 | 968fe5c56865 |
| haproxy        | Service Status  | OK     | N/A                          | 192.168.107.64 | 968fe5c56865 |
| haproxy        | Firewall Port   | OK     | N/A                          | 192.168.107.64 | 968fe5c56865 |
| keepalived     | Service Status  | OK     | N/A                          | 192.168.107.63 | N/A          |
| keepalived     | Service Status  | OK     | N/A                          | 192.168.107.64 | N/A          |
| nats           | Port Connection | OK     | N/A                          | 192.168.107.63 | ca708ba9a4ae |
| nats           | Service Status  | OK     | N/A                          | 192.168.107.63 | ca708ba9a4ae |
| nats           | Firewall Port   | OK     | N/A                          | 192.168.107.63 | ca708ba9a4ae |
+----------------+-----------------+--------+------------------------------+----------------+--------------+

Note

If, in the infra healthcheck or infra status output, one of the MariaDB nodes is down or has failed, we have to restart the node which is in an exited or restarting state

docker restart <container id>

After the restart, if the node won't come up, go to /opt/rdaf/config/mariadb/my_custom.cnf and change the InnoDB recovery value as shown below

innodb_force_recovery=1

After changing the above parameter, restart the MariaDB container again. It should bring the database back up. Once the MariaDB node is up and running, delete the parameter added above; see the sketch below.
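
A minimal sketch of that recovery sequence, assuming the option can simply be appended to the end of my_custom.cnf and that the file is root-owned (adjust the container id to your node):

echo "innodb_force_recovery=1" | sudo tee -a /opt/rdaf/config/mariadb/my_custom.cnf
docker restart <mariadb container id>
# once the node rejoins and reports Synced, remove the option and restart once more
sudo sed -i '/innodb_force_recovery=1/d' /opt/rdaf/config/mariadb/my_custom.cnf
docker restart <mariadb container id>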

Verify that all MariaDB nodes are in sync on an HA setup using the below commands after the infra upgrade

mysql -u<username> -p<password>  -h <host IP>  -P3307 -e "show status like 'wsrep_local_state_comment';"
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+
mysql  -u<username> -p<password>  -h <host IP> -P3307 -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'";

+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
For GraphDB installation, below are the steps

rdaf infra upgrade --tag 1.0.3 --service graphdb
rdaf infra install --tag 1.0.3 --service graphdb
+----------------------+----------------+-------------+--------------+------------------------------+
| graphdb[agent]       | 192.168.133.97 | Up 18 hours | 3f90a6003415 | 1.0.3                        |
| graphdb[agent]       | 192.168.133.98 | Up 19 hours | c26141a16a97 | 1.0.3                        |
| graphdb[agent]       | 192.168.133.99 | Up 19 hours | 19ea6f54b5fa | 1.0.3                        |
| graphdb[server]      | 192.168.133.97 | Up 18 hours | f8fb50727a13 | 1.0.3                        |
| graphdb[server]      | 192.168.133.98 | Up 19 hours | 9c1f7d9d9dbb | 1.0.3                        |
| graphdb[server]      | 192.168.133.99 | Up 19 hours | 60a08e139c19 | 1.0.3                        |
| graphdb[coordinator] | 192.168.133.97 | Up 18 hours | 56604839c6fc | 1.0.3                        |
| graphdb[coordinator] | 192.168.133.98 | Up 19 hours | a1814d1a32ba | 1.0.3                        |
| graphdb[coordinator] | 192.168.133.99 | Up 19 hours | 51df56d349c1 | 1.0.3                        |
+----------------------+----------------+-------------+--------------+------------------------------+

Note

The below command will upgrade the configuration in MariaDB.

This step will take some time to complete.

python rdaf_upgrade_1110_120.py configure-mariadb

Note

The below command will create a new Kafka user with the existing tenant ID.

python rdaf_upgrade_1110_120.py configure-kafka-tenant
2.7.3.2 Upgrade RDAF Platform Services

Warning

For Non-Kubernetes deployment, upgrading RDAF Platform and AIOps application services is a disruptive operation. Please schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.

Run the below command to initiate upgrading RDAF Platform services.

rdaf platform upgrade --tag 3.4

Please wait till all of the new platform services are in the Up state, then run the below command to verify their status and make sure all of them are running with the 3.4 tag.

rdaf platform status
+--------------------------+----------------+------------+--------------+------+
| Name                     | Host           | Status     | Container Id | Tag  |
+--------------------------+----------------+------------+--------------+------+
| rda_api_server           | 192.168.107.61 | Up 5 hours | 6fc70d6b82aa | 3.4  |
| rda_api_server           | 192.168.107.62 | Up 5 hours | afa31a2c614b | 3.4  |
| rda_registry             | 192.168.107.61 | Up 5 hours | 9f8adbb08b95 | 3.4  |
| rda_registry             | 192.168.107.62 | Up 5 hours | cc8e5d27eb0a | 3.4  |
| rda_scheduler            | 192.168.107.61 | Up 5 hours | f501e240e7a3 | 3.4  |
| rda_scheduler            | 192.168.107.62 | Up 5 hours | c5b2b258efe1 | 3.4  |
| rda_collector            | 192.168.107.61 | Up 5 hours | 2260fc37ebe5 | 3.4  |
| rda_collector            | 192.168.107.62 | Up 5 hours | 3e7ab4518394 | 3.4  |
+--------------------------+----------------+------------+--------------+------+

Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler services is elected as the leader under the Site column.

rdac pods

Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | 02532fe3e9d9 | a9dcda71 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 02532fe3e9d9 | a9dcda71 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 02532fe3e9d9 | a9dcda71 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 02532fe3e9d9 | a9dcda71 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 02532fe3e9d9 | a9dcda71 |             | kafka-connectivity                                  | ok       | Cluster=ZTkxMmRjOTRjZDZiMTFlZQ, Broker=3, Brokers=[1, 2, 3] |
| rda_app   | alert-ingester                         | 5f9b978db3e9 | 4d0892ee |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 5f9b978db3e9 | 4d0892ee |             | minio-connectivity                                  | ok       |                                                             |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.7.3.3 Upgrade rdac CLI

Run the below command to upgrade the rdac CLI

rdaf rdac_cli upgrade --tag 3.4
2.7.3.4 Upgrade RDA Worker Services
  • Upgrade RDA Worker Services

Please run the below command to initiate upgrading the RDA Worker service PODs.

rdaf worker upgrade --tag 3.4

Please wait for 120 seconds to let the newer version of RDA Worker service containers join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service containers.

rdac pods | grep worker
rdaf worker status

+------------+----------------+-------------+--------------+-----+
| Name       | Host           | Status      | Container Id | Tag |
+------------+----------------+-------------+--------------+-----+
| rda_worker | 192.168.107.61 | Up 23 hours | a8a33e57e9b6 | 3.4 |
| rda_worker | 192.168.107.62 | Up 23 hours | 9fc328bc0e26 | 3.4 |
+------------+----------------+-------------+--------------+-----+
Run the below command to check that all RDA Worker services have an ok status and do not report any failure messages.

rdac healthcheck
2.7.3.5 Upgrade OIA Application Services

Run the below commands to initiate upgrading the RDA Fabric OIA Application services.

rdaf app upgrade OIA --tag 7.4

Please wait till all of the new OIA application service containers are in the Up state, then run the below command to verify their status and make sure they are running with the 7.4 tag.

rdaf app status

+-----------------------------------+----------------+-------------+--------------+-----+
| Name                              | Host           | Status      | Container Id | Tag |
+-----------------------------------+----------------+-------------+--------------+-----+
| cfx-rda-app-controller            | 192.168.107.66 | Up 23 hours | 1237d8c481d1 | 7.4 |
| cfx-rda-app-controller            | 192.168.107.67 | Up 23 hours | 0d501cca27ba | 7.4 |
| cfx-rda-reports-registry          | 192.168.107.66 | Up 23 hours | 65c0007b110e | 7.4 |
| cfx-rda-reports-registry          | 192.168.107.67 | Up 23 hours | 90a43cd57188 | 7.4 |
| cfx-rda-notification-service      | 192.168.107.66 | Up 23 hours | 11b53b25c182 | 7.4 |
| cfx-rda-notification-service      | 192.168.107.67 | Up 23 hours | 3206acc1612f | 7.4 |
| cfx-rda-file-browser              | 192.168.107.66 | Up 23 hours | bd8469446bb6 | 7.4 |
| cfx-rda-file-browser              | 192.168.107.67 | Up 23 hours | 31f5f3ecd347 | 7.4 |
+-----------------------------------+----------------+-------------+--------------+-----+
Run the below command to verify that all OIA application services are up and running. Please wait till the cfxdimensions-app-irm_service shows the leader status under the Site column.

rdac pods

+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age            |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------|
| App   | cfxdimensions-app-access-manager       | True        | bd9e264212b5 | 68f9c494 |             | 22:52:26 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | 5695b14a7743 | 9499b9f8 |             | 22:50:52 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | 8465a6e01886 | cefbcfaa |             | 22:23:26 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | 610a07bd2893 | d33b198b |             | 22:23:05 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | 88352870e685 | e6ca73b0 |             | 22:31:19 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | 18cdb22d4439 | 56e874fd |             | 22:30:57 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | a53da18e68e8 | cdaf8950 | *leader*    | 22:25:01 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | ae42ce5f7c5a | 472c324a |             | 22:24:39 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | a11edf83127d | ba7d0978 |             | 22:32:15 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | 458a0b43be9f | 2289a696 |             | 22:31:53 |      8 |        31.33 |               |              |
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=2, Brokers=[2, 3, 1] |
| rda_app   | alert-ingester                         | 795652ebd914 | 91c603f4 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 795652ebd914 | 91c603f4 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 795652ebd914 | 91c603f4 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+

2.7.4. Post Upgrade Steps

2.7.4.1 OIA

1. Deploy the latest l1 & l2 bundles. Go to Configuration --> RDA Administration --> Bundles --> select oia_l1_l2_bundle and click on the deploy action (if this bundle is not deployed, drilling down into an incident won't show pages such as Alerts, Insights, etc.)

2. Enable ML experiments manually if any experiments are configured: Organization --> Configuration --> ML Experiments

3. By default, resizableColumns: false is set for the alerts and incidents tabular reports. If you want resizable columns in these reports, set it to true. Go to Configuration -> RDA Administration -> User Dashboards and search for the dashboards listed below.

a) oia-alert-group-view-alerts-os

b) oia-alert-group-view-details-os

c) oia-alert-groups-os

d) oia-alert-tracking-os

e) oia-alerts-os

f) oia-event-tracking-os

g) oia-event-tracking-view-alerts

h) oia-incident-alerts-os

i) oia-view-alerts-policy

j) oia-view-groups-policy

k) incident-collaboration

l) oia-incidents-os-template

m) oia-incidents-os

n) oia-incidents

o) oia-my-incidents

Images_resizable_columns

4. Collaboration Service changes

  • Post deployment, modify the following file inside each of the collaboration docker services; a non-interactive sketch follows this list.

  • To get the container id of the collaboration service and to see where it is running, use the following command

    docker ps | grep collab
    
  • docker exec -it (container-id) bash

  • vi /usr/lib/python3.7/site-packages/cfxdimensions-app-collaboration/app.properties

    a) waitq.loop.exec.delay.secs=300

    b) waitq.active.incidents.exec.threads=1

  • docker restart (container-id)
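
A non-interactive alternative sketch for the same edit, assuming the two waitq properties already exist in app.properties and that sed is available inside the container image (otherwise edit the file manually as described above):

docker exec <container-id> sed -i 's/^waitq.loop.exec.delay.secs=.*/waitq.loop.exec.delay.secs=300/' /usr/lib/python3.7/site-packages/cfxdimensions-app-collaboration/app.properties
docker exec <container-id> sed -i 's/^waitq.active.incidents.exec.threads=.*/waitq.active.incidents.exec.threads=1/' /usr/lib/python3.7/site-packages/cfxdimensions-app-collaboration/app.properties
docker restart <container-id>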

2.7.4.2 Post Installation FSM Steps (Applicable only for installations with FSM)

1. Update FSM model:

https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.0/oia_ticketing_with_soothing_interval.yml
Go to Configuration -> RDA Administration -> FSM Models and update the model from above link

2. Deploy below Bundles from Configuration -> RDA Administration ->Bundles

fsm_events_kafka_publisher_bundles 
oia_fsm_common_ticketing_bundle
oia_fsm_aots_ticketing_bundle

3. Update Pipelines from links given below to the Published Pipelines

https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.0/pipelines/fsm_collab_notifier.yml
https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.0/pipelines/close_bmc_ticket.yml
https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.0/pipelines/fsm_read_incident_stream.yml

4. Update Close BMC Ticket Blueprint to run every 5 minutes instead of the previous 15-minute interval.

5. Enable below service blueprints from Configuration -> RDA Administration -> Service Blueprints

  • FSM Read Incident Stream

  • FSM Read Alert Stream

  • Create Ticket

  • Update Ticket

  • Resolve Ticket

  • Close BMC Ticket

2.7.4.3 DNAC

Below are the steps to upgrade ONLY DNAC functionality

1. Deploy the latest dna_center_bundle from Configuration → RDA Administration → Bundles → click the row-level deploy action for dna_center_bundle.

Note

The dictionary/template files used in steps 2, 3 & 4 can be downloaded from the link below.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.0/3.4_dictionaries.tar.gz
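
After downloading, the archive can be extracted with a standard tar command before uploading the individual files through the UI:

tar -xvzf 3.4_dictionaries.tar.gz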

2. Upload the latest Device Family Alias dictionary from Configuration → RDA Administration → Datasets: search for device_family_alias, click on the row-level Manage Data action, click on Import, upload the latest device family dictionary and click on Save

3. In the same Datasets page, look for dnac_host_info, click on the row-level Manage Data action, click on Import, upload the latest dnac host info file and click on Save.

Note

It is recommended to add a new dataset instead of importing the file into the existing dataset.

4. The latest DNAC HTML template can be uploaded from Configuration → RDA Administration → Object Store. Click on Upload, provide the name as dynamic_dnac_template.html and the folder name as widget_labels, upload the latest HTML template, select the Enable Overwrite check box and click on Add

5. In Configuration → RDA Administration → Pipelines → Published Pipelines, modify the dnac_add_sources pipeline by uncommenting the line %% import_source = ‘DNAC_Alpharetta’.

2.7.4.4 BCS

Below are the steps to upgrade ONLY BCS functionality

  • Deploy the latest bundle from Configuration → RDA Administration → Bundles → click the row-level deploy action for bcs_operational_insights.

2.8. Upgrade From 7.3/7.4 to 7.4.1

RDAF Infra Upgrade: from 1.0.2 to 1.0.3, 1.0.3.1(haproxy)

RDAF Platform: From 3.3 to 3.4.1

OIA (AIOps) Application: From 7.3 to 7.4.1

RDAF Deployment rdafk8s CLI: From 1.1.10 to 1.2.1

RDAF Client rdac CLI: From 3.3 to 3.4.1

RDAF Infra Upgrade: From 1.0.3 to 1.0.3.1 (haproxy)

RDAF Platform: From 3.4 to 3.4.1

OIA (AIOps) Application: From 7.4 to 7.4.1

RDAF Deployment rdaf CLI: From 1.2.0 to 1.2.1

RDAF Client rdac CLI: From 3.4 to 3.4.1

2.8.1. Prerequisites

Before proceeding with this upgrade, please make sure and verify the below prerequisites are met.

  • RDAF Deployment CLI version: 1.1.10

  • Infra Services tag: 1.0.2,1.0.2.1(nats, haproxy)

  • Platform Services and RDA Worker tag: 3.3

  • OIA Application Services tag: 7.3,7.3.0.1(event_consumer),7.3.2(alert-ingester)

  • CloudFabrix recommends taking VMware VM snapshots where RDA Fabric infra/platform/applications are deployed

  • Delete the alert-model dataset from the Datasets reports on the UI before starting the upgrade

  • Check that all MariaDB nodes are in sync on an HA setup using the below commands before starting the upgrade

Danger

Upgrading both the kafka and mariadb infra services requires downtime for the RDAF platform and application services.

Please proceed to the below steps only after scheduled downtime is approved.

Tip

Please run the below commands on the VM host where RDAF deployment CLI was installed and rdafk8s setup command was run. The mariadb configuration is read from /opt/rdaf/rdaf.cfg file.

MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`
MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`
MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`

mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -e "show status like 'wsrep_local_state_comment';"

Please verify that the mariadb cluster state is in Synced state.

+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+

Please run the below command and verify that the mariadb cluster size is 3.

mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'";
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
  • RDAF Deployment CLI version: 1.2.0

  • Infra Services tag: 1.0.3

  • Platform Services and RDA Worker tag: 3.4

  • OIA Application Services tag: 7.4

  • AIA Application Services tag: 7.4

  • CloudFabrix recommends taking VMware VM snapshots where RDA Fabric infra/platform/applications are deployed

Danger

In this release, all of the RDAF Infrastructure services are upgraded. So, it is mandatory to take a VM-level snapshot before proceeding with the upgrade process.

Warning

Make sure all of the above pre-requisites are met before proceeding with the upgrade process.

Warning

Kubernetes: Though Kubernetes based RDA Fabric deployment supports zero downtime upgrade, it is recommended to schedule a maintenance window for upgrading RDAF Platform and AIOps services to newer version.

Non-Kubernetes: Upgrading RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.

Important

Please make sure full backup of the RDAF platform system is completed before performing the upgrade.

Kubernetes: Please run the below backup command to take the backup of application data.

rdafk8s backup --dest-dir <backup-dir>
Non-Kubernetes: Please run the below backup command to take the backup of application data.
rdaf backup --dest-dir <backup-dir>
Note: Please make sure this backup-dir is mounted across all infra and CLI VMs.

Run the below commands on the RDAF Management system and make sure the Kubernetes pods are NOT in a restarting state (applicable only to Kubernetes environments)

kubectl get pods -n rda-fabric -l app_category=rdaf-infra
kubectl get pods -n rda-fabric -l app_category=rdaf-platform
kubectl get pods -n rda-fabric -l app_component=rda-worker 
kubectl get pods -n rda-fabric -l app_name=oia 
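
As a quick way to spot pods whose STATUS is anything other than Running or Completed (generic kubectl/grep filtering, not an RDAF-specific command):

kubectl get pods -n rda-fabric | grep -Ev 'Running|Completed'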

  • Verify that the RDAF deployment rdaf CLI version is 1.2.0, or the rdafk8s CLI version is 1.1.10, on the VM where the CLI was installed for the docker on-prem registry and for managing Kubernetes or Non-Kubernetes deployments.
rdafk8s --version
rdaf --version
  • On-premise docker registry service version is 1.0.2
docker ps | grep docker-registry
  • RDAF Infrastructure services version is 1.0.2 (rda-nats service version is 1.0.2.1 and rda-minio service version is RELEASE.2022-11-11T03-44-20Z)

Run the below command to get RDAF Infra services details

rdafk8s infra status
  • RDAF Platform services version is 3.3

Run the below command to get RDAF Platform services details

rdafk8s platform status
  • RDAF OIA Application services version is 7.3/7.3.0.1/7.3.2

Run the below command to get RDAF App services details

rdafk8s app status

Run the below command to get RDAF Infra services details

rdaf infra status
  • RDAF Platform services version is 3.4

Run the below command to get RDAF Platform services details

rdaf platform status
  • RDAF OIA Application services version is 7.4

Run the below command to get RDAF App services details

rdaf app status

RDAF Deployment CLI Upgrade:

Please follow the below given steps.

Note

Upgrade RDAF Deployment CLI on both on-premise docker registry VM and RDAF Platform's management VM if provisioned separately.

Login into the VM where rdaf & rdafk8s deployment CLI was installed for docker on-prem registry and managing Kubernetes or Non-kubernetes deployment.

  • Download the RDAF Deployment CLI's newer version 1.2.1 bundle.
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/rdafcli-1.2.1.tar.gz
  • Upgrade the rdaf & rdafk8s CLI to version 1.2.1
pip install --user rdafcli-1.2.1.tar.gz
  • Verify the installed rdaf & rdafk8s CLI version is upgraded to 1.2.1
rdaf --version
rdafk8s --version
  • Download the RDAF Deployment CLI's newer version 1.2.1 bundle and copy it to the RDAF management VM on which the rdaf & rdafk8s deployment CLI was installed.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/offline-rhel-1.2.1.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-rhel-1.2.1.tar.gz
  • Change the directory to the extracted directory
cd offline-rhel-1.2.1
  • Upgrade the rdafCLI to version 1.2.1
pip install --user rdafcli-1.2.1.tar.gz  -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/offline-ubuntu-1.2.1.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.2.1.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.2.1
  • Upgrade the rdafCLI to version 1.2.1
pip install --user rdafcli-1.2.1.tar.gz  -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version
  • Download the RDAF Deployment CLI's newer version 1.2.1 bundle
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/rdafcli-1.2.1.tar.gz
  • Upgrade the rdaf CLI to version 1.2.1
pip install --user rdafcli-1.2.1.tar.gz
  • Verify the installed rdaf CLI version is upgraded to 1.2.1
rdaf --version
  • Download the RDAF Deployment CLI's newer version 1.2.1 bundle and copy it to RDAF management VM on which rdaf & rdafk8s deployment CLI was installed.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/offline-rhel-1.2.1.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-rhel-1.2.1.tar.gz
  • Change the directory to the extracted directory
cd offline-rhel-1.2.1
  • Upgrade the rdafCLI to version 1.2.1
pip install --user rdafcli-1.2.1.tar.gz -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/offline-ubuntu-1.2.1.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.2.1.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.2.1
  • Upgrade the rdafCLI to version 1.2.1
pip install --user rdafcli-1.2.1.tar.gz  -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version

2.8.2. Download the new Docker Images

Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.

Run the below command to upgrade the registry

rdaf registry upgrade --tag 1.0.3
To fetch the registry images, please use the below command

rdaf registry fetch --tag 1.0.3,1.0.3.1,3.4.1,7.4.1
rdaf registry fetch --minio-tag RELEASE.2023-09-30T07-02-29Z

Run the below command to upgrade the registry

rdaf registry upgrade --tag 1.0.3
To fetch the registry images, please use the below command

rdaf registry fetch --tag 1.0.3.1,3.4.1,7.4.1

Run the below command to verify above mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.

rdaf registry list-tags 

Please make sure 1.0.3.1 image tag is downloaded for the below RDAF Infra service.

  • rda-platform-haproxy

Please make sure 1.0.3 image tag is downloaded for the below RDAF Infra service.

  • rda-platform-haproxy
  • rda-platform-kafka
  • rda-platform-zookeeper
  • rda-platform-mariadb
  • rda-platform-opensearch
  • rda-platform-nats
  • rda-platform-busybox
  • rda-platform-nats-box
  • rda-platform-nats-boot-config
  • rda-platform-nats-server-config-reloader
  • rda-platform-prometheus-nats-exporter
  • rda-platform-redis
  • rda-platform-redis-sentinel
  • rda-platform-arangodb-starter
  • rda-platform-kube-arangodb
  • rda-platform-arangodb
  • rda-platform-kubectl
  • rda-platform-logstash
  • rda-platform-fluent-bit

Please make sure RELEASE.2023-09-30T07-02-29Z image tag is downloaded for the below RDAF Infra service.

  • minio

Please make sure 3.4.1 image tag is downloaded for the below RDAF Platform services.

  • rda-client-api-server
  • rda-registry
  • rda-rda-scheduler
  • rda-collector
  • rda-identity
  • rda-fsm
  • rda-stack-mgr
  • rda-access-manager
  • rda-resource-manager
  • rda-user-preferences
  • onprem-portal
  • onprem-portal-nginx
  • rda-worker-all
  • onprem-portal-dbinit
  • cfxdx-nb-nginx-all
  • rda-event-gateway
  • rda-chat-helper
  • rdac
  • rdac-full
  • cfxcollector

Please make sure 7.4.1 image tag is downloaded for the below RDAF OIA Application services.

  • rda-app-controller
  • rda-alert-processor
  • rda-file-browser
  • rda-smtp-server
  • rda-ingestion-tracker
  • rda-reports-registry
  • rda-ml-config
  • rda-event-consumer
  • rda-webhook-server
  • rda-irm-service
  • rda-alert-ingester
  • rda-collaboration
  • rda-notification-service
  • rda-configuration-service
  • rda-alert-processor-companion

Downloaded Docker images are stored under the below path.

/opt/rdaf/data/docker/registry/v2

Run the below command to check the filesystem's disk usage on which docker images are stored.

df -h /opt

Optionally, if required, older image tags which are no longer used can be deleted to free up disk space using the below command.

rdaf registry delete-images --tag <tag1,tag2>

2.8.3. Upgrade Steps

2.8.3.1 Upgrade RDAF Infra Services
2.8.3.1.1 Update RDAF Infra/Platform Services Configuration

Please download the below python script (rdaf_upgrade_1110_121.py)

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/rdaf_upgrade_1110_121.py

Warning

Please verify the python binary version using which RDAF deployment CLI was installed.

ls -l /home/rdauser/.local/lib --> this will show python version as a directory name. (ex: python3.7 or python3.8)

python --version --> The major version (ex: Python 3.7.4 or 3.8.10) should match output from the above.

If it doesn't match, please run the below commands.

sudo mv /usr/bin/python /usr/bin/python_backup

sudo ln -s /usr/bin/python3.7 /usr/bin/python --> Please choose the python binary version using which the RDAF deployment CLI was installed. In this example, the python3.7 binary was used.

Note: If the python version is not either 3.7.x or 3.8.x, please stop the upgrade and contact CloudFabrix support for additional assistance.

Please run the downloaded python upgrade script rdaf_upgrade_1110_121.py as shown below.

The below step will generate *values.yaml.latest files for all RDAF Infrastructure services under /opt/rdaf/deployment-scripts directory.

python rdaf_upgrade_1110_121.py upgrade --no-kafka-upgrade

Please run the below commands to take a backup of the values.yaml files of the Infrastructure and Application services.

cp /opt/rdaf/deployment-scripts/values.yaml /opt/rdaf/deployment-scripts/values.yaml.backup
cp /opt/rdaf/deployment-scripts/nats-values.yaml /opt/rdaf/deployment-scripts/nats-values.yaml.backup
cp /opt/rdaf/deployment-scripts/minio-values.yaml /opt/rdaf/deployment-scripts/minio-values.yaml.backup
cp /opt/rdaf/deployment-scripts/mariadb-values.yaml /opt/rdaf/deployment-scripts/mariadb-values.yaml.backup
cp /opt/rdaf/deployment-scripts/opensearch-values.yaml /opt/rdaf/deployment-scripts/opensearch-values.yaml.backup
cp /opt/rdaf/deployment-scripts/kafka-values.yaml /opt/rdaf/deployment-scripts/kafka-values.yaml.backup
cp /opt/rdaf/deployment-scripts/redis-values.yaml /opt/rdaf/deployment-scripts/redis-values.yaml.backup
cp /opt/rdaf/deployment-scripts/arangodb-operator-values.yaml /opt/rdaf/deployment-scripts/arangodb-operator-values.yaml.backup

Update NATs configuration:

Run the below command to copy the upgraded NATs configuration from nats-values.yaml.latest to nats-values.yaml

cp /opt/rdaf/deployment-scripts/nats-values.yaml.latest /opt/rdaf/deployment-scripts/nats-values.yaml

Please update the memory limit value (below highlighted parameters) in /opt/rdaf/deployment-scripts/nats-values.yaml by copying the current value from /opt/rdaf/deployment-scripts/nats-values.yaml.backup file.

Note: Below given values are for a reference only.

Existing config (nats-values.yaml.backup), followed by the updated config (nats-values.yaml):

---
nats:
  image: 192.168.125.140:5000/rda-platform-nats:1.0.2.1
  pullPolicy: Always
  limits:
    pingInterval: 15s
    maxPings: 2
    maxPayload: 8MB
  tls:
    secret:
      name: rdaf-certs
    ca: ca.cert
    cert: rdaf.cert
    key: rdaf.key
  selectorLabels:
    app: rda-fabric-services
    app_category: rdaf-infra
    app_component: rda-nats
  resources:
    requests:
      memory: 4Gi
    limits:
      memory: 12Gi
bootconfig:
  image: 192.168.125.140:5000/rda-platform-nats-boot-config:1.0.2
natsbox:
  image: 192.168.125.140:5000/rda-platform-nats-box:1.0.2
  nodeSelector:
    rdaf_infra_nats: allow

---
global:
  image:
    pullSecretNames:
    - cfxregistry-cred
  labels:
    app: rda-fabric-services
    app_category: rdaf-infra
    app_component: rda-nats
tlsCA:
  enabled: true
  secretName: rdaf-certs
  key: ca.cert
....
....    
container:
  image:
    repository: 192.168.125.140:5000/rda-platform-nats
    tag: 1.0.3
    pullPolicy: IfNotPresent
  merge:
    livenessProbe:
      initialDelaySeconds: 10
      timeoutSeconds: 5
      periodSeconds: 30
      successThreshold: 1
      failureThreshold: 3
    readinessProbe:
      initialDelaySeconds: 10
      timeoutSeconds: 5
      periodSeconds: 10
      successThreshold: 1
      failureThreshold: 3
    resources:
      requests:
        memory: 4Gi
      limits:
        memory: 12Gi
....
....
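
As an optional way to spot the parameters that need to be carried over (such as the memory requests and limits above), the backup file can be diffed against the newly copied file; the same check can be applied to the other *-values.yaml files in the steps below.

diff /opt/rdaf/deployment-scripts/nats-values.yaml.backup /opt/rdaf/deployment-scripts/nats-values.yaml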

Update Minio configuration:

Run the below command to copy the upgraded Minio configuration from minio-values.yaml.latest to minio-values.yaml

cp /opt/rdaf/deployment-scripts/minio-values.yaml.latest /opt/rdaf/deployment-scripts/minio-values.yaml

Please update the memory limit value (the resources parameters shown in the example below) in /opt/rdaf/deployment-scripts/minio-values.yaml by copying the current value from the /opt/rdaf/deployment-scripts/minio-values.yaml.backup file.

Note: The values given below are for reference only.

minio-values.yaml.backup (existing config):

image:
  repository: 192.168.125.140:5000/minio
  tag: RELEASE.2022-11-11T03-44-20Z
  pullPolicy: Always
imagePullSecrets: []
mcImage:
  repository: 192.168.125.140:5000/mc
  tag: RELEASE.2022-11-07T23-47-39Z
  pullPolicy: Always
service:
  type: NodePort
  nodePort: 30443
resources:
  requests:
    memory: 2Gi
  limits:
    memory: 8Gi
persistence:
  enabled: true
  size: 50Gi
  storageClass: "local-storage"
mode: standalone
....
....

minio-values.yaml (updated config):

---
image:
  repository: 192.168.125.140:5000/minio
  tag: RELEASE.2023-09-30T07-02-29Z
  pullPolicy: IfNotPresent
imagePullSecrets: []
mcImage:
  repository: 192.168.125.140:5000/mc
  tag: RELEASE.2023-09-29T16-41-22Z
  pullPolicy: IfNotPresent
service:
  type: NodePort
  nodePort: 30443
resources:
  requests:
    memory: 2Gi
  limits:
    memory: 8Gi
persistence:
  enabled: true
  size: 50Gi
  storageClass: local-storage
mode: standalone

Update Opensearch configuration:

Run the below command to copy the upgraded Opensearch configuration from opensearch-values.yaml.latest to opensearch-values.yaml

cp /opt/rdaf/deployment-scripts/opensearch-values.yaml.latest /opt/rdaf/deployment-scripts/opensearch-values.yaml

Please update the opensearchJavaOpts and memory limit values (the parameters shown in the example below) in /opt/rdaf/deployment-scripts/opensearch-values.yaml by copying the current values from the /opt/rdaf/deployment-scripts/opensearch-values.yaml.backup file.

Note: The values given below are for reference only.

opensearch-values.yaml.backup (existing config):

singleNode: false
replicas: 3
roles:
  - master
  - ingest
  - data
opensearchJavaOpts: "-Xmx24G -Xms24G"
extraEnvs:
  - name: DISABLE_INSTALL_DEMO_CONFIG
    value: "true"
image:
  repository: 192.168.125.140:5000/rda-platform-opensearch
  tag: 1.0.2
  pullPolicy: Always
imagePullSecrets:
  - name: cfxregistry-cred
service:
  type: NodePort
labels:
  app: rda-fabric-services
  app_category: rdaf-infra
  app_component: rda-opensearch
resources:
  requests:
    memory: 4Gi
  limits:
    memory: 48Gi
secretMounts:
....

opensearch-values.yaml (updated config):

singleNode: false
replicas: 3
roles:
  - master
  - ingest
  - data
opensearchJavaOpts: "-Xmx24G -Xms24G"
extraEnvs:
  - name: DISABLE_INSTALL_DEMO_CONFIG
    value: "true"
image:
  repository: 192.168.125.140:5000/rda-platform-opensearch
  tag: 1.0.3
  pullPolicy: IfNotPresent
imagePullSecrets:
  - name: cfxregistry-cred
service:
  type: NodePort
labels:
  app: rda-fabric-services
  app_category: rdaf-infra
  app_component: rda-opensearch
resources:
  requests:
    memory: 4Gi
  limits:
    memory: 48Gi
livenessProbe:
  periodSeconds: 20
  timeoutSeconds: 5
  failureThreshold: 10
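
To quickly pull the current opensearchJavaOpts and memory values out of the backup file before editing, a grep such as the below can be used.

grep -E 'opensearchJavaOpts|memory:' /opt/rdaf/deployment-scripts/opensearch-values.yaml.backup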

Update Redis configuration:

Run the below command to copy the upgraded Redis configuration from redis-values.yaml.latest to redis-values.yaml

cp /opt/rdaf/deployment-scripts/redis-values.yaml.latest /opt/rdaf/deployment-scripts/redis-values.yaml

Update MariaDB configuration:

Run the below command to copy the upgraded MariaDB configuration from mariadb-values.yaml.latest to mariadb-values.yaml

cp /opt/rdaf/deployment-scripts/mariadb-values.yaml.latest /opt/rdaf/deployment-scripts/mariadb-values.yaml

Please update the below parameters (highlighted parameters in the given config example) in /opt/rdaf/deployment-scripts/mariadb-values.yaml file.

  • memory: Update it by copying the current value from /opt/rdaf/deployment-scripts/mariadb-values.yaml.backup file

  • initialDelaySeconds: set the value to 1200 (Under livenessProbe section)

  • failureThreshold: set the value to 15 (Under livenessProbe section)

  • expire_logs_days: set the value to 1

  • innodb_buffer_pool_size: Update it by copying the current value from /opt/rdaf/deployment-scripts/mariadb-values.yaml.backup file

  • Comment out wsrep_replicate_myisam=ON line. Please ignore, if it is already commented out.

Note: The values given below are for reference only.

mariadb-values.yaml.backup (existing config):

image:
  registry: 192.168.125.140:5000
  repository: rda-platform-mariadb
  tag: 1.0.2
  pullPolicy: Always
  pullSecrets:
    - cfxregistry-cred
podLabels:
  app: rda-fabric-services
  app_category: rdaf-infra
  app_component: rda-mariadb
resources:
  requests: {}
  limits:
    memory: 28Gi
livenessProbe:
  enabled: true
  initialDelaySeconds: 1200
  periodSeconds: 10
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 15
readinessProbe:
  enabled: true
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 3
....
....
mariadbConfiguration: |-
[client]
port=3306
socket=/opt/bitnami/mariadb/tmp/mysql.sock
plugin_dir=/opt/bitnami/mariadb/plugin

[mysqld]
default_storage_engine=InnoDB
basedir=/opt/bitnami/mariadb
datadir=/bitnami/mariadb/data
....
....
## Binary Logging
##
log_bin=mysql-bin
expire_logs_days=1
# Disabling for performance per ....
sync_binlog=0
# Required for Galera
binlog_format=row
....
....
innodb_log_files_in_group=2
innodb_log_file_size=128M
innodb_flush_log_at_trx_commit=1
innodb_file_per_table=1
# 80% Memory is default reco.
# Need to re-evaluate when DB size grows
innodb_buffer_pool_size=18G
innodb_file_format=Barracuda
....
....
[galera]
wsrep_on=ON
wsrep_provider=/opt/bitnami/mariadb/lib/libgalera_smm.so
wsrep_sst_method=mariabackup
wsrep_slave_threads=4
wsrep_cluster_address=gcomm://
wsrep_cluster_name=galera
wsrep_sst_auth="root:"
# Enabled for performance per https://mariadb.com/....
innodb_flush_log_at_trx_commit=2
# MYISAM REPLICATION SUPPORT #
wsrep_replicate_myisam=ON

mariadb-values.yaml (updated config):

image:
  registry: 192.168.125.140:5000
  repository: rda-platform-mariadb
  tag: 1.0.3
  pullPolicy: IfNotPresent
  pullSecrets:
    - cfxregistry-cred
podLabels:
  app: rda-fabric-services
  app_category: rdaf-infra
  app_component: rda-mariadb
resources:
  requests: {}
  limits:
    memory: 28Gi
livenessProbe:
  enabled: true
  initialDelaySeconds: 1200
  periodSeconds: 10
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 15
readinessProbe:
  enabled: true
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 3
....
....
mariadbConfiguration: |-
[client]
port=3306
socket=/opt/bitnami/mariadb/tmp/mysql.sock
plugin_dir=/opt/bitnami/mariadb/plugin

[mysqld]
default_storage_engine=InnoDB
basedir=/opt/bitnami/mariadb
datadir=/bitnami/mariadb/data
plugin_dir=/opt/bitnami/mariadb/plugin
....
....
## Binary Logging
##
log_bin=mysql-bin
expire_logs_days=1
# Disabling for performance per ....
sync_binlog=0
# Required for Galera
binlog_format=row
....
....
innodb_log_files_in_group=2
innodb_log_file_size=128M
innodb_flush_log_at_trx_commit=1
innodb_file_per_table=1
# 80% Memory is default reco.
# Need to re-evaluate when DB size grows
innodb_buffer_pool_size=18G
innodb_file_format=Barracuda
....
....
[galera]
wsrep_on=ON
wsrep_provider=/opt/bitnami/mariadb/lib/libgalera_smm.so
wsrep_sst_method=mariabackup
wsrep_slave_threads=4
wsrep_cluster_address=gcomm://
wsrep_cluster_name=galera
wsrep_sst_auth="root:"
# Enabled for performance per https://mariadb.com/....
innodb_flush_log_at_trx_commit=2
# MYISAM REPLICATION SUPPORT #
#wsrep_replicate_myisam=ON
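
To pull the current memory limit and innodb_buffer_pool_size values out of the backup file before editing, a grep such as the below can be used.

grep -E 'memory:|innodb_buffer_pool_size' /opt/rdaf/deployment-scripts/mariadb-values.yaml.backup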

Update Kafka configuration:

Run the below command to copy the upgraded Kafka configuration from kafka-values.yaml.latest to kafka-values.yaml

cp /opt/rdaf/deployment-scripts/kafka-values.yaml.latest /opt/rdaf/deployment-scripts/kafka-values.yaml

Please update the below parameters (highlighted parameters in the given config example) in /opt/rdaf/deployment-scripts/kafka-values.yaml file.

  • memory: Update it by copying the current value from /opt/rdaf/deployment-scripts/kafka-values.yaml.backup file

  • nodePorts: Update it by copying the current values from the kafka-values.yaml.backup file. Please make sure to maintain the same order of the nodePorts as in the current configuration.

  • initialDelaySeconds: set the value to 1200 (Under livenessProbe section)

  • failureThreshold: set the value to 15 (Under livenessProbe section)

Note: The values given below are for reference only.

kafka-values.yaml.backup (existing config):

---
global:
  imagePullSecrets:
  - cfxregistry-cred
image:
  registry: 192.168.125.140:5000
  repository: rda-platform-kafka
  tag: 1.0.2
  pullPolicy: Always
....
....
externalAccess:
  enabled: true
  autoDiscovery:
    enabled: true
  service:
    type: NodePort
    nodePorts:
    - 32606
    - 31877
    - 30323
serviceAccount:
  create: true
rbac:
  create: true
authorizerClassName: kafka.security.authorizer.AclAuthorizer
allowEveryoneIfNoAclFound: true
....
....
....
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app_component
            operator: In
            values:
            - rda-kafka
        topologyKey: kubernetes.io/hostname
nodeSelector:
  rdaf_infra_services: allow
persistence:
  enabled: true
  size: 8Gi
resources:
  limits:
    memory: 12Gi
zookeeper:
  image:
    registry: 192.168.125.140:5000
    repository: rda-platform-zookeeper
    tag: 1.0.2
....
....

kafka-values.yaml (updated config):

---
global:
  imagePullSecrets:
  - cfxregistry-cred
image:
  registry: 192.168.125.140:5000
  repository: rda-platform-kafka
  tag: 1.0.3
  pullPolicy: IfNotPresent
heapOpts: -Xmx2048m -Xms2048m
....
....
  livenessProbe:
    enabled: true
    initialDelaySeconds: 1200
    timeoutSeconds: 5
    failureThreshold: 15
    periodSeconds: 10
    successThreshold: 1
  readinessProbe:
    enabled: true
    initialDelaySeconds: 5
    failureThreshold: 6
    timeoutSeconds: 5
    periodSeconds: 10
    successThreshold: 1
  nodeSelector:
    rdaf_infra_services: allow
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app_component
              operator: In
              values:
              - rda-kafka-controller
          topologyKey: kubernetes.io/hostname
  persistence:
    enabled: true
    size: 8Gi
  resources:
    limits:
      memory: 12Gi
service:
  type: ClusterIP
  ports:
    client: 9092
    controller: 9095
    interbroker: 9093
    external: 9094
....
....
  controller:
    service:
      type: NodePort
      ports:
        external: 9094
      nodePorts:
      - 32606
      - 31877
      - 30323
serviceAccount:
  create: true
rbac:
  create: true
kraft:
  enabled: true
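
To pull the current nodePorts (in their existing order) and memory limit out of the backup file before editing, commands such as the below can be used.

grep -A3 'nodePorts:' /opt/rdaf/deployment-scripts/kafka-values.yaml.backup
grep 'memory:' /opt/rdaf/deployment-scripts/kafka-values.yaml.backup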

Update rda_scheduler Service Configuration:

Please take a backup of the /opt/rdaf/deployment-scripts/values.yaml

cp /opt/rdaf/deployment-scripts/values.yaml /opt/rdaf/deployment-scripts/values.yaml.backup

Edit the /opt/rdaf/deployment-scripts/values.yaml file and update the rda_scheduler service configuration by adding the below environment variable, as shown in the example that follows.

  • NUM_SERVER_PROCESSES: Set the value to 4
....
....
rda_scheduler:
  replicas: 1
  privileged: true
  resources:
    requests:
      memory: 100Mi
    limits:
      memory: 2Gi
  env:
    NUM_SERVER_PROCESSES: '4'
    RDA_ENABLE_TRACES: 'no'
    DISABLE_REMOTE_LOGGING_CONTROL: 'no'
    RDA_SELF_HEALTH_RESTART_AFTER_FAILURES: 3
    RDA_GIT_ACCESS_TOKEN: ''
    RDA_GIT_URL: ''
    RDA_GITHUB_ORG: ''
    RDA_GITHUB_REPO: ''
    RDA_GITHUB_BRANCH_PREFIX: ''
    LABELS: tenant_name=rdaf-01
  • Download the python upgrade script (rdaf_upgrade_1110_121.py)
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/rdaf_upgrade_1110_121.py
  • Please run the downloaded python upgrade script.
python rdaf_upgrade_1110_121.py
  • Install the haproxy service using the below command
rdaf infra install --tag 1.0.3.1 --service haproxy

Run the below RDAF command to check infra status

rdaf infra status
+----------------------+----------------+------------+--------------+------------------------------+
| Name                 | Host           | Status     | Container Id | Tag                          |
+----------------------+----------------+------------+--------------+------------------------------+
| haproxy              | 192.168.133.97 | Up 2 hours | 342fc1338ba1 | 1.0.3.1                      |
| haproxy              | 192.168.133.98 | Up 2 hours | ec0de9d45a66 | 1.0.3.1                      |
| keepalived           | 192.168.133.97 | active     | N/A          | N/A                          |
| keepalived           | 192.168.133.98 | active     | N/A          | N/A                          |
| nats                 | 192.168.133.97 | Up 4 hours | d2dc79419daa | 1.0.3                        |
| nats                 | 192.168.133.98 | Up 4 hours | ef7c632bdb58 | 1.0.3                        |
| minio                | 192.168.133.93 | Up 4 hours | 414d2a2351b9 | RELEASE.2023-09-30T07-02-29Z |
| minio                | 192.168.133.97 | Up 4 hours | aa0f20af7d70 | RELEASE.2023-09-30T07-02-29Z |
| minio                | 192.168.133.98 | Up 4 hours | 91e123f8ba43 | RELEASE.2023-09-30T07-02-29Z |
| minio                | 192.168.133.99 | Up 4 hours | 74e74cc328b5 | RELEASE.2023-09-30T07-02-29Z |
| mariadb              | 192.168.133.97 | Up 4 hours | c2d71adc09ce | 1.0.3                        |
| mariadb              | 192.168.133.98 | Up 4 hours | 54615146c0fc | 1.0.3                        |
| mariadb              | 192.168.133.99 | Up 4 hours | 68e2a6088477 | 1.0.3                        |
| opensearch           | 192.168.133.97 | Up 3 hours | 7e700c133672 | 1.0.3                        |
| opensearch           | 192.168.133.98 | Up 3 hours | a582e7b552d6 | 1.0.3                        |
| opensearch           | 192.168.133.99 | Up 3 hours | f752837167e2 | 1.0.3                        |
+----------------------+----------------+------------+--------------+------------------------------+

Run the below RDAF command to check infra healthcheck status

rdaf infra healthcheck
+----------------+-----------------+--------+------------------------------+----------------+--------------+
| Name           | Check           | Status | Reason                       | Host           | Container Id |
+----------------+-----------------+--------+------------------------------+----------------+--------------+
| haproxy        | Port Connection | OK     | N/A                          | 192.168.133.97 | 340d7ce361e0 |
| haproxy        | Service Status  | OK     | N/A                          | 192.168.133.97 | 340d7ce361e0 |
| haproxy        | Firewall Port   | OK     | N/A                          | 192.168.133.97 | 340d7ce361e0 |
| haproxy        | Port Connection | OK     | N/A                          | 192.168.133.98 | 4a6015c9362a |
| haproxy        | Service Status  | OK     | N/A                          | 192.168.133.98 | 4a6015c9362a |
| haproxy        | Firewall Port   | OK     | N/A                          | 192.168.133.98 | 4a6015c9362a |
| keepalived     | Service Status  | OK     | N/A                          | 192.168.133.97 | N/A          |
| keepalived     | Service Status  | OK     | N/A                          | 192.168.133.98 | N/A          |
| nats           | Port Connection | OK     | N/A                          | 192.168.133.97 | 991873bb3420 |
| nats           | Service Status  | OK     | N/A                          | 192.168.133.97 | 991873bb3420 |
| nats           | Firewall Port   | OK     | N/A                          | 192.168.133.97 | 991873bb3420 |
| nats           | Port Connection | OK     | N/A                          | 192.168.133.98 | 016438fe2d17 |
| nats           | Service Status  | OK     | N/A                          | 192.168.133.98 | 016438fe2d17 |
| nats           | Firewall Port   | OK     | N/A                          | 192.168.133.98 | 016438fe2d17 |
| minio          | Port Connection | OK     | N/A                          | 192.168.133.93 | 0c3c86e896c6 |
| minio          | Service Status  | OK     | N/A                          | 192.168.133.93 | 0c3c86e896c6 |
| minio          | Firewall Port   | OK     | N/A                          | 192.168.133.93 | 0c3c86e896c6 |
| minio          | Port Connection | OK     | N/A                          | 192.168.133.97 | 604fc5ce14a3 |
| minio          | Service Status  | OK     | N/A                          | 192.168.133.97 | 604fc5ce14a3 |
| minio          | Firewall Port   | OK     | N/A                          | 192.168.133.97 | 604fc5ce14a3 |
| minio          | Port Connection | OK     | N/A                          | 192.168.133.98 | 0c2ae986076e |
| minio          | Service Status  | OK     | N/A                          | 192.168.133.98 | 0c2ae986076e |
| minio          | Firewall Port   | OK     | N/A                          | 192.168.133.98 | 0c2ae986076e |
| minio          | Port Connection | OK     | N/A                          | 192.168.133.99 | 67a7681a40b4 |
| minio          | Service Status  | OK     | N/A                          | 192.168.133.99 | 67a7681a40b4 |
| minio          | Firewall Port   | OK     | N/A                          | 192.168.133.99 | 67a7681a40b4 |
| mariadb        | Port Connection | OK     | N/A                          | 192.168.133.97 | 40e9915a3cf4 |
| mariadb        | Service Status  | OK     | N/A                          | 192.168.133.97 | 40e9915a3cf4 |
| mariadb        | Firewall Port   | OK     | N/A                          | 192.168.133.97 | 40e9915a3cf4 |
+----------------+-----------------+--------+------------------------------+----------------+--------------+
2.8.3.1.2 Upgrade RDAF Infra Services
  • Upgrade haproxy service using below command
rdafk8s infra upgrade --tag 1.0.3.1 --service haproxy
  • Please use the below mentioned command to verify that haproxy is up and in Running state.
rdafk8s infra status

Warning

Please verify RDAF portal access to make sure it is accessible after the haproxy service is upgraded, before proceeding to the next step.

  • Upgrade nats service using below command
rdafk8s infra upgrade --tag 1.0.3 --service nats
  • Please use the below mentioned command and wait till all of the nats pods are in Running state and Ready status is 2/2
kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep -i nats

Tip

If the nats service upgrade fails with a PodDisruptionBudget policy version error message, please update the apiVersion in the below file to policy/v1beta1

vi /home/rdauser/.local/lib/python3.7/site-packages/rdaf/deployments/helm/rda-nats/files/pod-disruption-budget.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  {{- include "nats.metadataNamespace" $ | nindent 2 }}
  name: {{ .Values.podDisruptionBudget.name }}
  labels:
    {{- include "nats.labels" $ | nindent 4 }}
....

Run the nats service upgrade command.

rdafk8s infra upgrade --tag 1.0.3 --service nats
  • Upgrade minio service using below command
rdafk8s infra upgrade --tag 1.0.3 --service minio
  • Please use the below mentioned command and wait till all of the minio pods are in Running state and Ready status is 1/1
kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep -i minio
  • Upgrade redis service using below command
rdafk8s infra upgrade --tag 1.0.3 --service redis
  • Please use the below mentioned command and wait till all of the redis pods are in Running state and Ready status is 1/1
kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep -i redis
  • Upgrade opensearch service using below command
rdafk8s infra upgrade --tag 1.0.3 --service opensearch
  • Please use the below mentioned command and wait till all of the opensearch pods are in Running state and Ready status is 1/1
kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep -i opensearch

Run the below command to get RDAF Infra services details

rdafk8s infra status

Danger

Upgrading the kafka and mariadb infra services requires downtime for the RDAF platform and application services.

Please proceed to the below steps only after scheduled downtime is approved.

Please download the MariaDB upgrade scripts:

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/AP_7.4.1_migration_ddl_version_from_20_to_22.ql
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/AP_7.4.1_copy_history_data_version_from_20_to_22.ql

Stop RDAF Application Services:

  • Stop the rda-webhook-server application service and wait for 60 seconds. This step stops receiving incoming webhook alerts and allows the rest of the application services to complete processing the in-transit alerts.
rdafk8s app down OIA --service rda-webhook-server --force
sleep 60
  • To stop all of the Application services, run the below command.
rdafk8s app down OIA --force
  • Check the Application services status. When all of the application services are stopped, it will show an empty output.
rdafk8s app status

Upgrade kafka Service:

  • Please run the below upgrade script rdaf_upgrade_1110_121.py. This script will clear all the data of Kafka and Zookeeper services under the mount points /kafka-logs and /zookeeper, and delete Kubernetes (k8s) pods, Helm charts, persistent volumes (pv), and persistent volume claims (pvc) configuration. After this step, it will uninstall the Kafka and Zookeeper services.
python rdaf_upgrade_1110_121.py upgrade-kafka
  • Please run the below command to check kafka and zookeeper services are uninstalled.
helm list -n rda-fabric
  • Install kafka service using below command.
rdafk8s infra install --tag 1.0.3 --service kafka
  • Please run the below command and wait till all of the kafka pods are in Running state and the Ready status is 1/1
kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep -i kafka
  • Please run the below command to create necessary Kafka Topics and corresponding configuration.
python rdaf_upgrade_1110_121.py configure-kafka-tenant

Upgrade mariadb Service:

  • To stop mariadb services, run the below command. Wait until all of the services are stopped.
rdafk8s infra down --service mariadb
  • Please run the below command to check mariadb pods are down
kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep -i mariadb
  • Upgrade mariadb service using the below command
rdafk8s infra upgrade --tag 1.0.3 --service mariadb 
  • Please run the below command and wait till all of the mariadb pods are in Running state and Ready status is 1/1
kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep -i mariadb

Warning

Please wait till all of the Kafka and MariaDB infra service pods are in Running state and Ready status is 1/1

  • Run the below commands to check the status of the mariadb cluster. Please verify that the cluster state is in Synced state.
MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`

MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`

MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`

mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -e "show status like 'wsrep_local_state_comment';"
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+

Run the below commands to check the cluster size of the mariadb cluster. Please verify that the cluster size is 3.

mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'";
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
  • Please run the below commands to drop the indexes on two alert tables of AIOps application services.
MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`

MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`

MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`

TENANT_ID=`cat /opt/rdaf/rdaf.cfg | grep external_user | awk '{print $3}' | cut -f1 -d'.'`
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor -e "DROP INDEX IF EXISTS AlertAlternateKey on alert;"
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor -e "DROP INDEX IF EXISTS AlertHistoryAlternateKey on alerthistory;"
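
Optionally, the below command can be used to confirm that the index was dropped from the alert table (the same check applies to the alerthistory table).

mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor -e "SHOW INDEX FROM alert;"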

Warning

Please make sure above commands are executed successfully, before continuing to the below step.

  • Please run the below command to upgrade the DB schema configuration of the mariadb service after the 1.0.3 version upgrade.
python rdaf_upgrade_1110_121.py configure-mariadb   
  • Please run the below RDAF command to check infra services status
rdafk8s infra status
+--------------------------+----------------+-----------------+--------------+------------------------------+
| Name                     | Host           | Status          | Container Id | Tag                          |
+--------------------------+----------------+-----------------+--------------+------------------------------+
| haproxy                  | 192.168.131.41 | Up 16 hours     | e2b3b46f702d | 1.0.3.1                      |
| haproxy                  | 192.168.131.42 | Up 5 hours      | a89fdd2c5299 | 1.0.3.1                      |
| keepalived               | 192.168.131.41 | active          | N/A          | N/A                          |
| keepalived               | 192.168.131.42 | active          | N/A          | N/A                          |
| rda-nats                 | 192.168.131.41 | Up 16 Hours ago | 3682271b3b58 | 1.0.3                        |
| rda-nats                 | 192.168.131.42 | Up 4 Hours ago  | 1f3599cf7193 | 1.0.3                        |
| rda-minio                | 192.168.131.41 | Up 16 Hours ago | 80a865d27b2c | RELEASE.2023-09-30T07-02-29Z |
| rda-minio                | 192.168.131.42 | Up 4 Hours ago  | 22c7da5bc030 | RELEASE.2023-09-30T07-02-29Z |
| rda-minio                | 192.168.131.43 | Up 3 Weeks ago  | 1af5abda3061 | RELEASE.2023-09-30T07-02-29Z |
| rda-minio                | 192.168.131.48 | Up 3 Weeks ago  | 7eec14f4ce0e | RELEASE.2023-09-30T07-02-29Z |
| rda-mariadb              | 192.168.131.41 | Up 16 Hours ago | 2596eaddb435 | 1.0.3                        |
| rda-mariadb              | 192.168.131.42 | Up 4 Hours ago  | c004da615516 | 1.0.3                        |
| rda-mariadb              | 192.168.131.43 | Up 2 Weeks ago  | b49f33d491d6 | 1.0.3                        |
| rda-opensearch           | 192.168.131.41 | Up 16 Hours ago | 5595347d56d6 | 1.0.3                        |
...
...
+--------------------------+----------------+-----------------+--------------+------------------------------+
  • Please run the below commands to create a copy of alert and alerthistory tables of rda-alert-processor service DB as a backup and update the schema.
MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`

MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`

MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`

TENANT_ID=`cat /opt/rdaf/rdaf.cfg | grep external_user | awk '{print $3}' | cut -f1 -d'.'`
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor < AP_7.4.1_migration_ddl_version_from_20_to_22.ql
  • Please run the below commands to copy the data from alert_bak and alerthistory_bak backup tables of rda-alert-processor service DB back to primary alert and alerthistory tables.

Note

The copy process may take some time depending on the amount of historical data in the alerthistory table. Please continue with the rest of the steps while the data is being copied.

MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`

MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`

MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`

TENANT_ID=`cat /opt/rdaf/rdaf.cfg | grep external_user | awk '{print $3}' | cut -f1 -d'.'`
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor < AP_7.4.1_copy_history_data_version_from_20_to_22.ql
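
Since the copy can take a while, a row-count comparison such as the below is one way to monitor its progress (reusing the variables set above); assuming a straight copy of the historical rows, the two counts should converge as the copy completes.

mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor -e "SELECT (SELECT COUNT(*) FROM alerthistory) AS alerthistory_rows, (SELECT COUNT(*) FROM alerthistory_bak) AS alerthistory_bak_rows;"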

Installing GraphDB Service:

Tip

Please skip the below step if GraphDB service is NOT going to be installed.

Warning

For installing the GraphDB service, please add an additional disk to the RDA Fabric Infrastructure VM.

It is a pre-requisite and this step needs to be completed before installing the GraphDB service.

rdafk8s infra install --tag 1.0.3 --service graphdb
  • Please use the below mentioned command and wait till all of the arangodb pods are in Running state.
kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep arango
2.8.3.2 Upgrade RDAF Platform Services

Step-1: Run the below command to initiate upgrading RDAF Platform services.

rdafk8s platform upgrade --tag 3.4.1

As the upgrade procedure is a non-disruptive upgrade, it puts the currently running PODs into Terminating state and newer version PODs into Pending state.

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each Platform service is in Terminating state.

kubectl get pods -n rda-fabric -l app_category=rdaf-platform

Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It will list the POD Ids of the platform services that need to be put into maintenance mode, along with the rdac maintenance command to run.

python maint_command.py

Note

If maint_command.py script doesn't exist on RDAF deployment CLI VM, it can be downloaded using the below command.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.6/maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-platform-pod-ids>
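
For example, if maint_command.py listed two platform POD IDs, the command would look like the below (the IDs here are placeholders; use the ones printed by the script).

rdac maintenance start --ids a1b2c3d4,e5f6a7b8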

Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating RDAF platform service PODs

for i in `kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done

Note

Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Platform service PODs.

Please wait till all of the new platform service PODs are in Running state and run the below command to verify their status and make sure all of them are running with 3.4.1 version.

rdafk8s platform status
+---------------------+----------------+-----------------+--------------+-------+
| Name                | Host           | Status          | Container Id | Tag   |
+---------------------+----------------+-----------------+--------------+-------+
| rda-api-server      | 192.168.131.46 | Up 21 Hours ago | 98ec9561d787 | 3.4.1 |
| rda-api-server      | 192.168.131.45 | Up 21 Hours ago | e7b7cdb7d3d2 | 3.4.1 |
| rda-registry        | 192.168.131.44 | Up 21 Hours ago | bc2fed4a15f3 | 3.4.1 |
| rda-registry        | 192.168.131.46 | Up 21 Hours ago | 1b6da7ff3ce2 | 3.4.1 |
| rda-identity        | 192.168.131.45 | Up 21 Hours ago | 30053cf6667e | 3.4.1 |
| rda-identity        | 192.168.131.46 | Up 21 Hours ago | 6ee2e6a861f7 | 3.4.1 |
| rda-fsm             | 192.168.131.44 | Up 21 Hours ago | c014e84bf197 | 3.4.1 |
| rda-fsm             | 192.168.131.46 | Up 21 Hours ago | 6a609f8ab579 | 3.4.1 |
+---------------------+----------------+-----------------+--------------+-------+

Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler services is elected as leader under the Site column.

rdac pods
+-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age     |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| Infra | api-server                             | True        | rda-api-server | 5081891f |             | 0:29:54 |      8 |        31.33 |               |              |
| Infra | api-server                             | True        | rda-api-server | 9fc5db97 |             | 0:29:52 |      8 |        31.33 |               |              |
| Infra | collector                              | True        | rda-collector- | f9b6a00d |             | 0:30:00 |      8 |        31.33 |               |              |
| Infra | collector                              | True        | rda-collector- | 0a4eb8cd |             | 0:30:01 |      8 |        31.33 |               |              |
| Infra | registry                               | True        | rda-registry-7 | 758fc2cb |             | 0:30:51 |      8 |        31.33 |               |              |
| Infra | registry                               | True        | rda-registry-7 | 3d56a31f |             | 0:28:49 |      8 |        31.33 |               |              |
| Infra | scheduler                              | True        | rda-scheduler- | 8b570be5 |             | 0:30:44 |      8 |        31.33 |               |              |
| Infra | scheduler                              | True        | rda-scheduler- | 44930ac7 | *leader*    | 0:30:47 |      8 |        31.33 |               |              |
| Infra | worker                                 | True        | rda-worker-69d | 91615244 | rda-site-01 | 0:25:30 |      8 |        31.33 | 0             | 9            |
| Infra | worker                                 | True        | rda-worker-69d | af99d199 | rda-site-01 | 0:25:31 |      8 |        31.33 | 2             | 14           |
+-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+

Run the below command to check that all services have an ok status and do not throw any failure messages.

rdac healthcheck

Warning

For Non-Kubernetes deployment, upgrading RDAF Platform and AIOps application services is a disruptive operation. Please schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.

  • To stop application services, run the below command. Wait until all of the services are stopped.
rdaf app down OIA

rdaf app status

  • To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
rdaf worker down

rdaf worker status

  • To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
rdaf platform down

rdaf platform status

Run the below command to initiate upgrading RDAF Platform services.

rdaf platform upgrade --tag 3.4.1

Please wait till all of the new platform services are in Up state and run the below command to verify their status and make sure all of them are running with the 3.4.1 version.

rdaf platform status
+--------------------------+----------------+------------+--------------+-------+
| Name                     | Host           | Status     | Container Id | Tag   |
+--------------------------+----------------+------------+--------------+-------+
| rda_api_server           | 192.168.133.92 | Up 2 hours | 6366c9717f07 | 3.4.1 |
| rda_api_server           | 192.168.133.93 | Up 2 hours | d5b8c2722f72 | 3.4.1 |
| rda_registry             | 192.168.133.92 | Up 2 hours | 47f722aab97b | 3.4.1 |
| rda_registry             | 192.168.133.93 | Up 2 hours | f5ce662af82f | 3.4.1 |
| rda_scheduler            | 192.168.133.92 | Up 2 hours | 28b597777069 | 3.4.1 |
| rda_scheduler            | 192.168.133.93 | Up 2 hours | 2d70a4ac184e | 3.4.1 |
| rda_collector            | 192.168.133.92 | Up 2 hours | 637a07f4df17 | 3.4.1 |
| rda_collector            | 192.168.133.93 | Up 2 hours | 478167b3952a | 3.4.1 |
| rda_asset_dependency     | 192.168.133.92 | Up 2 hours | c910651896fe | 3.4.1 |
| rda_asset_dependency     | 192.168.133.93 | Up 2 hours | c1ddfde81b13 | 3.4.1 |
| rda_identity             | 192.168.133.92 | Up 2 hours | f70beaa486a6 | 3.4.1 |
| rda_identity             | 192.168.133.93 | Up 2 hours | a726b0f154c8 | 3.4.1 |
| rda_fsm                  | 192.168.133.92 | Up 2 hours | 87b26529566a | 3.4.1 |
| rda_fsm                  | 192.168.133.93 | Up 2 hours | 13891be75c05 | 3.4.1 |
+--------------------------+----------------+------------+--------------+-------+

Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler services is elected as leader under the Site column.

rdac pods

Run the below command to check that all services have an ok status and do not throw any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | kafka-connectivity                                  | ok       | Cluster=ZTRlZGFmZjhkZDFiMTFlZQ, Broker=1, Brokers=[1, 2, 3] |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | kafka-connectivity                                  | ok       | Cluster=ZTRlZGFmZjhkZDFiMTFlZQ, Broker=2, Brokers=[1, 2, 3] |
| rda_app   | alert-processor                        | 2afde67935ac | 33170bc7 |             | service-status                                      | ok       |                                                             |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.8.3.3 Upgrade rdac CLI

Run the below command to upgrade the rdac CLI (Kubernetes deployment)

rdafk8s rdac_cli upgrade --tag 3.4.1

Run the below command to upgrade the rdac CLI (non-Kubernetes deployment)

rdaf rdac_cli upgrade --tag 3.4.1
2.8.3.4 Upgrade OIA Application Services

Step-1: Run the below commands to initiate upgrading RDAF OIA Application services

rdafk8s app upgrade OIA/AIA --tag 7.4.1

Step-2: Run the below command to check the status of the newly upgraded PODs.

kubectl get pods -n rda-fabric -l app_name=oia

Please wait till all of the new OIA application service PODs are in Running state and run the below command to verify their status and make sure they are running with the 7.4.1 version.

rdafk8s app status

+-------------------------------+----------------+----------------+--------------+-------+
| Name                          | Host           | Status         | Container Id | Tag   |
+-------------------------------+----------------+----------------+--------------+-------+
| rda-alert-ingester            | 192.168.131.50 | Up 4 Hours ago | 013e6fb89274 | 7.4.1 |
| rda-alert-ingester            | 192.168.131.49 | Up 4 Hours ago | ce269889fe6c | 7.4.1 |
| rda-alert-processor-companion | 192.168.131.49 | Up 4 Hours ago | b4bca9347589 | 7.4.1 |
| rda-alert-processor-companion | 192.168.131.50 | Up 4 Hours ago | 1c530b32c563 | 7.4.1 |
| rda-alert-processor           | 192.168.131.47 | Up 4 Hours ago | b0e25a38c72d | 7.4.1 |
| rda-alert-processor           | 192.168.131.46 | Up 4 Hours ago | 2a5b0f764cfd | 7.4.1 |
| rda-app-controller            | 192.168.131.50 | Up 4 Hours ago | 0261820f6e01 | 7.4.1 |
| rda-app-controller            | 192.168.131.46 | Up 4 Hours ago | 134844ff7208 | 7.4.1 |
| rda-collaboration             | 192.168.131.50 | Up 4 Hours ago | e5e196b74462 | 7.4.1 |
| rda-collaboration             | 192.168.131.46 | Up 4 Hours ago | ed4ec37435b7 | 7.4.1 |
| rda-configuration-service     | 192.168.131.46 | Up 4 Hours ago | 74e22e5ddee1 | 7.4.1 |
| rda-configuration-service     | 192.168.131.50 | Up 4 Hours ago | b09637691cbd | 7.4.1 |
+-------------------------------+----------------+----------------+--------------+-------+
Step-3: Run the below command to verify that all OIA application services are up and running. Please wait till the cfxdimensions-app-irm_service has the leader status under the Site column.

rdac pods
+-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age     |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| App   | alert-ingester                         | True        | rda-alert-inge | 7861bd4f |             | 4:20:52 |      8 |        31.33 |               |              |
| App   | alert-ingester                         | True        | rda-alert-inge | 4abc521f |             | 4:20:52 |      8 |        31.33 |               |              |
| App   | alert-processor                        | True        | rda-alert-proc | 9bf94e67 |             | 4:20:50 |      8 |        31.33 |               |              |
| App   | alert-processor                        | True        | rda-alert-proc | 4e679139 |             | 4:20:48 |      8 |        31.33 |               |              |
| App   | alert-processor-companion              | True        | rda-alert-proc | 745dfbb9 |             | 4:20:39 |      8 |        31.33 |               |              |
| App   | alert-processor-companion              | True        | rda-alert-proc | 02f6bce0 |             | 4:20:41 |      8 |        31.33 |               |              |
| App   | asset-dependency                       | True        | rda-asset-depe | fc6c7a60 |             | 4:28:00 |      8 |        31.33 |               |              |
| App   | asset-dependency                       | True        | rda-asset-depe | d3ca4c11 |             | 4:27:07 |      8 |        31.33 |               |              |
| App   | authenticator                          | True        | rda-identity-6 | 4cd59d9c |             | 4:27:01 |      8 |        31.33 |               |              |
| App   | authenticator                          | True        | rda-identity-6 | 174298c3 |             | 4:25:53 |      8 |        31.33 |               |              |
| App   | cfx-app-controller                     | True        | rda-app-contro | 4d923832 |             | 4:20:42 |      8 |        31.33 |               |              |
| App   | cfx-app-controller                     | True        | rda-app-contro | b16deafa |             | 4:20:25 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | rda-access-man | 09d1fada |             | 4:27:56 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | rda-access-man | e0af2bcc |             | 4:27:54 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | rda-collaborat | 9e7f7bcb |             | 4:20:31 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | rda-collaborat | 38db5386 |             | 4:20:25 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | rda-file-brows | 589e18f8 |             | 4:20:20 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | rda-file-brows | 853545f8 |             | 4:19:59 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | rda-irm-servic | d17f8dcd |             | 4:20:06 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | rda-irm-servic | 44decaa7 | *leader*    | 4:19:41 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | rda-notificati | 74e58855 |             | 4:20:14 |      8 |        31.33 |               |              |
+-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+

Run the below command to check that all services have an ok status and do not throw any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | rda-alert-in | 4abc521f |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 4abc521f |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 4abc521f |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | rda-alert-in | 4abc521f |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 4abc521f |             | kafka-connectivity                                  | ok       | Cluster=IrA5ccri7mBeUvhzvrimEg, Broker=0, Brokers=[0, 1, 2] |
| rda_app   | alert-ingester                         | rda-alert-in | 7861bd4f |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 7861bd4f |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 7861bd4f |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | rda-alert-in | 7861bd4f |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 7861bd4f |             | kafka-connectivity                                  | ok       | Cluster=IrA5ccri7mBeUvhzvrimEg, Broker=2, Brokers=[0, 1, 2] |
| rda_app   | alert-processor                        | rda-alert-pr | 4e679139 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-processor                        | rda-alert-pr | 4e679139 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-processor                        | rda-alert-pr | 4e679139 |             | service-dependency:cfx-app-controller               | ok       | 2 pod(s) found for cfx-app-controller                       |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+

Run the below commands to initiate upgrading the RDA Fabric OIA Application services.

rdaf app upgrade OIA --tag 7.4.1

Please wait till all of the new OIA application service containers are in Up state and run the below command to verify their status and make sure they are running with 7.4.1 version.

rdaf app status

+-----------------------------------+----------------+------------+--------------+-------+
| Name                              | Host           | Status     | Container Id | Tag   |
+-----------------------------------+----------------+------------+--------------+-------+
| cfx-rda-app-controller            | 192.168.133.96 | Up 2 hours | deab59a554f6 | 7.4.1 |
| cfx-rda-app-controller            | 192.168.133.92 | Up 2 hours | 7e3cbfc6d899 | 7.4.1 |
| cfx-rda-reports-registry          | 192.168.133.96 | Up 2 hours | 934ef236dde2 | 7.4.1 |
| cfx-rda-reports-registry          | 192.168.133.92 | Up 2 hours | 8749187dfb82 | 7.4.1 |
| cfx-rda-notification-service      | 192.168.133.96 | Up 2 hours | eaaa0116b25c | 7.4.1 |
| cfx-rda-notification-service      | 192.168.133.92 | Up 2 hours | 7f5b91f6b166 | 7.4.1 |
| cfx-rda-file-browser              | 192.168.133.96 | Up 2 hours | 62ba48307a89 | 7.4.1 |
| cfx-rda-file-browser              | 192.168.133.92 | Up 2 hours | ad83ab7f2611 | 7.4.1 |
| cfx-rda-configuration-service     | 192.168.133.96 | Up 2 hours | 6f24b3296c44 | 7.4.1 |
| cfx-rda-configuration-service     | 192.168.133.92 | Up 2 hours | ad93c6ddf2bc | 7.4.1 |
| cfx-rda-alert-ingester            | 192.168.133.96 | Up 2 hours | 9132494ea9ab | 7.4.1 |
| cfx-rda-alert-ingester            | 192.168.133.92 | Up 2 hours | f5312c1fc474 | 7.4.1 |
+-----------------------------------+----------------+------------+--------------+-------+
Run the below command to verify that all OIA application services are up and running. Please wait till the cfxdimensions-app-irm_service has the leader status under the Site column.

rdac pods

+-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host         | ID       | Site        | Age     |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| App   | alert-ingester                         | True        | 9132494ea9ab | ad43cf79 |             | 1:56:34 |      4 |        31.21 |               |              |
| App   | alert-ingester                         | True        | f5312c1fc474 | 2a129b31 |             | 1:56:21 |      4 |        31.21 |               |              |
| App   | alert-processor                        | True        | 2afde67935ac | 33170bc7 |             | 1:54:29 |      4 |        31.21 |               |              |
| App   | alert-processor                        | True        | f289e1088a16 | 831fe5c3 |             | 1:54:14 |      4 |        31.21 |               |              |
| App   | alert-processor-companion              | True        | 83ebf4300ac5 | c9dba0df |             | 1:47:44 |      4 |        31.21 |               |              |
| App   | alert-processor-companion              | True        | 9b1b55d78d1a | a66ecf29 |             | 1:47:29 |      4 |        31.21 |               |              |
| App   | asset-dependency                       | True        | c1ddfde81b13 | 985fc496 |             | 2:20:03 |      4 |        31.21 |               |              |
| App   | asset-dependency                       | True        | c910651896fe | 9c355c7d |             | 2:20:06 |      4 |        31.21 |               |              |
| App   | authenticator                          | True        | f70beaa486a6 | 955eb254 |             | 2:19:59 |      4 |        31.21 |               |              |
| App   | authenticator                          | True        | a726b0f154c8 | 898c36b4 |             | 2:19:57 |      4 |        31.21 |               |              |
| App   | cfx-app-controller                     | True        | 7e3cbfc6d899 | 2097a877 |             | 1:58:49 |      4 |        31.21 |               |              |
| App   | cfx-app-controller                     | True        | deab59a554f6 | 3bd4ce27 |             | 1:59:02 |      4 |        31.21 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | f47c6cab13f1 | e0636eea |             | 2:19:32 |      4 |        31.21 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | 02b526adf7f9 | 7a286ce7 |             | 2:19:23 |      4 |        31.21 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | b602c2cddd90 | 836e0134 |             | 1:53:02 |      4 |        31.21 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | 2f02987f249d | c4d4720d |             | 1:48:31 |      4 |        31.21 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | 62ba48307a89 | 48d1d0d2 |             | 1:57:34 |      4 |        31.21 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | ad83ab7f2611 | 93078496 |             | 1:57:14 |      4 |        31.21 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | 56dffc7d6501 | 672ff70a | *leader*    | 1:53:57 |      4 |        31.21 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | b40a96601c73 | 25fe51f5 |             | 1:53:42 |      4 |        31.21 |               |              |
+-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------+
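
If you prefer to confirm the leader election from the command line, the rdac pods output above can be filtered, for example (a minimal sketch using standard grep):

# Confirm that one cfxdimensions-app-irm_service instance reports the *leader* status
rdac pods | grep cfxdimensions-app-irm_service | grep leader
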
Run the below command to check that all services have an ok status and do not throw any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | kafka-connectivity                                  | ok       | Cluster=ZTRlZGFmZjhkZDFiMTFlZQ, Broker=1, Brokers=[1, 2, 3] |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | kafka-connectivity                                  | ok       | Cluster=ZTRlZGFmZjhkZDFiMTFlZQ, Broker=3, Brokers=[1, 2, 3] |
| rda_app   | alert-processor                        | 2afde67935ac | 33170bc7 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-processor                        | 2afde67935ac | 33170bc7 |             | minio-connectivity                                  | ok       |                                                             |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
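
To quickly spot any check that is not ok, the healthcheck output can be filtered, for example (a minimal sketch; the table header and border lines will still pass through the filter):

# Show only healthcheck rows whose Status column is not "ok"
rdac healthcheck | grep -v '| ok '
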
2.8.3.5 Upgrade RDA Worker Services

Step-1: Please run the below command to initiate upgrading the RDA Worker service PODs.

rdafk8s worker upgrade --tag 3.4.1

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each RDA Worker service POD is in the Terminating state.

kubectl get pods -n rda-fabric -l app_component=rda-worker
NAME                          READY   STATUS    RESTARTS   AGE
rda-worker-69d485f476-99tnv   1/1     Running   0          45h
rda-worker-69d485f476-gwq4f   1/1     Running   0          45h
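
To list only the PODs that are currently in the Terminating state, the same output can be filtered, for example:

# Show only RDA worker PODs that are in the Terminating state
kubectl get pods -n rda-fabric -l app_component=rda-worker | grep Terminating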

Step-3: Run the below command to put all Terminating RDAF worker service PODs into maintenance mode. It lists the POD IDs of the RDA worker services along with the rdac maintenance command required to put them into maintenance mode.

python maint_command.py

Step-4: Copy and paste the rdac maintenance command as shown below.

rdac maintenance start --ids <comma-separated-list-of-worker-pod-ids>
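
For example, if maint_command.py reported the worker POD IDs 1f769792 and 985fc496 (hypothetical IDs used here only for illustration), the command would look like:

# Hypothetical worker POD IDs; use the IDs printed by maint_command.py
rdac maintenance start --ids 1f769792,985fc496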

Step-5: Run the below command to verify the maintenance mode status of the RDAF worker services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating RDAF worker service PODs.

for i in `kubectl get pods -n rda-fabric -l app_component=rda-worker | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done

Note

Wait for 120 seconds between each RDAF worker service upgrade, repeating the above steps (Step-2 through Step-6) for the rest of the RDAF worker service PODs.

Step-7: Please wait for 120 seconds to let the newer version of the RDA Worker service PODs join the RDA Fabric. Run the below commands to verify the status of the newer RDA Worker service PODs.

rdac pods | grep rda-worker
rdafk8s worker status
+------------+----------------+-----------------+--------------+-------+
| Name       | Host           | Status          | Container Id | Tag   |
+------------+----------------+-----------------+--------------+-------+
| rda-worker | 192.168.131.45 | Up 19 Hours ago | 6360f61b4249 | 3.4.1 |
| rda-worker | 192.168.131.44 | Up 19 Hours ago | 806b7b334943 | 3.4.1 |
+------------+----------------+-----------------+--------------+-------+

Step-8: Run the below command to check that all RDA Worker services have an ok status and do not throw any failure messages.

rdac healthcheck
  • Upgrade RDA Worker Services

Please run the below command to initiate upgrading the RDA Worker service PODs.

rdaf worker upgrade --tag 3.4.1

Note

If the worker is deployed in a proxy environment, add the required proxy environment variables in /opt/rdaf/deployment-scripts/values.yaml under the rda_worker -> env: section, instead of making changes to worker.yaml. This is needed only if new changes are required for the worker. An optional verification sketch is shown after the worker status output below.

Please wait for 120 seconds to let the newer version of the RDA Worker service containers join the RDA Fabric. Run the below commands to verify the status of the newer RDA Worker service containers.

rdac pods | grep worker
rdaf worker status

+------------+----------------+------------+--------------+-------+
| Name       | Host           | Status     | Container Id | Tag   |
+------------+----------------+------------+--------------+-------+
| rda_worker | 192.168.133.96 | Up 2 hours | 03061dd8dfcc | 3.4.1 |
| rda_worker | 192.168.133.92 | Up 2 hours | cbb31b875cf6 | 3.4.1 |
+------------+----------------+------------+--------------+-------+
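
If proxy variables were added to values.yaml, one optional way to confirm they were picked up is to inspect the environment of the upgraded worker container, for example (the container ID placeholder is taken from the rdaf worker status output above):

# Replace <worker-container-id> with the Container Id shown by 'rdaf worker status'
docker exec <worker-container-id> env | grep -i proxy
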
Run the below command to check that all RDA Worker services have an ok status and do not throw any failure messages.

rdac healthcheck

2.8.4 Post Upgrade Steps

2.8.4.1 OIA

1. Deploy latest Alerts and Incidents Dashboard configuration

Go to Main Menu --> Configuration --> RDA Administration --> Bundles --> select oia_l1_l2_bundle and click on the Deploy action to deploy the latest Dashboards configuration for Alerts and Incidents.

Warning

It is mandatory to deploy the oia_l1_l2_bundle (Alerts and Incidents Dashboards configuration) as the existing dashboard configuration for the same has been deprecated.

After deploying the oia_l1_l2_bundle, within each Incident dashboard page, the below pages are enabled by default, irrespective of whether the corresponding features are configured or not.

  • Alerts
  • Topology
  • Metrics
  • Insights
  • Collaboration
  • Diagnostics
  • Remediation
  • Activities

Within each Incident page, the Alerts and Collaboration pages are mandatory, while the rest of the pages are optional until they are configured within the system.

If you need to remove these optional pages from the default Incident view dashboard, please follow the below steps.

Go to Main Menu --> Configuration --> RDA Administration --> Dashboards --> User Dashboards --> edit the JSON config of the incident-details-app dashboard and delete the JSON configuration blocks, shown below, for the optional pages you want to remove (the Alerts and Collaboration blocks must remain); a small review sketch follows the excerpt.

....
....
"dashboard_pages": [
{
  "name": "incident-details-alerts",
  "label": "Alerts",
  "icon": "alert.svg"
},
{
  "name": "incident-details-topology",
  "label": "Topology",
  "icon": "topology.svg"
},
{
  "name": "incident-details-metrics",
  "label": "Metrics",
  "icon": "metrics.svg"
},
{
  "name": "incident-details-insights",
  "label": "Insights",
  "icon": "nextSteps.svg"
},
{
  "name": "incident-details-collaboration",
  "label": "Collaboration",
  "icon": "collaboration.svg"
},
{
  "name": "incident-details-diagnostics",
  "label": "Diagnostics",
  "icon": "diagnostic.svg"
},
{
  "name": "incident-details-remediation",
  "label": "Remediation",
  "icon": "remedial.svg"
},
{
  "name": "incident-details-activities",
  "label": "Activities",
  "icon": "activities.svg"
}
....
....
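
If a copy of the dashboard JSON has been exported to a local file, a quick way to review which pages are currently defined is a jq query such as the one below (the file name, and the assumption that dashboard_pages sits at the top level of the exported document, are illustrative only):

# List the page names defined under dashboard_pages in an exported copy
# of the incident-details-app dashboard (file name is hypothetical)
jq '.dashboard_pages[].name' incident-details-app.json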

Note

Please note that these deleted configuration blocks (Topology, Metrics, Insights, Diagnostics, Remediation, and Activities) can be added back once the corresponding features are configured within the system.
