Installing OIA (Operations Intelligence & Analytics)
This document provides instructions for a fresh installation of OIA (Operations Intelligence & Analytics), which is also referred to as AIOps.
1. Setup & Install
cfxOIA is an application that is installed on top of the RDA Fabric platform.
1.1 Tag Version: 7.4.1
Pre-requisites:
Below are the pre-requisites which need to be in place before installing the OIA (AIOps) application services.
RDAF Deployment CLI Version: 1.2.1
RDAF Infrastructure Services Tag Version: 1.0.3
RDAF Core Platform & Worker Services Tag Version: 3.4.1
RDAF Client (RDAC) Tag Version: 3.4.1
- Install and Configure RDAF Deployment CLI (for Non-Kubernetes or Kubernetes)
- Setup & Configure Docker On-premise Registry and download all RDAF Platform's service images (Infrastructure, Core Platform, Application and Worker services)
- Setup and Install RDAF Infrastructure and Platform services
- Install RDAF Worker services
- Setup and Install RDAC CLI
Warning
Please complete all of the above pre-requisites before installing the OIA (AIOps) application services.
Log in as the rdauser user into the on-premise docker registry VM or the RDA Fabric Platform VM on which the RDAF deployment CLI was installed, using an SSH client (e.g. PuTTY).
Before installing the OIA (AIOps) application services, please run the below command to update the HAProxy (load balancer) configuration.
Run the below rdaf or rdafk8s command to make sure all of the RDAF infrastructure services are up and running.
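For reference, a sketch of this check (use rdaf on Non-Kubernetes and rdafk8s on Kubernetes deployments; confirm the exact subcommand against your installed CLI's help output):
rdaf infra status
rdafk8s infra status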
Run the below rdac pods command to make sure all of the RDAF core platform and worker services are up and running.
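The command is the one named above:
rdac pods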
+-------+----------------------------------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------|
| App | asset-dependency | rda-asset-depe | 090669bf | | 20:18:21 | 8 | 47.03 | | |
| App | authenticator | rda-identity-5 | 57905b20 | | 20:19:11 | 8 | 47.03 | | |
| App | cfxdimensions-app-access-manager | rda-access-man | 6338ad29 | | 20:18:44 | 8 | 47.03 | | |
| App | cfxdimensions-app-notification-service | rda-notificati | bb9e3e7b | | 20:09:52 | 8 | 31.33 | | |
| App | cfxdimensions-app-resource-manager | rda-resource-m | e5a28e16 | | 20:18:34 | 8 | 47.03 | | |
| App | user-preferences | rda-user-prefe | fd09d3ba | | 20:18:08 | 8 | 47.03 | | |
| Infra | api-server | rda-api-server | b1b910d9 | | 20:19:22 | 8 | 47.03 | | |
| Infra | collector | rda-collector- | 99553e51 | | 20:18:17 | 8 | 47.03 | | |
| Infra | registry | rda-registry-7 | a46cd712 | | 20:19:15 | 8 | 47.03 | | |
| Infra | scheduler | rda-scheduler- | d5537051 | *leader* | 20:18:26 | 8 | 47.03 | | |
| Infra | worker | rda-worker-54d | 1f769792 | rda-site-01 | 20:06:48 | 4 | 15.6 | 0 | 0 |
+-------+----------------------------------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
Run the below rdac healthcheck command to check the health status of all of the RDAF core platform and worker services.
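As named above, the command is:
rdac healthcheck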
All of the dependency checks should show ok under the Status column.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------|
| rda_infra | api-server | rda-api-serv | b1b910d9 | | service-status | ok | |
| rda_infra | api-server | rda-api-serv | b1b910d9 | | minio-connectivity | ok | |
| rda_app | asset-dependency | rda-asset-de | 090669bf | | service-status | ok | |
| rda_app | asset-dependency | rda-asset-de | 090669bf | | minio-connectivity | ok | |
| rda_app | authenticator | rda-identity | 57905b20 | | service-status | ok | |
| rda_app | authenticator | rda-identity | 57905b20 | | minio-connectivity | ok | |
| rda_app | authenticator | rda-identity | 57905b20 | | DB-connectivity | ok | |
| rda_app | cfxdimensions-app-access-manager | rda-access-m | 6338ad29 | | service-status | ok | |
| rda_app | cfxdimensions-app-access-manager | rda-access-m | 6338ad29 | | minio-connectivity | ok | |
| rda_app | cfxdimensions-app-access-manager | rda-access-m | 6338ad29 | | service-dependency:registry | ok | 1 pod(s) found for registry |
| rda_app | cfxdimensions-app-access-manager | rda-access-m | 6338ad29 | | service-initialization-status | ok | |
| rda_app | cfxdimensions-app-access-manager | rda-access-m | 6338ad29 | | DB-connectivity | ok | |
| rda_app | cfxdimensions-app-notification-service | rda-notifica | bb9e3e7b | | service-status | ok | |
| rda_app | cfxdimensions-app-notification-service | rda-notifica | bb9e3e7b | | minio-connectivity | ok | |
| rda_app | cfxdimensions-app-notification-service | rda-notifica | bb9e3e7b | | service-initialization-status | ok | |
| rda_app | cfxdimensions-app-notification-service | rda-notifica | bb9e3e7b | | DB-connectivity | ok | |
| rda_app | cfxdimensions-app-resource-manager | rda-resource | e5a28e16 | | service-status | ok | |
| rda_app | cfxdimensions-app-resource-manager | rda-resource | e5a28e16 | | minio-connectivity | ok | |
| rda_app | cfxdimensions-app-resource-manager | rda-resource | e5a28e16 | | service-dependency:registry | ok | 1 pod(s) found for registry |
| rda_app | cfxdimensions-app-resource-manager | rda-resource | e5a28e16 | | service-dependency:cfxdimensions-app-access-manager | ok | 1 pod(s) found for cfxdimensions-app-access-manager |
| rda_app | cfxdimensions-app-resource-manager | rda-resource | e5a28e16 | | service-initialization-status | ok | |
| rda_app | cfxdimensions-app-resource-manager | rda-resource | e5a28e16 | | DB-connectivity | ok | |
| rda_infra | collector | rda-collecto | 99553e51 | | service-status | ok | |
| rda_infra | collector | rda-collecto | 99553e51 | | minio-connectivity | ok | |
| rda_infra | collector | rda-collecto | 99553e51 | | opensearch-connectivity:default | ok | |
| rda_infra | registry | rda-registry | a46cd712 | | service-status | ok | |
| rda_infra | registry | rda-registry | a46cd712 | | minio-connectivity | ok | |
| rda_infra | scheduler | rda-schedule | d5537051 | | service-status | ok | |
| rda_infra | scheduler | rda-schedule | d5537051 | | minio-connectivity | ok | |
| rda_infra | scheduler | rda-schedule | d5537051 | | DB-connectivity | ok | |
| rda_app | user-preferences | rda-user-pre | fd09d3ba | | service-status | ok | |
| rda_app | user-preferences | rda-user-pre | fd09d3ba | | minio-connectivity | ok | |
| rda_app | user-preferences | rda-user-pre | fd09d3ba | | service-dependency:registry | ok | 1 pod(s) found for registry |
| rda_app | user-preferences | rda-user-pre | fd09d3ba | | service-initialization-status | ok | |
| rda_app | user-preferences | rda-user-pre | fd09d3ba | | DB-connectivity | ok | |
| rda_infra | worker | rda-worker-5 | 1f769792 | rda-site-01 | service-status | ok | |
| rda_infra | worker | rda-worker-5 | 1f769792 | rda-site-01 | minio-connectivity | ok | |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
Installing OIA (AIOps) Application Services:
Set the RDA Fabric platform's application configuration to aiops using the below command.
Note
Other supported options for the above command are:
- rda: Choose this option when only the RDA Fabric platform needs to be installed, along with RDA Worker and RDA Event Gateway services, without the AIOps (OIA) or Asset Intelligence (AIA) applications.
- aiops: Choose this option when the Operations Intelligence (OIA, a.k.a. AIOps) application needs to be installed.
- asset: Choose this option when the Asset Intelligence (AIA) application needs to be installed. (Note: The AIA application type is deprecated and all of its capabilities are available through the base RDA Fabric platform itself. For more information, please contact cfx-support@cloudfabric.com)
- all: Choose this option when all of the supported applications need to be installed.
Run the below command to deploy the RDAF OIA (AIOps) application services. (Note: The tag shown below is a sample for reference only; for the actual tag, please contact the CloudFabrix support team at support@cloudfabrix.com)
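A sketch of the install invocation, assuming it mirrors the rdaf app upgrade OIA --tag <tag> syntax shown in the upgrade sections of this guide; the tag value below is the sample one mentioned above:
rdaf app install OIA --tag 7.4.1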
After installing the OIA (AIOps) application services, run the below command to see the running status of the deployed application services.
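A sketch of the status check, assuming an app status subcommand that parallels the infra status check earlier in this section; verify against your CLI's help output:
rdaf app status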
+---------------------------------+----------------+-----------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+---------------------------------+----------------+-----------------+--------------+-------+
| rda-alert-ingester | 192.168.125.46 | Up 20 Hours ago | 610bb0e286d6 | 7.4.1 |
| rda-alert-processor | 192.168.125.46 | Up 20 Hours ago | 79ee6788f73e | 7.4.1 |
| rda-app-controller | 192.168.125.46 | Up 20 Hours ago | 6c672102d5ff | 7.4.1 |
| rda-collaboration | 192.168.125.46 | Up 20 Hours ago | 34f25c05afce | 7.4.1 |
| rda-configuration-service | 192.168.125.46 | Up 20 Hours ago | 112ccaf4b0e6 | 7.4.1 |
| rda-dataset-caas-all-alerts | 192.168.125.46 | Up 20 Hours ago | 2b48d4dfbfd0 | 7.4.1 |
| rda-dataset-caas-current-alerts | 192.168.125.46 | Up 20 Hours ago | 03cdc77ddf1f | 7.4.1 |
| rda-event-consumer | 192.168.125.46 | Up 20 Hours ago | 21113ba951a1 | 7.4.1 |
| rda-file-browser | 192.168.125.46 | Up 20 Hours ago | 425dac228fc9 | 7.4.1 |
| rda-ingestion-tracker | 192.168.125.46 | Up 20 Hours ago | 8a984a536a97 | 7.4.1 |
| rda-irm-service | 192.168.125.46 | Up 20 Hours ago | 258aadc0c1af | 7.4.1 |
| rda-ml-config | 192.168.125.46 | Up 20 Hours ago | bf23d58903f7 | 7.4.1 |
| rda-notification-service | 192.168.125.46 | Up 20 Hours ago | a15c5232b25d | 7.4.1 |
| rda-reports-registry | 192.168.125.46 | Up 20 Hours ago | 3890b5dfb8ae | 7.4.1 |
| rda-smtp-server | 192.168.125.46 | Up 20 Hours ago | 6aadab781947 | 7.4.1 |
| rda-webhook-server | 192.168.125.46 | Up 20 Hours ago | 6bf555aed18b | 7.4.1 |
+---------------------------------+----------------+-----------------+--------------+-------+
Configuring OIA (AIOps) Application:
Log in to the RDAF portal as the admin@cfx.com user.
Create new Service Blueprints for the OIA (AIOps) application and the Machine Learning (ML) application.
For the OIA (AIOps) Application: Go to Main Menu --> Configuration --> Artifacts --> Service Blueprints --> View details --> Click on Add, copy & paste the below configuration, and click on Save
name: cfxOIA
id: 81a1a2202
version: 2023_02_12_01
category: ITOM
comment: Operations Intelligence & Analytics (AIOps)
enabled: true
type: Service
provider: CloudFabrix Software, Inc.
attrs: {}
apps:
- label: cfxOIA
  appType: dimensions
  appName: incident-room-manager
  icon_url: /assets/img/applications/OIA.png
  permission: app:irm:read
service_pipelines: []
For the Machine Learning (ML) Application: Go to Main Menu --> Configuration --> Artifacts --> Service Blueprints --> View details --> Click on Add, copy & paste the below configuration, and click on Save
name: cfxML
id: 81a1a030
version: 2023_02_12_01
category: ITOM
comment: Machine Learning (ML) Experiments
enabled: true
type: Service
provider: CloudFabrix Software, Inc.
attrs: {}
apps:
- label: cfxML
  appType: dimensions
  appName: ml-config
  icon_url: /assets/img/applications/ML.png
  permission: app:irm:read
service_pipelines: []
2. Upgrade
This section provides instructions on how to upgrade an existing deployment of the RDAF platform and its OIA (Operations Intelligence & Analytics) application, which is also referred to as AIOps.
2.1 Upgrade from 7.0.x to 7.0.6
Upgrade Prerequisites
Below are the pre-requisites which need to be in place before upgrading the OIA (AIOps) application services.
RDAF Deployment CLI Version Upgrade: From 1.0.6 or higher to 1.1.2
RDAF Infrastructure Services Tag Version: From 1.0.1 or higher to 1.0.2 (Note: Not applicable if the services are already running at 1.0.2 version)
RDAF Core Platform & Worker Services Tag Version: From 3.0.9 to 3.1.0
RDAF Client (RDAC) Tag Version: From 3.0.9 to 3.1.0
- Upgrade RDAF Deployment CLI (for Kubernetes or Non-Kubernetes)
- Fetch upgrade images on Docker On-premise Registry and download all RDAF Platform's service images (Infrastructure, Core Platform, Application and Worker services)
Warning
Please complete all of the above pre-requisites before installing the OIA (AIOps) application services.
On-premise docker-registry
Log in to the RDAF on-premise docker-registry VM or RDAF platform VM (on which the rdaf CLI was installed) as rdauser using an SSH client, and run the below command to verify the status of the docker-registry service.
+-----------------+---------------+------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+-----------------+---------------+------------+--------------+-------+
| docker-registry | 111.92.12.140 | Up 4 weeks | 71b8036fc64f | 1.0.1 |
+-----------------+---------------+------------+--------------+-------+
RDAF Infrastructure, Platform and Application services:
Log in to the RDAF on-premise docker-registry VM or RDAF platform VM (on which the rdaf CLI was installed) as rdauser using an SSH client, and run the below command to verify the status of the RDAF platform's infrastructure, core platform, application and worker services.
+----------------+--------------+-----------------+--------------+------------------------------+
| Name | Host | Status | Container Id | Tag |
+----------------+--------------+-----------------+--------------+------------------------------+
| haproxy | 111.92.12.41 | Up 6 days | 245a37201207 | 1.0.2 |
| keepalived | 111.92.12.41 | Not Provisioned | N/A | N/A |
| nats | 111.92.12.41 | Up 6 days | 15469a93d96f | 1.0.2 |
| minio | 111.92.12.41 | Up 6 days | 3fd3f97bf25b | RELEASE.2022-11-07T23-47-39Z |
| mariadb | 111.92.12.41 | Up 6 days | 0fa1a0027993 | 1.0.2 |
| opensearch | 111.92.12.41 | Up 6 days | dae308716400 | 1.0.2 |
| zookeeper | 111.92.12.41 | Up 6 days | 4d8f61b4ab17 | 1.0.2 |
| kafka | 111.92.12.41 | Up 6 days | 0dee08cd9c59 | 1.0.2 |
| redis | 111.92.12.41 | Up 6 days | d1eccf90846e | 1.0.2 |
| redis-sentinel | 111.92.12.41 | Up 6 days | 683beb7b913e | 1.0.2 |
+----------------+--------------+-----------------+--------------+------------------------------+
+--------------------------+--------------+-----------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+--------------------------+--------------+-----------+--------------+-------+
| cfx-rda-access-manager | 111.92.12.41 | Up 6 days | e487cdf24b46 | 3.0.9 |
| cfx-rda-resource-manager | 111.92.12.41 | Up 6 days | a7a21a31a26e | 3.0.9 |
| cfx-rda-user-preferences | 111.92.12.41 | Up 6 days | 9306d8da4b5a | 3.0.9 |
| portal-backend | 111.92.12.41 | Up 6 days | 55df761dad1d | 3.0.9 |
| portal-frontend | 111.92.12.41 | Up 6 days | 2183f00efa64 | 3.0.9 |
| rda_api_server | 111.92.12.41 | Up 6 days | 3ba6256d1694 | 3.0.9 |
| rda_asset_dependency | 111.92.12.41 | Up 6 days | d1a8b76bb114 | 3.0.9 |
| rda_collector | 111.92.12.41 | Up 6 days | 441427d2bb1e | 3.0.9 |
| rda_identity | 111.92.12.41 | Up 6 days | 2c1215d9155a | 3.0.9 |
| rda_registry | 111.92.12.41 | Up 6 days | 7358e6ee6298 | 3.0.9 |
| rda_scheduler | 111.92.12.41 | Up 6 days | ee72c66f8c80 | 3.0.9 |
+--------------------------+--------------+-----------+--------------+-------+
+------------+--------------+-----------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+------------+--------------+-----------+--------------+-------+
| rda_worker | 111.92.12.43 | Up 6 days | 88f4916ce18e | 3.0.9 |
| rda_worker | 111.92.12.43 | Up 6 days | 88f491612345 | 3.0.9 |
+------------+--------------+-----------+--------------+-------+
+------------------------------+--------------+-----------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+------------------------------+--------------+-----------+--------------+-------+
| all-alerts-cfx-rda-dataset- | 111.92.12.42 | Up 6 days | 58a75c01c51f | 7.0.5 |
| caas | | | | |
| cfx-rda-alert-ingester | 111.92.12.42 | Up 6 days | bc9a78953b73 | 7.0.5 |
| cfx-rda-alert-processor | 111.92.12.42 | Up 6 days | 28401e5c2570 | 7.0.5 |
| cfx-rda-app-builder | 111.92.12.42 | Up 6 days | be8f100056fd | 7.0.5 |
| cfx-rda-app-controller | 111.92.12.42 | Up 6 days | a7a4ef35097d | 7.0.5 |
| cfx-rda-collaboration | 111.92.12.42 | Up 6 days | d9d980b28a2b | 7.0.5 |
| cfx-rda-configuration- | 111.92.12.42 | Up 6 days | db1a45835e1a | 7.0.5 |
| service | | | | |
| cfx-rda-event-consumer | 111.92.12.42 | Up 6 days | baf09bad3ce1 | 7.0.5 |
| cfx-rda-file-browser | 111.92.12.42 | Up 6 days | 32ccdfca8d8f | 7.0.5 |
| cfx-rda-ingestion-tracker | 111.92.12.42 | Up 6 days | 1030345f2179 | 7.0.5 |
| cfx-rda-irm-service | 111.92.12.42 | Up 6 days | 89d931f7d7b8 | 7.0.5 |
| cfx-rda-ml-config | 111.92.12.42 | Up 6 days | 57fc39489a08 | 7.0.5 |
| cfx-rda-notification-service | 111.92.12.42 | Up 6 days | 408dbebb33c5 | 7.0.5 |
| cfx-rda-reports-registry | 111.92.12.42 | Up 6 days | 3296cba8b3e4 | 7.0.5 |
| cfx-rda-smtp-server | 111.92.12.42 | Up 6 days | 0f9884b6e7c8 | 7.0.5 |
| cfx-rda-webhook-server | 111.92.12.42 | Up 6 days | a4403dee414e | 7.0.5 |
| current-alerts-cfx-rda- | 111.92.12.42 | Up 6 days | d6cc63214103 | 7.0.5 |
| dataset-caas | | | | |
+------------------------------+--------------+-----------+--------------+-------+
Important
Please take a full data backup of the RDAF platform before performing an upgrade. For more information on the RDAF platform's backup and restore commands using the rdaf CLI, please refer to RDAF Platform Backup.
Download RDAF Platform & OIA Images
- Login into the On-premise docker registry VM as rdauser using an SSH client and run the below command to download the RDAF platform's and OIA (AIOps) application services' updated images.
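A sketch of the download step, assuming a registry fetch subcommand in the rdaf CLI; the tag values below are the upgrade targets named in this section's prerequisites and should be confirmed with CloudFabrix support:
rdaf registry fetch --tag 3.1.0
rdaf registry fetch --tag 7.0.6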
- Please wait until all of the RDAF platform's and OIA (AIOps) application services' images are downloaded. Run the below command to verify the images are downloaded appropriately.
Upgrade RDAF deployment CLI on RDAF Platform VM
Please follow the steps outlined under the RDAF CLI Upgrade on On-premise docker registry VM section to download and upgrade the rdaf deployment CLI on the RDAF platform VM.
Upgrade RDAF Platform & OIA Services
RDAF Platform Services Upgrade:
Run the below command to upgrade the RDAF platform's services to version 3.1.0.
Once the above command has completed, run the below command to verify that all of the RDAF platform's services are upgraded to the specified version and all of their corresponding containers are in a running state.
RDAF Client CLI Upgrade:
Run the below command to upgrade the RDAF client CLI rdac to the latest version.
After the rdac CLI is upgraded, run the below commands to see all of the running RDAF platform's service pods.
+-------+----------------------------------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------|
| App | cfxdimensions-app-access-manager | d412efb99f2e | ccb83d20 | | 4:13:45 | 8 | 31.21 | | |
| App | cfxdimensions-app-notification-service | 34c2ea6675d5 | 93ac81de | | 1 day, 18:33:27 | 8 | 31.21 | | |
| App | cfxdimensions-app-resource-manager | ec87d2ee6387 | 33ee28ca | | 4:13:31 | 8 | 31.21 | | |
| App | user-preferences | 520bca813ddf | f4ca7d44 | | 4:13:14 | 8 | 31.21 | | |
| Infra | api-server | 0656b4230f44 | 6d4d40ab | | 0:33:06 | 8 | 31.21 | | |
| Infra | collector | 6336341682ad | 042af0af | | 4:11:19 | 8 | 31.21 | | |
| Infra | registry | cae649622fba | 4e4c4a4d | | 4:11:03 | 8 | 31.21 | | |
| Infra | scheduler | 3ab379305be1 | b2bb9915 | *leader* | 4:10:59 | 8 | 31.21 | | |
+-------+----------------------------------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+
Run the below command to verify the functional health of each platform service and verify that all of their statuses are in the OK state.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------|
| rda_infra | api-server | 0656b4230f44 | 6d4d40ab | | service-status | ok | |
| rda_infra | api-server | 0656b4230f44 | 6d4d40ab | | minio-connectivity | ok | |
| rda_app | asset-dependency | e006dfd39d9b | 9f02a8f1 | | service-status | ok | |
| rda_app | asset-dependency | e006dfd39d9b | 9f02a8f1 | | minio-connectivity | ok | |
| rda_app | authenticator | 1782a79e36c5 | adda9bc0 | | service-status | ok | |
| rda_app | authenticator | 1782a79e36c5 | adda9bc0 | | minio-connectivity | ok | |
| rda_app | authenticator | 1782a79e36c5 | adda9bc0 | | DB-connectivity | ok | |
| rda_app | cfxdimensions-app-access-manager | d412efb99f2e | ccb83d20 | | service-status | ok | |
| rda_app | cfxdimensions-app-access-manager | d412efb99f2e | ccb83d20 | | minio-connectivity | ok | |
| rda_app | cfxdimensions-app-access-manager | d412efb99f2e | ccb83d20 | | service-dependency:registry | ok | 1 pod(s) found for registry |
| rda_app | cfxdimensions-app-access-manager | d412efb99f2e | ccb83d20 | | service-initialization-status | ok | |
| rda_app   | cfxdimensions-app-access-manager       | d412efb99f2e | ccb83d20 |             | DB-connectivity                                      | ok       |                                                       |
| rda_app | cfxdimensions-app-notification-service | 34c2ea6675d5 | 93ac81de | | service-status | ok | |
| rda_app | cfxdimensions-app-notification-service | 34c2ea6675d5 | 93ac81de | | minio-connectivity | ok | |
| rda_app | cfxdimensions-app-notification-service | 34c2ea6675d5 | 93ac81de | | service-initialization-status | ok | |
| rda_app | cfxdimensions-app-notification-service | 34c2ea6675d5 | 93ac81de | | DB-connectivity | ok | |
| rda_app | cfxdimensions-app-resource-manager | ec87d2ee6387 | 33ee28ca | | service-status | ok | |
| rda_app | cfxdimensions-app-resource-manager | ec87d2ee6387 | 33ee28ca | | minio-connectivity | ok | |
| rda_app | cfxdimensions-app-resource-manager | ec87d2ee6387 | 33ee28ca | | service-dependency:registry | ok | 1 pod(s) found for registry |
| rda_app | cfxdimensions-app-resource-manager | ec87d2ee6387 | 33ee28ca | | service-dependency:cfxdimensions-app-access-manager | ok | 1 pod(s) found for cfxdimensions-app-access-manager |
| rda_app | cfxdimensions-app-resource-manager | ec87d2ee6387 | 33ee28ca | | service-initialization-status | ok | |
| rda_app | cfxdimensions-app-resource-manager | ec87d2ee6387 | 33ee28ca | | DB-connectivity | ok | |
| rda_infra | collector | 6336341682ad | 042af0af | | service-status | ok | |
| rda_infra | collector | 6336341682ad | 042af0af | | minio-connectivity | ok | |
| rda_infra | collector | 6336341682ad | 042af0af | | opensearch-connectivity:default | ok | |
| rda_infra | scheduler | 3ab379305be1 | b2bb9915 | | service-status | ok | |
| rda_infra | scheduler | 3ab379305be1 | b2bb9915 | | minio-connectivity | ok | |
| rda_infra | scheduler | 3ab379305be1 | b2bb9915 | | DB-connectivity | ok | |
| rda_app | user-preferences | 520bca813ddf | f4ca7d44 | | service-status | ok | |
| rda_app | user-preferences | 520bca813ddf | f4ca7d44 | | minio-connectivity | ok | |
| rda_app | user-preferences | 520bca813ddf | f4ca7d44 | | service-dependency:registry | ok | 1 pod(s) found for registry |
| rda_app | user-preferences | 520bca813ddf | f4ca7d44 | | service-initialization-status | ok | |
| rda_app   | user-preferences                       | 520bca813ddf | f4ca7d44 |             | DB-connectivity                                      | ok       |                                                       |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
RDAF Worker Service Upgrade:
Run the below command to upgrade the RDAF worker services to the latest version.
After upgrading the RDAF worker services using the above command, run the below command to verify their running status and version.
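A sketch of the two commands, assuming the worker subcommands mirror the app upgrade/status pattern used elsewhere in this guide; 3.1.0 is the target worker tag from the prerequisites:
rdaf worker upgrade --tag 3.1.0
rdaf worker status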
+------------+--------------+-------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+------------+--------------+-------------+--------------+-------+
| rda_worker | 111.92.12.60 | Up 1 minute | 4ce2a8f13d16 | 3.1.0 |
+------------+--------------+-------------+--------------+-------+
Run the below command to verify the functional health of each RDAF worker service and verify that all of their statuses are in the OK state.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------|
| rda_infra | api-server | 0656b4230f44 | 6d4d40ab | | service-status | ok | |
...
...
| rda_infra | worker | 4ce2a8f13d16 | d627124d | rda-site-01 | service-status | ok | |
| rda_infra | worker | 4ce2a8f13d16 | d627124d | rda-site-01 | minio-connectivity | ok | |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
Create Kafka Topics for OIA Application Services:
Download the below script and execute it on the VM where rdafk8s setup was run during the initial RDAF platform setup. Please make sure the file /opt/rdaf/rdaf.cfg exists, as it is required for the below script to execute successfully.
RDAF OIA Application Services Upgrade:
Run the below command to upgrade the RDAF OIA (AIOps) application services to the latest version.
Once the above command has completed, run the below command to verify that all of the RDAF OIA application services are upgraded to the specified version and all of their corresponding containers are in a running state.
Wait for 3 to 5 minutes and run the below command to verify the functional health of each RDAF OIA application service and verify that all of their statuses are in the OK state.
2.2. Upgrade from 7.2.0.x to 7.2.1.1
RDAF Platform: From 3.2.0.3 to 3.2.1.3
OIA (AIOps) Application: From 7.2.0.3 to 7.2.1.1/7.2.1.5
RDAF Deployment rdaf & rdafk8s CLI: From 1.1.7 to 1.1.8
RDAF Client rdac CLI: From 3.2.0.3 to 3.2.1.3
2.2.1. Upgrade Prerequisites
Before proceeding with this upgrade, please make sure the below prerequisites are met.
Important
Please make sure a full backup of the RDAF platform system is completed before performing the upgrade.
Kubernetes: Please run the below backup command to take a backup of the application data.
Non-Kubernetes: Please run the below backup command to take a backup of the application data. Note: Please make sure the backup directory is mounted across all infra and CLI VMs.
Run the below command on the RDAF Management system and make sure the Kubernetes PODs are NOT in a restarting state (applicable only to Kubernetes environments).
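A minimal check, using the rda-fabric namespace referenced by the kubectl commands later in this guide:
kubectl get pods -n rda-fabric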
- Verify that the RDAF deployment rdaf & rdafk8s CLI version is 1.1.7 on the VM where the CLI was installed for the docker on-prem registry and for managing the Kubernetes or Non-Kubernetes deployment.
- On-premise docker registry service version is 1.0.2
- RDAF Infrastructure services version is 1.0.2 (rda-nats service version is 1.0.2.1 and rda-minio service version is RELEASE.2022-11-11T03-44-20Z)
- RDAF Platform services version is 3.2.0.3
- RDAF OIA Application services version is 7.2.0.3 (rda-event-consumer service version is 7.2.0.5)
Login into the VM where the rdaf & rdafk8s deployment CLI was installed for the docker on-prem registry and for managing the Kubernetes or Non-Kubernetes deployment.
- Download the RDAF Deployment CLI's newer version 1.1.8 bundle.
- Upgrade the rdaf & rdafk8s CLI to version 1.1.8
- Verify the installed rdaf & rdafk8s CLI version is upgraded to 1.1.8
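A sketch of the version check, assuming the CLIs support a standard version flag:
rdaf --version
rdafk8s --version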
Download the below python script, which is used to identify the K8s POD name for each RDA Fabric service POD Id. Skip this step if the script was already downloaded.
Download the below upgrade python script.
Please run the below python upgrade script. It creates a kafka topic called fsm-events, creates the /opt/rdaf/config/network_config/policy.json file, and adds the rda-fsm service to the values.yaml file.
Important
Please make sure the above upgrade script is executed before moving to the next step.
- Download the RDAF Deployment CLI's newer version 1.1.8 bundle and copy it to the RDAF management VM on which the rdaf & rdafk8s deployment CLI was installed.
For RHEL OS Environment
For Ubuntu OS Environment
- Extract the rdaf CLI software bundle contents
- Change the directory to the extracted directory
- Upgrade the rdaf & rdafk8s CLI to version 1.1.8
- Verify the installed rdaf & rdafk8s CLI version
Download the below upgrade script and copy it to the RDAF management VM on which the rdaf & rdafk8s deployment CLI was installed.
Please run the downloaded python upgrade script. It creates a kafka topic called fsm-events, creates the /opt/rdaf/config/network_config/policy.json file, and adds the rda-fsm service to the values.yaml file.
Important
Please make sure the above upgrade script is executed before moving to the next step.
- Download the RDAF Deployment CLI's newer version 1.1.8 bundle
- Upgrade the rdaf CLI to version 1.1.8
- Verify the installed rdaf CLI version is upgraded to 1.1.8
- To stop application services, run the below command. Wait until all of the services are stopped.
- To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
- To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
Run the below RDAF command to check the infra status.
+------------+----------------+------------+--------------+-------+
| Name       | Host           | Status     | Container Id | Tag   |
+------------+----------------+------------+--------------+-------+
| haproxy    | 192.168.131.41 | Up 2 weeks | ee9d25dc2276 | 1.0.2 |
| haproxy    | 192.168.131.42 | Up 2 weeks | e6ad57ac421d | 1.0.2 |
| keepalived | 192.168.131.41 | active     | N/A          | N/A   |
| keepalived | 192.168.131.42 | active     | N/A          | N/A   |
+------------+----------------+------------+--------------+-------+
Run the below RDAF command to check the infra healthcheck status.
+------------+-----------------+--------+--------+----------------+--------------+
| Name       | Check           | Status | Reason | Host           | Container Id |
+------------+-----------------+--------+--------+----------------+--------------+
| haproxy    | Port Connection | OK     | N/A    | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy    | Service Status  | OK     | N/A    | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy    | Firewall Port   | OK     | N/A    | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy    | Port Connection | OK     | N/A    | 192.168.107.64 | 91c361ea0f58 |
| haproxy    | Service Status  | OK     | N/A    | 192.168.107.64 | 91c361ea0f58 |
| haproxy    | Firewall Port   | OK     | N/A    | 192.168.107.64 | 91c361ea0f58 |
| keepalived | Service Status  | OK     | N/A    | 192.168.107.63 | N/A          |
| keepalived | Service Status  | OK     | N/A    | 192.168.107.64 | N/A          |
| nats       | Port Connection | OK     | N/A    | 192.168.107.63 | f57ed825681b |
| nats       | Service Status  | OK     | N/A    | 192.168.107.63 | f57ed825681b |
+------------+-----------------+--------+--------+----------------+--------------+
Note
Please take a backup of /opt/rdaf/deployment-scripts/values.yaml
Download the below upgrade python script.
Please run the below python upgrade script. It creates a few kafka topics and applies config changes to existing topics in HA setups, creates the /opt/rdaf/config/network_config/policy.json file, and adds the fsm service to the values.yaml file.
Important
Please make sure the above upgrade script is executed before moving to the next step.
- Download the RDAF Deployment CLI's newer version 1.1.8 bundle and copy it to the RDAF management VM on which the rdaf deployment CLI was installed.
For RHEL OS Environment
For Ubuntu OS Environment
- Extract the rdaf CLI software bundle contents
- Change the directory to the extracted directory
- Upgrade the rdaf CLI to version 1.1.8
- Verify the installed rdaf CLI version
Download the below upgrade script and copy it to the RDAF management VM on which the rdaf deployment CLI was installed.
Please run the below python upgrade script. It creates a few kafka topics and applies config changes to existing topics in HA setups, creates the /opt/rdaf/config/network_config/policy.json file, and adds the fsm service to the values.yaml file.
Important
Please make sure the above upgrade script is executed before moving to the next step.
2.2.2. Download the new Docker Images
Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.
Run the below command to verify that the above-mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.
Please make sure the 3.2.1.3 image tag is downloaded for the below RDAF Platform services.
- rda-client-api-server
- rda-registry
- rda-rda-scheduler
- rda-collector
- rda-stack-mgr
- rda-identity
- rda-fsm
- rda-access-manager
- rda-resource-manager
- rda-user-preferences
- onprem-portal
- onprem-portal-nginx
- rda-worker-all
- onprem-portal-dbinit
- cfxdx-nb-nginx-all
- rda-event-gateway
- rdac
- rdac-full
Please make sure the 7.2.1.1 image tag is downloaded for the below RDAF OIA Application services.
- rda-app-controller
- rda-alert-processor
- rda-file-browser
- rda-smtp-server
- rda-ingestion-tracker
- rda-reports-registry
- rda-ml-config
- rda-event-consumer
- rda-webhook-server
- rda-irm-service
- rda-alert-ingester
- rda-collaboration
- rda-notification-service
- rda-configuration-service
Please make sure the 7.2.1.5 image tag is downloaded for the below RDAF OIA Application services.
- rda-smtp-server
- rda-event-consumer
- rda-webhook-server
- rda-collaboration
- rda-configuration-service
- rda-alert-ingester
Downloaded Docker images are stored under the below path.
/opt/rdaf/data/docker/registry/v2
Run the below command to check the filesystem's disk usage on which docker images are stored.
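For example, with the default image path shown above:
df -h /opt/rdaf/data/docker/registry/v2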
Optionally, if required, older image tags that are no longer used can be deleted to free up disk space using the below command.
2.2.3. Upgrade Steps
2.2.3.1 Upgrade RDAF Platform Services
Step-1: Run the below command to initiate upgrading RDAF Platform services.
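A sketch of the invocation, assuming the platform subcommand mirrors the rdafk8s app upgrade syntax shown later in this section; 3.2.1.3 is the target platform tag:
rdafk8s platform upgrade --tag 3.2.1.3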
As the upgrade procedure is a non-disruptive upgrade, it puts the currently running PODs into the Terminating state and the newer version PODs into the Pending state.
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each Platform service is in the Terminating state.
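For example, using the platform label selector from the delete command in Step-6:
kubectl get pods -n rda-fabric -l app_category=rdaf-platform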
Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It will list all of the POD Ids of the platform services along with the rdac maintenance command that is required to put them into maintenance mode.
Step-4: Copy & paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.
Step-6: Run the below command to delete the Terminating RDAF platform service PODs.
for i in `kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Platform service PODs.
Please wait till all of the new platform service PODs are in the Running state and run the below command to verify their status and make sure all of them are running with the 3.2.1.3 version.
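A sketch of the status check, assuming a platform status subcommand that parallels the upgrade command in Step-1:
rdafk8s platform status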
+--------------------+----------------+---------------+--------------+---------+
| Name               | Host           | Status        | Container Id | Tag     |
+--------------------+----------------+---------------+--------------+---------+
| rda-api-server     | 192.168.131.45 | Up 2 Days ago | dde8ab1f9331 | 3.2.1.3 |
| rda-api-server     | 192.168.131.44 | Up 2 Days ago | e6ece7235e72 | 3.2.1.3 |
| rda-registry       | 192.168.131.45 | Up 2 Days ago | a577766fb8b2 | 3.2.1.3 |
| rda-registry       | 192.168.131.44 | Up 2 Days ago | 1aecc089b0c3 | 3.2.1.3 |
| rda-identity       | 192.168.131.45 | Up 2 Days ago | fea1c0ef7263 | 3.2.1.3 |
| rda-identity       | 192.168.131.44 | Up 2 Days ago | 2a48f402f678 | 3.2.1.3 |
| rda-fsm            | 192.168.131.45 | Up 2 Days ago | 5006c8a6e5f3 | 3.2.1.3 |
| rda-fsm            | 192.168.131.44 | Up 2 Days ago | 199cac791a90 | 3.2.1.3 |
| rda-access-manager | 192.168.131.44 | Up 2 Days ago | e20495c61be2 | 3.2.1.3 |
| ....               | ....           | ....          | ....         | ....    |
+--------------------+----------------+---------------+--------------+---------+
Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler services is elected as leader under the Site column.
Run the below command to check that all services have an ok status and do not show any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | cf4c4f37c47a | 0633b451 | | service-status | ok | |
| rda_app | alert-ingester | cf4c4f37c47a | 0633b451 | | minio-connectivity | ok | |
| rda_app | alert-ingester | cf4c4f37c47a | 0633b451 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | cf4c4f37c47a | 0633b451 | | service-initialization-status | ok | |
| rda_app | alert-ingester | cf4c4f37c47a | 0633b451 | | kafka-connectivity | ok | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=3, Brokers=[2, 3, 1] |
| rda_app | alert-ingester | 7b9f1370e018 | f348532b | | service-status | ok | |
| rda_app | alert-ingester | 7b9f1370e018 | f348532b | | minio-connectivity | ok | |
| rda_app | alert-ingester | 7b9f1370e018 | f348532b | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 7b9f1370e018 | f348532b | | service-initialization-status | ok | |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.2.3.2 Upgrade rdac CLI
Run the below command to upgrade the rdac CLI.
2.2.3.3 Upgrade RDA Worker Services
Step-1: Please run the below command to initiate upgrading the RDA Worker service PODs.
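A sketch of the invocation, assuming the worker subcommand mirrors the rdafk8s app upgrade syntax shown in the next section; 3.2.1.3 is the target worker tag:
rdafk8s worker upgrade --tag 3.2.1.3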
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each RDA Worker service POD is in the Terminating state.
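For example, using the worker label selector from the delete command in Step-6:
kubectl get pods -n rda-fabric -l app_component=rda-worker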
Step-3: Run the below command to put all Terminating RDAF worker service PODs into maintenance mode. It will list all of the POD Ids of the RDA worker services along with the rdac maintenance command that is required to put them into maintenance mode.
Step-4: Copy & paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF worker services.
Step-6: Run the below command to delete the Terminating RDAF worker service PODs.
for i in `kubectl get pods -n rda-fabric -l app_component=rda-worker | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait for 120 seconds between each RDAF worker service upgrade by repeating the above steps from Step-2 to Step-6 for the rest of the RDAF worker service PODs.
Step-7: Please wait for 120 seconds to let the newer version of RDA Worker service PODs join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service PODs.
+------------+----------------+---------------+--------------+---------+
| Name       | Host           | Status        | Container Id | Tag     |
+------------+----------------+---------------+--------------+---------+
| rda-worker | 192.168.131.50 | Up 2 Days ago | 497059c45d6e | 3.2.1.3 |
| rda-worker | 192.168.131.49 | Up 2 Days ago | 434b2ca40ed8 | 3.2.1.3 |
| ....       | ....           | ....          | ....         | ....    |
+------------+----------------+---------------+--------------+---------+
Step-8: Run the below command to check that all RDA Worker services have an ok status and do not show any failure messages.
2.2.3.4 Upgrade OIA Application Services
Step-1: Run the below commands to initiate upgrading the RDAF OIA Application services. The first command upgrades the specified services from 7.2.0.3 to 7.2.1.1 and the second command upgrades the rest of the services from 7.2.0.3 to 7.2.1.5.
rdafk8s app upgrade OIA --tag 7.2.1.1 --service rda-app-controller --service rda-alert-processor --service rda-file-browser --service rda-ingestion-tracker --service rda-reports-registry --service rda-ml-config --service rda-irm-service --service rda-notification-service
rdafk8s app upgrade OIA --tag 7.2.1.5 --service rda-smtp-server --service rda-event-consumer --service rda-webhook-server --service rda-collaboration --service rda-configuration-service
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each OIA application service is in the Terminating state.
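For example, using the OIA label selector from the delete command in Step-6:
kubectl get pods -n rda-fabric -l app_name=oia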
Step-3: Run the below command to put all Terminating OIA application service PODs into maintenance mode. It will list all of the POD Ids of the OIA application services along with the rdac maintenance command that is required to put them into maintenance mode.
Step-4: Copy & paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the OIA application services.
Step-6: Run the below command to delete the Terminating OIA application service PODs
for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the OIA application service PODs.
Please wait till all of the new OIA application service PODs are in the Running state and run the below command to verify their status and make sure they are running with the 7.2.1.1 or 7.2.1.5 version.
+---------------------+----------------+---------------+--------------+---------+
| Name                | Host           | Status        | Container Id | Tag     |
+---------------------+----------------+---------------+--------------+---------+
| rda-alert-ingester  | 192.168.131.49 | Up 5 Days ago | b323998abd15 | 7.2.1.1 |
| rda-alert-ingester  | 192.168.131.50 | Up 5 Days ago | 710f262e27aa | 7.2.1.1 |
| rda-alert-processor | 192.168.131.47 | Up 5 Days ago | ec1c53d94439 | 7.2.1.1 |
| rda-alert-processor | 192.168.131.46 | Up 5 Days ago | deee4db62708 | 7.2.1.1 |
| rda-app-controller  | 192.168.131.49 | Up 5 Days ago | ef96deb9adda | 7.2.1.1 |
| rda-app-controller  | 192.168.131.50 | Up 5 Days ago | 6880b5632adb | 7.2.1.1 |
| rda-collaboration   | 192.168.131.49 | Up 2 Days ago | cc1b1c882250 | 7.2.1.5 |
| rda-collaboration   | 192.168.131.50 | Up 2 Days ago | 13be7e8bfa3f | 7.2.1.5 |
+---------------------+----------------+---------------+--------------+---------+
Step-7: Run the below command to verify all OIA application services are up and running. Please wait till the cfxdimensions-app-irm_service has the leader status under the Site column.
Run the below command to check that all services have an ok status and do not show any failure messages.
Warning
For Non-Kubernetes deployment, upgrading RDAF Platform and AIOps application services is a disruptive operation. Please schedule a maintenance window before upgrading RDAF Platform and AIOps services to the newer version.
Run the below command to initiate upgrading RDAF Platform services.
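A sketch of the invocation, assuming the platform subcommand mirrors the rdaf app upgrade syntax shown below; 3.2.1.3 is the target platform tag:
rdaf platform upgrade --tag 3.2.1.3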
Please wait till all of the new platform services are in the Running state and run the below command to verify their status and make sure all of them are running with the 3.2.1.3 version.
+--------------------------+----------------+------------+--------------+---------+
| Name                     | Host           | Status     | Container Id | Tag     |
+--------------------------+----------------+------------+--------------+---------+
| cfx-rda-access-manager   | 192.168.107.60 | Up 6 hours | 80dac9d727a3 | 3.2.1.3 |
| cfx-rda-resource-manager | 192.168.107.60 | Up 6 hours | 68534a5c1d4c | 3.2.1.3 |
| cfx-rda-user-preferences | 192.168.107.60 | Up 6 hours | 78405b639915 | 3.2.1.3 |
| portal-backend           | 192.168.107.60 | Up 6 hours | 636e6968f661 | 3.2.1.3 |
| portal-frontend          | 192.168.107.60 | Up 6 hours | 2fd426bd6aa2 | 3.2.1.3 |
| rda_api_server           | 192.168.107.60 | Up 6 hours | e0994b366f98 | 3.2.1.3 |
| rda_asset_dependency     | 192.168.107.60 | Up 6 hours | 07610621408c | 3.2.1.3 |
| rda_collector            | 192.168.107.60 | Up 6 hours | 467d6b3d13f8 | 3.2.1.3 |
| rda_fsm                  | 192.168.107.60 | Up 6 hours | e32de86fe341 | 3.2.1.3 |
| rda_identity             | 192.168.107.60 | Up 6 hours | 45136d89b2cf | 3.2.1.3 |
| rda_registry             | 192.168.107.60 | Up 6 hours | 334d7d4cfa41 | 3.2.1.3 |
| rda_scheduler            | 192.168.107.60 | Up 6 hours | acf5a9ab556a | 3.2.1.3 |
+--------------------------+----------------+------------+--------------+---------+
Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler services is elected as leader under the Site column.
Run the below command to check that all services have an ok status and do not show any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | service-status | ok | |
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | service-initialization-status | ok | |
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | kafka-connectivity | ok | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=3, Brokers=[2, 3, 1] |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | service-status | ok | |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | service-initialization-status | ok | |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | kafka-connectivity | ok | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=1, Brokers=[2, 3, 1] |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
- Upgrade rdac CLI
Run the below command to upgrade the rdac CLI.
- Upgrade RDA Worker Services
Please run the below command to initiate upgrading the RDA Worker service PODs.
Please wait for 120 seconds to let the newer version of RDA Worker service PODs join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service PODs.
+------------+----------------+-----------+--------------+---------+
| Name       | Host           | Status    | Container Id | Tag     |
+------------+----------------+-----------+--------------+---------+
| rda_worker | 192.168.107.61 | Up 2 days | d951118ee757 | 3.2.1.3 |
| rda_worker | 192.168.107.62 | Up 2 days | f7033a72f013 | 3.2.1.3 |
+------------+----------------+-----------+--------------+---------+
Run the below commands to initiate upgrading the RDAF OIA Application services. The first command upgrades the specified services from 7.2.0.3 to 7.2.1.1 and the second command upgrades the rest of the services from 7.2.0.3 to 7.2.1.5.
rdaf app upgrade OIA --tag 7.2.1.1 --service cfx-rda-app-controller --service cfx-rda-alert-processor --service cfx-rda-file-browser --service cfx-rda-ingestion-tracker --service cfx-rda-reports-registry --service cfx-rda-ml-config --service cfx-rda-irm-service --service cfx-rda-notification-service
rdaf app upgrade OIA --tag 7.2.1.5 --service cfx-rda-smtp-server --service cfx-rda-event-consumer --service cfx-rda-webhook-server --service cfx-rda-collaboration --service cfx-rda-configuration-service
Please wait till all of the new OIA application service PODs are in the Running state and run the below command to verify their status and make sure they are running with the 7.2.1.1 or 7.2.1.5 version.
+-------------------------+----------------+-----------+--------------+---------+
| Name                    | Host           | Status    | Container Id | Tag     |
+-------------------------+----------------+-----------+--------------+---------+
| cfx-rda-alert-ingester  | 192.168.107.66 | Up 2 days | 79d6756db639 | 7.2.1.5 |
| cfx-rda-alert-ingester  | 192.168.107.67 | Up 2 days | 9a0775246a0f | 7.2.1.5 |
| cfx-rda-alert-processor | 192.168.107.66 | Up 2 days | 057552584cfe | 7.2.1.1 |
| cfx-rda-alert-processor | 192.168.107.67 | Up 2 days | 787f0cb42734 | 7.2.1.1 |
| cfx-rda-app-controller  | 192.168.107.66 | Up 2 days | 07f406e984ad | 7.2.1.1 |
| cfx-rda-app-controller  | 192.168.107.67 | Up 2 days | 0b27802473c1 | 7.2.1.1 |
| cfx-rda-collaboration   | 192.168.107.66 | Up 2 days | 7322550c3cee | 7.2.1.5 |
+-------------------------+----------------+-----------+--------------+---------+
2.2.4. Post Upgrade Steps
- (Optional) Deploy the latest L1 & L2 bundles. Go to Configuration --> RDA Administration --> Bundles --> Select oia_l1_l2_bundle and click on the deploy action.
- Enable ML experiments manually if any experiments are configured (Organization --> Configuration --> ML Experiments)
- (Optional) Add the following to All Incident Mappings, preferably after the projectId field's JSON block:
{
    "to": "notificationId",
    "from": "notificationId"
},
- (Optional) A new option called skip_retry_on_keywords has been added within the Incident mapper, which allows the user to control when to skip the retry attempt while making an API call during create or update ticket operations on an external ITSM system (e.g. ServiceNow).
In the below example, if the API error response contains the serviceIdentifier is not available or Ticket is already in inactive state no update is allowed message, it will skip retrying the API call, as these are expected errors and retrying will not make the API call successful.
{
    "to": "skip_retry_on_keywords",
    "func": {
        "evaluate": {
            "expr": "'[\"serviceIdentifier is not available\",\"Ticket is already in Inactive state no update is allowed\"]'"
        }
    }
}
2.3. Upgrade from 7.2.1.x to 7.2.2
2.3.1. Pre-requisites
Below are the pre-requisites which need to be in place before upgrading the OIA (AIOps) application services.
- RDAF Deployment CLI Version: 1.1.8
- RDAF Infrastructure Services Tag Version: 1.0.2, 1.0.2.1 (nats)
- RDAF Core Platform & Worker Services Tag Version: 3.2.1 / 3.2.1.x
- RDAF Client (RDAC) Tag Version: 3.2.1 / 3.2.1.x
- OIA Services Tag Version: 7.2.1 / 7.2.1.x
- CloudFabrix recommends taking VMware VM snapshots of the VMs where the AIOps solution is deployed
Important
Applicable only if FSM is configured for ITSM ticketing:
Before proceeding with the upgrade, please make sure to disable the below Service Blueprints.
- Create Ticket
- Update Ticket
- Resolve Ticket
- Read Alert Stream
- Read Incident Stream
- Read ITSM ticketing Inbound Notifications
Warning
Make sure all of the above pre-requisites are met before proceeding with the upgrade process.
Warning
Upgrading RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading RDAF Platform and AIOps services to the newer version.
- Download the RDAF Deployment CLI's newer version 1.1.9 bundle
- Upgrade the rdaf CLI to version 1.1.9
- Verify the installed rdaf CLI version is upgraded to 1.1.9
- Download the RDAF Deployment CLI's newer version 1.1.9 bundle and copy it to the RDAF management VM on which the rdaf deployment CLI was installed.
- Extract the rdaf CLI software bundle contents
- Change the directory to the extracted directory
- Upgrade the rdaf CLI to version 1.1.9
- Verify the installed rdaf CLI version
- To stop OIA (AIOps) application services, run the below command. Wait until all of the services are stopped.
- To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
- To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
- Upgrade Kafka using the below command.
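A sketch of the Kafka upgrade, assuming the infra subcommand supports a per-service upgrade the way rdaf app upgrade does; 1.0.2 is the kafka tag shown in the status output below:
rdaf infra upgrade --tag 1.0.2 --service kafka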
Run the below RDAF command to check the infra status.
+----------------+----------------+-----------------+--------------+------------------------------+
| Name           | Host           | Status          | Container Id | Tag                          |
+----------------+----------------+-----------------+--------------+------------------------------+
| haproxy        | 192.168.107.40 | Up 2 weeks      | 92875cebe689 | 1.0.2                        |
| keepalived     | 192.168.107.40 | Not Provisioned | N/A          | N/A                          |
| nats           | 192.168.107.41 | Up 2 weeks      | e365e0b794c7 | 1.0.2.1                      |
| minio          | 192.168.107.41 | Up 2 weeks      | 900c8b078059 | RELEASE.2022-11-11T03-44-20Z |
| mariadb        | 192.168.107.41 | Up 2 weeks      | c549e07c2688 | 1.0.2                        |
| opensearch     | 192.168.107.41 | Up 2 weeks      | 783204d75ba9 | 1.0.2                        |
| zookeeper      | 192.168.107.41 | Up 2 weeks      | f51138ff8a95 | 1.0.2                        |
| kafka          | 192.168.107.41 | Up 4 days       | 255020d998c9 | 1.0.2                        |
| redis          | 192.168.107.41 | Up 2 weeks      | 5d929327121d | 1.0.2                        |
| redis-sentinel | 192.168.107.41 | Up 2 weeks      | 4a5fdde49a21 | 1.0.2                        |
+----------------+----------------+-----------------+--------------+------------------------------+
Run the below RDAF command to check infra healthcheck status
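For example, assuming the rdaf infra healthcheck sub-command:
rdaf infra healthcheck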
+----------------+-----------------+--------+----------------------+--------------+--------------+
| Name | Check | Status | Reason | Host | Container Id |
+----------------+-----------------+--------+----------------------+--------------+--------------+
| haproxy | Port Connection | OK | N/A | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy | Service Status | OK | N/A | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy | Firewall Port | OK | N/A | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy | Port Connection | OK | N/A | 192.168.107.64 | 91c361ea0f58 |
| haproxy | Service Status | OK | N/A | 192.168.107.64 | 91c361ea0f58 |
| haproxy | Firewall Port | OK | N/A | 192.168.107.64 | 91c361ea0f58 |
| keepalived | Service Status | OK | N/A | 192.168.107.63 | N/A |
| keepalived | Service Status | OK | N/A | 192.168.107.64 | N/A |
| nats | Port Connection | OK | N/A | 192.168.107.63 | f57ed825681b |
| nats | Service Status | OK | N/A | 192.168.107.63 | f57ed825681b |
+----------------+-----------------+--------+----------------------+--------------+--------------+
- Run the below python upgrade script; see the sketch after the notes below. It applies the below configuration & settings.
- Create kafka topics and configure the topic message max size to 8 MB
- Create the kafka-external user in config.json
- Add new alert-processor companion service settings in values.yaml
- Configure and apply a security index purge policy for Opensearch
Important
Take a backup of /opt/rdaf/deployment-scripts/values.yaml before running the below upgrade script.
Important
Make sure the above upgrade script is executed before moving to the next step.
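A hedged sketch of this step; the upgrade script name below is a placeholder, as the actual script name and download URL are release-specific:
cp /opt/rdaf/deployment-scripts/values.yaml /opt/rdaf/deployment-scripts/values.yaml.backup
# placeholder name: substitute the actual 7.2.2 upgrade script provided with the release
python3 <oia-7.2.2-upgrade-script>.py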
2.3.2. Download the new Docker Images
Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.
Run the below command to verify above mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.
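A sketch assuming the rdaf registry fetch sub-command used during installation; the verification sub-command name is also an assumption, so confirm with rdaf registry --help:
rdaf registry fetch --tag 3.2.2,7.2.2
rdaf registry list-tags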
Make sure 3.2.2 image tag is downloaded for the below RDAF Platform services.
- rda-client-api-server
- rda-registry
- rda-rda-scheduler
- rda-collector
- rda-stack-mgr
- rda-identity
- rda-fsm
- rda-access-manager
- rda-resource-manager
- rda-user-preferences
- onprem-portal
- onprem-portal-nginx
- rda-worker-all
- onprem-portal-dbinit
- cfxdx-nb-nginx-all
- rda-event-gateway
- rdac
- rdac-full
Make sure 7.2.2 image tag is downloaded for the below RDAF OIA Application services.
- rda-app-controller
- rda-alert-processor
- rda-file-browser
- rda-smtp-server
- rda-ingestion-tracker
- rda-reports-registry
- rda-ml-config
- rda-event-consumer
- rda-webhook-server
- rda-irm-service
- rda-alert-ingester
- rda-collaboration
- rda-notification-service
- rda-configuration-service
Downloaded Docker images are stored under the below path.
/opt/rdaf/data/docker/registry/v2
Run the below command to check the disk usage of the filesystem on which the docker images are stored.
Optionally, if required, older image-tags which are no longer used can be deleted to free up disk space using the below command.
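For example (df is standard; the image-deletion sub-command is an assumption, so confirm with rdaf registry --help):
df -h /opt/rdaf/data/docker/registry/v2
# assumed sub-command for removing unused image tags
rdaf registry delete-images --tag <older-tag>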
2.3.3. Upgrade Services
2.3.3.1 Upgrade RDAF Platform Services
Run the below command to initiate upgrading RDAF Platform services.
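For example, assuming the rdaf platform upgrade sub-command:
rdaf platform upgrade --tag 3.2.2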
Wait till all of the new platform services are in Running state, then run the below command to verify their status and make sure all of them are running with the 3.2.2 version.
+---------------------+--------------+------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+---------------------+--------------+------------+--------------+-------+
| rda_api_server | 192.168.107.60 | Up 4 hours | 0da7ebeadceb | 3.2.2 |
| rda_registry | 192.168.107.60 | Up 4 hours | 841a4e03447d | 3.2.2 |
| rda_scheduler | 192.168.107.60 | Up 4 hours | 806af221a299 | 3.2.2 |
| rda_collector | 192.168.107.60 | Up 4 hours | 9ae8da4d2182 | 3.2.2 |
| rda_asset_dependenc | 192.168.107.60 | Up 4 hours | e96cf642b2d6 | 3.2.2 |
| y | | | | |
| rda_identity | 192.168.107.60 | Up 4 hours | 2a57ce63a756 | 3.2.2 |
| rda_fsm | 192.168.107.60 | Up 4 hours | 2b645a75b5f0 | 3.2.2 |
+---------------------+--------------+------------+--------------+-------+
2.3.3.2 Upgrade RDAC CLI
Run the below command to upgrade the rdac CLI.
Run the below command to verify that one of the scheduler services is elected as a leader under the Site column.
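For example:
rdac pods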
+-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| App | fsm | True | 8b5dfca4cce9 | c0a8bbd7 | | 7:33:16 | 8 | 31.21 | | |
| App | ingestion-tracker | True | d37e78507693 | e1bd1405 | | 7:21:16 | 8 | 31.21 | | |
| App | ml-config | True | 0c73604632bc | 65594689 | | 7:22:02 | 8 | 31.21 | | |
| App | reports-registry | True | be82a9e704a2 | 567f1275 | | 7:25:23 | 8 | 31.21 | | |
| App | smtp-server | True | 08a8dd347660 | 06242bab | | 7:23:35 | 8 | 31.21 | | |
| App | user-preferences | True | fc7a4a5a0591 | 53dce7ca | | 7:32:25 | 8 | 31.21 | | |
| App | webhook-server | True | 20a2afb33b6c | fdb1eb21 | | 7:23:53 | 8 | 31.21 | | |
| Infra | api-server | True | b1e7105b231e | 33f6ed2c | | 2:04:53 | 8 | 31.21 | | |
| Infra | collector | True | f5abb5cac9a5 | eb17ce02 | | 3:50:51 | 8 | 31.21 | | |
| Infra | registry | True | ce73263c7828 | 8cda9974 | | 7:34:05 | 8 | 31.21 | | |
| Infra | scheduler | True | d9d62c1f1bb7 | 96047389 | *leader* | 7:33:59 | 8 | 31.21 | | |
| Infra | worker | True | ba1198f05f6b | afd229a8 | rda-site-01 | 7:26:20 | 8 | 31.21 | 7 | 109 |
+-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------+
Run the below command to check if all services have an ok status and do not throw any failure messages.
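For example:
rdac healthcheck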
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | service-status | ok | |
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | service-initialization-status | ok | |
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | kafka-connectivity | ok | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=3, Brokers=[2, 3, 1] |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | service-status | ok | |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | service-initialization-status | ok | |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | kafka-connectivity | ok | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=1, Brokers=[2, 3, 1] |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.3.3.3 Upgrade RDA Worker Services
Run the below command to initiate upgrading the RDA worker service(s).
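For example, assuming the rdaf worker upgrade sub-command:
rdaf worker upgrade --tag 3.2.2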
Tip
If the RDA worker is deployed in an http proxy environment, add the required environment variables for http proxy settings in /opt/rdaf/deployment-scripts/values.yaml under the rda_worker section. Below is a sample http proxy configuration for the worker service.
rda_worker:
mem_limit: 8G
memswap_limit: 8G
privileged: false
environment:
RDA_ENABLE_TRACES: 'no'
RDA_SELF_HEALTH_RESTART_AFTER_FAILURES: 3
http_proxy: "http://user:password@192.168.122.107:3128"
https_proxy: "http://user:password@192.168.122.107:3128"
HTTP_PROXY: "http://user:password@192.168.122.107:3128"
HTTPS_PROXY: "http://user:password@192.168.122.107:3128"
Wait for 120 seconds to let the newer version of RDA worker services join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA worker services.
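For example, assuming the rdaf worker status sub-command:
rdaf worker status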
+------------+--------------+------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+------------+--------------+------------+--------------+-------+
| rda_worker | 192.168.107.60 | Up 4 hours | d968c908d3e3 | 3.2.2 |
+------------+--------------+------------+--------------+-------+
2.3.3.4 Upgrade OIA/AIA Application Services
Run the below commands to initiate upgrading RDAF OIA/AIA Application services
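For example, assuming the rdaf app upgrade sub-command with the application name OIA:
rdaf app upgrade OIA --tag 7.2.2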
Wait till all of the new OIA/AIA application services are in Running state, then run the below command to verify their status and make sure they are running with the 7.2.2 version. Check that the new service cfx-rda-alert-processor-companion is deployed, and make sure all OIA/AIA services are up with the new tag.
+-----------------------------------+--------------+------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+-----------------------------------+--------------+------------+--------------+-------+
| cfx-rda-app-controller | 192.168.107.60 | Up 3 hours | 017692a218b8 | 7.2.2 |
| cfx-rda-reports-registry | 192.168.107.60 | Up 3 hours | be82a9e704a2 | 7.2.2 |
| cfx-rda-notification-service | 192.168.107.60 | Up 3 hours | 42d3c8c4861c | 7.2.2 |
| cfx-rda-file-browser | 192.168.107.60 | Up 3 hours | 46b9dedab4b0 | 7.2.2 |
| cfx-rda-configuration-service | 192.168.107.60 | Up 3 hours | 6bef9741ff46 | 7.2.2 |
| cfx-rda-alert-ingester | 192.168.107.60 | Up 3 hours | 13975b9efe7d | 7.2.2 |
| cfx-rda-webhook-server | 192.168.107.60 | Up 3 hours | 20a2afb33b6c | 7.2.2 |
| cfx-rda-smtp-server | 192.168.107.60 | Up 3 hours | 08a8dd347660 | 7.2.2 |
| cfx-rda-event-consumer | 192.168.107.60 | Up 3 hours | b0b62c88064a | 7.2.2 |
| cfx-rda-alert-processor | 192.168.107.60 | Up 3 hours | ab24dcbd6e3a | 7.2.2 |
| cfx-rda-irm-service | 192.168.107.60 | Up 3 hours | 11c92a206eaa | 7.2.2 |
| cfx-rda-ml-config | 192.168.107.60 | Up 3 hours | 0c73604632bc | 7.2.2 |
| cfx-rda-collaboration | 192.168.107.60 | Up 3 hours | a5cfe5b681bb | 7.2.2 |
| cfx-rda-ingestion-tracker | 192.168.107.60 | Up 3 hours | d37e78507693 | 7.2.2 |
| cfx-rda-alert-processor-companion | 192.168.107.60 | Up 3 hours | b74d82710af9 | 7.2.2 |
+-----------------------------------+--------------+------------+--------------+-------+
Run the below command to verify that cfxdimensions-app-irm_service is elected as a leader under the Site column.
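For example:
rdac pods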
+-------+----------------------------------------+-------------+--------------+----------+-------------+----------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+--------------+----------+-------------+----------+--------+--------------+---------------+--------------|
| App | alert-ingester | True | 13975b9efe7d | dd32fdef | | 12:07:37 | 8 | 31.21 | | |
| App | alert-processor | True | ab24dcbd6e3a | a980d44e | | 12:06:10 | 8 | 31.21 | | |
| App | alert-processor-companion | True | b74d82710af9 | 8f37b360 | | 12:04:19 | 8 | 31.21 | | |
| App | asset-dependency | True | 83c5d941f3a6 | f17cc305 | | 12:16:59 | 8 | 31.21 | | |
| App | authenticator | True | fb82e1664219 | b6f19086 | | 12:16:47 | 8 | 31.21 | | |
| App | cfx-app-controller | True | 017692a218b8 | 55015d69 | | 12:09:04 | 8 | 31.21 | | |
| App | cfxdimensions-app-access-manager | True | 87871b87d45e | b0465aa5 | | 12:16:19 | 8 | 31.21 | | |
| App | cfxdimensions-app-collaboration | True | a5cfe5b681bb | c5b40c98 | | 12:05:05 | 8 | 31.21 | | |
| App | cfxdimensions-app-file-browser | True | 46b9dedab4b0 | 3bcc6bc5 | | 12:08:13 | 8 | 31.21 | | |
| App | cfxdimensions-app-irm_service | True | 11c92a206eaa | 851f07b7 | *leader* | 12:05:48 | 8 | 31.21 | | |
| App | cfxdimensions-app-notification-service | True | 42d3c8c4861c | 891ab559 | | 12:08:31 | 8 | 31.21 | | |
| App | cfxdimensions-app-resource-manager | True | a35dd8127434 | 29b57c51 | | 12:16:08 | 8 | 31.21 | | |
+-------+----------------------------------------+-------------+--------------+----------+-------------+----------+--------+--------------+---------------+--------------+
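Run the rdac healthcheck command to check the health status of the upgraded application services; sample output is shown below.
rdac healthcheck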
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------|
| rda_app | alert-ingester | 13975b9efe7d | dd32fdef | | service-status | ok | |
| rda_app | alert-ingester | 13975b9efe7d | dd32fdef | | minio-connectivity | ok | |
| rda_app | alert-ingester | 13975b9efe7d | dd32fdef | | service-dependency:configuration-service | ok | 1 pod(s) found for configuration-service |
| rda_app | alert-ingester | 13975b9efe7d | dd32fdef | | service-initialization-status | ok | |
| rda_app | alert-ingester | 13975b9efe7d | dd32fdef | | kafka-connectivity | ok | Cluster=oDO7X5AZTh-78HgTt0WbrA, Broker=1, Brokers=[1] |
| rda_app | alert-processor | ab24dcbd6e3a | a980d44e | | service-status | ok | |
| rda_app | alert-processor | ab24dcbd6e3a | a980d44e | | minio-connectivity | ok | |
| rda_app | alert-processor | ab24dcbd6e3a | a980d44e | | service-dependency:cfx-app-controller | ok | 1 pod(s) found for cfx-app-controller |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
Note
Run the rdaf prune_images command to clean up old docker images.
2.3.4. Post Upgrade Steps
1. Download the script from the below path to migrate the UI-Icon URL from private to public.
Tip
Running this script is an optional step; perform it only if (1) white labeling customization was done on the Login page with an uploaded image before the version upgrade, and (2) the custom image is no longer showing up on the Login page after the upgrade.
- Copy the above script to the rda_identity platform service container. Run the below command to get the container-id for rda_identity and the host IP on which it is running.
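For example, assuming the rdaf platform status sub-command:
rdaf platform status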
+--------------------------+--------------+------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+--------------------------+--------------+------------+--------------+-------+
| rda_api_server | 192.168.107.40 | Up 7 hours | 6540e670fdb9 | 3.2.2 |
| rda_registry | 192.168.107.40 | Up 7 hours | 98bead8d0599 | 3.2.2 |
....
| rda_identity | 192.168.107.40 | Up 7 hours | 67940390d61f | 3.2.2 |
| rda_fsm | 192.168.107.40 | Up 7 hours | a96870b5f32f | 3.2.2 |
| cfx-rda-access-manager | 192.168.107.40 | Up 7 hours | 33362030fe52 | 3.2.2 |
| cfx-rda-resource-manager | 192.168.107.40 | Up 7 hours | 54a082b754be | 3.2.2 |
| cfx-rda-user-preferences | 192.168.107.40 | Up 7 hours | 829f95d98741 | 3.2.2 |
+--------------------------+--------------+------------+--------------+-------+
- Login to the host on which the rda_identity service is running as rdauser using SSH and run the below command to copy the above downloaded script.
- Run the below command to switch into the rda_identity service's container shell.
- Execute the below command to migrate the customer branding (white labelling) changes. A combined sketch of these steps follows below.
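A minimal sketch of these three steps; the script name migrate_ui_icon.py, the host IP and the container-id are illustrative placeholders taken from the sample output above:
# copy the downloaded script to the rda_identity host (placeholder IP)
scp migrate_ui_icon.py rdauser@192.168.107.40:/tmp/
# copy it into the rda_identity container (placeholder container-id)
docker cp /tmp/migrate_ui_icon.py 67940390d61f:/tmp/
# switch into the container shell and run the migration
docker exec -it 67940390d61f /bin/bash
python /tmp/migrate_ui_icon.py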
2. Deploy the latest l1 & l2 bundles. Go to Configuration --> RDA Administration --> Bundles --> Select oia_l1_l2_bundle and click on the deploy action at the row level.
3. Enable ML experiments manually if any experiments are configured (Organization --> Configuration --> Machine Learning).
4. FSM Installation Steps (applicable only for Remedy ITSM ticketing deployment)
a) Update the Team configuration that was created for ITSM ticketing (Team with Source 'Others'). Include the following content in the JSON editor of the Team's configuration. Adjust or add alert sources and execution delay as necessary.
[
{
"alert_source": "SNMP",
"execution_delay": 900,
"auto_share": {
"create": true,
"update": true,
"close": true,
"resolved": true,
"cancel": true,
"alert_count_changes": true
}
},
{
"alert_source": "Syslog",
"execution_delay": 900,
"auto_share": {
"create": true,
"update": true,
"close": true,
"resolved": true,
"cancel": true,
"alert_count_changes": true
}
}
]
b) Download and update the latest FSM model: Configuration -> RDA Administration -> FSM Models
Important
Take a backup of the existing model before updating.
c) Add formatting templates: Configuration -> RDA Administration -> Formatting Templates
- snow-notes-template
{% for r in rows %}
<b>Message</b> : {{r.a_message}} <br>
<b>RaisedAt</b> : {{r.a_raised_ts}} <br>
<b>UpdatedAt</b> : {{r.a_updated_ts}} <br>
<b>Status</b> : {{r.a_status}} <br>
<b>AssetName</b> : {{r.a_asset_name}} <br>
<b>AssetType</b> : {{r.a_asset_type}} <br>
<b>RepeatCount</b> : {{r.a_repeat_count}} <br>
<b>Action</b> : {{r.action_name}} <br>
<br><br>
{%endfor%}
- snow-description-template
d) Deploy FSM bundles: fsm_events_kafka_publisher_bundles, oia_fsm_aots_ticketing_bundle, oia_fsm_common_ticketing_bundles
e) Create 'fsm-debug-outbound-ticketing' and 'aots_ticket_notifications' PStreams from the UI if they do not already exist
f) Enable Service Blueprints - Read Alert Stream, Read Incident Stream, Create Ticket, Update Ticket, Resolve Ticket, Read AOTS Inbound Notifications
2.4. Upgrade from 7.2.1.x to 7.2.2.1
2.4.1. Pre-requisites
Below are the pre-requisites which need to be in place before upgrading the OIA (AIOps) application services.
- RDAF Deployment CLI Version: 1.1.8
- RDAF Infrastructure Services Tag Version: 1.0.2, 1.0.2.1 (nats)
- RDAF Core Platform & Worker Services Tag Version: 3.2.1.3
- RDAF Client (RDAC) Tag Version: 3.2.1 / 3.2.1.x
- OIA Services Tag Version: 7.2.1.1 / 7.2.1.5 / 7.2.1.6
- CloudFabrix recommends taking VMware VM snapshots where RDA Fabric platform/applications are deployed
Warning
Make sure all of the above pre-requisites are met before proceeding with the upgrade process.
Warning
Upgrading the RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading the RDAF Platform and AIOps services to a newer version.
Important
Please make sure full backup of the RDAF platform system is completed before performing the upgrade.
Kubernetes: Please run the below backup command to take the backup of application data.
Non-Kubernetes: Please run the below backup command to take the backup of application data. Note: Please make sure the shared backup-dir is NFS mounted across all RDA Fabric Virtual Machines.
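A hedged sketch, assuming the deployment CLI exposes a backup sub-command that takes a destination directory; confirm the exact options with rdaf backup --help:
# Kubernetes
rdafk8s backup --dest-dir /backup
# Non-Kubernetes (backup-dir must be the NFS-mounted path shared across all RDA Fabric VMs)
rdaf backup --dest-dir /backup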
Run the below K8s command and make sure the Kubernetes PODs are NOT in restarting mode (applicable only to Kubernetes environments).
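For example:
kubectl get pods -n rda-fabric
# confirm the RESTARTS column is not increasing for any pod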
RDAF Deployment CLI Upgrade:
Please follow the below given steps.
Note
Upgrade RDAF Deployment CLI on both on-premise docker registry VM and RDAF Platform's management VM if provisioned separately.
- Run the below command to verify the current version of the RDAF CLI is 1.1.8.
- Download the RDAF Deployment CLI's newer version 1.1.9.1 bundle and copy it to the RDAF management VM on which the rdaf deployment CLI was installed.
- Extract the rdaf CLI software bundle contents.
- Change the directory to the extracted directory.
- Upgrade the rdaf CLI to version 1.1.9.1.
- Verify the installed rdaf CLI version is upgraded to 1.1.9.1.
2.4.2. Download the new Docker Images
Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.
Run the below command to verify above mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.
Please make sure 3.2.2.1 image tag is downloaded for the below RDAF Platform services.
- rda-client-api-server
- rda-registry
- rda-rda-scheduler
- rda-collector
- rda-stack-mgr
- rda-identity
- rda-fsm
- rda-access-manager
- rda-resource-manager
- rda-user-preferences
- onprem-portal
- onprem-portal-nginx
- rda-worker-all
- onprem-portal-dbinit
- cfxdx-nb-nginx-all
- rda-event-gateway
- rdac
- rdac-full
Please make sure 7.2.2.1 image tag is downloaded for the below RDAF OIA Application services.
- rda-app-controller
- rda-alert-processor
- rda-file-browser
- rda-smtp-server
- rda-ingestion-tracker
- rda-reports-registry
- rda-ml-config
- rda-event-consumer
- rda-webhook-server
- rda-irm-service
- rda-alert-ingester
- rda-collaboration
- rda-notification-service
- rda-configuration-service
Downloaded Docker images are stored under the below path.
/opt/rdaf/data/docker/registry/v2
Run the below command to check the disk usage of the filesystem on which the docker images are stored.
Optionally, if required, older image-tags which are no longer used can be deleted to free up disk space using the below command.
2.4.3. Upgrade Services
2.4.3.1 Upgrade RDAF Infra Services
Download the below upgrade script and copy it to the RDAF management VM on which the rdaf deployment CLI was installed.
Please run the downloaded upgrade script. It configures and applies the below changes.
- Creates a new Kafka user specifically for Kafka topics which need to be exposed to external systems to publish data such as events, alerts or notifications.
- Updates the /opt/rdaf/config/network_config/config.json file with the newly created Kafka user's credentials.
- Creates and applies a lifecycle management policy for Opensearch's default security audit logs index to purge the older data. It is configured to purge data that is older than 15 days.
- Updates the /opt/rdaf/deployment-scripts/values.yaml file to add support for the new alert processor companion service. It also updates the rda-worker service configuration to attach a new persistent-volume. The persistent-volume is created out of the local host's directory path /opt/rdaf/config/worker/rda_packages on which the rda-worker service is running.
Important
Please make sure the above upgrade script is executed before moving to the next step.
- Update kafka-values.yaml with the below parameters.
Tip
- The upgrade script generates a kafka-values.yaml.latest file in the /opt/rdaf/deployment-scripts/ directory which will have the updated configuration.
- Please take a backup of the kafka-values.yaml file before making changes.
- Please skip the changes if the current kafka-values.yaml file already has the below mentioned parameters.
Edit the kafka-values.yaml file.
Find the below parameter and delete it if it exists.
Add the below highlighted parameters. Please skip if these are already configured.
global:
imagePullSecrets:
- cfxregistry-cred
image:
registry: 192.168.10.10:5000
repository: rda-platform-kafka
tag: 1.0.2
pullPolicy: Always
heapOpts: -Xmx2048m -Xms2048m
defaultReplicationFactor: 3
offsetsTopicReplicationFactor: 3
transactionStateLogReplicationFactor: 3
transactionStateLogMinIsr: 2
maxMessageBytes: '8399093'
numPartitions: 15
externalAccess:
enabled: true
autoDiscovery:
enabled: true
service:
type: NodePort
nodePorts:
- 31252
- 31533
- 31964
serviceAccount:
create: true
rbac:
create: true
authorizerClassName: kafka.security.authorizer.AclAuthorizer
logRetentionHours: 24
allowEveryoneIfNoAclFound: true
Apply the above configuration changes to the kafka infra service, then verify (a sketch follows below).
- Please wait till all of the Kafka service pods are in Running state.
- Please make sure all infra services are in Running state before moving to the next section.
- Additionally, please run the below command to make sure there are no errors with RDA Fabric services.
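A sketch of the apply-and-verify sequence; the infra upgrade sub-command scoped to kafka is an assumption, so confirm with rdafk8s infra --help:
rdafk8s infra upgrade --tag 1.0.2 --service kafka
kubectl get pods -n rda-fabric | grep kafka
rdafk8s infra status
rdac healthcheck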
2.4.3.2 Upgrade RDAF Platform Services
Step-1: Run the below command to initiate upgrading RDAF Platform services.
As the upgrade procedure is a non-disruptive upgrade, it puts the currently running PODs into Terminating state and newer version PODs into Pending state.
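For example, assuming the rdafk8s platform upgrade sub-command:
rdafk8s platform upgrade --tag 3.2.2.1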
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each Platform service is in Terminating state. (Note: Please wait if a POD is in ContainerCreating state until it has transitioned into Terminating state.)
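For example, using the same pod label that the delete command in Step-6 uses:
kubectl get pods -n rda-fabric -l app_category=rdaf-platform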
Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It will list all of the POD Ids of platform services along with the rdac maintenance command that is required to put them in maintenance mode.
Note
If maint_command.py script doesn't exist on RDAF deployment CLI VM, it can be downloaded using the below command.
Step-4: Copy & paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.
Warning
Wait for 120 seconds before executing Step-6.
Step-6: Run the below command to delete the Terminating RDAF platform service PODs
for i in `kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Platform service PODs.
Please wait till all of the new platform service PODs are in Running state and run the below command to verify their status and make sure all of them are running with the 3.2.2.1 version.
+----------------------+----------------+-----------------+--------------+-------------+
| Name | Host | Status | Container Id | Tag |
+----------------------+----------------+-----------------+--------------+-------------+
| rda-api-server | 192.168.131.45 | Up 19 Hours ago | 4d5adbbf954b | 3.2.2.1 |
| rda-api-server | 192.168.131.44 | Up 19 Hours ago | 2c58bccaf38d | 3.2.2.1 |
| rda-registry | 192.168.131.44 | Up 20 Hours ago | 408a4ddcc685 | 3.2.2.1 |
| rda-registry | 192.168.131.45 | Up 20 Hours ago | 4f01fc820585 | 3.2.2.1 |
| rda-identity | 192.168.131.44 | Up 20 Hours ago | bdd1e91f86ec | 3.2.2.1 |
| rda-identity | 192.168.131.45 | Up 20 Hours ago | e63af9c6e9d9 | 3.2.2.1 |
| rda-fsm | 192.168.131.45 | Up 20 Hours ago | 3ec246cf7edd | 3.2.2.1 |
+----------------------+----------------+-----------------+--------------+-------------+
Run the below command to check that one of the rda-scheduler services is elected as a leader under the Site column.
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age      | CPUs   |   Memory(GB) |   Active Jobs |   Total Jobs |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------|
| Infra | api-server | True | rda-api-server | 35a17877 | | 20:15:37 | 8 | 31.33 | | |
| Infra | api-server | True | rda-api-server | 8f678e25 | | 20:14:39 | 8 | 31.33 | | |
| Infra | collector | True | rda-collector- | 17ce190d | | 20:47:41 | 8 | 31.33 | | |
| Infra | collector | True | rda-collector- | 6b91bf23 | | 20:47:22 | 8 | 31.33 | | |
| Infra | registry | True | rda-registry-5 | 4ee8ef7d | | 20:48:20 | 8 | 31.33 | | |
| Infra | registry | True | rda-registry-5 | 895b7f5c | | 20:47:39 | 8 | 31.33 | | |
| Infra | scheduler | True | rda-scheduler- | ab79ba8d | | 20:47:43 | 8 | 31.33 | | |
| Infra | scheduler | True | rda-scheduler- | f2cefc92 | *leader* | 20:47:23 | 8 | 31.33 | | |
| Infra | worker | True | rda-worker-df5 | e2174794 | rda-site-01 | 20:28:50 | 8 | 31.33 | 1 | 97 |
| Infra | worker | True | rda-worker-df5 | 6debca1d | rda-site-01 | 20:26:08 | 8 | 31.33 | 2 | 91 |
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
Run the below command to check if all services have an ok status and do not throw any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | rda-alert-in | 1afb8c8d | | service-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 1afb8c8d | | minio-connectivity | ok | |
| rda_app | alert-ingester | rda-alert-in | 1afb8c8d | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | rda-alert-in | 1afb8c8d | | service-initialization-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 1afb8c8d | | kafka-connectivity | ok | Cluster=nzyeX9qkR-ChWXC0fRvSyQ, Broker=0, Brokers=[0, 2, 1] |
| rda_app | alert-ingester | rda-alert-in | 5751f199 | | service-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 5751f199 | | minio-connectivity | ok | |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.4.3.3 Upgrade RDAC cli
2.4.3.4 Upgrade RDA Worker Services
Step-1: Please run the below command to initiate upgrading the RDA Worker service PODs.
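For example, assuming the rdafk8s worker upgrade sub-command:
rdafk8s worker upgrade --tag 3.2.2.1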
Tip
If the RDA worker is deployed in an http proxy environment, add the required environment variables for http proxy settings in /opt/rdaf/deployment-scripts/values.yaml under the rda_worker section. Below is a sample http proxy configuration for the worker service.
rda_worker:
mem_limit: 8G
memswap_limit: 8G
privileged: false
environment:
RDA_ENABLE_TRACES: 'no'
RDA_SELF_HEALTH_RESTART_AFTER_FAILURES: 3
extraEnvs:
- name: http_proxy
value: "http://user:password@192.168.122.107:3128"
- name: https_proxy
value: "http://user:password@192.168.122.107:3128"
- name: HTTP_PROXY
value: "http://user:password@192.168.122.107:3128"
- name: HTTPS_PROXY
value: "http://user:password@192.168.122.107:3128"
Step-2: Run the below command to check the status of the existing and newer worker PODs and make sure at least one instance of each RDA Worker service POD is in Terminating state. (Note: Please wait if a POD is in ContainerCreating state until it has transitioned into Terminating state.)
Step-3: Run the below command to put all Terminating RDAF worker service PODs into maintenance mode. It will list all of the POD Ids of RDA worker services along with the rdac maintenance command that is required to put them in maintenance mode.
Step-4: Copy & Paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF worker services.
Warning
Wait for 120 seconds before executing Step-6.
Step-6: Run the below command to delete the Terminating RDAF worker service PODs
for i in `kubectl get pods -n rda-fabric -l app_component=rda-worker | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Worker service PODs.
Please wait till all the new worker service pods are in Running state.
Step-7: Please wait for 120 seconds to let the newer version of RDA Worker service PODs join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service PODs.
+------------+----------------+----------------+--------------+-------------+
| Name | Host | Status | Container Id | Tag |
+------------+----------------+----------------+--------------+-------------+
| rda-worker | 192.168.131.44 | Up 6 Hours ago | eb679ed8a6c6 | 3.2.2.1 |
| rda-worker | 192.168.131.45 | Up 6 Hours ago | a3356b168c50 | 3.2.2.1 |
+------------+----------------+----------------+--------------+-------------+
Step-8: Run the below command to check if all RDA Worker services have an ok status and do not throw any failure messages.
2.4.3.5 Upgrade OIA Application Services
Step-1: Run the below commands to initiate upgrading RDAF OIA Application services.
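For example, assuming the rdafk8s app upgrade sub-command:
rdafk8s app upgrade OIA --tag 7.2.2.1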
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each OIA application service is in Terminating state. (Note: Please wait if a POD is in ContainerCreating state until it has transitioned into Terminating state.)
Step-3: Run the below command to put all Terminating OIA application service PODs into maintenance mode. It will list all of the POD Ids of OIA application services along with the rdac maintenance commands that are required to put them in maintenance mode.
Step-4: Copy & paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the OIA application services.
Warning
Wait for 120 seconds before executing Step-6.
Step-6: Run the below command to delete the Terminating OIA application service PODs
for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Repeat the above steps from Step-2 to Step-6 for the rest of the OIA application service PODs.
Please wait till all of the new OIA application service PODs are in Running state and run the below command to verify their status and make sure they are running with the 7.2.2.1 version.
+-------------------------------+----------------+-----------------+--------------+-----------------+
| Name | Host | Status | Container Id | Tag |
+-------------------------------+----------------+-----------------+--------------+-----------------+
| rda-alert-ingester | 192.168.131.50 | Up 1 Days ago | a400c11be238 | 7.2.2.1 |
| rda-alert-ingester | 192.168.131.49 | Up 1 Days ago | 5187d5a093a5 | 7.2.2.1 |
| rda-alert-processor | 192.168.131.46 | Up 1 Days ago | 34901aba5e7d | 7.2.2.1 |
| rda-alert-processor | 192.168.131.47 | Up 1 Days ago | e6fe0aa7ffe4 | 7.2.2.1 |
| rda-alert-processor-companion | 192.168.131.50 | Up 1 Days ago | 8e3cc2f3b252 | 7.2.2.1 |
| rda-alert-processor-companion | 192.168.131.49 | Up 1 Days ago | 4237fb52031c | 7.2.2.1 |
| rda-app-controller | 192.168.131.47 | Up 1 Days ago | fbe360d13fa3 | 7.2.2.1 |
| rda-app-controller | 192.168.131.46 | Up 1 Days ago | 8346f5c69e7b | 7.2.2.1 |
+-------------------------------+----------------+-----------------+--------------+-----------------+
Step-7: Run the below command to verify all OIA application services are up and running. Please wait till the cfxdimensions-app-irm_service has leader status under the Site column.
Run the below command to check if all services have an ok status and do not throw any failure messages.
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------|
| App | cfxdimensions-app-collaboration | True | rda-collaborat | ba007878 | | 22:57:58 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | rda-file-brows | bf349af7 | | 23:00:54 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | rda-file-brows | 46c7c2dc | | 22:52:17 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | rda-irm-servic | 34698062 | | 23:00:23 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | rda-irm-servic | b824b35b | *leader* | 22:50:33 | 8 | 31.33 | | |
| App | cfxdimensions-app-notification-service | True | rda-notificati | 73d2c7f9 | | 23:01:23 | 8 | 31.33 | | |
| App | cfxdimensions-app-notification-service | True | rda-notificati | bac009ba | | 22:59:05 | 8 | 31.33 | | |
| App | cfxdimensions-app-resource-manager | True | rda-resource-m | 3e164b71 | | 23:25:24 | 8 | 31.33 | | |
| App | cfxdimensions-app-resource-manager | True | rda-resource-m | dba599c6 | | 23:25:00 | 8 | 31.33 | | |
| App | configuration-service | True | rda-configurat | dd7ec9d9 | | 5:46:22 | 8 | 31.33 | | |
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
2.4.4 Post Installation Steps
- Deploy the latest l1 & l2 bundles. Go to Configuration --> RDA Administration --> Bundles --> Select oia_l1_l2_bundle and click on the deploy action.
- Download the script from the below path to migrate the UI-Icon URL from private to public.
Tip
Running this script is an optional step; perform it only if (1) white labeling customization was done on the Login page with an uploaded image before the version upgrade, and (2) the custom image is no longer showing up on the Login page after the upgrade.
- Copy the above script to the rda_identity platform service container. Run the below command to get the container-id for rda_identity and the host IP on which it is running.
+--------------------------+--------------+------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+--------------------------+--------------+------------+--------------+-------+
| rda_api_server | 192.168.107.40 | Up 7 hours | 6540e670fdb9 | 3.2.2.1 |
| rda_registry | 192.168.107.40 | Up 7 hours | 98bead8d0599 | 3.2.2.1 |
....
| rda_identity | 192.168.107.40 | Up 7 hours | 67940390d61f | 3.2.2.1 |
| rda_fsm | 192.168.107.40 | Up 7 hours | a96870b5f32f | 3.2.2.1 |
| cfx-rda-access-manager | 192.168.107.40 | Up 7 hours | 33362030fe52 | 3.2.2.1 |
| cfx-rda-resource-manager | 192.168.107.40 | Up 7 hours | 54a082b754be | 3.2.2.1 |
| cfx-rda-user-preferences | 192.168.107.40 | Up 7 hours | 829f95d98741 | 3.2.2.1 |
+--------------------------+--------------+------------+--------------+-------+
- Login to the host on which the rda_identity service is running as rdauser using SSH and run the below command to copy the above downloaded script.
- Run the below command to switch into the rda_identity service's container shell.
- Execute the below command to migrate the customer branding (white labelling) changes.
- In this new version (7.2.2.1), the suppression policy added support to read data from a pstream to suppress alerts. As a pre-requisite for this feature to work, the pstream that is going to be used in a suppression policy should be configured with attr_name and its value, using which it can filter the alerts to apply the suppression policy. Additionally, the attributes start_time_utc and end_time_utc should be in ISO datetime format.
- This new version also added a new feature to enrich the incoming alerts using either dataset or pstream or both within each alert's source mapper configuration. Below is a sample configuration for reference on how to use the dataset_enrich and stream_enrich functions within the alert mapper.
Dataset based enrichment:
- name: Dataset name
- condition: CFXQL based condition which can be defined with one or more conditions with AND and OR between each condition. Each condition is evaluated in the specified order and it picks the enrichment value(s) for whichever condition matches.
- enriched_columns: Specify one or more attributes to be selected as enriched attributes on the above condition match. When no attribute is specified, it will pick all of the available attributes.
{
"func": {
"dataset_enrich": {
"name": "nagios-host-group-members",
"condition": "host_name is '$assetName'",
"enriched_columns": "group_id,hostgroup_name"
}
}
}
Pstream based enrichment:
- name: Pstream name
- condition: CFXQL based condition which can be defined with one or more conditions with AND and OR between each condition. Each condition is evaluated in the specified order and it picks the enrichment value(s) for whichever condition matches.
- enriched_columns: Specify one or more attributes to be selected as enriched attributes on the above condition match. When no attribute is specified, it will pick all of the available attributes.
{
"func": {
"stream_enrich": {
"name": "nagios-host-group-members",
"condition": "host_name is '$assetName'",
"enriched_columns": {
"group_id": "stream_id",
"hostgroup_name": "stream_hostgroup"
}
}
}
}
2.5. Upgrade from 7.2.2.1 to 7.2.2.2
RDAF Platform: From 3.2.2.x to 3.2.2.2
OIA (AIOps) Application: From 7.2.2.x to 7.2.2.2
RDAF Deployment rdaf & rdafk8s CLI: From 1.1.9.x to 1.1.9.2
RDAF Client rdac CLI: From 3.2.2.x to 3.2.2.2
2.5.1. Prerequisites
Before proceeding with this upgrade, please make sure and verify the below prerequisites are met.
Kubernetes deployment:
- RDAF Deployment CLI version: 1.1.9.1
- Infra Services tag: 1.0.2, 1.0.2.1 (nats)
- Platform Services and RDA Worker tag: 3.2.2.1
- OIA Application Services tag: 7.2.2.1
- CloudFabrix recommends taking VMware VM snapshots where RDA Fabric infra/platform/applications are deployed
Non-Kubernetes deployment:
- RDAF Deployment CLI version: 1.1.9
- Infra Services tag: 1.0.2, 1.0.2.1 (nats)
- Platform Services and RDA Worker tag: 3.2.2
- OIA Application Services tag: 7.2.2
- CloudFabrix recommends taking VMware VM snapshots where RDA Fabric infra/platform/applications are deployed
Warning
Make sure all of the above pre-requisites are met before proceeding with the upgrade process.
Warning
Kubernetes: Though a Kubernetes based RDA Fabric deployment supports zero downtime upgrades, it is recommended to schedule a maintenance window for upgrading the RDAF Platform and AIOps services to a newer version.
Non-Kubernetes: Upgrading the RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading the RDAF Platform and AIOps services to a newer version.
Important
Please make sure full backup of the RDAF platform system is completed before performing the upgrade.
Kubernetes: Please run the below backup command to take the backup of application data.
Non-Kubernetes: Please run the below backup command to take the backup of application data. Note: Please make sure this backup-dir is mounted across all infra and CLI VMs.
Run the below command on the RDAF Management system and make sure the Kubernetes PODs are NOT in restarting mode (applicable only to Kubernetes environments).
- Verify that the RDAF deployment rdaf CLI version is 1.1.9, or the rdafk8s CLI version is 1.1.9.1, on the VM where the CLI was installed for the docker on-prem registry and for managing Kubernetes or Non-Kubernetes deployments.
- On-premise docker registry service version is 1.0.2
- RDAF Infrastructure services version is 1.0.2 (rda-nats service version is 1.0.2.1 and rda-minio service version is RELEASE.2022-11-11T03-44-20Z)
Run the below command to get RDAF Infra services details
- RDAF Platform services version is 3.2.2.1
Run the below command to get RDAF Platform services details
- RDAF OIA Application services version is 7.2.2.1
Run the below command to get RDAF App services details
Run the below command to get RDAF Infra services details
- RDAF Platform services version is 3.2.2
Run the below command to get RDAF Platform services details
- RDAF OIA Application services version is 7.2.2
Run the below command to get RDAF App services details
RDAF Deployment CLI Upgrade:
Please follow the below given steps.
Note
Upgrade the RDAF Deployment CLI on both the on-premise docker registry VM and the RDAF Platform's management VM if provisioned separately.
Login into the VM where the rdaf & rdafk8s deployment CLI was installed for the docker on-prem registry and for managing Kubernetes or Non-Kubernetes deployments.
- Download the RDAF Deployment CLI's newer version 1.1.9.2 bundle and copy it to the RDAF management VM on which the rdaf & rdafk8s deployment CLI was installed.
- Extract the rdaf CLI software bundle contents.
- Change the directory to the extracted directory.
- Upgrade the rdaf & rdafk8s CLI to version 1.1.9.2.
- Verify the installed rdaf & rdafk8s CLI version is upgraded to 1.1.9.2.
For Non-Kubernetes deployments, stop the RDA Fabric services before proceeding:
- To stop application services, run the below command. Wait until all of the services are stopped.
- To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
- To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
2.5.2. Download the new Docker Images
Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.
Run the below command to verify above mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.
Please make sure 1.0.2.1 image tag is downloaded for the below RDAF Infra services.
- rda-platform-haproxy
Please make sure 3.2.2.2 image tag is downloaded for the below RDAF Platform services.
- rda-client-api-server
- rda-registry
- rda-rda-scheduler
- rda-collector
- rda-stack-mgr
- rda-identity
- rda-fsm
- rda-access-manager
- rda-resource-manager
- rda-user-preferences
- onprem-portal
- onprem-portal-nginx
- rda-worker-all
- onprem-portal-dbinit
- cfxdx-nb-nginx-all
- rda-event-gateway
- rdac
- rdac-full
Please make sure 3.2.2.3 image tag is downloaded for the below RDAF Platform services.
- rda-client-api-server
- rda-worker-all
- cfxdx-nb-nginx-all
Please make sure 7.2.2.2 image tag is downloaded for the below RDAF OIA Application services.
- rda-app-controller
- rda-alert-processor
- rda-file-browser
- rda-smtp-server
- rda-ingestion-tracker
- rda-reports-registry
- rda-ml-config
- rda-event-consumer
- rda-webhook-server
- rda-irm-service
- rda-alert-ingester
- rda-collaboration
- rda-notification-service
- rda-configuration-service
Please make sure 7.2.2.3 image tag is downloaded for the below RDAF OIA Application services.
- rda-irm-service
Downloaded Docker images are stored under the below path.
/opt/rdaf/data/docker/registry/v2
Run the below command to check the disk usage of the filesystem on which the docker images are stored.
Optionally, if required, older image-tags which are no longer used can be deleted to free up disk space using the below command.
2.5.3. Upgrade Steps
2.5.3.1 Upgrade RDAF Infra Services
Download the below python script (rdaf_upgrade_119_119_1_to_119_2.py)
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.9.2/rdaf_upgrade_119_119_1_to_119_2.py
Please run the downloaded python script. It generates a new values.yaml.latest file with new environment variables for the HAProxy infrastructure service and the rda-portal (front-end) platform service.
These environment variables need to be configured with appropriate values when the CFX RDA Fabric portal needs to be integrated and cross-launched from a 3rd party end user UI portal.
Note
The below mentioned environment variables are mandatory; however, their values can be left empty if integration with a 3rd party external UI portal is not required.
- HAProxy environment variables
EXTERNAL_PORTAL_URL: 3rd party UI portal url (ex: https://external-portal.acme.com)
CFX_IP_ADDRESS: RDA Fabric platform's load balancer's virtual IP address (when configured in HA) or the load balancer's IP address to access the UI portal.
- rda-portal (front-end) environment variable
CFX_URL_PREFIX: Specify a custom base URI string which can be used within the 3rd party end user UI portal to redirect requests to the RDA Fabric platform.
Please run the downloaded python upgrade script. Once the script is executed, it will create the /opt/rdaf/deployment-scripts/values.yaml.latest file.
Note
Please take a backup of the /opt/rdaf/deployment-scripts/values.yaml file.
cp /opt/rdaf/deployment-scripts/values.yaml /opt/rdaf/deployment-scripts/values.yaml.backup
Edit /opt/rdaf/deployment-scripts/values.yaml and apply the below changes for the haproxy and rda_portal services.
Under the haproxy service configuration, set the environment variable EXTERNAL_PORTAL_URL with the external portal URL. Note: https://external-portal.acme.com is used for reference only. Also, set the environment variable CFX_IP_ADDRESS with the RDA Fabric load-balancer's IP address (non-HA configuration) or the virtual IP address when configured in HA.
haproxy:
mem_limit: 2G
memswap_limit: 2G
environment:
EXTERNAL_PORTAL_URL: "https://external-portal.acme.com"
CFX_IP_ADDRESS: "<rda-fabric-ui-portal-ip>"
Under the rda_portal service configuration, set the environment variable CFX_URL_PREFIX with a custom URI string as shown below. Note: aiops is used for reference only. When configured, all requests which hit the https://external-portal.acme.com/aiops URI path on the 3rd party UI portal are forwarded to the RDA Fabric platform and vice-versa.
rda_portal:
...
...
portal_frontend:
resources:
requests:
memory: 100Mi
limits:
memory: 2Gi
env:
CFX_URL_PREFIX: "aiops"
Configure the environment variables with empty values when 3rd party external portal integration is NOT needed.
rda_portal:
...
...
portal_frontend:
resources:
requests:
memory: 100Mi
limits:
memory: 2Gi
env:
CFX_URL_PREFIX: ""
- Upgrade the HAProxy service using the below command
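For example, assuming the infra upgrade sub-command scoped to haproxy:
rdafk8s infra upgrade --tag 1.0.2.1 --service haproxy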
Run the below RDAF command to check infra status
+----------------+----------------+-------------+--------------+---------+
| Name           | Host           | Status      | Container Id | Tag     |
+----------------+----------------+-------------+--------------+---------+
| haproxy        | 192.168.131.41 | Up 25 hours | 21ce252eec14 | 1.0.2.1 |
| haproxy        | 192.168.131.42 | Up 25 hours | 329a6aa40e40 | 1.0.2.1 |
| keepalived     | 192.168.131.41 | active      | N/A          | N/A     |
| keepalived     | 192.168.131.42 | active      | N/A          | N/A     |
| nats           | 192.168.131.41 | Up 2 months | 7b7a15f7d742 | 1.0.2.1 |
| nats           | 192.168.131.42 | Up 2 months | a92cd1df2cbf | 1.0.2.1 |
+----------------+----------------+-------------+--------------+---------+
Before initiating the upgrade steps, RDA Fabric platform, worker and application services need to be stopped.
- To stop OIA application services, run the below command. Wait until all of the services are stopped.
- To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
- To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
- Upgrade HAProxy using the below command
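For example, assuming the same sub-command in the non-Kubernetes CLI:
rdaf infra upgrade --tag 1.0.2.1 --service haproxy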
Run the below RDAF command to check infra status
+----------------+----------------+-------------+--------------+---------+
| Name | Host | Status | Container Id | Tag |
+----------------+----------------+-------------+--------------+---------+
| haproxy | 192.168.107.63 | Up 25 hours | 21ce252eec14 | 1.0.2.1 |
| haproxy | 192.168.107.64 | Up 25 hours | 329a6aa40e40 | 1.0.2.1 |
| keepalived | 192.168.107.63 | active | N/A | N/A |
| keepalived | 192.168.107.64 | active | N/A | N/A |
| nats | 192.168.107.63 | Up 2 months | 7b7a15f7d742 | 1.0.2.1 |
| nats | 192.168.107.64 | Up 2 months | a92cd1df2cbf | 1.0.2.1 |
+----------------+----------------+-------------+--------------+---------+
Run the below RDAF command to check infra healthcheck status
+----------------+-----------------+--------+----------------------+--------------+-----------------+
| Name | Check | Status | Reason | Host | Container Id |
+----------------+-----------------+--------+----------------------+--------------+-----------------+
| haproxy | Port Connection | OK | N/A | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy | Service Status | OK | N/A | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy | Firewall Port | OK | N/A | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy | Port Connection | OK | N/A | 192.168.107.64 | 91c361ea0f58 |
| haproxy | Service Status | OK | N/A | 192.168.107.64 | 91c361ea0f58 |
| haproxy | Firewall Port | OK | N/A | 192.168.107.64 | 91c361ea0f58 |
| keepalived | Service Status | OK | N/A | 192.168.107.63 | N/A |
| keepalived | Service Status | OK | N/A | 192.168.107.64 | N/A |
| nats | Port Connection | OK | N/A | 192.168.107.63 | f57ed825681b |
| nats | Service Status | OK | N/A | 192.168.107.63 | f57ed825681b |
+----------------+-----------------+--------+----------------------+--------------+-----------------+
2.5.3.2 Upgrade RDAF Platform Services
Step-1: Run the below command to initiate upgrading RDAF Platform services.
As the upgrade procedure is a non-disruptive upgrade, it puts the currently running PODs into Terminating state and newer version PODs into Pending state.
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each Platform service is in Terminating state.
Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It will list all of the POD Ids of platform services along with the rdac maintenance command that is required to put them in maintenance mode.
Note
If maint_command.py script doesn't exist on RDAF deployment CLI VM, it can be downloaded using the below command.
Step-4: Copy & paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.
Step-6: Run the below command to delete the Terminating RDAF platform service PODs
for i in `kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Platform service PODs.
Please wait till all of the new platform service PODs are in Running state and run the below command to verify their status and make sure all of them are running with the 3.2.2.2 version.
+----------------------+----------------+----------------+--------------+---------+
| Name | Host | Status | Container Id | Tag |
+----------------------+----------------+----------------+--------------+---------+
| rda-api-server | 192.168.131.44 | Up 1 Hours ago | f97c1658a0b7 | 3.2.2.2 |
| rda-api-server | 192.168.131.44 | Up 1 Days ago | 99cc29596560 | 3.2.2.2 |
| rda-registry | 192.168.131.44 | Up 1 Days ago | ee2d72396575 | 3.2.2.2 |
| rda-registry | 192.168.131.44 | Up 2 Hours ago | 95c36fc91800 | 3.2.2.2 |
| rda-identity | 192.168.131.44 | Up 1 Days ago | 3d6aeb4c6c53 | 3.2.2.2 |
| rda-identity | 192.168.131.44 | Up 2 Hours ago | 9303f3d0e7ed | 3.2.2.2 |
| rda-fsm | 192.168.131.44 | Up 2 Hours ago | 342cbfe89b78 | 3.2.2.2 |
| rda-fsm | 192.168.131.44 | Up 1 Days ago | 5e77c12fc920 | 3.2.2.2 |
| rda-access-manager | 192.168.131.44 | Up 2 Hours ago | b218a44f022c | 3.2.2.2 |
| rda-access-manager | 192.168.131.44 | Up 1 Days ago | 70ed48e783b9 | 3.2.2.2 |
+----------------------+----------------+----------------+--------------+---------+
Run the below command to check the rda-fsm service is up and running and also verify that one of the rda-scheduler services is elected as a leader under the Site column.
+-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------|
| Infra | api-server | True | 40d242cf70f5 | 6f7ecfe2 | | 2 days, 7:40:27 | 8 | 31.33 | | |
| Infra | api-server | True | 9145166d798b | 6114b271 | | 2 days, 7:40:52 | 8 | 31.33 | | |
| Infra | collector | True | a450b3da5188 | 1a86bf07 | | 2 days, 7:39:59 | 8 | 31.33 | | |
| Infra | collector | True | 82ccb77d84e7 | 46c83c44 | | 2 days, 7:39:44 | 8 | 31.33 | | |
| Infra | registry | True | c93e2eff7c37 | 30ad85d6 | | 2 days, 7:40:32 | 8 | 31.33 | | |
| Infra | registry | True | 44d01548a49c | 0bb96897 | | 2 days, 7:40:26 | 8 | 31.33 | | |
| Infra | scheduler | True | 159d453aad50 | 2cb4831c | *leader* | 2 days, 7:40:20 | 8 | 31.33 | | |
| Infra | scheduler | True | 0682962441e4 | d6b1fb3b | | 2 days, 7:40:12 | 8 | 31.33 | | |
+-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+
Run the below command to check that all services have an ok status and do not report any failure messages.
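This check uses the RDAC CLI's healthcheck, for example:
# Report per-service health parameters; every row should show ok under Status
rdac healthcheck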
Warning
For Non-Kubernetes deployment, upgrading RDAF Platform and AIOps application services is a disruptive operation. Please schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.
Run the below command to initiate upgrading the RDAF Platform services.
Please wait till all of the new platform services are in the Up state, then run the below command to verify their status and make sure all of them are running with the 3.2.2.2 version.
+--------------------------+----------------+---------------+--------------+---------+
| Name | Host | Status | Container Id | Tag |
+--------------------------+----------------+---------------+--------------+---------+
| rda_api_server | 192.168.107.61 | Up 58 minutes | 9145166d798b | 3.2.2.2 |
| rda_api_server | 192.168.107.62 | Up 57 minutes | 40d242cf70f5 | 3.2.2.2 |
| rda_registry | 192.168.107.61 | Up 57 minutes | c93e2eff7c37 | 3.2.2.2 |
| rda_registry | 192.168.107.62 | Up 57 minutes | 44d01548a49c | 3.2.2.2 |
| rda_scheduler | 192.168.107.61 | Up 57 minutes | 159d453aad50 | 3.2.2.2 |
| rda_scheduler | 192.168.107.62 | Up 57 minutes | 0682962441e4 | 3.2.2.2 |
| rda_collector | 192.168.107.61 | Up 56 minutes | a450b3da5188 | 3.2.2.2 |
| rda_collector | 192.168.107.62 | Up 56 minutes | 82ccb77d84e7 | 3.2.2.2 |
+--------------------------+----------------+---------------+--------------+---------+
Run the below command to check that the rda-fsm service is up and running, and verify that one of the rda-scheduler services is elected as leader under the Site column.
Run the below command to check that all services have an ok status and do not report any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | service-status | ok | |
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | service-initialization-status | ok | |
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | kafka-connectivity | ok | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=3, Brokers=[2, 3, 1] |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | service-status | ok | |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | service-initialization-status | ok | |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | kafka-connectivity | ok | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=1, Brokers=[2, 3, 1] |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.5.3.3 Upgrade rdac CLI
2.5.3.4 Upgrade RDA Worker Services
Step-1: Please run the below command to initiate upgrading the RDA Worker service PODs.
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each RDA Worker service POD is in the Terminating state.
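As with the platform services, the worker PODs carry the app_component=rda-worker label used by the Step-6 cleanup loop below, so their rollover can be watched with a plain kubectl listing such as:
# List RDA worker PODs and their current state
kubectl get pods -n rda-fabric -l app_component=rda-worker -o wide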
Step-3: Run the below command to put all Terminating RDAF worker service PODs into maintenance mode. It lists the POD Ids of the RDA worker services along with the rdac maintenance command required to put them into maintenance mode.
Step-4: Copy & paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF worker services.
Step-6: Run the below command to delete the Terminating RDAF worker service PODs.
for i in `kubectl get pods -n rda-fabric -l app_component=rda-worker | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait for 120 seconds between each RDAF worker service upgrade, repeating Step-2 to Step-6 for the rest of the RDAF worker service PODs.
Step-7: Please wait for 120 seconds to let the newer version of the RDA Worker service PODs join the RDA Fabric. Run the below commands to verify the status of the newer RDA Worker service PODs.
+------------+----------------+---------------+--------------+-----------+
| Name | Host | Status | Container Id | Tag |
+------------+----------------+---------------+--------------+-----------+
| rda-worker | 192.168.131.49 | Up 2 Days ago | 7f5cc2a6ff82 | 3.2.2.3 |
| rda-worker | 192.168.131.50 | Up 2 Days ago | 17e06d02128d | 3.2.2.3 |
+------------+----------------+---------------+--------------+-----------+
Step-8: Run the below command to check that all RDA Worker services have an ok status and do not report any failure messages.
- Upgrade RDA Worker Services
Please run the below command to initiate upgrading the RDA Worker service PODs.
Please wait for 120 seconds to let the newer version of RDA Worker service containers join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service containers.
+------------+----------------+------------+--------------+---------+
| Name | Host | Status | Container Id | Tag |
+------------+----------------+------------+--------------+---------+
| rda_worker | 192.168.107.61 | Up 2 hours | aa8319a88bc1 | 3.2.2.3 |
| rda_worker | 192.168.107.62 | Up 2 hours | 56e78986283f | 3.2.2.3 |
+------------+----------------+------------+--------------+---------+
2.5.3.5 Upgrade OIA Application Services
Step-1: Run the below commands to initiate upgrading the RDAF OIA Application services.
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each OIA application service is in the Terminating state.
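The OIA application PODs carry the app_name=oia label used by the Step-6 cleanup loop below, so their rollover can be watched with a plain kubectl listing such as:
# List OIA application PODs and their current state
kubectl get pods -n rda-fabric -l app_name=oia -o wide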
Step-3: Run the below command to put all Terminating OIA application service PODs into maintenance mode. It lists the POD Ids of the OIA application services along with the rdac maintenance command required to put them into maintenance mode.
Step-4: Copy & paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the OIA application services.
Step-6: Run the below command to delete the Terminating OIA application service PODs.
for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait for 120 seconds and repeat Step-2 to Step-6 for the rest of the OIA application service PODs.
Please wait till all of the new OIA application service PODs are in the Running state, then run the below command to verify their status and make sure they are running with the 7.2.2.2 version.
+-------------------------------+----------------+-----------------+--------------+-----------+
| Name | Host | Status | Container Id | Tag |
+-------------------------------+----------------+-----------------+--------------+-----------+
| rda-alert-ingester | 192.168.131.46 | Up 1 Days ago | f546428c2a1a | 7.2.2.2 |
| rda-alert-ingester | 192.168.131.46 | Up 1 Days ago | 88a68aa40a9a | 7.2.2.2 |
| rda-alert-processor | 192.168.131.46 | Up 1 Days ago | 5d958ce95d4c | 7.2.2.2 |
| rda-alert-processor | 192.168.131.46 | Up 1 Days ago | cddbfed7dbba | 7.2.2.2 |
| rda-alert-processor-companion | 192.168.131.46 | Up 1 Days ago | 127cd9e895a1 | 7.2.2.2 |
| rda-alert-processor-companion | 192.168.131.46 | Up 1 Days ago | 1ac3ae88d16f | 7.2.2.2 |
| rda-app-controller | 192.168.131.46 | Up 1 Days ago | cf7d126099a6 | 7.2.2.2 |
| rda-app-controller | 192.168.131.46 | Up 1 Days ago | fcd5bb29c429 | 7.2.2.2 |
| rda-collaboration | 192.168.131.46 | Up 1 Days ago | 9c3243fb3094 | 7.2.2.2 |
+-------------------------------+----------------+-----------------+--------------+-----------+
Step-7: Run the below command to verify all OIA application services are up and running. Please wait till the cfxdimensions-app-irm_service has leader status under the Site column.
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------|
| App | cfxdimensions-app-access-manager | True | 3a164c761ac7 | 6f02493c | | 2 days, 7:38:22 | 8 | 31.33 | | |
| App | cfxdimensions-app-access-manager | True | d56b629c2c3b | e5ff5696 | | 2 days, 7:38:05 | 8 | 31.33 | | |
| App | cfxdimensions-app-collaboration | True | 8aafda236efe | 126203ec | | 2 days, 7:11:18 | 8 | 31.33 | | |
| App | cfxdimensions-app-collaboration | True | 3ea382fdc6af | 618a650b | | 2 days, 7:10:58 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | d6f0d127ab06 | deb9c0c4 | | 2 days, 7:17:45 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | 2b9851b95094 | 013f5b00 | | 2 days, 7:17:25 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | 8361c0008d18 | a9fe343e | *leader* | 2 days, 7:12:36 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | ca8a2cbdca81 | 8f497bb7 | | 2 days, 7:12:14 | 8 | 31.33 | | |
| App | cfxdimensions-app-notification-service | True | dfbbcdcddafc | 8d0425ec | | 2 days, 7:18:24 | 8 | 31.33 | | |
| App | cfxdimensions-app-notification-service | True | 753472f0a9be | 485800b5 | | 2 days, 7:18:06 | 8 | 31.33 | | |
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
Run the below command to check that all services have an ok status and do not report any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | rda-alert-in | 47518623 | | service-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 47518623 | | minio-connectivity | ok | |
| rda_app | alert-ingester | rda-alert-in | 47518623 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | rda-alert-in | 47518623 | | service-initialization-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 47518623 | | kafka-connectivity | ok | Cluster=nzyeX9qkR-ChWXC0fRvSyQ, Broker=1, Brokers=[0, 2, 1] |
| rda_app | alert-ingester | rda-alert-in | 82bcaa7c | | service-status | ok | |
| rda_app   | alert-ingester                         | rda-alert-in | 82bcaa7c |             | minio-connectivity                                  | ok       |                                                             |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
Upgrade rda-irm-service to 7.2.2.3:
Step-1: Run the below commands to initiate upgrading the rda-irm-service.
Step-2: Run the below command to check the status of the existing rda-irm-service PODs and make sure at least one instance of the rda-irm-service is in the Terminating state.
Step-3: Run the below command to put the rda-irm-service PODs that are in the Terminating state into maintenance mode. It lists the rda-irm-service POD Ids along with the rdac maintenance command required to put them into maintenance mode.
Step-4: Copy & paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the rda-irm-service.
Step-6: Run the below command to delete the Terminating rda-irm-service PODs.
for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait for 120 seconds and repeat Step-2 to Step-6 for the rest of the rda-irm-service PODs.
Please wait till all of the new rda-irm-service PODs are in the Running state, then run the below command to verify their status and make sure they are running with the 7.2.2.3 version.
+-------------------------------+----------------+-----------------+--------------+-----------+
| Name | Host | Status | Container Id | Tag |
+-------------------------------+----------------+-----------------+--------------+-----------+
| rda-irm-service | 192.168.131.46 | Up 1 Days ago | f546428c2a1a | 7.2.2.3 |
| rda-irm-service | 192.168.131.46 | Up 1 Days ago | 88a68aa40a9a | 7.2.2.3 |
| rda-alert-processor | 192.168.131.46 | Up 1 Days ago | 5d958ce95d4c | 7.2.2.2 |
| rda-alert-processor | 192.168.131.46 | Up 1 Days ago | cddbfed7dbba | 7.2.2.2 |
+-------------------------------+----------------+-----------------+--------------+-----------+
Step-7: Run the below command to verify all rda-irm-service PODs are up and running. Please wait till the cfxdimensions-app-irm_service has leader status under the Site column.
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------|
| App | cfxdimensions-app-access-manager | True | 3a164c761ac7 | 6f02493c | | 2 days, 7:38:22 | 8 | 31.33 | | |
| App | cfxdimensions-app-access-manager | True | d56b629c2c3b | e5ff5696 | | 2 days, 7:38:05 | 8 | 31.33 | | |
| App | cfxdimensions-app-collaboration | True | 8aafda236efe | 126203ec | | 2 days, 7:11:18 | 8 | 31.33 | | |
| App | cfxdimensions-app-collaboration | True | 3ea382fdc6af | 618a650b | | 2 days, 7:10:58 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | d6f0d127ab06 | deb9c0c4 | | 2 days, 7:17:45 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | 2b9851b95094 | 013f5b00 | | 2 days, 7:17:25 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | 8361c0008d18 | a9fe343e | *leader* | 2 days, 7:12:36 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | ca8a2cbdca81 | 8f497bb7 | | 2 days, 7:12:14 | 8 | 31.33 | | |
| App | cfxdimensions-app-notification-service | True | dfbbcdcddafc | 8d0425ec | | 2 days, 7:18:24 | 8 | 31.33 | | |
| App | cfxdimensions-app-notification-service | True | 753472f0a9be | 485800b5 | | 2 days, 7:18:06 | 8 | 31.33 | | |
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
Run the below command to check that all services have an ok status and do not report any failure messages.
Run the below commands to initiate upgrading the RDA Fabric OIA Application services.
Please wait till all of the new OIA application service containers are in the Up state, then run the below command to verify their status and make sure they are running with the 7.2.2.2 version.
+-------------------------------+----------------+-----------------+--------------+-----------+
| Name | Host | Status | Container Id | Tag |
+-------------------------------+----------------+-----------------+--------------+-----------+
| rda-alert-ingester | 192.168.131.46 | Up 1 Days ago | f546428c2a1a | 7.2.2.2 |
| rda-alert-ingester | 192.168.131.46 | Up 1 Days ago | 88a68aa40a9a | 7.2.2.2 |
| rda-alert-processor | 192.168.131.46 | Up 1 Days ago | 5d958ce95d4c | 7.2.2.2 |
| rda-alert-processor | 192.168.131.46 | Up 1 Days ago | cddbfed7dbba | 7.2.2.2 |
| rda-alert-processor-companion | 192.168.131.46 | Up 1 Days ago | 127cd9e895a1 | 7.2.2.2 |
| rda-alert-processor-companion | 192.168.131.46 | Up 1 Days ago | 1ac3ae88d16f | 7.2.2.2 |
| rda-app-controller | 192.168.131.46 | Up 1 Days ago | cf7d126099a6 | 7.2.2.2 |
| rda-app-controller | 192.168.131.46 | Up 1 Days ago | fcd5bb29c429 | 7.2.2.2 |
| rda-collaboration | 192.168.131.46 | Up 1 Days ago | 9c3243fb3094 | 7.2.2.2 |
+-------------------------------+----------------+-----------------+--------------+-----------+
Upgrade rda-irm-service to 7.2.2.3:
Run the below commands to initiate upgrading the rda-irm-service to the 7.2.2.3 version.
Please wait till all of the rda-irm-service containers are in the Up state, then run the below command to verify their status and make sure they are running with the 7.2.2.3 version.
2.5.4. Post Upgrade Steps
2.5.4.1 OIA
1. Deploy the latest L1 & L2 bundles. Go to Configuration --> RDA Administration --> Bundles --> select oia_l1_l2_bundle and click the deploy action.
2. Enable ML experiments manually if any experiments are configured: Organization --> Configuration --> ML Experiments.
3. By default, resizableColumns: false is set for the alerts and incidents tabular reports. If you want resizable columns for the alerts and incidents tabular reports, set it to true. Go to Configuration -> RDA Administration -> User Dashboards, then search for the below dashboards:
a) oia-alert-group-view-alerts-os
b) oia-alert-group-view-details-os
c) oia-alert-groups-os
d) oia-alert-tracking-os
e) oia-alerts-os
f) oia-event-tracking-os
g) oia-event-tracking-view-alerts
h) oia-incident-alerts-os
i) oia-view-alerts-policy
j) oia-view-groups-policy
k) incident-collaboration
l) oia-incidents-os-template
m) oia-incidents-os
n) oia-incidents
o) oia-my-incidents
2.5.4.2 DNAC
1. Make sure Prime credentials are added under Configuration → RDA Integrations → Credentials.
Note
Make sure the credential names match the bot names specified below in Point No. 4.
2. Deploy the latest dna_center_bundle from Configuration → RDA Integrations → Bundles → click the row-level deploy action for dna_center_bundle.
3. Run the dnac_create_pstreams pipeline from Configuration → RDA Integrations → Pipelines → Published Pipelines: search for dnac_create_pstreams and click Run in the action menu.
4. In the same Published Pipelines view, search for prime_clients_report, click Edit Pipeline in Plain Text, uncomment the lines as shown below, change the version of the pipeline, check the publish pipeline checkbox, and click Save.
5. Download the latest DNAC template from the below link to the platform VM (where rdac is installed) and execute the command given below.
rdac object add --name "dynamic_dnac_template.html" --folder widget_labels --file /tmp/dynamic_dnac_template.html
2.6. Upgrade from 7.2.2.2 to 7.3
RDAF Platform: From 3.2.2.2 to 3.3
OIA (AIOps) Application: From 7.2.2.2 to 7.3
RDAF Deployment rdaf & rdafk8s CLI: From 1.1.9.2 to 1.1.10
RDAF Client rdac CLI: From 3.2.2.2 to 3.3
2.6.1. Prerequisites
Before proceeding with this upgrade, please verify that the below prerequisites are met.
- RDAF Deployment CLI version: 1.1.9.2
- Infra Services tag: 1.0.2, 1.0.2.1 (nats, haproxy)
- Platform Services and RDA Worker tag: 3.2.2.2/3.2.2.3
- OIA Application Services tag: 7.2.2.2
- CloudFabrix recommends taking VMware VM snapshots of the VMs where RDA Fabric infra/platform/application services are deployed
For FSM Pre-Upgrade & Post-Upgrade steps Click Here
Warning
Make sure all of the above pre-requisites are met before proceeding with the upgrade process.
Warning
Kubernetes: Though Kubernetes based RDA Fabric deployment supports zero downtime upgrade, it is recommended to schedule a maintenance window for upgrading RDAF Platform and AIOps services to newer version.
Non-Kubernetes: Upgrading RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.
Important
Please make sure full backup of the RDAF platform system is completed before performing the upgrade.
Kubernetes: Please run the below backup command to take the backup of application data.
Non-Kubernetes: Please run the below backup command to take the backup of application data. Note: please make sure the backup directory is mounted across all infra and CLI VMs.
Run the below command on the RDAF Management system and make sure the Kubernetes PODs are NOT in restarting mode (applicable to Kubernetes environments only).
- Verify that the RDAF deployment rdaf CLI version is 1.1.9.2, or that the rdafk8s CLI version is 1.1.9.2, on the VM where the CLI was installed for the docker on-prem registry and for managing Kubernetes or Non-Kubernetes deployments.
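A minimal version check, assuming the deployment CLI exposes a --version flag (if your release does not, consult the CLI's help output to confirm the installed version):
# Confirm the installed deployment CLI version is 1.1.9.2 before upgrading
rdaf --version
rdafk8s --version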
- On-premise docker registry service version is 1.0.2
- RDAF Infrastructure services version is 1.0.2 (rda-nats service version is 1.0.2.1 and rda-minio service version is RELEASE.2022-11-11T03-44-20Z)
Run the below command to get RDAF Infra services details
- RDAF Platform services version is 3.2.2.2 / 3.2.2.3
Run the below command to get RDAF Platform services details
- RDAF OIA Application services version is 7.2.2.2
Run the below command to get RDAF App services details
RDAF Deployment CLI Upgrade:
Please follow the below given steps.
Note
Upgrade the RDAF Deployment CLI on both the on-premise docker registry VM and the RDAF Platform's management VM, if provisioned separately.
Log in to the VM where the rdaf & rdafk8s deployment CLI was installed for the docker on-prem registry and for managing Kubernetes or Non-Kubernetes deployments.
- Download the RDAF Deployment CLI's newer version 1.1.10 bundle.
- Upgrade the rdaf & rdafk8s CLI to version 1.1.10.
- Verify the installed rdaf & rdafk8s CLI version is upgraded to 1.1.10.
- Download the RDAF Deployment CLI's newer version 1.1.10 bundle and copy it to the RDAF management VM on which the rdaf & rdafk8s deployment CLI was installed.
- Extract the rdaf CLI software bundle contents.
- Change the directory to the extracted directory.
- Upgrade the rdaf CLI to version 1.1.10.
- Verify the installed rdaf CLI version.
- Download the RDAF Deployment CLI's newer version 1.1.10 bundle.
- Upgrade the rdaf CLI to version 1.1.10.
- Verify the installed rdaf CLI version is upgraded to 1.1.10.
- To stop application services, run the below command. Wait until all of the services are stopped.
- To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
- To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
- Download the RDAF Deployment CLI's newer version 1.1.10 bundle and copy it to the RDAF management VM on which the rdaf & rdafk8s deployment CLI was installed.
- Extract the rdaf CLI software bundle contents.
- Change the directory to the extracted directory.
- Upgrade the rdaf CLI to version 1.1.10.
- Verify the installed rdaf CLI version.
2.6.2. Download the new Docker Images
Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.
Note
The Neo4j GraphDB service is optional; please skip this step if it is not needed.
Run the below command to verify that the above-mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.
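Independently of the deployment CLI, the downloaded tags can also be cross-checked against the on-premise registry's standard Docker Registry HTTP API. A sketch, where <registry-host:port> is a placeholder for your on-premise docker registry address (use http:// if TLS is not enabled):
# List repositories held by the on-prem registry, then the tags of one image
curl -s https://<registry-host:port>/v2/_catalog
curl -s https://<registry-host:port>/v2/rda-client-api-server/tags/list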
Please make sure 3.3 image tag is downloaded for the below RDAF Platform services.
- rda-client-api-server
- rda-registry
- rda-rda-scheduler
- rda-collector
- rda-identity
- rda-fsm
- rda-access-manager
- rda-resource-manager
- rda-user-preferences
- onprem-portal
- onprem-portal-nginx
- rda-worker-all
- onprem-portal-dbinit
- cfxdx-nb-nginx-all
- rda-event-gateway
- rda-chat-helper
- rdac
- rdac-full
Please make sure 7.3 image tag is downloaded for the below RDAF OIA Application services.
- rda-app-controller
- rda-alert-processor
- rda-file-browser
- rda-smtp-server
- rda-ingestion-tracker
- rda-reports-registry
- rda-ml-config
- rda-event-consumer
- rda-webhook-server
- rda-irm-service
- rda-alert-ingester
- rda-collaboration
- rda-notification-service
- rda-configuration-service
- rda-alert-processor-companion
Please make sure 7.3.0.1 image tag is downloaded for the below RDAF OIA Application services.
- rda-event-consumer
Please make sure 7.3.2 image tag is downloaded for the below RDAF OIA Application services.
- rda-alert-ingester
Downloaded Docker images are stored under the below path.
/opt/rdaf/data/docker/registry/v2
Run the below command to check the disk usage of the filesystem on which the docker images are stored.
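For example, using the registry storage path mentioned above:
# Show free/used space on the filesystem backing the docker registry
df -h /opt/rdaf/data/docker/registry/v2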
Optionally, older image tags that are no longer used can be deleted to free up disk space using the below command.
2.6.3.Upgrade Steps
2.6.3.1 Upgrade RDAF Infra Services
The RDA Fabric platform introduced support for a GraphDB service in the 3.3 release. It is an optional service and can be skipped during the upgrade process.
Download the python script (rdaf_upgrade_1192_1110_without_graphdb.py) if the GraphDB service is NOT going to be installed.
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.10/rdaf_upgrade_1192_1110_without_graphdb.py
Please run the downloaded python upgrade script.
It generates a new values.yaml.latest file with new environment variables for the rda_scheduler infrastructure service.
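For example, assuming python3 is available on the deployment CLI VM:
# Run the upgrade helper script downloaded above
python3 rdaf_upgrade_1192_1110_without_graphdb.py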
Tip
Please skip the below step if GraphDB service is NOT going to be installed.
Warning
For installing the neo4j GraphDB service, please add an additional disk to the RDA Fabric Infrastructure VM (Clicking Here). This is a pre-requisite and needs to be completed before installing the neo4j GraphDB service.
Download the python script (rdaf_upgrade_1192_1110.py) if the GraphDB service is going to be installed.
Please run the downloaded python upgrade script.
It generates a new values.yaml.latest file with new environment variables for the rda_scheduler infrastructure service, and the /opt/rdaf/config/network_config/config.json file is appended with the neo4j GraphDB infra service.
Once the above python script (with or without GraphDB configuration) is executed, it creates the /opt/rdaf/deployment-scripts/values.yaml.latest file.
Note
Please take a backup of the /opt/rdaf/deployment-scripts/values.yaml file.
cp /opt/rdaf/deployment-scripts/values.yaml /opt/rdaf/deployment-scripts/values.yaml.backup
Edit /opt/rdaf/deployment-scripts/values.yaml and apply the below changes for the rda_scheduler service.
Under the rda_scheduler service configuration, set the below environment variables.
Note
When integrating CFX RDA Fabric portal with GitHub, configure the following environment variables with appropriate values. However, these variables can be left empty if integration with GitHub is NOT required.
RDA_GIT_ACCESS_TOKEN: ''
RDA_GIT_URL: ''
RDA_GITHUB_ORG: ''
RDA_GITHUB_REPO: ''
RDA_GITHUB_BRANCH_PREFIX: ''
Note
For reference, please see the configuration of the rda_scheduler service mentioned below.
rda_scheduler:
  mem_limit: 2G
  memswap_limit: 2G
  privileged: false
  environment:
    RDA_GIT_ACCESS_TOKEN: "<your-github-access-token>"
    RDA_GIT_URL: "https://api.github.com"
    RDA_GITHUB_ORG: "Organization Name"
    RDA_GITHUB_REPO: "test-playground"
    RDA_GITHUB_BRANCH_PREFIX: "main"
    RDA_ENABLE_TRACES: "no"
    DISABLE_REMOTE_LOGGING_CONTROL: "no"
    RDA_SELF_HEALTH_RESTART_AFTER_FAILURES: 3
Tip
- Please skip the below step of installing the neo4j GraphDB service if it is not needed.
- Please use the below-mentioned command and wait till all of the neo4j pods are in the Running state.
Run the below RDAF command to check infra status
+----------------+----------------+-------------+--------------+---------+
| Name | Host | Status | Container Id | Tag |
+----------------+----------------+-------------+--------------+---------+
| haproxy | 192.168.131.41 | Up 25 hours | 21ce252eec14 | 1.0.2.1 |
| haproxy | 192.168.131.42 | Up 25 hours | 329a6aa40e40 | 1.0.2.1 |
| keepalived | 192.168.131.41 | active | N/A | N/A |
| keepalived | 192.168.131.42 | active | N/A | N/A |
| nats | 192.168.131.41 | Up 2 months | 7b7a15f7d742 | 1.0.2.1 |
| nats | 192.168.131.42 | Up 2 months | a92cd1df2cbf | 1.0.2.1 |
| rda-neo4j | 192.168.109.65 | Up 23 Hours | 7e533c138867 | 5.11.0 |
+----------------+----------------+-------------+--------------+---------+
Tip
- Please skip the below step of installing the neo4j GraphDB service if it is not needed.
- Install the neo4j service using the below command.
Run the below RDAF command to check infra status
+----------------+----------------+-------------+--------------+---------+
| Name | Host | Status | Container Id | Tag |
+----------------+----------------+-------------+--------------+---------+
| haproxy | 192.168.107.63 | Up 25 hours | 21ce252eec14 | 1.0.2.1 |
| haproxy | 192.168.107.64 | Up 25 hours | 329a6aa40e40 | 1.0.2.1 |
| keepalived | 192.168.107.63 | active | N/A | N/A |
| keepalived | 192.168.107.64 | active | N/A | N/A |
| nats | 192.168.107.63 | Up 2 months | 7b7a15f7d742 | 1.0.2.1 |
| nats | 192.168.107.64 | Up 2 months | a92cd1df2cbf | 1.0.2.1 |
| neo4j | 192.168.107.63 | Up 42 hours | ee7e26cecb82 | 5.11.0 |
+----------------+----------------+-------------+--------------+---------+
Run the below RDAF command to check infra healthcheck status
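A minimal sketch, assuming the deployment CLI's infra healthcheck subcommand (syntax may vary slightly between rdaf and rdafk8s deployments):
# Report Port Connection / Service Status / Firewall Port checks per infra service
rdaf infra healthcheck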
+----------------+-----------------+--------+-----------------+----------------+--------------+
| Name | Check | Status | Reason | Host | Container Id |
+----------------+-----------------+--------+-----------------+----------------+--------------+
| haproxy | Port Connection | OK | N/A | 192.168.107.63 | 21ce252eec14 |
| haproxy | Service Status | OK | N/A | 192.168.107.63 | 21ce252eec14 |
| haproxy | Firewall Port | OK | N/A | 192.168.107.64 | 329a6aa40e40 |
| keepalived | Service Status | OK | N/A | 192.168.107.63 | N/A |
| keepalived | Service Status | OK | N/A | 192.168.107.64 | N/A |
| nats | Port Connection | OK | N/A | 192.168.107.63 | 7b7a15f7d742 |
| nats | Service Status | OK | N/A | 192.168.107.63 | 7b7a15f7d742 |
| nats | Firewall Port | OK | N/A | 192.168.107.64 | a92cd1df2cbf |
| minio | Port Connection | OK | N/A | 192.168.107.62 | cb4b5f67dfc8 |
| minio | Service Status | OK | N/A | 192.168.107.62 | cb4b5f67dfc8 |
| mariadb | Port Connection | OK | N/A | 192.168.107.63 | 717b2b539a95 |
| mariadb | Service Status | OK | N/A | 192.168.107.63 | 717b2b539a95 |
| opensearch | Firewall Port | OK | N/A | 192.168.107.65 | 193de5b9d521 |
| zookeeper | Service Status | OK | N/A | 192.168.107.63 | 9df371735ec2 |
| kafka | Port Connection | OK | N/A | 192.168.107.65 | 8c5acc5d3073 |
| kafka | Service Status | OK | N/A | 192.168.107.65 | 8c5acc5d3073 |
| kafka | Firewall Port | OK | N/A | 192.168.107.65 | 8c5acc5d3073 |
| redis | Service Status | OK | Redis Slave | 192.168.107.65 | 0db5415aacee |
| redis | Firewall Port | OK | N/A | 192.168.107.65 | 0db5415aacee |
| redis-sentinel | Port Connection | OK | N/A | 192.168.107.63 | 66cc0ff7d29e |
| redis-sentinel | Service Status | OK | N/A | 192.168.107.63 | 66cc0ff7d29e |
| neo4j | Service Status | OK | N/A | 192.168.107.63 | ee7e26cecb82 |
| neo4j | Firewall Port | OK | N/A | 192.168.107.63 | ee7e26cecb82 |
| portal | Service Status | OK | N/A | 192.168.107.62 | d6c9b498227e |
| portal | Firewall Port | OK | N/A | 192.168.107.62 | d6c9b498227e |
+----------------+-----------------+--------+-----------------+----------------+--------------+
Before initiating the upgrade steps, RDA Fabric's platform, worker and application services need to be stopped.
- To stop OIA application services, run the below command. Wait until all of the services are stopped.
- To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
- To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
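A sketch of the three stop operations, assuming the deployment CLI's service-management subcommands (the exact syntax may differ between releases; use rdafk8s in place of rdaf on Kubernetes deployments):
# Stop application, worker, and platform services, in that order
rdaf app down OIA
rdaf worker down
rdaf platform down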
2.6.3.2 Upgrade RDAF Platform Services
Step-1: Run the below command to initiate upgrading the RDAF Platform services.
As the upgrade procedure is non-disruptive, it puts the currently running PODs into the Terminating state and the newer version PODs into the Pending state.
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each Platform service is in the Terminating state.
Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It lists the POD Ids of the platform services along with the rdac maintenance command required to put them into maintenance mode.
Note
If the maint_command.py script doesn't exist on the RDAF deployment CLI VM, it can be downloaded using the below command.
Step-4: Copy & paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.
Step-6: Run the below command to delete the Terminating RDAF platform service PODs.
for i in `kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait for 120 seconds and repeat Step-2 to Step-6 for the rest of the RDAF Platform service PODs.
Please wait till all of the new platform service PODs are in the Running state, then run the below command to verify their status and make sure all of them are running with the 3.3 version.
+---------------------+----------------+---------------+----------------+-------+
| Name | Host | Status | Container Id | Tag |
+---------------------+----------------+---------------+----------------+-------+
| rda-api-server | 192.168.131.46 | Up 1 Days ago | faf4cdd79dd4 | 3.3 |
| rda-api-server | 192.168.131.44 | Up 1 Days ago | 409c81c1000d | 3.3 |
| rda-registry | 192.168.131.46 | Up 1 Days ago | fa2682e9f7bb | 3.3 |
| rda-registry | 192.168.131.45 | Up 1 Days ago | 91eca9476848 | 3.3 |
| rda-identity | 192.168.131.46 | Up 1 Days ago | 4e5e337eabe7 | 3.3 |
| rda-identity | 192.168.131.44 | Up 1 Days ago | b10571cfa217 | 3.3 |
| rda-fsm | 192.168.131.44 | Up 1 Days ago | 1cea17b4d5e0 | 3.3 |
| rda-fsm | 192.168.131.46 | Up 1 Days ago | ac34fce6b2aa | 3.3 |
| rda-chat-helper | 192.168.131.45 | Up 1 Days ago | ea083e20a082 | 3.3 |
+---------------------+----------------+---------------+----------------+-------+
Run the below command to check that the rda-fsm service is up and running, and verify that one of the rda-scheduler services is elected as leader under the Site column.
+-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------|
| Infra | api-server | True | rda-api-server | b52f3919 | | 1 day, 3:43:49 | 8 | 31.33 | | |
| Infra | api-server | True | rda-api-server | 4fe976c4 | | 1 day, 3:42:42 | 8 | 31.33 | | |
| Infra | collector | True | rda-collector- | 50ba4175 | | 1 day, 23:01:14 | 8 | 31.33 | | |
| Infra | collector | True | rda-collector- | e8d040a0 | | 1 day, 23:01:33 | 8 | 31.33 | | |
| Infra | registry | True | rda-registry-5 | 4b220140 | | 1 day, 23:00:29 | 8 | 31.33 | | |
| Infra | registry | True | rda-registry-5 | 711afddf | | 1 day, 23:01:37 | 8 | 31.33 | | |
| Infra | scheduler | True | rda-scheduler- | 21bbd0a9 | *leader* | 1 day, 23:01:15 | 8 | 31.33 | | |
| Infra | scheduler | True | rda-scheduler- | ff2700dd | | 1 day, 22:59:38 | 8 | 31.33 | | |
| Infra | worker | True | rda-worker-59b | 94f56928 | rda-site-01 | 1 day, 22:36:25 | 8 | 31.33 | 3 | 95 |
| Infra | worker | True | rda-worker-59b | 786e86c2 | rda-site-01 | 1 day, 21:00:51 | 8 | 31.33 | 0 | 108 |
+-------+----------------------------------------+-------------+--------------+----------+-------------+-----------------+--------+--------------+---------------+--------------+
Run the below command to check that all services have an ok status and do not report any failure messages.
Warning
For Non-Kubernetes deployment, upgrading RDAF Platform and AIOps application services is a disruptive operation. Please schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.
Run the below command to initiate upgrading the RDAF Platform services.
Please wait till all of the new platform services are in the Up state, then run the below command to verify their status and make sure all of them are running with the 3.3 version.
+--------------------------+----------------+------------+--------------+------+
| Name | Host | Status | Container Id | Tag |
+--------------------------+----------------+------------+--------------+------+
| rda_api_server | 192.168.107.61 | Up 5 hours | 6fc70d6b82aa | 3.3 |
| rda_api_server | 192.168.107.62 | Up 5 hours | afa31a2c614b | 3.3 |
| rda_registry | 192.168.107.61 | Up 5 hours | 9f8adbb08b95 | 3.3 |
| rda_registry | 192.168.107.62 | Up 5 hours | cc8e5d27eb0a | 3.3 |
| rda_scheduler | 192.168.107.61 | Up 5 hours | f501e240e7a3 | 3.3 |
| rda_scheduler | 192.168.107.62 | Up 5 hours | c5b2b258efe1 | 3.3 |
| rda_collector | 192.168.107.61 | Up 5 hours | 2260fc37ebe5 | 3.3 |
| rda_collector | 192.168.107.62 | Up 5 hours | 3e7ab4518394 | 3.3 |
+--------------------------+----------------+------------+--------------+------+
Run the below command to check that the rda-fsm service is up and running, and verify that one of the rda-scheduler services is elected as leader under the Site column.
Run the below command to check that all services have an ok status and do not report any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                   |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 8c2198aa42b9 | 3661b780 |             | kafka-connectivity                                  | ok       | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=2, Brokers=[2, 3, 1] |
| rda_app   | alert-ingester                         | 795652ebd914 | 91c603f4 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 795652ebd914 | 91c603f4 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 795652ebd914 | 91c603f4 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                   |
| rda_app   | alert-ingester                         | 795652ebd914 | 91c603f4 |             | service-initialization-status                       | ok       |                                                             |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.6.3.3 Upgrade rdac CLI
2.6.3.4 Upgrade RDA Worker Services
Step-1: Please run the below command to initiate upgrading the RDA Worker service PODs.
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each RDA Worker service POD is in the Terminating state.
Step-3: Run the below command to put all Terminating RDAF worker service PODs into maintenance mode. It lists the POD Ids of the RDA worker services along with the rdac maintenance command required to put them into maintenance mode.
Step-4: Copy & paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF worker services.
Step-6: Run the below command to delete the Terminating RDAF worker service PODs.
for i in `kubectl get pods -n rda-fabric -l app_component=rda-worker | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait for 120 seconds between each RDAF worker service upgrade, repeating Step-2 to Step-6 for the rest of the RDAF worker service PODs.
Step-7: Please wait for 120 seconds to let the newer version of the RDA Worker service PODs join the RDA Fabric. Run the below commands to verify the status of the newer RDA Worker service PODs.
+------------+----------------+------------------+--------------+---------+
| Name | Host | Status | Container Id | Tag |
+------------+----------------+------------------+--------------+---------+
| rda-worker | 192.168.131.45 | Up 1 Days ago | afa217d2335a | 3.3 |
| rda-worker | 192.168.131.49 | Up 1 Days ago | e114872efc30 | 3.3 |
| rda-worker | 192.168.131.44 | Up 1 Minutes ago | 0787bdb1cfc1 | 3.3 |
| rda-worker | 192.168.131.50 | Up 3 Minutes ago | 185d3a08fa9c | 3.3 |
+------------+----------------+------------------+--------------+---------+
Step-8: Run the below command to check that all RDA Worker services have an ok status and do not report any failure messages.
- Upgrade RDA Worker Services
Please run the below command to initiate upgrading the RDA Worker service PODs.
Please wait for 120 seconds to let the newer version of RDA Worker service containers join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service containers.
+------------+----------------+------------+--------------+---------+
| Name | Host | Status | Container Id | Tag |
+------------+----------------+------------+--------------+---------+
| rda_worker | 192.168.107.61 | Up 3 hours | 4fa9c94ffe3c | 3.3 |
| rda_worker | 192.168.107.62 | Up 3 hours | c0684c26c606 | 3.3 |
+------------+----------------+------------+--------------+---------+
2.6.3.5 Upgrade OIA Application Services
Step-1: Run the below commands to initiate upgrading the RDAF OIA Application services.
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each OIA application service is in the Terminating state.
Step-3: Run the below command to put all Terminating OIA application service PODs into maintenance mode. It lists the POD Ids of the OIA application services along with the rdac maintenance command required to put them into maintenance mode.
Step-4: Copy & paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the OIA application services.
Step-6: Run the below command to delete the Terminating OIA application service PODs.
for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait for 120 seconds and repeat Step-2 to Step-6 for the rest of the OIA application service PODs.
Please wait till all of the new OIA application service PODs are in the Running state, then run the below command to verify their status and make sure they are running with the 7.3 version.
+--------------------------------+----------------+----------------+----------------+-------+
| Name | Host | Status | Container Id | Tag |
+--------------------------------+----------------+----------------+----------------+-------+
| rda-alert-ingester | 192.168.131.47 | Up 1 Days ago | 653220e94e6b | 7.3 |
| rda-alert-ingester | 192.168.131.46 | Up 1 Days ago | b15255a3efcd | 7.3 |
| rda-alert-processor | 192.168.131.46 | Up 3 Hours ago | f5d6f91ceb37 | 7.3 |
| rda-alert-processor | 192.168.131.47 | Up 1 Days ago | 48a28bcff96e | 7.3 |
| rda-alert-processor-companion | 192.168.131.46 | Up 1 Days ago | 86e83ef2afa3 | 7.3 |
| rda-alert-processor-companion | 192.168.131.47 | Up 1 Days ago | ee74d9227837 | 7.3 |
| rda-app-controller | 192.168.131.47 | Up 1 Days ago | 9efeddfb6b65 | 7.3 |
+--------------------------------+----------------+----------------+----------------+-------+
Step-7: Run the below command to verify all OIA application services are up and running. Please wait till the cfxdimensions-app-irm_service has leader status under the Site column.
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------|
| App | cfxdimensions-app-access-manager | True | 3a164c761ac7 | 6f02493c | | 2 days, 7:38:22 | 8 | 31.33 | | |
| App | cfxdimensions-app-access-manager | True | d56b629c2c3b | e5ff5696 | | 2 days, 7:38:05 | 8 | 31.33 | | |
| App | cfxdimensions-app-collaboration | True | 8aafda236efe | 126203ec | | 2 days, 7:11:18 | 8 | 31.33 | | |
| App | cfxdimensions-app-collaboration | True | 3ea382fdc6af | 618a650b | | 2 days, 7:10:58 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | d6f0d127ab06 | deb9c0c4 | | 2 days, 7:17:45 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | 2b9851b95094 | 013f5b00 | | 2 days, 7:17:25 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | 8361c0008d18 | a9fe343e | *leader* | 2 days, 7:12:36 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | ca8a2cbdca81 | 8f497bb7 | | 2 days, 7:12:14 | 8 | 31.33 | | |
| App | cfxdimensions-app-notification-service | True | dfbbcdcddafc | 8d0425ec | | 2 days, 7:18:24 | 8 | 31.33 | | |
| App | cfxdimensions-app-notification-service | True | 753472f0a9be | 485800b5 | | 2 days, 7:18:06 | 8 | 31.33 | | |
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
Run the below command to check that all services have an ok status and do not report any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | 8c2198aa42b9 | 3661b780 | | service-status | ok | |
| rda_app | alert-ingester | 8c2198aa42b9 | 3661b780 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 8c2198aa42b9 | 3661b780 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 8c2198aa42b9 | 3661b780 | | service-initialization-status | ok | |
| rda_app | alert-ingester | 8c2198aa42b9 | 3661b780 | | kafka-connectivity | ok | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=3, Brokers=[2, 3, 1] |
| rda_app | alert-ingester | 795652ebd914 | 91c603f4 | | service-status | ok | |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
Upgrade Event Consumer Service to 7.3.0.1:
Step-1: Run the below commands to initiate upgrading the rda-event-consumer services to the 7.3.0.1 version.
Step-2: Run the below commands to check the status of the rda-event-consumer service PODs and make sure at least one instance of the service is in the Terminating state.
Step-3: Run the below command to put all Terminating rda-event-consumer service PODs into maintenance mode. It lists the POD Ids of the rda-event-consumer service along with the rdac maintenance command required to put them into maintenance mode.
Step-4: Copy & paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the rda-event-consumer service.
Step-6: Run the below command to delete the Terminating rda-event-consumer service PODs.
for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait for 120 seconds and repeat Step-2 to Step-6 for the rest of the rda-event-consumer service PODs.
Please wait till all of the new rda-event-consumer service PODs are in the Running state, then run the below command to verify their status and make sure they are running with the 7.3.0.1 version.
Run the below command to check that all services have an ok status and do not report any failure messages.
Upgrade Alert Ingester Service to 7.3.2:
Step-1: Run the below commands to initiate upgrading the rda-alert-ingester services to the 7.3.2 version.
Step-2: Run the below commands to check the status of the rda-alert-ingester service PODs and make sure at least one instance of the service is in the Terminating state.
Step-3: Run the below command to put all Terminating rda-alert-ingester service PODs into maintenance mode. It lists the POD Ids of the rda-alert-ingester service along with the rdac maintenance command required to put them into maintenance mode.
Step-4: Copy & paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the rda-alert-ingester service.
Step-6: Run the below command to delete the Terminating rda-alert-ingester service PODs.
for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
Note
Wait for 120 seconds and repeat Step-2 to Step-6 for the rest of the rda-alert-ingester service PODs.
Please wait till all of the new rda-alert-ingester service PODs are in the Running state, then run the below command to verify their status and make sure they are running with the 7.3.2 version.
Run the below command to check that all services have an ok status and do not report any failure messages.
Run the below commands to initiate upgrading the RDA Fabric OIA Application services.
Please wait till all of the new OIA application service containers are in the Up state, then run the below command to verify their status and make sure they are running with the 7.3 version.
+-----------------------------------+----------------+------------+--------------+-----+
| Name | Host | Status | Container Id | Tag |
+-----------------------------------+----------------+------------+--------------+-----+
| cfx-rda-irm-service | 192.168.107.66 | Up 5 hours | a53da18e68e8 | 7.3 |
| cfx-rda-irm-service | 192.168.107.67 | Up 5 hours | ae42ce5f7c5a | 7.3 |
| cfx-rda-ml-config | 192.168.107.66 | Up 5 hours | 5942676cea00 | 7.3 |
| cfx-rda-ml-config | 192.168.107.67 | Up 5 hours | a59e44cb9950 | 7.3 |
| cfx-rda-collaboration | 192.168.107.66 | Up 5 hours | 8465a6e01886 | 7.3 |
| cfx-rda-collaboration | 192.168.107.67 | Up 5 hours | 610a07bd2893 | 7.3 |
| cfx-rda-ingestion-tracker | 192.168.107.66 | Up 5 hours | fbc1c8d940ea | 7.3 |
| cfx-rda-ingestion-tracker | 192.168.107.67 | Up 5 hours | 607212ea01e9 | 7.3 |
| cfx-rda-alert-processor-companion | 192.168.107.66 | Up 5 hours | 6cb93d1bdda0 | 7.3 |
| cfx-rda-alert-processor-companion | 192.168.107.67 | Up 5 hours | 3f8bf14adb34 | 7.3 |
+-----------------------------------+----------------+------------+--------------+-----+
Please wait till the cfxdimensions-app-irm_service has leader status under the Site column.
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------|
| App | cfxdimensions-app-access-manager | True | bd9e264212b5 | 68f9c494 | | 22:52:26 | 8 | 31.33 | | |
| App | cfxdimensions-app-access-manager | True | 5695b14a7743 | 9499b9f8 | | 22:50:52 | 8 | 31.33 | | |
| App | cfxdimensions-app-collaboration | True | 8465a6e01886 | cefbcfaa | | 22:23:26 | 8 | 31.33 | | |
| App | cfxdimensions-app-collaboration | True | 610a07bd2893 | d33b198b | | 22:23:05 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | 88352870e685 | e6ca73b0 | | 22:31:19 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | 18cdb22d4439 | 56e874fd | | 22:30:57 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | a53da18e68e8 | cdaf8950 | *leader* | 22:25:01 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | ae42ce5f7c5a | 472c324a | | 22:24:39 | 8 | 31.33 | | |
| App | cfxdimensions-app-notification-service | True | a11edf83127d | ba7d0978 | | 22:32:15 | 8 | 31.33 | | |
| App | cfxdimensions-app-notification-service | True | 458a0b43be9f | 2289a696 | | 22:31:53 | 8 | 31.33 | | |
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | 8c2198aa42b9 | 3661b780 | | service-status | ok | |
| rda_app | alert-ingester | 8c2198aa42b9 | 3661b780 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 8c2198aa42b9 | 3661b780 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 8c2198aa42b9 | 3661b780 | | service-initialization-status | ok | |
| rda_app | alert-ingester | 8c2198aa42b9 | 3661b780 | | kafka-connectivity | ok | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=2, Brokers=[2, 3, 1] |
| rda_app | alert-ingester | 795652ebd914 | 91c603f4 | | service-status | ok | |
| rda_app | alert-ingester | 795652ebd914 | 91c603f4 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 795652ebd914 | 91c603f4 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
Upgrade Event Consumer Service to 7.3.0.1:
Run the below commands to initiate upgrading the cfx-rda-event-consumer services to version 7.3.0.1.
Please wait until all of the new OIA application cfx-rda-event-consumer service containers are in the Up state, then run the below command to verify their status and make sure they are running with version 7.3.0.1.
Run the below command to check that all services have an ok status and do not report any failure messages.
Upgrade Alert Ingester Service to 7.3.2:
Run the below commands to initiate upgrading the cfx-rda-alert-ingester services to version 7.3.2.
Please wait until all of the new OIA application cfx-rda-alert-ingester service containers are in the Up state, then run the below command to verify their status and make sure they are running with version 7.3.2.
Run the below command to check that all services have an ok status and do not report any failure messages.
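For reference, the verification steps above map to the status and healthcheck verbs used throughout this guide. A minimal check sequence, assuming the rdaf and rdac CLIs on the deployment VM:
rdaf app status
rdac healthcheck
The Tag column of the first command's output should show the expected service version, and every row of the second should report ok under the Status column.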
2.6.4. Post Upgrade Steps
2.6.4.1 OIA
1. Deploy the latest l1&l2 bundles. Go to Configuration --> RDA Administration --> Bundles --> select oia_l1_l2_bundle and click on the deploy action.
2. Enable ML experiments manually if any experiments are configured: Organization --> Configuration --> ML Experiments.
3. By default, resizableColumns is set to false for the alerts and incidents tabular reports. If you want the alerts and incidents tabular reports to be resizable, set it to true. Go to Configuration -> RDA Administration -> User Dashboards, then search for the below dashboards:
a) oia-alert-group-view-alerts-os
b) oia-alert-group-view-details-os
c) oia-alert-groups-os
d) oia-alert-tracking-os
e) oia-alerts-os
f) oia-event-tracking-os
g) oia-event-tracking-view-alerts
h) oia-incident-alerts-os
i) oia-view-alerts-policy
j) oia-view-groups-policy
k) incident-collaboration
l) oia-incidents-os-template
m) oia-incidents-os
n) oia-incidents
o) oia-my-incidents
4. Update the oia-alerts-stream pstream definition to set the default value of a_ticket_id to Not Available (RDA Administration → Persistent Stream → oia-alerts-stream → Edit).
2.6.4.2 DNAC
1. Deploy the latest dna_center_bundle from Configuration → RDA Integrations → Bundles → click the row-level deploy action for dna_center_bundle.
2. Upload the latest versions of the device_family_alias and dnac_building dictionaries.
3. Run the four Historical Data pipelines available under published pipelines. Each pipeline needs to be executed based on the data, changing the query so a specific set of data is filtered and the pipeline runs on the relevant rows.
4. Once the historical data pipeline executions have completed successfully (which might take a couple of hours), delete all four pipelines.
Note
Update the schedule timings in Service Blueprints after the deployment as per the requirement.
2.7. Upgrade from 7.3 to 7.4
Note
This is a Non-Kubernetes CLI upgrade document.
RDAF Infra: From 1.0.2 to 1.0.3
RDAF Platform: From 3.3 to 3.4
OIA (AIOps) Application: From 7.3 to 7.4
RDAF Deployment rdaf CLI: From 1.1.10 to 1.2.0
RDAF Client rdac CLI: From 3.3 to 3.4
2.7.1. Prerequisites
Before proceeding with this upgrade, please make sure and verify the below prerequisites are met.
- RDAF Deployment CLI version: 1.1.10
- Infra Services tag: 1.0.2, 1.0.2.1 (nats, haproxy)
- Platform Services and RDA Worker tag: 3.3
- OIA Application Services tag: 7.3, 7.3.0.1
- AIA Application Services tag: 7.3
- Delete the "alert-model" dataset from the dataset reports on the UI before starting the upgrade
- Check that all MariaDB nodes are in sync on an HA setup using the below commands before starting the upgrade:
mysql -u<mysql username> -p<mysql password> -h <host IP> -P3307 -e "show status like 'wsrep_local_state_comment';"
mysql -u<mysql username> -p<mysql password> -h <host IP> -P3307 -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'";
- CloudFabrix recommends taking VMware VM snapshots of the VMs where RDA Fabric infra/platform/application services are deployed
Warning
Make sure all of the above pre-requisites are met before proceeding with the upgrade process.
Warning
Non-Kubernetes: Upgrading RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.
Important
Please make sure full backup of the RDAF platform system is completed before performing the upgrade.
Non-Kubernetes: Please run the below backup command to take a backup of the application data.
Note: Please make sure this backup directory is mounted across all infra and CLI VMs.
- Verify that the RDAF deployment rdaf CLI version is 1.1.10 on the VM where the CLI was installed for the docker on-prem registry and for managing Non-Kubernetes deployments.
- On-premise docker registry service version is 1.0.2
- RDAF Infrastructure services version is 1.0.2 (rda-nats service version is 1.0.2.1 and rda-minio service version is RELEASE.2022-11-11T03-44-20Z)
Run the below command to get RDAF Infra services details
- RDAF Platform services version is 3.3
Run the below command to get RDAF Platform services details
- RDAF OIA Application services version is 7.3 / 7.3.0.1
Run the below command to get RDAF App services details
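The status commands referenced in the three verification steps above are not reproduced in this extract; assuming the standard rdaf status verbs used elsewhere in this guide, they are typically:
rdaf infra status
rdaf platform status
rdaf app status
Each command prints a table with a Tag column that should match the versions listed above.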
RDAF Deployment CLI Upgrade: Please follow the below given steps.
Note
Upgrade RDAF Deployment CLI on both on-premise docker registry VM and RDAF Platform's management VM if provisioned separately.
Log in to the VM where the rdaf deployment CLI was installed for the docker on-prem registry and for managing the Non-Kubernetes deployment.
- To stop application services, run the below command. Wait until all of the services are stopped.
- To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
- To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
Note
Go to each MariaDB node and stop the MariaDB container, e.g.: docker stop --time 120 infra-mariadb-1
If the setup is standalone, go to the MariaDB node and run docker stop --time 120 <db container ID>
If it is a cluster, stop the services in reverse order (node3, node2, then node1)
- To stop RDAF Infra services, run the below command. Wait until all of the services are stopped.
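The individual stop commands are omitted from this extract; a representative sequence, assuming the rdaf down verbs (verify against your CLI's help output), in the same order as the steps above:
rdaf app down OIA
rdaf worker down
rdaf platform down
rdaf infra down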
- Download the RDAF Deployment CLI's newer version 1.2.0 bundle
- Upgrade the rdaf CLI to version 1.2.0
- Verify the installed rdaf CLI version is upgraded to 1.2.0
- Download the RDAF Deployment CLI's newer version 1.2.0 bundle and copy it to the RDAF management VM on which the rdaf deployment CLI was installed.
- Extract the rdaf CLI software bundle contents
- Change the directory to the extracted directory
- Upgrade the rdaf CLI to version 1.2.0
- Verify the installed rdaf CLI version
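A minimal extract-and-verify sketch for the above steps; the bundle filename is hypothetical and the --version flag is assumed (substitute the actual 1.2.0 bundle name and follow the bundled upgrade instructions):
tar -xvzf rdaf-deployment-cli-1.2.0.tar.gz    # extract the CLI bundle (hypothetical filename)
cd rdaf-deployment-cli-1.2.0                  # change into the extracted directory
# run the upgrade step documented in the bundle, then verify:
rdaf --version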
2.7.2. Download the new Docker Images
Download the new docker image tags for the RDAF Platform and OIA Application services and wait until all of the images are downloaded.
Note
Run the below command only when the GraphDB service is to be installed. It is an optional service.
Run the below command to verify that the above-mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.
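If a dedicated verification command is unavailable, the on-premise registry contents can also be inspected with the standard Docker Registry v2 HTTP API (registry host and port are placeholders):
curl -s http://<registry-host>:5000/v2/_catalog                  # list all repositories
curl -s http://<registry-host>:5000/v2/<image-name>/tags/list    # list tags for one image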
Please make sure the 1.0.3 image tag is downloaded for the below RDAF Infra services.
- haproxy
- nats
- mariadb
- opensearch
- kafka
- redis
- redis-sentinel
Please make sure the 3.4 image tag is downloaded for the below RDAF Platform services.
- rda-client-api-server
- rda-registry
- rda-rda-scheduler
- rda-collector
- rda-identity
- rda-fsm
- rda-access-manager
- rda-resource-manager
- rda-user-preferences
- onprem-portal
- onprem-portal-nginx
- rda-worker-all
- onprem-portal-dbinit
- cfxdx-nb-nginx-all
- rda-event-gateway
- rda-chat-helper
- rdac
- rdac-full
Please make sure the 7.4 image tag is downloaded for the below RDAF OIA Application services.
- rda-app-controller
- rda-alert-processor
- rda-file-browser
- rda-smtp-server
- rda-ingestion-tracker
- rda-reports-registry
- rda-ml-config
- rda-event-consumer
- rda-webhook-server
- rda-irm-service
- rda-alert-ingester
- rda-collaboration
- rda-notification-service
- rda-configuration-service
- rda-alert-processor-companion
Downloaded Docker images are stored under the below path.
/opt/rdaf/data/docker/registry/v2
Run the below command to check the filesystem's disk usage on which docker images are stored.
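A minimal sketch of the disk-usage check, assuming the default registry data path shown above:
df -h /opt/rdaf/data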
Optionally, if required, older image tags that are no longer used can be deleted to free up disk space using the below command.
2.7.3. Upgrade Steps
2.7.3.1 Upgrade RDAF Infra Services
The RDA Fabric platform introduced support for the GraphDB service in the 3.4 release. It is an optional service and can be skipped during the upgrade process.
Download the python script (rdaf_upgrade_1110_120.py).
Please run the downloaded python upgrade script. It generates a new values.yaml.latest with new environment variables for the rda_scheduler service.
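A representative download-and-run sequence for the script; the S3 path is an assumption based on the 1.2.0 release URLs used later in this document:
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.0/rdaf_upgrade_1110_120.py
python rdaf_upgrade_1110_120.py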
- Verify that, after running the upgrade script, it clears the data in the mount points /kafka-logs and /zookeeper, and deletes the zookeeper entries in the /opt/rdaf/rdaf.cfg file and in the infra.yaml file.
- Open the /opt/rdaf/rdaf.cfg file and search for kraft_cluster_id in the kafka section; it should have been updated.
- Once the above python script is executed, it will create the /opt/rdaf/deployment-scripts/values.yaml.latest file.
Note
Please take a backup of the /opt/rdaf/deployment-scripts/values.yaml file.
cp /opt/rdaf/deployment-scripts/values.yaml /opt/rdaf/deployment-scripts/values.yaml.backup
Edit /opt/rdaf/deployment-scripts/values.yaml and apply the below changes for the rda_scheduler service.
- Look for the scheduler env section, copy NUM_SERVER_PROCESSES: 4, and update the scheduler section in values.yaml, as shown in the below example.
rda_scheduler:
mem_limit: 2G
memswap_limit: 2G
privileged: false
environment:
NUM_SERVER_PROCESSES: 4
RDA_GIT_ACCESS_TOKEN: ''
RDA_GIT_URL: https://api.github.com
RDA_GITHUB_ORG: ''
RDA_GITHUB_REPO: ''
RDA_GITHUB_BRANCH_PREFIX: main
RDA_ENABLE_TRACES: 'no'
DISABLE_REMOTE_LOGGING_CONTROL: 'no'
RDA_SELF_HEALTH_RESTART_AFTER_FAILURES: 3
Tip
Please skip the below step if GraphDB service is NOT going to be installed.
Warning
For installing the GraphDB service, please add an additional disk to the RDA Fabric Infrastructure VM (see the referenced disk-addition procedure). This is a pre-requisite and needs to be completed before installing the GraphDB service.
- Upgrade the kafka infra service using the below command.
Run the below RDAF command to check the infra status.
+----------------------+----------------+-----------------+--------------+--------------+
| Name | Host | Status | Container Id | Tag |
+----------------------+----------------+-----------------+--------------+--------------+
| haproxy | 192.168.107.63 | Up 20 hours | a78256a09ee6 | 1.0.3 |
| haproxy | 192.168.107.64 | Up 20 hours | 968fe5c56865 | 1.0.3 |
| keepalived | 192.168.107.63 | active | N/A | N/A |
| keepalived | 192.168.107.64 | active | N/A | N/A |
| nats | 192.168.107.63 | Up 20 hours | ca708ba9a4ae | 1.0.3 |
| nats | 192.168.107.64 | Up 20 hours | 0755f1107200 | 1.0.3 |
| mariadb | 192.168.107.63 | Up 20 hours | f83efc183641 | 1.0.3 |
| mariadb | 192.168.107.64 | Up 20 hours | 6d9fb5d84d7c | 1.0.3 |
| mariadb | 192.168.107.65 | Up 13 hours | 014fd3e72f0a | 1.0.3 |
| opensearch | 192.168.107.63 | Up 20 hours | ffebb31f79ab | 1.0.3 |
| opensearch | 192.168.107.64 | Up 20 hours | e539c56b2ff8 | 1.0.3 |
| opensearch | 192.168.107.65 | Up 13 hours | 3f29d7388301 | 1.0.3 |
| kafka | 192.168.107.63 | Up 20 hours | cb15f52eb5d2 | 1.0.3 |
+----------------------+----------------+-----------------+--------------+--------------+
+----------------+-----------------+--------+------------------------------+----------------+--------------+
| Name | Check | Status | Reason | Host | Container Id |
+----------------+-----------------+--------+------------------------------+----------------+--------------+
| haproxy | Port Connection | OK | N/A | 192.168.107.63 | a78256a09ee6 |
| haproxy | Service Status | OK | N/A | 192.168.107.63 | a78256a09ee6 |
| haproxy | Firewall Port | OK | N/A | 192.168.107.63 | a78256a09ee6 |
| haproxy | Port Connection | OK | N/A | 192.168.107.64 | 968fe5c56865 |
| haproxy | Service Status | OK | N/A | 192.168.107.64 | 968fe5c56865 |
| haproxy | Firewall Port | OK | N/A | 192.168.107.64 | 968fe5c56865 |
| keepalived | Service Status | OK | N/A | 192.168.107.63 | N/A |
| keepalived | Service Status | OK | N/A | 192.168.107.64 | N/A |
| nats | Port Connection | OK | N/A | 192.168.107.63 | ca708ba9a4ae |
| nats | Service Status | OK | N/A | 192.168.107.63 | ca708ba9a4ae |
| nats | Firewall Port | OK | N/A | 192.168.107.63 | ca708ba9a4ae |
+----------------+-----------------+--------+------------------------------+----------------+--------------+
Note
If the infra healthcheck or infra status shows one of the MariaDB nodes as down or failed, restart the node that is in the exited or restarting state:
docker restart <container id>
If the node does not come up after the restart, edit /opt/rdaf/config/mariadb/my_custom.cnf and set the InnoDB recovery parameter as below:
innodb_force_recovery=1
After changing the above parameter, restart the MariaDB container again; it should bring the DB back up. Once the MariaDB node is up and running, remove the parameter that was added above.
Verify that all MariaDB nodes are in sync on an HA setup using the below commands after the infra upgrade.
mysql -u<username> -p<password> -h <host IP> -P3307 -e "show status like 'wsrep_local_state_comment';"
+---------------------------+--------+
| Variable_name | Value |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+
mysql -u<username> -p<password> -h <host IP> -P3307 -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'";
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
If the GraphDB service is installed, the infra status also lists the graphdb services:
+----------------------+----------------+-------------+--------------+------------------------------+
| graphdb[agent] | 192.168.133.97 | Up 18 hours | 3f90a6003415 | 1.0.3 |
| graphdb[agent] | 192.168.133.98 | Up 19 hours | c26141a16a97 | 1.0.3 |
| graphdb[agent] | 192.168.133.99 | Up 19 hours | 19ea6f54b5fa | 1.0.3 |
| graphdb[server] | 192.168.133.97 | Up 18 hours | f8fb50727a13 | 1.0.3 |
| graphdb[server] | 192.168.133.98 | Up 19 hours | 9c1f7d9d9dbb | 1.0.3 |
| graphdb[server] | 192.168.133.99 | Up 19 hours | 60a08e139c19 | 1.0.3 |
| graphdb[coordinator] | 192.168.133.97 | Up 18 hours | 56604839c6fc | 1.0.3 |
| graphdb[coordinator] | 192.168.133.98 | Up 19 hours | a1814d1a32ba | 1.0.3 |
| graphdb[coordinator] | 192.168.133.99 | Up 19 hours | 51df56d349c1 | 1.0.3 |
+----------------------+----------------+-------------+--------------+------------------------------+
Note
The below command will upgrade the configuration in MariaDB.
This step will take time to complete.
Note
The below command will create a new kafka user with the existing tenant id.
2.7.3.2 Upgrade RDAF Platform Services
Warning
For Non-Kubernetes deployment, upgrading RDAF Platform and AIOps application services is a disruptive operation. Please schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.
Run the below command to initiate upgrading RDAF Platform services.
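The upgrade command itself is omitted in this extract; a representative invocation, assuming the rdaf upgrade verb follows the same --tag pattern as the install commands (verify against your release notes):
rdaf platform upgrade --tag 3.4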
Please wait until all of the new platform services are in the Up state, then run the below command to verify their status and make sure all of them are running with the 3.4 version.
+--------------------------+----------------+------------+--------------+------+
| Name | Host | Status | Container Id | Tag |
+--------------------------+----------------+------------+--------------+------+
| rda_api_server | 192.168.107.61 | Up 5 hours | 6fc70d6b82aa | 3.4 |
| rda_api_server | 192.168.107.62 | Up 5 hours | afa31a2c614b | 3.4 |
| rda_registry | 192.168.107.61 | Up 5 hours | 9f8adbb08b95 | 3.4 |
| rda_registry | 192.168.107.62 | Up 5 hours | cc8e5d27eb0a | 3.4 |
| rda_scheduler | 192.168.107.61 | Up 5 hours | f501e240e7a3 | 3.4 |
| rda_scheduler | 192.168.107.62 | Up 5 hours | c5b2b258efe1 | 3.4 |
| rda_collector | 192.168.107.61 | Up 5 hours | 2260fc37ebe5 | 3.4 |
| rda_collector | 192.168.107.62 | Up 5 hours | 3e7ab4518394 | 3.4 |
+--------------------------+----------------+------------+--------------+------+
Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler services is elected as leader under the Site column.
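This check uses the rdac pods listing referenced throughout this guide:
rdac pods
Look for the rda-fsm row, and for *leader* in the Site column of one rda-scheduler row.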
Run the below command to check that all services have an ok status and do not report any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | 02532fe3e9d9 | a9dcda71 | | service-status | ok | |
| rda_app | alert-ingester | 02532fe3e9d9 | a9dcda71 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 02532fe3e9d9 | a9dcda71 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 02532fe3e9d9 | a9dcda71 | | service-initialization-status | ok | |
| rda_app | alert-ingester | 02532fe3e9d9 | a9dcda71 | | kafka-connectivity | ok | Cluster=ZTkxMmRjOTRjZDZiMTFlZQ, Broker=3, Brokers=[1, 2, 3] |
| rda_app | alert-ingester | 5f9b978db3e9 | 4d0892ee | | service-status | ok | |
| rda_app | alert-ingester | 5f9b978db3e9 | 4d0892ee | | minio-connectivity | ok | |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.7.3.3 Upgrade rdac CLI
Run the below command to upgrade the rdac CLI.
2.7.3.4 Upgrade RDA Worker Services
- Upgrade RDA Worker Services
Please run the below command to initiate upgrading the RDA Worker service PODs.
Please wait for 120 seconds to let the newer version of RDA Worker service containers join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service containers.
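The worker status check maps to the rdaf status verbs used elsewhere in this guide (a sketch, assuming the worker subcommand):
rdaf worker status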
+------------+----------------+-------------+--------------+-----+
| Name | Host | Status | Container Id | Tag |
+------------+----------------+-------------+--------------+-----+
| rda_worker | 192.168.107.61 | Up 23 hours | a8a33e57e9b6 | 3.4 |
| rda_worker | 192.168.107.62 | Up 23 hours | 9fc328bc0e26 | 3.4 |
+------------+----------------+-------------+--------------+-----+
2.7.3.5 Upgrade OIA Application Services
Run the below commands to initiate upgrading the RDA Fabric OIA Application services.
Please wait until all of the new OIA application service containers are in the Up state, then run the below command to verify their status and make sure they are running with the 7.4 version.
+-----------------------------------+----------------+-------------+--------------+-----+
| Name | Host | Status | Container Id | Tag |
+-----------------------------------+----------------+-------------+--------------+-----+
| cfx-rda-app-controller | 192.168.107.66 | Up 23 hours | 1237d8c481d1 | 7.4 |
| cfx-rda-app-controller | 192.168.107.67 | Up 23 hours | 0d501cca27ba | 7.4 |
| cfx-rda-reports-registry | 192.168.107.66 | Up 23 hours | 65c0007b110e | 7.4 |
| cfx-rda-reports-registry | 192.168.107.67 | Up 23 hours | 90a43cd57188 | 7.4 |
| cfx-rda-notification-service | 192.168.107.66 | Up 23 hours | 11b53b25c182 | 7.4 |
| cfx-rda-notification-service | 192.168.107.67 | Up 23 hours | 3206acc1612f | 7.4 |
| cfx-rda-file-browser | 192.168.107.66 | Up 23 hours | bd8469446bb6 | 7.4 |
| cfx-rda-file-browser | 192.168.107.67 | Up 23 hours | 31f5f3ecd347 | 7.4 |
+-----------------------------------+----------------+-------------+--------------+-----+
Verify that one of the cfxdimensions-app-irm_service pods has *leader* status under the Site column, as shown in the below output.
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------|
| App | cfxdimensions-app-access-manager | True | bd9e264212b5 | 68f9c494 | | 22:52:26 | 8 | 31.33 | | |
| App | cfxdimensions-app-access-manager | True | 5695b14a7743 | 9499b9f8 | | 22:50:52 | 8 | 31.33 | | |
| App | cfxdimensions-app-collaboration | True | 8465a6e01886 | cefbcfaa | | 22:23:26 | 8 | 31.33 | | |
| App | cfxdimensions-app-collaboration | True | 610a07bd2893 | d33b198b | | 22:23:05 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | 88352870e685 | e6ca73b0 | | 22:31:19 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | 18cdb22d4439 | 56e874fd | | 22:30:57 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | a53da18e68e8 | cdaf8950 | *leader* | 22:25:01 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | ae42ce5f7c5a | 472c324a | | 22:24:39 | 8 | 31.33 | | |
| App | cfxdimensions-app-notification-service | True | a11edf83127d | ba7d0978 | | 22:32:15 | 8 | 31.33 | | |
| App | cfxdimensions-app-notification-service | True | 458a0b43be9f | 2289a696 | | 22:31:53 | 8 | 31.33 | | |
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------------+--------+--------------+---------------+--------------+
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | 8c2198aa42b9 | 3661b780 | | service-status | ok | |
| rda_app | alert-ingester | 8c2198aa42b9 | 3661b780 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 8c2198aa42b9 | 3661b780 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 8c2198aa42b9 | 3661b780 | | service-initialization-status | ok | |
| rda_app | alert-ingester | 8c2198aa42b9 | 3661b780 | | kafka-connectivity | ok | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=2, Brokers=[2, 3, 1] |
| rda_app | alert-ingester | 795652ebd914 | 91c603f4 | | service-status | ok | |
| rda_app | alert-ingester | 795652ebd914 | 91c603f4 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 795652ebd914 | 91c603f4 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.7.4. Post Upgrade Steps
2.7.4.1 OIA
1. Deploy the latest l1&l2 bundles. Go to Configuration --> RDA Administration --> Bundles --> select oia_l1_l2_bundle and click on the deploy action. (If these bundles are not deployed, drilling down into an incident won't show pages like Alerts, Insights, etc.)
2. Enable ML experiments manually if any experiments are configured: Organization --> Configuration --> ML Experiments.
3. By default, resizableColumns is set to false for the alerts and incidents tabular reports. If you want the alerts and incidents tabular reports to be resizable, set it to true. Go to Configuration -> RDA Administration -> User Dashboards, then search for the below dashboards:
a) oia-alert-group-view-alerts-os
b) oia-alert-group-view-details-os
c) oia-alert-groups-os
d) oia-alert-tracking-os
e) oia-alerts-os
f) oia-event-tracking-os
g) oia-event-tracking-view-alerts
h) oia-incident-alerts-os
i) oia-view-alerts-policy
j) oia-view-groups-policy
k) incident-collaboration
l) oia-incidents-os-template
m) oia-incidents-os
n) oia-incidents
o) oia-my-incidents
4. Collaboration Service changes
- Post deployment, modify the following file inside each of the collaboration docker service containers.
- To get the container id of the collaboration service, use the following command to see where the collaboration service is running (a consolidated sketch follows this list).
- docker exec -it (container-id) bash
- vi /usr/lib/python3.7/site-packages/cfxdimensions-app-collaboration/app.properties
a) waitq.loop.exec.delay.secs=300
b) waitq.active.incidents.exec.threads=1
- docker restart (container-id)
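Putting the steps above together, a minimal end-to-end sketch for one collaboration container; the docker ps filter is a generic way to discover the container id:
docker ps | grep collaboration          # find the collaboration container id
docker exec -it <container-id> bash     # open a shell inside the container
vi /usr/lib/python3.7/site-packages/cfxdimensions-app-collaboration/app.properties
# set: waitq.loop.exec.delay.secs=300
# set: waitq.active.incidents.exec.threads=1
exit
docker restart <container-id>           # restart so the new properties take effect
Repeat this on each host where a collaboration container is running.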
2.7.4.2 Post Installation FSM Steps (Applicable only for installations with FSM)
1. Update FSM model:
https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.0/oia_ticketing_with_soothing_interval.yml
2. Deploy below Bundles from Configuration -> RDA Administration ->Bundles
3. Update Pipelines from links given below to the Published Pipelines
https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.0/pipelines/fsm_read_incident_stream.yml
4. Update Close BMC Ticket Blueprint to run every 5 minutes instead of the previous 15-minute interval.
5. Enable below service blueprints from Configuration -> RDA Administration -> Service Blueprints
- FSM Read Incident Stream
- FSM Read Alert Stream
- Create Ticket
- Update Ticket
- Resolve Ticket
- Close BMC Ticket
2.7.4.3 DNAC
Below are the steps to upgrade ONLY DNAC functionality
1. Deploy the latest dna_center_bundle from Configuration → RDA Administration → Bundles → click the row-level deploy action for dna_center_bundle.
Note
The dictionary/template files for the items in steps 2, 3 & 4 can be downloaded from below.
2. Upload the latest Device Family Alias dictionary: from Configuration → RDA Administration → Datasets, search for device_family_alias, click the row-level Manage Data action, click Import, upload the latest device family dictionary, and click Save.
3. On the same Datasets page, look for dnac_host_info, click the row-level Manage Data action, click Import, upload the latest DNAC host info file, and click Save.
Note
It is recommended to add a new dataset instead of importing the file into the existing dataset.
4. The latest DNAC HTML template can be uploaded from Configuration → RDA Administration → Object Store. Click Upload, provide the name dynamic_dnac_template.html and the folder name widget_labels, upload the latest HTML template, select the Enable Overwrite checkbox, and click Add.
5. In Configuration → RDA Administration → Pipelines → Published Pipelines, modify the dnac_add_sources pipeline by uncommenting the line %% import_source = 'DNAC_Alpharetta'.
2.7.4.4 BCS
Below are the steps to upgrade ONLY BCS functionality
- Deploy the latest bcs_operational_insights bundle from Configuration → RDA Administration → Bundles → click the row-level deploy action for bcs_operational_insights.
2.8. Upgrade From 7.3/7.4 to 7.4.1
RDAF Infra Upgrade: From 1.0.2 to 1.0.3, 1.0.3.1 (haproxy)
RDAF Platform: From 3.3 to 3.4.1
OIA (AIOps) Application: From 7.3 to 7.4.1
RDAF Deployment rdafk8s CLI: From 1.1.10 to 1.2.1
RDAF Client rdac CLI: From 3.3 to 3.4.1
RDAF Infra Upgrade: To 1.0.3.1 (haproxy)
RDAF Platform: From 3.4 to 3.4.1
OIA (AIOps) Application: From 7.4 to 7.4.1
RDAF Deployment rdaf CLI: From 1.2.0 to 1.2.1
RDAF Client rdac CLI: From 3.4 to 3.4.1
2.8.1. Prerequisites
Before proceeding with this upgrade, please make sure and verify the below prerequisites are met.
- RDAF Deployment CLI version: 1.1.10
- Infra Services tag: 1.0.2, 1.0.2.1 (nats, haproxy)
- Platform Services and RDA Worker tag: 3.3
- OIA Application Services tag: 7.3, 7.3.0.1 (event_consumer), 7.3.2 (alert-ingester)
- CloudFabrix recommends taking VMware VM snapshots of the VMs where RDA Fabric infra/platform/application services are deployed
- Delete the alert-model dataset from the dataset reports on the UI before starting the upgrade
- Check that all MariaDB nodes are in sync on an HA setup using the below commands before starting the upgrade
Danger
Upgrading both kafka and mariadb infra services require a downtime to the RDAF platform and application services.
Please proceed to the below steps only after scheduled downtime is approved.
Tip
Please run the below commands on the VM host where the RDAF deployment CLI was installed and the rdafk8s setup command was run. The mariadb configuration is read from the /opt/rdaf/rdaf.cfg file.
MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`
MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`
MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -e "show status like 'wsrep_local_state_comment';"
Please verify that the mariadb cluster state is Synced.
+---------------------------+--------+
| Variable_name | Value |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+
Please run the below command and verify that the mariadb cluster size is 3.
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'";
- RDAF Deployment CLI version: 1.2.0
- Infra Services tag: 1.0.3
- Platform Services and RDA Worker tag: 3.4
- OIA Application Services tag: 7.4
- AIA Application Services tag: 7.4
- CloudFabrix recommends taking VMware VM snapshots of the VMs where RDA Fabric infra/platform/application services are deployed
Danger
In this release, all of the RDAF Infrastructure services are upgraded. So, it is mandatory to take a VM-level snapshot before proceeding with the upgrade process.
Warning
Make sure all of the above pre-requisites are met before proceeding with the upgrade process.
Warning
Kubernetes: Though Kubernetes based RDA Fabric deployment supports zero downtime upgrade, it is recommended to schedule a maintenance window for upgrading RDAF Platform and AIOps services to newer version.
Non-Kubernetes: Upgrading RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.
Important
Please make sure full backup of the RDAF platform system is completed before performing the upgrade.
Kubernetes: Please run the below backup command to take the backup of application data.
Non-Kubernetes: Please run the below backup command to take a backup of the application data. Note: Please make sure this backup directory is mounted across all infra and CLI VMs.
Run the below command on the RDAF Management system and make sure the Kubernetes PODs are NOT in restarting mode (applicable only to Kubernetes environments).
- Verify that the RDAF deployment rdaf CLI version is 1.2.0 or the rdafk8s CLI version is 1.1.10 on the VM where the CLI was installed for the docker on-prem registry and for managing Kubernetes or Non-Kubernetes deployments.
- On-premise docker registry service version is 1.0.2
- RDAF Infrastructure services version is 1.0.2 (rda-nats service version is 1.0.2.1 and rda-minio service version is RELEASE.2022-11-11T03-44-20Z)
Run the below command to get RDAF Infra services details
- RDAF Platform services version is 3.3
Run the below command to get RDAF Platform services details
- RDAF OIA Application services version is 7.3/7.3.0.1/7.3.2
Run the below command to get RDAF App services details
Run the below command to get RDAF Infra services details
- RDAF Platform services version is 3.4
Run the below command to get RDAF Platform services details
- RDAF OIA Application services version is 7.4
Run the below command to get RDAF App services details
RDAF Deployment CLI Upgrade:
Please follow the below given steps.
Note
Upgrade RDAF Deployment CLI on both on-premise docker registry VM and RDAF Platform's management VM if provisioned separately.
Log in to the VM where the rdaf & rdafk8s deployment CLI was installed for the docker on-prem registry and for managing the Kubernetes or Non-Kubernetes deployment.
- Download the RDAF Deployment CLI's newer version 1.2.1 bundle.
- Upgrade the rdaf & rdafk8s CLI to version 1.2.1
- Verify the installed rdaf & rdafk8s CLI version is upgraded to 1.2.1
- Download the RDAF Deployment CLI's newer version 1.2.1 bundle and copy it to the RDAF management VM on which the rdaf & rdafk8s deployment CLI was installed.
- Extract the rdaf CLI software bundle contents
- Change the directory to the extracted directory
- Upgrade the rdaf CLI to version 1.2.1
- Verify the installed rdaf CLI version
2.8.2. Download the new Docker Images
Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.
Run the below command to upgrade the registry.
To fetch the registry, please use the below command.
Run the below command to verify that the above-mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.
Please make sure the 1.0.3.1 image tag is downloaded for the below RDAF Infra service.
- rda-platform-haproxy
Please make sure the 1.0.3 image tag is downloaded for the below RDAF Infra services.
- rda-platform-haproxy
- rda-platform-kafka
- rda-platform-zookeeper
- rda-platform-mariadb
- rda-platform-opensearch
- rda-platform-nats
- rda-platform-busybox
- rda-platform-nats-box
- rda-platform-nats-boot-config
- rda-platform-nats-server-config-reloader
- rda-platform-prometheus-nats-exporter
- rda-platform-redis
- rda-platform-redis-sentinel
- rda-platform-arangodb-starter
- rda-platform-kube-arangodb
- rda-platform-arangodb
- rda-platform-kubectl
- rda-platform-logstash
- rda-platform-fluent-bit
Please make sure the RELEASE.2023-09-30T07-02-29Z image tag is downloaded for the below RDAF Infra service.
- minio
Please make sure the 3.4.1 image tag is downloaded for the below RDAF Platform services.
- rda-client-api-server
- rda-registry
- rda-rda-scheduler
- rda-collector
- rda-identity
- rda-fsm
- rda-stack-mgr
- rda-access-manager
- rda-resource-manager
- rda-user-preferences
- onprem-portal
- onprem-portal-nginx
- rda-worker-all
- onprem-portal-dbinit
- cfxdx-nb-nginx-all
- rda-event-gateway
- rda-chat-helper
- rdac
- rdac-full
- cfxcollector
Please make sure the 7.4.1 image tag is downloaded for the below RDAF OIA Application services.
- rda-app-controller
- rda-alert-processor
- rda-file-browser
- rda-smtp-server
- rda-ingestion-tracker
- rda-reports-registry
- rda-ml-config
- rda-event-consumer
- rda-webhook-server
- rda-irm-service
- rda-alert-ingester
- rda-collaboration
- rda-notification-service
- rda-configuration-service
- rda-alert-processor-companion
Downloaded Docker images are stored under the below path.
/opt/rdaf/data/docker/registry/v2
Run the below command to check the filesystem's disk usage on which docker images are stored.
Optionally, if required, older image tags that are no longer used can be deleted to free up disk space using the below command.
2.8.3. Upgrade Steps
2.8.3.1 Upgrade RDAF Infra Services
2.8.3.1.1 Update RDAF Infra/Platform Services Configuration
Please download the below python script (rdaf_upgrade_1110_121.py).
Warning
Please verify the python binary version using which RDAF deployment CLI was installed.
ls -l /home/rdauser/.local/lib
--> this will show python version as a directory name. (ex: python3.7 or python3.8)
python --version
--> The major version (ex: Python 3.7.4 or 3.8.10) should match output from the above.
If it doesn't match, please run the below commands.
sudo mv /usr/bin/python /usr/bin/python_backup
sudo ln -s /usr/bin/python3.7 /usr/bin/python
--> Please choose the python binary version using which the RDAF deployment CLI was installed. In this example, the python3.7 binary was used.
Note: If the python version is not either 3.7.x or 3.8.x, please stop the upgrade and contact CloudFabrix support for additional assistance.
Please run the downloaded python upgrade script rdaf_upgrade_1110_121.py as shown below.
The below step will generate *values.yaml.latest files for all RDAF Infrastructure services under the /opt/rdaf/deployment-scripts directory.
Please run the below commands to take a backup of the values.yaml files of the Infrastructure and Application services.
cp /opt/rdaf/deployment-scripts/values.yaml /opt/rdaf/deployment-scripts/values.yaml.backup
cp /opt/rdaf/deployment-scripts/nats-values.yaml /opt/rdaf/deployment-scripts/nats-values.yaml.backup
cp /opt/rdaf/deployment-scripts/minio-values.yaml /opt/rdaf/deployment-scripts/minio-values.yaml.backup
cp /opt/rdaf/deployment-scripts/mariadb-values.yaml /opt/rdaf/deployment-scripts/mariadb-values.yaml.backup
cp /opt/rdaf/deployment-scripts/opensearch-values.yaml /opt/rdaf/deployment-scripts/opensearch-values.yaml.backup
cp /opt/rdaf/deployment-scripts/kafka-values.yaml /opt/rdaf/deployment-scripts/kafka-values.yaml.backup
cp /opt/rdaf/deployment-scripts/redis-values.yaml /opt/rdaf/deployment-scripts/redis-values.yaml.backup
cp /opt/rdaf/deployment-scripts/arangodb-operator-values.yaml /opt/rdaf/deployment-scripts/arangodb-operator-values.yaml.backup
Update NATs configuration:
Run the below command to copy the upgraded NATs configuration from nats-values.yaml.latest to nats-values.yaml.
cp /opt/rdaf/deployment-scripts/nats-values.yaml.latest /opt/rdaf/deployment-scripts/nats-values.yaml
Please update the memory limit value (below highlighted parameters) in /opt/rdaf/deployment-scripts/nats-values.yaml by copying the current value from the /opt/rdaf/deployment-scripts/nats-values.yaml.backup file.
Note: Below given values are for a reference only.
Update Minio configuration:
Run the below command to copy the upgraded Minio configuration from minio-values.yaml.latest to minio-values.yaml.
cp /opt/rdaf/deployment-scripts/minio-values.yaml.latest /opt/rdaf/deployment-scripts/minio-values.yaml
Please update the memory limit value (below highlighted parameters) in /opt/rdaf/deployment-scripts/minio-values.yaml by copying the current value from the /opt/rdaf/deployment-scripts/minio-values.yaml.backup file.
Note: Below given values are for a reference only.
Update Opensearch configuration:
Run the below command to copy the upgraded Opensearch configuration from opensearch-values.yaml.latest to opensearch-values.yaml.
cp /opt/rdaf/deployment-scripts/opensearch-values.yaml.latest /opt/rdaf/deployment-scripts/opensearch-values.yaml
Please update the opensearchJavaOpts and memory limit values (below highlighted parameters) in /opt/rdaf/deployment-scripts/opensearch-values.yaml by copying the current values from the /opt/rdaf/deployment-scripts/opensearch-values.yaml.backup file.
Note: Below given values are for a reference only.
Update Redis configuration:
Run the below command to copy the upgraded Redis configuration from redis-values.yaml.latest to redis-values.yaml.
cp /opt/rdaf/deployment-scripts/redis-values.yaml.latest /opt/rdaf/deployment-scripts/redis-values.yaml
Update MariaDB configuration:
Run the below command to copy the upgraded MariaDB configuration from mariadb-values.yaml.latest to mariadb-values.yaml.
cp /opt/rdaf/deployment-scripts/mariadb-values.yaml.latest /opt/rdaf/deployment-scripts/mariadb-values.yaml
Please update the below parameters (highlighted parameters in the given config example) in the /opt/rdaf/deployment-scripts/mariadb-values.yaml file.
- memory: update it by copying the current value from the /opt/rdaf/deployment-scripts/mariadb-values.yaml.backup file
- initialDelaySeconds: set the value to 1200 (under the livenessProbe section)
- failureThreshold: set the value to 15 (under the livenessProbe section)
- expire_logs_days: set the value to 1
- innodb_buffer_pool_size: update it by copying the current value from the /opt/rdaf/deployment-scripts/mariadb-values.yaml.backup file
- Comment out the wsrep_replicate_myisam=ON line. Please ignore if it is already commented out.
Note: Below given values are for a reference only.
Update Kafka configuration:
Run the below command to copy the upgraded Kafka configuration from kafka-values.yaml.latest to kafka-values.yaml.
cp /opt/rdaf/deployment-scripts/kafka-values.yaml.latest /opt/rdaf/deployment-scripts/kafka-values.yaml
Please update the below parameters (highlighted parameters in the given config example) in the /opt/rdaf/deployment-scripts/kafka-values.yaml file.
- memory: update it by copying the current value from the /opt/rdaf/deployment-scripts/kafka-values.yaml.backup file
- nodePorts: update it by copying the current value from the kafka-values.yaml.backup file; please make sure to maintain the same order of the nodePorts as in the current configuration
- initialDelaySeconds: set the value to 1200 (under the livenessProbe section)
- failureThreshold: set the value to 15 (under the livenessProbe section)
Note: Below given values are for a reference only.
Update rda_scheduler Service Configuration:
Please take a backup of the /opt/rdaf/deployment-scripts/values.yaml file.
Edit the /opt/rdaf/deployment-scripts/values.yaml file and update the rda_scheduler service configuration by adding the below environment variable as shown below.
- NUM_SERVER_PROCESSES: set the value to 4
....
....
rda_scheduler:
replicas: 1
privileged: true
resources:
requests:
memory: 100Mi
limits:
memory: 2Gi
env:
NUM_SERVER_PROCESSES: '4'
RDA_ENABLE_TRACES: 'no'
DISABLE_REMOTE_LOGGING_CONTROL: 'no'
RDA_SELF_HEALTH_RESTART_AFTER_FAILURES: 3
RDA_GIT_ACCESS_TOKEN: ''
RDA_GIT_URL: ''
RDA_GITHUB_ORG: ''
RDA_GITHUB_REPO: ''
RDA_GITHUB_BRANCH_PREFIX: ''
LABELS: tenant_name=rdaf-01
- Download the python script (rdaf_upgrade_120_121.py)
- Please run the downloaded python upgrade script.
- Install the haproxy service using the below command.
Run the below RDAF command to check the infra status.
+----------------------+----------------+------------+--------------+------------------------------+
| Name | Host | Status | Container Id | Tag |
+----------------------+----------------+------------+--------------+------------------------------+
| haproxy | 192.168.133.97 | Up 2 hours | 342fc1338ba1 | 1.0.3.1 |
| haproxy | 192.168.133.98 | Up 2 hours | ec0de9d45a66 | 1.0.3.1 |
| keepalived | 192.168.133.97 | active | N/A | N/A |
| keepalived | 192.168.133.98 | active | N/A | N/A |
| nats | 192.168.133.97 | Up 4 hours | d2dc79419daa | 1.0.3 |
| nats | 192.168.133.98 | Up 4 hours | ef7c632bdb58 | 1.0.3 |
| minio | 192.168.133.93 | Up 4 hours | 414d2a2351b9 | RELEASE.2023-09-30T07-02-29Z |
| minio | 192.168.133.97 | Up 4 hours | aa0f20af7d70 | RELEASE.2023-09-30T07-02-29Z |
| minio | 192.168.133.98 | Up 4 hours | 91e123f8ba43 | RELEASE.2023-09-30T07-02-29Z |
| minio | 192.168.133.99 | Up 4 hours | 74e74cc328b5 | RELEASE.2023-09-30T07-02-29Z |
| mariadb | 192.168.133.97 | Up 4 hours | c2d71adc09ce | 1.0.3 |
| mariadb | 192.168.133.98 | Up 4 hours | 54615146c0fc | 1.0.3 |
| mariadb | 192.168.133.99 | Up 4 hours | 68e2a6088477 | 1.0.3 |
| opensearch | 192.168.133.97 | Up 3 hours | 7e700c133672 | 1.0.3 |
| opensearch | 192.168.133.98 | Up 3 hours | a582e7b552d6 | 1.0.3 |
| opensearch | 192.168.133.99 | Up 3 hours | f752837167e2 | 1.0.3 |
+----------------------+----------------+------------+--------------+------------------------------+
Run the below RDAF command to check infra healthcheck status
+----------------+-----------------+--------+------------------------------+----------------+--------------+
| Name | Check | Status | Reason | Host | Container Id |
+----------------+-----------------+--------+------------------------------+----------------+--------------+
| haproxy | Port Connection | OK | N/A | 192.168.133.97 | 340d7ce361e0 |
| haproxy | Service Status | OK | N/A | 192.168.133.97 | 340d7ce361e0 |
| haproxy | Firewall Port | OK | N/A | 192.168.133.97 | 340d7ce361e0 |
| haproxy | Port Connection | OK | N/A | 192.168.133.98 | 4a6015c9362a |
| haproxy | Service Status | OK | N/A | 192.168.133.98 | 4a6015c9362a |
| haproxy | Firewall Port | OK | N/A | 192.168.133.98 | 4a6015c9362a |
| keepalived | Service Status | OK | N/A | 192.168.133.97 | N/A |
| keepalived | Service Status | OK | N/A | 192.168.133.98 | N/A |
| nats | Port Connection | OK | N/A | 192.168.133.97 | 991873bb3420 |
| nats | Service Status | OK | N/A | 192.168.133.97 | 991873bb3420 |
| nats | Firewall Port | OK | N/A | 192.168.133.97 | 991873bb3420 |
| nats | Port Connection | OK | N/A | 192.168.133.98 | 016438fe2d17 |
| nats | Service Status | OK | N/A | 192.168.133.98 | 016438fe2d17 |
| nats | Firewall Port | OK | N/A | 192.168.133.98 | 016438fe2d17 |
| minio | Port Connection | OK | N/A | 192.168.133.93 | 0c3c86e896c6 |
| minio | Service Status | OK | N/A | 192.168.133.93 | 0c3c86e896c6 |
| minio | Firewall Port | OK | N/A | 192.168.133.93 | 0c3c86e896c6 |
| minio | Port Connection | OK | N/A | 192.168.133.97 | 604fc5ce14a3 |
| minio | Service Status | OK | N/A | 192.168.133.97 | 604fc5ce14a3 |
| minio | Firewall Port | OK | N/A | 192.168.133.97 | 604fc5ce14a3 |
| minio | Port Connection | OK | N/A | 192.168.133.98 | 0c2ae986076e |
| minio | Service Status | OK | N/A | 192.168.133.98 | 0c2ae986076e |
| minio | Firewall Port | OK | N/A | 192.168.133.98 | 0c2ae986076e |
| minio | Port Connection | OK | N/A | 192.168.133.99 | 67a7681a40b4 |
| minio | Service Status | OK | N/A | 192.168.133.99 | 67a7681a40b4 |
| minio | Firewall Port | OK | N/A | 192.168.133.99 | 67a7681a40b4 |
| mariadb | Port Connection | OK | N/A | 192.168.133.97 | 40e9915a3cf4 |
| mariadb | Service Status | OK | N/A | 192.168.133.97 | 40e9915a3cf4 |
| mariadb | Firewall Port | OK | N/A | 192.168.133.97 | 40e9915a3cf4 |
+----------------+-----------------+--------+------------------------------+----------------+--------------+
2.8.3.1.2 Upgrade RDAF Infra Services
- Upgrade haproxy service using below command
- Please use the below mentioned command to verify that haproxy is up and in the Running state.
Warning
Please verify RDAF portal access to make sure it is accessible after the haproxy service is upgraded, before proceeding to the next step.
- Upgrade nats service using below command
- Please use the below mentioned command and wait till all of the nats pods are in Running state and Ready status is 2/2
Tip
If the nats service upgrade fails with a PodDisruptionBudget policy version error message, please update the below file, setting apiVersion to policy/v1beta1
vi /home/rdauser/.local/lib/python3.7/site-packages/rdaf/deployments/helm/rda-nats/files/pod-disruption-budget.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
{{- include "nats.metadataNamespace" $ | nindent 2 }}
name: {{ .Values.podDisruptionBudget.name }}
labels:
{{- include "nats.labels" $ | nindent 4 }}
....
Run the nats service upgrade command.
- Upgrade minio service using below command
- Please use the below mentioned command and wait till all of the minio pods are in Running state and Ready status is 1/1
- Upgrade redis service using below command
- Please use the below mentioned command and wait till all of the redis pods are in Running state and Ready status is 1/1
- Upgrade opensearch service using below command
- Please use the below mentioned command and wait till all of the opensearch pods are in Running state and Ready status is 1/1
Run the below command to get RDAF Infra services details
Danger
Upgrading both kafka and mariadb infra services require a downtime to the RDAF platform and application services.
Please proceed to the below steps only after scheduled downtime is approved.
Please download the MariaDB upgrade scripts:
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/AP_7.4.1_migration_ddl_version_from_20_to_22.ql
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/AP_7.4.1_copy_history_data_version_from_20_to_22.ql
Stop RDAF Application Services:
- To stop the rda-webhook-server application service, run the below command and wait for 60 seconds. This step helps stop the incoming webhook alerts and allows the rest of the application services to finish processing the in-transit alerts.
- To stop all of the Application services, run the below command.
- Check the Application services status. When all of the application services are stopped, the command will show an empty output.
Upgrade kafka Service:
- Please run the below upgrade script rdaf_upgrade_1110_121.py. This script will clear all of the Kafka and Zookeeper data under the mount points /kafka-logs and /zookeeper, and delete the Kubernetes (k8s) pods, Helm charts, persistent volumes (PV), and persistent volume claims (PVC) configuration. After this step, it will uninstall the Kafka and Zookeeper services.
- Please run the below command to check that the kafka and zookeeper services are uninstalled.
- Install kafka service using below command.
- Please run the below command and wait till all of the kafka pods are in Running state and the Ready status is 1/1
- Please run the below command to create necessary Kafka Topics and corresponding configuration.
Upgrade mariadb Service:
- To stop the mariadb services, run the below command and wait until all of the services are stopped.
- Please run the below command to check that the mariadb pods are down.
- Upgrade the mariadb service using the below command.
- Please run the below command and wait until all of the mariadb pods are in the Running state and the Ready status is 1/1.
Warning
Please wait till all of the Kafka and MariaDB infra service pods are in the Running state and the Ready status is 1/1.
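A generic pod-watch for the Running/Ready checks above, assuming the rda-fabric namespace used elsewhere in this guide:
kubectl get pods -n rda-fabric | grep -E 'kafka|mariadb'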
- Run the below commands to check the status of the mariadb cluster. Please verify that the cluster state is in Synced state.
MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`
MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`
MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -e "show status like 'wsrep_local_state_comment';"
+---------------------------+--------+
| Variable_name | Value |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+
Run the below commands to check the cluster size of the mariadb cluster. Please verify that the cluster size is 3.
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'";
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
- Please run the below commands to drop the indexes on two alert tables of AIOps application services.
MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`
MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`
MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`
TENANT_ID=`cat /opt/rdaf/rdaf.cfg | grep external_user | awk '{print $3}' | cut -f1 -d'.'`
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor -e "DROP INDEX IF EXISTS AlertAlternateKey on alert;"
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor -e "DROP INDEX IF EXISTS AlertHistoryAlternateKey on alerthistory;"
Warning
Please make sure above commands are executed successfully, before continuing to the below step.
- Please run the below command to upgrade the DB schema configuration of the mariadb service after the 1.0.3 version upgrade.
- Please run the below RDAF command to check infra services status
+--------------------------+----------------+-----------------+--------------+------------------------------+
| Name | Host | Status | Container Id | Tag |
+--------------------------+----------------+-----------------+--------------+------------------------------+
| haproxy | 192.168.131.41 | Up 16 hours | e2b3b46f702d | 1.0.3.1 |
| haproxy | 192.168.131.42 | Up 5 hours | a89fdd2c5299 | 1.0.3.1 |
| keepalived | 192.168.131.41 | active | N/A | N/A |
| keepalived | 192.168.131.42 | active | N/A | N/A |
| rda-nats | 192.168.131.41 | Up 16 Hours ago | 3682271b3b58 | 1.0.3 |
| rda-nats | 192.168.131.42 | Up 4 Hours ago | 1f3599cf7193 | 1.0.3 |
| rda-minio | 192.168.131.41 | Up 16 Hours ago | 80a865d27b2c | RELEASE.2023-09-30T07-02-29Z |
| rda-minio | 192.168.131.42 | Up 4 Hours ago | 22c7da5bc030 | RELEASE.2023-09-30T07-02-29Z |
| rda-minio | 192.168.131.43 | Up 3 Weeks ago | 1af5abda3061 | RELEASE.2023-09-30T07-02-29Z |
| rda-minio | 192.168.131.48 | Up 3 Weeks ago | 7eec14f4ce0e | RELEASE.2023-09-30T07-02-29Z |
| rda-mariadb | 192.168.131.41 | Up 16 Hours ago | 2596eaddb435 | 1.0.3 |
| rda-mariadb | 192.168.131.42 | Up 4 Hours ago | c004da615516 | 1.0.3 |
| rda-mariadb | 192.168.131.43 | Up 2 Weeks ago | b49f33d491d6 | 1.0.3 |
| rda-opensearch | 192.168.131.41 | Up 16 Hours ago | 5595347d56d6 | 1.0.3 |
...
...
+--------------------------+----------------+-----------------+--------------+------------------------------+
- Please run the below commands to create a copy of the alert and alerthistory tables of the rda-alert-processor service DB as a backup and update the schema.
MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`
MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`
MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`
TENANT_ID=`cat /opt/rdaf/rdaf.cfg | grep external_user | awk '{print $3}' | cut -f1 -d'.'`
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor < AP_7.4.1_migration_ddl_version_from_20_to_22.ql
- Please run the below commands to copy the data from the alert_bak and alerthistory_bak backup tables of the rda-alert-processor service DB back to the primary alert and alerthistory tables.
Note
The copy process may take some time depending on the amount of historical data in the alerthistory table. Please continue with the rest of the steps while the data is being copied.
MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`
MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`
MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`
TENANT_ID=`cat /opt/rdaf/rdaf.cfg | grep external_user | awk '{print $3}' | cut -f1 -d'.'`
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor < AP_7.4.1_copy_history_data_version_from_20_to_22.ql
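While the copy is running, progress can be tracked by comparing row counts between the primary and backup tables (a sketch; assumes the backup tables are named alert_bak and alerthistory_bak as described above):
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor -e "SELECT (SELECT COUNT(*) FROM alerthistory) AS copied_rows, (SELECT COUNT(*) FROM alerthistory_bak) AS total_rows;"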
Installing GraphDB Service:
Tip
Please skip the below step if GraphDB service is NOT going to be installed.
Warning
For installing the GraphDB service, please add an additional disk to the RDA Fabric Infrastructure VM (refer to the disk addition steps in this guide). This is a pre-requisite and needs to be completed before installing the GraphDB service.
- Please use the below mentioned command and wait till all of the arangodb pods are in the Running state.
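A minimal way to watch for this (a sketch; assumes the arangodb pods run in the rda-fabric namespace used elsewhere in this guide):
kubectl get pods -n rda-fabric | grep arangodb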
2.8.3.2 Upgrade RDAF Platform Services
Step-1: Run the below command to initiate upgrading RDAF Platform services.
As the upgrade procedure is a non-disruptive upgrade, it puts the currently running PODs into Terminating state and the newer version PODs into Pending state.
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each Platform service is in Terminating state.
Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It will list all of the POD IDs of the platform services along with the rdac maintenance command that is required to put them in maintenance mode.
Note
If the maint_command.py script doesn't exist on the RDAF deployment CLI VM, it can be downloaded using the below command.
Step-4: Copy & paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.
Step-6: Run the below command to delete the Terminating RDAF platform service PODs.
for i in `kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
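To confirm the forced deletion completed, the same pod listing can be re-checked; no platform PODs should remain in the Terminating state (the label is taken from the delete loop above):
kubectl get pods -n rda-fabric -l app_category=rdaf-platform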
Note
Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Platform service PODs.
Please wait till all of the new platform service PODs are in the Running state and run the below command to verify their status and make sure all of them are running with the 3.4.1 version.
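Assuming the rdafk8s status subcommand style used for Kubernetes deployments:
rdafk8s platform status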
+---------------------+----------------+-----------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+---------------------+----------------+-----------------+--------------+-------+
| rda-api-server | 192.168.131.46 | Up 21 Hours ago | 98ec9561d787 | 3.4.1 |
| rda-api-server | 192.168.131.45 | Up 21 Hours ago | e7b7cdb7d3d2 | 3.4.1 |
| rda-registry | 192.168.131.44 | Up 21 Hours ago | bc2fed4a15f3 | 3.4.1 |
| rda-registry | 192.168.131.46 | Up 21 Hours ago | 1b6da7ff3ce2 | 3.4.1 |
| rda-identity | 192.168.131.45 | Up 21 Hours ago | 30053cf6667e | 3.4.1 |
| rda-identity | 192.168.131.46 | Up 21 Hours ago | 6ee2e6a861f7 | 3.4.1 |
| rda-fsm | 192.168.131.44 | Up 21 Hours ago | c014e84bf197 | 3.4.1 |
| rda-fsm | 192.168.131.46 | Up 21 Hours ago | 6a609f8ab579 | 3.4.1 |
+---------------------+----------------+-----------------+--------------+-------+
Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler service instances is elected as leader under the Site column.
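As elsewhere in this guide, the pod listing below comes from the RDAC CLI:
rdac pods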
+-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age     | CPUs   | Memory(GB)   | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| Infra | api-server                             | True        | rda-api-server | 5081891f |             | 0:29:54 | 8      | 31.33        |               |              |
| Infra | api-server                             | True        | rda-api-server | 9fc5db97 |             | 0:29:52 | 8      | 31.33        |               |              |
| Infra | collector                              | True        | rda-collector- | f9b6a00d |             | 0:30:00 | 8      | 31.33        |               |              |
| Infra | collector                              | True        | rda-collector- | 0a4eb8cd |             | 0:30:01 | 8      | 31.33        |               |              |
| Infra | registry                               | True        | rda-registry-7 | 758fc2cb |             | 0:30:51 | 8      | 31.33        |               |              |
| Infra | registry                               | True        | rda-registry-7 | 3d56a31f |             | 0:28:49 | 8      | 31.33        |               |              |
| Infra | scheduler                              | True        | rda-scheduler- | 8b570be5 |             | 0:30:44 | 8      | 31.33        |               |              |
| Infra | scheduler                              | True        | rda-scheduler- | 44930ac7 | *leader*    | 0:30:47 | 8      | 31.33        |               |              |
| Infra | worker                                 | True        | rda-worker-69d | 91615244 | rda-site-01 | 0:25:30 | 8      | 31.33        | 0             | 9            |
| Infra | worker                                 | True        | rda-worker-69d | af99d199 | rda-site-01 | 0:25:31 | 8      | 31.33        | 2             | 14           |
+-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+
Run the below command to check that all services have an ok status and do not report any failure messages.
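As in the earlier verification steps, the health listing comes from the RDAC CLI:
rdac healthcheck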
Warning
For Non-Kubernetes deployments, upgrading the RDAF Platform and AIOps application services is a disruptive operation. Please schedule a maintenance window before upgrading the RDAF Platform and AIOps services to a newer version.
- To stop application services, run the below command. Wait until all of the services are stopped.
- To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
- To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
Run the below command to initiate upgrading RDAF Platform services.
Please wait till all of the new platform services are in the Up state and run the below command to verify their status and make sure all of them are running with the 3.4.1 version.
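Assuming the same rdaf status subcommand style as the infra services check above:
rdaf platform status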
+--------------------------+----------------+------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+--------------------------+----------------+------------+--------------+-------+
| rda_api_server | 192.168.133.92 | Up 2 hours | 6366c9717f07 | 3.4.1 |
| rda_api_server | 192.168.133.93 | Up 2 hours | d5b8c2722f72 | 3.4.1 |
| rda_registry | 192.168.133.92 | Up 2 hours | 47f722aab97b | 3.4.1 |
| rda_registry | 192.168.133.93 | Up 2 hours | f5ce662af82f | 3.4.1 |
| rda_scheduler | 192.168.133.92 | Up 2 hours | 28b597777069 | 3.4.1 |
| rda_scheduler | 192.168.133.93 | Up 2 hours | 2d70a4ac184e | 3.4.1 |
| rda_collector | 192.168.133.92 | Up 2 hours | 637a07f4df17 | 3.4.1 |
| rda_collector | 192.168.133.93 | Up 2 hours | 478167b3952a | 3.4.1 |
| rda_asset_dependency | 192.168.133.92 | Up 2 hours | c910651896fe | 3.4.1 |
| rda_asset_dependency | 192.168.133.93 | Up 2 hours | c1ddfde81b13 | 3.4.1 |
| rda_identity | 192.168.133.92 | Up 2 hours | f70beaa486a6 | 3.4.1 |
| rda_identity | 192.168.133.93 | Up 2 hours | a726b0f154c8 | 3.4.1 |
| rda_fsm | 192.168.133.92 | Up 2 hours | 87b26529566a | 3.4.1 |
| rda_fsm | 192.168.133.93 | Up 2 hours | 13891be75c05 | 3.4.1 |
+--------------------------+----------------+------------+--------------+-------+
Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler service instances is elected as leader under the Site column.
Run the below command to check that all services have an ok status and do not report any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | service-status | ok | |
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | service-initialization-status | ok | |
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | kafka-connectivity | ok | Cluster=ZTRlZGFmZjhkZDFiMTFlZQ, Broker=1, Brokers=[1, 2, 3] |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | service-status | ok | |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | minio-connectivity | ok | |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | service-initialization-status | ok | |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | kafka-connectivity | ok | Cluster=ZTRlZGFmZjhkZDFiMTFlZQ, Broker=2, Brokers=[1, 2, 3] |
| rda_app | alert-processor | 2afde67935ac | 33170bc7 | | service-status | ok | |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.8.3.3 Upgrade rdac CLI
2.8.3.4 Upgrade OIA Application Services
Step-1: Run the below commands to initiate upgrading the RDAF OIA Application services.
Step-2: Run the below command to check the status of the newly upgraded PODs.
Please wait till all of the new OIA application service PODs are in the Running state and run the below command to verify their status and make sure they are running with the 7.4.1 version.
+-------------------------------+----------------+----------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+-------------------------------+----------------+----------------+--------------+-------+
| rda-alert-ingester | 192.168.131.50 | Up 4 Hours ago | 013e6fb89274 | 7.4.1 |
| rda-alert-ingester | 192.168.131.49 | Up 4 Hours ago | ce269889fe6c | 7.4.1 |
| rda-alert-processor-companion | 192.168.131.49 | Up 4 Hours ago | b4bca9347589 | 7.4.1 |
| rda-alert-processor-companion | 192.168.131.50 | Up 4 Hours ago | 1c530b32c563 | 7.4.1 |
| rda-alert-processor | 192.168.131.47 | Up 4 Hours ago | b0e25a38c72d | 7.4.1 |
| rda-alert-processor | 192.168.131.46 | Up 4 Hours ago | 2a5b0f764cfd | 7.4.1 |
| rda-app-controller | 192.168.131.50 | Up 4 Hours ago | 0261820f6e01 | 7.4.1 |
| rda-app-controller | 192.168.131.46 | Up 4 Hours ago | 134844ff7208 | 7.4.1 |
| rda-collaboration | 192.168.131.50 | Up 4 Hours ago | e5e196b74462 | 7.4.1 |
| rda-collaboration | 192.168.131.46 | Up 4 Hours ago | ed4ec37435b7 | 7.4.1 |
| rda-configuration-service | 192.168.131.46 | Up 4 Hours ago | 74e22e5ddee1 | 7.4.1 |
| rda-configuration-service | 192.168.131.50 | Up 4 Hours ago | b09637691cbd | 7.4.1 |
+-------------------------------+----------------+----------------+--------------+-------+
Run the below command to verify that one instance of the cfxdimensions-app-irm_service has leader status under the Site column.
+-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| App | alert-ingester | True | rda-alert-inge | 7861bd4f | | 4:20:52 | 8 | 31.33 | | |
| App | alert-ingester | True | rda-alert-inge | 4abc521f | | 4:20:52 | 8 | 31.33 | | |
| App | alert-processor | True | rda-alert-proc | 9bf94e67 | | 4:20:50 | 8 | 31.33 | | |
| App | alert-processor | True | rda-alert-proc | 4e679139 | | 4:20:48 | 8 | 31.33 | | |
| App | alert-processor-companion | True | rda-alert-proc | 745dfbb9 | | 4:20:39 | 8 | 31.33 | | |
| App | alert-processor-companion | True | rda-alert-proc | 02f6bce0 | | 4:20:41 | 8 | 31.33 | | |
| App | asset-dependency | True | rda-asset-depe | fc6c7a60 | | 4:28:00 | 8 | 31.33 | | |
| App | asset-dependency | True | rda-asset-depe | d3ca4c11 | | 4:27:07 | 8 | 31.33 | | |
| App | authenticator | True | rda-identity-6 | 4cd59d9c | | 4:27:01 | 8 | 31.33 | | |
| App | authenticator | True | rda-identity-6 | 174298c3 | | 4:25:53 | 8 | 31.33 | | |
| App | cfx-app-controller | True | rda-app-contro | 4d923832 | | 4:20:42 | 8 | 31.33 | | |
| App | cfx-app-controller | True | rda-app-contro | b16deafa | | 4:20:25 | 8 | 31.33 | | |
| App | cfxdimensions-app-access-manager | True | rda-access-man | 09d1fada | | 4:27:56 | 8 | 31.33 | | |
| App | cfxdimensions-app-access-manager | True | rda-access-man | e0af2bcc | | 4:27:54 | 8 | 31.33 | | |
| App | cfxdimensions-app-collaboration | True | rda-collaborat | 9e7f7bcb | | 4:20:31 | 8 | 31.33 | | |
| App | cfxdimensions-app-collaboration | True | rda-collaborat | 38db5386 | | 4:20:25 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | rda-file-brows | 589e18f8 | | 4:20:20 | 8 | 31.33 | | |
| App | cfxdimensions-app-file-browser | True | rda-file-brows | 853545f8 | | 4:19:59 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | rda-irm-servic | d17f8dcd | | 4:20:06 | 8 | 31.33 | | |
| App | cfxdimensions-app-irm_service | True | rda-irm-servic | 44decaa7 | *leader* | 4:19:41 | 8 | 31.33 | | |
| App | cfxdimensions-app-notification-service | True | rda-notificati | 74e58855 | | 4:20:14 | 8 | 31.33 | | |
+-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+
Run the below command to check that all services have an ok status and do not report any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | rda-alert-in | 4abc521f | | service-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 4abc521f | | minio-connectivity | ok | |
| rda_app | alert-ingester | rda-alert-in | 4abc521f | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | rda-alert-in | 4abc521f | | service-initialization-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 4abc521f | | kafka-connectivity | ok | Cluster=IrA5ccri7mBeUvhzvrimEg, Broker=0, Brokers=[0, 1, 2] |
| rda_app | alert-ingester | rda-alert-in | 7861bd4f | | service-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 7861bd4f | | minio-connectivity | ok | |
| rda_app | alert-ingester | rda-alert-in | 7861bd4f | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | rda-alert-in | 7861bd4f | | service-initialization-status | ok | |
| rda_app | alert-ingester | rda-alert-in | 7861bd4f | | kafka-connectivity | ok | Cluster=IrA5ccri7mBeUvhzvrimEg, Broker=2, Brokers=[0, 1, 2] |
| rda_app | alert-processor | rda-alert-pr | 4e679139 | | service-status | ok | |
| rda_app | alert-processor | rda-alert-pr | 4e679139 | | minio-connectivity | ok | |
| rda_app | alert-processor | rda-alert-pr | 4e679139 | | service-dependency:cfx-app-controller | ok | 2 pod(s) found for cfx-app-controller |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
Run the below commands to initiate upgrading the RDA Fabric OIA Application services.
Please wait till all of the new OIA application service containers are in the Up state and run the below command to verify their status and make sure they are running with the 7.4.1 version.
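Assuming the same rdaf status subcommand style as the infra and platform checks above:
rdaf app status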
+-----------------------------------+----------------+------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+-----------------------------------+----------------+------------+--------------+-------+
| cfx-rda-app-controller | 192.168.133.96 | Up 2 hours | deab59a554f6 | 7.4.1 |
| cfx-rda-app-controller | 192.168.133.92 | Up 2 hours | 7e3cbfc6d899 | 7.4.1 |
| cfx-rda-reports-registry | 192.168.133.96 | Up 2 hours | 934ef236dde2 | 7.4.1 |
| cfx-rda-reports-registry | 192.168.133.92 | Up 2 hours | 8749187dfb82 | 7.4.1 |
| cfx-rda-notification-service | 192.168.133.96 | Up 2 hours | eaaa0116b25c | 7.4.1 |
| cfx-rda-notification-service | 192.168.133.92 | Up 2 hours | 7f5b91f6b166 | 7.4.1 |
| cfx-rda-file-browser | 192.168.133.96 | Up 2 hours | 62ba48307a89 | 7.4.1 |
| cfx-rda-file-browser | 192.168.133.92 | Up 2 hours | ad83ab7f2611 | 7.4.1 |
| cfx-rda-configuration-service | 192.168.133.96 | Up 2 hours | 6f24b3296c44 | 7.4.1 |
| cfx-rda-configuration-service | 192.168.133.92 | Up 2 hours | ad93c6ddf2bc | 7.4.1 |
| cfx-rda-alert-ingester | 192.168.133.96 | Up 2 hours | 9132494ea9ab | 7.4.1 |
| cfx-rda-alert-ingester | 192.168.133.92 | Up 2 hours | f5312c1fc474 | 7.4.1 |
+-----------------------------------+----------------+------------+--------------+-------+
Run the below command to verify that one instance of the cfxdimensions-app-irm_service has leader status under the Site column.
+-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| App | alert-ingester | True | 9132494ea9ab | ad43cf79 | | 1:56:34 | 4 | 31.21 | | |
| App | alert-ingester | True | f5312c1fc474 | 2a129b31 | | 1:56:21 | 4 | 31.21 | | |
| App | alert-processor | True | 2afde67935ac | 33170bc7 | | 1:54:29 | 4 | 31.21 | | |
| App | alert-processor | True | f289e1088a16 | 831fe5c3 | | 1:54:14 | 4 | 31.21 | | |
| App | alert-processor-companion | True | 83ebf4300ac5 | c9dba0df | | 1:47:44 | 4 | 31.21 | | |
| App | alert-processor-companion | True | 9b1b55d78d1a | a66ecf29 | | 1:47:29 | 4 | 31.21 | | |
| App | asset-dependency | True | c1ddfde81b13 | 985fc496 | | 2:20:03 | 4 | 31.21 | | |
| App | asset-dependency | True | c910651896fe | 9c355c7d | | 2:20:06 | 4 | 31.21 | | |
| App | authenticator | True | f70beaa486a6 | 955eb254 | | 2:19:59 | 4 | 31.21 | | |
| App | authenticator | True | a726b0f154c8 | 898c36b4 | | 2:19:57 | 4 | 31.21 | | |
| App | cfx-app-controller | True | 7e3cbfc6d899 | 2097a877 | | 1:58:49 | 4 | 31.21 | | |
| App | cfx-app-controller | True | deab59a554f6 | 3bd4ce27 | | 1:59:02 | 4 | 31.21 | | |
| App | cfxdimensions-app-access-manager | True | f47c6cab13f1 | e0636eea | | 2:19:32 | 4 | 31.21 | | |
| App | cfxdimensions-app-access-manager | True | 02b526adf7f9 | 7a286ce7 | | 2:19:23 | 4 | 31.21 | | |
| App | cfxdimensions-app-collaboration | True | b602c2cddd90 | 836e0134 | | 1:53:02 | 4 | 31.21 | | |
| App | cfxdimensions-app-collaboration | True | 2f02987f249d | c4d4720d | | 1:48:31 | 4 | 31.21 | | |
| App | cfxdimensions-app-file-browser | True | 62ba48307a89 | 48d1d0d2 | | 1:57:34 | 4 | 31.21 | | |
| App | cfxdimensions-app-file-browser | True | ad83ab7f2611 | 93078496 | | 1:57:14 | 4 | 31.21 | | |
| App | cfxdimensions-app-irm_service | True | 56dffc7d6501 | 672ff70a | *leader* | 1:53:57 | 4 | 31.21 | | |
| App | cfxdimensions-app-irm_service | True | b40a96601c73 | 25fe51f5 | | 1:53:42 | 4 | 31.21 | | |
+-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------+
Run the below command to check that all services have an ok status and do not report any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | service-status | ok | |
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | service-initialization-status | ok | |
| rda_app | alert-ingester | 9132494ea9ab | ad43cf79 | | kafka-connectivity | ok | Cluster=ZTRlZGFmZjhkZDFiMTFlZQ, Broker=1, Brokers=[1, 2, 3] |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | service-status | ok | |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | minio-connectivity | ok | |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | service-initialization-status | ok | |
| rda_app | alert-ingester | f5312c1fc474 | 2a129b31 | | kafka-connectivity | ok | Cluster=ZTRlZGFmZjhkZDFiMTFlZQ, Broker=3, Brokers=[1, 2, 3] |
| rda_app | alert-processor | 2afde67935ac | 33170bc7 | | service-status | ok | |
| rda_app | alert-processor | 2afde67935ac | 33170bc7 | | minio-connectivity | ok | |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
2.8.3.5 Upgrade RDA Worker Services
Step-1: Please run the below command to initiate upgrading the RDA Worker service PODs.
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each RDA Worker service POD is in Terminating state.
NAME READY STATUS RESTARTS AGE
rda-worker-69d485f476-99tnv 1/1 Running 0 45h
rda-worker-69d485f476-gwq4f 1/1 Running 0 45h
Step-3: Run the below command to put all Terminating RDAF worker service PODs into maintenance mode. It will list all of the POD IDs of the RDA worker services along with the rdac maintenance command that is required to put them in maintenance mode.
Step-4: Copy & paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF worker services.
Step-6: Run the below command to delete the Terminating RDAF worker service PODs.
for i in `kubectl get pods -n rda-fabric -l app_component=rda-worker | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
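The replacement worker PODs can then be followed as they come up with a watch on the same label used in the delete loop above:
kubectl get pods -n rda-fabric -l app_component=rda-worker -w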
Note
Wait for 120 seconds between each RDAF worker service upgrade by repeating the above steps from Step-2 to Step-6 for the rest of the RDAF worker service PODs.
Step-7: Please wait for 120 seconds to let the newer version of the RDA Worker service PODs join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service PODs.
+------------+----------------+-----------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+------------+----------------+-----------------+--------------+-------+
| rda-worker | 192.168.131.45 | Up 19 Hours ago | 6360f61b4249 | 3.4.1 |
| rda-worker | 192.168.131.44 | Up 19 Hours ago | 806b7b334943 | 3.4.1 |
+------------+----------------+-----------------+--------------+-------+
Step-8: Run the below command to check that all RDA Worker services have an ok status and do not report any failure messages.
- Upgrade RDA Worker Services
Please run the below command to initiate upgrading the RDA Worker service containers.
Note
If the worker is deployed in a proxy environment, please add the required proxy environment variables in /opt/rdaf/deployment-scripts/values.yaml, under the section rda_worker -> env:, instead of making changes to worker.yaml (this is needed only if there are any new changes needed for the worker).
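A minimal sketch of that values.yaml section (the proxy URL and exclusion list are placeholders to adapt to your environment; the rda_worker/env layout follows the note above):
rda_worker:
  env:
    http_proxy: "http://proxy.example.com:3128"
    https_proxy: "http://proxy.example.com:3128"
    no_proxy: "localhost,127.0.0.1,.cluster.local"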
Please wait for 120 seconds to let the newer version of RDA Worker service containers join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service containers.
+------------+----------------+------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+------------+----------------+------------+--------------+-------+
| rda_worker | 192.168.133.96 | Up 2 hours | 03061dd8dfcc | 3.4.1 |
| rda_worker | 192.168.133.92 | Up 2 hours | cbb31b875cf6 | 3.4.1 |
+------------+----------------+------------+--------------+-------+
2.8.4 Post Upgrade Steps
2.8.4.1 OIA
1. Deploy latest Alerts and Incidents Dashboard configuration
Go to Main Menu --> Configuration --> RDA Administration --> Bundles --> Select oia_l1_l2_bundle and click on the Deploy action to deploy the latest Dashboards configuration for Alerts and Incidents.
Warning
It is mandatory to deploy the oia_l1_l2_bundle (Alerts and Incidents Dashboards configuration) as the existing dashboard configuration for the same has been deprecated.
After deploying the oia_l1_l2_bundle, within each Incident dashboard page, the below pages are enabled by default irrespective of whether the corresponding features are configured or not.
- Alerts
- Topology
- Metrics
- Insights
- Collaboration
- Diagnostics
- Remediation
- Activities
Within each Incident page, the Alerts and Collaboration pages are mandatory, while the rest of the pages are optional until they are configured within the system.
If you need to remove these optional pages from the default Incident's view dashboard, please follow the below steps.
Go to Main Menu --> Configuration --> RDA Administration --> Dashboards --> User Dashboards --> Edit the JSON config of the incident-details-app dashboard and delete the below JSON configuration blocks for the pages you want to remove.
....
....
"dashboard_pages": [
{
"name": "incident-details-alerts",
"label": "Alerts",
"icon": "alert.svg"
},
{
"name": "incident-details-topology",
"label": "Topology",
"icon": "topology.svg"
},
{
"name": "incident-details-metrics",
"label": "Metrics",
"icon": "metrics.svg"
},
{
"name": "incident-details-insights",
"label": "Insights",
"icon": "nextSteps.svg"
},
{
"name": "incident-details-collaboration",
"label": "Collaboration",
"icon": "collaboration.svg"
},
{
"name": "incident-details-diagnostics",
"label": "Diagnostics",
"icon": "diagnostic.svg"
},
{
"name": "incident-details-remediation",
"label": "Remediation",
"icon": "remedial.svg"
},
{
"name": "incident-details-activities",
"label": "Activities",
"icon": "activities.svg"
}
....
....
Note
Please note that the deleted configuration blocks for Topology, Metrics, Insights, Diagnostics, Remediation, and Activities can be added back once the corresponding features are configured within the system.