Bots From Extension: cfxai_regression
CloudFabrix ML - Regression
This extension provides 5 bots.
Bot @cfxml:regression
Bot Position In Pipeline: Sink
ML Regression for a single timeseries dataset.
This bot expects a Restricted CFXQL.
Each parameter may be specified using '=' operator and AND logical operation
Following are the parameters expected for this Bot
Parameter Name | Type | Default Value | Description |
---|---|---|---|
frequency | Text | 1H | Bucketization interval (or frequency). Default is 1 hour. |
ts_column* | Text | timestamp | Timestamp column name |
ts_format | Text | auto | Timestamp column format. Valid values: 'auto', 's', 'ms', 'ns', 'datetimestr' |
value_column | Text | | Numerical value column name. If no value column is provided, the time column is aggregated by 'agg_func' in that interval |
job_name | Text | default | Name of the Job to be created. Default job name is 'default' |
action | Text | train | 'train' or 'predict' |
bucketing | Text | True | Bucket data into 'frequency' intervals. Valid values 'True', 'False'. Default is 'True' |
agg_func | Text | sum | Aggregation function to aggregate value column for each bucket. Example values: 'mean', 'sum', 'count' |
prediction_duration | Text | 7D | Duration to forecast. Values accepted in frequency terms '1H', '1D', '1W', '1M' |
lower_threshold | Text | | Static lower threshold value for detecting anomalies |
upper_threshold | Text | | Static upper threshold value for detecting anomalies |
lower_threshold_factor | Text | 1.0 | Divides lowerBound by this factor while computing anomalies |
upper_threshold_factor | Text | 1.0 | Multiplies upperBound by this factor while computing anomalies |
ignore_anomaly | Text | | Ignore upper or lower anomalies. Accepted values: 'upper' and 'lower' |
changepoint | Text | 0.01 | Parameter to tweak sensitivity towards trend change. Use higher values to make it more sensitive |
interval_width | Text | 0.8 | Parameter to tweak the range between upper and lower bounds. The higher the value, the wider the range. Use values between 0 and 1 |
live_data_label | Text | | Label for plotting live data. If used, timestamps in the model won't be marked as predicted |
timeseries_y_axis_label | Text | Value | Label for Y axis of timeseries chart |
skip_errors | Text | no | Specify 'yes' or 'no'. If 'yes', do not bail out if regression results in an error. Check the 'reason' field when it continues with an error. |
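The parameter convention described above ('=' pairs combined with a logical AND) can be illustrated with a small Python helper. This is only a sketch: the function name `build_bot_query` is hypothetical, the single-quoting of values and the lowercase `and` keyword are assumptions about what Restricted CFXQL accepts, and the column name `count` is a made-up example.

```python
def build_bot_query(bot, params):
    """Join parameters as name = 'value' pairs with a logical AND.

    Assumption: values are single-quoted and clauses are joined with the
    keyword 'and'. Verify the exact syntax against the CFXQL documentation.
    """
    clauses = []
    for name, value in params.items():
        text = str(value)
        if "'" in text:
            # Escaping rules are not covered here, so reject such values.
            raise ValueError(f"single quote not supported in value for {name!r}")
        clauses.append(f"{name} = '{text}'")
    return f"{bot} " + " and ".join(clauses)


query = build_bot_query(
    "@cfxml:regression",
    {
        "ts_column": "timestamp",
        "value_column": "count",  # hypothetical column name
        "frequency": "1H",
        "action": "train",
        "job_name": "default",
    },
)
print(query)
```

Keeping the parameters in a dictionary makes it easy to reuse the same settings for a later run with `action` set to `predict`.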
Bot @cfxml:regression-bulk-anomalies
Bot Position In Pipeline: Sink
ML regression anomaly prediction for multiple timeseries datasets. Input can be multiple time series, identified by the data_label column. Training rules must be specified via the regression_rules_dataset parameter. Produces output in the column anomaly_status.
This bot expects a Restricted CFXQL.
Each parameter may be specified using '=' operator and AND logical operation
Following are the parameters expected for this Bot
Parameter Name | Type | Default Value | Description |
---|---|---|---|
regression_rules_dataset* | Text | | Regression rules dataset name. Must contain columns frequency, ts_column, value_column, lower_min, lower_max |
trained_model_dataset* | Text | timestamp | Name of the dataset where trained model will be loaded from. |
Bot @cfxml:regression-bulk-train
Bot Position In Pipeline: Sink
ML regression training for multiple timeseries datasets. Input can be multiple time series, identified by the data_label column. Training rules must be specified via the regression_rules_dataset parameter.
This bot expects a Restricted CFXQL.
Each parameter may be specified using '=' operator and AND logical operation
Following are the parameters expected for this Bot
Parameter Name | Type | Default Value | Description |
---|---|---|---|
regression_rules_dataset* | Text | | Regression rules dataset name. Must contain columns frequency, ts_column, value_column, lower_min, lower_max |
output_model_dataset* | Text | timestamp | Name of the dataset where trained model will be saved. |
output_status_dataset | Text | | Name of the dataset where training status will be saved |
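Both bulk bots require the rules dataset to contain the columns frequency, ts_column, value_column, lower_min and lower_max. A quick pre-flight check along these lines may help catch a malformed rules dataset before running a pipeline. This is a minimal sketch, assuming the rules are available as a list of dictionaries; the helper name `missing_rule_columns` and the sample values are hypothetical and are not part of the extension.

```python
REQUIRED_RULE_COLUMNS = (
    "frequency",
    "ts_column",
    "value_column",
    "lower_min",
    "lower_max",
)


def missing_rule_columns(rows):
    """Return the required columns that are absent from at least one row."""
    missing = set()
    for row in rows:
        missing.update(col for col in REQUIRED_RULE_COLUMNS if col not in row)
    # Preserve the documented column order in the result.
    return [col for col in REQUIRED_RULE_COLUMNS if col in missing]


# Placeholder values only; the meaning of lower_min and lower_max is not
# defined in this page, so the numbers below are purely illustrative.
rules = [
    {
        "frequency": "1H",
        "ts_column": "timestamp",
        "value_column": "cpu_usage",
        "lower_min": 0,
        "lower_max": 10,
    }
]
print(missing_rule_columns(rules))
```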
Bot @cfxml:regression-multi-proc
Bot Position In Pipeline: Sink
ML Regression for multiple timeseries datasets using Parallel Processing.
This bot expects a Restricted CFXQL.
Each parameter may be specified using '=' operator and AND logical operation
Following are the parameters expected for this Bot
Parameter Name | Type | Default Value | Description |
---|---|---|---|
frequency | Text | 1H | Bucketization interval (or frequency). Default is 1 hour. |
ts_column* | Text | timestamp | Timestamp column name |
value_column | Text | | Numerical value column name. If no value column is provided, the time column is aggregated by 'agg_func' in that interval |
groupby* | Text | | Name of the column whose value uniquely identifies the time series for a group |
keep_columns | Text | | Comma-separated list of input column names to keep in the output |
job_name_column | Text | model_name | Column name consisting of job/model names |
action | Text | train | 'train' or 'predict' |
bucketing | Text | True | Bucket data into 'frequency' intervals. Valid values 'True', 'False'. Default is 'True' |
agg_func | Text | sum | Aggregation function to aggregate value column for each bucket. Example values: 'mean', 'sum', 'count' |
prediction_duration | Text | 7D | Duration to forecast. Values accepted in frequency terms '1H', '1D', '1W', '1M' |
lower_threshold | Text | | Static lower threshold value for detecting anomalies |
upper_threshold | Text | | Static upper threshold value for detecting anomalies |
lower_threshold_factor | Text | 1.0 | Divides lowerBound by this factor while computing anomalies |
upper_threshold_factor | Text | 1.0 | Multiplies upperBound by this factor while computing anomalies |
ignore_anomaly | Text | | Ignore upper or lower anomalies. Accepted values: 'upper' and 'lower' |
changepoint | Text | 0.01 | Parameter to tweak sensitivity towards trend change. Use higher values to make it more sensitive |
interval_width | Text | 0.8 | Parameter to tweak the range between upper and lower bounds. The higher the value, the wider the range. Use values between 0 and 1 |
live_data_label | Text | | Label for plotting live data. If used, timestamps in the model won't be marked as predicted |
timeseries_y_axis_label | Text | Value | Label for Y axis of timeseries chart |
skip_errors | Text | no | Specify 'yes' or 'no'. If 'yes', do not bail out if regression results in an error. Check the 'reason' field when it continues with an error. |
num_procs | Text | 2 | Maximum number of CPUs to use. 0 means all available CPUs. Value should be >= 0 |
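The bucketing and threshold-factor parameters shared by @cfxml:regression and @cfxml:regression-multi-proc can be pictured with a short Python sketch. It is not the bot's implementation: it only mirrors the descriptions in the tables (aggregate values per 'frequency' bucket, divide the lower bound by lower_threshold_factor, multiply the upper bound by upper_threshold_factor, optionally ignore one side). The function names, the epoch-second input format and the 'upper'/'lower'/'normal' labels are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Example aggregation functions named in the parameter table.
AGG_FUNCS = {"sum": sum, "mean": mean, "count": len}


def bucketize(points, bucket_seconds=3600, agg_func="sum"):
    """Group (epoch_seconds, value) pairs into fixed-width buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    agg = AGG_FUNCS[agg_func]
    return {start: agg(values) for start, values in sorted(buckets.items())}


def classify(value, lower_bound, upper_bound,
             lower_factor=1.0, upper_factor=1.0, ignore_anomaly=None):
    """Label a value against factor-adjusted bounds (hypothetical labels)."""
    lower = lower_bound / lower_factor   # lower_threshold_factor divides
    upper = upper_bound * upper_factor   # upper_threshold_factor multiplies
    if value > upper and ignore_anomaly != "upper":
        return "upper"
    if value < lower and ignore_anomaly != "lower":
        return "lower"
    return "normal"


points = [(0, 1.0), (1800, 2.0), (3600, 5.0)]
print(bucketize(points))                       # one-hour buckets, summed
print(classify(12, 2, 10, upper_factor=1.5))   # widened upper bound
```

With both factors left at 1.0 the bounds are used unchanged; factors above 1.0 widen the accepted range on the corresponding side.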
Bot @cfxml:update-regression-chart
Bot Position In Pipeline: Sink
Updates an existing regression chart with the 'action' provided. For action 'add_marker', provide 'timestamp', 'message', 'color' and 'job_name'. For action 'add_timeseries', provide 'ts_column', 'value_column', 'chart_type', 'color', 'label' and 'job_name'.
This bot expects a Restricted CFXQL.
Each parameter may be specified using '=' operator and AND logical operation
Following are the parameters expected for this Bot
Parameter Name | Type | Default Value | Description |
---|---|---|---|
action* | Text | timestamp | Specify the action to perform. Example: 'add_marker', 'add_timeseries' |
job_name* | Text | default | Name of the job or model to be updated. Default job name is 'default' |
ts_column | Text | timestamp | Timestamp column name |
ts_format | Text | auto | Timestamp column format. Valid values: 'auto', 's', 'ms', 'ns', 'datetimestr' |
value_column | Text | | Numerical value column name |
label | Text | value_column | Chart component label |
color | Text | #FFAC33 | HEX color code |
chart_type | Text | line | Chart type for the new component. Valid values: 'line', 'points' |
timestamp | Text | now | Marker timestamp |
message | Text | | Message to display on marker |
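The per-action parameter lists for @cfxml:update-regression-chart can be captured in a small lookup, which may be useful for checking a parameter set before it is passed to the bot. This is a sketch under stated assumptions: the helper name `missing_params` is hypothetical, and because several of these parameters have documented defaults, a parameter reported here is not necessarily rejected by the bot itself.

```python
# Parameters the bot description asks for, keyed by action.
REQUIRED_BY_ACTION = {
    "add_marker": ("timestamp", "message", "color", "job_name"),
    "add_timeseries": (
        "ts_column",
        "value_column",
        "chart_type",
        "color",
        "label",
        "job_name",
    ),
}


def missing_params(params):
    """List the parameters named for the chosen action that are not set."""
    action = params.get("action")
    if action not in REQUIRED_BY_ACTION:
        raise ValueError(f"unsupported action: {action!r}")
    return [name for name in REQUIRED_BY_ACTION[action] if name not in params]


marker = {"action": "add_marker", "job_name": "default", "message": "deploy"}
print(missing_params(marker))
```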