
Bots From Extension: cfxai_regression

CloudFabrix ML - Regression

This extension provides 5 bots.





Bot @cfxml:regression

Bot Position In Pipeline: Sink

ML Regression for a single timeseries dataset.

This bot expects a Restricted CFXQL.

Specify each parameter with the '=' operator and combine parameters with the logical AND operator.
This bot expects the following parameters:

| Parameter Name | Type | Default Value | Description |
|---|---|---|---|
| frequency | Text | 1H | Bucketization interval (or frequency). Default is 1 hour. |
| ts_column* | Text | timestamp | Timestamp column name |
| ts_format | Text | auto | Timestamp column format. Valid values: 'auto', 's', 'ms', 'ns', 'datetimestr' |
| value_column | Text | | Numerical value column name. If no value column is provided, the time column is aggregated by 'agg_func' in that interval |
| job_name | Text | default | Name of the job to be created. Default job name is 'default' |
| action | Text | train | 'train' or 'predict' |
| bucketing | Text | True | Bucket data into 'frequency' intervals. Valid values: 'True', 'False'. Default is 'True' |
| agg_func | Text | sum | Aggregation function applied to the value column in each bucket. Example values: 'mean', 'sum', 'count' |
| prediction_duration | Text | 7D | Duration to forecast. Accepts values in frequency terms: '1H', '1D', '1W', '1M' |
| lower_threshold | Text | | Static lower threshold value for detecting anomalies |
| upper_threshold | Text | | Static upper threshold value for detecting anomalies |
| lower_threshold_factor | Text | 1.0 | Divides lowerBound by this factor while computing anomalies |
| upper_threshold_factor | Text | 1.0 | Multiplies upperBound by this factor while computing anomalies |
| ignore_anomaly | Text | | Ignore upper or lower anomalies. Accepted values: 'upper' and 'lower' |
| changepoint | Text | 0.01 | Tunes sensitivity to trend changes. Use higher values to make it more sensitive |
| interval_width | Text | 0.8 | Tunes the range between the upper and lower bounds. The higher the value, the wider the range. Use values between 0 and 1 |
| live_data_label | Text | | Label for plotting live data. If used, timestamps in the model won't be marked as predicted |
| timeseries_y_axis_label | Text | Value | Label for Y axis of timeseries chart |
| skip_errors | Text | no | Specify 'yes' or 'no'. If 'yes', do not bail out if regression results in an error. Check the 'reason' field when it continues with an error. |
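
The threshold factor and ignore_anomaly rows above can be read as a simple rule. The following is a minimal sketch of that reading, not the bot's actual implementation: the function name `classify`, the strict comparisons, and the returned labels are assumptions made for illustration. Only the divide/multiply behavior and the 'upper'/'lower' values come from the table.

```python
from typing import Optional


def classify(value: float,
             lower_bound: float,
             upper_bound: float,
             lower_threshold_factor: float = 1.0,
             upper_threshold_factor: float = 1.0,
             ignore_anomaly: Optional[str] = None) -> str:
    """Return 'upper', 'lower', or 'normal' for one observed value (hypothetical labels)."""
    # Per the table: the lower bound is divided by its factor,
    # and the upper bound is multiplied by its factor.
    adjusted_lower = lower_bound / lower_threshold_factor
    adjusted_upper = upper_bound * upper_threshold_factor

    # ignore_anomaly suppresses one side; strict comparison is an assumption.
    if value > adjusted_upper and ignore_anomaly != "upper":
        return "upper"
    if value < adjusted_lower and ignore_anomaly != "lower":
        return "lower"
    return "normal"
```

With predicted bounds of 50 and 100, a value of 120 is flagged as an upper anomaly at the default factor of 1.0, but not when upper_threshold_factor is 1.5, because the effective upper bound becomes 150.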







Bot @cfxml:regression-bulk-anomalies

Bot Position In Pipeline: Sink

ML regression anomaly prediction for multiple timeseries datasets. Input can be multiple time series identified by the data_label column. Training rules must be specified via the regression_rules_dataset parameter. Produces output in the anomaly_status column.

This bot expects a Restricted CFXQL.

Specify each parameter with the '=' operator and combine parameters with the logical AND operator.
This bot expects the following parameters:

| Parameter Name | Type | Default Value | Description |
|---|---|---|---|
| regression_rules_dataset* | Text | | Regression rules dataset name. Must contain columns frequency, ts_column, value_column, lower_min, lower_max |
| trained_model_dataset* | Text | timestamp | Name of the dataset where trained model will be loaded from. |
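
Because the rules dataset must contain a fixed set of columns, it can be useful to check it before running the bulk bots. The sketch below assumes the rules are available as a list of dictionaries, one per rule; the helper name `missing_rule_columns` and the sample values are hypothetical, and only the column names come from the table above.

```python
REQUIRED_RULE_COLUMNS = (
    "frequency", "ts_column", "value_column", "lower_min", "lower_max",
)


def missing_rule_columns(rows):
    """Return the required columns that are absent from at least one rule row."""
    missing = []
    for column in REQUIRED_RULE_COLUMNS:
        if any(column not in row for row in rows):
            missing.append(column)
    return missing


# Hypothetical rule rows; the values are placeholders, not recommended settings.
rules = [
    {"frequency": "1H", "ts_column": "timestamp", "value_column": "cpu",
     "lower_min": 0, "lower_max": 10},
    {"frequency": "1D", "ts_column": "timestamp", "value_column": "memory",
     "lower_min": 0},  # lower_max is missing here
]
print(missing_rule_columns(rules))  # ['lower_max']
```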







Bot @cfxml:regression-bulk-train

Bot Position In Pipeline: Sink

ML regression training for multiple timeseries datasets. Input can be multiple time series identified by the data_label column. Training rules must be specified via the regression_rules_dataset parameter.

This bot expects a Restricted CFXQL.

Specify each parameter with the '=' operator and combine parameters with the logical AND operator.
This bot expects the following parameters:

| Parameter Name | Type | Default Value | Description |
|---|---|---|---|
| regression_rules_dataset* | Text | | Regression rules dataset name. Must contain columns frequency, ts_column, value_column, lower_min, lower_max |
| output_model_dataset* | Text | timestamp | Name of the dataset where trained model will be saved. |
| output_status_dataset | Text | | Name of the dataset where training status will be saved |







Bot @cfxml:regression-multi-proc

Bot Position In Pipeline: Sink

ML Regression for multiple timeseries datasets using Parallel Processing.

This bot expects a Restricted CFXQL.

Specify each parameter with the '=' operator and combine parameters with the logical AND operator.
This bot expects the following parameters:

| Parameter Name | Type | Default Value | Description |
|---|---|---|---|
| frequency | Text | 1H | Bucketization interval (or frequency). Default is 1 hour. |
| ts_column* | Text | timestamp | Timestamp column name |
| value_column | Text | | Numerical value column name. If no value column is provided, the time column is aggregated by 'agg_func' in that interval |
| groupby* | Text | | Column name whose value uniquely identifies the timeseries for a group |
| keep_columns | Text | | Comma-separated list of input column names to keep in the output |
| job_name_column | Text | model_name | Column name containing job/model names |
| action | Text | train | 'train' or 'predict' |
| bucketing | Text | True | Bucket data into 'frequency' intervals. Valid values: 'True', 'False'. Default is 'True' |
| agg_func | Text | sum | Aggregation function applied to the value column in each bucket. Example values: 'mean', 'sum', 'count' |
| prediction_duration | Text | 7D | Duration to forecast. Accepts values in frequency terms: '1H', '1D', '1W', '1M' |
| lower_threshold | Text | | Static lower threshold value for detecting anomalies |
| upper_threshold | Text | | Static upper threshold value for detecting anomalies |
| lower_threshold_factor | Text | 1.0 | Divides lowerBound by this factor while computing anomalies |
| upper_threshold_factor | Text | 1.0 | Multiplies upperBound by this factor while computing anomalies |
| ignore_anomaly | Text | | Ignore upper or lower anomalies. Accepted values: 'upper' and 'lower' |
| changepoint | Text | 0.01 | Tunes sensitivity to trend changes. Use higher values to make it more sensitive |
| interval_width | Text | 0.8 | Tunes the range between the upper and lower bounds. The higher the value, the wider the range. Use values between 0 and 1 |
| live_data_label | Text | | Label for plotting live data. If used, timestamps in the model won't be marked as predicted |
| timeseries_y_axis_label | Text | Value | Label for Y axis of timeseries chart |
| skip_errors | Text | no | Specify 'yes' or 'no'. If 'yes', do not bail out if regression results in an error. Check the 'reason' field when it continues with an error. |
| num_procs | Text | 2 | Maximum number of CPUs to use. 0 means all available CPUs. Value should be >= 0 |
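
The groupby and num_procs rows describe how the input is split and how much parallelism is allowed. As a rough illustration only, the sketch below shows one way to interpret them; the helper names `resolve_num_procs` and `split_by_group`, the sample column `host`, and the choice to cap the worker count at the number of available CPUs are assumptions, not documented behavior of the bot.

```python
import os
from collections import defaultdict


def resolve_num_procs(num_procs: int = 2) -> int:
    """Translate the num_procs setting into a concrete worker count."""
    if num_procs < 0:
        raise ValueError("num_procs should be >= 0")
    available = os.cpu_count() or 1
    # 0 means "all available CPUs"; capping at `available` is an assumption.
    return available if num_procs == 0 else min(num_procs, available)


def split_by_group(rows, groupby):
    """Partition rows into one list per distinct value of the groupby column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[groupby]].append(row)
    return dict(groups)


# Hypothetical input: two timeseries distinguished by a 'host' column.
rows = [
    {"host": "a", "timestamp": 1, "value": 10},
    {"host": "b", "timestamp": 1, "value": 7},
    {"host": "a", "timestamp": 2, "value": 12},
]
groups = split_by_group(rows, "host")
print(sorted(groups))    # ['a', 'b']
print(len(groups["a"]))  # 2
```

Each group would then be trained or predicted as its own timeseries, with at most `resolve_num_procs(...)` groups processed at the same time.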







Bot @cfxml:update-regression-chart

Bot Position In Pipeline: Sink

Updates an existing regression chart with the 'action' provided.

For action 'add_marker', provide 'timestamp', 'message', 'color' and 'job_name'.

For action 'add_timeseries', provide 'ts_column', 'value_column', 'chart_type', 'color', 'label' and 'job_name'.

This bot expects a Restricted CFXQL.

Specify each parameter with the '=' operator and combine parameters with the logical AND operator.
This bot expects the following parameters:

| Parameter Name | Type | Default Value | Description |
|---|---|---|---|
| action* | Text | timestamp | Specify the action to perform. Example: 'add_marker', 'add_timeseries' |
| job_name* | Text | default | Name of the job or model to be updated. Default job name is 'default' |
| ts_column | Text | timestamp | Timestamp column name |
| ts_format | Text | auto | Timestamp column format. Valid values: 'auto', 's', 'ms', 'ns', 'datetimestr' |
| value_column | Text | | Numerical value column name |
| label | Text | value_column | Chart component label |
| color | Text | #FFAC33 | HEX color code |
| chart_type | Text | line | Chart type for the new component. Valid values: 'line', 'points' |
| timestamp | Text | now | Marker timestamp |
| message | Text | | Message to display on marker |
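
Since each action needs a different set of parameters, a small helper can check them before assembling the parameter expression. This is a sketch under stated assumptions: the function name `build_chart_update_query` is hypothetical, the single-quoting of values and the lowercase `and` separator are guesses at the CFXQL syntax, and the helper treats every parameter named in the bot description as required even though the table lists defaults for some of them.

```python
REQUIRED_BY_ACTION = {
    "add_marker": ("timestamp", "message", "color", "job_name"),
    "add_timeseries": ("ts_column", "value_column", "chart_type",
                       "color", "label", "job_name"),
}


def build_chart_update_query(action, **params):
    """Validate parameters for an action and join them as name = 'value' pairs."""
    if action not in REQUIRED_BY_ACTION:
        raise ValueError(f"unsupported action: {action}")
    missing = [name for name in REQUIRED_BY_ACTION[action] if name not in params]
    if missing:
        raise ValueError(f"missing parameters for {action}: {', '.join(missing)}")
    pairs = {"action": action, **params}
    # Quoting style and the 'and' keyword are assumptions; values are not escaped.
    return " and ".join(f"{name} = '{value}'" for name, value in pairs.items())


query = build_chart_update_query(
    "add_marker",
    timestamp="now",
    message="Deploy",
    color="#FFAC33",
    job_name="default",
)
print(query)
# action = 'add_marker' and timestamp = 'now' and message = 'Deploy' and color = '#FFAC33' and job_name = 'default'
```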