
Introduction

Sandbox Base URL

https://sandbox.hydrogenplatform.com/ion/v1

Production Base URL

https://api.hydrogenplatform.com/ion/v1

The Hydrogen Ion API is designed to help your team apply machine learning techniques to complex datasets with limited data science expertise. The API automatically generates a data analysis pipeline that can include data pre-processing, feature selection, and feature engineering methods, and it trains and evaluates models with an automated machine learning framework (“AutoML”). Hyperparameter optimization can also be applied to automatically tune the parameters of the selected models.

Ion is built on REST principles, with resource-oriented URLs and standard HTTP response codes. All API responses are returned in JSON format.
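The curl examples on this page translate directly to other HTTP clients. As a minimal sketch using only the Python standard library, the helper below builds a request against either base URL with the bearer token and JSON body that the examples use. `ion_request` and `send` are hypothetical names for illustration, not part of a Hydrogen SDK.

```python
import json
import urllib.request

# Base URLs from the Introduction.
SANDBOX_BASE_URL = "https://sandbox.hydrogenplatform.com/ion/v1"
PRODUCTION_BASE_URL = "https://api.hydrogenplatform.com/ion/v1"


def ion_request(method, path, token, body=None, sandbox=True):
    """Build (but do not send) an authenticated Ion request.

    Hypothetical helper for illustration; not part of a Hydrogen SDK.
    """
    base = SANDBOX_BASE_URL if sandbox else PRODUCTION_BASE_URL
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # Request bodies are JSON, as in the curl examples.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base + path, data=data, headers=headers, method=method)


def send(request):
    """Send a request and decode the JSON reply; returns None for an empty body (204)."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        payload = response.read()
    return json.loads(payload) if payload else None
```

For example, `send(ion_request("GET", "/dataset", token))` would list your datasets in the sandbox environment.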

Use Cases

The scenarios below are common ways to use the Ion API, each with a step-by-step guide.

Find your dataset’s optimal model and parameters

Who is this for? Users unsure of which models and parameters to use on a dataset
  1. Register your dataset using the POST /dataset endpoint and retrieve the dataset_id. If the data is already split into training and test sets, register each as a separate dataset.
  2. Create an AutoML job using the POST /automl endpoint with the dataset_id from step 1. If the data is split, pass the training set's dataset_id as dataset_id and the test set's dataset_id as test_dataset_id. Returns an automl_id.
  3. Start the AutoML job with GET /automl/{automl_id}?command=start using the automl_id from step 2.
  4. Retrieve the results of the AutoML job using GET /automl/{automl_id}. Returns an optimized_workflow_id along with metrics and predictions.
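The four steps above can be chained in code. This sketch takes a hypothetical `call(method, path, body=None)` function that sends a single request and returns the decoded JSON. The `automl_id` response field and the `{"automl": {...}}` wrapper around the job are assumptions modelled on the documented /optimize endpoint, not confirmed AutoML behaviour.

```python
import time


def run_automl(call, dataset, test_dataset=None, poll_seconds=30):
    """Steps 1-4 of the AutoML use case (sketch).

    ASSUMPTION: POST /automl returns {"automl_id": ...} and GET /automl/{id}
    wraps the job as {"automl": {"status": ..., "result": ...}}.
    """
    # Step 1: register the dataset(s) and keep the returned ids.
    body = {"dataset_id": call("POST", "/dataset", dataset)["dataset_id"]}
    if test_dataset is not None:
        body["test_dataset_id"] = call("POST", "/dataset", test_dataset)["dataset_id"]
    # Step 2: create the AutoML job.
    automl_id = call("POST", "/automl", body)["automl_id"]
    # Step 3: start it.
    call("GET", f"/automl/{automl_id}?command=start")
    # Step 4: poll until the job reaches a final status, then return it.
    while True:
        job = call("GET", f"/automl/{automl_id}")["automl"]
        if job["status"] in ("SUCCESS", "FAILED"):
            return job
        time.sleep(poll_seconds)
```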

Optimize a model(s) with training and testing data in a single dataset

Who is this for? Users familiar with scikit-learn models and their parameters
  1. Register a dataset using the POST /dataset endpoint and retrieve a dataset_id.
  2. View a list of all the models available in the API using the GET /model endpoint. Determine which models will be trained.
  3. Use the POST /workflow endpoint to specify the target_pipeline, feature_pipeline, and model_pipeline to be used with the dataset_id from step 1. This returns a workflow_id.
  4. Create a model optimization job using the POST /optimize endpoint which requires the workflow_id from step 3. Returns an optimize_id.
  5. Start the optimization job with GET /optimize/{optimize_id}?command=start using the optimize_id from step 4.
  6. Retrieve the results of the optimization job using GET /optimize/{optimize_id}.

Optimize a model(s) with separate training and testing data

Who is this for? Users familiar with scikit-learn models and their parameters
  1. Register your training dataset using the POST /dataset endpoint and retrieve the training dataset_id.
  2. View a list of all the models available in the API using the GET /model endpoint. Determine which models will be trained.
  3. Use the POST /workflow endpoint to specify the target_pipeline, feature_pipeline, and model_pipeline to be used with the dataset_id from step 1. This returns a workflow_id.
  4. Create a model optimization job using the POST /optimize endpoint which requires the workflow_id from step 3. Returns an optimize_id.
  5. Start the optimization job with GET /optimize/{optimize_id}?command=start using the optimize_id from step 4.
  6. Retrieve the results of the optimization job using GET /optimize/{optimize_id}. Returns an optimized_workflow_id along with metrics and predictions.
  7. Register the test dataset using the POST /dataset endpoint and retrieve the test dataset_id.
  8. Create an evaluation job with POST /evaluate using the optimized_workflow_id from step 6 and the test dataset_id from step 7. Returns an evaluate_id.
  9. Start the evaluation job with GET /evaluate/{evaluate_id}?command=start using the evaluate_id from step 8.
  10. Retrieve the results of the evaluation job using GET /evaluate/{evaluate_id}.
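The optimize-then-evaluate sequence in steps 4 to 10 can be sketched as follows, again with a hypothetical `call(method, path, body=None)` function that sends one request and returns decoded JSON. It assumes the workflow (step 3) and the test dataset (step 7) already exist. The POST /evaluate argument names and the `{"evaluate": {...}}` response wrapper are assumptions patterned on /optimize; check the Evaluate reference before relying on them.

```python
import time


def run_job(call, kind, body, poll_seconds=30):
    """Create, start and poll an optimize or evaluate job until it finishes.

    Responses are assumed to wrap the job in a key named after the endpoint,
    e.g. {"optimize": {...}}, as documented for GET /optimize/{optimize_id}.
    """
    job_id = call("POST", f"/{kind}", body)[f"{kind}_id"]
    call("GET", f"/{kind}/{job_id}?command=start")
    while True:
        job = call("GET", f"/{kind}/{job_id}")[kind]
        if job["status"] in ("SUCCESS", "FAILED"):
            return job
        time.sleep(poll_seconds)


def train_then_evaluate(call, workflow_id, train_dataset_id, test_dataset_id, poll_seconds=30):
    """Steps 4-6 and 8-10: optimize on the training set, then evaluate on the test set."""
    optimized = run_job(call, "optimize", {
        "workflow_id": workflow_id,
        "dataset_id": train_dataset_id,
        "random_state": 12345,  # random_state is required by POST /optimize
    }, poll_seconds)
    if optimized["status"] != "SUCCESS":
        raise RuntimeError(optimized["result"]["reason"])
    # ASSUMPTION: illustrative POST /evaluate argument names.
    return run_job(call, "evaluate", {
        "workflow_id": optimized["result"]["optimized_workflow_id"],
        "dataset_id": test_dataset_id,
    }, poll_seconds)
```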

Retesting a trained model with new data

Who is this for? Users who have trained models using /automl or /optimize
  1. Register the new dataset using the POST /dataset endpoint and retrieve a dataset_id.
  2. Create an evaluation job with POST /evaluate using the optimized_workflow_id associated with the trained model. An optimized_workflow_id generated by /automl or /optimize can be used. Returns an evaluate_id.
  3. Start the evaluation job with GET /evaluate/{evaluate_id}?command=start using the evaluate_id from step 2.
  4. Retrieve the results of the evaluation job using GET /evaluate/{evaluate_id}.

Authentication

API Authentication

After your application is registered, you will receive a client_id and client_secret that identify your application when calling any Hydrogen API.

We require all API calls to be made over HTTPS connections.

OAuth2 Authorization

Example Request

curl -X POST "https://api.hydrogenplatform.com/authorization/v1/oauth/token?grant_type=client_credentials" \
  -H "Authorization: Basic aHlkcm9nZW5faWQ6aHlkcm9nZW5fc2VjcmV0"

Example Response

{
  "access_token": "ac6b8213-2a77-4ecc-89fd-68c9f2aff256",
  "token_type": "bearer",
  "expires_in": 7200,
  "scope": "all"
}

Pass the access_token as a bearer token in all subsequent API calls, as in the following example:

curl -X GET https://api.hydrogenplatform.com/ion/v1/dataset \
-H "Authorization: Bearer ac6b8213-2a77-4ecc-89fd-68c9f2aff256"

Hydrogen uses OAuth 2.0, an industry-standard authorization framework, to authorize calls to the API. Your application uses the client credentials flow to obtain permission to act on its own behalf: it calls our OAuth server with grant_type=client_credentials, passing your client_id and client_secret as HTTP Basic credentials (the base64 encoding of client_id:client_secret), and receives an access_token that can then be used to call Hydrogen on behalf of the application.


REQUEST ARGUMENTS

Parameter Type Required Description
client_id string required Application id for identification, which will be given to you when you are onboarded.
client_secret string required Application secret, which will be given to you only once when you are onboarded. Please keep this in a safe place.
grant_type string required Must be set to client_credentials.
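The Authorization header in the example request is the base64 encoding of `client_id:client_secret`, here `hydrogen_id:hydrogen_secret`. A minimal Python sketch of building the same token request; `basic_auth_header` and `token_request` are illustrative names.

```python
import base64
import urllib.request

TOKEN_URL = "https://api.hydrogenplatform.com/authorization/v1/oauth/token"


def basic_auth_header(client_id, client_secret):
    """HTTP Basic credentials: "Basic " + base64("client_id:client_secret")."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def token_request(client_id, client_secret):
    """Build (but do not send) the client-credentials token request."""
    return urllib.request.Request(
        TOKEN_URL + "?grant_type=client_credentials",
        headers={"Authorization": basic_auth_header(client_id, client_secret)},
        method="POST",
    )
```

Sending this request (for example with `urllib.request.urlopen`) returns the JSON response described below.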


RESPONSE

Field Description
access_token Token that will be used for all subsequent API calls
expires_in Lifetime of the token in seconds, after which a new token must be requested. Default is 7200 (2 hours).
token_type Always bearer
scope The scope your user has been granted in the application

Token Refresh

An access_token is short-lived and must be renewed for your application to remain authorized; its lifetime in seconds is given by expires_in. The client credentials grant type does not return a refresh token. When your access_token expires, your application simply requests a new token, which invalidates the previous token.
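Because requesting a new token invalidates the previous one, it helps to cache the token and request a new one only when the current one is about to expire. This sketch assumes a hypothetical `fetch_token()` callable that performs the OAuth request and returns the decoded JSON response; the 60-second safety margin is an arbitrary choice.

```python
import time


class TokenCache:
    """Reuse an access_token until shortly before it expires, then fetch a new one."""

    def __init__(self, fetch_token, margin_seconds=60, clock=time.monotonic):
        self._fetch_token = fetch_token  # hypothetical: returns the token JSON
        self._margin = margin_seconds
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            response = self._fetch_token()
            self._token = response["access_token"]
            # expires_in is the token lifetime in seconds (e.g. 7200).
            self._expires_at = now + response["expires_in"]
        return self._token
```

Sharing one cache per set of credentials avoids several processes invalidating each other's tokens.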

Errors

ERROR CODES

Code Description
400 Bad Request. The request was malformed or is missing required arguments.
401 Unauthorized. Occurs when you are using an invalid or expired access token.
403 Forbidden. The request was valid but you are not authorized to access the resource.
404 Not Found. Occurs when you are requesting a resource which doesn’t exist, such as an incorrect URL, incorrect ID, or empty result.
429 Too Many Requests. The rate limit was exceeded. Currently, there is no rate limit on the APIs.
500 Internal Server Error. An unexpected error occurred on the server.
503 Service Unavailable. Returned when the API is down for maintenance.


STATUS CODES

Code Description
200 OK. The request was successful.
204 No Content. The request was successful but there is no additional content to send in the response body. This will occur on a successful DELETE.
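One way to surface these codes in client code is sketched below. Which codes to retry is a design choice, not documented API behaviour: the sketch treats 429 and 503 as transient and flags 401 so the caller can request a new access_token. `IonError` and `check_status` are hypothetical names.

```python
# Descriptions from the error-code table above.
ERROR_CODES = {
    400: "Bad Request",
    401: "Unauthorized",
    403: "Forbidden",
    404: "Not Found",
    429: "Too Many Requests",
    500: "Internal Server Error",
    503: "Service Unavailable",
}

# Design choice: rate limiting and maintenance are treated as transient.
RETRYABLE = {429, 503}


class IonError(Exception):
    def __init__(self, status):
        self.status = status
        self.retryable = status in RETRYABLE
        # 401 usually means the access_token is invalid or expired.
        self.refresh_token = status == 401
        super().__init__(f"{status} {ERROR_CODES.get(status, 'Unexpected status')}")


def check_status(status):
    """Return normally for 200/204; raise IonError for any other status."""
    if status in (200, 204):
        return
    raise IonError(status)
```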

Versioning

The Ion API is currently at major version 1.0. Changes that are not backward compatible will be released as a new major version. Features that we consider to be backwards compatible include the following:

Endpoints

Dataset

A dataset stores metadata describing where and how to fetch your data and which columns it contains. The data itself is not uploaded or stored through this endpoint, only its metadata. A dataset is required to create optimize, evaluate, and AutoML jobs.

List all datasets

Example Request

curl -X GET -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
  "https://api.hydrogenplatform.com/ion/v1/dataset"

Example Response

{
    "dataset": [
    {
      "protocol": "http",
      "data_format": "csv",
      "location": "https://raw.githubusercontent.com/hydrogen-dev/ion-sample-data/master/classification/iris/iris_no_headers.csv",
      "is_header": false,
      "columns": [
        "sepal_length",
        "sepal_width",
        "petal_length",
        "petal_width",
        "class"],
      "dataset_id": "08b4bda9-3c0e-44fe-9ad6-56419fc4e769"
    },
    {
      "protocol": "http",
      "data_format": "csv",
      "location": "https://raw.githubusercontent.com/hydrogen-dev/ion-sample-data/master/regression/housing/housing.csv",
      "is_header": true,
      "columns": [],
      "dataset_id": "6c34f87f-452d-4092-9f21-1f92a4d9f011"
    }
  ]
 }

List all the datasets created by the user. This endpoint returns a list of datasets including the dataset metadata associated with each dataset_id.

HTTP REQUEST

GET /dataset
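Because POST /dataset stores only metadata, the list response is a convenient way to look up existing dataset_ids, for example to avoid registering the same location twice. A minimal sketch over the decoded JSON of GET /dataset; the helper names are hypothetical.

```python
def index_datasets(response):
    """Map dataset_id -> metadata from a GET /dataset response body."""
    return {d["dataset_id"]: d for d in response["dataset"]}


def find_by_location(response, location):
    """Return the dataset_ids already registered for a given data location."""
    return [d["dataset_id"] for d in response["dataset"] if d["location"] == location]
```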

Create a dataset

Example Request with Column Names in Header

curl -X POST -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
     -H "Content-Type: application/json" \
     -d '{
        "protocol": "http",
        "data_format": "csv",
        "location": "https://raw.githubusercontent.com/hydrogen-dev/ion-sample-data/master/classification/iris/iris.csv",
        "is_header": true
        }' "https://api.hydrogenplatform.com/ion/v1/dataset"

Example Request with Specified Column Names

curl -X POST -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
     -H "Content-Type: application/json" \
     -d '{
      "protocol": "http",
      "data_format": "csv",
      "location": "https://raw.githubusercontent.com/hydrogen-dev/ion-sample-data/master/classification/iris/iris_no_headers.csv",
      "is_header": false,
      "columns": [
        "sepal_length",
        "sepal_width",
        "petal_length",
        "petal_width",
        "class"]
    }' "https://api.hydrogenplatform.com/ion/v1/dataset"

Example Response

{
  "dataset_id": "efa289b2-3565-42e6-850b-8dad25727e99"
}

Create a dataset to store dataset metadata. The endpoint returns a dataset_id which can be used to run optimize, evaluate, and AutoML jobs.

HTTP REQUEST

POST /dataset

ARGUMENTS

Parameter Type Required Description
protocol string required The protocol used to fetch the data, e.g. http
data_format string required The format of the source data, e.g. csv
location string required The location (URL) of the data
is_header boolean optional Indicates whether the first row of the dataset contains the column names. Defaults to false. If is_header is true, columns is not required.
columns array optional A list of names identifying the data in each column of the dataset. If is_header is false, columns is required.
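The is_header/columns rule above can be checked before a request is sent. This hypothetical helper builds a POST /dataset body; defaulting protocol to http and data_format to csv mirrors the examples and is an assumption, not an API default.

```python
def dataset_body(location, data_format="csv", protocol="http", columns=None):
    """Build a POST /dataset body (sketch).

    With columns=None the first row is assumed to hold the column names
    (is_header true); otherwise the names are sent and is_header is false.
    """
    body = {"protocol": protocol, "data_format": data_format, "location": location}
    if columns is None:
        body["is_header"] = True
    else:
        if not columns:
            raise ValueError("columns must list at least one name when is_header is false")
        body["is_header"] = False
        body["columns"] = list(columns)
    return body
```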

Retrieve a dataset

Example Request

curl -X GET -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
    "https://api.hydrogenplatform.com/ion/v1/dataset/efa289b2-3565-42e6-850b-8dad25727e99?head=5"

Example Response

{
  "dataset": {
    "protocol": "http",
    "data_format": "csv",
    "location": "https://raw.githubusercontent.com/hydrogen-dev/ion-sample-data/master/classification/iris/iris.csv",
    "is_header": true,
    "columns": [],
    "dataset_id": "efa289b2-3565-42e6-850b-8dad25727e99",
    "head_code": 200,
    "head": [
                {
                    "sepal_length":5.1,
                    "sepal_width":3.5,
                    "petal_length":1.4,
                    "petal_width":0.2,
                    "class":"Iris-setosa"
                },
                {
                    "sepal_length":4.9,
                    "sepal_width":3.0,
                    "petal_length":1.4,
                    "petal_width":0.2,
                    "class":"Iris-setosa"
                },
                {
                    "sepal_length":4.7,
                    "sepal_width":3.2,
                    "petal_length":1.3,
                    "petal_width":0.2,
                    "class":"Iris-setosa"
                },
                {
                    "sepal_length":4.6,
                    "sepal_width":3.1,
                    "petal_length":1.5,
                    "petal_width":0.2,
                    "class":"Iris-setosa"
                },
                {
                    "sepal_length":5.0,
                    "sepal_width":3.6,
                    "petal_length":1.4,
                    "petal_width":0.2,
                    "class":"Iris-setosa"
                }
            ]
  }
}

Retrieve the dataset metadata for a specific dataset_id. If the head argument is present, its value specifies the number of records to display. The records are returned in JSON format.

HTTP REQUEST

GET /dataset/{dataset_id}

QUERY PARAMETERS

Parameter Type Required Description
head integer optional Specifies the number of sample rows to display. Value may be 5 or 10.

RESPONSE

Field Description
head_code Integer status code returned when fetching the data from location (e.g. 200).
head A list of sample records in JSON format. If head_code is an error code, head is null.
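A sketch of requesting sample rows and checking head_code before using them. The helper names are hypothetical, the production base URL is assumed, and treating a missing head_code as "no sample requested" is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.hydrogenplatform.com/ion/v1"


def dataset_url(dataset_id, head=None):
    """URL for GET /dataset/{dataset_id}, optionally asking for sample rows."""
    url = f"{BASE_URL}/dataset/{dataset_id}"
    if head is not None:
        if head not in (5, 10):
            raise ValueError("head may be 5 or 10")
        url += "?" + urlencode({"head": head})
    return url


def sample_rows(response):
    """Return the sample records, or raise if they are unavailable."""
    dataset = response["dataset"]
    code = dataset.get("head_code")
    if code is None:
        raise ValueError("no sample requested; pass head=5 or head=10")
    if dataset.get("head") is None:
        # head is null when head_code is an error code.
        raise RuntimeError(f"could not read {dataset['location']} (status {code})")
    return dataset["head"]
```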

Delete a dataset

Example Request

curl -X DELETE -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
    "https://api.hydrogenplatform.com/ion/v1/dataset/efa289b2-3565-42e6-850b-8dad25727e99"

Response (204 No Content)

Delete the dataset metadata for a specific dataset_id. This does not delete the workflows associated with the dataset_id, nor any metrics generated for the dataset by optimize, evaluate, or AutoML jobs.

HTTP REQUEST

DELETE /dataset/{dataset_id}

Workflow

Create a machine learning workflow describing the data preprocessing steps and the model to be optimized and fitted over a dataset. A machine learning workflow consists of a target_pipeline describing the pre-processing steps to prepare the target variable for the model pipeline; a feature_pipeline describing the pre-processing steps to prepare the feature (input) variables for the model pipeline; and a model_pipeline that describes the pre-processing steps and the machine learning model used to predict the target given the features.

Both the pre-processing steps and the machine learning model are represented as models, each given by a model_name and its model_parameters. Note: The target_pipeline and model_pipeline are lists of models, since both of these pipelines have a single output. The feature_pipeline can have multiple outputs and is a list of lists of models, so that different pre-processing steps can be specified for different sets of features (e.g. categorical features and numerical features should be processed differently).

A workflow can be optimized using an optimization job.
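The nesting difference (flat lists for target_pipeline and model_pipeline, a list of lists for feature_pipeline) is easy to get wrong when writing the JSON by hand. This sketch assembles the same body as the Create a workflow example from small helpers; `model` and `workflow_body` are hypothetical names.

```python
def model(name, **params):
    """One pipeline step: a model_name plus its model_parameters (null if none)."""
    return {"model_name": name, "model_parameters": params or None}


def workflow_body(dataset_id, target, feature_groups, model_steps):
    """Assemble a POST /workflow body.

    target and model_steps are flat lists of models; feature_groups is a
    list of lists, one inner list per group of features.
    """
    if not all(isinstance(group, list) for group in feature_groups):
        raise TypeError("feature_pipeline must be a list of lists of models")
    return {
        "dataset_id": dataset_id,
        "target_pipeline": list(target),
        "feature_pipeline": [list(group) for group in feature_groups],
        "model_pipeline": list(model_steps),
    }


FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
body = workflow_body(
    "19677148-edde-42f4-a4c7-3309202a0953",
    target=[model("DataFrameSelector", attribute_names=["class"]),
            model("CategoricalEncoder", encoding="ordinal")],
    feature_groups=[[model("DataFrameSelector", attribute_names=FEATURES),
                     model("NumpyArrayEncoder")]],
    model_steps=[model("RandomForestClassifier", n_estimators=100)],
)
```

A second group of features, such as categorical columns with their own encoder, would be another inner list in feature_groups.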

List all workflows

Example Request

curl -X GET -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
    "https://api.hydrogenplatform.com/ion/v1/workflow?dataset_id=19677148-edde-42f4-a4c7-3309202a0953"

Example Response

{
    "workflow":
    [
        {
            "workflow_id": "324b82cd-6281-4e94-b6b6-e12733d2763a",
            "dataset_id": "19677148-edde-42f4-a4c7-3309202a0953",
            "target_pipeline":
            [
                {
                    "model_name": "DataFrameSelector",
                    "model_parameters":
                    {
                        "attribute_names":
                        [
                            "class"
                        ]
                    }
                },
                {
                    "model_name": "CategoricalEncoder",
                    "model_parameters":
                    {
                        "categories": "auto",
                        "encoding": "ordinal",
                        "handle_unknown": "error"
                    }
                }
            ],
            "feature_pipeline":
            [
                [
                    {
                        "model_name": "DataFrameSelector",
                        "model_parameters":
                        {
                            "attribute_names":
                            [
                               "sepal_length",
                               "sepal_width",
                               "petal_length",
                               "petal_width"
                            ]
                        }
                    },
                    {
                        "model_name": "NumpyArrayEncoder",
                        "model_parameters": null
                    }
            ]
        ],
            "model_pipeline":
            [
                {
                    "model_name": "RandomForestClassifier",
                    "model_parameters":
                    {
                        "bootstrap": true,
                        "class_weight": null,
                        "criterion": "gini",
                        "max_depth": null,
                        "max_features": "auto",
                        "max_leaf_nodes": null,
                        "min_impurity_decrease": 0,
                        "min_impurity_split": null,
                        "min_samples_leaf": 1,
                        "min_samples_split": 2,
                        "min_weight_fraction_leaf": 0,
                        "n_estimators": 100,
                        "n_jobs": 1,
                        "oob_score": false,
                        "random_state": null,
                        "verbose": 0,
                        "warm_start": false
                    }
                }
            ]
        },
        {
            "workflow_id": "6856a276-84e8-447a-a4d0-ec1eae17b2b6",
            "dataset_id": "19677148-edde-42f4-a4c7-3309202a0953",
            "target_pipeline":
            [
                {
                    "model_name": "DataFrameSelector",
                    "model_parameters":
                    {
                        "attribute_names":
                        [
                            "class"
                        ]
                    }
                },
                {
                    "model_name": "CategoricalEncoder",
                    "model_parameters":
                    {
                        "categories": "auto",
                        "encoding": "ordinal",
                        "handle_unknown": "error"
                    }
                }
            ],
            "feature_pipeline":
            [
                [
                    {
                        "model_name": "DataFrameSelector",
                        "model_parameters":
                        {
                            "attribute_names":
                            [
                               "sepal_length",
                               "sepal_width",
                               "petal_length",
                               "petal_width"
                            ]
                        }
                    },
                    {
                        "model_name": "NumpyArrayEncoder",
                        "model_parameters": null
                    }
            ]
        ],
            "model_pipeline":
            [
                {
                    "model_name": "LogisticRegressionClassifier",
                    "model_parameters":
                    {
                        "C": 1,
                        "class_weight": null,
                        "dual": false,
                        "fit_intercept": true,
                        "intercept_scaling": 1,
                        "max_iter": 100,
                        "multi_class": "ovr",
                        "n_jobs": 1,
                        "penalty": "l2",
                        "random_state": null,
                        "solver": "liblinear",
                        "tol": 0.0001,
                        "verbose": 0,
                        "warm_start": false
                    }
                }
            ]
        }
    ]
}

Retrieve all workflows created by the user.

HTTP REQUEST

GET /workflow

QUERY PARAMETERS

Parameter Type Required Description
dataset_id string optional Filters the workflows to those associated with a specific dataset.

Create a workflow

Example Request

curl -X POST -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
     -H "Content-Type: application/json" \
     -d '{
            "dataset_id": "19677148-edde-42f4-a4c7-3309202a0953",
            "target_pipeline":
            [
                {
                    "model_name": "DataFrameSelector",
                    "model_parameters":
                    {
                        "attribute_names":
                        [
                            "class"
                        ]
                    }
                },
                {
                    "model_name": "CategoricalEncoder",
                    "model_parameters":
                    {
                        "encoding": "ordinal"
                    }
                }
            ],
            "feature_pipeline":
            [
                [
                    {
                        "model_name": "DataFrameSelector",
                        "model_parameters":
                        {
                            "attribute_names":
                            [
                               "sepal_length",
                               "sepal_width",
                               "petal_length",
                               "petal_width"
                            ]
                        }
                    },
                    {
                        "model_name": "NumpyArrayEncoder",
                        "model_parameters": null
                    }
            ]
        ],
        "model_pipeline":
        [
            {
                "model_name": "RandomForestClassifier",
                "model_parameters":
                {
                    "n_estimators": 100
                }
            }
        ]
     }' "https://api.hydrogenplatform.com/ion/v1/workflow"

Example Response

{
    "workflow_id": "324b82cd-6281-4e94-b6b6-e12733d2763a"
}

The target_pipeline is a list of models that specify the preprocessing steps applied to a dataset to prepare the target variable for the machine learning algorithm.

The feature_pipeline is a list of lists of models. Each inner list specifies the preprocessing steps applied to a dataset to prepare a group of feature variables for the machine learning algorithm.

The model_pipeline is a list of models that preprocess the data generated by the target_pipeline and feature_pipeline and feed the result into a classifier or regressor model.

Any unspecified model_parameters will be set to their default value.

HTTP REQUEST

POST /workflow

ARGUMENTS

Parameter Type Required Description
dataset_id string required The dataset the workflow will be used with.
target_pipeline array required List of models used to pre-process the target variable
      model_name string required Name of a model applied to the target variable
      model_parameters object required Parameters of the model, e.g. the attribute_names selected by DataFrameSelector
feature_pipeline array required List of lists of models used to pre-process the input (feature) variables
      model_name string optional Name of a model applied to the input variables
      model_parameters object optional Parameters of the model, e.g. the attribute_names selected by DataFrameSelector
model_pipeline array optional List of models to be trained to predict the target variable, given the input variables
      model_name string required Name of a model used to predict the target variable
      model_parameters object required Parameters of the model, e.g. n_estimators for RandomForestClassifier

Retrieve a workflow

Example Request

curl -X GET -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
    "https://api.hydrogenplatform.com/ion/v1/workflow/324b82cd-6281-4e94-b6b6-e12733d2763a"

Retrieve the dataset_id, target_pipeline, feature_pipeline, and model_pipeline for a specific workflow_id. The response matches the POST request body, but also includes any unspecified model_parameters, set to their default values.

HTTP REQUEST

GET /workflow/{workflow_id}

Optimize

Optimize a workflow for a given dataset.

List all optimization jobs

Retrieve all optimization jobs created by the user.

Example Request

curl -X GET -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
    "https://api.hydrogenplatform.com/ion/v1/optimize?dataset_id=48f924cb-996f-48e2-b9fb-832ee430e3bb"

Example Response

{
    "optimize":
    [
        {
            "optimize_id": "0f5ba35c-f8db-4c32-9dd5-d56a9ae79f66",
            "status": "RUNNING",
            "progress": 26,
            "execution_date": "2018-04-18T14:50:35.726641",
            "result": null,
            "metadata":
            {
                "dataset_id": "48f924cb-996f-48e2-b9fb-832ee430e3bb",
                "workflow_id": "6f3f397d-c870-45ec-a191-f5d3d9a1be81",
                "percent_test": 0.2,
                "random_state": 12345,
                "optimization_type": "random",
                "n_folds": 5,
                "time_limit": 600
            }
        },
        {
            "optimize_id": "8cb4e2be-4adb-4bfd-9416-44982b508ba0",
            "status": "PENDING",
            "execution_date": null,
            "result": null,
            "metadata":
            {
                "dataset_id": "48f924cb-996f-48e2-b9fb-832ee430e3bb",
                "workflow_id": "4ea731a0-d707-4b54-b970-65055fd4ad1f",
                "percent_test": 0.2,
                "random_state": 12345,
                "optimization_type": "random",
                "n_folds": 5,
                "time_limit": 600
            }
        }
    ]
}

HTTP REQUEST

GET /optimize

QUERY PARAMETERS

Parameter Type Required Description
dataset_id string optional Filters the optimize jobs to those associated with a specific dataset.
workflow_id string optional Filters the optimize jobs to those associated with a specific workflow.
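Both list endpoints (GET /workflow and GET /optimize) take optional filters as query parameters. A minimal sketch of building the request URL, assuming the production base URL; `list_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.hydrogenplatform.com/ion/v1"


def list_url(resource, **filters):
    """URL for a list endpoint such as GET /workflow or GET /optimize.

    Filters left as None are omitted, so list_url("optimize") lists all jobs.
    """
    query = {key: value for key, value in filters.items() if value is not None}
    url = f"{BASE_URL}/{resource}"
    return f"{url}?{urlencode(query)}" if query else url
```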

Create an optimization job

Example Request Using a Randomized Test Set

curl -X POST -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
     -H "Content-Type: application/json" \
     -d '{
            "dataset_id": "efa289b2-3565-42e6-850b-8dad25727e99",
            "workflow_id": "bdef14c5-2116-44cb-91ce-3edbddc9c494",
            "percent_test": 0.2,
            "random_state": 12345,
            "optimization_type": "random",
            "n_folds": 5,
            "time_limit": 600
         }' "https://api.hydrogenplatform.com/ion/v1/optimize"

Example Request Using an Explicit Test Set

curl -X POST -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
     -H "Content-Type: application/json" \
     -d '{
            "dataset_id": "efa289b2-3565-42e6-850b-8dad25727e99",
            "test_dataset_id": "3cd95d38-5499-48a4-8b46-a3cd977ebd64",
            "workflow_id": "6e0451c9-edc3-40df-80dc-2073cd0951d9",
            "random_state": 12345,
            "optimization_type": "random",
            "n_folds": 5,
            "time_limit": 600
         }' "https://api.hydrogenplatform.com/ion/v1/optimize"

Example Response

{
    "optimize_id": "7e07f562-95af-4f5b-a347-d9e523d45bd0"
}

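Create a job that optimizes a workflow's hyperparameters on a dataset. The job does not run until it is started with GET /optimize/{optimize_id}?command=start.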

HTTP REQUEST

POST /optimize

ARGUMENTS

Parameter Type Required Description
workflow_id string required Workflow to be optimized.
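dataset_id string optional Dataset for which the workflow will be optimized. If not specified, dataset_id will be retrieved from the workflow.
test_dataset_id string optional Separate dataset used to test the optimized workflow, instead of a randomized test split.
percent_test double optional Fraction of the dataset (e.g. 0.2) held out at random for testing the workflow.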
random_state integer required An integer used as a seed for the random number generator to replicate results.
n_folds integer optional Number of folds for cross validation.
optimization_type string optional Optimization method to use for hyper-parameter optimization. Values can be random, grid, or bayesian.
time_limit integer optional Maximum time in seconds before the job is stopped.
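A sketch of building a POST /optimize body that checks the documented values before sending. Rejecting a body that sets both test_dataset_id and percent_test is a choice made here, since the examples use one or the other; the API's behaviour in that case is not documented on this page. `optimize_body` is a hypothetical name.

```python
OPTIMIZATION_TYPES = ("random", "grid", "bayesian")


def optimize_body(workflow_id, random_state, dataset_id=None, test_dataset_id=None,
                  percent_test=None, n_folds=None, optimization_type=None, time_limit=None):
    """Build a POST /optimize body, omitting optional arguments left as None."""
    # Design choice: use either an explicit test set or a random split.
    if test_dataset_id is not None and percent_test is not None:
        raise ValueError("use either test_dataset_id or percent_test, not both")
    if percent_test is not None and not 0 < percent_test < 1:
        raise ValueError("percent_test is a fraction, e.g. 0.2")
    if optimization_type is not None and optimization_type not in OPTIMIZATION_TYPES:
        raise ValueError(f"optimization_type must be one of {OPTIMIZATION_TYPES}")
    body = {
        "workflow_id": workflow_id, "random_state": random_state,
        "dataset_id": dataset_id, "test_dataset_id": test_dataset_id,
        "percent_test": percent_test, "n_folds": n_folds,
        "optimization_type": optimization_type, "time_limit": time_limit,
    }
    return {key: value for key, value in body.items() if value is not None}
```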

Retrieve an optimization job

Retrieve the status and, when available, the results of an optimization job.

HTTP REQUEST

GET /optimize/{optimize_id}

Example Request

curl -X GET -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
    "https://api.hydrogenplatform.com/ion/v1/optimize/7e07f562-95af-4f5b-a347-d9e523d45bd0"

Example Response While Running

{
    "optimize":
    {
        "optimize_id": "6015a032-edf2-4c12-89f5-1f0de5aaf005",
        "status": "RUNNING",
        "progress": 63,
        "execution_date": "2018-04-09T15:22:10.859405",
        "result": null,
        "metadata":
        {
            "dataset_id": "efa289b2-3565-42e6-850b-8dad25727e99",
            "workflow_id": "bdef14c5-2116-44cb-91ce-3edbddc9c494",
            "percent_test": 0.2,
            "random_state": 12345,
            "optimization_type": "random",
            "n_folds": 5,
            "time_limit": 600
         }
    }
}

Example Response When Successful

{
    "optimize":
    {
        "optimize_id": "6015a032-edf2-4c12-89f5-1f0de5aaf005",
        "status": "SUCCESS",
        "progress": 100,
        "execution_date": "2018-04-09T15:22:10.859405",
        "result":
         {
             "optimized_workflow_id": "33d9f604-e9d1-4bab-93dd-7b24df3c25d9",
             "metric":
             [
                 {
                     "metric_name": "accuracy",
                     "metric_type": "cross-validation",
                     "metric_value": 0.982
                 },
                 {
                     "metric_name": "accuracy",
                     "metric_type": "test",
                     "metric_value": 0.945
                 }
             ],
             "index":
             {
                 "train":
                 [
                     1,
                     4,
                     5,
                     6,
                     ...
                 ],
                 "test":
                 [
                     2,
                     3,
                     ...
                 ]
             },
             "prediction":
             {
                 "train":
                 [
                     "Iris-versicolor",
                     "Iris-virginica",
                     "Iris-versicolor",
                     "Iris-versicolor",
                     ...
                 ],
                 "test":
                 [
                     "Iris-setosa",
                     "Iris-virginica",
                     ...
                 ]
             }
         },
        "metadata":
        {
            "dataset_id": "efa289b2-3565-42e6-850b-8dad25727e99",
            "workflow_id": "bdef14c5-2116-44cb-91ce-3edbddc9c494",
            "percent_test": 0.2,
            "random_state": 12345,
            "optimization_type": "random",
            "n_folds": 5,
            "time_limit": 600
        }
    }
}

Example Response When Failed

{
    "optimize":
    {
        "optimize_id": "6015a032-edf2-4c12-89f5-1f0de5aaf005",
        "execution_date": "2018-04-09T15:22:10.859405",
        "duration": 2.014,
        "status": "FAILED",
        "progress": 100,
        "result":
        {
            "reason": "Workflow with ID bdef14c5-2116-44cb-91ce-3edbddc9c494 was not found.",
            "stacktrace": null
        },
        "metadata":
        {
            "dataset_id": "efa289b2-3565-42e6-850b-8dad25727e99",
            "workflow_id": "bdef14c5-2116-44cb-91ce-3edbddc9c494",
            "percent_test": 0.2,
            "random_state": 12345,
            "optimization_type": "random",
            "n_folds": 5,
            "time_limit": 600
        }
    }
}

QUERY PARAMETERS

Parameter Type Required Description
command string optional Command to manage an optimize job. Values may be start, which starts a job; abort, which aborts a job; or status, which retrieves the status of the job. Defaults to status.
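Starting and monitoring a job combines the start, status, and abort commands. In this sketch, `call(method, path)` is a hypothetical function that sends one request and returns decoded JSON. The client-side timeout is independent of the job's own time_limit argument, and its default values are arbitrary.

```python
import time


def run_optimize(call, optimize_id, timeout_seconds=900, poll_seconds=15,
                 clock=time.monotonic, sleep=time.sleep):
    """Start an optimize job and poll it; abort it if it outlives timeout_seconds."""
    path = f"/optimize/{optimize_id}"
    call("GET", f"{path}?command=start")
    deadline = clock() + timeout_seconds
    while True:
        job = call("GET", f"{path}?command=status")["optimize"]
        if job["status"] in ("SUCCESS", "FAILED"):
            return job
        if clock() >= deadline:
            # Stop the job on the server before giving up locally.
            call("GET", f"{path}?command=abort")
            raise TimeoutError(f"optimize job {optimize_id} aborted after {timeout_seconds}s")
        sleep(poll_seconds)
```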

RESPONSE

Field Description
status Status of the optimize job. Values may be PENDING, RUNNING, SUCCESS, FAILED.
execution_date String marking the start of the job execution. Null or absent when the job is PENDING.
duration Time in seconds that the job took to complete. Not present when the job is PENDING or RUNNING.
progress An integer representing the percentage of the optimize job that has been completed.
result Result of the optimize job, which depends on its status. On PENDING or RUNNING, result is null. On SUCCESS, result is an object with a metric key containing the test and cross-validation metrics; an index key containing the row indices used for training and testing; and a prediction key containing the model's predictions for the training and test rows, in the order given by index. The index key is returned only when a randomized test set was used; otherwise, the prediction arrays follow the original row order of the training and test datasets. On FAILED, result is an object with a reason key containing an error message and a stacktrace key containing the program's stack trace if an unknown error occurred (null otherwise).
metadata The input arguments used to create the optimize job with unspecified parameters set to their defaults.
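Because an optimize job moves through PENDING and RUNNING before ending in SUCCESS or FAILED, a client typically polls this endpoint. The following is a minimal shell sketch, not part of the API: ION_BASE_URL, ION_TOKEN, json_status, and wait_for_optimize are hypothetical names, and the status field is read with grep and sed rather than a JSON parser, which assumes the job's status is the first "status" key in the response.

```shell
# Hypothetical variables: ION_BASE_URL (API base URL) and ION_TOKEN (bearer token).
ION_BASE_URL="${ION_BASE_URL:-https://api.hydrogenplatform.com/ion/v1}"

# Print the first "status" value in the JSON read from stdin.
# grep/sed stand in for a JSON parser, so this assumes the job's
# status is the first "status" key in the response.
json_status() {
  grep -o '"status": *"[A-Z]*"' | head -n 1 | sed 's/.*"\([A-Z]*\)"$/\1/'
}

# Poll an optimize job every $2 seconds (default 10) until it reaches
# SUCCESS or FAILED, then print that status. The command parameter is
# omitted, so the API's default (status) applies.
wait_for_optimize() {
  optimize_id="$1"
  interval="${2:-10}"
  while :; do
    status=$(curl -s -H "Authorization: Bearer $ION_TOKEN" \
      "$ION_BASE_URL/optimize/$optimize_id" | json_status)
    case "$status" in
      SUCCESS|FAILED) echo "$status"; return 0 ;;
      PENDING|RUNNING) sleep "$interval" ;;
      *) echo "unexpected status: '$status'" >&2; return 1 ;;
    esac
  done
}
```

For example, wait_for_optimize 6015a032-edf2-4c12-89f5-1f0de5aaf005 30 checks the job every 30 seconds. Once it prints SUCCESS or FAILED, a single GET on the same URL returns the full result object.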

Delete an optimization job

Example Request

curl -X DELETE -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
    "https://api.hydrogenplatform.com/ion/v1/optimize/6015a032-edf2-4c12-89f5-1f0de5aaf005"

Response (204 No Content)

Delete an optimization job for a specific optimize_id. Deleting an optimization job also deletes the results of the job. Consequently, the optimized_workflow_id associated with the deleted optimization job cannot be used in an evaluation job.

HTTP REQUEST

DELETE /optimize/{optimize_id}

Evaluate

Evaluate an optimized workflow, such as one returned as the result of an optimize or AutoML job, on a new test dataset.

List all evaluation jobs

Retrieve all evaluation jobs created by the user.

Example Request

curl -X GET -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
    "https://api.hydrogenplatform.com/ion/v1/evaluate?dataset_id=efa289b2-3565-42e6-850b-8dad25727e99"

Example Response

{
    "evaluate":
    [
        {
            "evaluate_id": "5f06f70c-d0c4-4812-bb6c-706329988e79",
            "status": "PENDING",
            "execution_date": null,
            "result": null,
            "metadata":
            {
                "dataset_id":  "efa289b2-3565-42e6-850b-8dad25727e99",
                "optimized_workflow_id": "6ad92a48-b0de-4030-80ed-b7f2ec27e031"
            }
        },
        {
            "evaluate_id": "5f06f70c-d0c4-4812-bb6c-706329988e79",
            "status": "RUNNING",
            "execution_date": "2018-04-18T14:11:50.104700",
            "result": null,
            "metadata":
            {
                "dataset_id":  "efa289b2-3565-42e6-850b-8dad25727e99",
                "optimized_workflow_id": "bbac6c54-6ebe-419d-b899-ccfa373a6770"
            }
        }
    ]
}

HTTP REQUEST

GET /evaluate

QUERY PARAMETERS

Parameter Type Required Description
dataset_id string optional Filters the evaluation jobs to those associated with a specific dataset.
optimized_workflow_id string optional Filters the evaluation jobs to those associated with a specific optimized workflow.
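Both filters are ordinary query parameters. A minimal shell sketch of building the request URL, assuming the two filters can be combined in one request; evaluate_list_url and ION_BASE_URL are hypothetical names, and the IDs are passed through without URL-encoding because they are UUIDs.

```shell
# Hypothetical variable: ION_BASE_URL (API base URL).
ION_BASE_URL="${ION_BASE_URL:-https://api.hydrogenplatform.com/ion/v1}"

# Build a GET /evaluate URL. $1 = dataset_id, $2 = optimized_workflow_id;
# pass "" to skip a filter. IDs are UUIDs, so no URL-encoding is done.
evaluate_list_url() {
  url="$ION_BASE_URL/evaluate"
  sep="?"
  if [ -n "$1" ]; then
    url="${url}${sep}dataset_id=$1"
    sep="&"
  fi
  if [ -n "$2" ]; then
    url="${url}${sep}optimized_workflow_id=$2"
  fi
  printf '%s\n' "$url"
}
```

The result can be passed straight to curl, for example curl -s -H "Authorization: Bearer $ION_TOKEN" "$(evaluate_list_url efa289b2-3565-42e6-850b-8dad25727e99)". Quoting the URL matters, since an unquoted & would background the command in the shell.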

Create an evaluation job

Example Request

curl -X POST -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
     -H "Content-Type: application/json" \
     -d '{
            "dataset_id":  "efa289b2-3565-42e6-850b-8dad25727e99",
            "optimized_workflow_id": "d078d8ae-dce2-4b29-8d4f-5986f206e328"
        }' "https://api.hydrogenplatform.com/ion/v1/evaluate"

Example Response

{
    "evaluate_id": "5f06f70c-d0c4-4812-bb6c-706329988e79"
}

Create an optimized workflow evaluation job on a new test dataset.

HTTP REQUEST

POST /evaluate

ARGUMENTS

Parameter Type Required Description
optimized_workflow_id string required ID of an optimized workflow returned as the result of an optimize or AutoML job.
dataset_id string required Dataset to use for evaluation.
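Scripts usually need the returned evaluate_id for the follow-up calls. A minimal shell sketch, where create_evaluate, ION_BASE_URL, and ION_TOKEN are hypothetical names; the JSON body is built with printf, which is only safe because both arguments are UUIDs, and the ID is extracted with grep and sed instead of a JSON parser.

```shell
# Hypothetical variables: ION_BASE_URL (API base URL) and ION_TOKEN (bearer token).
ION_BASE_URL="${ION_BASE_URL:-https://api.hydrogenplatform.com/ion/v1}"

# POST /evaluate and print the evaluate_id from the response.
# $1 = dataset_id, $2 = optimized_workflow_id (both UUIDs, so no escaping).
create_evaluate() {
  body=$(printf '{"dataset_id": "%s", "optimized_workflow_id": "%s"}' "$1" "$2")
  curl -s -X POST \
    -H "Authorization: Bearer $ION_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$body" "$ION_BASE_URL/evaluate" |
    grep -o '"evaluate_id": *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}
```

If evaluation jobs, like AutoML jobs, must be started explicitly, the captured ID would then be passed to GET /evaluate/{evaluate_id}?command=start.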

Retrieve an evaluation job

Retrieve the status and results of an evaluation job.

Example Request

curl -X GET -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
    "https://api.hydrogenplatform.com/ion/v1/evaluate/5f06f70c-d0c4-4812-bb6c-706329988e79"

Example Response While Successful

{
    "evaluate":
    {
        "evaluate_id": "5f06f70c-d0c4-4812-bb6c-706329988e79",
        "status": "SUCCESS",
        "progress": 100,
        "execution_date": "2018-04-09T15:22:10.859405",
        "duration": 2.67,
        "result":
         {
             "metric":
             [
                 {
                     "metric_name": "accuracy",
                     "metric_type": "test",
                     "metric_value": 0.945
                 }
             ],
             "prediction":
             {
                 "test":
                 [
                     "Iris-setosa",
                     "Iris-virginica",
                     ...
                 ]
             }
         },
        "metadata":
        {
            "dataset_id":  "efa289b2-3565-42e6-850b-8dad25727e99",
            "optimized_workflow_id": "d078d8ae-dce2-4b29-8d4f-5986f206e328"
        }
    }
}

HTTP REQUEST

GET /evaluate/{evaluate_id}

QUERY PARAMETERS

Parameter Type Required Description
command string optional Command to manage an evaluate job. Values may be start, which starts the job; abort, which aborts the job; or status, which retrieves the status of the job. Defaults to status.

RESPONSE

Field Description
status Status of the evaluate job. Values may be PENDING, RUNNING, SUCCESS, FAILED.
execution_date String marking the start of the job execution. Null or absent when the job is PENDING.
duration Time in seconds that the job took to complete. Not present when the job is PENDING or RUNNING.
progress An integer representing the percentage of the evaluate job that has been completed.
result Result of the evaluate job, which depends on its status. On PENDING or RUNNING, result is null. On SUCCESS, result is an object with a metric key containing the test metrics and a prediction key containing the model's predictions for the test set. On FAILED, result is an object with a reason key containing an error message and a stacktrace key containing the program's stack trace if an unknown error occurred (null otherwise).
metadata The input arguments used to create the evaluate job.
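The command query parameter takes the same three values for optimize, evaluate, and AutoML jobs, so one helper can cover all three. A minimal shell sketch, where job_command, ION_BASE_URL, and ION_TOKEN are hypothetical names; it rejects unknown commands locally before making a request.

```shell
# Hypothetical variables: ION_BASE_URL (API base URL) and ION_TOKEN (bearer token).
ION_BASE_URL="${ION_BASE_URL:-https://api.hydrogenplatform.com/ion/v1}"

# Send a command to a job.
# $1 = resource (optimize, evaluate, or automl), $2 = job ID,
# $3 = start, abort, or status (defaults to status, like the API).
job_command() {
  cmd="${3:-status}"
  case "$cmd" in
    start|abort|status) ;;
    *) echo "invalid command: $cmd" >&2; return 2 ;;
  esac
  curl -s -H "Authorization: Bearer $ION_TOKEN" \
    "$ION_BASE_URL/$1/$2?command=$cmd"
}
```

For example, job_command evaluate 5f06f70c-d0c4-4812-bb6c-706329988e79 start starts the evaluation job, and the same call without the third argument returns its status.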

Delete an evaluation job

Example Request

curl -X DELETE -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
    "https://api.hydrogenplatform.com/ion/v1/evaluate/5f06f70c-d0c4-4812-bb6c-706329988e79"

Response (204 No Content)

Delete an evaluation job for a specific evaluate_id. Deleting an evaluation job also deletes the results of the job.

HTTP REQUEST

DELETE /evaluate/{evaluate_id}

AutoML

Find and fit the best workflow for a given dataset.

List all AutoML jobs

Retrieve all AutoML jobs created by the user.

Example Request

curl -X GET -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
    "https://api.hydrogenplatform.com/ion/v1/automl?dataset_id=744a943d-82ed-41f1-a9f7-a214674ffd62"

Example Response

{
    "automl":
    [
        {
            "automl_id": "10252c77-963c-442f-aec9-555a801f24c1",
            "status": "RUNNING",
            "progress": 51,
            "execution_date": "2018-04-18T14:38:50.487704",
            "result": null,
            "metadata":
            {
                "dataset_id": "744a943d-82ed-41f1-a9f7-a214674ffd62",
                "automl_type": "classification",
                "target_column_name": "class",
                "feature_column_names":
                [
                    "sepal_length",
                    "sepal_width",
                    "petal_length",
                    "petal_width"
                ],
                "percent_test": 0.2,
                "random_state": 12345,
                "optimization_type": "random",
                "n_folds": 5,
                "time_limit": 600
            }
        },
        {
            "automl_id": "6f8ba792-8ee4-4461-96b2-9451eddb7315",
            "status": "PENDING",
            "execution_date": null,
            "result": null,
            "metadata":
            {
                "dataset_id": "744a943d-82ed-41f1-a9f7-a214674ffd62",
                "automl_type": "classification",
                "target_column_name": "class",
                "feature_column_names":
                [
                    "sepal_length",
                    "sepal_width",
                    "petal_length",
                    "petal_width"
                ],
                "percent_test": 0.2,
                "random_state": 12345,
                "optimization_type": "grid",
                "n_folds": 5,
                "time_limit": 600
            }
        }
    ]
}

HTTP REQUEST

GET /automl

QUERY PARAMETERS

Parameter Type Required Description
dataset_id string optional Filters the AutoML jobs to those associated with a specific dataset.

Create an AutoML job

Example Request Using a Randomized Test Set

curl -X POST -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
     -H "Content-Type: application/json" \
     -d '{
            "dataset_id": "59d85e2e-ea02-4c7b-af80-8d31932deaca",
            "automl_type": "classification",
            "target_column_name": "class",
            "feature_column_names":
            [
                "sepal_length",
                "sepal_width",
                "petal_length",
                "petal_width"
            ],
            "percent_test": 0.2,
            "random_state": 12345,
            "optimization_type": "random",
            "n_folds": 5,
            "time_limit": 600
         }' "https://api.hydrogenplatform.com/ion/v1/automl"

Example Request Using an Explicit Test Set

curl -X POST -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
     -H "Content-Type: application/json" \
     -d '{
            "dataset_id": "59d85e2e-ea02-4c7b-af80-8d31932deaca",
            "test_dataset_id": "e9779871-bdb4-4a53-9879-108a93709ca0",
            "automl_type": "classification",
            "target_column_name": "class",
            "feature_column_names":
            [
                "sepal_length",
                "sepal_width",
                "petal_length",
                "petal_width"
            ],
            "random_state": 12345,
            "optimization_type": "random",
            "n_folds": 5,
            "time_limit": 600}' "https://api.hydrogenplatform.com/ion/v1/automl"

Example Response

{
    "automl_id": "fc183c3b-cc51-40d5-b21a-2fa5d99360c9"
}

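Create an AutoML job that searches for the best workflow for a dataset and reports its metrics, predictions, and optimized_workflow_id. Start the job with GET /automl/{automl_id}?command=start using the returned automl_id.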

HTTP REQUEST

POST /automl

ARGUMENTS

Parameter Type Required Description
dataset_id string required Dataset to run the AutoML job on. When test_dataset_id is provided, this is the training dataset.
test_dataset_id string optional Dataset to use as an explicit test set instead of a randomized test set drawn from dataset_id.
target_column_name string required Column containing the target variable.
feature_column_names array optional Columns containing the feature variables, as an array of strings.
percent_test double optional Fraction of the dataset held out as a randomized test set, for example 0.2.
random_state integer required Seed for the random number generator; reusing the same value makes the AutoML job's results reproducible.
n_folds integer optional Number of folds for cross-validation.
automl_type string required Type of machine learning problem to solve, for example classification.
optimization_type string optional Type of hyperparameter optimization, for example random or grid.
time_limit integer optional Time in seconds after which the AutoML job is terminated. Defaults to 600 (10 minutes).
pipeline_time_limit string optional Maximum time a single pipeline may run before it is terminated.
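The two example requests differ only in how the test set is chosen: the randomized version sends percent_test, and the explicit version sends test_dataset_id and omits percent_test. A minimal shell sketch that builds either body, where automl_body is a hypothetical helper; percent_test (0.2) and random_state (12345) are fixed to the values from the example requests, and treating test_dataset_id and percent_test as alternatives is an assumption based on those examples.

```shell
# Build a POST /automl body.
# $1 = dataset_id
# $2 = test_dataset_id, or "" to hold out a randomized test set instead
# $3 = automl_type (e.g. classification)
# $4 = target_column_name
# $5... = feature column names
# percent_test and random_state use the values from the example requests.
automl_body() {
  dataset_id="$1"; test_id="$2"; automl_type="$3"; target="$4"
  shift 4
  features=""
  for col in "$@"; do
    # Append each column as a quoted JSON string, comma-separated.
    features="${features:+$features, }\"$col\""
  done
  if [ -n "$test_id" ]; then
    split="\"test_dataset_id\": \"$test_id\""
  else
    split="\"percent_test\": 0.2"
  fi
  printf '{"dataset_id": "%s", %s, "automl_type": "%s", "target_column_name": "%s", "feature_column_names": [%s], "random_state": 12345}\n' \
    "$dataset_id" "$split" "$automl_type" "$target" "$features"
}
```

The output can be passed to the create request with -d "$(automl_body 59d85e2e-ea02-4c7b-af80-8d31932deaca "" classification class sepal_length sepal_width petal_length petal_width)".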

Retrieve an AutoML job

Retrieve the status of an AutoML job.

Example Request

curl -X GET -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
    "https://api.hydrogenplatform.com/ion/v1/automl/fc183c3b-cc51-40d5-b21a-2fa5d99360c9"

Example Response While Running

{
    "automl":
    {
        "automl_id": "fc183c3b-cc51-40d5-b21a-2fa5d99360c9",
        "status": "RUNNING",
        "progress": 30,
        "execution_date": "2018-04-09T15:22:10.859405",
        "result": null,
        "metadata":
        {
            "dataset_id": "59d85e2e-ea02-4c7b-af80-8d31932deaca",
            "automl_type": "classification",
            "target_column_name": "class",
            "feature_column_names":
            [
                "sepal_length",
                "sepal_width",
                "petal_length",
                "petal_width"
            ],
            "percent_test": 0.2,
            "random_state": 12345,
            "optimization_type": "random",
            "n_folds": 5,
            "time_limit": 600
        }
    }
}

Example Response While Successful

{
    "automl":
    {
        "automl_id": "fc183c3b-cc51-40d5-b21a-2fa5d99360c9",
        "status": "SUCCESS",
        "progress": 100,
        "execution_date": "2018-04-09T15:22:10.859405",
        "duration": 502.32,
        "result":
        {
            "optimized_workflow_id": "fd3011fd-4c53-483b-a0e6-47f0a71cc0bb",
            "metric":
            [
                {
                    "metric_name": "accuracy",
                    "metric_type": "cross-validation",
                    "metric_value": 0.9854
                },
                {
                    "metric_name": "accuracy",
                    "metric_type": "test",
                    "metric_value": 0.9345
                }
            ],
            "index":
            {
                "train":
                [
                    1,
                    4,
                    5,
                    6,
                    ...
                ],
                "test":
                [
                    2,
                    3,
                    ...
                ]
            },
            "prediction":
            {
                "train":
                [
                    "Iris-versicolor",
                    "Iris-virginica",
                    "Iris-versicolor",
                    "Iris-versicolor",
                    ...
                ],
                "test":
                [
                    "Iris-setosa",
                    "Iris-virginica",
                    ...
                ]
            }
        },
        "metadata":
        {
            "dataset_id": "59d85e2e-ea02-4c7b-af80-8d31932deaca",
            "automl_type": "classification",
            "target_column_name": "class",
            "feature_column_names":
            [
                "sepal_length",
                "sepal_width",
                "petal_length",
                "petal_width"
            ],
            "percent_test": 0.2,
            "random_state": 12345,
            "optimization_type": "random",
            "n_folds": 5,
            "time_limit": 600
        }
    }
}

Example Response While Failed

{
    "automl":
    {
        "automl_id": "fc183c3b-cc51-40d5-b21a-2fa5d99360c9",
        "execution_date": "2018-04-09T15:22:10.859405",
        "duration": 2.014,
        "status": "FAILED",
        "result":
        {
            "reason": "Dataset with ID 59d85e2e-ea02-4c7b-af80-8d31932deaca was not found.",
            "stacktrace": null
        },
        "metadata":
        {
            "dataset_id": "59d85e2e-ea02-4c7b-af80-8d31932deaca",
            "automl_type": "classification",
            "target_column_name": "class",
            "feature_column_names":
            [
                "sepal_length",
                "sepal_width",
                "petal_length",
                "petal_width"
            ],
            "percent_test": 0.2,
            "random_state": 12345,
            "optimization_type": "random",
            "n_folds": 5,
            "time_limit": 600
        }
    }
}

HTTP REQUEST

GET /automl/{automl_id}

QUERY PARAMETERS

Parameter Type Required Description
command string optional Command to manage an AutoML job. Values may be start, which starts the job; abort, which aborts the job; or status, which retrieves the status of the job. Defaults to status.

RESPONSE

Field Description
status Status of the AutoML job. Values may be PENDING, RUNNING, SUCCESS, FAILED.
execution_date String marking the start of the job execution. Null or absent when the job is PENDING.
duration Time in seconds that the job took to complete. Not present when the job is PENDING or RUNNING.
progress An integer representing the percentage of the AutoML job that has been completed.
result Result of the AutoML job, which depends on its status. On PENDING or RUNNING, result is null. On SUCCESS, result is an object with an optimized_workflow_id key containing the ID of the best workflow found; a metric key containing the test and cross-validation metrics; an index key containing the row indices used for training and testing; and a prediction key containing the model's predictions for the training and test rows, in the order given by index. The index key is returned only when a randomized test set was used; otherwise, the prediction arrays follow the original row order of the training and test datasets. On FAILED, result is an object with a reason key containing an error message and a stacktrace key containing the program's stack trace if an unknown error occurred (null otherwise).
metadata The input arguments used to create the AutoML job with unspecified parameters set to their defaults.
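Because result changes shape with status, a client must branch on status before reading it. A minimal shell sketch, where automl_outcome is a hypothetical helper that reads a GET /automl/{automl_id} response on stdin; it uses grep and sed instead of a JSON parser, so it assumes the job's status is the first "status" key and that the reason message contains no escaped quotes.

```shell
# Read a GET /automl/{automl_id} response on stdin.
#   SUCCESS: print optimized_workflow_id, return 0
#   FAILED:  print the reason to stderr, return 1
#   PENDING or RUNNING: print nothing, return 2 so the caller can poll again
automl_outcome() {
  resp=$(cat)
  status=$(printf '%s' "$resp" | grep -o '"status": *"[A-Z]*"' | head -n 1 |
    sed 's/.*"\([A-Z]*\)"$/\1/')
  case "$status" in
    SUCCESS)
      printf '%s' "$resp" | grep -o '"optimized_workflow_id": *"[^"]*"' |
        head -n 1 | sed 's/.*"\([^"]*\)"$/\1/' ;;
    FAILED)
      printf '%s' "$resp" | grep -o '"reason": *"[^"]*"' |
        head -n 1 | sed 's/.*"\([^"]*\)"$/\1/' >&2
      return 1 ;;
    *) return 2 ;;
  esac
}
```

The printed optimized_workflow_id is the value to pass as optimized_workflow_id when creating an evaluation job.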

Delete an AutoML job

Example Request

curl -X DELETE -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
    "https://api.hydrogenplatform.com/ion/v1/automl/fc183c3b-cc51-40d5-b21a-2fa5d99360c9"

Response (204 No Content)

Delete an AutoML job for a specific automl_id. Deleting an AutoML job also deletes the results of the job. Consequently, the optimized_workflow_id associated with the deleted AutoML job cannot be used in an evaluation job.
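A successful delete returns 204 No Content with an empty body, so a script has to check the HTTP status code rather than parse a response. A minimal shell sketch using curl's -w '%{http_code}' option, where delete_job, ION_BASE_URL, and ION_TOKEN are hypothetical names; the same helper applies to the optimize and evaluate delete endpoints.

```shell
# Hypothetical variables: ION_BASE_URL (API base URL) and ION_TOKEN (bearer token).
ION_BASE_URL="${ION_BASE_URL:-https://api.hydrogenplatform.com/ion/v1}"

# Delete a job and succeed only on 204 No Content.
# $1 = resource (optimize, evaluate, or automl), $2 = job ID.
delete_job() {
  # -o /dev/null discards the body; -w prints only the HTTP status code.
  code=$(curl -s -o /dev/null -w '%{http_code}' -X DELETE \
    -H "Authorization: Bearer $ION_TOKEN" "$ION_BASE_URL/$1/$2")
  if [ "$code" = "204" ]; then
    return 0
  fi
  echo "DELETE /$1/$2 returned HTTP $code" >&2
  return 1
}
```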

HTTP REQUEST

DELETE /automl/{automl_id}

Model

Models are the building blocks of the target_pipeline, feature_pipeline, and model_pipeline that make up a machine learning workflow. See the model documentation for details on which models can be used in the Ion API.

List all models

Example Request To Get A Model By Name

curl -X GET -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
    "https://api.hydrogenplatform.com/ion/v1/model?model_name=LinearRegressor"

Example Response

{
    "model":
    [
        {
            "model_name": "LinearRegressor",
            "model_type": "regressor",
            "default_model_parameters":
            {
                "alpha": 1,
                "copy_X": true,
                "fit_intercept": true,
                "max_iter": null,
                "normalize": false,
                "random_state": null,
                "solver": "auto",
                "tol": 0.001
            }
        }
    ]
}

Example Request To Get All Models of a Type

curl -X GET -H "Authorization: Bearer e7cf805b-4307-41e9-8b58-90b6359fa900" \
    "https://api.hydrogenplatform.com/ion/v1/model?model_type=classifier"

Example Response

{
  "model": [
    {
      "model_name": "GaussianNBClassifier",
      "model_type": "classifier",
      "default_model_parameters": {
        "priors": null
      }
    },
    {
      "model_name": "KNeighborsClassifier",
      "model_type": "classifier",
      "default_model_parameters": {
        "algorithm": "auto",
        "leaf_size": 30,
        "metric": "minkowski",
        "metric_params": null,
        "n_jobs": 1,
        "n_neighbors": 5,
        "p": 2,
        "weights": "uniform"
      }
    },
    {
      "model_name": "LogisticRegressionClassifier",
      "model_type": "classifier",
      "default_model_parameters": {
        "C": 1,
        "class_weight": null,
        "dual": false,
        "fit_intercept": true,
        "intercept_scaling": 1,
        "max_iter": 100,
        "multi_class": "ovr",
        "n_jobs": 1,
        "penalty": "l2",
        "random_state": null,
        "solver": "liblinear",
        "tol": 0.0001,
        "verbose": 0,
        "warm_start": false
      }
    },
    {
      "model_name": "SVMClassifier",
      "model_type": "classifier",
      "default_model_parameters": {
        "C": 1,
        "cache_size": 200,
        "class_weight": null,
        "coef0": 0,
        "decision_function_shape": "ovr",
        "degree": 3,
        "gamma": "auto",
        "kernel": "rbf",
        "max_iter": -1,
        "probability": false,
        "random_state": null,
        "shrinking": true,
        "tol": 0.001,
        "verbose": false
      }
    },
    {
      "model_name": "DecisionTreeClassifier",
      "model_type": "classifier",
      "default_model_parameters": {
        "class_weight": null,
        "criterion": "gini",
        "max_depth": null,
        "max_features": null,
        "max_leaf_nodes": null,
        "min_impurity_decrease": 0,
        "min_impurity_split": null,
        "min_samples_leaf": 1,
        "min_samples_split": 2,
        "min_weight_fraction_leaf": 0,
        "presort": false,
        "random_state": null,
        "splitter": "best"
      }
    },
    {
      "model_name": "RandomForestClassifier",
      "model_type": "classifier",
      "default_model_parameters": {
        "bootstrap": true,
        "class_weight": null,
        "criterion": "gini",
        "max_depth": null,
        "max_features": "auto",
        "max_leaf_nodes": null,
        "min_impurity_decrease": 0,
        "min_impurity_split": null,
        "min_samples_leaf": 1,
        "min_samples_split": 2,
        "min_weight_fraction_leaf": 0,
        "n_estimators": 10,
        "n_jobs": 1,
        "oob_score": false,
        "random_state": null,
        "verbose": 0,
        "warm_start": false
      }
    },
    {
      "model_name": "ExtraTreesClassifier",
      "model_type": "classifier",
      "default_model_parameters": {
        "bootstrap": false,
        "class_weight": null,
        "criterion": "gini",
        "max_depth": null,
        "max_features": "auto",
        "max_leaf_nodes": null,
        "min_impurity_decrease": 0,
        "min_impurity_split": null,
        "min_samples_leaf": 1,
        "min_samples_split": 2,
        "min_weight_fraction_leaf": 0,
        "n_estimators": 10,
        "n_jobs": 1,
        "oob_score": false,
        "random_state": null,
        "verbose": 0,
        "warm_start": false
      }
    },
    {
      "model_name": "BaggingClassifier",
      "model_type": "classifier",
      "default_model_parameters": {
        "base_estimator": null,
        "bootstrap": true,
        "bootstrap_features": false,
        "max_features": 1,
        "max_samples": 1,
        "n_estimators": 10,
        "n_jobs": 1,
        "oob_score": false,
        "random_state": null,
        "verbose": 0,
        "warm_start": false
      }
    },
    {
      "model_name": "AdaBoostClassifier",
      "model_type": "classifier",
      "default_model_parameters": {
        "algorithm": "SAMME.R",
        "base_estimator": null,
        "learning_rate": 1,
        "n_estimators": 50,
        "random_state": null
      }
    },
    {
      "model_name": "XGBClassifier",
      "model_type": "classifier",
      "default_model_parameters": {
        "base_score": 0.5,
        "colsample_bylevel": 1,
        "colsample_bytree": 1,
        "fmap": "",
        "gamma": 0,
        "learning_rate": 0.1,
        "max_delta_step": 0,
        "max_depth": 3,
        "min_child_weight": 1,
        "missing": null,
        "n_estimators": 100,
        "n_jobs": 1,
        "objective": "binary:logistic",
        "random_state": 0,
        "reg_alpha": 0,
        "reg_lambda": 1,
        "scale_pos_weight": 1,
        "silent": true,
        "subsample": 1
      }
    }
  ]
}

Retrieve the models that can be used in the target_pipeline, feature_pipeline, and model_pipeline defined in a workflow.

HTTP REQUEST

GET /model

QUERY PARAMETERS

Parameter Type Required Description
model_type string optional Filters the models to those of a specified type. Values can be regressor, classifier, pre-processor, or sampler. If not specified, all models are returned.
model_name string optional Filters the models to the one with the specified name.

RESPONSE

Field Description
model_name Name of the model class.
model_type Type of the model. Values can be classifier, regressor, pre-processor, or sampler.
default_model_parameters The default parameters of the model. These are the parameters that an optimize job tunes.
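When choosing models for a model_pipeline, often only the names are needed. A minimal shell sketch that lists the model names for one model_type, where model_names, list_model_names, ION_BASE_URL, and ION_TOKEN are hypothetical names; it validates model_type against the four documented values and extracts model_name with grep and sed rather than a JSON parser.

```shell
# Hypothetical variables: ION_BASE_URL (API base URL) and ION_TOKEN (bearer token).
ION_BASE_URL="${ION_BASE_URL:-https://api.hydrogenplatform.com/ion/v1}"

# Print each model_name in a GET /model response read on stdin, one per line.
model_names() {
  grep -o '"model_name": *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Fetch the models of one type and print their names.
# $1 = regressor, classifier, pre-processor, or sampler.
list_model_names() {
  case "$1" in
    regressor|classifier|pre-processor|sampler) ;;
    *) echo "unknown model_type: $1" >&2; return 2 ;;
  esac
  curl -s -H "Authorization: Bearer $ION_TOKEN" \
    "$ION_BASE_URL/model?model_type=$1" | model_names
}
```

Against the classifier response above, list_model_names classifier would print GaussianNBClassifier, KNeighborsClassifier, and the remaining classifier names, one per line.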