API - Tutorial

How to use this Tutorial

The Python code in this tutorial relies on the popular Requests framework, and several other Python utilities, to keep the code simple, but they are not required for interacting with the Redbird API, which consists only of standard RESTful endpoints accessible via HTTP.

The tutorial is centered around a hypothetical account, project, and user. To execute the code examples provided, please substitute your own credentials, project names, collection names, etc. See "Required Setup" below for details.

Additionally, we will be foregoing error handling in an effort to keep the examples concise. See "Error Handling" in the main API documentation for more information about how errors are communicated by the API.

Finally, in this tutorial, we will be building a series of functions centered around each API endpoint, both to keep each example distinct and self-contained, and also so we can reuse them to fetch request parameters for subsequent API calls to other endpoints. You, however, may structure your code however you want. A sandbox script containing all of the examples below, as well as examples of how they might interact with each other, is available here.

Required Setup

Via the Redbird UI, you must have the following entities created and configured before you can begin using the Redbird API:

For all API usage:

A Project
- In the below examples, we just use the default Project Folder for the Project. You may specify any subfolder you wish, so long as your user is a collaborator on that folder.
A User who is a collaborator on that Project
- API authentication will require this User's username and password

For Collection Exploration and Augmentation:

A Collection created in the above Project through the UI
Optional (for collection augmentation from a remote CSV): A CSV hosted at a publicly accessible URL

Dataset Exploration and Slicing:

A Dataset created in the above Project through the UI

Additionally, the below examples will assume the existence of several constants attached to the above entities:

MY_USERNAME = <string> The username of the above User
MY_PASSWORD = <string> The password of the above User
MY_PROJECT_FOLDER_NAME = <string> The name of the above Project Folder
MY_COLLECTION_NAME = <string> The name of the above Collection
MY_DATASET_NAME = <string> The name of the above Dataset

The full sandbox script uses placeholder values for each of these constants that you should update to suit your needs. For the below examples, it is assumed that these have been set to something suitable for your account.

Authentication

The Redbird API uses tokens to authenticate and authorize calls to each endpoint. With an existing username and password, we can request a token to be used with subsequent requests. The following function shows us how to hit the token endpoint, given a username and password:

@functools.lru_cache()
def get_authentication_header(username, password):
  api_url = 'https://app.redbird.io/api/api_token_auth'

  credentials = {
      'username': username, 
      'password': password
  }

  token_response = requests.post(api_url, data=credentials)
  token_str = token_response.json()['token']

  return {
      'Authorization': f'Token {token_str}'
  }

The return value of a function is simply a Python dict, which can be serialized into JSON by the Requests library without any additional work on our end. Please note the specific formatting of the string used as the "Authorization" value, and ensure you prefix your own token the same way.

We use Python's standard library functools.lru_cache decorator to perform memoization on this function. Tokens only need to be retrieved once at the start of execution and can be re-used for each subsequent API call. But in order to keep each of the following examples self-contained, we will invoke the above function each time. Memoization gives us the best of both worlds, but you can choose to instead pass the headers in, or make them accessible some other way.

Project Folder Exploration

The below functions illustrate how to retrieve a list of available Project Folders, and how to list the Collections and Datasets contained therein.

First, let's get all of the Project Folders for our user:

def get_all_project_folders_for_user():
  api_url = 'https://app.redbird.io/api/project_folders'
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)

  response = requests.get(api_url, headers=headers)
  return response.json()['project_folders']

Feel free to print out and explore the JSON returned above, but for our purposes here, we are actually just interested in getting the ID of a specific Project Folder to pull data from. Let's do that now, utilizing the above function, and get a taste of navigating the response JSON in the process:

def get_project_folder_id_by_name(project_folder_name):
  all_user_folders = get_all_project_folders_for_user()

  for p in all_user_folders:
      if p['name'] == project_folder_name:
          return p['id']

  # Otherwise, we didn't match on the name
  raise KeyError(f'No Project Folder named {project_folder_name}!')

Now that we can get the ID of a specific Project Folder, let's use it to list out the Collections and Datasets within. First, our collections:

def get_project_folder_collections(project_folder_id):
  api_url = f'https://app.redbird.io/api/project_folders/{project_folder_id}/collections'
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  response = requests.get(api_url, headers=headers)
  return response.json()['collections']

And now, our Datasets:

def get_project_folder_datasets(project_folder_id):
  api_url = f'https://app.redbird.io/api/project_folders/{project_folder_id}/datasets'
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  response = requests.get(api_url, headers=headers)
  return response.json()['datasets']

Let's see how each of these functions (and the endpoints they hit) might work together:

my_project_folder_id = get_project_folder_id_by_name(MY_PROJECT_FOLDER_NAME)

my_collections = get_project_folder_collections(my_project_folder_id)
my_datasets = get_project_folder_datasets(my_project_folder_id)

print(my_collections)
print(my_datasets)

And with that, you've taken your first steps in using the Redbird API! Next, let's dig deeper into how to interact with specific Collections and Datasets.

Collection Exploration

Building on the examples from "Project Folder Exploration", we can start by putting together a function to extract a specific collection from our Project Folder calls:

def get_collection_id_by_name(project_folder_id, collection_name):
  project_folder_collections = get_project_folder_collections(project_folder_id)

  for c in project_folder_collections:
      if c['name'] == collection_name:
          return c['id']

  # Otherwise, we didn't match on the name
  raise KeyError(f'No Collection with name {collection_name} found!')

With the ID returned by this function, we can request a summary of our specific Collection, as follows:

def get_collection_summary_by_id(collection_id):
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  api_url = f'https://app.redbird.io/api/collections/{collection_id}' 
  collection_summary = requests.get(api_url, headers=headers)
  return collection_summary.json()

With that same ID, we can also request a listing of all columns on the Collection. This might be useful for ensuring that you are supplying the correct columns when attempting to add rows to the Collection via the API (see "Adding Rows to a Collection").

def get_collection_columns_by_id(collection_id):
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  api_url = f'https://app.redbird.io/api/collections/{collection_id}/columns'
  collection_columns = requests.get(api_url, headers=headers)
  return collection_columns.json()['columns']

And here is a quick look at how they might all tie together:

my_collection_id = get_collection_id_by_name(my_project_folder_id, MY_COLLECTION_NAME)

my_collection_summary = get_collection_summary_by_id(my_collection_id)
print(my_collection_summary)

my_collection_columns = get_collection_columns_by_id(my_collection_id)
print(my_collection_columns)

These functions give us everything we need to explore individual Collections within our Project Folder. Later on, in "Adding Rows to a Collection", we'll dive into how we can use a Collection ID to add rows to a Collection. But first, let's take a quick look at how to explore Datasets since those endpoints almost exactly mirror the ones we just used.

Dataset Exploration

Let's dive right into the code for exploring Datasets, since it largely mirrors the functions we defined in "Collection Exploration", just with some changes to the URLs we point our requests towards:

def get_dataset_id_by_name(project_folder_id, dataset_name):
  project_folder_datasets = get_project_folder_datasets(project_folder_id)

  for ds in project_folder_datasets:
      if ds['name'] == dataset_name:
          return ds['id']

  # Otherwise, we didn't match on the name
  raise KeyError(f'No Dataset with name {dataset_name} found!')
  
 
def get_dataset_summary_by_id(dataset_id):
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  api_url = f'https://app.redbird.io/api/datasets/{dataset_id}'
  response = requests.get(api_url, headers=headers)
  return response.json()

def get_dataset_columns_by_id(dataset_id):
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  api_url = f'https://app.redbird.io/api/datasets/{dataset_id}/columns'
  response = requests.get(api_url, headers=headers)
  return response.json()['columns']

As you can see, these functions are almost identical to the ones we implemented before. And they can be used in largely the same way, as seen here:

my_dataset_id = get_dataset_id_by_name(my_project_folder_id, MY_DATASET_NAME)

my_dataset_summary = get_dataset_summary_by_id(my_dataset_id)
print(my_dataset_summary)

my_dataset_columns = get_dataset_columns_by_id(my_dataset_id)
print(my_dataset_columns)

As with Collections, however, that is just the beginning of what we can do with Datasets. In the next section, "Retrieving Dataset Rows", we'll look at how we can query existing Datasets to retrieve a subset of columns and rows.

Retrieving Dataset Rows

There are two main ways to get to the rows within a Dataset: CSV downloads and querying. If you just want to get all of the data in a Dataset, the simplest way is to download a CSV with all of your rows. Building off of the functionality we built in "Dataset Exploration", we can use the Dataset summary to pull down the underlying CSV with just a bit of additional Python code:

def download_dataset_csv(dataset_id, destination_path):
  dataset_summary = get_dataset_summary_by_id(dataset_id)
  csv_path = dataset_summary['csv_path']
  response = requests.get(csv_path)

  with open(destination_path, 'wb') as f_out:
      f_out.write(response.content)

Here we see this new function in action, using some of the other code we've built in this tutorial:

my_project_folder_id = get_project_folder_id_by_name(MY_PROJECT_FOLDER_NAME)
my_dataset_id = get_dataset_id_by_name(my_project_folder_id, MY_DATASET_NAME)
download_dataset_csv(my_dataset_id, '/tmp/download.csv.zip')

From there, we can just unzip /tmp/download.csv.zip, and examine its contents.

However, for a more programmatic and customizable approach, the query endpoint gives us access to a subset of rows, complete with pagination. Let's take a look at a function that sets all of that up for a given Dataset ID:

def query_dataset_by_id(dataset_id, data_format, limit, offset, column_ids, sort_column_id):
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  api_url = f'https://app.redbird.io/api/datasets/{dataset_id}/query'

  # These values will be used in the query string of our HTTP request
  query_params = {
      'format': data_format,  # Only "json" is supported right now
      'limit': limit,         # Max 2000
      'offset': offset,
      'col_id': column_ids,   # A list results in repeated col_id=<id> pairs
      'sort_id': sort_column_id,
  }

  dataset_results = requests.get(api_url, params=query_params, headers=headers)
  return dataset_results.json()['results']

With a hypothetical Dataset ID of "3", a limit of 10, an offset of 0, and column IDs "1" and "2" selected, the request URL created by the Requests framework would look something like this:

https://app.redbird.io/api/datasets/3/query?format=json&limit=10&offset=0&col_id=1&col_id=2

The full description of the query endpoint and how its query string parameters can be used are available in the "Dataset Navigation and Querying" section of the Redbird API specification, under "Querying a Dataset". Fortunately, we don't have to worry about manually constructing the query string thanks to Requests, which will take the dictionary we create and properly append the parameters to our URL.

We can see our query function in action once we combine it with some of our previously created functions:

my_project_folder_id = get_project_folder_id_by_name(MY_PROJECT_FOLDER_NAME)  
my_dataset_id = get_dataset_id_by_name(my_project_folder_id, MY_DATASET_NAME)
my_dataset_columns = get_dataset_columns_by_id(my_dataset_id)

# Let's just pull the first, second, and last columns in the Dataset
columns_to_request = \[my_dataset_columns[0], my_dataset_columns[1], my_dataset_columns[-1]]
column_ids_to_request = \[c['id'] for c in columns_to_request]

# We'll arbitrarily sort by the last column
sort_column_id = my_dataset_columns[-1]['id']

# Let's just pull 10 rows, starting at the beginning of the sort.
results = query_dataset_by_id(my_dataset_id, 'json', 10, 0, column_ids_to_request, sort_column_id)
print(results)

In this example, we just sort of pick some columns to work with arbitrarily, but by inspecting and filtering the columns of a Dataset, you can determine the shape of your results however you wish. By incrementing the offset while maintaining consistent sorting, you can perform pagination to cover as much of the Dataset as you like across multiple requests.

Adding Rows to a Collection

In this last section, we'll take a look at how you can programmatically add rows to an existing collection. As covered in "Collection Augmentation" under the main API documentation, there are three main steps to adding rows to a Collection:

Creating an Update Set
Adding rows to the Update Set
Committing the Update Set

Let's take a look at what each of those steps might look like in a Python program. First, let's create an Update Set:

def create_update_set(collection_id):
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  api_url = f'https://app.redbird.io/api/collections/{collection_id}/update_sets'
  response = requests.post(api_url, headers=headers)
  update_set_id = response.json().get('update_set_id')
  return update_set_id

As you can see, nothing too complex. We just have to make sure to hold on to the ID of the Update Set we just created when we call the function. Next, let's take a look at what adding some rows via JSON might look like:

def add_rows_to_update_set(collection_id, update_set_id, rows):
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  api_url = f'https://app.redbird.io/api/update_sets/{update_set_id}/rows'

  post_data = {
      'row_data': rows,
  }

  return requests.post(api_url, headers=headers, json=post_data)

The final example in this section will give a clear example of the two-dimensional list we are passing in as the row data. For now, the main thing to note is the shape of the JSON POST request we are sending the server. The Requests library takes care of serializing basic Python types into their JSON equivalents (dicts into objects, lists into arrays, etc.). As long as our rows can be serialized, we're good to go.

We can also add rows from a remote CSV, so let's look at a function that does that:

def add_csv_rows_to_update_set(update_set_id, csv_url, has_header):
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  api_url = f'https://app.redbird.io/api/update_sets/{update_set_id}/rows'

  post_data = {
      'csv_url': csv_url,
      'has_header': has_header,
  }

  return requests.post(api_url, headers=headers, json=post_data)

As noted in the "Add Rows to an Update Set" section of "Collection Augmentation", the URL provided must point to a publicly accessible CSV file. We'll use a placeholder in the final example code below to showcase how this function would be used in tandem with the rest of the endpoints.

Finally, let's look at what the call to the commit endpoint looks like:

def commit_update_set(update_set_id):
   headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
   api_url = f'<https://app.redbird.io/api/update_sets/{update_set_id}/commit'>
  
   return requests.post(api_url, headers=headers)

Again, nothing particularly complicated here, we just hit the endpoint and let the API take care of the rest.

Bringing all of these functions together (and some of the ones we built elsewhere in this tutorial), a complete flow to add a row to a collection might look something like this:

my_update_set_id = create_update_set(my_collection_id)

# Add rows from JSON
row_data = \[[1, 2, 3, 4, 5, 6, 7]]
add_rows_to_update_set(my_update_set_id, row_data)

# Add some more rows from a CSV
my_csv_url = '<https://my.domain.com/path/to/my/csv/file.csv'>
add_csv_rows_to_update_set(my_update_set_id, my_csv_url, True)

commit_update_set(my_update_set_id)

This last example shows us each of the main steps outlined at the beginning. First, we create our Update Set. Then we add two sets of rows to it, one via JSON as a two-dimensional list, the other from a publicly hosted CSV. Finally, we commit the entire set of changes to our collection, and we're done!

Workflow Canvas Exploration

In the following sections, we'll look at how to navigate and interact with Workflow Canvases — the pipelines you build in the Redbird UI. To run the examples below, you'll need an additional constant representing the name of a Workflow Canvas in your project:

MY_WORKFLOW_CANVAS_NAME = <string> The name of the above Workflow Canvas

First, let's retrieve a list of all Workflow Canvases accessible to our user:

def get_all_workflow_canvases():
  api_url = 'https://app.redbird.io/api/workflow_canvases'
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  response = requests.get(api_url, headers=headers)
  return response.json()['workflow_canvases']

And a helper to look up a specific canvas by name:

def get_workflow_canvas_id_by_name(canvas_name):
  all_canvases = get_all_workflow_canvases()

  for c in all_canvases:
      if c['name'] == canvas_name:
          return c['id']

  raise KeyError(f'No Workflow Canvas named {canvas_name}!')

Once we have a canvas ID, we can list the Nodes it contains:

def get_workflow_canvas_nodes(workflow_canvas_id):
  api_url = f'https://app.redbird.io/api/workflow_canvases/{workflow_canvas_id}/nodes'
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  response = requests.get(api_url, headers=headers)
  return response.json()['nodes']

We can also list the Automated Runs configured on that canvas:

def get_workflow_canvas_automated_runs(workflow_canvas_id):
  api_url = f'https://app.redbird.io/api/workflow_canvases/{workflow_canvas_id}/automated_runs'
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  response = requests.get(api_url, headers=headers)
  return response.json()['automated_runs']

If you want a flat list of all Automated Runs across every canvas you have access to, there is a top-level endpoint for that too:

def get_all_automated_runs():
  api_url = 'https://app.redbird.io/api/automated_runs'
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  response = requests.get(api_url, headers=headers)
  return response.json()['automated_runs']

Here is how these functions might work together:

my_canvas_id = get_workflow_canvas_id_by_name(MY_WORKFLOW_CANVAS_NAME)

my_nodes = get_workflow_canvas_nodes(my_canvas_id)
print(my_nodes)

my_automated_runs = get_workflow_canvas_automated_runs(my_canvas_id)
print(my_automated_runs)

Running Workflows and Checking Status

With the IDs we collected above, we can now trigger runs programmatically. Both execution endpoints are asynchronous — they return a run_id immediately, and the actual work happens in the background. We can then poll for status using that ID.

Let's start with a function to run a single Node:

def run_workflow_node(workflow_canvas_id, node_id):
  api_url = f'https://app.redbird.io/api/workflow_canvases/{workflow_canvas_id}/nodes/{node_id}/run'
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  response = requests.post(api_url, headers=headers)
  return response.json()['run_id']

And a function to trigger an Automated Run on demand:

def run_workflow_automated_run(workflow_canvas_id, automated_run_id):
  api_url = f'https://app.redbird.io/api/workflow_canvases/{workflow_canvas_id}/automated_runs/{automated_run_id}/run'
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  response = requests.post(api_url, headers=headers)
  return response.json()['run_id']

Both functions return a run_id. We can pass that to a status-polling function to track progress:

def get_workflow_run_status(run_id):
  api_url = f'https://app.redbird.io/api/workflow_runs/{run_id}/status'
  headers = get_authentication_header(MY_USERNAME, MY_PASSWORD)
  response = requests.get(api_url, headers=headers)
  return response.json()

Putting it all together — here is an example that triggers a node run, then polls until it is no longer running:

import time

my_canvas_id = get_workflow_canvas_id_by_name(MY_WORKFLOW_CANVAS_NAME)
my_nodes = get_workflow_canvas_nodes(my_canvas_id)

# Run the first node on the canvas
node_id = my_nodes[0]['id']
run_id = run_workflow_node(my_canvas_id, node_id)
print(f'Started run {run_id}')

# Poll until the run is no longer active
while True:
  status_info = get_workflow_run_status(run_id)
  status = status_info['status']
  print(f'Status: {status}')
  if status not in ('Running', 'Pending'):
      break
  time.sleep(5)

print(f'Run finished with status: {status}')
print(f'Started at: {status_info["run_time_start"]}')
print(f'Ended at:   {status_info["run_time_end"]}')

The same pattern applies when triggering an Automated Run — just swap run_workflow_node for run_workflow_automated_run and supply the appropriate automated_run_id.