Rorodata Platform

Rorodata is a cloud platform that lets data science teams prototype, build and deploy machine learning applications faster by abstracting away the non-data-science activities and streamlining the data science activities.

In a nutshell, the platform takes care of:

  • provisioning hardware instances on demand
  • managing software environments
  • running scripts, services and notebooks and managing their URL endpoints
  • scheduling periodic tasks
  • managing data volumes
  • managing multiple versions of machine learning models

Quick Start

The rorodata platform is modeled around projects. Each project is an independent unit of work with its own code, data, services and machine learning models.

The primary interface to the rorodata platform is a command-line tool called roro. It can be installed using pip:

$ pip install roro

Python 3 is recommended when installing the client.

You can check the installed version of the client using:

$ roro version
roro, version 0.1.6

Once installed, make sure you log in to the platform using:

$ roro login
Email address: anand@rorodata.com
Password:
Login successful.

The command prompts for your email address and password. If you don’t already have an account, please sign up at http://www.rorodata.com/.

The list of available projects can be found using:

$ roro projects
hello-world
credit-risk

And you can create a new project using:

$ roro create my-new-project
Created project: my-new-project

You can find the available commands using:

$ roro --help

And help about any particular command using:

$ roro <command-name> --help

Project Organization

Each project in the rorodata platform contains a special file named roro.yml. It specifies the project name, the runtime, the services to run and the periodic tasks.

A typical project is organized something like this:

credit-risk/
├── predict.py
├── requirements.txt
├── roro.yml
└── train.py

The roro.yml file looks something like this:

project: credit-risk
runtime: python3

services:
  - name: default
    function: predict.predict

The project field gives the name of the project, which must be unique. The runtime field selects the software runtime to use; the default runtime is python3. The available runtimes are described later in this section.

The services field lists the services to run. The format of the roro.yml file is described in detail in the section below.
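
As an illustration, the entry function: predict.predict in the sample roro.yml can be read as the predict function inside predict.py. The sketch below shows what such a file might contain. The argument names (income, loan_amount) and the scoring rule are hypothetical, chosen only to make the example runnable; this section does not describe how the platform passes request data to the function, so treat the signature as an assumption.

```python
# predict.py: hypothetical sketch, not the real credit-risk model.


def predict(income, loan_amount):
    """Return a toy risk score between 0.0 (low risk) and 1.0 (high risk).

    Both arguments are illustrative assumptions; the real service would
    load a trained model instead of using this fixed rule.
    """
    if income <= 0:
        # No income to repay from: treat as maximum risk.
        return 1.0
    # Score is the loan-to-income ratio, capped at 1.0.
    ratio = loan_amount / income
    return round(min(ratio, 1.0), 2)
```

Keeping the function free of platform-specific code means it can be tested locally before running roro deploy.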

Deploying the Project

Once the code and the roro.yml file are ready, you can deploy the project using the deploy command.

$ roro deploy
Deploying project credit-risk. This may take a few moments ...
Building docker image... done.
Updating scheduled jobs... done.
Restarted one service.
  default: https://credit-risk.rorocloud.io/

Deployed version 5 of credit-risk project.

Please remember that the deploy command must be run from the project directory, that is, the directory where the roro.yml file is present.

The deploy command takes all the contents of the project directory and submits them to the platform. The platform reads the roro.yml file and builds a new docker image containing the latest code, using the specified runtime as the base image and installing any Python packages listed in the requirements.txt file, if present.

After creating the docker image, the platform starts the specified services and exposes each of them at a URL endpoint. The service named default is special: it is exposed at https://<project-name>.rorocloud.io/, while every other service is exposed at https://<project-name>--<service-name>.rorocloud.io/.
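
The naming rule for service URLs can be written down as a small helper, which may be handy in client code that needs to call a deployed service. This is only a sketch of the rule stated above; the function name service_url and the service name batch are hypothetical and not part of the roro client.

```python
def service_url(project, service="default"):
    """Build the public URL of a service from the naming rule above."""
    if service == "default":
        # The default service lives directly under the project name.
        return f"https://{project}.rorocloud.io/"
    # Any other service is joined to the project name with a double hyphen.
    return f"https://{project}--{service}.rorocloud.io/"


print(service_url("credit-risk"))           # https://credit-risk.rorocloud.io/
print(service_url("credit-risk", "batch"))  # https://credit-risk--batch.rorocloud.io/
```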

Running Scripts

The roro run command is used to run any script on the rorodata platform.

$ roro run python training.py
Started new job 4fa27081

This starts a new job that runs on the platform. You can look at the logs of the job using the roro logs command, which shows everything printed by the script:

$ roro logs 4fa27081
starting the job
training decision tree model...
training complete.
the model is saved to /volumes/data/model.pkl

Please remember that roro run uses the most recently deployed code. If you have changed the code you want to run, you need to deploy again before running the script.
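
A script run this way might look like the following sketch. It prints progress messages (which would show up in roro logs) and pickles the result to a file. The majority-label "model", the sample rows and the function names are stand-ins invented for this example, not the decision tree from the log output above; the path /volumes/data is taken from that log output, and the fallback to a temporary directory is an assumption so the script also runs off the platform.

```python
import os
import pickle
import tempfile
from collections import Counter

# Directory seen in the sample log output; assumed to be a mounted data volume.
VOLUME_DIR = "/volumes/data"

# Hypothetical training rows: (features, label).
SAMPLE_ROWS = [((0.2,), 0), ((0.9,), 1), ((0.8,), 1)]


def train(rows):
    """Stand-in for real training: remember the most common label."""
    labels = Counter(label for _, label in rows)
    return {"majority_label": labels.most_common(1)[0][0]}


def main(model_path):
    print("starting the job")
    print("training stand-in model...")
    model = train(SAMPLE_ROWS)
    print("training complete.")
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    print(f"the model is saved to {model_path}")


if __name__ == "__main__":
    # Use the data volume when it exists, otherwise a local temp directory.
    out_dir = VOLUME_DIR if os.path.isdir(VOLUME_DIR) else tempfile.gettempdir()
    main(os.path.join(out_dir, "model.pkl"))
```

Taking the output path as a parameter keeps the script easy to try locally before deploying it and starting a job.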

Running Notebooks

Notebooks can be run using the roro run:notebook command.

$ roro run:notebook
starting the job
Jupyter notebook is available at:
https://517832f3.rorocloud.io/?token=rorocloud

The jupyter notebook server can be stopped using:
    roro stop 517832f3

This starts a new notebook server in the project’s software environment, as built by the most recent deploy, and provides a URL endpoint to access it.

Please remember that the notebook server continues to run until it is stopped.