The Rorodata CLI

The primary interface to the rorodata platform is the command-line interface.

The command-line client can be installed using:

pip install roro-client

Make sure you have at least version 0.1.6 of the client.

$ roro version
roro, version 0.1.6
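
A script that depends on the client could check this requirement itself. The following is a minimal sketch, assuming the output of roro version always has the form shown above; the function names are illustrative and not part of the client.

```python
import re

MIN_VERSION = (0, 1, 6)  # minimum client version stated in this guide


def parse_roro_version(output):
    """Extract (major, minor, patch) from text like 'roro, version 0.1.6'."""
    match = re.search(r"version\s+(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognised version output: %r" % output)
    return tuple(int(part) for part in match.groups())


def is_supported(output, minimum=MIN_VERSION):
    """Return True if the reported version is at least `minimum`."""
    return parse_roro_version(output) >= minimum


print(is_supported("roro, version 0.1.6"))  # True
print(is_supported("roro, version 0.1.5"))  # False
```

Comparing tuples of integers rather than raw strings means that a version such as 0.1.10 is correctly treated as newer than 0.1.6.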

Once installed, make sure you log in to the platform using:

$ roro login
Email address: anand@rorodata.com
Password:
Login successful.

The command prompts for your email address and password. If you don’t already have an account, please sign up at http://www.rorodata.com/.

Most of the commands work in the context of a project and they must be executed from the project directory. These commands look at the roro.yml file to find the project name.

Projects

The list of projects can be found using the roro projects command.

$ roro projects
hello-world
credit-risk

A new project can be created using:

$ roro create my-new-project
Created project: my-new-project

Project names are unique across the platform: once a name has been used by any user, it cannot be used by anyone else.

Support for deleting a project is in progress and will be available in future versions.

Deploy

The roro deploy command is used to deploy a project. Deploying is the only way to send changes in the project to the platform, including code changes and changes to roro.yml that add or remove services or periodic tasks.

While deploy is just a single command, a lot of things happen behind the scenes:

All the files in the project directory are archived and sent to the platform.
A docker image is created with the specified runtime as the base image, and all the dependencies listed in the requirements.txt file, if present, are installed.
All the services specified in roro.yml are (re)started and their endpoints are created.
The scheduled tasks are updated.
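
The first of these steps, archiving the project directory, can be sketched as follows. This is only an illustration and not the client's actual code: the tar.gz format, the function names, and the choice to include every file without exclusions are all assumptions.

```python
import io
import os
import tarfile


def archive_project(project_dir):
    """Pack every file under project_dir into an in-memory .tar.gz archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for root, _dirs, files in os.walk(project_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                # Store paths relative to the project root.
                tar.add(path, arcname=os.path.relpath(path, project_dir))
    return buffer.getvalue()


def list_archive(data):
    """Return the sorted file names stored in an archive made above."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        return sorted(tar.getnames())
```

Calling list_archive(archive_project(".")) from a project directory shows what such an archive would contain, which is a quick way to spot large or unwanted files before deploying.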

The deploy command prints a summary of the changes and the endpoint of each service in the project.

$ roro deploy
Deploying project credit-risk. This may take a few moments ...
Building docker image... done.
Updating scheduled jobs... done.
Restarted one service.
  default: https://credit-risk.rorocloud.io/

Deployed version 5 of credit-risk project.

Scripts & Notebooks

The roro run command is used to run any script, typically a Python program, on the rorodata platform.

$ roro run python train.py
Started new job 4fa27081

This starts a new job that runs on the platform. You can view the job's logs, which contain everything printed by the script, using the roro logs command:

$ roro logs 4fa27081
starting the job
training decision tree model...
training complete.
the model is saved to /volumes/data/model.pkl
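
Anything the script prints ends up in these logs. As a rough illustration, a train.py along the following lines would produce output like the sample above. It is a stand-in rather than the actual script: the "model" is a placeholder dictionary, and a temporary directory replaces the volume path so that the sketch runs anywhere.

```python
import os
import pickle
import tempfile


def train(model_path):
    """Stand-in training routine: prints progress and pickles a dummy model."""
    print("starting the job")
    print("training decision tree model...")
    model = {"kind": "placeholder", "threshold": 0.5}  # not a real model
    print("training complete.")
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    print("the model is saved to", model_path)
    return model


# On the platform the path would be something like /volumes/data/model.pkl;
# a temporary directory is used here so the sketch runs on any machine.
demo_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
train(demo_path)
```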

Please remember that roro run uses the code that was last deployed. If you have made changes to the code, you need to deploy them before running the script.
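
Since forgetting to deploy is an easy mistake, the two steps could be wrapped in a small helper. The sketch below assumes that roro is on the PATH, that it exits with a non-zero status on failure, and that roro run prints a line of the form "Started new job <id>" as in the example above. None of these helper functions is part of the client.

```python
import re
import subprocess


def run_cli(args):
    """Run a roro command and return its standard output as text."""
    result = subprocess.run(["roro"] + args, check=True,
                            capture_output=True, text=True)
    return result.stdout


def parse_job_id(output):
    """Pull the job id out of a line like 'Started new job 4fa27081'."""
    match = re.search(r"Started new job (\w+)", output)
    if match is None:
        raise ValueError("no job id found in: %r" % output)
    return match.group(1)


def deploy_and_run(command, cli=run_cli):
    """Deploy the current project, then start `command` as a new job."""
    cli(["deploy"])
    return parse_job_id(cli(["run"] + command))
```

For example, deploy_and_run(["python", "train.py"]) would return the id of the new job, which can then be passed to roro logs. The cli parameter exists so the helper can be exercised with a fake runner without contacting the platform.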

Notebooks can be run using the roro run:notebook command.

$ roro run:notebook
starting the job
Jupyter notebook is available at:
https://517832f3.rorocloud.io/?token=rorocloud

The jupyter notebook server can be stopped using:
    roro stop 517832f3

This starts a new notebook server in the project’s software environment, as built by the most recent deploy, and provides a URL to access it.

Please remember that the notebook server continues to run until it is stopped.

Processes & Logs

The list of processes currently running in a project can be found using the roro ps command.

$ roro ps
JOBID     STATUS    WHEN           TIME     INSTANCE TYPE    CMD
--------  --------  -------------  -------  ---------------  -------------------
c19f745b  running   7 seconds ago  0:00:07  C1               python train.py
137f3d2a  running   9 seconds ago  0:00:07  C1               [notebook]

A process can be stopped using the roro stop command.

$ roro stop 137f3d2a

The logs of any process can be seen using the roro logs command.

$ roro logs c19f745b
started training
iteration 100 - accuracy 0.57
iteration 200 - accuracy 0.65
iteration 300 - accuracy 0.68
iteration 400 - accuracy 0.69

The roro ps command shows only the active processes. To see all the processes ever run in the project, run it with the -a flag.

$ roro ps -a
JOBID     STATUS    WHEN           TIME     INSTANCE TYPE  CMD
--------  --------  -------------  -------  -------------  ---------------
c19f745b  running   7 seconds ago  0:00:07  C1             python train.py
137f3d2a  running   9 seconds ago  0:00:07  C1             [notebook]
18cb1ce2  success   1 day ago      0:00:01  C1             python task.py
d75e8553  success   1 day ago      0:00:01  C1             python task.py
f95b01a1  success   2 days ago     0:00:02  C1             python task.py
71fe89cc  success   2 days ago     0:00:02  C1             python task.py
b46cbb8e  success   3 days ago     0:00:02  C1             python task.py
dd75b3fb  success   3 days ago     0:00:02  C1             python task.py
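
Because notebook servers keep running until they are stopped, it can be handy to pick them out of this listing programmatically. The sketch below parses a table in the format shown above, assuming that columns are always separated by at least two spaces; the helper names are illustrative, and the sample text is copied from this section.

```python
import re


def parse_ps(output):
    """Turn `roro ps` table text into a list of dicts keyed by column name."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = re.split(r"\s{2,}", lines[0].strip())
    rows = []
    for line in lines[2:]:  # skip the header and the dashed separator
        # Limit the splits so the last column (CMD) is kept in one piece.
        cells = re.split(r"\s{2,}", line.strip(), maxsplit=len(header) - 1)
        rows.append(dict(zip(header, cells)))
    return rows


def running_notebooks(rows):
    """Return the job ids of notebook servers that are still running."""
    return [row["JOBID"] for row in rows
            if row["STATUS"] == "running" and row["CMD"] == "[notebook]"]


SAMPLE = """\
JOBID     STATUS    WHEN           TIME     INSTANCE TYPE  CMD
--------  --------  -------------  -------  -------------  ---------------
c19f745b  running   7 seconds ago  0:00:07  C1             python train.py
137f3d2a  running   9 seconds ago  0:00:07  C1             [notebook]
18cb1ce2  success   1 day ago      0:00:01  C1             python task.py
"""

print(running_notebooks(parse_ps(SAMPLE)))  # ['137f3d2a']
```

Each id returned can then be passed to roro stop.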

Volumes

The rorodata platform has built-in support for volumes, which store persistent data. By default, two volumes, data and notebooks, are created for every project when the project is created. Volumes are used to store input data, intermediate results, checkpoints and final results.

Volumes can also be used for storing machine learning models, but the model management system provided by the rorodata platform offers much better capabilities.

The roro volumes command can be used to list the volumes in a project.

$ roro volumes
data
notebooks

New volumes can be created using the roro volumes:add command.

$ roro volumes:add new-volume-name
Volume new-volume-name added to the project credit-risk

To list files in a volume:

$ roro volumes:ls notebooks
credit-risk.ipynb

Files can be copied to and from a volume.

For example, to copy a local file to the data volume:

$ roro cp dataset.csv data:dataset.csv

Or the other way:

$ roro cp data:dataset.csv dataset.csv
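
In these commands an argument of the form volume:path refers to a file in a volume, while a plain path refers to a local file. The sketch below shows one way to interpret that notation and to work out where a job might see the file. The /volumes/<name> mount point is an assumption inferred from the log line in the Scripts & Notebooks section, and the helper names are hypothetical; the real client may parse its arguments differently.

```python
from pathlib import PurePosixPath

VOLUME_ROOT = PurePosixPath("/volumes")  # assumed mount point, see the text above


def parse_spec(spec):
    """Split a `roro cp` argument into (volume, path).

    The volume is None when the argument is a plain local path.
    """
    volume, sep, path = spec.partition(":")
    if sep and volume:
        return volume, path
    return None, spec


def path_in_job(spec):
    """Map 'data:dataset.csv' to the assumed in-job path of that file."""
    volume, path = parse_spec(spec)
    if volume is None:
        raise ValueError("not a volume path: %r" % spec)
    return str(VOLUME_ROOT / volume / path)


print(parse_spec("data:dataset.csv"))   # ('data', 'dataset.csv')
print(parse_spec("dataset.csv"))        # (None, 'dataset.csv')
print(path_in_job("data:dataset.csv"))  # /volumes/data/dataset.csv
```

Note that this simple split would misread a local Windows path such as C:\data.csv as a volume reference.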

Config

The rorodata platform provides support for storing project secrets such as database URLs and the access and secret keys of third-party services.

The config variables are set in the environment of every process that is run in the project.
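
Inside a script, these values can therefore be read like any other environment variable. A minimal sketch follows, in which get_config is a hypothetical helper and X is the example variable used later in this section; the assignment to os.environ only simulates what the platform would do.

```python
import os


def get_config(name, default=None, required=False):
    """Read a config variable from the process environment."""
    value = os.environ.get(name, default)
    if required and value is None:
        raise RuntimeError("config variable %s is not set" % name)
    return value


# Simulate the platform having set X=1 for this process (demo only).
os.environ["X"] = "1"

# Environment values are always strings, so convert them explicitly.
x = int(get_config("X", required=True))
print(x)  # 1
```

Marking a variable as required makes a job fail early with a clear message instead of misbehaving later when the value is missing.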

The roro config command lists all the available config variables.

$ roro config
=== credit-risk Config Vars
DATABSE_URL: postgres://yxulQ5Ib9:QOJoFJZwv5FYIM0y@db1.example.com

One or more config variables can be added using the roro config:set command.

$ roro config:set X=1 Y=2
Updated config vars

$ roro config
=== credit-risk Config Vars
DATABSE_URL: postgres://yxulQ5Ib9:QOJoFJZwv5FYIM0y@db1.example.com
X: 1
Y: 2

The roro config:unset command is used to unset config vars.

$ roro config:unset X
Updated config vars

$ roro config
=== credit-risk Config Vars
DATABSE_URL: postgres://yxulQ5Ib9:QOJoFJZwv5FYIM0y@db1.example.com
Y: 2

Please remember that the services are not restarted after config:set or config:unset. They may have to be restarted using the roro deploy command to use the new configuration.