The Good, the Bad and the Ugly aka Ruby && Data Science

About

I’ve switched from web development to data science, and today that automatically means becoming a Python programmer.

That’s good (I know Rubyists won’t give me any likes for saying so), but Ruby is trying too:

https://github.com/arbox/machine-learning-with-ruby

Ruby is (for now) not a data-science-centric language with a large, established library ecosystem. We don’t have anything like Python’s SciPy ecosystem. I don’t see any serious reasons for this other than historical ones. This is my pain.

The reason it hurts is very simple: I love using Ruby and I love its syntax. So I suppose that in some simple cases Ruby is not such a bad tool.

The Good

What has been done:

  1. Some time ago the Practical AI organization released a series of articles about AI and machine learning in Ruby, but the posts were very naive, with “Hello World”-level tasks, and the project died in October 2017.
  2. There are some things like tensorflow.rb, gems based on PyCall, disconnected libraries, and articles about very narrow solutions like captcha solving, image recognition, etc.

But each of them solves a single, isolated task and falls apart as soon as it has to integrate with another library. There’s no interface to rule them all.

The Ugly

Ruby + Daru + Numo-Narray != Python + NumPy + SciPy + Pandas

One day I decided to play with data and solve this Kaggle task without using Python or R. Only Ruby, only hardcore.

Quite a stupid idea, but it led to my first PR in Ruby/CSV and to more questions about an interface that would connect all the Ruby gems. So the experience was useful.

I found an easy tutorial in Python and started to reproduce it step by step in Ruby. Every step in Ruby was definitely a failure compared to Python: much worse and much slower to carry out. Where Python needs 1 LOC, Ruby needs 3–4 lines.

Here is an interesting link comparing Python and Ruby for data refinement, to back up my words with real code.

The Bad

Framework for everything.

As I mentioned above, the critical issue is the fragmented, disconnected structure of the DS gems.

It is how proper Ruby gems work: they solve one task, and solve it well (or die trying). Our community is not building some “framework for everything”.

© Ruby Community

Our problem is an unreasonable fear of dependency hell or of implementation overhead, idk exactly.

Python solved this problem easily. SciPy is a bunch of small, compatible libraries, but they are stitched together by one common interface: NumPy.

Right way: We just need to build a meta-gem which combines everything and installs everything, some kind of interface which bundles them. That’s the solution which makes things simpler in my view of this world :)

Current way: If someone needs only a part of a library and sees the whole bunch of other implementations as overhead, they can extract the code and build a targeted gem.
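To make the “right way” a bit more concrete, here is a minimal sketch of what such a meta-gem could look like. The gem name datasci-meta, the version and the summary are made up for illustration; the three dependencies are just the gems mentioned in this post, and no such gem is claimed to exist.

```ruby
# Hypothetical meta-gem: it ships no code of its own and only pulls in
# the existing data gems, so one `gem install` brings the whole stack.
# The name "datasci-meta" is an assumption made up for this sketch.
spec = Gem::Specification.new do |s|
  s.name    = "datasci-meta"
  s.version = "0.0.1"
  s.summary = "Installs Daru, Numo::NArray and Distribution together"
  s.authors = ["you"]
  s.files   = []

  # Each dependency stays an independent, single-purpose gem.
  s.add_dependency "daru"
  s.add_dependency "numo-narray"
  s.add_dependency "distribution"
end

puts spec.runtime_dependencies.map(&:name).join(", ")
```

The small gems keep living their own lives and the meta-gem only bundles them. The common interface on top is the harder part, and this sketch does not cover it.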

What can we do?

The whole pain is that no one uses Ruby for mathematics.

In my opinion, NumPy’s interface is the most successful one in this respect. In Ruby, we just need to re-implement the same interface, because something like drawing a matrix from a distribution sits in the middle of tons of other code, and it should be doable as quickly as possible.

Okay, even if we never build a super newfangled framework for data science, it would be great if each future commenter translated one method from sklearn into Ruby. This world would become a much better place.

Just for comparison

(1) Python today:

import numpy as np
np.random.normal(mu, sigma, size)

(1) Ruby today:

require 'daru'
require 'distribution'
rng = Distribution::Normal.rng(mu, sigma)
Daru::Vector.new(size.times.map { rng.call })

(1) Ruby tomorrow:

require 'daru'
Daru::Vector.new(size, normal(mu, sigma)) #your syntax can be here
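Just to show that “Ruby tomorrow” is not far away, here is a rough sketch of a normal helper in plain Ruby with no gems at all. It is a hypothetical function, not something Daru or any other gem provides, and it uses the Box-Muller transform rather than whatever NumPy does internally. Its signature takes the size as a third argument, which is my assumption and differs slightly from the wish above.

```ruby
# Hypothetical helper, not part of Daru or any existing gem.
# Draws `size` samples from a normal distribution with mean `mu` and
# standard deviation `sigma`, using the Box-Muller transform.
def normal(mu, sigma, size, rng: Random.new)
  Array.new(size) do
    u1 = 1.0 - rng.rand # lies in (0, 1], so Math.log never sees 0
    u2 = rng.rand
    mu + sigma * Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
  end
end

# A seeded generator makes the draw reproducible.
samples = normal(0.0, 1.0, 5, rng: Random.new(42))
puts samples.inspect
```

The resulting array could then be handed to Daru::Vector.new, the same way as in the “Ruby today” version.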

P.S.

I’ve already made my peace with it: I’ve been writing Python for a long time and have even found many advantages in it.


Sooner or later you’ll get hooked on the machine learning needle anyway, so you may as well start right now:
https://www.youtube.com/watch?v=T1nFQ49TyeA

Or, if you are already involved in DS or are just starting out, at least try to go through this blitz on deep learning:

http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

24 y.o. Lead Software Engineer; github: wowinter13