Some awesome things I ran through in GSoC'21
As part of GSoC with probml, I learnt about some new things that are pretty interesting and useful, and experimented with some stuff that I found exciting.
In this post I will briefly go over a bunch of them!
Making our own tfrecords for a dataset
I was not aware of TensorFlow records (which are used in the preprocessing pipeline for tfds) before. They are a great way of storing data, since a whole datapoint is serialized into a single tf string. To give it a shot, I tried converting the ImageNet2012 dataset to tfrecords. For practice, I did it for the validation set.
This is a glimpse of the encoding and decoding of a sample⇔tfrecord:
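A minimal sketch of that round trip, assuming each sample is a JPEG-encoded image plus an integer label. The feature names `image` and `label` and the helper names are illustrative choices, not taken from the notebook:

```python
import tensorflow as tf

# Assumed layout of one record: raw JPEG bytes and an integer class label.
FEATURE_SPEC = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def encode_sample(image_bytes, label):
    """Pack one (image, label) pair into a serialized tf.train.Example."""
    feature = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()


def decode_sample(serialized):
    """Parse a serialized Example back into an image tensor and its label."""
    parsed = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    image = tf.io.decode_jpeg(parsed["image"], channels=3)
    return image, parsed["label"]


def write_tfrecord(path, samples):
    """Write an iterable of (jpeg_bytes, label) pairs to a single tfrecord file."""
    with tf.io.TFRecordWriter(path) as writer:
        for image_bytes, label in samples:
            writer.write(encode_sample(image_bytes, label))


def read_tfrecord(path):
    """Return a tf.data.Dataset that yields decoded (image, label) pairs."""
    return tf.data.TFRecordDataset(path).map(decode_sample)
```

Keeping the image as encoded JPEG bytes rather than a decoded array keeps the record files small; decoding then happens lazily inside the `tf.data` pipeline.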
For a full example, you can check out this notebook in Colab.
My mentor and I also thought of making a generic version of this tfrecord converter pipeline that could handle any type of image dataset, but we couldn't find the time to do it. (However, TensorFlow has pretty good builders for almost every famous dataset.)
Knowing about wandb (Weights & Biases)
During the project, I got to know about this awesome platform.
wandb combines experiment tracking, hyperparameter tuning, dataset versioning and many other things, and it integrates with most ML frameworks such as PyTorch and Keras.
It is very simple to start using: all you need to do is log in and init a wandb run, and then you can log all your parameters like with any other logger. You get great visualizations of your experiment along with your system usage metrics.
- You can also run hyperparameter sweeps on your model, choosing which parameters to tweak.
- You can make reports with selected charts (you can create your own ones too!).
- You can version control both your model and dataset using wandb artifacts.
It's great if you already knew about this; if not, I would highly suggest trying it for your next ML work.
Pytorch loaders ⇔ tensorflow-ds
Tmux - ssh connection
But did this solve the problem I mentioned before? Yes! tmux sessions have an attach/detach facility, i.e. you can leave a session without interrupting the process running in it. To solve the above problem, all we need to do is start a tmux session on the ssh server and run our process (a model training in my case); you can then disconnect from ssh without any worries. Your process keeps running on the remote server, and you can check its status by logging in again and re-attaching to the session.
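The same workflow can be scripted on the remote machine. Below is a small sketch that wraps the standard tmux subcommands (`new-session`, `attach-session`, `has-session`) from Python; the session name `train` and the command `python train.py` are placeholders, not the actual setup used in the project:

```python
import subprocess


def start_cmd(session, command):
    """argv that starts `command` inside a new, detached tmux session."""
    return ["tmux", "new-session", "-d", "-s", session, command]


def attach_cmd(session):
    """argv that re-attaches the current terminal to an existing session."""
    return ["tmux", "attach-session", "-t", session]


def session_exists(session):
    """True if tmux reports a session with this name (exit status 0)."""
    result = subprocess.run(["tmux", "has-session", "-t", session], capture_output=True)
    return result.returncode == 0


def start_training(session="train", command="python train.py"):
    """Launch the training command in tmux unless the session already exists."""
    if not session_exists(session):
        subprocess.run(start_cmd(session, command), check=True)
```

Because the session is created detached (`-d`), closing the ssh connection does not touch it. After logging back in, running the command from `attach_cmd("train")` brings the training output back, and pressing `Ctrl-b d` detaches again.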
