Some awesome things I ran into in GSoC'21

As a part of GSoC with probml, I learnt about some new things that are interesting and useful, and I experimented with some stuff that I found exciting.

In this post I will briefly mention a bunch of them!


Making our own tfrecords for a dataset



I was not aware of TensorFlow records (tfrecords, which are used in the preprocessing pipeline for tfds) before. They are a great way of maintaining data, as a whole datapoint is serialized and stored as a single tf string. To give it a shot, I tried converting the Imagenet2012 dataset to tfrecords. I did it for the validation dataset for practice.

This is a glimpse of the encoding and decoding of a sample tfrecord:




For a full example, you can check out this notebook in Colab.
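
As a rough illustration of the idea, here is a minimal sketch of writing and reading back a single record. It is not the code from the notebook: the feature names `image` and `label`, the helper names, and the tiny synthetic image standing in for an ImageNet picture are all assumptions made for the example.

```python
import os
import tempfile

import tensorflow as tf


def encode_example(image_bytes: bytes, label: int) -> bytes:
    """Pack one datapoint into a serialized tf.train.Example."""
    feature = {
        # Feature names "image" and "label" are assumptions for this sketch.
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()


def decode_example(serialized):
    """Parse one serialized record back into an (image, label) pair."""
    spec = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    image = tf.io.decode_jpeg(parsed["image"], channels=3)
    return image, parsed["label"]


# A tiny synthetic image stands in for a real validation picture.
dummy_image = tf.zeros([8, 8, 3], dtype=tf.uint8)
jpeg_bytes = tf.io.encode_jpeg(dummy_image).numpy()

# Write a single record to a temporary .tfrecord file.
path = os.path.join(tempfile.mkdtemp(), "sample.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(encode_example(jpeg_bytes, label=7))

# Read it back through the usual tf.data pipeline.
dataset = tf.data.TFRecordDataset(path).map(decode_example)
image, label = next(iter(dataset))
```

Storing the JPEG bytes rather than the decoded pixels keeps the file small; decoding then happens lazily inside the `map` call when the dataset is read.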

My mentor and I also thought of making a generic version of this tfrecord converter pipeline that can take care of any type of image dataset, but we couldn't find the time to make it more generic. (However, TensorFlow has pretty good builders for almost every famous dataset 😀)


Getting to know wandb (Weights & Biases)

During the project, I got to know about this awesome platform.

wandb combines experiment tracking, hyperparameter tuning, dataset versioning and many other things, and it integrates with most ML frameworks such as PyTorch and Keras.

It is very simple to start using: all you need to do is log in, init a wandb run, and then log your parameters and metrics like you would with any other logger. You get great visualizations of your experiment along with your system usage metrics.

  




Click on the pics to check out the source

  • You can also run hyperparameter sweeps on your model, choosing which settings to tweak.
  • You can make reports with selected charts (you can create your own ones too!).
  • You can version control both your model and dataset using wandb artifacts.
and many more!

It's great if you already knew about this; if not, I would highly suggest you try it for your next ML work 🙌.


PyTorch loaders ⇔ tensorflow-ds

During the project, I faced the following issue: an image dataset in the form of tfds had to be used for training/inference with a pre-trained PyTorch model, and this model has a preprocessing stage that is written in PyTorch.

For this case, I had to make a torch dataset from the tfds numpy iterator and then write a mapping function for the preprocessing, the result of which is then fed to a torch loader.

As an experiment, I made generic torch ⇔ tensorflow dataloader converters by making use of tensorflow's nest module, from_generator, and subclassing torch's Dataset.

A glimpse of the code I made:

Click on the pics to check out the source


This seems simple on the surface, but it becomes tricky when we need multi-process loading and also data parallelism on GPUs/TPUs for training. We have to be careful when passing a multi-worker argument in this pipeline; improper usage may leave your training/inference job stuck.

You can avoid this by using multiple workers only once in the pipeline, either at the start or at the end. You can refer to this example in the probml repo to check that out, where we use a JAX model to run inference on a tfds dataset on TPUs.

Tmux - ssh connection

When I was wondering how to keep a process on an ssh server alive even though I get disconnected from it, my mentor suggested tmux. It is a terminal multiplexer, i.e. it allows you to have multiple windows within a single terminal, each of which can have multiple panes. So you can run multiple processes simultaneously in the same environment!






But did this solve the problem I mentioned before? Yes! tmux sessions have an attach/detach facility, i.e. you can leave a session without interrupting the process currently running in it. To solve the above problem, all we need to do is start a tmux session on the ssh server and run our process (a model training in my case); you can then disconnect from ssh without any worries. Your process keeps running on the remote server, and you can check its status by logging in again and re-attaching to the session.
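
The workflow above can be sketched as a small Python wrapper around the tmux command line, to be run on the remote server. The session name `train` and the script `train.py` used in the note below are hypothetical, and the helper names are made up for this example; the equivalent shell commands are given in the comments.

```python
import shutil
import subprocess


def new_session_cmd(name: str, command: str) -> list[str]:
    # Shell equivalent: tmux new-session -d -s <name> '<command>'
    # -d starts the session detached, -s gives it a name.
    return ["tmux", "new-session", "-d", "-s", name, command]


def has_session_cmd(name: str) -> list[str]:
    # Shell equivalent: tmux has-session -t <name>
    return ["tmux", "has-session", "-t", name]


def attach_cmd(name: str) -> list[str]:
    # Shell equivalent: tmux attach-session -t <name>
    return ["tmux", "attach-session", "-t", name]


def start_detached(name: str, command: str) -> None:
    """Start `command` inside a detached tmux session called `name`."""
    if shutil.which("tmux") is None:
        raise RuntimeError("tmux is not installed on this machine")
    subprocess.run(new_session_cmd(name, command), check=True)


def is_running(name: str) -> bool:
    """Return True if a tmux session called `name` still exists."""
    result = subprocess.run(has_session_cmd(name), capture_output=True)
    return result.returncode == 0
```

For example, `start_detached("train", "python train.py")` launches the job, after which the ssh connection can be closed. After logging in again, `tmux attach-session -t train` in the terminal brings the session back, and pressing Ctrl-b followed by d detaches from it once more.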

You can refer to this post to know how to do this!

