No need to input a password when connecting from an SSH client to an SSH server.

Any machine can become an SSH server by installing the server package:

$ sudo apt-get install openssh-server

You can check the SSH server's running status with:

$ sudo service ssh status

Note: the SSH server is started automatically after the installation above.

Any machine can be an SSH client.

On the client machine, we generate an RSA public/private key pair and copy the public key to the server:

$ ssh-keygen -t rsa -f /home/bxiong/.ssh/jetson_rsa
$ scp /home/bxiong/.ssh/jetson_rsa.pub bxiong@192.168.1.74:/home/bxiong/.ssh

On the server machine, add the public key to the authorized_keys file:

$ cat ~/.ssh/jetson_rsa.pub >> ~/.ssh/authorized_keys

Then we should be able to make an SSH connection without being prompted for a password:

$ ssh bxiong@192.168.1.74

In the previous post, I described how to implement “loss” for Sweeps; this post will describe how to implement “epoch” for Sweeps.

We need to update the epoch on every training run, so we define a custom logger for Lightning. The logger is executed on every training epoch.
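
A minimal sketch of such a logger, written as a Lightning Callback (the class name EpochLogger is mine, and it assumes wandb.init() has already been called for the sweep run):

import wandb
from pytorch_lightning.callbacks import Callback

class EpochLogger(Callback):
    """Log the current epoch to wandb at the end of every training epoch."""

    def on_train_epoch_end(self, trainer, pl_module):
        # trainer.current_epoch is maintained by Lightning during fit()
        wandb.log({"epoch": trainer.current_epoch})

# Attach it to the Trainer so it runs on every training epoch:
# trainer = pl.Trainer(callbacks=[EpochLogger()], ...)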


This post covers how to implement “loss”. The next post will do “epoch”.

Lightning is “The ultimate PyTorch research framework. Scale your models, without the boilerplate.”

Lightly tells companies which subset of their data to label to have the biggest impact on model accuracy.

Wandb is a tool/platform to “Build better models faster with experiment tracking, dataset versioning, and model management.”

Sweeps is one of wandb’s products for “Scalable, customizable hyperparameter tuning.”

I recently used it with Lightly/Lightning to tune my self-supervised MoCo/SimCLR v2 models for a Kaggle Covid-19 competition.
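
As a minimal sketch of the “loss” part (MyModel, compute_loss, and the project name are placeholders, not the competition code): the LightningModule logs a metric named “loss”, and the sweep configuration tells wandb to minimize that same metric.

import pytorch_lightning as pl
import wandb
from pytorch_lightning.loggers import WandbLogger

class MyModel(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)   # placeholder for the real loss computation
        self.log("loss", loss)            # forwarded to wandb via WandbLogger
        return loss

# The sweep optimizes exactly the metric logged above.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {"lr": {"min": 1e-5, "max": 1e-2}},
}
sweep_id = wandb.sweep(sweep_config, project="covid19-ssl")

# The Trainer is created with a WandbLogger so self.log() reaches wandb:
# trainer = pl.Trainer(logger=WandbLogger(project="covid19-ssl"), max_epochs=10)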

ps: the “wandb/sweeps” comes to my attention because a…


Learning from a Kaggle Covid-19 competition.

I used MoCo to try my model with the following data loader:
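
(The actual code block is not rendered in this preview; below is a hedged reconstruction of such a loader. The function name tiff_loader comes from the post; everything else, including the paths, is illustrative.)

import torch
from PIL import Image
from torchvision import datasets, transforms

def tiff_loader(path):
    # Despite the name, PIL opens .tiff, .png, .jpeg, etc. alike.
    with open(path, "rb") as f:
        return Image.open(f).convert("RGB")

dataset = datasets.DatasetFolder(
    root="../input/siim-covid19-detection",          # illustrative Kaggle input path
    loader=tiff_loader,
    extensions=(".tiff", ".png", ".jpeg"),
    transform=transforms.ToTensor(),
)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)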

First, I thought tiff_loader could also be used for .png, .jpeg, etc. It is true; I proved it by loading the image and displaying it.

Second, I thought the size of the image might cause a training error; by giving it images of different sizes, I proved that this is NOT the case.

After running out of ideas, only the most unlikely cause remained: the training process cannot read images from the system input folder.

To prove that, I copied the image from ../input to my ./kaggle/working folder,


Is self-supervised learning really better than supervised learning? Here is a real case: I am using SSL (MoCo) to compete against others (YOLO, EfficientNet, ResNet, etc.).

The competition

https://www.kaggle.com/c/siim-covid19-detection:

How many teams?


Summary and notes from the great article https://towardsdatascience.com/whats-init-for-me-d70a312da583

Three methods in total, plus which method popular packages like pandas, NumPy, etc. use.

An often-used example (borrowed from PyTorch-Lightning):
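
(The example itself is not shown in this preview; the pattern it refers to looks roughly like this simplified, illustrative __init__.py, not the actual PyTorch-Lightning source.)

# pytorch_lightning/__init__.py (simplified illustration)
# Re-export the main classes so users can write
#     from pytorch_lightning import Trainer, LightningModule
# instead of importing from the internal submodules.
from pytorch_lightning.core import LightningModule
from pytorch_lightning.trainer import Trainer

__all__ = ["LightningModule", "Trainer"]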


One diagram is worth ten pictures.

Why?

Sometimes, as experienced programmers, we don’t want to have to read long, formal documentation just to be able to start coding.

Solution

For myself, I like to make and keep a short annotated diagram for future reference, as below:

If you have NOT used PyTorch-Lightning, keep reading; otherwise, we are done.

step 1: define the dataset path in dataset_train_simclr

step 2: define dataset_train_simclr in dataloader_train_simclr

step 3: oops, my bad, it is the same as step 2

step 4: define dataset_train_simclr in encoder

step 5: the encoder does the training job (a code sketch of all five steps follows).
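
Put together in code, the five steps look roughly like this (a sketch following an older Lightly tutorial; the class names and arguments are assumptions and may differ in current Lightly releases):

import lightly
import torch
import torchvision

# step 1: dataset path -> dataset
dataset_train_simclr = lightly.data.LightlyDataset(input_dir="path/to/images")

# steps 2-3: dataset -> dataloader (with SimCLR's two-view augmentation collate)
collate_fn = lightly.data.SimCLRCollateFunction(input_size=224)
dataloader_train_simclr = torch.utils.data.DataLoader(
    dataset_train_simclr, batch_size=128, shuffle=True, collate_fn=collate_fn
)

# step 4: hand the training data (via the dataloader built from dataset_train_simclr)
# to the encoder, together with a backbone, loss, and optimizer
resnet = torchvision.models.resnet18()
backbone = torch.nn.Sequential(*list(resnet.children())[:-1])   # drop the final fc layer
model = lightly.models.SimCLR(backbone, num_ftrs=512)
criterion = lightly.loss.NTXentLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
encoder = lightly.embedding.SelfSupervisedEmbedding(
    model, criterion, optimizer, dataloader_train_simclr
)

# step 5: the encoder does the training job
encoder.train_embedding(max_epochs=10)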


From my two years of medical-cell AI experience, semi-self-supervised annotation is the key to training a production-ready model, and DINO may be the best option.

I just want to make my notes as short as possible:

  • SimCLR requires NOT only negative cells, which are much easier to get, but also positive cells, which are hard to get. DINO doesn’t require positive cells.
  • DINO has a similarity measurement, which can even identify copies of images; SimCLR doesn’t have this feature.
    Why this is important: 1) we have encountered such cases, where a cell dataset is duplicated across sources, labeling doctors, etc.; 2) detecting similarities of synthesized cells; 3) obtaining non-duplicated features of partial cells. (A small sketch of this similarity check follows.)
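
As a rough sketch of what that similarity check looks like in practice (dino_backbone is a placeholder for any embedding model, and the 0.95 threshold is arbitrary):

import torch
import torch.nn.functional as F

def find_near_duplicates(images, dino_backbone, threshold=0.95):
    """Return index pairs of images whose embeddings are nearly identical."""
    with torch.no_grad():
        emb = F.normalize(dino_backbone(images), dim=1)   # unit-length embeddings, shape (N, D)
    sim = emb @ emb.t()                                   # cosine-similarity matrix, shape (N, N)
    n = sim.size(0)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if sim[i, j] > threshold]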


No answer on Stack Overflow (yet): when you relocate a medium-to-large Go project, you will encounter this issue.

Background: I am moving the build environment of our AI Cervical Cancer project from VirtualBox to VMware Fusion. The project uses:

  • Github (public/private) for source code
  • Ubuntu 16.04 (guest operating system)
  • Mac OS (Big Sur)
  • VMware Fusion (v12.04, free version)
  • Swag (v1.7.0, for generating Restful API)
  • Vue.js (GUI, front end)
  • Golang (v1.13.3, back end)

Problem

The github.com/qiniu/x revision 7.0.8 does NOT exist anymore. Most advice suggests modifying the version in go.mod; here is our go.mod:


The problem:

$ swag init
...
go get github.com/paulxiong/cervical@1b2d2657e8dab3ba41226f02bbc79fac089290c6: downloaded zip file too large

Some background:

Many engineers have the same issue, posted on Stack Overflow and Reddit… The only solution, which I did not try, is to separate the repository into smaller repositories (each < 500 MB).

My solution:

Since this limitation is defined by the Go toolchain, I will build a go command, called go_new, without the 500 MB limitation.

(To save the reader’s time, I will NOT write the debug steps here, directly jump to where the code should be modified and how to build it. )

  • copy $GOROOT to a new place and make…

Paul Xiong

Medical AI, computer vision. Interests: self-supervised learning + annotation, imbalanced datasets, AI in embedded devices. Personal AI project: cervic.hopto.org
