2018-08-02

Why does unsupervised pre-training help deep learning?

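I've decided to write in Japanese from now on. I think writing in English was making this feel like a chore, and there wasn't much point in writing in English in the first place.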

This entry's paper ↓

Title: Why does unsupervised pre-training help deep learning? (2010)

http://www.jmlr.org/papers/volume11/erhan10a/erhan10a.pdf

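The authors are Bengio's group. 2010 is quite old by this field's standards, but I have reasons to read it, so I'm going to. It's a journal paper and fairly long, but I'll do my best.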

Summary

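Deep learning has recently been very strong, especially on images and language, but most of the good results include an unsupervised phase even when the task itself is supervised. So why is unsupervised learning effective? That is what the paper sets out to investigate. It forms several hypotheses about model architecture, capacity, the number of training examples, and so on, and tests them experimentally. The experiments show that pre-training makes it easier to reach optima that generalize better, and that unsupervised pre-training can be regarded as a form of regularization. Or so the paper says.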

↑
About two days passed here
↓
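I got tired and gave up on reading the whole thing.
Incidentally, the paper below shows that ReLU achieves the same effect as the unsupervised pre-training discussed here, and that training also gets faster.
It is the paper in which everyone's favorite, ReLU, made its appearance.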

proceedings.mlr.press

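When I first got into machine learning, ReLU was already the first choice and was used as a matter of course, so this is how that came about. Incidentally, the paper above is mentioned in the 2015 Nature paper by the all-star lineup of LeCun, Hinton, and Bengio. I felt the weight of history.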

Deep learning | Nature

2018-06-04

seq2seq

[1409.3215] Sequence to Sequence Learning with Neural Networks

github.com

The GitHub repository above is an implementation of seq2seq in Chainer.

2018-05-20

Learning to See in the Dark

http://web.engr.illinois.edu/~cchen156/SID.html

CVPR 2018

2018-04-27

mini-batch size in deep learning

Training with large minibatches is bad for your health.
More importantly, it's bad for your test error.
Friends dont let friends use minibatches larger than 32. https://t.co/hxx2rGhIG1
— Yann LeCun (@ylecun) April 26, 2018

[1804.07612] Revisiting Small Batch Training for Deep Neural Networks

According to this paper, the best results were obtained with small mini-batches, so a mini-batch size of 32 or less is recommended.
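As a reminder of what the mini-batch size actually controls, here is a minimal sketch of mini-batch SGD on a toy linear regression. It does not reproduce the paper's experiments; the data, the learning rate, and the function names `minibatches` and `sgd_fit` are all made up for illustration.

```python
import random


def minibatches(data, batch_size, rng):
    """Yield shuffled mini-batches containing at most `batch_size` examples."""
    indices = list(range(len(data)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]


def sgd_fit(data, batch_size=32, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w * x + b by mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for batch in minibatches(data, batch_size, rng):
            # The gradient is averaged over the current mini-batch only.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Hypothetical noise-free toy data generated from y = 3x + 1.
data = [(x / 10.0, 3.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]
w, b = sgd_fit(data, batch_size=8)
```

With a smaller `batch_size`, each epoch performs more (and noisier) parameter updates over the same data, which is the trade-off the paper studies.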

2018-04-09

Semi-Supervised Classification with Graph Convolutional Networks

[1609.02907] Semi-Supervised Classification with Graph Convolutional Networks

ICLR 2017

2018-03-23

A good slide

This is a good slide deck by Jure Leskovec about graph representation learning.

It helped me understand recent work on embedding learning and survey the area.

It is very useful.

Graph Representation Learning from Jure Leskovec

www.slideshare.net

2018-03-20

Inductive Representation Learning on Large Graphs

[1706.02216] Inductive Representation Learning on Large Graphs (NIPS 2017)

The main contribution of this paper is learning a function that generates feature representations for unseen nodes and graphs.

They train the model on a graph, and at inference time they use the trained system (i.e., the learned function) to produce a feature embedding for each node.

Moreover, they propose an unsupervised way of learning, so the model can be trained without specific labels. Of course, they also train it in a supervised manner.

In some places, I find ideas similar to node2vec ( https://cs.stanford.edu/~jure/pubs/node2vec-kdd16.pdf ). Jure is the last author of this paper, too, so those ideas probably come from his advice.
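To make the "learned function" idea concrete, here is a minimal sketch of a single mean-aggregation layer applied to a node that was not in the graph when the weights were chosen. It is a simplification and not the authors' implementation: there is no neighbor sampling, no normalization, and no training. The weight matrix `W`, the toy graph, and the name `embed_node` are assumptions for illustration.

```python
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def relu(v: Vector) -> Vector:
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in v]


def matvec(W: List[Vector], x: Vector) -> Vector:
    """Matrix-vector product with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def embed_node(node: str,
               features: Dict[str, Vector],
               neighbors: Dict[str, List[str]],
               W: List[Vector]) -> Vector:
    """ReLU(W applied to [own features ; mean of neighbor features])."""
    own = features[node]
    neigh = neighbors.get(node, [])
    agg = mean([features[u] for u in neigh]) if neigh else [0.0] * len(own)
    return relu(matvec(W, own + agg))


# Hypothetical toy graph with 2-d input features.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

# Hypothetical "already trained" weights mapping 4-d input to a 2-d embedding.
W = [[0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, -0.5]]

emb_a = embed_node("a", features, neighbors, W)

# A node added later: the same function embeds it without retraining,
# because it depends only on the node's features and its neighborhood.
features["d"] = [0.0, 2.0]
neighbors["d"] = ["a", "b"]
emb_d = embed_node("d", features, neighbors, W)
```

The point of the sketch is that the parameters belong to the aggregation function rather than to individual nodes, which is why unseen nodes can be embedded at inference time.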