The summer after my junior year, I interned at LISA, the machine learning lab of the Université de Montréal. I worked on several aspects of scaling up neural networks.
- I worked on an idea called conditional computation, in which additional neural networks called gaters decide which units of the original network should be computed for a given input. By computing only the units the gater selects, the model exploits this sparsity and saves computation (a minimal sketch of the idea follows this list). I built models based on this idea with different architectures and investigated their performance.
- For neural language models with very large vocabularies, the normalization term of the output softmax layer becomes intractable, since it sums over every word in the vocabulary. I implemented hierarchical softmax and noise-contrastive estimation to overcome this issue and compared their efficiency against training with a regular softmax layer (see the second sketch below).
- I also built a system to generate n-grams on the fly (at run time) for very large datasets, where generating and storing all possible n-grams up front is infeasible because of the memory it would require (see the third sketch below).
- The development was done in Python, using the Theano and Pylearn2 libraries.
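
A minimal NumPy sketch of the gating idea, not the original Theano/Pylearn2 code: the layer sizes, the ReLU activation, and the gater's top-k selection rule are illustrative assumptions.

```python
import numpy as np

def gated_layer(x, W, b, Wg, bg, k=16):
    """One hidden layer with a small gater network that picks k units per input.

    x: input vector; W, b: main layer weights; Wg, bg: gater weights.
    Only the k units the gater selects are actually computed.
    """
    # The gater scores every hidden unit for this particular input.
    gate_scores = np.tanh(Wg @ x + bg)
    active = np.argsort(gate_scores)[-k:]        # indices of the k highest-scoring units

    # Compute only the selected rows of W: this is where the savings come from,
    # since the full dense product W @ x is never formed.
    h = np.zeros(W.shape[0])
    h[active] = np.maximum(0.0, W[active] @ x + b[active])   # ReLU on the active units
    return h, active

# Toy usage with random weights: 100 inputs, 1000 hidden units, 16 of them active.
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
W, b = rng.standard_normal((1000, 100)), np.zeros(1000)
Wg, bg = rng.standard_normal((1000, 100)), np.zeros(1000)
h, active = gated_layer(x, W, b, Wg, bg)
print(h[active][:5])
```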
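
A sketch of why the regular softmax is expensive for large vocabularies and how noise-contrastive estimation sidesteps it. This is not the internship code: `U` and `b` (output weights and biases), `log_q` (log-probabilities of the noise distribution), and `k` (number of noise samples) are assumed names, and the hierarchical softmax variant is omitted for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def full_softmax_logprob(h, U, b, target):
    """Exact log P(target | h): the normalizer sums over the whole vocabulary."""
    scores = U @ h + b                      # one score per word: O(|V|) work
    m = scores.max()                        # stabilize the log-sum-exp
    return scores[target] - (m + np.log(np.exp(scores - m).sum()))

def nce_loss(h, U, b, target, noise_ids, log_q, k):
    """Noise-contrastive estimation: the target word is discriminated from
    k sampled noise words, so only k + 1 scores are computed instead of |V|."""
    s_target = U[target] @ h + b[target]
    s_noise = U[noise_ids] @ h + b[noise_ids]
    # P(data | word) is modelled as sigmoid(score - log(k * q(word))).
    loss = -np.log(sigmoid(s_target - np.log(k) - log_q[target]))
    loss -= np.log(sigmoid(-(s_noise - np.log(k) - log_q[noise_ids]))).sum()
    return loss

# Toy usage: a 50k-word vocabulary, 128-dimensional hidden state, 25 noise samples.
rng = np.random.default_rng(0)
V, d, k = 50_000, 128, 25
h = rng.standard_normal(d)
U, b = 0.01 * rng.standard_normal((V, d)), np.zeros(V)
log_q = np.full(V, -np.log(V))              # uniform noise distribution
noise_ids = rng.integers(0, V, size=k)
print(full_softmax_logprob(h, U, b, target=42))
print(nce_loss(h, U, b, 42, noise_ids, log_q, k))
```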
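
Finally, a sketch of on-the-fly n-gram generation using a Python generator, so that only a sliding window of tokens is ever held in memory rather than the full set of n-grams. The whitespace tokenization and the helper names are assumptions for illustration, not the original system.

```python
from collections import deque

def ngrams_on_the_fly(token_stream, n=5):
    """Yield n-grams one at a time from a token stream, so the complete set of
    n-grams is never materialized in memory."""
    window = deque(maxlen=n)                # sliding window of the last n tokens
    for token in token_stream:
        window.append(token)
        if len(window) == n:
            yield tuple(window)

def tokens_from_file(path):
    """Stream tokens line by line from a (hypothetical) corpus file on disk."""
    with open(path) as f:
        for line in f:
            yield from line.split()

# Example with a small in-memory stream instead of a corpus file:
for gram in ngrams_on_the_fly("the cat sat on the mat".split(), n=3):
    print(gram)
```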