Capacity Visual Attention Networks

9 pages • Published: September 29, 2016

Abstract

Inspired by recent work in machine translation and object detection, we introduce an attention-based model that automatically learns to extract information from an image by adaptively assigning its capacity across different portions of the input and processing only the selected regions, of varying sizes, at high resolution. This is achieved by combining two modules: an attention sub-network, which models a human-like counting process, and a capacity sub-network, which efficiently identifies the input regions to which the attention model's output is most sensitive, devotes more capacity to them, and dynamically adapts the size of each region. We focus our evaluation on the Cluttered MNIST, SVHN, and Cluttered GTSRB image datasets. Our findings indicate that the proposed model drastically reduces the number of computations compared with traditional convolutional neural networks, while maintaining similar or better performance.

Keyphrases: deep learning, image recognition, machine learning, neural networks, visual attention

In: Christoph Benzmüller, Geoff Sutcliffe and Raul Rojas (editors). GCAI 2016. 2nd Global Conference on Artificial Intelligence, vol 41, pages 72-80.
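The core idea of allocating capacity to the most sensitive regions can be sketched as a greedy selection over candidate patches. This is a minimal illustrative sketch, not the paper's implementation: the names `allocate_capacity`, `saliency`, and `budget` are assumptions, and a real attention sub-network would produce the saliency scores rather than receiving them as input.

```python
def allocate_capacity(saliency, budget):
    """Greedily assign high resolution to the most salient patches.

    saliency: one score per candidate image patch (hypothetical output
              of an attention sub-network)
    budget:   number of patches that may be processed at full resolution
    Returns a 'high'/'low' resolution label per patch.
    """
    # Rank patch indices by descending saliency score.
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    # Only the top-`budget` patches receive full-resolution processing.
    high = set(ranked[:budget])
    return ["high" if i in high else "low" for i in range(len(saliency))]

labels = allocate_capacity([0.1, 0.9, 0.4, 0.7], budget=2)
# patches with the two largest scores (indices 1 and 3) get full resolution
```

Downstream, only the "high" patches would be fed through the expensive convolutional pathway, which is where the computational savings reported in the abstract would come from.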