Selected Publications

Several techniques have been developed to make it easier for non-experts to create animations by leveraging a database of existing cartoons. These methods typically require registering a deformable model to each frame in the database and then using the deformation parameters to infer the subspace of plausible deformations. We address the challenges of performing this registration on cartoon data, which usually lacks texture and exhibits strong expressive articulations. We leverage deep neural networks to learn to register a deformable mesh model to images of cartoon characters. Characters are parametrized as deformable 2.5D meshes with a given initial template and a learned per-instance deformation. Layering is used to handle occlusions and moving body parts. Through experiments on various characters, we demonstrate that our model successfully learns to deform the template based on input images. We also show applications of our model such as inbetweening, user-constrained deformation, and correspondence estimation.
Under Review, 2019.
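The template-plus-deformation parametrization above can be illustrated with a minimal sketch (hypothetical names and a toy one-triangle template, assuming NumPy; not the paper's actual model or mesh):

```python
import numpy as np

# Toy 2.5D template: 2D vertex positions plus a per-part layer index,
# where higher layers occlude lower ones at render time.
template = {
    "verts": np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]]),  # one triangle
    "layer": np.array([0]),                                   # draw order
}

def deform(template, offsets):
    """Apply a learned per-instance deformation as per-vertex offsets."""
    return template["verts"] + offsets

# stand-in for a network's predicted per-instance deformation
offsets = np.array([[0.1, 0.0], [0.0, -0.1], [0.05, 0.2]])
posed = deform(template, offsets)
```

The layer indices are what resolve occlusions between moving body parts: parts are deformed independently and composited back-to-front.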

In this paper, we propose novel generative models for creating adversarial examples: slightly perturbed images resembling natural images but maliciously crafted to fool pre-trained models. We present trainable deep neural networks for transforming images to adversarial perturbations. Our proposed models can produce image-agnostic and image-dependent perturbations for both targeted and non-targeted attacks. We also demonstrate that similar architectures can achieve impressive results in fooling both classification and semantic segmentation models, obviating the need for hand-crafting attack methods for each task. We improve the state-of-the-art performance in universal perturbations by leveraging generative models in lieu of current iterative methods. Moreover, we are the first to present effective targeted universal perturbations. Our attacks are considerably faster than iterative and optimization-based methods at inference time, generating perturbations on the order of milliseconds.
CVPR, 2018.
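The image-agnostic (universal) setting can be sketched as follows. This is a toy stand-in, not the paper's trained architecture: the "generator" here is just a fixed linear map, and the key point is that a single perturbation, clipped to an L-infinity ball, is added to every input image:

```python
import numpy as np

# Hypothetical stand-in for a trained perturbation generator: a fixed
# random linear map from a latent code to an image-sized perturbation.
def generator(z, shape=(32, 32, 3)):
    rng = np.random.default_rng(0)
    W = rng.standard_normal((int(np.prod(shape)), z.size))
    return (W @ z).reshape(shape)

def universal_perturbation(z, eps=8 / 255):
    """One image-agnostic perturbation, clipped to an L-infinity ball
    of radius eps, that can be added to every input image."""
    return np.clip(generator(z), -eps, eps)

z = np.ones(64)                       # fixed latent -> fixed universal delta
delta = universal_perturbation(z)
image = np.zeros((32, 32, 3))         # any image with values in [0, 1]
adversarial = np.clip(image + delta, 0.0, 1.0)
```

Because the generator is a single forward pass, producing a perturbation costs one network evaluation rather than an iterative optimization per image.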

Estimating fundamental matrices is a classic problem in computer vision. Traditional methods rely heavily on the correctness of estimated key-point correspondences, which can be noisy and unreliable. As a result, it is difficult for these methods to handle image pairs with large occlusion or significantly different camera poses. In this paper, we propose novel neural network architectures to estimate fundamental matrices in an end-to-end manner without relying on point correspondences. New modules and layers are introduced in order to preserve mathematical properties of the fundamental matrix as a homogeneous rank-2 matrix with seven degrees of freedom. We analyze the performance of the proposed models using various metrics on the KITTI dataset, and show that they achieve performance competitive with traditional methods without the need to extract correspondences.
ECCV, 2018.
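The rank-2, seven-degrees-of-freedom constraint can be made concrete with a small sketch. This is a generic post-hoc SVD projection under stated assumptions (NumPy, a raw 3x3 network output), not the paper's actual reconstruction layers:

```python
import numpy as np

def project_to_fundamental(F):
    """Project an arbitrary 3x3 matrix onto the set of valid fundamental
    matrices: rank 2, and defined only up to scale (7 degrees of freedom)."""
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0                          # zero smallest singular value -> rank 2
    F2 = U @ np.diag(s) @ Vt
    return F2 / np.linalg.norm(F2)      # normalize away the scale ambiguity

F_raw = np.array([[1.0, 2.0, 3.0],     # stand-in for a network's raw output
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 10.0]])
F = project_to_fundamental(F_raw)
```

The two constraints (zero determinant and fixed scale) are what reduce the nine entries of a 3x3 matrix to seven degrees of freedom.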

We propose a novel generative model named Stacked Generative Adversarial Networks (SGAN), which is trained to invert the hierarchical representations of a bottom-up discriminative network. Our model consists of a top-down stack of GANs, each learned to generate lower-level representations conditioned on higher-level representations. A representation discriminator is introduced at each feature hierarchy to encourage the representation manifold of the generator to align with that of the bottom-up discriminative network, leveraging the powerful discriminative representations to guide the generative model. Unlike the original GAN, which uses a single noise vector to represent all the variations, our SGAN decomposes variations into multiple levels and gradually resolves uncertainties in the top-down generative process. Based on visual inspection, Inception scores, and a visual Turing test, we demonstrate that SGAN is able to generate images of much higher quality than GANs without stacking.
CVPR, 2017.
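The top-down stacking idea can be sketched in a few lines. The "generators" below are toy fixed linear maps, not trained GANs; the structure to note is that each level conditions on the representation from the level above and injects its own noise vector to model that level's residual variation:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_generator(in_dim, out_dim):
    # toy stand-in for one trained GAN in the stack: a fixed linear map
    W = rng.standard_normal((out_dim, in_dim)) / np.sqrt(in_dim)
    return lambda h, z: np.tanh(W @ np.concatenate([h, z]))

dims = [16, 64, 256]        # top-level code -> intermediate -> bottom output
noise_dim = 8               # each level injects its own noise vector
gens = [make_generator(dims[i] + noise_dim, dims[i + 1]) for i in range(2)]

h = rng.standard_normal(dims[0])              # top-level representation
for g in gens:                                # top-down, level by level
    h = g(h, rng.standard_normal(noise_dim))  # this level's variation
```

Splitting the noise across levels is what lets the stack resolve coarse variation at the top and fine variation near the output, rather than packing everything into one latent vector.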

Several online real estate database companies provide automatic estimates of market value for houses using a proprietary formula. Although these estimates are often close to the actual sale prices, in some cases they are highly inaccurate. One of the key factors that affect the value of a house is its interior and exterior appearance, which these estimates do not take into account. In this paper, we evaluate the impact of the visual characteristics of a house on its market value. Using deep convolutional neural networks on a large dataset of photos of home interiors and exteriors, we develop a method for estimating the luxury level of real estate photos. We also develop a novel framework for automated value assessment that uses these photos in addition to home characteristics, including size, offered price, and number of bedrooms. Finally, by applying our proposed price estimation method to a new dataset of real estate photos and metadata, we show that it outperforms Zillow’s estimates.
Machine Vision and Applications, 2017.
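The combined visual-plus-metadata framework can be sketched as follows. All names here are hypothetical stand-ins (the real luxury score comes from a trained CNN, and the feature vector would feed a regressor trained on sale prices):

```python
import numpy as np

def luxury_score(photo):
    # hypothetical stand-in for the CNN's luxury estimate, in [0, 1]
    return float(np.clip(photo.mean(), 0.0, 1.0))

def value_features(photos, sqft, offered_price, bedrooms):
    """Concatenate a pooled visual luxury score with listing metadata;
    the resulting vector would feed a price regressor."""
    visual = float(np.mean([luxury_score(p) for p in photos]))
    return np.array([visual, sqft, offered_price, bedrooms])

photos = [np.full((8, 8, 3), 0.6), np.full((8, 8, 3), 0.8)]
x = value_features(photos, sqft=1500.0, offered_price=3.2e5, bedrooms=3)
```

Pooling the per-photo scores into a single visual feature is one simple way to let the regressor weigh appearance alongside the usual tabular characteristics.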

Professional Service

Reviewer for CVPR, ICCV, IEEE TPAMI, IEEE Transactions on Multimedia, IEEE Transactions on Industrial Electronics

Works in Progress

  • Object animation using Generative Neural Networks, with Vladimir Kim, Jun Saito, and Eli Shechtman (pending)

  • Generating stock images from street-fashion photos via a stack of Generative Adversarial Neural Networks, with Shuai Zheng, Hadi Kiapour and Robinson Piramuthu (pending)