Accepted Papers

  1. Research Track Papers
  2. Applied Data Science Track Papers

Research Track Papers

A free energy based approach for distance metric learning
Sho Inaba (University of Massachusetts Boston); Carl Fakhry (University of Massachusetts Boston); Rahul Kulkarni (University of Massachusetts Boston); Kourosh Zarringhalam (University of Massachusetts Boston);

A Hierarchical Career-Path-Aware Neural Network for Job Mobility Prediction
Qingxin Meng (Rutgers, the State University of New Jersey); Hengshu Zhu (Baidu); Keli Xiao (Stony Brook University); Le Zhang (School of Computer Science, University of Science and Technology of China); Hui Xiong (Rutgers, the State University of New Jersey);

A Memory-Efficient Sketch Method for Estimating High Similarities in Streaming Sets
Pinghui Wang (Xi'an Jiaotong University); Yiyan Qi (Xi'an Jiaotong University); Yuanming Zhang (Xi'an Jiaotong University); Chenxu Wang (Xi'an Jiaotong University); Qiaozhu Zhai (Xi'an Jiaotong University); Xiaohong Guan (Xi'an Jiaotong University); John C.S. Lui (Tsinghua University);

Estimating set similarity and detecting highly similar sets are fundamental problems in areas such as databases, machine learning, and information retrieval. MinHash is a well-known technique for approximating the Jaccard similarity of sets and has been successfully used in many applications such as similarity search and large-scale learning. Its two compressed versions, b-bit MinHash and Odd Sketch, can significantly reduce the memory usage of the original MinHash method, especially for estimating high similarities (i.e., similarities around 1). Although MinHash can be applied to static sets as well as streaming sets, whose elements arrive in a streaming fashion and whose cardinality is unknown or even infinite, b-bit MinHash and Odd Sketch unfortunately fail to deal with streaming data. To solve this problem, we design a memory-efficient sketch method, MaxLogHash, to accurately estimate Jaccard similarities in streaming sets. Compared to MinHash, our method uses smaller registers (each register consists of fewer than 7 bits) to build a compact sketch for each set. We also provide a simple yet accurate estimator for inferring Jaccard similarity from MaxLogHash sketches. In addition, we derive formulas for bounding the estimation error and determine the smallest necessary memory usage (i.e., the number of registers used for a MaxLogHash sketch) for the desired accuracy. We conduct experiments on a variety of datasets, and experimental results show that our method MaxLogHash is about 5 times more memory efficient than MinHash with the same accuracy and computational cost for estimating high similarities.
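For orientation, the MinHash baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of standard MinHash only, not the paper's MaxLogHash; the function names, the salted BLAKE2b hash family, and the choice of 400 registers are assumptions made for the example. Each register here holds a 64-bit value, which is the memory cost the abstract says MaxLogHash reduces.

```python
import hashlib

def h(i, x):
    """i-th hash function (assumed for illustration): salted 64-bit BLAKE2b digest."""
    salt = i.to_bytes(8, "little")
    digest = hashlib.blake2b(repr(x).encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")

def minhash_sketch(stream, k):
    """Consume a stream once, keeping only the minimum hash value per register."""
    registers = [None] * k
    for x in stream:
        for i in range(k):
            v = h(i, x)
            if registers[i] is None or v < registers[i]:
                registers[i] = v
    return registers

def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of registers that agree, used as the Jaccard similarity estimate."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)

# Two overlapping integer sets whose true Jaccard similarity is 180/220.
a = minhash_sketch(range(0, 200), k=400)
b = minhash_sketch(range(20, 220), k=400)
print(estimate_jaccard(a, b))
```

Because each register is updated element by element, the sketch never needs to know the set's cardinality in advance, which is why plain MinHash works on streams.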

A Minimax Game for Instance based Selective Transfer Learning
Bo Wang (Alibaba); Minghui Qiu (Alibaba); Xisen Wang (USTC); Yaliang Li (Alibaba); Yu Gong (Alibaba); Xiaoyi Zeng (Alibaba); Jun Huang (Alibaba); Bo Zheng (Alibaba); Deng Cai (Zhejiang University); Jingren Zhou (Alibaba);

A Multiscale Scan Statistic for Adaptive Submatrix Localization
Yuchao Liu (Microsoft); Ery Arias-Castro (University of California San Diego);

We consider the problem of localizing a submatrix with larger-than-usual entry values inside a data matrix, without the prior knowledge of the submatrix size. We establish an optimization framework based on a multiscale scan statistic, and develop algorithms in order to approach the optimizer. We also show that our estimator only requires a signal strength of the same order as the minimax estimator with oracle knowledge of the submatrix size, to exactly recover the anomaly with high probability. We perform some simulations that show that our estimator has superior performance compared to other estimators which do not require prior submatrix knowledge, while being comparatively faster to compute.

A permutation approach to assess confounding in machine learning applications for digital health
Elias Chaibub Neto (Sage Bionetworks); Abhishek Pratap (Sage Bionetworks); Thanneer Perumal (Sage Bionetworks); Meghasyam Tummalacherla (Sage Bionetworks); Brian Bot (Sage Bionetworks); Lara Mangravite (Sage Bionetworks); Larsson Omberg (Sage Bionetworks);

A Representation Learning Framework for Property Graphs
Yifan Hou (The Chinese University of Hong Kong); Hongzhi Chen (The Chinese University of Hong Kong); Changji Li (The Chinese University of Hong Kong); James Cheng (The Chinese University of Hong Kong); Ming-Chang Yang (The Chinese University of Hong Kong);

Adaptive Deep Models for Incremental Learning: Considering Capacity Scalability and Sustainability
Yang Yang (Nanjing University); Da-Wei Zhou (Nanjing University); De-Chuan Zhan (Nanjing University); Hui Xiong (Rutgers University); Yuan Jiang (Nanjing University);

Adaptive Graph Guided Disambiguation for Partial Label Learning
Dengbao Wang (Southwest University); Li Li (Southwest University); Min-Ling Zhang (Southeast University);

Adaptive Unsupervised Feature Selection on Attributed Networks
Jundong Li (Arizona State University); Ruocheng Guo (Arizona State University); Chenghao Liu (Singapore Management University); Huan Liu (Arizona State University);

Adaptive-Halting Policy Network for Early Classification
Thomas Hartvigsen (Worcester Polytechnic Institute); Cansu Sen (Worcester Polytechnic Institute); Xiangnan Kong (Worcester Polytechnic Institute); Elke Rundensteiner (Worcester Polytechnic Institute);

Adversarial Learning on Heterogeneous Information Networks
Binbin Hu (Beijing University of Posts and Telecommunications && AI Department, Ant Financial Services Group); Yuan Fang (Singapore Management University); Chuan Shi (Beijing University of Posts and Telecommunications);

Adversarial Substructured Representation Learning for Mobile User Profiling
Pengyang Wang (Missouri University of Science and Technology); Yanjie Fu (Missouri University of Science and Technology); Xiaolin Li (Nanjing University); Hui Xiong (Rutgers University);

Adversarial Variational Embedding for Robust Semi-supervised Learning
Xiang Zhang (The University of New South Wales); Lina Yao (The University of New South Wales); Feng Yuan (The University of New South Wales);

Adversarially Robust Submodular Maximization under Knapsack Constraints
Dmitrii Avdiukhin (Indiana University Bloomington); Slobodan Mitrovic (MIT); Grigory Yaroslavtsev (Indiana University Bloomington); Samson Zhou (Indiana University Bloomington);

We propose the first adversarially robust algorithm for monotone submodular maximization under single and multiple knapsack constraints with scalable implementations in distributed and streaming settings. For a single knapsack constraint, our algorithm outputs a robust summary of almost optimal (up to polylogarithmic factors) size, from which a constant-factor approximation to the optimal solution can be constructed. For multiple knapsack constraints, our approximation is within a constant-factor of the best known non-robust solution. We evaluate the performance of our algorithms by comparison to natural robustifications of existing non-robust algorithms under two objectives: 1) dominating set for large social network graphs from Facebook and Twitter collected by the Stanford Network Analysis Project (SNAP), 2) movie recommendations on a dataset from MovieLens. Experimental results show that our algorithms give the best objective for a majority of the inputs and show strong performance even compared to offline algorithms that are given the set of removals in advance.

A Visual Dialog Augmented Interactive Recommender System
Tong Yu (Samsung Electronics); Yilin Shen (Samsung Electronics); Hongxia Jin (Samsung Electronics);

Assessing The Factual Accuracy of Generated Text
Vinay Rao (Google); Ben Goodrich (Google); Peter Liu (Google); Mohammad Saleh (Google);

We propose a model-based metric to estimate the factual accuracy of generated text that is complementary to typical scoring schemes like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (Bilingual Evaluation Understudy). We introduce and release a new large-scale dataset based on Wikipedia and Wikidata to train relation classifiers and end-to-end fact extraction models. The end-to-end models are shown to be able to extract complete sets of facts from datasets with full pages of text. We then analyse multiple models that estimate factual accuracy on a Wikipedia text summarization task, and show their efficacy compared to ROUGE and other model-free variants by conducting a human evaluation study.

AtSNE: Efficient and Robust Visualization on GPU through Hierarchical Optimization
Cong Fu (Zhejiang University); Yonghui Zhang (Zhejiang University); Deng Cai (Zhejiang University); Xiang Ren (University of Southern California);

Attribute-Driven Backbone Discovery
Hanchao Ma (Washington State University); Sheng Guan (Washington State University); Yinghui Wu (Washington State University);

Auditing Data Provenance in Text-Generation Models
Congzheng Song (Cornell University); Vitaly Shmatikov (Cornell University);

To help enforce data-protection regulations such as GDPR and detect unauthorized uses of personal data, we develop a new model auditing technique that helps users check if their data was used to train a machine learning model. We focus on auditing deep-learning models that generate natural-language text, including word prediction and dialog generation. These models are at the core of popular online services and are often trained on personal data such as users' messages, searches, chats, and comments. We design and evaluate a black-box auditing method that can detect, with very few queries to a model, if a particular user's texts were used to train it (among thousands of other users). We empirically show that our method can successfully audit well-generalized models that are not overfitted to the training data. We also analyze how text-generation models memorize word sequences and explain why this memorization makes them amenable to auditing.

Automating Feature Subspace Exploration via Multi-Agent Reinforcement Learning
Kunpeng Liu (Missouri University of Science and Technology); Yanjie Fu (Missouri University of Science and Technology); Pengfei Wang (CNIC, Chinese Academy of Sciences); Le Wu (Hefei University of Technology); Rui Bo (Missouri University of Science and Technology); Xiaolin Li (Nanjing University);

AutoNRL: Hyperparameter Optimization for Massive Network Representation Learning
Ke Tu (Tsinghua University); Jianxin Ma (Tsinghua University); Peng Cui (Tsinghua University); Jian Pei (JD.com); Wenwu Zhu (Tsinghua University);

Axiomatic Interpretability for Multiclass Additive Models
Xuezhou Zhang (University of Wisconsin-Madison); Sarah Tan (Cornell University); Paul Koch (Microsoft); Yin Lou (Ant Financial); Urszula Chajewska (Microsoft); Rich Caruana (Microsoft);

Generalized additive models (GAMs) are favored in many regression and binary classification problems because they are able to fit complex, nonlinear functions while still remaining interpretable. In the first part of this paper, we generalize a state-of-the-art GAM learning algorithm based on boosted trees to the multiclass setting, and show that this multiclass algorithm outperforms existing GAM learning algorithms and sometimes matches the performance of full complexity models such as gradient boosted trees. In the second part, we turn our attention to the interpretability of GAMs in the multiclass setting. Surprisingly, the natural interpretability of GAMs breaks down when there are more than two classes. Naive interpretation of multiclass GAMs can lead to false conclusions. Inspired by binary GAMs, we identify two axioms that any additive model must satisfy in order to not be visually misleading. We then develop a technique called Additive Post-Processing for Interpretability (API), that provably transforms a pre-trained additive model to satisfy the interpretability axioms without sacrificing accuracy. The technique works not just on models trained with our learning algorithm, but on any multiclass additive model, including multiclass linear and logistic regression. We demonstrate the effectiveness of API on a 12-class infant mortality dataset.

Beyond Personalization: Social Content Recommendation for Creator Equality and Consumer Satisfaction
Wenyi Xiao (The Hong Kong University of Science and Technology); Huan Zhao (The Hong Kong University of Science and Technology); Haojie Pan (The Hong Kong University of Science and Technology); Yangqiu Song (The Hong Kong University of Science and Technology); Vincent W. Zheng (WeBank, China); Qiang Yang (The Hong Kong University of Science and Technology);

Effective content recommendation on modern social media platforms should benefit both creators, by bringing them genuine rewards, and consumers, by helping them find genuinely interesting content. In this paper, we propose a model called Social Explorative Attention Network (SEAN) for content recommendation. SEAN uses a personalized content recommendation model to encourage recommendations driven by personal interests. Moreover, SEAN allows the personalization factors to attend to users' higher-order friends on the social network to improve the accuracy and diversity of recommendation results. Constructing two datasets from a popular decentralized content distribution platform, Steemit, we compare SEAN with state-of-the-art CF and content-based recommendation approaches. Experimental results demonstrate the effectiveness of SEAN in terms of both Gini coefficients for recommendation equality and F1 scores for recommendation performance.

Certifiable Robustness and Robust Training for Graph Convolutional Networks
Daniel Zügner (Technical University of Munich); Stephan Günnemann (Technical University of Munich);

Recent works show that Graph Neural Networks (GNNs) are highly non-robust with respect to adversarial attacks on both the graph structure and the node attributes, making their outcomes unreliable. We propose the first method for certifiable (non-)robustness of graph convolutional networks with respect to perturbations of the node attributes. We consider the case of binary node attributes (e.g. bag-of-words) and perturbations that are L_0-bounded. If a node has been certified with our method, it is guaranteed to be robust under any possible perturbation given the attack model. Likewise, we can certify non-robustness. Finally, we propose a robust semi-supervised training procedure that treats the labeled and unlabeled nodes jointly. As shown in our experimental evaluation, our method significantly improves the robustness of the GNN with only minimal effect on the predictive accuracy.

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks
Wei-Lin Chiang (National Taiwan University); Xuanqing Liu (University of California, Los Angeles); Si Si (Google); Yang Li (Google); Samy Bengio (Google); Cho-Jui Hsieh (University of California, Los Angeles);

Clustering without Over-Representation
Sara Ahmadian (Google); Alessandro Epasto (Google); Ravi Kumar (Google); Mohammad Mahdian (Google);

Co-Prediction of Multiple Transportation Demands Based on Deep Spatio-Temporal Neural Network
Junchen Ye (SKLSDE Lab and BDBC Beihang University); Leilei Sun (SKLSDE Lab and BDBC Beihang University); Bowen Du (SKLSDE Lab and BDBC Beihang University); Yanjie Fu (Missouri University of Science and Technology); Xinran Tong (SKLSDE Lab and BDBC Beihang University); Hui Xiong (Rutgers Business School Rutgers University);

Conditional Random Field Enhanced Graph Convolutional Neural Networks
Hongchang Gao (University of Pittsburgh); Jian Pei (Simon Fraser University); Heng Huang (University of Pittsburgh);

Contextual Fact Ranking and Its Applications in Table Synthesis and Compression
Silu Huang (University of Illinois Urbana-Champaign); Jialu Liu (Google); Flip Korn (Google); Xuezhi Wang (Google); You Wu (Google); Dale Markowitz (Google); Cong Yu (Google);

Contrastive antichains in hierarchies
Anes Bendimerad (LIRIS); Jefrey Lijffijt (Dept. of Electronics and Information Systems, IDLab, Ghent University); Marc Plantevit (LIRIS); Celine Robardet (INSA Lyon); Tijl De Bie (Dept. of Electronics and Information Systems, IDLab, Ghent University);

Coresets for Minimum Enclosing Balls over Sliding Windows
Yanhao Wang (National University of Singapore); Yuchen Li (Singapore Management University); Kian-Lee Tan (National University of Singapore);

Coresets are important tools to generate concise summaries of massive datasets for approximate analysis. A coreset is a small subset of points extracted from the original point set such that certain geometric properties are preserved with provable guarantees. This paper investigates the problem of maintaining a coreset to preserve the minimum enclosing ball (MEB) for a sliding window of points that are continuously updated in a data stream. Although the problem has been extensively studied in batch and append-only streaming settings, no efficient sliding-window solution is available yet. In this work, we first introduce an algorithm, called AOMEB, to build a coreset for MEB in an append-only stream. AOMEB improves the practical performance of the state-of-the-art algorithm while having the same approximation ratio. Furthermore, using AOMEB as a building block, we propose two novel algorithms, namely SWMEB and SWMEB+, to maintain coresets for MEB over the sliding window with constant approximation ratios. The proposed algorithms also support coresets for MEB in a reproducing kernel Hilbert space (RKHS). Finally, extensive experiments on real-world and synthetic datasets demonstrate that SWMEB and SWMEB+ achieve speedups of up to four orders of magnitude over the state-of-the-art batch algorithm while providing coresets for MEB with rather small errors compared to the optimal ones.
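As background on the batch setting the abstract mentions, the sketch below approximates a minimum enclosing ball with the simple farthest-point iteration of Bădoiu and Clarkson. It is not AOMEB, SWMEB, or SWMEB+; the function name, the iteration count, and the toy point set are assumptions for illustration. The indices of the farthest points visited give a rough feel for how a small subset can determine the ball.

```python
import math

def approx_meb(points, iterations=1000):
    """Approximate the minimum enclosing ball by repeatedly stepping toward the farthest point."""
    center = list(points[0])
    touched = {0}  # indices of points that were ever selected as farthest
    for i in range(1, iterations + 1):
        j = max(range(len(points)), key=lambda idx: math.dist(center, points[idx]))
        touched.add(j)
        far = points[j]
        # Move the center a shrinking fraction 1/(i+1) of the way to the farthest point.
        center = [c + (f - c) / (i + 1) for c, f in zip(center, far)]
    radius = max(math.dist(center, p) for p in points)
    return center, radius, sorted(touched)

# Corners of a square: the exact ball is centered at the origin with radius sqrt(2).
pts = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
center, radius, touched = approx_meb(pts)
print(center, radius, touched)
```

The step size 1/(i+1) shrinks over time, so the center settles near the true center; more iterations tighten the approximation at the cost of more passes over the points.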

CoSTCo: A Nonlinear Sparse Tensor Completion Model
Hanpeng Liu (University of Southern California); Yaguang Li (University of Southern California); Michael Tsang (University of Southern California); Yan Liu (University of Southern California);

Coupled Variational Recurrent Collaborative Filtering
Qingquan Song (Texas A&M University); Shiyu Chang (IBM); Xia Hu (Texas A&M University);

We focus on the problem of streaming recommender systems and explore novel collaborative filtering algorithms to handle data dynamicity and complexity in a streaming manner. Although deep neural networks have proven effective in recommendation tasks, there is a lack of exploration on integrating probabilistic models and deep architectures under streaming recommendation settings. Combining the complementary advantages of probabilistic models and deep neural networks could enhance both model effectiveness and the understanding of inference uncertainties. To bridge the gap, in this paper, we propose a Coupled Variational Recurrent Collaborative Filtering (CVRCF) framework based on the idea of Deep Bayesian Learning to handle the streaming recommendation problem. The framework jointly combines stochastic processes and deep factorization models under a Bayesian paradigm to model the generation and evolution of users' preferences and items' popularities. To ensure efficient optimization and streaming updates, we further propose a sequential variational inference algorithm based on a cross variational recurrent neural network structure. Experimental results on three benchmark datasets demonstrate that the proposed framework performs favorably against the state-of-the-art methods in terms of both temporal dependency modeling and predictive accuracy. The learned latent variables also provide visualized interpretations for the evolution of temporal dynamics.

DAML: Dual Attention Mutual Learning between Ratings and Reviews for Item Recommendation
Donghua Liu (School of Computer Science Wuhan University); Jing Li (School of Computer Science Wuhan University); Bo Du (School of Computer Science Wuhan University); Jun Chang (School of Computer Science Wuhan University); Rong Gao (School of Computer Science Hubei University of Technology);

Deep Anomaly Detection with Deviation Networks
Guansong Pang (The University of Adelaide); Chunhua Shen (The University of Adelaide); Anton van den Hengel (The University of Adelaide);

Deep Landscape Forecasting for Real-time Bidding Advertising
Kan Ren (Shanghai Jiao Tong University); Jiarui Qin (Shanghai Jiao Tong University); Lei Zheng (Shanghai Jiao Tong University); Zhengyu Yang (Shanghai Jiao Tong University); Weinan Zhang (Shanghai Jiao Tong University); Yong Yu (Shanghai Jiao Tong University);

The emergence of real-time auctions in online advertising has drawn great attention to modeling the market competition, i.e., bid landscape forecasting. The problem is formulated as forecasting the probability distribution of the market price for each ad auction. Considering the censorship issue caused by the second-price auction mechanism, many researchers have devoted their efforts to bid landscape forecasting by incorporating survival analysis from the medical research field. However, most existing solutions focus either on counting-based statistics of the segmented sample clusters or on learning a parameterized model based on some heuristic assumptions about distribution forms. Moreover, they do not consider the sequential patterns of the feature over the price space. In order to capture more sophisticated yet flexible patterns at a fine-grained level of the data, we propose a Deep Landscape Forecasting (DLF) model which combines deep learning for probability distribution forecasting and survival analysis for censorship handling. Specifically, we utilize a recurrent neural network to flexibly model the conditional winning probability w.r.t. each bid price. Then we conduct the bid landscape forecasting through the probability chain rule with strict mathematical derivations. In an end-to-end manner, we optimize the model by minimizing two negative likelihood losses with comprehensive motivations. Without any specific assumption about the distribution form of the bid landscape, our model shows great advantages over previous works in fitting various sophisticated market price distributions. In the experiments over two large-scale real-world datasets, our model significantly outperforms the state-of-the-art solutions under various metrics.
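The chain-rule step mentioned in the abstract can be illustrated with a small discrete-time survival computation. This is only a sketch under stated assumptions: prices are bucketed into discrete levels, the conditional probabilities are hypothetical hand-picked numbers rather than outputs of a recurrent network, and the function name `bid_landscape` is invented for the example.

```python
def bid_landscape(hazards):
    """Turn conditional probabilities into a price distribution via the chain rule.

    hazards[b] is assumed to be P(market price == b | market price >= b)
    for discrete price levels b = 0, 1, ..., len(hazards) - 1.
    """
    pdf, win = [], []
    survive = 1.0  # P(market price >= current level)
    for hz in hazards:
        pdf.append(survive * hz)   # P(market price == b)
        survive *= 1.0 - hz        # P(market price > b)
        win.append(1.0 - survive)  # P(market price <= b)
    return pdf, win, survive

# Hypothetical conditional probabilities for three price levels.
pdf, win, tail = bid_landscape([0.1, 0.2, 0.5])
print(pdf, win, tail)
```

The probability mass at each level plus the remaining tail mass sums to one, so no parametric form for the price distribution has to be assumed.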

Deep Learning Alternating Direction Method of Multipliers
Junxiang Wang (George Mason University); Fuxun Yu (George Mason University); Xiang Chen (George Mason University); Liang Zhao (George Mason University);

Deep Mixture Point Processes: Spatio-temporal Event Prediction with Rich Contextual Information
Maya Okawa (NTT); Tomoharu Iwata (NTT); Takeshi Kurashima (NTT); Yusuke Tanaka (NTT); Hiroyuki Toda (NTT); Naonori Ueda (NTT);

Predicting when and where events will occur in cities, like taxi pick-ups, crimes, and vehicle collisions, is a challenging and important problem with many applications in fields such as urban planning, transportation optimization and location-based marketing. Though many point processes have been proposed to model events in a continuous spatio-temporal space, none of them allow for the consideration of the rich contextual factors that affect event occurrence, such as weather, social activities, geographical characteristics, and traffic. In this paper, we propose DMPP (Deep Mixture Point Processes), a point process model for predicting spatio-temporal events with the use of rich contextual information; a key advance is its incorporation of the heterogeneous and high-dimensional context available in image and text data. Specifically, we design the intensity of our point process model as a mixture of kernels, where the mixture weights are modeled by a deep neural network. This formulation allows us to automatically learn the complex nonlinear effects of the contextual factors on event occurrence. At the same time, this formulation makes analytical integration over the intensity, which is required for point process estimation, tractable. We use real-world data sets from different domains to demonstrate that DMPP has better predictive performance than existing methods.

DeepGBM: A Deep Learning Framework Distilled by GBDT for Online Prediction Tasks
Guolin Ke (Microsoft); Zhenhui Xu (Peking University); Jia Zhang (Microsoft); Jiang Bian (Microsoft); Tie-Yan Liu (Microsoft);

dEFEND: Explainable Fake News Detection
Kai Shu (Arizona State University); Limeng Cui (The Pennsylvania State University); Suhang Wang (The Pennsylvania State University); Dongwon Lee (Penn State University); Huan Liu (Arizona State University);

DEMO-Net: Degree-specific Graph Neural Networks for Node and Graph Classification
Jun Wu (Arizona State University); Jingrui He (Arizona State University); Jiejun Xu (HRL Laboratories, LLC);

Graph data widely exist in many high-impact applications. Inspired by the success of deep learning in grid-structured data, graph neural network models have been proposed to learn powerful node-level or graph-level representation. However, most of the existing graph neural networks suffer from the following limitations: (1) there is limited analysis regarding the graph convolution properties, such as seed-oriented, degree-aware and order-free; (2) the node's degree-specific graph structure is not explicitly expressed in graph convolution for distinguishing structure-aware node neighborhoods; (3) the theoretical explanation regarding the graph-level pooling schemes is unclear. To address these problems, we propose a generic degree-specific graph neural network named DEMO-Net motivated by Weisfeiler-Lehman graph isomorphism test that recursively identifies 1-hop neighborhood structures. In order to explicitly capture the graph topology integrated with node attributes, we argue that graph convolution should have three properties: seed-oriented, degree-aware, order-free. To this end, we propose multi-task graph convolution where each task represents node representation learning for nodes with a specific degree value, thus leading to preserving the degree-specific graph structure. In particular, we design two multi-task learning methods: degree-specific weight and hashing functions for graph convolution. In addition, we propose a novel graph-level pooling/readout scheme for learning graph representation provably lying in a degree-specific Hilbert kernel space. The experimental results on several node and graph classification benchmark data sets demonstrate the effectiveness and efficiency of our proposed DEMO-Net over state-of-the-art graph neural network models.

Disambiguation Enabled Linear Discriminant Analysis for Partial Label Dimensionality Reduction
Jing-Han Wu (Southeast University); Min-Ling Zhang (Southeast University);

Discovering Unexpected Local Nonlinear Interactions in Scientific Black-box Models
Michael Doron (Hebrew University of Jerusalem); Idan Segev (Hebrew University of Jerusalem); Dafna Shahaf (Hebrew University of Jerusalem);

Dual Averaging Method for Online Graph-structured Sparsity
Baojian Zhou (University at Albany, SUNY); Feng Chen (University at Albany, SUNY); Yiming Ying (University at Albany, SUNY);

Online learning algorithms update models via one sample per iteration, and are thus efficient at processing large-scale datasets and useful for detecting malicious events for social benefit, such as disease outbreaks and traffic congestion, on the fly. However, existing algorithms for graph-structured models focus on the offline setting and the least-squares loss and are not suited to the online setting, while methods designed for the online setting cannot be directly applied to the problem of complex (usually non-convex) graph-structured sparsity models. To address these limitations, in this paper we propose a new algorithm for graph-structured sparsity constraint problems under the online setting, which we call GraphDA. The key part of GraphDA is to project both the averaging gradient (in dual space) and primal variables (in primal space) onto lower-dimensional subspaces, thus capturing the graph-structured sparsity effectively. Furthermore, the objective functions assumed here are generally convex so as to handle different losses for online learning settings. To the best of our knowledge, GraphDA is the first online learning algorithm for graph-structure constrained optimization problems. To validate our method, we conduct extensive experiments on both benchmark graph and real-world graph datasets. Our experimental results show that, compared to other baseline methods, GraphDA not only improves classification performance but also captures graph-structured features more effectively, and hence offers stronger interpretability.

Dual Sequential Prediction Models Linking Sequential Recommendation and Information Dissemination
Qitian Wu (Shanghai Jiao Tong University); Yirui Gao (Shanghai Jiao Tong University); Xiaofeng Gao (Shanghai Jiao Tong University); Paul Weng (Shanghai Jiao Tong University); Guihai Chen (Shanghai Jiao Tong University);

Dynamic Modeling and Forecasting of Time-evolving Data Streams
Yasuko Matsubara (Kumamoto University); Yasushi Sakurai (Kumamoto University);

Dynamical Origins of Distribution Functions
Chengxi Zang (Tsinghua University); Peng Cui (Tsinghua University); Wenwu Zhu (Tsinghua University); Fei Wang (Cornell University);

Edit Similarity Joins Using Local Hash Minima
Haoyu Zhang (Indiana University Bloomington); Qin Zhang (Indiana University Bloomington);

EdMot: An Edge Enhancement Approach for Motif-aware Community Detection
Peizhen Li (School of Data and Computer Science, Sun Yat-sen University, Guangzhou, P. R. China); Ling Huang (Sun Yat-sen University); Chang-Dong Wang (Sun Yat-sen University); Jianhuang Lai (Sun Yat-sen University);

Effective and Efficient Reuse of Past Travel Behavior for Route Recommendation
Lisi Chen (Inception Institute of Artificial Intelligence); Shuo Shang (Inception Institute of Artificial Intelligence); Christian S. Jensen (Aalborg University); Bin Yao (Shanghai Jiao Tong University); Zhiwei Zhang (Hong Kong Baptist University); Ling Shao (Inception Institute of Artificial Intelligence);

Effective and Efficient Sports Play Retrieval with Deep Representation Learning
Zheng Wang (Nanyang Technological University); Cheng Long (Nanyang Technological University); Gao Cong (Nanyang Technological University); Ce Ju (IDG Baidu Inc.);

Efficient and Effective Express via Contextual Cooperative Reinforcement Learning
Yexin Li (The Hong Kong University of Science and Technology); Yu Zheng (Urban Computing Business Unit, JD Finance); Qiang Yang (The Hong Kong University of Science and Technology);

Efficient Global String Kernel with Random Features: Beyond Counting Substructures
Lingfei Wu (IBM); Ian En-Hsu Yen (Carnegie Mellon University); Siyu Huo (IBM); Liang Zhao (George Mason University); Kun Xu (Peking University); Liang Ma (IBM); Shouling Ji (Zhejiang University); Charu Aggarwal (IBM);

Efficient Maximum Clique Computation over Large Sparse Graphs
Lijun Chang (The University of Sydney);

Empirical Entropy Approximation via Subsampling: Theory and Application
Chi Wang (Microsoft); Bailu Ding (Microsoft);

Empowering A* Search Algorithms with Neural Networks for Personalized Route Recommendation
Jingyuan Wang (Beihang University); Ning Wu (Beihang University); Xin Zhao (Renmin University of China); Fanzhang Peng (Beihang University); Xin Lin (Beihang University);

End-to-end Modeling of High-order Relations in Knowledge Graph for Recommendation
Xiang Wang (National University of Singapore); Xiangnan He (University of Science and Technology of China); Yixin Cao (National University of Singapore); Meng Liu (Shandong University); Tat-Seng Chua (National University of Singapore);

Enhancing Collaborative Filtering with Generative Augmentation
Qinyong Wang (The University of Queensland); Hongzhi Yin (The University of Queensland); Hao Wang (The University of Tokyo); Quoc Viet Hung Nguyen (Griffith University); Zi Huang (The University of Queensland); Lizhen Cui (Shandong University);

Enhancing Domain Word Embedding via Latent Semantic Imputation link
Shibo Yao (New Jersey Institute of Technology); Dantong Yu (New Jersey Institute of Technology); Keli Xiao (Stony Brook University);

We present a novel method named Latent Semantic Imputation (LSI) to transfer external knowledge into semantic space for enhancing word embedding. The method integrates graph theory to extract the latent manifold structure of the entities in the affinity space and leverages non-negative least squares with standard simplex constraints and power iteration method to derive spectral embeddings. It provides an effective and efficient approach to combining entity representations defined in different Euclidean spaces. Specifically, our approach generates and imputes reliable embedding vectors for low-frequency words in the semantic space and benefits downstream language tasks that depend on word embedding. We conduct comprehensive experiments on a carefully designed classification problem and language modeling and demonstrate the superiority of the enhanced embedding via LSI over several well-known benchmark embeddings. We also confirm the consistency of the results under different parameter settings of our method.

Environment Reconstruction with Hidden Confounders for Reinforcement Learning based Recommendation
Wenjie Shang (Nanjing University); Yang Yu (Nanjing University); Qingyang Li (AI Labs, Didi Chuxing); Zhiwei Qin (AI Labs, Didi Chuxing); Yiping Meng (AI Labs, Didi Chuxing); Jieping Ye (AI Labs, Didi Chuxing);

EpiDeep: Exploiting Embeddings for Epidemic Forecasting
Bijaya Adhikari (Virginia Tech); Xinfeng Xu (Virginia Tech); Naren Ramakrishnan (Virginia Tech); B. Aditya Prakash (Virginia Tech);

Estimating Graphlet Statistics via Lifting link
Dmitry Shemetov (University of California, Davis); James Sharpnack (University of California, Davis); Kirill Paramonov (University of California, Davis);

Exploratory analysis over network data is often limited by our ability to efficiently calculate graph statistics, which can provide a model-free understanding of macroscopic properties of a network. This work introduces a framework for estimating the graphlet count - the number of occurrences of a small subgraph motif (e.g. a wedge or a triangle) in the network. For massive graphs, where accessing the whole graph is not possible, the only viable algorithms are those which act locally by making a limited number of vertex neighborhood queries. We introduce a Monte Carlo sampling technique for graphlet counts, called lifting, which can simultaneously sample all graphlets of size up to $k$ vertices. We outline three variants of lifted graphlet counts: the ordered, unordered, and shotgun estimators. We prove that our graphlet count updates are unbiased for the true graphlet count, have low correlation between samples, and have a controlled variance. We compare the experimental performance of lifted graphlet counts to the state-of-the-art graphlet sampling procedures: Waddling and the pairwise subgraph random walk.

Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks link
Namyong Park (Carnegie Mellon University); Andrey Kan (Amazon); Xin Luna Dong (Amazon); Tong Zhao (Amazon); Christos Faloutsos (Carnegie Mellon University and Amazon);

How can we estimate the importance of nodes in a knowledge graph (KG)? A KG is a multi-relational graph that has proven valuable for many tasks including question answering and semantic search. In this paper, we present GENI, a method for tackling the problem of estimating node importance in KGs, which enables several downstream applications such as item recommendation and resource allocation. While a number of approaches have been developed to address this problem for general graphs, they do not fully utilize the information available in KGs, or lack the flexibility needed to model complex relationships between entities and their importance. To address these limitations, we explore supervised machine learning algorithms. In particular, building upon recent advances in graph neural networks (GNNs), we develop GENI, a GNN-based method designed to deal with the distinctive challenges involved in predicting node importance in KGs. Our method aggregates importance scores, instead of node embeddings, via a predicate-aware attention mechanism and flexible centrality adjustment. In our evaluation of GENI and existing methods on predicting node importance in real-world KGs with different characteristics, GENI achieves 5-17% higher NDCG@100 than the state of the art.
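
The NDCG@100 figure quoted above is a ranking metric. As a minimal sketch of how NDCG@k can be computed for predicted node-importance scores, the Python below uses a linear gain and a log2 rank discount; the abstract does not say which gain variant the paper uses, so that choice, the function names, and the toy data are all assumptions made for illustration.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k values (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(true_importance, predicted_scores, k):
    """DCG of the predicted ordering divided by the DCG of the ideal ordering."""
    # Sort node indices by descending predicted score.
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_importance[i] for i in order]
    ideal = sorted(true_importance, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy example (hypothetical values): four nodes with known importance.
true_importance = [3.0, 1.0, 2.0, 0.0]
perfect = ndcg_at_k(true_importance, [0.9, 0.2, 0.5, 0.1], k=3)
swapped = ndcg_at_k(true_importance, [0.2, 0.9, 0.5, 0.1], k=3)
```

A prediction that orders the nodes exactly as the true importance does scores 1; putting a less important node first lowers the score.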

Estimating Personalized Preferences Through Meta-Learning for User Cold-Start Recommendation
Hoyeop Lee (NCSOFT Co.); Jinbae Im (NCSOFT Co.); Seongwon Jang (NCSOFT Co.); Hyunsouk Cho (NCSOFT Co.); Sehee Chung (NCSOFT Co.);

ET-Lasso: A New Efficient Tuning of Lasso-type Regularization for High-Dimensional Data
Songshan Yang (The Pennsylvania State University); Jiawei Wen (The Pennsylvania State University); Xiang Zhan (The Pennsylvania State University); Daniel Kifer (The Pennsylvania State University);

Exact-K Recommendation via Maximal Clique Optimization link
Yu Gong (Alibaba Group); Yu Zhu (Alibaba Group); Lu Duan (Zhejiang Cainiao Supply Chain Management Co., Ltd); Qingwen Liu (Alibaba Group); Ziyu Guan (Xidian University); Fei Sun (Alibaba); Wenwu Ou (Alibaba Group); Kenny Zhu (Shanghai Jiao Tong University);

This paper targets a novel but practical recommendation problem named exact-K recommendation. It differs from traditional top-K recommendation in that it focuses on (constrained) combinatorial optimization, which recommends a whole set of K items called a card, rather than on ranking optimization, which assumes that "better" items should be put into top positions. We take the first step by giving a formal problem definition and reducing it to Maximum Clique Optimization on a graph. To tackle this NP-hard combinatorial optimization problem, we propose Graph Attention Networks (GAttN) with a multi-head self-attention encoder and a decoder with an attention mechanism. The model learns the joint distribution of the K items end-to-end and generates an optimal card rather than ranking individual items by prediction scores. We then propose Reinforcement Learning from Demonstrations (RLfD), which combines the advantages of behavior cloning and reinforcement learning, making training both sufficient and efficient. Extensive experiments on three datasets demonstrate the effectiveness of the proposed GAttN with RLfD: it outperforms several strong baselines with average relative improvements of 7.7% in Precision and 4.7% in Hit Ratio, and achieves state-of-the-art (SOTA) performance on the exact-K recommendation problem.

Exploiting Cognitive Structure for Adaptive Learning link
Qi Liu (University of Science and Technology of China); Shiwei Tong (University of Science and Technology of China); Chuanren Liu (Drexel University); Hongke Zhao (The College of Management and Economics, Tianjin University); Enhong Chen (University of Science and Technology of China); Haiping Ma (IFLYTEK CO., LTD.); Shijin Wang (IFLYTEK CO., LTD.);

Adaptive learning, also known as adaptive teaching, relies on learning path recommendation, which sequentially recommends personalized learning items (e.g., lectures, exercises) to satisfy the unique needs of each learner. Although it is well known that modeling the cognitive structure, including the knowledge levels of learners and the knowledge structure (e.g., the prerequisite relations) of learning items, is important for learning path recommendation, existing methods for adaptive learning often focus separately on either the knowledge levels of learners or the knowledge structure of learning items. To fully exploit the multifaceted cognitive structure for learning path recommendation, we propose a Cognitive Structure Enhanced framework for Adaptive Learning, named CSEAL. By viewing path recommendation as a Markov Decision Process and applying an actor-critic algorithm, CSEAL can sequentially identify the right learning items for different learners. Specifically, we first utilize a recurrent neural network to trace the evolving knowledge levels of learners at each learning step. Then, we design a navigation algorithm on the knowledge structure to ensure the logicality of learning paths, which reduces the search space in the decision process. Finally, the actor-critic algorithm, whose parameters are dynamically updated along the learning path, is used to determine what to learn next. Extensive experiments on real-world data demonstrate the effectiveness and robustness of CSEAL.

Factorization Bandits for Online Influence Maximization link
Qingyun Wu (University of Virginia); Zhige Li (Shanghai Jiao Tong University); Huazheng Wang (University of Virginia); Wei Chen (Microsoft); Hongning Wang (University of Virginia);

We study the problem of online influence maximization in social networks. In this problem, a learner aims to identify the set of "best influencers" in a network by interacting with it, i.e., repeatedly selecting seed nodes and observing activation feedback in the network. We capitalize on an important property of the influence maximization problem named network assortativity, which is ignored by most existing works in online influence maximization. To realize network assortativity, we factorize the activation probability on each edge into latent factors on the corresponding nodes: an influence factor on the giving node and a susceptibility factor on the receiving node. We propose an upper confidence bound based online learning solution to estimate the latent factors, and therefore the activation probabilities. Our factorization-based online influence maximization algorithm achieves considerable regret reduction, and extensive empirical evaluations on two real-world networks show the effectiveness of the proposed solution.
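
To make the factorization idea concrete, here is a minimal Python sketch in which the activation probability of a directed edge is the inner product of the giver's influence vector and the receiver's susceptibility vector, clipped to [0, 1]. The inner-product form, the clipping, and all names and values are assumptions for illustration; the abstract only states that edge probabilities are factorized into node-level latent factors.

```python
def activation_probability(influence, susceptibility):
    """Inner product of two latent factor vectors, clipped to the range [0, 1]."""
    score = sum(a * b for a, b in zip(influence, susceptibility))
    return min(1.0, max(0.0, score))


# Hypothetical 2-dimensional latent factors for two nodes, "u" and "v".
influence = {"u": [0.6, 0.2], "v": [0.1, 0.1]}        # used when the node gives
susceptibility = {"u": [0.3, 0.5], "v": [0.5, 0.5]}   # used when the node receives

# The edge probability is asymmetric: u influences v more than v influences u.
p_uv = activation_probability(influence["u"], susceptibility["v"])
p_vu = activation_probability(influence["v"], susceptibility["u"])
```

With d-dimensional factors, a network of n nodes needs 2nd parameters rather than one parameter per edge, so feedback observed on one edge also informs the estimates for other edges that share a node.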

Fast and Accurate Anomaly Detection in Dynamic Graphs with a Two-Pronged Approach
Minji Yoon (Carnegie Mellon University); Bryan Hooi (Carnegie Mellon University); Kijung Shin (Carnegie Mellon University); Christos Faloutsos (Carnegie Mellon University);

Fates of Microscopic Social Ecosystems: Keep Alive or Dead?
Haoyang Li (Tsinghua University); Peng Cui (Tsinghua University); Chengxi Zang (Tsinghua University); Tianyang Zhang (Tsinghua University); Wenwu Zhu (Tsinghua University); Yishi Lin (Tencent);

Fighting Opinion Control in Social Networks via Link Recommendation
Victor Amelkin (University of Pennsylvania); Ambuj Singh (University of California, Santa Barbara);

Figuring out the User in a Few Steps: Bayesian Multifidelity Active Search with Cokriging
Nikita Klyuchnikov (Tsuru Robotics); Davide Mottin (Aarhus University); Georgia Koutrika (Athena Research Center); Emmanuel Müller (University of Bonn); Panagiotis Karras (Aarhus University);

Focused Context Balancing for Robust Offline Policy Evaluation
Hao Zou (Tsinghua University); Kun Kuang (Tsinghua University); Boqi Chen (Boston University); Peng Cui (Tsinghua University); Peixuan Chen (Tencent);

GCN-MF: Disease-Gene Association Identification By Graph Convolutional Networks and Matrix Factorization
Peng Han (King Abdullah University of Science and Technology); Peng Yang (King Abdullah University of Science and Technology); Peilin Zhao (King Abdullah University of Science and Technology); Shuo Shang (Inception Institute of Artificial Intelligence); Yong Liu (Alibaba-NTU Singapore Joint Research Institute, Nanyang Technological University); Jiayu Zhou (Michigan State University); Xin Gao (King Abdullah University of Science and Technology); Panos Kalnis (King Abdullah University of Science and Technology);

Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space
Nicholas Monath (Google); Manzil Zaheer (Google); Daniel Silva (Google); Andrew McCallum (University of Massachusetts Amherst); Amr Ahmed (Google);

Graph Convolutional Networks with EigenPooling link
Yao Ma (Michigan State University); Suhang Wang (The Pennsylvania State University); Charu Aggarwal (IBM); Jiliang Tang (Michigan State University);

Graph neural networks, which generalize deep neural network models to graph structured data, have attracted increasing attention in recent years. They usually learn node representations by transforming, propagating and aggregating node features and have been proven to improve the performance of many graph related tasks such as node classification and link prediction. To apply graph neural networks to the graph classification task, approaches that generate the graph representation from node representations are needed. A common way is to globally combine the node representations. However, rich structural information is overlooked. Thus a hierarchical pooling procedure is desired to preserve the graph structure during graph representation learning. There are some recent works on hierarchically learning graph representations, analogous to the pooling step in conventional convolutional neural networks (CNNs). However, the local structural information is still largely neglected during the pooling process. In this paper, we introduce a pooling operator, EigenPooling, based on the graph Fourier transform, which can utilize the node features and local structures during the pooling process. We then design pooling layers based on the pooling operator, which are further combined with traditional GCN convolutional layers to form a graph neural network framework for graph classification. Theoretical analysis is provided to understand EigenPooling from both local and global perspectives. Experimental results on the graph classification task on 6 commonly used benchmarks demonstrate the effectiveness of the proposed framework.

Graph Recurrent Networks with Attributed Random Walks
Xiao Huang (Texas A&M University); Qingquan Song (Texas A&M University); Yuening Li (Texas A&M University); Xia Hu (Texas A&M University);

Graph Representation Learning via Hard and Channel-Wise Attention Networks
Hongyang Gao (Texas A&M University); Shuiwang Ji (Texas A&M University);

Graph Transformation Policy Network for Chemical Reaction Prediction link
Kien Do (Deakin University); Truyen Tran (Deakin University); Svetha Venkatesh (Deakin University);

We address a fundamental problem in chemistry known as chemical reaction product prediction. Our main insight is that the input reactant and reagent molecules can be jointly represented as a graph, and the process of generating product molecules from reactant molecules can be formulated as a sequence of graph transformations. To this end, we propose Graph Transformation Policy Network (GTPN) -- a novel generic method that combines the strengths of graph neural networks and reinforcement learning to learn the reactions directly from data with minimal chemical knowledge. Compared to previous methods, GTPN has some appealing properties such as: end-to-end learning, and making no assumption about the length or the order of graph transformations. In order to guide model search through the complex discrete space of sets of bond changes effectively, we extend the standard policy gradient loss by adding useful constraints. Evaluation results show that GTPN improves the top-1 accuracy over the current state-of-the-art method by about 3% on the large USPTO dataset. Our model's performances and prediction errors are also analyzed carefully in the paper.

Graph-based Semi-Supervised & Active Learning for Edge Flows link
Junteng Jia (Cornell University); Michael Schaub (Massachusetts Institute of Technology); Santiago Segarra (Rice University); Austin Benson (Cornell University);

We present a graph-based semi-supervised learning (SSL) method for learning edge flows defined on a graph. Specifically, given flow measurements on a subset of edges, we want to predict the flows on the remaining edges. To this end, we develop a computational framework that imposes certain constraints on the overall flows, such as (approximate) flow conservation. These constraints render our approach different from classical graph-based SSL for vertex labels, which posits that tightly connected nodes share similar labels and leverages the graph structure accordingly to extrapolate from a few vertex labels to the unlabeled vertices. We derive bounds for our method's reconstruction error and demonstrate its strong performance on synthetic and real-world flow networks from transportation, physical infrastructure, and the Web. Furthermore, we provide two active learning algorithms for selecting informative edges on which to measure flow, which has applications for optimal sensor deployment. The first strategy selects edges to minimize the reconstruction error bound and works well on flows that are approximately divergence-free. The second approach clusters the graph and selects bottleneck edges that cross cluster-boundaries, which works well on flows with global trends.
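
The flow-conservation constraint can be illustrated with a short Python sketch. It computes the divergence (net outflow) at each node for flows on oriented edges, then fills in one unlabeled edge by minimizing the sum of squared divergences. This is a deliberately simplified single-unknown case with no regularization term; the function names and the triangle example are hypothetical and are not taken from the paper.

```python
def divergence(num_nodes, edges, flows):
    """Net outflow per node; a positive flow on edge (u, v) moves from u to v."""
    div = [0.0] * num_nodes
    for (u, v), f in zip(edges, flows):
        div[u] += f  # flow leaves u
        div[v] -= f  # flow enters v
    return div


def fill_single_edge(num_nodes, edges, flows, missing):
    """Flow on the one unlabeled edge that minimizes the sum of squared divergences."""
    known_edges = [e for i, e in enumerate(edges) if i != missing]
    known_flows = [f for i, f in enumerate(flows) if i != missing]
    c = divergence(num_nodes, known_edges, known_flows)
    a, b = edges[missing]
    # Minimizing (c[a] + x)^2 + (c[b] - x)^2 over x gives x = (c[b] - c[a]) / 2.
    return (c[b] - c[a]) / 2.0


# Triangle 0 -> 1 -> 2 -> 0: a circulating flow conserves flow at every node.
edges = [(0, 1), (1, 2), (2, 0)]
cyclic = divergence(3, edges, [2.0, 2.0, 2.0])

# Flows measured on two edges; the third is inferred from conservation.
inferred = fill_single_edge(3, edges, [2.0, 2.0, None], missing=2)
```

With several unlabeled edges the same objective becomes a linear least-squares problem over all unknown flows.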

HATS: A Hierarchical Sequence-Attention Framework for Inductive Set-of-Sets Embeddings
Changping Meng (Purdue University); Jiasen Yang (Purdue University); Bruno Ribeiro (Purdue University); Jennifer Neville (Purdue University);

HetGNN: Heterogeneous Graph Neural Network
Chuxu Zhang (University of Notre Dame); Dongjin Song (NEC Laboratories America); Chao Huang (University of Notre Dame); Ananthram Swami (US); Nitesh V. Chawla (University of Notre Dame);

Hidden Markov Contour Tree: A Spatial Structured Model for Hydrological Applications
Zhe Jiang (The University of Alabama); Arpan Man Sainju (The University of Alabama);

Hidden POI Ranking with Spatial Crowdsourcing
Yue Cui (University of Electronic Science and Technology of China); Liwei Deng (University of Electronic Science and Technology of China); Yan Zhao (School of Computer Science and Technology, Soochow University); Vicent Zheng (WeBank); Bin Yao (Shanghai Jiao Tong University); Kai Zheng (University of Electronic Science and Technology of China);

Hierarchical Gating Networks for Sequential Recommendation
Chen Ma (McGill University); Peng Kang (Northwestern University); Xue Liu (McGill University);

Hierarchical Multi-Task Word Embedding Learning for Medical Synonym Prediction
Hongliang Fei (Baidu); Shulong Tan (Baidu); Ping Li (Baidu);

Hypothesis Generation From Text Based On Co-Evolution Of Biomedical Concepts
Kishlay Jha (University of Virginia); Guangxu Xun (University of Virginia); Yaqing Wang (University at Buffalo); Aidong Zhang (University of Virginia);

Identifiability of Cause and Effect using Regularized Regression
Alexander Marx (Max Planck Institute for Informatics); Jilles Vreeken (CISPA);

Improving the quality of explanations with local embedding perturbations
Yunzhe Jia (The University of Melbourne); James Bailey (The University of Melbourne); Kotagiri Ramamohanarao (The University of Melbourne); Christopher Leckie (The University of Melbourne); Michael E. Houle (National Institute of Informatics, Japan);

Incorporating Interpretability into Latent Factor Models via Fast Influence Analysis
Weiyu Cheng (Shanghai Jiao Tong University); Yanyan Shen (Shanghai Jiao Tong University); Linpeng Huang (Shanghai Jiao Tong University); Yanmin Zhu (Shanghai Jiao Tong University);

Individualized Indicator for All: Stock-wise Technical Indicator Optimization with Stock Embedding
Zhige Li (Shanghai Jiao Tong University); Derek Yang (Tsinghua University); Li Zhao (Microsoft); Jiang Bian (Microsoft); Tao Qin (Microsoft); Tie-Yan Liu (Microsoft);

Interpretable and Steerable Sequence Learning via Prototypes
Yao Ming (The Hong Kong University of Science and Technology); Panpan Xu (Bosch Research North America); Huamin Qu (The Hong Kong University of Science and Technology); Liu Ren (Bosch Research North America);

Interpretable Neural Network-based Classification of Limited, Noisy Brain Data
Yujun Yan (University of Michigan); Jiong Zhu (University of Michigan); Marlena Duda (University of Michigan); Eric Solarz (University of Michigan); Chandra Sripada (University of Michigan); Danai Koutra (University of Michigan);

Interview Choice Reveals Your Preference on the Market: To Improve Job-Resume Matching through Profiling Memories
Rui Yan (Peking University); Ran Le (Peking University); Yang Song (Boss Zhipin NLP Center); Tao Zhang (Boss Zhipin NLP Center); Xiangliang Zhang (KAUST); Dongyan Zhao (Peking University);

Investigating Cognitive Effects in Session-level Search User Satisfaction
Mengyang Liu (Tsinghua University); Jiaxin Mao (Tsinghua University); Yiqun Liu (Tsinghua University); Min Zhang (Tsinghua University); Shaoping Ma (Tsinghua University);

Is a Single Vector Enough? Exploring Node Polysemy for Network Embedding link
Ninghao Liu (Texas A&M University); Qiaoyu Tan (Texas A&M University); Yuening Li (Texas A&M University); Hongxia Yang (Alibaba); Jingren Zhou (Alibaba); Xia Hu (Texas A&M University);

Networks have been widely used as the data structure for abstracting real-world systems as well as organizing the relations among entities. Network embedding models are powerful tools for mapping nodes in a network into continuous vector-space representations in order to facilitate subsequent tasks such as classification and link prediction. Existing network embedding models comprehensively integrate all information of each node, such as links and attributes, into a single embedding vector that represents the node's general role in the network. However, a real-world entity could be multifaceted, connecting to different neighborhoods due to different motives or self-characteristics that are not necessarily correlated. For example, in a movie recommender system, a user may love both comedies and horror movies, but it is not likely that these two types of movies are mutually close in the embedding space, nor that the user embedding vector could be sufficiently close to both at the same time. In this paper, we propose a polysemous embedding approach for modeling multiple facets of nodes, motivated by the phenomenon of word polysemy in language modeling. Each facet of a node is mapped to an embedding vector, and we also maintain the association degree between each node and facet. The proposed method is adaptable to various existing embedding models, without significantly complicating the optimization process. We also discuss how to use the embedding vectors of different facets for inference tasks including classification and link prediction. Experiments on real-world datasets comprehensively evaluate the performance of the proposed method.

Isolation Set-Kernel and Its Application to Multi-Instance Learning
Bi-Cun Xu (Nanjing University); Kai Ming Ting (School of Engineering and Information Technology, Federation University); Zhi-Hua Zhou (Nanjing University);

K-Multiple-Means: A Multiple-Means Clustering Method with Specified K Clusters
Feiping Nie (School of Computer Science and Center for OPTIMAL, Northwestern Polytechnical University, China); Cheng-Long Wang (School of Computer Science and Center for OPTIMAL, Northwestern Polytechnical University, China); Xuelong Li (School of Computer Science and Center for OPTIMAL, Northwestern Polytechnical University, China);

Knowledge Graph Convolutional Networks for Recommender Systems with Label Smoothness Regularization
Hongwei Wang (Shanghai Jiao Tong University); Fuzheng Zhang (Meituan-Dianping Group); Mengdi Zhang (Meituan-Dianping Group); Jure Leskovec (Stanford University); Miao Zhao (The Hong Kong Polytechnic University); Wenjie Li (The Hong Kong Polytechnic University); Zhongyuan Wang (Meituan-Dianping Group);

Latent Network Summarization
Di Jin (University of Michigan); Ryan Rossi (Adobe); Danai Koutra (University of Michigan); Eunyee Koh (Adobe); Sungchul Kim (Adobe); Anup Rao (Adobe);

Learning Class-Conditional GANs with Active Sampling
Ming-Kun Xie (Nanjing University of Aeronautics and Astronautics); Sheng-Jun Huang (Nanjing University of Aeronautics and Astronautics);

Learning Dynamic Context Graphs for Predicting Social Events
Songgaojun Deng (Stevens Institute of Technology); Huzefa Rangwala (George Mason University); Yue Ning (Stevens Institute of Technology);

Learning from Incomplete and Inaccurate Supervision
Zhen-Yu Zhang (Nanjing University); Peng Zhao (Nanjing University); Yuan Jiang (Nanjing University); Zhi-Hua Zhou (Nanjing University);

Learning Interpretable Metric between Graphs: Convex Formulation and Computation with Graph Mining
Tomoki Yoshida (Nagoya Institute of Technology); Ichiro Takeuchi (Nagoya Institute of Technology, National Institute for Material Science, RIKEN Center for Advanced Intelligence Project); Masayuki Karasuyama (Nagoya Institute of Technology, National Institute for Material Science, Japan Science and Technology Agency);

Learning Network-to-Network Model for Content-rich Network Embedding
Zhicheng He (Nankai University); Jie Liu (Nankai University); Na Li (Nankai University); Yalou Huang (Nankai University);

Link Prediction with Signed Latent Factors in Signed Social Networks
Pinghua Xu (Wuhan University); Wenbin Hu (Wuhan University); Jia Wu (Macquarie University); Bo Du (Wuhan University);

Log2Intent: Towards Interpretable User Modeling via Recurrent Semantics Memory Unit
Zhiqiang Tao (Northeastern University); Sheng Li (University of Georgia); Zhaowen Wang (Adobe); Chen Fang (Bytedance); Longqi Yang (Cornell University); Handong Zhao (Adobe); Yun Fu (Northeastern University);

MCNE: An End-to-End Framework for Learning Multiple Conditional Network Representations of Social Network link
Hao Wang (University of Science and Technology of China); Tong Xu (University of Science and Technology of China); Qi Liu (University of Science and Technology of China); Defu Lian (University of Science and Technology of China); Enhong Chen (University of Science and Technology of China); Dongfang Du (Tencent); Han Wu (University of Science and Technology of China); Wen Su (Tencent);

Recently, Network Representation Learning (NRL) techniques, which represent graph structure via low-dimensional vectors to support social-oriented applications, have attracted wide attention. Despite considerable effort, they may fail to describe the multiple aspects of similarity between social users, because each node is represented by only a single vector for one unique aspect. To that end, in this paper, we propose a novel end-to-end framework named MCNE to learn multiple conditional network representations, so that various preferences for multiple behaviors can be fully captured. Specifically, we first design a binary mask layer to divide the single vector into conditional embeddings for multiple behaviors. Then, we introduce an attention network to model the interaction relationships among multiple preferences, and further utilize the adapted message sending and receiving operations of graph neural networks, so that multi-aspect preference information from high-order neighbors is captured. Finally, we utilize the Bayesian Personalized Ranking loss function to learn the preference similarity on each behavior, and jointly learn multiple conditional node embeddings via a multi-task learning framework. Extensive experiments on public datasets validate that our MCNE framework significantly outperforms several state-of-the-art baselines, and further supports visualization and transfer learning tasks with excellent interpretability and robustness.
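
Two ingredients named above, a binary mask that selects a behavior-specific part of an embedding and the Bayesian Personalized Ranking (BPR) loss, can be sketched in a few lines of Python. The way the mask is applied here (element-wise, to the user vector only), the dot-product scoring, and all names and values are assumptions for illustration and are not claimed to match MCNE's architecture.

```python
import math


def apply_mask(embedding, mask):
    """Element-wise product of an embedding with a 0/1 mask."""
    return [e * m for e, m in zip(embedding, mask)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bpr_loss(score_pos, score_neg):
    """-log(sigmoid(score_pos - score_neg)), computed in a numerically stable way."""
    diff = score_pos - score_neg
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical 4-d user embedding; the mask keeps the first two dimensions
# as the conditional embedding for one behavior.
user = [1.0, 0.5, -1.0, 2.0]
behavior_mask = [1, 1, 0, 0]
conditional_user = apply_mask(user, behavior_mask)

preferred_item = [1.0, 1.0, 0.0, 0.0]   # item the user interacted with
other_item = [0.0, 1.0, 5.0, 5.0]       # sampled item without interaction

loss = bpr_loss(dot(conditional_user, preferred_item),
                dot(conditional_user, other_item))
```

The loss shrinks as the preferred item's score exceeds the other item's score by a wider margin, which is what pushes the ranking in the desired direction.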

Mining Algorithm Roadmap in Scientific Publications
Hanwen Zha (University of California, Santa Barbara); Wenhu Chen (University of California, Santa Barbara); Keqian Li (University of California, Santa Barbara); Xifeng Yan (University of California, Santa Barbara);

Modeling Dwell Time Engagement on Visual Multimedia
Hemank Lamba (Carnegie Mellon University); Neil Shah (Snap Research);

Modeling Extreme Events in Time Series Prediction
Daizong Ding (Fudan University); Mi Zhang (Fudan University); Xudong Pan (Fudan University); Min Yang (School of Computer Science, Fudan University); Xiangnan He (University of Science and Technology of China);

Multi-Relational Classification via Bayesian Ranked Non-Linear Embeddings
Ahmed Rashed (Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of Hildesheim); Josif Grabocka (Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of Hildesheim); Lars Schmidt-Thieme (Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of Hildesheim);

Multi-task Recurrent Neural Network and Higher-order Markov Random Fields for Stock Price Prediction
Chang Li (School of Computer Science University of Sydney); Dongjin Song (Capital Market CRC); Dacheng Tao (NEC);

Multiple Relational Attention Network for Multi-task Learning
Jiejie Zhao (Beihang University); Bowen Du (Beihang University); Leilei Sun (Beihang University); Fuzhen Zhuang (Chinese Academy of Sciences); Weifeng Lv (Beihang University); Hui Xiong (Rutgers University);

Network Density of States
Kun Dong (Cornell University); Austin Benson (Cornell University); David Bindel (Cornell University);

NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching
Dingqi Yang (eXascale Infolab, University of Fribourg); Paolo Rosso (University of Fribourg); Bin Li (Fudan University); Philippe Cudre-Mauroux (U. of Fribourg);

OBOE: Collaborative Filtering for AutoML Model Selection link
Chengrun Yang (Cornell University); Yuji Akimoto (Cornell University); Dae Won Kim (Cornell University); Madeleine Udell (Cornell University);

Algorithm selection and hyperparameter tuning remain two of the most challenging tasks in machine learning. Automated machine learning (AutoML) seeks to automate these tasks to enable widespread use of machine learning by non-experts. This paper introduces OBOE, a collaborative filtering method for time-constrained model selection and hyperparameter tuning. OBOE forms a matrix of the cross-validated errors of a large number of supervised learning models (algorithms together with hyperparameters) on a large number of datasets, and fits a low rank model to learn the low-dimensional feature vectors for the models and datasets that best predict the cross-validated errors. To find promising models for a new dataset, OBOE runs a set of fast but informative algorithms on the new dataset and uses their cross-validated errors to infer the feature vector for the new dataset. OBOE can find good models under constraints on the number of models fit or the total time budget. To this end, this paper develops a new heuristic for active learning in time-constrained matrix completion based on optimal experiment design. Our experiments demonstrate that OBOE delivers state-of-the-art performance faster than competing approaches on a test bed of supervised learning problems. Moreover, the success of the bilinear model used by OBOE suggests that AutoML may be simpler than was previously understood.
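
The inference step described above, running a few fast models on a new dataset and using their errors to estimate the dataset's feature vector, amounts to a small least-squares problem once the model feature vectors are fixed. The Python sketch below assumes a rank-2 bilinear model with model features already learned offline; the function names, the rank, and the numbers are hypothetical, and the sketch leaves out OBOE's time-constrained selection of which models to run.

```python
def infer_dataset_features(model_features, observed):
    """Least-squares fit of a 2-d dataset vector from a few observed errors.

    model_features: list of (y1, y2) feature pairs, one per model.
    observed: dict mapping model index -> cross-validated error on the new dataset.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for j, err in observed.items():
        y1, y2 = model_features[j]
        a11 += y1 * y1
        a12 += y1 * y2
        a22 += y2 * y2
        b1 += err * y1
        b2 += err * y2
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("observed models do not determine the feature vector")
    # Solve the 2x2 normal equations by Cramer's rule.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


def predict_errors(model_features, dataset_vector):
    """Predicted error of every model under the bilinear model."""
    x1, x2 = dataset_vector
    return [x1 * y1 + x2 * y2 for y1, y2 in model_features]


# Hypothetical learned features for four models.
model_features = [(1.0, 0.0), (1.0, 1.0), (0.0, 0.5), (0.5, 0.5)]

# Only models 0 and 1 are run on the new dataset.
observed = {0: 0.4, 1: 0.6}
x_new = infer_dataset_features(model_features, observed)
predicted = predict_errors(model_features, x_new)
best_model = min(range(len(predicted)), key=lambda j: predicted[j])
```

In this toy case the model with the lowest predicted error is one that was never run on the new dataset, which is the purpose of the low-rank model: a few cheap evaluations guide the choice among many candidates.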

Off-policy Learning for Multiple Loggers
Li He (Data Science Lab, JD.com); Long Xia (Data Science Lab, JD.com); Wei Zeng (Institute of Computing Technology, Chinese Academy of Sciences); Zhi-Ming Ma (JD.com); Yihong Zhao (JD.com); Dawei Yin (JD.com);

On Dynamic Network Models and Application to Causal Impact
Yu-Chia Chen (University of Washington); Avleen S. Bijral (Microsoft); Juan Lavista Ferres (Microsoft);

Optimizing Impression Counts for Outdoor Advertising
Yipeng Zhang (RMIT University); Yuchen Li (Singapore Management University); Zhifeng Bao (RMIT University); Songsong Mo (Wuhan University); Ping Zhang (Huawei);

Optimizing Peer Learning in Online Groups with Affinities
Mohammadreza Esfandiari (NJIT); Dong Wei (NJIT); Sihem Amer-Yahia (CNRS); Senjuti Basu Roy (New Jersey Institute Of Technology);

Origin-Destination Matrix Prediction via Graph Convolution: A New Perspective of Passenger Demand Modeling
Yuandong Wang (Beihang University); Hongzhi Yin (The University of Queensland); Hongxu Chen (The University of Queensland); Tianyu Wo (Beihang University); Jie Xu (University of Leeds); Kai Zheng (University of Electronic Science and Technology);

Pairwise Comparisons with Flexible Time-Dynamics
Lucas Maystre (Ecole Polytechnique Fédérale de Lausanne); Victor Kristof (Ecole Polytechnique Fédérale de Lausanne); Matthias Grossglauser (Ecole Polytechnique Fédérale de Lausanne);

Inspired by applications in sports where the skill of players or teams competing against each other varies over time, we propose a probabilistic model of pairwise-comparison outcomes that can capture a wide range of time dynamics. We achieve this by replacing the static parameters of a class of popular pairwise-comparison models by continuous-time Gaussian processes; the covariance function of these processes enables expressive dynamics. We develop an efficient inference algorithm that computes an approximate Bayesian posterior distribution. Despite the flexibility of our model, our inference algorithm requires only a few linear-time iterations over the data and can take advantage of modern multiprocessor computer architectures. We apply our model to several historical databases of sports outcomes and find that our approach outperforms competing approaches in terms of predictive performance, scales to millions of observations, and generates compelling visualizations that help in understanding and interpreting the data.

Paper Matching with Local Fairness Constraints
Ari Kobren (University of Massachusetts Amherst); Barna Saha (University of Massachusetts Amherst); Andrew McCallum (University of Massachusetts Amherst);

Automatically matching reviewers to papers is a crucial step of the peer review process for venues receiving thousands of submissions. Unfortunately, common paper matching algorithms often construct matchings suffering from two critical problems: (1) the group of reviewers assigned to a paper does not collectively possess sufficient expertise, and (2) reviewer workloads are highly skewed. In this paper, we propose a novel local fairness formulation of paper matching that directly addresses both of these issues. Since optimizing our formulation is not always tractable, we introduce two new algorithms, FairIR and FairFlow, for computing fair matchings that approximately optimize the new formulation. FairIR solves a relaxation of the local fairness formulation and then employs a rounding technique to construct a valid matching that provably maximizes the objective and only compromises on fairness with respect to reviewer loads and papers by a small constant. In contrast, FairFlow is not provably guaranteed to produce fair matchings; however, it can be 2x as efficient as FairIR and an order of magnitude faster than matching algorithms that directly optimize for fairness. Empirically, we demonstrate that both FairIR and FairFlow improve fairness over standard matching algorithms on real conference data. Moreover, in comparison to state-of-the-art matching algorithms that optimize for fairness only, FairIR achieves higher objective scores, FairFlow achieves competitive fairness, and both are capable of more evenly allocating reviewers.

PerDREP: Personalized Drug Effectiveness Prediction from Longitudinal Observational Data
Sanjoy Dey (IBM); Ping Zhang (The Ohio State University); Daby Sow (IBM); Kenney Ng (IBM);

Predicting Embedding Trajectories for Temporal Interaction Networks
Srijan Kumar (Stanford University); Xikun Zhang (University of Illinois at Urbana-Champaign); Jure Leskovec (Stanford University);

Predicting Path Failure In Time-Evolving Graphs
Jia Li (The Chinese University of Hong Kong); Zhichao Han (The Chinese University of Hong Kong); Hong Cheng (The Chinese University of Hong Kong); Jiao Su (The Chinese University of Hong Kong); Pengyun Wang (Noah's Ark Lab, Huawei Technologies); Jianfeng Zhang (Noah's Ark Lab, Huawei Technologies); Lujia Pan (Noah's Ark Lab, Huawei Technologies);

In this paper we use a time-evolving graph which consists of a sequence of graph snapshots over time to model many real-world networks. We study the path classification problem in a time-evolving graph, which has many applications in real-world scenarios, for example, predicting path failure in a telecommunication network and predicting path congestion in a traffic network in the near future. In order to capture the temporal dependency and graph structure dynamics, we design a novel deep neural network named Long Short-Term Memory R-GCN (LRGCN). LRGCN considers temporal dependency between time-adjacent graph snapshots as a special relation with memory, and uses relational GCN to jointly process both intra-time and inter-time relations. We also propose a new path representation method named self-attentive path embedding (SAPE), to embed paths of arbitrary length into fixed-length vectors. Through experiments on a real-world telecommunication network and a traffic network in California, we demonstrate the superiority of LRGCN to other competing methods in path failure prediction, and prove the effectiveness of SAPE on path representation.

PressLight: Learning Max Pressure Control for Signalized Intersections in Arterial Network
Hua Wei (The Pennsylvania State University); Chacha Chen (Shanghai Jiao Tong University); Guanjie Zheng (The Pennsylvania State University); Kan Wu (The Pennsylvania State University); Vikash Gayah (The Pennsylvania State University); Kai Xu (Tianrang Inc.); Zhenhui Jessie Li (The Pennsylvania State University);

PrivPy: General and Scalable Privacy-Preserving Data Mining
Yi Li (Tsinghua University); Wei Xu (Tsinghua University);

ProGAN: Network Embedding via Proximity Generative Adversarial Network
Hongchang Gao (University of Pittsburgh); Jian Pei (Simon Fraser University); Heng Huang (University of Pittsburgh);

Quantifying Long Range Dependence in Language and User Behavior to improve RNNs
Francois Belletti (Google); Minmin Chen (Google); Ed Chi (Google);

Characterizing temporal dependence patterns is a critical step in understanding the statistical properties of sequential data. Long Range Dependence (LRD), which refers to long-range correlations that decay as a power law rather than exponentially w.r.t. distance, demands a different set of tools for modeling the underlying dynamics of the sequential data. While it has been widely conjectured that LRD is present in language modeling and sequential recommendation, the amount of LRD in the corresponding sequential datasets has not yet been quantified in a scalable and model-independent manner. We propose a principled estimation procedure of LRD in sequential datasets based on established LRD theory for real-valued time series and apply it to sequences of symbols with million-item-scale dictionaries. In our measurements, the procedure reliably estimates the LRD in the behavior of users as they write Wikipedia articles and as they interact with YouTube. We further show that measuring LRD better informs modeling decisions, in particular for RNNs, whose ability to capture LRD is still an active area of research. The quantitative measure informs new Evolutive Recurrent Neural Networks (EvolutiveRNNs) designs, leading to state-of-the-art results on language understanding and sequential recommendation tasks at a fraction of the computational cost.

QuesNet: A Unified Representation for Heterogeneous Test Questions
Yu Yin (University of Science and Technology of China); Qi Liu (University of Science and Technology of China); Zhenya Huang (University of Science and Technology of China); Enhong Chen (University of Science and Technology of China); Wei Tong (University of Science and Technology of China); Shijin Wang (iFLYTEK Research); Yu Su (iFLYTEK Research);

Understanding learning materials (e.g. test questions) is a crucial issue in online learning systems, which can promote many applications in the education domain. Unfortunately, many supervised approaches suffer from the problem of scarce human-labeled data, whereas abundant unlabeled resources are highly underutilized. To alleviate this problem, an effective solution is to use pre-trained representations for question understanding. However, existing pre-training methods from the NLP area are ill-suited to learning test question representations due to several domain-specific characteristics in education. First, questions usually comprise heterogeneous data including content text, images and side information. Second, there exist both basic linguistic information and domain logic and knowledge. To this end, in this paper, we propose a novel pre-training method, namely QuesNet, for comprehensively learning question representations. Specifically, we first design a unified framework to aggregate question information with its heterogeneous inputs into a comprehensive vector. Then we propose a two-level hierarchical pre-training algorithm to learn a better understanding of test questions in an unsupervised way. Here, a novel holed language model objective is developed to extract low-level linguistic features, and a domain-oriented objective is proposed to learn high-level logic and knowledge. Moreover, we show that QuesNet can be fine-tuned well for many question-based tasks. We conduct extensive experiments on large-scale real-world question data, where the experimental results clearly demonstrate the effectiveness of QuesNet for question understanding as well as its superior applicability.

Regularized regression for hierarchical forecasting without unbiasedness conditions
Souhaib Ben Taieb (University of Mons); Bonsoo Koo (Monash University);

Relation Extraction via Domain-aware Transfer Learning
Shimin Di (The Hong Kong University of Science and Technology); Yanyan Shen (Shanghai Jiao Tong University); Lei Chen (The Hong Kong University of Science and Technology);

Representation Learning for Attributed Multiplex Heterogeneous Network
Yukuo Cen (Tsinghua University); Xu Zou (Tsinghua University); Jianwei Zhang (Alibaba); Hongxia Yang (Alibaba); Jingren Zhou (Alibaba); Jie Tang (Tsinghua University);

Network embedding (or graph embedding) has been widely used in many real-world applications. However, existing methods mainly focus on networks with single-typed nodes/edges and cannot scale well to handle large networks. Many real-world networks consist of billions of nodes and edges of multiple types, and each node is associated with different attributes. In this paper, we formalize the problem of embedding learning for the Attributed Multiplex Heterogeneous Network and propose a unified framework to address this problem. The framework supports both transductive and inductive learning. We also give a theoretical analysis of the proposed framework, showing its connection with previous works and proving its better expressiveness. We conduct systematic evaluations of the proposed framework on four different genres of challenging datasets: Amazon, YouTube, Twitter, and Alibaba. Experimental results demonstrate that with the learned embeddings from the proposed framework, we can achieve statistically significant improvements (e.g., 5.99-28.23% lift by F1 scores; p<<0.01, t-test) over previous state-of-the-art methods for link prediction. The framework has also been successfully deployed on the recommendation system of a worldwide leading e-commerce company, Alibaba Group. Results of the offline A/B tests on product recommendation further confirm the effectiveness and efficiency of the framework in practice.

Retaining Privileged Information for Multi-Task Learning
Fengyi Tang (Michigan State University); Cao Xiao (IQVIA); Fei Wang (Cornell University); Jiayu Zhou (Michigan State University); Li-Wei Lehman (Massachusetts Institute of Technology);

Revisiting kd-tree for Nearest Neighbor Search
Parikshit Ram (IBM); Kaushik Sinha (Wichita State University);

Riker: Mining Rich Keyword Representations for Interpretable Product Question Answering
Jie Zhao (The Ohio State University); Ziyu Guan (Xidian University); Huan Sun (The Ohio State University);

Robust Graph Convolutional Networks Against Adversarial Attacks
Dingyuan Zhu (Tsinghua University); Ziwei Zhang (Tsinghua University); Peng Cui (Tsinghua University); Wenwu Zhu (Tsinghua University);

Robust Task Grouping with Representative Tasks for Clustered Multi-Task Learning
Yaqiang Yao (University of Science and Technology of China); Jie Cao (Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics); Huanhuan Chen (University of Science and Technology of China);

Scalable Global Alignment Graph Kernel Using Random Features: From Node Embedding to Graph Embedding
Lingfei Wu (IBM); Ian En-Hsu Yen (Carnegie Mellon University); Zhen Zhang (Washington University in St. Louis); Kun Xu (Peking University); Liang Zhao (George Mason University); Xi Peng (Binghamton University); Yinglong Xia (Huawei); Charu Aggarwal (IBM);

Scalable Graph Embeddings via Sparse Transpose Proximities
Yuan Yin (School of Information, Renmin University); Zhewei Wei (School of Information, Renmin University);

Graph embedding learns low-dimensional representations for nodes in a graph and effectively preserves the graph structure. Recently, a significant amount of progress has been made toward this emerging research area. However, there are several fundamental problems that remain open. First, existing methods fail to preserve the out-degree distributions on directed graphs. Second, many existing methods employ random walk based proximities and thus suffer from conflicting optimization goals on undirected graphs. Finally, existing factorization methods are unable to achieve scalability and non-linearity simultaneously. This paper presents an in-depth study on graph embedding techniques on both directed and undirected graphs. We analyze the fundamental reasons that lead to the distortion of out-degree distributions and to the conflicting optimization goals. We propose transpose proximity, a unified approach that solves both problems. Based on the concept of transpose proximity, we design STRAP, a factorization based graph embedding algorithm that achieves scalability and non-linearity simultaneously. STRAP makes use of the backward push algorithm to efficiently compute the sparse Personalized PageRank (PPR) as its transpose proximities. By imposing the sparsity constraint, we are able to apply non-linear operations to the proximity matrix and perform efficient matrix factorization to derive the embedding vectors. Finally, we present an extensive experimental study that evaluates the effectiveness of various graph embedding algorithms, and we show that STRAP outperforms the state-of-the-art methods in terms of effectiveness and scalability.
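For readers unfamiliar with the backward push step named in the abstract above, the following is a minimal sketch of the textbook procedure for approximating Personalized PageRank towards a single target node. It is not the paper's implementation: the function names, the toy graph, the teleport probability `alpha`, and the threshold `r_max` are all assumptions chosen for illustration, and every node is assumed to have at least one out-edge.

```python
from collections import deque


def backward_push(out_edges, target, alpha=0.2, r_max=1e-8):
    """Approximate PPR(s, target) for every source s.

    out_edges: dict {node: list of out-neighbours}.
    Returns a sparse dict of estimates; each is within r_max of the true value.
    """
    in_edges = {u: [] for u in out_edges}
    for u, nbrs in out_edges.items():
        for v in nbrs:
            in_edges[v].append(u)
    estimate = {}
    residue = {target: 1.0}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        r_v = residue.get(v, 0.0)
        if r_v <= r_max:
            continue  # stale queue entry, nothing left to push
        residue[v] = 0.0
        estimate[v] = estimate.get(v, 0.0) + alpha * r_v
        # Send the remaining mass backwards along incoming edges.
        for u in in_edges[v]:
            residue[u] = residue.get(u, 0.0) + (1 - alpha) * r_v / len(out_edges[u])
            if residue[u] > r_max:
                queue.append(u)
    return estimate


def exact_ppr(out_edges, source, alpha=0.2, iters=300):
    """Reference PPR vector of `source` by plain power iteration."""
    pi = {u: 0.0 for u in out_edges}
    pi[source] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in out_edges}
        nxt[source] += alpha
        for u, nbrs in out_edges.items():
            share = (1 - alpha) * pi[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        pi = nxt
    return pi


# Hypothetical 4-node directed graph.
graph = {0: [1, 2], 1: [2], 2: [0, 3], 3: [0]}
approx_to_2 = backward_push(graph, target=2)
```

A larger `r_max` stops the pushes earlier and leaves more entries at zero, which is how a threshold of this kind yields a sparse proximity matrix.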

Scalable Hierarchical Clustering via Tree Grafting
Nicholas Monath (University of Massachusetts Amherst); Ari Kobren (UMass Amherst); Akshay Krishnamurthy (University of Massachusetts, Amherst); Michael Glass (IBM); Andrew McCallum (University of Massachusetts);

Scaling Multi-Armed Bandit Algorithms
Edouard Fouché (Karlsruhe Institute of Technology); Junpei Komiyama (The University of Tokyo); Klemens Böhm (Karlsruhe Institute of Technology);

Scaling Multinomial Logistic Regression via Hybrid Parallelism
Parameswaran Raman (University of California, Santa Cruz); Sriram Srinivasan (University of California, Santa Cruz); Shin Matsushima (The University of Tokyo); Xinhua Zhang (University of Illinois, Chicago); Hyokun Yun (Amazon); Vishwanathan S.V.N. (Amazon);

Separated Trust Regions Policy Optimization Method
Luobao Zou (Shanghai Jiao Tong University); Zhiwei Zhuang (Shanghai Jiao Tong University); Yin Cheng (Shanghai Jiao Tong University); Xuechun Wang (Shanghai Jiao Tong University); Weidong Zhang (Shanghai Jiao Tong University);

Sequential Anomaly Detection using Inverse Reinforcement Learning
Min-Hwan Oh (Columbia University); Garud Iyengar (Columbia University);

Sets2Sets: Learning from Sequential Sets with Neural Networks
Haoji Hu (University of Minnesota); Xiangnan He (University of Science and Technology of China);

Sherlock: A Deep Learning Approach to Semantic Data Type Detection
Madelon Hulsebos (Massachusetts Institute of Technology); Kevin Hu (Massachusetts Institute of Technology); Michiel Bakker (Massachusetts Institute of Technology); Emanuel Zgraggen (Massachusetts Institute of Technology); Arvind Satyanarayan (Massachusetts Institute of Technology); Tim Kraska (Massachusetts Institute of Technology); Çağatay Demiralp (Massachusetts Institute of Technology); César Hidalgo (Massachusetts Institute of Technology);

Correctly detecting the semantic type of data columns is crucial for data science tasks such as automated data cleaning, schema matching, and data discovery. Existing data preparation and analysis systems rely on dictionary lookups and regular expression matching to detect semantic types. However, these matching-based approaches often are not robust to dirty data and only detect a limited number of types. We introduce Sherlock, a multi-input deep neural network for detecting semantic types. We train Sherlock on 686,765 data columns retrieved from the VizNet corpus by matching 78 semantic types from DBpedia to column headers. We characterize each matched column with 1,588 features describing the statistical properties, character distributions, word embeddings, and paragraph vectors of column values. Sherlock achieves a support-weighted F1 score of 0.89, exceeding that of machine learning baselines, dictionary and regular expression benchmarks, and the consensus of crowdsourced annotations.

Significance of Patterns in Data Visualisations
Rafael Savvides (University of Helsinki); Andreas Henelius (University of Helsinki); Emilia Oikarinen (University of Helsinki); Kai Puolamäki (University of Helsinki);

Social Recommendation with Optimal Limited Attention
Xin Wang (Tsinghua University); Wenwu Zhu (Tsinghua University); Chenghao Liu (Singapore Management University);

SPuManTE: Significant Pattern Mining with Unconditional Testing
Leonardo Pellegrina (University of Padova); Matteo Riondato (Amherst College); Fabio Vandin (University of Padova);

Stability and Generalization of Graph Convolutional Neural Networks
Saurabh Verma (University of Minnesota); Zhi-Li Zhang (University of Minnesota);

Inspired by convolutional neural networks on 1D and 2D data, graph convolutional neural networks (GCNNs) have been developed for various learning tasks on graph data, and have shown superior performance on real-world datasets. Despite their success, there is a dearth of theoretical explorations of GCNN models such as their generalization properties. In this paper, we take a first step towards developing a deeper theoretical understanding of GCNN models by analyzing the stability of single-layer GCNN models and deriving their generalization guarantees in a semi-supervised graph learning setting. In particular, we show that the algorithmic stability of a GCNN model depends upon the largest absolute eigenvalue of its graph convolution filter. Moreover, to ensure the uniform stability needed to provide strong generalization guarantees, the largest absolute eigenvalue must be independent of the graph size. Our results offer new insights into the design of new and improved graph convolution filters with guaranteed algorithmic stability. We evaluate the generalization gap and stability on various real-world graph datasets and show that the empirical results indeed support our theoretical findings. To the best of our knowledge, we are the first to study stability bounds on graph learning in a semi-supervised setting and derive generalization bounds for GCNN models.
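To make the role of the largest absolute eigenvalue concrete, the sketch below compares two candidate filters on star graphs of different sizes: the unnormalized filter A + I and its symmetrically normalized form. The choice of filters, the star graph, and the function names are assumptions made for this illustration; the abstract does not say which filters the paper analyzes.

```python
import math


def star_adjacency_with_self_loops(n):
    """Dense A + I for a star graph: node 0 is linked to nodes 1..n-1."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for leaf in range(1, n):
        a[0][leaf] = a[leaf][0] = 1.0
    return a


def sym_normalize(a):
    """Return D^{-1/2} a D^{-1/2}, where D holds the row sums of a."""
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a]
    size = len(a)
    return [[inv_sqrt[i] * a[i][j] * inv_sqrt[j] for j in range(size)]
            for i in range(size)]


def largest_abs_eigenvalue(m, iters=500):
    """Power iteration; assumes m is symmetric with a unique dominant eigenvalue."""
    n = len(m)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(w[i] * v[i] for i in range(n))  # Rayleigh quotient of unit vector v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return abs(lam)


small = star_adjacency_with_self_loops(5)
large = star_adjacency_with_self_loops(17)
unnormalized = (largest_abs_eigenvalue(small), largest_abs_eigenvalue(large))
normalized = (largest_abs_eigenvalue(sym_normalize(small)),
              largest_abs_eigenvalue(sym_normalize(large)))
```

For a star graph on n nodes the unnormalized value is 1 + sqrt(n - 1), so it grows with the graph, while the normalized value stays at 1 for both sizes. That is the kind of size independence the abstract asks for.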

State-Sharing Sparse Hidden Markov Models for Personalized Sequences
Hongzhi Shi (Tsinghua University); Chao Zhang (Georgia Institute of Technology); Quanming Yao (4Paradigm); Yong Li (Tsinghua University); Funing Sun (Tencent); Depeng Jin (Tsinghua University);

Streaming Adaptation of Deep Forecasting Models using Adaptive Recurrent Units
Prathamesh Deshpande (IIT); Sunita Sarawagi (IIT);

We present ARU, an Adaptive Recurrent Unit for streaming adaptation of deep globally trained time-series forecasting models. The ARU combines the advantages of learning complex data transformations across multiple time series from deep global models, with per-series localization offered by closed-form linear models. Unlike existing methods of adaptation that are either memory-intensive or non-responsive after training, ARUs require only fixed-size state and adapt to streaming data via an easy RNN-like update operation. The core principle driving ARU is simple: maintain sufficient statistics of conditional Gaussian distributions and use them to compute local parameters in closed form. Our contribution is in embedding such local linear models in globally trained deep models while allowing end-to-end training on the one hand, and easy RNN-like updates on the other. Across several datasets we show that ARU is more effective than recently proposed local adaptation methods that tax the global network to compute local parameters.
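The idea of keeping fixed-size sufficient statistics and reading off parameters in closed form can be shown in its simplest setting, a one-dimensional least-squares line updated one observation at a time. This is a deliberately reduced sketch: the class `StreamingLinearFit` is hypothetical, and it does not model the conditional Gaussian statistics or the embedding in a deep network that the abstract describes.

```python
class StreamingLinearFit:
    """Running sums that suffice for a 1-D least-squares line y = w*x + b."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, x, y):
        """Fold one observation into the fixed-size state."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def params(self):
        """Closed-form slope and intercept from the current statistics."""
        denom = self.n * self.sxx - self.sx * self.sx
        if self.n < 2 or abs(denom) < 1e-12:
            raise ValueError("need at least two distinct x values")
        w = (self.n * self.sxy - self.sx * self.sy) / denom
        b = (self.sy - w * self.sx) / self.n
        return w, b


fit = StreamingLinearFit()
for x, y in [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]:
    fit.update(x, y)
slope, intercept = fit.params()
```

The state is five numbers regardless of how many points have been seen, and no past observation needs to be stored or revisited.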

Streaming Session-based Recommendation
Lei Guo (Shandong Normal University); Hongzhi Yin (The University of Queensland); Qinyong Wang (The University of Queensland); Tong Chen (The University of Queensland); Alexander Zhou (The University of Queensland); Nguyen Quoc Viet Hung (Griffith University);

SurfCon: Synonym Discovery on Privacy-Aware Clinical Data
Zhen Wang (The Ohio State University); Xiang Yue (The Ohio State University); Soheil Moosavinasab (Nationwide Children's Hospital); Yungui Huang (Nationwide Children's Hospital); Simon Lin (Nationwide Children's Hospital); Huan Sun (The Ohio State University);

Unstructured clinical texts contain rich health-related information. To better utilize the knowledge buried in clinical texts, discovering synonyms for a medical query term has become an important task. Recent automatic synonym discovery methods leveraging raw text information have been developed. However, to preserve patient privacy and security, it is usually quite difficult to get access to large-scale raw clinical texts. In this paper, we study a new setting named synonym discovery on privacy-aware clinical data (i.e., medical terms extracted from the clinical texts and their aggregated co-occurrence counts, without raw clinical texts). To solve the problem, we propose a new framework SurfCon that leverages two important types of information in the privacy-aware clinical data, i.e., the surface form information, and the global context information for synonym discovery. In particular, the surface form module enables us to detect synonyms that look similar while the global context module plays a complementary role to discover synonyms that are semantically similar but in different surface forms, and both allow us to deal with the OOV query issue (i.e., when the query is not found in the given data). We conduct extensive experiments and case studies on publicly available privacy-aware clinical data, and show that SurfCon can outperform strong baseline methods by large margins under various settings.

Tackle Balancing Constraint for Incremental Semi-Supervised Support Vector Learning
Shuyang Yu (Northeastern University); Bin Gu (JD.com); Kunpeng Ning (Nanjing University of Aeronautics and Astronautics); Haiyan Chen (Nanjing University of Aeronautics and Astronautics); Jian Pei (Simon Fraser University); Heng Huang (University of Pittsburgh);

Task-Adversarial Co-Generative Nets
Pei Yang (Arizona State University & South China University of Technology); Qi Tan (South China Normal University); Hanghang Tong (Arizona State University); Jingrui He (Arizona State University);

Tensorized Determinantal Point Processes for Recommendation
Romain Warlop (fifty-five); Jérémie Mary (Criteo); Mike Gartrell (Criteo);

Testing Dynamic Incentive Compatibility in Display Ad Auctions
Yuan Deng (Duke University); Sébastien Lahaie (Google);

The Impact of Person-Organization Fit on Talent Management: A Structure-Aware Convolutional Neural Network Approach
Ying Sun (Institute of Computing Technology, Chinese Academy of Sciences); Fuzhen Zhuang (Institute of Computing Technology, Chinese Academy of Sciences); Hengshu Zhu (Baidu); Xin Song (Baidu); Qing He (Institute of Computing Technology, Chinese Academy of Sciences); Hui Xiong (Baidu);

The Role of “Condition”: A Novel Scientific Knowledge Graph Representation and Construction Model
Tianwen Jiang (University of Notre Dame); Tong Zhao (University of Notre Dame); Bing Qin (Harbin Institute of Technology); Ting Liu (Harbin Institute of Technology); Nitesh Chawla (University of Notre Dame); Meng Jiang (University of Notre Dame);

Three-Dimensional Stable Matching Problem for Spatial Crowdsourcing Platforms
Boyang Li (Northeastern University); Yurong Cheng (Beijing Institute of Technology); Ye Yuan (Northeastern University); Guoren Wang (Beijing Institute of Technology); Lei Chen (The Hong Kong University of Science and Technology);

Time Critic Policy Gradient Methods for Traffic Signal Control in Complex and Congested Scenarios
Stefano Giovanni Rizzo (Qatar Computing Research Institute); Giovanna Vantini (Qatar Computing Research Institute); Sanjay Chawla (Qatar Computing Research Institute);

Towards Robust and Discriminative Sequential Data Learning: When and How to Perform Adversarial Training?
Xiaowei Jia (University of Minnesota); Sheng Li (University of Georgia); Handong Zhao (Adobe); Sungchul Kim (Adobe); Vipin Kumar (University of Minnesota);

Training and Meta-Training Binary Neural Networks with Quantum Computing
Abdulah Fawaz (Siemens); Sebastien Piat (Siemens); Paul Klein (Siemens); Simone Severini (University College London); Peter Mountney (Siemens);

TUBE: Embedding Behavior Outcomes for Predicting Success
Daheng Wang (University of Notre Dame); Tianwen Jiang (University of Notre Dame); Nitesh Chawla (University of Notre Dame); Meng Jiang (University of Notre Dame);

Uncovering Pattern Formation of Information Flow
Chengxi Zang (Tsinghua University); Peng Cui (Tsinghua University); Chaoming Song (University of Miami); Wenwu Zhu (Tsinghua University); Fei Wang (Cornell University);

Unifying Inter-region Autocorrelation and Intra-region Structures for Spatial Embedding via Collective Adversarial Learning
Yunchao Zhang (Missouri University of Science and Technology); Pengyang Wang (Missouri University of Science and Technology); Xiaolin Li (Nanjing University); Yu Zheng (JD); Yanjie Fu (Missouri University of Science and Technology);

Universal Representation Learning of Knowledge Bases by Jointly Embedding Instances and Ontological Concepts
Junheng Hao (University of California, Los Angeles); Muhao Chen (University of California, Los Angeles); Wenchao Yu (University of California, Los Angeles); Yizhou Sun (University of California, Los Angeles); Wei Wang (University of California, Los Angeles);

Urban Traffic Prediction from Spatio-Temporal Data using Deep Meta Learning
Zheyi Pan (Shanghai Jiao Tong University); Yuxuan Liang (National University of Singapore); Weifeng Wang (Shanghai Jiao Tong University); Yong Yu (Shanghai Jiao Tong University); Yu Zheng (JD); Junbo Zhang (JD);

λOpt: Learn to Regularize Recommender Models in Finer Levels
Yihong Chen (Tsinghua University); Bei Chen (Microsoft); Xiangnan He (University of Science and Technology of China); Chen Gao (Tsinghua University); Yong Li (Tsinghua University); Jian-Guang Lou (Microsoft); Yue Wang (Tsinghua University);

Applied Data Science Track Papers

150 successful Machine Learning models: 6 lessons learned at Booking.com
Pablo Estevez, Themistoklis Mavridis and Lucas Bernardi

A Collaborative Learning Framework to Tag Refinement for Points of Interest
Jingbo Zhou, Shan Gou, Renjun Hu, Dongxiang Zhang, Jin Xu, Xuehui Wu, Airong Jiang and Hui Xiong

A Data-Driven Approach for Multi-level Packing Problems in Manufacturing Industry
Lei Chen, Xialiang Tong, Mingxuan Yuan, Jia Zeng and Lei Chen

A Deep Generative Approach to Search Extrapolation and Recommendation
Fred.X Han, Di Niu, Haolan Chen, Kunfeng Lai, Yancheng He and Yu Xu

A Deep Value-network Based Approach for Multi-Driver Order Dispatching
Xiaocheng Tang, Zhiwei Qin, Fan Zhang, Zhaodong Wang, Zhe Xu, Yintai Ma, Hongtu Zhu and Jieping Ye

A Generalized Framework for Population Based Training
Ang Li, Ola Spyra, Sagi Perel, Valentin Dalibard, Max Jaderberg, Chenjie Gu, David Budden, Tim Harley and Pramod Gupta

A Robust Framework for Accelerated Outcome-driven Risk Factor Identification from EHR
Prithwish Chakraborty and Faisal Farooq

A Severity Score for Retinopathy of Prematurity
Peng Tian, Yuan Guo, Jayashree Kalpathy-Cramer, Susan Ostmo, J. Peter Campbell, Michael F. Chiang, Jennifer Dy, Deniz Erdogmus and Stratis Ioannidis

A Unified Framework for Marketing Budget Allocation
Kui Zhao, Junhao Hua, Ling Yan, Qi Zhang, Huan Xu and Cheng Yang

While marketing budget allocation has been studied for decades in traditional business, online business now brings many more challenges due to the dynamic environment and complex decision-making process. In this paper, we present a novel unified framework for marketing budget allocation. By leveraging abundant data, the proposed data-driven approach can help us to overcome the challenges and make more informed decisions. In our approach, a semi-black-box model is built to forecast the dynamic market response and an efficient optimization method is proposed to solve the complex allocation task. First, the response in each market segment is forecasted by exploring historical data through a semi-black-box model, where the capability of the logit demand curve is enhanced by neural networks. The response model reveals the relationship between sales and marketing cost. Based on the learned model, budget allocation is then formulated as an optimization problem, and we design efficient algorithms to solve it in both continuous and discrete settings. Several kinds of business constraints are supported in one unified optimization paradigm, including cost upper bound, profit lower bound, or ROI lower bound. The proposed framework is easy to implement and can readily handle large-scale problems. It has been successfully applied to many scenarios in Alibaba Group. The results of both offline experiments and online A/B testing demonstrate its effectiveness.

A User-Centered Concept Mining System for Query and Document Understanding at Tencent
Bang Liu, Weidong Guo, Di Niu, Chaoyue Wang, Shunnan Xu, Jinghong Lin, Kunfeng Lai and Yu Xu

Concepts embody the knowledge of the world and facilitate the cognitive processes of human beings. Mining concepts from web documents and constructing the corresponding taxonomy are core research problems in text understanding and support many downstream tasks such as query analysis, knowledge base construction, recommendation, and search. However, we argue that most prior studies extract formal and overly general concepts from Wikipedia or static web pages, which do not represent the user perspective. In this paper, we describe our experience of implementing and deploying ConcepT in Tencent QQ Browser. It discovers user-centered concepts at the right granularity conforming to user interests, by mining a large amount of user queries and interactive search click logs. The extracted concepts have the proper granularity, are consistent with user language styles and are dynamically updated. We further present our techniques to tag documents with user-centered concepts and to construct a topic-concept-instance taxonomy, which has helped to improve search as well as news feeds recommendation in Tencent QQ Browser. We performed extensive offline evaluation to demonstrate that our approach could extract concepts of higher quality compared to several other existing methods. Our system has been deployed in Tencent QQ Browser. Results from online A/B testing involving a large number of real users suggest that the Impression Efficiency of feeds users increased by 6.01% after incorporating the user-centered concepts into the recommendation framework of Tencent QQ Browser.

AccuAir: Winning Solution to Air Quality Prediction for KDD Cup 2018
Zhipeng Luo, Jianqiang Huang, Ke Hu, Xue Li and Peng Zhang

Actions Speak Louder Than Goals: Valuing Player Actions in Soccer
Tom Decroos, Lotte Bransen, Jan Van Haaren and Jesse Davis

Assessing the impact of the individual actions performed by soccer players during games is a crucial aspect of the player recruitment process. Unfortunately, most traditional metrics fall short in addressing this task as they either focus on rare events like shots and goals alone or fail to account for the context in which the actions occurred. This paper introduces a novel advanced soccer metric for valuing any type of individual player action on the pitch, be it with or without the ball. Our metric values each player action based on its impact on the game outcome while accounting for the circumstances under which the action happened. When applied to on-the-ball actions like passes, dribbles, and shots alone, our metric identifies Argentine forward Lionel Messi, French teenage star Kylian Mbappé, and Belgian winger Eden Hazard as the most effective players during the 2016/2017 season.

Active Deep Learning for Activity Recognition with Context Aware Annotator Selection
H M Sajjad Hossain and Nirmalya Roy

Adversarial Matching of Dark Net Market Vendor Accounts
Xiao Hui Tai, Kyle Soska and Nicolas Christin

AiAds: Automated and Intelligent Advertising System for Sponsored Search
Xiao Yang

AKUPM: Attention-Enhanced Knowledge-Aware User Preference Model for Recommendation
Xiaoli Tang, Tengyun Wang, Haizhi Yang and Hengjie Song

AlphaStock: Buying Winners and Selling Losers in Deep
Jingyuan Wang, Yang Zhang, Ke Tang, Junjie Wu and Zhang Xiong

Ambulatory Atrial Fibrillation Monitoring Using Wearable Photoplethysmography with Deep Learning
Maxime Voisin, Yichen Shen, Alireza Aliamiri, Anand Avati, Awni Hannun and Andrew Ng

We develop an algorithm that accurately detects Atrial Fibrillation (AF) episodes from photoplethysmograms (PPG) recorded in ambulatory free-living conditions. We collect and annotate a dataset containing more than 4000 hours of PPG recorded from a wrist-worn device. Using a 50-layer convolutional neural network, we achieve a test AUC of 95% and show robustness to motion artifacts inherent to PPG signals. Continuous and accurate detection of AF from PPG has the potential to transform consumer wearable devices into clinically useful medical monitoring tools.

Annotating Videos at YouTube Scale
Seong Jae Hwang, Joonseok Lee, Balakrishnan Varadarajan, Ariel Gordon, Zheng Xu and Apostol Natsev

Anomaly Detection for an E-commerce Pricing System
Jagdish Ramakrishnan, Elham Shaabani, Chao Li and Mátyás Sustik

Online retailers execute a very large number of price updates when compared to brick-and-mortar stores. Even a few mis-priced items can have a significant business impact and result in a loss of customer trust. Early detection of anomalies in an automated real-time fashion is an important part of such a pricing system. In this paper, we describe unsupervised and supervised anomaly detection approaches we developed and deployed for a large-scale online pricing system at Walmart. Our system detects anomalies both in batch and real-time streaming settings, and the items flagged are reviewed and actioned based on priority and business impact. We found that having the right architecture design was critical to facilitate model performance at scale, and business impact and speed were important factors influencing model selection, parameter choice, and prioritization in a production environment for a large-scale system. We conducted analyses on the performance of various approaches on a test set using real-world retail data and fully deployed our approach into production. We found that our approach was able to detect the most important anomalies with high precision.

Applying Deep Learning To Airbnb Search
Malay Haldar, Mustafa Abdool, Prashant Ramanathan, Tao Xu, Shulin Yang, Huizhong Duan, Qing Zhang, Nick Barrow-Williams, Bradley C. Turnbull, Brendan M. Collins and Thomas Legrand

The application to search ranking is one of the biggest machine learning success stories at Airbnb. Much of the initial gains were driven by a gradient boosted decision tree model. The gains, however, plateaued over time. This paper discusses the work done in applying neural networks in an attempt to break out of that plateau. We present our perspective not with the intention of pushing the frontier of new modeling techniques. Instead, ours is a story of the elements we found useful in applying neural networks to a real life product. Deep learning was steep learning for us. To other teams embarking on similar journeys, we hope an account of our struggles and triumphs will provide some useful pointers. Bon voyage!

Auto-Keras: An Efficient Neural Architecture Search System
Haifeng Jin, Qingquan Song and Xia Hu

Neural architecture search (NAS) has been proposed to automatically tune deep neural networks, but existing search algorithms, e.g., NASNet, PNAS, usually suffer from expensive computational cost. Network morphism, which keeps the functionality of a neural network while changing its neural architecture, could be helpful for NAS by enabling more efficient training during the search. In this paper, we propose a novel framework enabling Bayesian optimization to guide the network morphism for efficient neural architecture search. The framework develops a neural network kernel and a tree-structured acquisition function optimization algorithm to efficiently explore the search space. Intensive experiments on real-world benchmark datasets have been done to demonstrate the superior performance of the developed framework over the state-of-the-art methods. Moreover, we build an open-source AutoML system based on our method, namely Auto-Keras. The system runs in parallel on CPU and GPU, with an adaptive search strategy for different GPU memory limits.

AutoCross: Automatic Feature Crossing for Tabular Data in Real-World Applications
Luo Yuanfei, Mengshuo Wang, Hao Zhou, Quanming Yao, Wei-Wei Tu, Yuqiang Chen, Qiang Yang and Wenyuan Dai

Feature crossing captures interactions among categorical features and is useful to enhance learning from tabular data in real-world businesses. In this paper, we present AutoCross, an automatic feature crossing tool provided by 4Paradigm to its customers, ranging from banks and hospitals to Internet corporations. By performing beam search in a tree-structured space, AutoCross enables efficient generation of high-order cross features, which have not yet been explored by existing work. Additionally, we propose successive mini-batch gradient descent and multi-granularity discretization to further improve efficiency and effectiveness, while ensuring simplicity so that no machine learning expertise or tedious hyper-parameter tuning is required. Furthermore, the algorithms are designed to reduce the computational, transmitting, and storage costs involved in distributed computing. Experimental results on both benchmark and real-world business datasets demonstrate the effectiveness and efficiency of AutoCross. It is shown that AutoCross can significantly enhance the performance of both linear and deep models.

Automatic Dialogue Summary Generation for Customer Service
Chunyi Liu, Peng Wang, Jiang Xu, Zang Li and Jieping Ye

Bid Optimization by Multivariable Control in Display Advertising
Xun Yang, Yasong Li, Hao Wang, Di Wu, Qing Tan, Jian Xu and Kun Gai

Real-Time Bidding (RTB) is an important paradigm in display advertising, where advertisers utilize extended information and algorithms served by Demand Side Platforms (DSPs) to improve advertising performance. A common problem for DSPs is to help advertisers gain as much value as possible with budget constraints. However, advertisers would routinely add certain key performance indicator (KPI) constraints that the advertising campaign must meet due to practical reasons. In this paper, we study the common case where advertisers aim to maximize the quantity of conversions, and set cost-per-click (CPC) as a KPI constraint. We convert such a problem into a linear programming problem and leverage the primal-dual method to derive the optimal bidding strategy. To address the applicability issue, we propose a feedback control-based solution and devise the multivariable control system. The empirical study based on real-world data from Taobao.com verifies the effectiveness and superiority of our approach compared with the state of the art in industry practice.

Blending Noisy Social Media Signals with Traditional Movement Variables to Predict Forced Migration
Lisa Singh, Laila Wahedi, Yanchen Wang, Yifang Wei, Christo Kirov, Susan Martin, Katharine Donato, Yaguang Liu and Kornraphop Kawintiranon

Buying or Browsing? : Predicting Real-time Purchasing Intent using Attention-based Deep Network with Multiple Behavior
Long Guo, Lifeng Hua, Rongfei Jia, Binqiang Zhao, Xiaobo Wang and Bin Cui

Carousel Ads Optimization in Yahoo Gemini Native
Oren Somekh, Michal Aharon, Avi Shahar, Assaf Singer, Boris Trayvas, Hadas Vogel and Dobri Dobrev

Chainer: a Deep Learning Framework for Accelerating the Research Cycle
Seiya Tokui, Ryosuke Okuta, Takuya Akiba, Yusuke Niitani, Toru Ogawa, Shunta Saito, Shuji Suzuki, Kota Uenishi, Brian Vogel and Hiroyuki Yamazaki Vincent

Characterizing and Detecting Malicious Accounts in Privacy-Centric Mobile Social Networks: A Case Study
Zenghua Xia, Chang Liu, Neil Gong, Qi Li, Yong Cui and Dawn Song

Characterizing and Forecasting User Engagement with In-App Action Graphs: A Case Study of Snapchat
Yozen Liu, Xiaolin Shi, Lucas Pierce and Xiang Ren

Combining Decision Trees and Neural Networks for Learning-to-Rank in Personal Search
Pan Li, Zhen Qin, Xuanhui Wang and Donald Metzler

Community Detection on Large Complex Attribute Network
Zhe Chen, Aixin Sun and Xiaokui Xiao

Constructing High Precision Knowledge Bases with Subjective and Factual Attributes
Ari Kobren, Pablo Bario, Oksana Yakhnenko, Johann Hibschman and Ian Langmore

Knowledge bases (KBs) are the backbone of many ubiquitous applications and are thus required to exhibit high precision. However, for KBs that store subjective attributes of entities, e.g., whether a movie is "kid friendly", simply estimating precision is complicated by the inherent ambiguity in measuring subjective phenomena. In this work, we develop a method for constructing KBs with tunable precision--i.e., KBs that can be made to operate at a specific false positive rate, despite storing both difficult-to-evaluate subjective attributes and more traditional factual attributes. The key to our approach is probabilistically modeling user consensus with respect to each entity-attribute pair, rather than modeling each pair as either True or False. Uncertainty in the model is explicitly represented and used to control the KB's precision. We propose three neural networks for fitting the consensus model and evaluate each one on data from Google Maps--a large KB of locations and their subjective and factual attributes. The results demonstrate that our learned models are well-calibrated and thus can successfully be used to control the KB's precision. Moreover, when constrained to maintain 95% precision, the best consensus model matches the F-score of a baseline that models each entity-attribute pair as a binary variable and does not support tunable precision. When unconstrained, our model dominates the same baseline by 12% F-score. Finally, we perform an empirical analysis of attribute-attribute correlations and show that leveraging them effectively contributes to reduced uncertainty and better performance in attribute prediction.

Context by Proxy: Identifying Contextual Anomalies Using an Output Proxy
Jan-Philipp Schulze, Artur Mrowca, Elizabeth Ren, Hans-Andrea Loeliger and Konstantin Böttinger

Conversion Prediction Using Multi-task Conditional Attention Networks to Support the Creation of Effective Ad Creatives
Shunsuke Kitada, Hitoshi Iyatomi and Yoshifumi Seki

Deep Semantic Product Search
Vihan Lakshman, Vijai Mohan, Priyanka Nigam, Yiwei Song, Weitian Ding, Ankit Shingavi, Choon Hui Teo, Hao Gu and Bing Yin

Deep Spatio-Temporal Neural Networks for Click-Through Rate Prediction
Wentao Ouyang, Xiuwu Zhang, Li Li, Heng Zou, Xin Xing, Zhaojie Liu and Yanlong Du

Deep Uncertainty Quantification: A Machine Learning Approach for Weather Forecasting
Bin Wang, Jie Lu, Zheng Yan, Huaishao Luo, Tianrui Li, Yu Zheng and Guangquan Zhang

Weather forecasting is usually solved through numerical weather prediction (NWP), which can sometimes lead to unsatisfactory performance due to inappropriate setting of the initial states. In this paper, we design a data-driven method augmented by an effective information fusion mechanism to learn from historical data that incorporates prior knowledge from NWP. We cast the weather forecasting problem as an end-to-end deep learning problem and solve it by proposing a novel negative log-likelihood error (NLE) loss function. A notable advantage of our proposed method is that it simultaneously implements single-value forecasting and uncertainty quantification, which we refer to as deep uncertainty quantification (DUQ). Efficient deep ensemble strategies are also explored to further improve performance. This new approach was evaluated on a public dataset collected from weather stations in Beijing, China. Experimental results demonstrate that the proposed NLE loss significantly improves generalization compared to mean squared error (MSE) loss and mean absolute error (MAE) loss. Compared with NWP, this approach significantly improves accuracy by 47.76%, which is a state-of-the-art result on this benchmark dataset. The preliminary version of the proposed method won 2nd place in an online competition for daily weather forecasting.
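
As an illustration of how a likelihood-based loss can yield a point forecast and an uncertainty estimate together, the sketch below computes a Gaussian negative log-likelihood from a predicted mean and standard deviation. The Gaussian form and the name `gaussian_nll` are assumptions made for illustration; the abstract does not spell out the exact NLE loss.

```python
import math

def gaussian_nll(y_true, mu, sigma):
    """Mean negative log-likelihood of y_true under N(mu, sigma^2).

    Assumed Gaussian form for illustration; not necessarily the paper's NLE loss.
    """
    if not (len(y_true) == len(mu) == len(sigma)):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for y, m, s in zip(y_true, mu, sigma):
        if s <= 0:
            raise ValueError("sigma must be positive")
        # log-normalizer term plus squared error scaled by the predicted variance
        total += 0.5 * math.log(2 * math.pi * s * s) + (y - m) ** 2 / (2 * s * s)
    return total / len(y_true)
```

Under this loss, a confident but wrong forecast (small `sigma`, large error) is penalized far more heavily than an equally wrong forecast that reports a wide `sigma`, which is what ties the predicted uncertainty to the observed error.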

DeepHoops: Evaluating Micro-Actions in Basketball Using Deep Feature Representations of Spatio-Temporal Data
Anthony Sicilia, Konstantinos Pelechrinis and Kirk Goldsberry

DeepRoof: A Data-driven Approach For Solar Potential Estimation Using Rooftop Imagery
Stephen Lee, Srinivasan Iyengar, Menghong Feng, Prashant Shenoy and Subhransu Maji

DeepUrbanEvent: A System for Predicting Citywide Crowd Dynamics at Big Events
Renhe Jiang, Xuan Song, Dou Huang, Xiaoya Song, Tianqi Xia, Zekun Cai, Zhaonan Wang, Kyoung-Sook Kim and Ryosuke Shibasaki

Detecting Anomalies in Space using Multivariate Convolutional LSTM with Mixtures of Probabilistic PCA
Simon Woo, Shahroz Tariq, Sangyup Lee, Youjin Shin, Myeong Shin Lee, Okchul Jung and Daewon Chung

Detection of Review Abuse via Semi-Supervised Binary Multi-Target Tensor Decomposition
Anil Yelundur, Vineet Chaoji and Bamdev Mishra

Product reviews and ratings on e-commerce websites provide customers with detailed insights about various aspects of the product such as quality, usefulness, etc. Since they influence customers' buying decisions, product reviews have become a fertile ground for abuse by sellers (colluding with reviewers) to promote their own products or to tarnish the reputation of competitors' products. In this paper, our focus is on detecting such abusive entities (both sellers and reviewers) by applying tensor decomposition on the product reviews data. While tensor decomposition is mostly unsupervised, we formulate our problem as a semi-supervised binary multi-target tensor decomposition, to take advantage of currently known abusive entities. We empirically show that our multi-target semi-supervised model achieves higher precision and recall in detecting abusive entities as compared to unsupervised techniques. Finally, we show that our proposed stochastic partial natural gradient inference for our model empirically achieves faster convergence than stochastic gradient and Online-EM with sufficient statistics.

Developing Measures of Cognitive Impairment in the Real World from Consumer-Grade Multimodal Sensor Streams
Richard Chen, Filip Jankovic, Nikki Marinsek, Luca Foschini, Lampros Kourtis, Alessio Signorini, Melissa Pugh, Jie Shen, Roy Yaari, Vera Maljkovic, Marc Sunga, Han Hee Song, Hyun Joon Jung, Belle Tseng and Andrew Trister

Diagnosing Sample Ratio Mismatch in Online Controlled Experiments: A Taxonomy and Rules of Thumb for Practitioners
Aleksander Fabijan, Jayant Gupchup, Somit Gupta, Jeff Omhover, Wen Quin, Lukas Vermeer and Pavel Dmitriev

DuerQuiz: A Personalized Question Recommender System for Intelligent Job Interview
Chuan Qin, Hengshu Zhu, Chen Zhu, Tong Xu, Fuzhen Zhuang, Chao Ma, Jingshuai Zhang and Hui Xiong

Dynamic Pricing for Airline Ancillaries with Customer Context
Naman Shukla, Arinbjörn Kolbeinsson, Ken Otwell, Lavanya Marla and Kartik Yellepeddi

Ancillaries have become a major source of revenue and profitability in the travel industry. Yet, conventional pricing strategies are based on business rules that are poorly optimized and do not respond to changing market conditions. This paper describes the dynamic pricing model developed by Deepair solutions, an AI technology provider for travel suppliers. We present a pricing model that provides dynamic pricing recommendations specific to each customer interaction and optimizes expected revenue per customer. The unique nature of personalized pricing provides the opportunity to search over the market space to find the optimal price-point of each ancillary for each customer, without violating customer privacy. In this paper, we present and compare three approaches for dynamic pricing of ancillaries, with increasing levels of sophistication: (1) a two-stage forecasting and optimization model using a logistic mapping function; (2) a two-stage model that uses a deep neural network for forecasting, coupled with a revenue maximization technique using discrete exhaustive search; (3) a single-stage end-to-end deep neural network that recommends the optimal price. We describe the performance of these models based on both offline and online evaluations. We also measure the real-world business impact of these approaches by deploying them in an A/B test on an airline's internet booking website. We show that traditional machine learning techniques outperform human rule-based approaches in an online setting by improving conversion by 36% and revenue per offer by 10%. We also provide results for our offline experiments which show that deep learning algorithms outperform traditional machine learning techniques for this problem. Our end-to-end deep learning model is currently being deployed by the airline in their booking system.

E.T.-RNN: Applying Deep Learning to Credit Loan Applications
Alexander Tuzhilin, Dmitri Babaev, Dmitri Umerenkov and Maxim Savchenko

Enabling Onboard Detection of Events of Scientific Interest for the Europa Clipper Spacecraft
Kiri Wagstaff, Gary Doran, Ashley Davies, Saadat Anwar, Srija Chakraborty, Marissa Cameron, Ingrid Daubar and Cynthia Phillips

Estimating Cellular Goals from High-Dimensional Biological Data
Laurence Yang, Michael A. Saunders, Jean-Christophe Lachance, Bernhard O. Palsson and José Bento

Optimization-based models have been used to predict cellular behavior for over 25 years. The constraints in these models are derived from genome annotations, measured macro-molecular composition of cells, and by measuring the cell's growth rate and metabolism in different conditions. The cellular goal (the optimization problem that the cell is trying to solve) can be challenging to derive experimentally for many organisms, including human or mammalian cells, which have complex metabolic capabilities and are not well understood. Existing approaches to learning goals from data include (a) estimating a linear objective function, or (b) estimating linear constraints that model complex biochemical reactions and constrain the cell's operation. The latter approach is important because often the known/observed biochemical reactions are not enough to explain observations, and hence there is a need to automatically extend the model complexity by learning new chemical reactions. However, this leads to nonconvex optimization problems, and existing tools cannot scale to realistically large metabolic models. Hence, constraint estimation is still used sparingly despite its benefits for modeling cell metabolism, which is important for developing novel antimicrobials against pathogens, discovering cancer drug targets, and producing value-added chemicals. Here, we develop the first approach to estimating constraint reactions from data that can scale to realistically large metabolic models. Previous tools have been used on problems having fewer than 75 biochemical reactions and 60 metabolites, which limits real-life-size applications. We perform extensive experiments using 75 large-scale metabolic network models for different organisms (including bacteria, yeasts, and mammals) and show that our algorithm can recover cellular constraint reactions, even when some measurements are missing.

Fairness in Recommendation Ranking through Pairwise Comparisons
Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Li Wei, Yi Wu, Lukasz Heldt, Zhe Zhao, Lichan Hong, Ed H. Chi and Cristos Goodrow

Recommender systems are one of the most pervasive applications of machine learning in industry, with many services using them to match users to products or information. As such it is important to ask: what are the possible fairness risks, how can we quantify them, and how should we address them? In this paper we offer a set of novel metrics for evaluating algorithmic fairness concerns in recommender systems. In particular we show how measuring fairness based on pairwise comparisons from randomized experiments provides a tractable means to reason about fairness in rankings from recommender systems. Building on this metric, we offer a new regularizer to encourage improving this metric during model training and thus improve fairness in the resulting rankings. We apply this pairwise regularization to a large-scale, production recommender system and show that we are able to significantly improve the system's pairwise fairness.
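
One way to read the pairwise idea is sketched below: for pairs in which one item is known to be preferred, measure how often the model scores that item higher, separately for each group, and compare. The tuple layout, the function names, and the max-minus-min gap are assumptions for illustration, not the paper's exact metric definitions.

```python
from collections import defaultdict

def pairwise_accuracy_by_group(pairs):
    """pairs: iterable of (score_preferred, score_other, group) tuples.

    Returns {group: fraction of pairs where the preferred item scored higher}.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for score_preferred, score_other, group in pairs:
        total[group] += 1
        if score_preferred > score_other:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

def pairwise_gap(pairs):
    """Largest difference in pairwise accuracy between any two groups."""
    acc = pairwise_accuracy_by_group(pairs)
    return max(acc.values()) - min(acc.values())
```

A gap near zero suggests that preferred items from each group are ranked correctly at similar rates; a regularizer in the spirit of the abstract would penalize a large gap during training.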

Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search
Sahin Geyik, Stuart Ambler and Krishnaram Kenthapadi

FDML: A Collaborative Machine Learning Framework for Distributed Features
Yaochen Hu, Di Niu, Jianming Yang and Shengping Zhou

Feedback Shaping: A Modeling Approach to Nurture Content Creation
Ye Tu, Chun Lo, Yiping Yuan and Shaunak Chatterjee

Finding Users Who Act Alike: Transfer Learning for Expanding Advertiser Audiences
Stephanie Dewet and Jiafan Ou

FoodAI: Food Image Recognition via Deep Learning for Smart Food Logging
Doyen Sahoo, Hao Wang, Shu Ke, Xiongwei Wu, Hung Le, Palakorn Achananuparp, Ee-Peng Lim and Steven Hoi

Generating Better Search Engine Text Advertisements with Deep Reinforcement Learning
John Hughes, Keng-Hao Chang and Ruofei Zhang

Glaucoma Progression Prediction Using Retinal Thickness via Latent Space Linear Regression
Yuhui Zheng, Linchuan Xu, Taichi Kiwaki, Jing Wang, Hiroshi Murata, Ryo Asaoka and Kenji Yamanishi

Gmail Smart Compose: Real-Time Assisted Writing
Mia Xu Chen, Benjamin N Lee, Gagan Bansal, Yuan Cao, Shuyuan Zhang, Justin Lu, Jackie Tsay, Yinan Wang, Andrew M. Dai, Zhifeng Chen, Timothy Sohn and Yonghui Wu

In this paper, we present Smart Compose, a novel system for generating interactive, real-time suggestions in Gmail that assists users in writing mails by reducing repetitive typing. In the design and deployment of such a large-scale and complicated system, we faced several challenges including model selection, performance evaluation, serving and other practical issues. At the core of Smart Compose is a large-scale neural language model. We leveraged state-of-the-art machine learning techniques for language model training which enabled high-quality suggestion prediction, and constructed novel serving infrastructure for high-throughput and real-time inference. Experimental results show the effectiveness of our proposed system design and deployment approach. This system is currently being served in Gmail.

Hard to Park? Estimating Parking Difficulty at Scale
Andrew Tomkins, Ravi Kumar, Neha Arora, James Cook, Ivan Kuznetsov, Yechen Li, Huai-Jen Liang, Andrew Miller, Iveel Tsogsuren and Yi Wang

How to Invest my Time: Lessons from HITL Entity Extraction
Shanshan Zhang, Lihong He, Eduard Dragut and Slobodan Vucetic

Hydra: A Personalized and Context-Aware Multi-Modal Transportation Recommendation System
Hao Liu, Yongxin Tong, Panpan Zhang, Xinjiang Lu, Jianguo Duan and Hui Xiong

Improving Subseasonal Forecasting in the Western U.S. with Machine Learning
Jessica Hwang, Paulo Orenstein, Judah Cohen, Karl Pfeiffer and Lester Mackey

Water managers in the western United States (U.S.) rely on long-term forecasts of temperature and precipitation to prepare for droughts and other wet weather extremes. To improve the accuracy of these long-term forecasts, the U.S. Bureau of Reclamation and the National Oceanic and Atmospheric Administration (NOAA) launched the Subseasonal Climate Forecast Rodeo, a year-long real-time forecasting challenge in which participants aimed to skillfully predict temperature and precipitation in the western U.S. two to four weeks and four to six weeks in advance. Here we present and evaluate our machine learning approach to the Rodeo and release our SubseasonalRodeo dataset, collected to train and evaluate our forecasting system. Our system is an ensemble of two regression models. The first integrates the diverse collection of meteorological measurements and dynamic model forecasts in the SubseasonalRodeo dataset and prunes irrelevant predictors using a customized multitask model selection procedure. The second uses only historical measurements of the target variable (temperature or precipitation) and introduces multitask nearest neighbor features into a weighted local linear regression. Each model alone is significantly more accurate than the debiased operational U.S. Climate Forecasting System (CFSv2), and our ensemble skill exceeds that of the top Rodeo competitor for each target variable and forecast horizon. Moreover, over 2011-2018, an ensemble of our regression models and debiased CFSv2 improves debiased CFSv2 skill by 40-50% for temperature and 129-169% for precipitation. We hope that both our dataset and our methods will help to advance the state of the art in subseasonal forecasting.

Infer Implicit Contexts in Real-time Online-to-Offline Recommendation
Xichen Ding, Jie Tang, Tracy Liu, Cheng Xu, Yaping Zhang, Feng Shi, Qixia Jiang and Dan Shen

IntentGC: a Scalable Graph Convolution Framework Fusing Heterogeneous Information for Recommendation
Jun Zhao, Zhou Zhou, Ziyu Guan, Wei Zhao, Ning Wei, Guang Qiu and Xiaofei He

Internal Promotion Optimization
Rupesh Gupta, Guangde Chen and Shipeng Yu

Investigate Transitions into Drug Addiction through Text Mining of Reddit Data
John Lu, Sumati Sridhar, Ritika Pandey, Mohammad Hasan and George Mohler

Investment Behaviors Can Tell What Inside: Exploring Stock Intrinsic Properties for Stock Trend Prediction
Chi Chen, Li Zhao, Jiang Bian, Chunxiao Xing and Tie-Yan Liu

IRNet: A General Purpose Deep Residual Regression Framework For Materials Discovery
Dipendra Jha, Logan Ward, Zijiang Yang, Christopher Wolverton, Ian Foster, Wei-Keng Liao, Alok Choudhary and Ankit Agrawal

Large-scale User Visits Understanding and Forecasting with Deep Spatial-Temporal Tensor Factorization Framework
Xiaoyang Ma, Lan Zhang, Lan Xu, Zhicheng Liu, Ge Chen, Zhili Xiao, Yang Wang and Zhengtao Wu

Learning a Unified Embedding for Visual Search at Pinterest
Andrew Zhai, Hao-Yu Wu, Eric Tzeng, Dong Huk Park and Charles Rosenberg

Learning Sleep Quality from Daily Logs
Sungkyu Park, Cheng-Te Li, Sungwon Han, Cheng Hsu, Sang Won Lee and Meeyoung Cha

Learning to Prescribe Interventions for Tuberculosis Patients using Digital Adherence Data
Jackson Killian, Bryan Wilder, Amit Sharma, Vinod Choudhary, Bistra Dilkina and Milind Tambe

LightNet: A Dual Spatiotemporal Encoder Network Model for Lightning Prediction
Yangli-Ao Geng, Qingyong Li, Tianyang Lin, Lei Jiang, Liangtao Xu, Dong Zheng, Wen Yao, Yijun Zhang and Weitao Lyu

Machine Learning at Microsoft with ML.NET
Zeeshan Ahmed, Saeed Amizadeh, Mikhail Bilenko, Rogan Carr, Wei-Sheng Chin, Yael Dekel, Xavier Dupre, Vadim Eksarevskiy, Senja Filipi, Tom Finley, Abhishek Goswami, Monte Hoover, Scott Inglis, Matteo Interlandi, Najeeb Kazmi, Gleb Krivosheev, Pete Luferenko, Ivan Matantsev, Sergiy Matusevych, Shahab Moradi, Gani Nazirov, Justin Ormont, Gal Oshri, Artidoro Pagnoni, Jignesh Parmar, Prabhat Roy, Zeeshan Siddiqui, Markus Weimer, Shauheen Zahirazami and Yiwen Zhu

Mathematical Notions vs. Human Perception of Fairness: A Descriptive Approach to Fairness for Machine Learning
Megha Srivastava, Hoda Heidari and Andreas Krause

Fairness for Machine Learning has received considerable attention recently. Various mathematical formulations of fairness have been proposed, and it has been shown that it is impossible to satisfy all of them simultaneously. The literature so far has dealt with these impossibility results by quantifying the tradeoffs between different formulations of fairness. Our work takes a different perspective on this issue. Rather than requiring all notions of fairness to (partially) hold at the same time, we ask which one of them is the most appropriate given the societal domain in which the decision-making model is to be deployed. We take a descriptive approach and set out to identify the notion of fairness that best captures lay people's perception of fairness. We run adaptive experiments designed to pinpoint the most compatible notion of fairness with each participant's choices through a small number of tests. Perhaps surprisingly, we find that the most simplistic mathematical definition of fairness---namely, demographic parity---most closely matches people's idea of fairness in two distinct application scenarios. This remains the case even when we explicitly tell the participants about the alternative, more complicated definitions of fairness and we reduce the cognitive burden of evaluating those notions for them. Our findings have important implications for the Fair ML literature and the discourse on formalizing algorithmic fairness.
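
For readers unfamiliar with demographic parity, the sketch below shows the standard definition in its simplest form: the rate of positive decisions should be the same across groups, regardless of outcomes. The function names and the max-minus-min gap are illustrative choices, not taken from the paper.

```python
def positive_rate(decisions):
    """Fraction of positive (1) decisions in a list of 0/1 decisions."""
    return sum(decisions) / len(decisions)

def demographic_parity_gap(decisions_by_group):
    """decisions_by_group: {group: list of 0/1 decisions}.

    Returns (gap, rates), where gap is the largest difference in positive
    decision rate between any two groups; 0 means exact demographic parity.
    """
    rates = {g: positive_rate(d) for g, d in decisions_by_group.items()}
    return max(rates.values()) - min(rates.values()), rates
```

For instance, `demographic_parity_gap({"a": [1, 0, 1, 1], "b": [1, 0, 0, 0]})` reports rates of 0.75 and 0.25 and a gap of 0.5.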

MediaRank: Computational Ranking of Online News Sources
Junting Ye and Steven Skiena

Metapath-guided Heterogeneous Graph Neural Network for Intent Recommendation
Shaohua Fan, Junxiong Zhu, Xiaotian Han, Chuan Shi, Linmei Hu, Biyu Ma and Yongliang Li

MetaPred: Meta-Learning for Clinical Risk Prediction with Limited Patient Electronic Health Records
Xi Zhang, Andy Tang, Hiroko Dodge, Jiayu Zhou and Fei Wang

In recent years, growing volumes of health data, such as patient Electronic Health Records (EHR), have become readily available. This provides an unprecedented opportunity for knowledge discovery and data mining algorithms to extract insights from them, which can later help improve the quality of care delivery. Predictive modeling of clinical risk, including in-hospital mortality, hospital readmission, chronic disease onset, condition exacerbation, etc., from patient EHR is one of the health data analytics problems that attracts the most interest. The reason is not only that the problem is important in clinical settings, but also that working with EHR poses challenges such as sparsity, irregularity, temporality, etc. Unlike applications in other domains such as computer vision and natural language processing, the labeled data samples in medicine (patients) are relatively limited, which makes effective predictive model learning difficult, especially for complicated models such as deep learning. In this paper, we propose MetaPred, a meta-learning framework for clinical risk prediction from longitudinal patient EHRs. In particular, in order to predict the target risk where there are limited data samples, we train a meta-learner from a set of related risk prediction tasks, which learns how a good predictor is learned. The meta-learner can then be directly used in target risk prediction, and the limited available samples can be used to further fine-tune the model performance. The effectiveness of MetaPred is tested on a real patient EHR repository from Oregon Health & Science University. We demonstrate that, with CNN and RNN as base predictors, MetaPred achieves much better performance for predicting target risk with low resources compared with a predictor trained only on the limited samples available for this risk.

MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu's Sponsored Search
Miao Fan, Jiacheng Guo, Shuai Zhu, Shuo Miao, Mingming Sun and Ping Li

MSURU: Large Scale E-commerce Image Classification With Weakly Supervised Search Data
Yina Tang, Fedor Borisyuk, Siddarth Malreddy, Yixuan Li, Yiqun Liu and Sergey Kirshner

Multi-Horizon Time Series Forecasting with Temporal Attention Learning
Chenyou Fan, Yuze Zhang, Yi Pan, Xiaoyue Li, Chi Zhang, Rong Yuan, Di Wu, Wensheng Wang, Jian Pei and Heng Huang

MVAN: Multi-view Attention Networks for Real Money Trading Detection in Online Games
Jianrong Tao, Jianshi Lin, Shize Zhang, Sha Zhao, Runze Wu, Changjie Fan and Peng Cui

Naranjo Question Answering using End-to-End Multi-task Learning Model
Bhanu Pratap Singh Rawat, Fei Li and Hong Yu

Nonparametric Mixture of Sparse Regressions on Spatio-Temporal Data -- An Application to Climate Prediction
Yumin Liu, Junxiang Chen, Auroop Ganguly and Jennifer Dy

Nostalgin: Extracting 3D City Models from Historical Image Data link
Amol Kapoor, Hunter Larco and Raimondas Kiveris

What did it feel like to walk through a city from the past? In this work, we describe Nostalgin (Nostalgia Engine), a method that can faithfully reconstruct cities from historical images. Unlike existing work in city reconstruction, we focus on the task of reconstructing 3D cities from historical images. Working with historical image data is substantially more difficult, as there are significantly fewer buildings available and the details of the camera parameters which captured the images are unknown. Nostalgin can generate a city model even if there is only a single image per facade, regardless of viewpoint or occlusions. To achieve this, our novel architecture combines image segmentation, rectification, and inpainting. We motivate our design decisions with experimental analysis of individual components of our pipeline, and show that we can improve on baselines in both speed and visual realism. We demonstrate the efficacy of our pipeline by recreating two 1940s Manhattan city blocks. We aim to deploy Nostalgin as an open source platform where users can generate immersive historical experiences from their own photos.

NPA: Neural News Recommendation with Personalized Attention
Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang and Xing Xie

OAG: Toward Linking Large-scale Heterogeneous Entity Graphs
Fanjin Zhang, Xiao Liu, Jie Tang, Yuxiao Dong, Peiran Yao, Jie Zhang, Xiaotao Gu, Yan Wang, Bin Shao, Rui Li and Kuansan Wang

OCC: A Smart Reply System for Efficient In-App Communications
Yue Weng, Huaixiu Zheng, Franziska Bell and Gokhan Tur

Online Amnestic DTW to allow Real-Time Golden Batch Monitoring
Chin-Chia Michael Yeh, Yan Zhu, Hoang Anh Dau, Amirali Darvishzadeh, Mikhail Noskov and Eamonn Keogh

Online Purchase Prediction via Multi-Scale Modeling of Behavior Dynamics
Chao Huang, Xian Wu, Xuchao Zhang, Chuxu Zhang, Jiashu Zhao, Dawei Yin and Nitesh Chawla

Optuna: A Next-generation Hyperparameter Optimization Framework
Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta and Masanori Koyama

Personalized Attraction Enhanced Sponsored Search with Multi-task Learning
Wei Zhao, Boxuan Zhang, Beidou Wang, Ziyu Guan, Wanxian Guan, Guang Qiu, Wei Ning, Jiming Chen and Hongmin Liu

Personalized Purchase Prediction of Market Baskets with Wasserstein-Based Sequence Matching link
Mathias Kraus and Stefan Feuerriegel

Personalization in marketing aims at improving the shopping experience of customers by tailoring services to individuals. In order to achieve this, businesses must be able to make personalized predictions regarding the next purchase. That is, one must forecast the exact list of items that will comprise the next purchase, i.e., the so-called market basket. Despite its relevance to firm operations, this problem has received surprisingly little attention in prior research, largely due to its inherent complexity. In fact, state-of-the-art approaches are limited to intuitive decision rules for pattern extraction. However, the simplicity of the pre-coded rules impedes performance, since decision rules operate in an autoregressive fashion: the rules can only make inferences from past purchases of a single customer without taking into account the knowledge transfer that takes place between customers. In contrast, our research overcomes the limitations of pre-set rules by contributing a novel predictor of market baskets from sequential purchase histories: our predictions are based on similarity matching in order to identify similar purchase habits among the complete shopping histories of all customers. Our contributions are as follows: (1) We propose similarity matching based on subsequential dynamic time warping (SDTW) as a novel predictor of market baskets. Thereby, we can effectively identify cross-customer patterns. (2) We leverage the Wasserstein distance for measuring the similarity among embedded purchase histories. (3) We develop a fast approximation algorithm for computing a lower bound of the Wasserstein distance in our setting. An extensive series of computational experiments demonstrates the effectiveness of our approach. The accuracy of identifying the exact market baskets based on state-of-the-art decision rules from the literature is outperformed by a factor of 4.0.
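
The subsequence matching idea in contribution (1) can be sketched in a few lines. This is a minimal illustration of textbook subsequence dynamic time warping, not the authors' implementation: the function name `subsequence_dtw` is hypothetical, and a plain absolute difference between numbers stands in for the Wasserstein distance between embedded baskets that the abstract describes.

```python
from math import inf


def subsequence_dtw(query, series, cost=lambda a, b: abs(a - b)):
    """Return (best_cost, end_index) of the best match of `query` inside `series`.

    Both inputs must be non-empty. `cost` is the local distance between one
    query element and one series element (an assumption: absolute difference).
    """
    m, n = len(query), len(series)
    # acc[i][j]: cheapest alignment of query[:i] that ends at series[j - 1].
    # Row 0 is all zeros, so a match may start anywhere in `series`.
    acc = [[0.0] * (n + 1)] + [[inf] * (n + 1) for _ in range(m)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            step = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = cost(query[i - 1], series[j - 1]) + step
    # The match may also end anywhere: take the cheapest cell of the last row.
    end = min(range(1, n + 1), key=lambda j: acc[m][j])
    return acc[m][end], end - 1


best_cost, end_index = subsequence_dtw([1, 2, 3], [0, 0, 1, 2, 3, 0, 0])
```

Swapping `cost` for a distance between basket embeddings would move the sketch closer to the setting in the abstract; the free start and free end in the accumulated-cost table are what distinguish subsequence DTW from ordinary DTW.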

PinText: A Multitask Text Embedding System in Pinterest
Jinfeng Zhuang and Yu Liu

POG: Personalized Outfit Generation for Fashion Recommendation at Alibaba iFashion link
Wen Chen, Pipei Huang, Jiaming Xu, Xin Guo, Cheng Guo, Fei Sun, Chao Li, Andreas Pfadler, Huan Zhao and Binqiang Zhao

Increasing demand for fashion recommendation raises a lot of challenges for online shopping platforms and fashion communities. In particular, there exist two requirements for fashion outfit recommendation: the Compatibility of the generated fashion outfits, and the Personalization in the recommendation process. In this paper, we demonstrate these two requirements can be satisfied via building a bridge between outfit generation and recommendation. Through large data analysis, we observe that people have similar tastes in individual items and outfits. Therefore, we propose a Personalized Outfit Generation (POG) model, which connects user preferences regarding individual items and outfits with Transformer architecture. Extensive offline and online experiments provide strong quantitative evidence that our method outperforms alternative methods regarding both compatibility and personalization metrics. Furthermore, we deploy POG on a platform named Dida in Alibaba to generate personalized outfits for the users of the online application iFashion. This work represents a first step towards an industrial-scale fashion outfit generation and recommendation solution, which goes beyond generating outfits based on explicit queries, or merely recommending from existing outfit pools. As part of this work, we release a large-scale dataset consisting of 1.01 million outfits with rich context information, and 0.28 billion user click actions from 3.57 million users. To the best of our knowledge, this dataset is the largest, publicly available, fashion related dataset, and the first to provide user behaviors relating to both outfits and fashion items.

Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction link
Qi Pi, Weijie Bian, Guorui Zhou, Xiaoqiang Zhu and Kun Gai

Click-through rate (CTR) prediction is critical for industrial applications such as recommender systems and online advertising. In practice, mining user interest from rich historical behavior data plays an important role in CTR modeling for these applications. Driven by the development of deep learning, deep CTR models with ingeniously designed architectures for user interest modeling have been proposed, bringing remarkable improvements in model performance on offline metrics. However, great effort is needed to deploy these complex models to online serving systems for real-time inference under massive traffic. Things become more difficult when it comes to long sequential user behavior data, as the system latency and storage cost increase approximately linearly with the length of the user behavior sequence. In this paper, we directly face the challenge of long sequential user behavior modeling and introduce our hands-on practice with the co-design of the machine learning algorithm and the online serving system for the CTR prediction task. Theoretically, the co-design solution of UIC and MIMN enables us to handle user interest modeling with unlimited length of sequential behavior data. Comparison between model performance and system efficiency proves the effectiveness of the proposed solution. To our knowledge, this is one of the first industrial solutions capable of handling long sequential user behavior data with length scaling up to thousands. It has now been deployed in the display advertising system at Alibaba.

Precipitation nowcasting with satellite imagery
Vadim Lebedev, Vladimir Ivashkin, Irina Rudenko, Alexander Ganshin, Ivan Bushmarinov, Alexander Molchanov, Sergey Ovcharenko, Ruslan Grokhovetskiy and Dmitry Solomentsev

Predicting Different Types of Conversions with Multi-Task Learning in Online Advertising
Junwei Pan, Yizhi Mao, Alfonso Lobos Ruiz, Yu Sun and Aaron Flores

Predicting Economic Development using Geolocated Wikipedia Articles link
Evan Sheehan, Chenlin Meng, Matthew Tan, Burak Uzkent, Neal Jean, David Lobell, Marshall Burke and Stefano Ermon

Progress on the UN Sustainable Development Goals (SDGs) is hampered by a persistent lack of data regarding key social, environmental, and economic indicators, particularly in developing countries. For example, data on poverty (the first of seventeen SDGs) is both spatially sparse and infrequently collected in Sub-Saharan Africa due to the high cost of surveys. Here we propose a novel method for estimating socioeconomic indicators using open-source, geolocated textual information from Wikipedia articles. We demonstrate that modern NLP techniques can be used to predict community-level asset wealth and education outcomes using nearby geolocated Wikipedia articles. When paired with nightlights satellite imagery, our method outperforms all previously published benchmarks for this prediction task, indicating the potential of Wikipedia to inform both research in the social sciences and future policy decisions.

Predicting Evacuation Decisions using Representations of Individuals' Pre-Disaster Web Search Behavior link
Takahiro Yabe, Kota Tsubouchi, Toru Shimizu, Yoshihide Sekimoto and Satish Ukkusuri

Predicting the evacuation decisions of individuals before a disaster strikes is crucial for planning first response strategies. In addition to studies on post-disaster analysis of evacuation behavior, there are various works that attempt to predict evacuation decisions beforehand. Most of these predictive methods, however, require real-time location data for calibration, which are becoming much harder to obtain due to rising privacy concerns. Meanwhile, web search queries of anonymous users have been collected by web companies. Although such data raise fewer privacy concerns, they have been under-utilized for various applications. In this study, we investigate whether web search data observed prior to the disaster can be used to predict evacuation decisions. More specifically, we utilize a "session-based query encoder" that learns representations of each user's web search behavior prior to evacuation. Our proposed approach is empirically tested using web search data collected from users affected by a major flood in Japan. Results are validated using location data collected from mobile phones of the same set of users as ground truth. We show that evacuation decisions can be accurately predicted (84%) using only the users' pre-disaster web search data as input. This study proposes an alternative method for evacuation prediction that does not require highly sensitive location data, which can assist local governments in preparing effective first response strategies.

Probabilistic Latent Variable Modeling for Assessing Behavioral Influences on Well-Being
Ehimwenma Nosakhare and Rosalind Picard

Pythia: AI assisted code completion system
Alexey Svyatkovskiy, Ying Zhao, Shengyu Fu and Neel Sundaresan

Raise to speak: an accurate, low-power detector for activating voice assistants on smartwatches
Shiwen Zhao, Brandt Westing, Shawn Scully, Heri Nieto, Roman Holenstein, Minwoo Jeong, Krishna Sridhar, Brandon Newendorp, Mike Bastian, Sethu Raman, Tim Paek, Kevin Lynch and Carlos Guestrin

Randomized Experimental Design via Geographic Clustering link
David Rolnick, Kevin Aydin, Jean Pouget-Abadie, Shahab Kamali, Vahab Mirrokni and Amir Najmi

Web-based services often run randomized experiments to improve their products. A popular way to run these experiments is to use geographical regions as units of experimentation, since this does not require tracking of individual users or browser cookies. Since users may issue queries from multiple geographical locations, geo-regions cannot be considered independent and interference may be present in the experiment. In this paper, we study this problem, and first present GeoCUTS, a novel algorithm that forms geographical clusters to minimize interference while preserving balance in cluster size. We use a random sample of anonymized traffic from Google Search to form a graph representing user movements, then construct a geographically coherent clustering of the graph. Our main technical contribution is a statistical framework to measure the effectiveness of clusterings. Furthermore, we perform empirical evaluations showing that the performance of GeoCUTS is comparable to hand-crafted geo-regions with respect to both novel and existing metrics.

Ranking in Genealogy: Search Results Fusion at Ancestry link
Peng Jiang, Yingrui Yang, Gann Bierner, Fengjie Alex Li, Ruhan Wang and Azadeh Moghtaderi

Genealogy research is the study of family history using available resources such as historical records. Ancestry provides its customers with one of the world's largest online genealogical indexes, with billions of records from a wide range of sources, including vital records such as birth and death certificates, census records, and court and probate records, among many others. Search at Ancestry aims to return relevant records from various record types, allowing our subscribers to build their family trees, research their family history, and make meaningful discoveries about their ancestors from diverse perspectives. In a modern search engine designed for genealogical study, the appropriate ranking of search results to provide highly relevant information represents a daunting challenge. In particular, the disparity in historical records makes it inherently difficult to score records in an equitable fashion. Herein, we provide an overview of our solutions to overcome such record disparity problems in the Ancestry search engine. Specifically, we introduce customized coordinate ascent (customized CA) to speed up ranking within a specific record type. We then propose stochastic search (SS), which linearly combines ranked results federated across contents from various record types. Furthermore, we propose a novel information retrieval metric, normalized cumulative entropy (NCE), to measure the diversity of results. We demonstrate the effectiveness of these two algorithms in terms of relevance (by NDCG) and, where applicable, diversity (by NCE) in offline experiments using real customer data at Ancestry.
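
For readers unfamiliar with the relevance metric named above, here is a minimal sketch of NDCG in one common textbook form (linear gain, log2 rank discount). The abstract does not say which NDCG variant Ancestry uses, so the gain function and the helper names `dcg` and `ndcg` are assumptions. NCE is not sketched because the abstract does not define it.

```python
from math import log2


def dcg(relevances):
    """Discounted cumulative gain of relevance grades listed in ranked order."""
    return sum(rel / log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    # A list with no relevant results has no meaningful ideal; report 0.0.
    return dcg(relevances) / ideal if ideal > 0 else 0.0


perfect = ndcg([3, 2, 1])   # results already in ideal order
reversed_ = ndcg([1, 2, 3])  # most relevant result ranked last
```

A ranking that places the most relevant records first scores 1.0, and any other ordering of the same records scores lower.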

Real-time Attention Based Look-alike Model for Recommender System
Yudan Liu, Kaikai Ge, Xu Zhang and Leyu Lin

Real-time Event Detection on Social Data Streams
Mateusz Fedoryszak, Brent Frederick, Vijay Rajaram and Changtao Zhong

Real-time On-Device Troubleshooting Recommendation for Smartphones
Keiichi Ochiai, Kohei Senkawa, Naoki Yamamoto, Yuya Tanaka and Yusuke Fukazawa

Real-World Product Deployment of Adaptive Push Notification Scheduling on Smartphones
Tadashi Okoshi, Kota Tsubouchi and Hideyuki Tokuda

Recurrent Neural Networks for Stochastic Control in Real-Time Bidding
Nicolas Grislain, Nicolas Perrin and Antoine Thabault

Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems link
Lixin Zou, Long Xia, Zhuoye Ding, Song Jiaxing, Weidong Liu and Dawei Yin

Recommender systems play a crucial role in our daily lives. The feed streaming mechanism has been widely used in recommender systems, especially in mobile apps. The feed streaming setting provides users with an interactive mode of recommendation in never-ending feeds. In such an interactive setting, a good recommender system should pay more attention to user stickiness, which goes far beyond classical instant metrics and is typically measured by long-term user engagement. Directly optimizing long-term user engagement is a non-trivial problem, as the learning target is usually not available for conventional supervised learning methods. Though reinforcement learning (RL) naturally fits the problem of maximizing long-term rewards, applying RL to optimize long-term user engagement still faces challenges: user behaviors are versatile and difficult to model, typically consisting of both instant feedback (e.g., clicks, ordering) and delayed feedback (e.g., dwell time, revisit); in addition, effective off-policy learning is still immature, especially when combining bootstrapping and function approximation. To address these issues, in this work we introduce a reinforcement learning framework, FeedRec, to optimize long-term user engagement. FeedRec includes two components: 1) a Q-Network, designed as a hierarchical LSTM, which takes charge of modeling complex user behaviors, and 2) an S-Network, which simulates the environment, assists the Q-Network, and avoids the instability of convergence in policy learning. Extensive experiments on synthetic data and real-world large-scale data show that FeedRec effectively optimizes long-term user engagement and outperforms state-of-the-art methods.

Reserve Price Failure Rate Prediction with Header Bidding in Display Advertising
Achir Kalra, Chong Wang, Cristian Borcea and Yi Chen

Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network
Ya Su, Youjian Zhao, Chenhao Niu, Rong Liu, Wei Sun and Dan Pei

Robust Gaussian Process Regression for Real-Time High Precision GPS Signal Enhancement link
Ming Lin, Xiaomin Song, Qi Qian, Hao Li, Liang Sun, Shenghuo Zhu and Rong Jin

Satellite-based positioning systems such as GPS often suffer from a large amount of noise that degrades the positioning accuracy dramatically, especially in real-time applications. In this work, we consider a data-mining approach to enhance the GPS signal. We build a large-scale high precision GPS receiver grid system to collect real-time GPS signals for training. Gaussian Process (GP) regression is chosen to model the vertical Total Electron Content (vTEC) distribution of the ionosphere of the Earth. Our experiments show that the noise in the real-time GPS signals often exceeds the breakdown point of conventional robust regression methods, resulting in sub-optimal system performance. We propose a three-step approach to address this challenge. In the first step, we perform a set of signal validity tests to separate the signals into clean and dirty groups. In the second step, we train an initial model on the clean signals and then reweight the dirty signals based on the residual error. In the third step, a final model is retrained on both the clean signals and the reweighted dirty signals. In the theoretical analysis, we prove that the proposed three-step approach is able to tolerate a much higher noise level than vanilla robust regression methods if two reweighting rules are followed. We validate the superiority of the proposed method in our real-time high precision positioning system against several popular state-of-the-art robust regression methods. Our method achieves centimeter positioning accuracy in the benchmark region with probability 78.4%, outperforming the second best baseline method by a margin of 8.3%. The benchmark takes 6 hours on 20,000 CPU cores or 14 years on a single CPU.
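
The three-step procedure can be illustrated on a toy problem. The sketch below is not the authors' method: it swaps Gaussian Process regression for a weighted straight-line fit, takes the clean/dirty split as a given input instead of running validity tests, and uses a made-up weight `1 / (1 + (r / scale)^2)` because the abstract does not state the two reweighting rules. All function and parameter names are hypothetical.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def three_step_fit(xs, ys, is_clean, scale=1.0):
    """Fit on clean points, reweight dirty points by residual, then refit on all."""
    # Step 1 (validity tests) is assumed done: is_clean[i] marks trusted points.
    clean = [(x, y) for x, y, c in zip(xs, ys, is_clean) if c]
    cx, cy = zip(*clean)
    a0, b0 = weighted_line_fit(cx, cy, [1.0] * len(clean))
    # Step 2: weight each dirty point by how well the initial model explains it.
    ws = []
    for x, y, c in zip(xs, ys, is_clean):
        r = y - (a0 + b0 * x)
        ws.append(1.0 if c else 1.0 / (1.0 + (r / scale) ** 2))
    # Step 3: refit on the clean points plus the reweighted dirty points.
    return weighted_line_fit(xs, ys, ws)


xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 3, 5, 7, 9, 11, 100]          # last point is a gross outlier
flags = [True] * 5 + [False] * 2       # first five points passed the validity tests
intercept, slope = three_step_fit(xs, ys, flags)
```

In this toy run the dirty point that agrees with the initial model keeps nearly full weight, while the gross outlier is down-weighted to almost nothing, so the final fit stays close to the line through the clean data.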

Sample Adaptive Multiple Kernel Learning for Failure Prediction of Railway Points link
Zhibin Li, Jian Zhang, Qiang Wu, Yongshun Gong, Jinfeng Yi and Christina Kirsch

Railway points are among the key components of railway infrastructure. As a part of signal equipment, points control the routes of trains at railway junctions, having a significant impact on the reliability, capacity, and punctuality of rail transport. Traditionally, maintenance of points is based on a fixed time interval or raised after the equipment failures. Instead, it would be of great value if we could forecast points' failures and take action beforehand, minimising any negative effect. To date, most of the existing prediction methods are either lab-based or relying on specially installed sensors which makes them infeasible for large-scale implementation. Besides, they often use data from only one source. We, therefore, explore a new way that integrates multi-source data which are ready to hand to fulfil this task. We conducted our case study based on Sydney Trains rail network which is an extensive network of passenger and freight railways. Unfortunately, the real-world data are usually incomplete due to various reasons, e.g., faults in the database, operational errors or transmission faults. Besides, railway points differ in their locations, types and some other properties, which means it is hard to use a unified model to predict their failures. Aiming at this challenging task, we firstly constructed a dataset from multiple sources and selected key features with the help of domain experts. In this paper, we formulate our prediction task as a multiple kernel learning problem with missing kernels. We present a robust multiple kernel learning algorithm for predicting points failures. Our model takes into account the missing pattern of data as well as the inherent variance on different sets of railway points. Extensive experiments demonstrate the superiority of our algorithm compared with other state-of-the-art methods.

Seasonal-adjustment based feature selection method for predicting epidemic with large-scale search engine logs
Quang Thien Tran and Jun Sakuma

Seeker: Real-Time Interactive Search
Ari Biswas, Thai Pham, Michael Vogelsong, Benjamin Snyder and Houssam Nassif

Sequence Multi-task Learning to Forecast Mental Wellbeing from Sparse Self-reported Data
Dimitris Spathis, Sandra Servia Rodríguez, Katayoun Farrahi, Cecilia Mascolo and Jason Rentfrow

Sequential Scenario-Specific Meta Learner for Online Recommendation link
Zhengxiao Du, Xiaowei Wang, Hongxia Yang, Jingren Zhou and Jie Tang

Cold-start problems are long-standing challenges for practical recommendations. Most existing recommendation algorithms rely on extensive observed data and are brittle in recommendation scenarios with few interactions. This paper addresses such problems using few-shot learning and meta learning. Our approach is based on the insight that good generalization from a few examples relies on both a generic model initialization and an effective strategy for adapting this model to newly arising tasks. To accomplish this, we combine scenario-specific learning with model-agnostic sequential meta-learning and unify them into an integrated end-to-end framework, namely Scenario-specific Sequential Meta learner (or s^2 meta). By doing so, our meta-learner produces a generic initial model by aggregating contextual information from a variety of prediction tasks while effectively adapting to specific tasks by leveraging learning-to-learn knowledge. Extensive experiments on various real-world datasets demonstrate that our proposed model can achieve significant gains over the state of the art for cold-start problems in online recommendation. The model is deployed in the Guess You Like session on the front page of Mobile Taobao.

Short and Long-term Pattern Discovery Over Large-Scale Geo-Spatiotemporal Data link
Sobhan Moosavi, Mohammad Hossein Samavatian, Arnab Nandi, Srinivasan Parthasarathy and Rajiv Ramnath

Pattern discovery in geo-spatiotemporal data (such as traffic and weather data) is about finding patterns of collocation, co-occurrence, cascading, or cause and effect between geospatial entities. Using simplistic definitions of spatiotemporal neighborhood (a common characteristic of the existing general-purpose frameworks) is not semantically representative of geo-spatiotemporal data. We therefore introduce a new geo-spatiotemporal pattern discovery framework which defines a semantically correct definition of neighborhood; and then provides two capabilities, one to explore propagation patterns and the other to explore influential patterns. Propagation patterns reveal common cascading forms of geospatial entities in a region. Influential patterns demonstrate the impact of temporally long-term geospatial entities on their neighborhood. We apply this framework on a large dataset of traffic and weather data at countrywide scale, collected for the contiguous United States over two years. Our important findings include the identification of 90 common propagation patterns of traffic and weather entities (e.g., rain --> accident --> congestion), which results in identification of four categories of states within the US; and interesting influential patterns with respect to the "location", "duration", and "type" of long-term entities (e.g., a major construction --> more traffic incidents). These patterns and the categorization of the states provide useful insights on the driving habits and infrastructure characteristics of different regions in the US, and could be of significant value for applications such as urban planning and personalized insurance.

Shrinkage Estimators in Online Experiments link
Drew Dimmery, Eytan Bakshy and Jasjeet Sekhon

We develop and analyze empirical Bayes Stein-type estimators for use in the estimation of causal effects in large-scale online experiments. While online experiments are generally thought to be distinguished by their large sample size, we focus on the multiplicity of treatment groups. The typical analysis practice is to use simple differences-in-means (perhaps with covariate adjustment) as if all treatment arms were independent. In this work we develop consistent, small bias, shrinkage estimators for this setting. In addition to achieving lower mean squared error these estimators retain important frequentist properties such as coverage under most reasonable scenarios. Modern sequential methods of experimentation and optimization such as multi-armed bandit optimization (where treatment allocations adapt over time to prior responses) benefit from the use of our shrinkage estimators. Exploration under empirical Bayes focuses more efficiently on near-optimal arms, improving the resulting decisions made under uncertainty. We demonstrate these properties by examining seventeen large-scale experiments conducted on Facebook from April to June 2017.
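
To make the idea of Stein-type shrinkage across treatment arms concrete, here is a minimal sketch of the textbook positive-part James-Stein estimator that shrinks toward the grand mean. It is not the estimator developed in the paper: it assumes every arm estimate has the same known sampling variance, and the function name `shrink_toward_mean` is hypothetical.

```python
def shrink_toward_mean(estimates, variance):
    """Positive-part James-Stein shrinkage of arm estimates toward their mean.

    `variance` is the (assumed common and known) sampling variance of each
    raw estimate. With fewer than four arms, or with identical estimates,
    the raw estimates are returned unchanged.
    """
    k = len(estimates)
    mean = sum(estimates) / k
    spread = sum((x - mean) ** 2 for x in estimates)
    if k < 4 or spread == 0:
        return list(estimates)
    # Shrink more when sampling noise is large relative to the observed spread.
    factor = max(0.0, 1.0 - (k - 3) * variance / spread)
    return [mean + factor * (x - mean) for x in estimates]


raw_effects = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
shrunk = shrink_toward_mean(raw_effects, variance=1.0)
```

Each arm is pulled toward the grand mean by the same factor, so extreme arms move the most in absolute terms; when the noise dominates the spread between arms, the factor reaches zero and every arm collapses to the grand mean.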

Smart Roles: Inferring Professional Roles in Email Networks
Di Jin, Mark Heimann, Tara Safavi, Mengdi Wang, Wei Lee, Lindsay Snider and Danai Koutra

SMOILE: A Shopper Marketing Optimization and Inverse Learning Engine
Abhilash Reddy Chenreddy, Parshan Pakiman, Selvaprabu Nadarajah, Ranganathan Chandrasekaran and Rick Abens

Social Skill Validation at LinkedIn
Xiao Yan, Jaewon Yang, Mikhail Obukhov, Lin Zhu, Joey Bai, Shiqi Wu and Qi He

Structured Noise Detection: Application on Well Test Pressure Derivative Data
Farhan Asif Chowdhury, Satomi Suzuki and Abdullah Mueen

Temporal Probabilistic Profiles for Sepsis Prediction in the ICU
Eitam Sheetrit, Nir Nissim, Denis Klimov and Yuval Shahar

TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank
Rama Kumar Pasumarthi, Sebastian Bruch, Xuanhui Wang, Cheng Li, Michael Bendersky, Marc Najork, Jan Pfeifer, Nadav Golbandi, Rohan Anil and Stephan Wolf

The Error is the Feature: How to Forecast Lightning using a Model Prediction Error
Christian Schön, Jens Dittrich and Richard Müller

The Identification and Estimation of Direct and Indirect Effects in Online A/B Tests through Causal Mediation Analysis
Xuan Yin and Liangjie Hong

The Secret Lives of Names? Name Embeddings from Social Media link
Junting Ye and Steven Skiena

Your name tells a lot about you: your gender, ethnicity and so on. It has been shown that name embeddings are more effective in representing names than traditional substring features. However, our previous name embedding model was trained on private email data and is not publicly accessible. In this paper, we explore learning name embeddings from public Twitter data. We argue that Twitter embeddings have two key advantages: (i) they can and will be publicly released to support the research community; (ii) even with a smaller training corpus, Twitter embeddings achieve similar performance on multiple tasks compared to email embeddings. As a test case to show the power of name embeddings, we investigate the modeling of lifespans. We find it interesting that adding name embeddings can further improve the performance of models using demographic features, which are traditionally used for lifespan modeling. Through residual analysis, we observe that fine-grained groups (potentially reflecting socioeconomic status) are the latent contributing factors encoded in name embeddings. These were previously hidden to demographic models, and may help to enhance the predictive power of a wide class of research studies.

Time-Series Anomaly Detection Service at Microsoft link
Hansheng Ren, Bixiong Xu, Yujing Wang, Chao Yi, Congrui Huang, Tony Xing, Xiaoyu Kou, Mao Yang and Jie Tong

Large companies need to monitor various metrics (for example, Page Views and Revenue) of their applications and services in real time. At Microsoft, we develop a time-series anomaly detection service which helps customers to monitor the time-series continuously and alert for potential incidents on time. In this paper, we introduce the pipeline and algorithm of our anomaly detection service, which is designed to be accurate, efficient and general. The pipeline consists of three major modules, including data ingestion, experimentation platform and online compute. To tackle the problem of time-series anomaly detection, we propose a novel algorithm based on Spectral Residual (SR) and Convolutional Neural Network (CNN). Our work is the first attempt to borrow the SR model from visual saliency detection domain to time-series anomaly detection. Moreover, we innovatively combine SR and CNN together to improve the performance of SR model. Our approach achieves superior experimental results compared with state-of-the-art baselines on both public datasets and Microsoft production data.
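
The Spectral Residual stage can be sketched without any external library. The code below follows the standard SR recipe (log amplitude spectrum minus its local average, recombined with the original phase and transformed back) and is only an illustration: it uses a naive O(n^2) transform and a circular averaging window for brevity, leaves out the CNN stage entirely, and the names `dft` and `spectral_residual_saliency` are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), adequate for short series."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for f in range(n):
        total = sum(v * cmath.exp(sign * 2j * math.pi * f * t / n)
                    for t, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out


def spectral_residual_saliency(series, window=3):
    """Saliency map of a series; large values mark candidate anomalies."""
    n = len(series)
    eps = 1e-12  # guards against log(0) and division by zero
    spectrum = dft(series)
    log_amp = [math.log(abs(c) + eps) for c in spectrum]
    # Local average of the log amplitude over 2 * half + 1 neighbouring bins,
    # wrapping around at the ends (a simplifying assumption).
    half = window // 2
    smooth = [sum(log_amp[(f + d) % n] for d in range(-half, half + 1)) / (2 * half + 1)
              for f in range(n)]
    residual = [la - s for la, s in zip(log_amp, smooth)]
    # Keep the original phase, replace the amplitude with exp(residual).
    reshaped = [math.exp(r) * c / (abs(c) + eps) for r, c in zip(residual, spectrum)]
    return [abs(v) for v in dft(reshaped, inverse=True)]


signal = [1.0] * 64
signal[40] += 10.0  # inject a single spike
saliency = spectral_residual_saliency(signal)
most_salient = max(range(len(saliency)), key=saliency.__getitem__)
```

Thresholding the saliency map is one simple way to turn it into alerts; the abstract instead describes feeding it to a CNN, which this sketch does not attempt.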

Topic-Enhanced Memory Networks for Personalised Point-of-Interest Recommendation link
Xiao Zhou, Cecilia Mascolo and Zhongxiang Zhao

Point-of-Interest (POI) recommender systems play a vital role in people's lives by recommending unexplored POIs to users and have drawn extensive attention from both academia and industry. Despite their value, however, they still suffer from the challenges of capturing complicated user preferences and fine-grained user-POI relationship for spatio-temporal sensitive POI recommendation. Existing recommendation algorithms, including both shallow and deep approaches, usually embed the visiting records of a user into a single latent vector to model user preferences: this has limited power of representation and interpretability. In this paper, we propose a novel topic-enhanced memory network (TEMN), a deep architecture to integrate the topic model and memory network capitalising on the strengths of both the global structure of latent patterns and local neighbourhood-based features in a nonlinear fashion. We further incorporate a geographical module to exploit user-specific spatial preference and POI-specific spatial influence to enhance recommendations. The proposed unified hybrid model is widely applicable to various POI recommendation scenarios. Extensive experiments on real-world WeChat datasets demonstrate its effectiveness (improvement ratio of 3.25% and 29.95% for context-aware and sequential recommendation, respectively). Also, qualitative analysis of the attention weights and topic modeling provides insight into the model's recommendation process and results.

Towards Identifying Impacted Users in Cellular Services
Shobha Venkataraman and Jia Wang

Towards Knowledge-Based Personalized Product Description Generation in E-commerce
Qibin Chen, Junyang Lin, Yichang Zhang, Hongxia Yang, Jingren Zhou and Jie Tang

Quality product descriptions are critical for providing a competitive customer experience on an e-commerce platform. An accurate and attractive description not only helps customers make an informed decision but also improves the likelihood of purchase. However, crafting a successful product description is tedious and highly time-consuming. Due to its importance, automating product description generation has attracted considerable interest from both the research and industrial communities. Existing methods mainly use templates or statistical methods, and their performance could be rather limited. In this paper, we explore a new way to generate personalized product descriptions by combining the power of neural networks and a knowledge base. Specifically, we propose a KnOwledge Based pErsonalized (or KOBE) product description generation model in the context of e-commerce. In KOBE, we extend the encoder-decoder framework, the Transformer, to a sequence modeling formulation using self-attention. In order to make the description both informative and personalized, KOBE considers a variety of important factors during text generation, including product aspects, user categories, and the knowledge base. Experiments on real-world datasets demonstrate that the proposed method outperforms the baseline on various metrics. KOBE achieves an improvement of 9.7% over state-of-the-art methods in terms of BLEU. We also present several case studies as anecdotal evidence to further support the effectiveness of the proposed approach. The framework has been deployed in Taobao, the largest online e-commerce platform in China.

Towards sustainable dairy management - a machine learning enhanced method for estrus detection
Kevin Fauvel, Véronique Masson, Élisa Fromont, Philippe Faverdin and Alexandre Termier

TrajGuard: A Comprehensive Trajectory Copyright Protection Scheme
Zheyi Pan, Jie Bao, Weinan Zhang, Yong Yu and Yu Zheng

TV Advertisement Scheduling by Learning Expert Intentions
Yasuhisa Suzuki, Wemer Wee and Itaru Nishioka

Two-Sided Fairness for Repeated Matchings in Two-Sided Markets: A Case Study of a Ride-Hailing Platform
Tom Sühr, Asia J. Biega, Meike Zehlike, Krishna P. Gummadi and Abhijnan Chakraborty

Uncovering the Co-driven Mechanism of Social and Content Links in User Churn Phenomena
Yunfei Lu, Linyun Yu, Peng Cui, Chengxi Zang, Renzhe Xu, Yihao Liu, Lei Li and Wenwu Zhu

Understanding Consumer Journey using Attention based Recurrent Neural Networks
Yichao Zhou, Shaunak Mishra, Jelena Gligorijevic, Tarun Bhatia and Narayan Bhamidipati

Understanding the Role of Style in E-commerce Shopping
Hao Jiang, Aakash Sabharwal, Adam Henderson, Diane Hu and Liangjie Hong

Unsupervised Clinical Language Translation
Wei-Hung Weng, Yu-An Chung and Peter Szolovits

As patients' access to their doctors' clinical notes becomes common, translating professional, clinical jargon to layperson-understandable language is essential to improve patient-clinician communication. Such translation yields better clinical outcomes by enhancing patients' understanding of their own health conditions, and thus improving patients' involvement in their own care. Existing research has used dictionary-based word replacement or definition insertion to approach the need. However, these methods are limited by expert curation, which is hard to scale and has trouble generalizing to unseen datasets that do not share an overlapping vocabulary. In contrast, we approach the clinical word and sentence translation problem in a completely unsupervised manner. We show that a framework using representation learning, bilingual dictionary induction and statistical machine translation yields the best precision at 10 of 0.827 on professional-to-consumer word translation, and mean opinion scores of 4.10 and 4.28 out of 5 for clinical correctness and layperson readability, respectively, on sentence translation. Our fully-unsupervised strategy overcomes the curation problem, and the clinically meaningful evaluation reduces biases from inappropriate evaluators, which are critical in clinical machine learning.

UrbanFM: Inferring Fine-Grained Urban Flows
Yuxuan Liang, Kun Ouyang, Lin Jing, Sijie Ruan, Ye Liu, Junbo Zhang, David Rosenblum and Yu Zheng

Using Twitter to Predict When Vulnerabilities will be Exploited
Haipeng Chen, Rui Liu, Noseong Park and V.S. Subrahmanian

Whole Page Optimization with Global Constraints
Weicong Ding, Dinesh Govindaraj and S V N Vishwanathan

