Support Vector Machines On Distributed Computers

PSVM: Parallelizing Support Vector Machines on Distributed Computers
Edward Y. Chang∗, Kaihua Zhu, Hao Wang, Hongjie Bai, Jian Li, Zhihuan Qiu, & Hang Cui
Google Research, Beijing, China
Support Vector Machines (SVMs) suffer from a widely recognized scalability problem in both memory use and computational time. To improve scalability, we have developed a parallel SVM algorithm (PSVM), which reduces memory use by performing a row-based, approximate matrix factorization, and which loads only essential data onto each machine to perform parallel computation. Let n denote the number of training instances, p the reduced matrix dimension after factorization (p is significantly smaller than n), and m the number of machines. PSVM reduces the memory requirement from O(n^2) to O(np/m), and improves computation time to O(np^2/m). Empirical study shows PSVM to be effective. PSVM Open Source is available for download at
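To make the memory claim concrete, the following is a minimal sketch of one way a row-based, approximate factorization can work. It is not the PSVM code. It assumes a pivoted incomplete Cholesky factorization of an RBF kernel matrix, and the names `rbf_kernel_column`, `incomplete_cholesky`, `memory_bytes`, and the parameter `gamma` are hypothetical choices made for illustration. The n x n kernel matrix K is never formed; only an n x p factor H with K ≈ H H^T is built, and its rows can then be split across m machines.

```python
import numpy as np


def rbf_kernel_column(X, j, gamma):
    """Column j of the RBF kernel matrix, computed on demand (assumed kernel)."""
    diff = X - X[j]
    return np.exp(-gamma * np.sum(diff * diff, axis=1))


def incomplete_cholesky(X, p, gamma=1.0, tol=1e-10):
    """Pivoted incomplete Cholesky: returns H (n x k, k <= p) with K ~= H @ H.T."""
    n = X.shape[0]
    H = np.zeros((n, p))
    d = np.ones(n)  # residual diagonal; the RBF kernel has K(x, x) = 1
    for k in range(p):
        j = int(np.argmax(d))  # pivot on the largest residual diagonal entry
        if d[j] <= tol:
            return H[:, :k]  # residual is negligible, stop early
        col = rbf_kernel_column(X, j, gamma)
        H[:, k] = (col - H[:, :k] @ H[j, :k]) / np.sqrt(d[j])
        d = np.maximum(d - H[:, k] ** 2, 0.0)  # clip round-off negatives
    return H


def memory_bytes(n, p, m, bytes_per_value=8):
    """Full kernel storage versus one machine's share of the n x p factor."""
    return bytes_per_value * n * n, bytes_per_value * n * p // m


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    H = incomplete_cholesky(X, p=40, gamma=0.5)
    # Row-based distribution: each of m machines holds about n/m rows of H.
    shards = np.array_split(H, 4, axis=0)
    print([s.shape for s in shards])
    print(memory_bytes(n=1_000_000, p=1_000, m=100))
```

With the illustrative values n = 1,000,000, p = 1,000, and m = 100 (chosen here, not taken from the paper), the full kernel matrix in double precision would need 8 * 10^12 bytes, while each machine's share of the factor needs 8 * 10^7 bytes.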
1 Introduction
Let us examine the resource bottlenecks of SVMs in a binary classification setting to explain our proposed solution. Given a set of training data X = {(x_i, y_i) | x_i ∈ R^d}_{i=1}^n, where x_i is an observation vector, y_i ∈ {−1, 1} is the class label of x_i, and n is the size of X, we apply SVMs on X to train a binary classifier. SVMs aim to search for a hyperplane in the Reproducing Kernel Hilbert Space (RKHS) that maximizes the margin between the two classes of data in X with the smallest training error (Vapnik, 1995). This…