(Spark maintains a document about this, but I find it a bit difficult to understand if you are new to Spark.) Here is a simpler guide (at least for me).
It was a pain to get JCuda and Scala working on Spark, so in case I (or someone else) need to install them later, I will try my best to recall the most painful errors. (Many small bugs can be solved just by googling them, and I won't mention those.)
Many people want to leverage CUDA from Scala (machine learning) code, but CUDA doesn't support Scala TAT.
But hope is not lost! We can always try the following approaches:
Performance optimization is a never-ending process. Project Tungsten will be the largest change to Spark's execution engine since the project's inception. It aims to substantially improve the efficiency of memory and CPU usage for Spark applications, pushing performance closer to the limits of modern hardware.
Why CPU, rather than IO, is the main bottleneck: 1. Hardware has improved. 2. Spark's IO has been optimized. 3. Data formats have improved. 4. Serialization and hashing remain CPU-bound.
MLlib supports local vectors and matrices stored on a single machine, as well as distributed matrices backed by one or more RDDs. Local vectors and local matrices are simple data models that serve as public interfaces. The underlying linear algebra operations are provided by Breeze and jblas.
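As a quick illustration of those local data models, here is a minimal sketch of constructing MLlib local vectors and a local matrix. It assumes a Spark (`spark-mllib`) dependency on the classpath; the names below are the standard `org.apache.spark.mllib.linalg` factory methods.

```scala
import org.apache.spark.mllib.linalg.{Vectors, Matrices}

object LocalTypesSketch {
  def main(args: Array[String]): Unit = {
    // Dense vector (1.0, 0.0, 3.0): all entries stored explicitly.
    val dv = Vectors.dense(1.0, 0.0, 3.0)

    // Sparse vector of size 3: only indices 0 and 2 are nonzero.
    val sv = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

    // 3x2 dense matrix; values are given in column-major order:
    //   1.0  2.0
    //   3.0  4.0
    //   5.0  6.0
    val dm = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))

    println(dv)
    println(sv)
    println(dm)
  }
}
```

Note that these local types are just containers: when you call methods that need real linear algebra (multiplication, decompositions), MLlib converts them to Breeze or jblas structures under the hood.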
Related Data Types:
A computer cluster (concept) consists of a set of connected computers that work together so that they can be viewed as a single system. In a cluster, each node is set to perform the same task, controlled and scheduled by software.
(Design & Configuration) In a Beowulf system, application programs never see the computational nodes (slave computers); they interact only with the "master", a specific computer that handles the scheduling and management of the slaves.