It was a pain to get JCuda and Scala working on Spark, so in case I (or someone else) need to install them again later, I will try my best to recall the most harmful errors. (Many small bugs can be solved just by googling them, so I won't mention those.)
1. Install CUDA and Spark
Following the official guides should be fine.
2. Use JCuda.
Follow this link: http://www.jcuda.org/tutorial/TutorialIndex.html#GeneralSetup
You need to set CLASSPATH and LD_LIBRARY_PATH carefully, and move all the necessary *.so and *.jar files into one place.
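Here is a sketch of that environment setup. The JCuda directory name matches the 0.7.5 release used later in this post, but the CUDA location (/usr/local/cuda-7.5) and the exact paths are assumptions -- adjust both to your machine.

```shell
# Assumed install locations -- change these to wherever you unpacked things.
JCUDA_HOME=$HOME/JCuda-All-0.7.5-bin-Linux-x86_64
CUDA_HOME=/usr/local/cuda-7.5

# The JVM finds the jars through CLASSPATH; the native *.so libraries
# must be on LD_LIBRARY_PATH.
export CLASSPATH=".:$JCUDA_HOME/jcuda-0.7.5.jar:$CLASSPATH"
export LD_LIBRARY_PATH="$JCUDA_HOME:$CUDA_HOME/lib64:$LD_LIBRARY_PATH"

# Keep nvcc reachable -- losing this line is exactly the mistake
# described in step 3 below.
export PATH="$CUDA_HOME/bin:$PATH"

echo "$LD_LIBRARY_PATH"
```

Putting these exports in ~/.bashrc (rather than typing them per session) avoids the PATH mix-up described in the next step.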
3. Reinstall the Nvidia toolkit & CUDA.
This can lead to an Nvidia driver mismatch (at that point, even "nvidia-smi" stopped working).
But why did I reinstall CUDA?
When I ssh'ed into the server this afternoon, I found my PyCUDA code didn't work. It said nvcc was not installed and that I could run "sudo apt-get install nvidia-cuda-toolkit" to get it. That might be the worst suggestion Ubuntu has ever given me. I was suspicious, since I knew CUDA had worked the day before and the only changes I had made were Spark and JCuda. But unfortunately, I followed the suggestion and installed CUDA again, which turned out to be a disaster. (I later figured out why CUDA stopped working: when I set the JCuda variables the day before (CLASSPATH & LD_LIBRARY_PATH), I omitted something, and CUDA dropped out of PATH. It was weird, though, since JCuda still worked when I tested it that day.)
"nvidia-smi" said "No CUDA Device".
"Failed to initialize NVML: Unknown Error"
"Failed to initialize NVML: GPU access blocked by the operating system"
These errors come from a CUDA/driver version mismatch.
I then installed the specific driver version with the command below (find out which driver version your card needs first):
curl -o NVIDIA-Linux-x86_64-352.55.run http://us.download.nvidia.com/XFree86/Linux-x86_64/352.55/NVIDIA-Linux-x86_64-352.55.run
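After downloading, the .run file has to be made executable and run as root. The sketch below is a dry run: it only echoes the steps (352.55 is the version for my card; substitute the version your GPU needs). Drop the echoes to actually run the installer, ideally with the X server stopped.

```shell
# Build the download URL for a given driver version and show the steps.
VER=352.55
URL="http://us.download.nvidia.com/XFree86/Linux-x86_64/$VER/NVIDIA-Linux-x86_64-$VER.run"
echo "curl -o NVIDIA-Linux-x86_64-$VER.run $URL"
echo "chmod +x NVIDIA-Linux-x86_64-$VER.run"
echo "sudo ./NVIDIA-Linux-x86_64-$VER.run"
```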
There were many other weird errors, but I fixed them without too much effort (Google).
4. Using JCuda in Scala.
(Put all the files under the JCuda directory.)
scalac -cp ".:jcuda-0.7.5.jar" JCudaVectorAdd.scala
scala -cp ".:jcuda-0.7.5.jar" JCudaVectorAdd
//make sure you have pre-compiled JCudaVectorAddKernel.cu to JCudaVectorAddKernel.ptx (e.g. nvcc -ptx JCudaVectorAddKernel.cu -o JCudaVectorAddKernel.ptx) following the JCuda documentation.
JCudaVectorAdd.scala
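The original JCudaVectorAdd.scala isn't reproduced here, so below is a sketch of what it can look like: a Scala port of the JCuda tutorial's Java vector-add sample. The kernel name "add" and the .ptx filename follow that tutorial; running it needs a CUDA-capable GPU and the jcuda jar on the classpath, so treat it as a starting point rather than a drop-in file.

```scala
import jcuda.{Pointer, Sizeof}
import jcuda.driver.JCudaDriver._
import jcuda.driver._

object JCudaVectorAdd {
  def main(args: Array[String]): Unit = {
    JCudaDriver.setExceptionsEnabled(true)

    // Initialize the driver API and create a context on device 0.
    cuInit(0)
    val device = new CUdevice(); cuDeviceGet(device, 0)
    val context = new CUcontext(); cuCtxCreate(context, 0, device)

    // Load the pre-compiled kernel (see the nvcc step above).
    val module = new CUmodule(); cuModuleLoad(module, "JCudaVectorAddKernel.ptx")
    val function = new CUfunction(); cuModuleGetFunction(function, module, "add")

    val n = 100000
    val hostA = Array.tabulate(n)(_.toFloat)
    val hostB = Array.tabulate(n)(_.toFloat)

    // Allocate device memory and copy the inputs over.
    val devA = new CUdeviceptr(); cuMemAlloc(devA, n * Sizeof.FLOAT)
    cuMemcpyHtoD(devA, Pointer.to(hostA), n * Sizeof.FLOAT)
    val devB = new CUdeviceptr(); cuMemAlloc(devB, n * Sizeof.FLOAT)
    cuMemcpyHtoD(devB, Pointer.to(hostB), n * Sizeof.FLOAT)
    val devOut = new CUdeviceptr(); cuMemAlloc(devOut, n * Sizeof.FLOAT)

    // Kernel parameters: a pointer to an array of pointers-to-arguments.
    val kernelParams = Pointer.to(
      Pointer.to(Array(n)), Pointer.to(devA), Pointer.to(devB), Pointer.to(devOut))

    val blockSize = 256
    val gridSize = math.ceil(n.toDouble / blockSize).toInt
    cuLaunchKernel(function,
      gridSize, 1, 1,   // grid dimensions
      blockSize, 1, 1,  // block dimensions
      0, null,          // shared memory, stream
      kernelParams, null)
    cuCtxSynchronize()

    // Copy the result back and check a[i] + b[i] == 2*i.
    val hostOut = new Array[Float](n)
    cuMemcpyDtoH(Pointer.to(hostOut), devOut, n * Sizeof.FLOAT)
    val ok = hostOut.zipWithIndex.forall { case (v, i) => math.abs(v - 2f * i) < 1e-5 }
    println(s"Result ok: $ok")

    cuMemFree(devA); cuMemFree(devB); cuMemFree(devOut)
  }
}
```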
5. Using JCuda (Scala) on Spark
So far, we can run JCudaVectorAdd.scala under the JCuda directory successfully, so now I'd like to move on to Spark.
I built the jar file with sbt.
But my first job submissions failed; the notes below cover the errors I hit.
# Your directory layout should look like this (create the src/main/scala directories yourself with mkdir)
./build.sbt
./src/main/scala/JCudaVectorAdd.scala
./src/main/scala/JCudaVectorAddKernel.cu
./src/main/scala/JCudaVectorAddKernel.ptx
./lib/*.jar
Then, run
sbt package //you need to install sbt first
/home/lzy/spark/bin/spark-submit --class JCudaVectorAdd --master local[*] --jars $(find /home/lzy/JCuda-All-0.7.5-bin-Linux-x86_64/| grep '\.jar'|tr '\n' ',') target/scala-2.11/simple-project-lzy_2.11-1.0.jar
Note: You need to include all the necessary jars (everything your application imports) when submitting a job to Spark; otherwise you might get errors like "Exception in thread "main" java.lang.NoClassDefFoundError: jcuda/driver/JCudaDriver".
Also, don't forget to include the jar file created by "sbt package"; it is under target/scala-2.11/ by default.
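The find | grep | tr pipeline in the spark-submit command above just builds the comma-separated list that --jars expects. Here is a self-contained demo of that format, using a throwaway directory with fake jars (your real path is the JCuda install directory):

```shell
# Create a stand-in for the JCuda directory with two dummy jars.
JCUDA_DIR=$(mktemp -d)
touch "$JCUDA_DIR/jcuda-0.7.5.jar" "$JCUDA_DIR/jcublas-0.7.5.jar"

# spark-submit wants a comma-separated list, so turn newlines into commas.
JARS=$(find "$JCUDA_DIR" -name '*.jar' | tr '\n' ',')
echo "$JARS"
```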
build.sbt
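The build.sbt contents aren't shown above, so here is a minimal sketch consistent with the jar path in the spark-submit command (artifact simple-project-lzy, Scala 2.11, version 1.0). The Spark version is an assumption -- match whatever your cluster runs.

```scala
// Name and versions below match the packaged jar path used above:
// target/scala-2.11/simple-project-lzy_2.11-1.0.jar
name := "simple-project-lzy"

version := "1.0"

scalaVersion := "2.11.7"

// "provided" keeps Spark itself out of the packaged jar, since the
// cluster supplies it at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"

// The JCuda jars dropped into ./lib are picked up automatically by sbt
// as unmanaged dependencies; no extra setting is needed for them.
```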