To fully understand Hadoop/Spark IO, you should first understand "SequenceFile" and "Serializable".
I had been confused by these for a while, so I decided to write this post.
You can extend a class to provide more specialized behavior.
A class that extends another class inherits all the methods and properties of the extended class (you get all its methods for free, and you can write extra ones!). In addition, the extending class can override existing virtual methods by using the override keyword (in Java, the @Override annotation) in the method definition, which allows you to provide a different implementation for an existing method. [Polymorphism]
A class can extend only one class, but it can implement multiple interfaces. (Differences: extends is for extending any class except a final class, implements is for implementing an interface; see also the differences between an abstract class and an interface.)
Extensions also apply to interfaces—an interface can extend another interface. As with classes, when an interface extends another interface, all the methods and properties of the extended interface are available to the extending interface.
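As a quick Scala sketch of extends, override, and polymorphism (the class names here are made up for illustration):

```scala
class Animal {
  def speak(): String = "..."
}

class Dog extends Animal {
  override def speak(): String = "Woof"  // override an inherited method
  def fetch(): String = "fetching"       // extra method the subclass adds
}

object ExtendsDemo {
  def main(args: Array[String]): Unit = {
    val a: Animal = new Dog  // polymorphism: static type Animal, dynamic type Dog
    println(a.speak())       // prints "Woof": the override wins at runtime
  }
}
```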
Interface Serializable
Serializability of a class is enabled by the class implementing the java.io.Serializable interface. Classes that do not implement this interface will not have any of their state serialized or deserialized. All subtypes of a serializable class are themselves serializable. The serialization interface has no methods or fields and serves only to identify the semantics of being serializable.
Relevant link: How serialization works in C#
The basic idea is: Serialization is the process of converting an object into a stream of bytes in order to store the object or transmit it to memory, a database, or a file. Its main purpose is to save the state of an object in order to be able to recreate it when needed. The reverse process is called deserialization.
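A minimal Scala sketch of that round trip (Point is a made-up class; Scala case classes are already Serializable, the extends clause is just spelled out for clarity):

```scala
import java.io._

// A class opts in to Java serialization simply by extending the marker interface.
case class Point(x: Int, y: Int) extends Serializable

object SerializeDemo {
  def main(args: Array[String]): Unit = {
    val p = Point(1, 2)

    // Serialize: object -> stream of bytes
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(p)
    oos.close()

    // Deserialize: bytes -> a reconstructed, equal object
    val ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
    val q = ois.readObject().asInstanceOf[Point]
    println(q == p)  // true
  }
}
```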
Mixin
In OOP languages, a mixin is a class that contains a combination of methods from other classes. How such a combination is done depends on the language. If a combination contains all methods of the combined classes, it is equivalent to multiple inheritance. Mixins are sometimes described as being "included" rather than "inherited".
Mixins encourage code reuse and can be used to avoid the ambiguity that multiple inheritance can cause.
Java 8 introduced default methods for interfaces. A default method is defined in the interface itself, with an implementation. This matters when a new method has to be added to an interface after classes have already been written against it: without default methods, adding the method means implementing it in every class that uses the interface.
Default methods help in this case: they can be introduced to an interface at any time with a ready implementation that the implementing classes inherit. Hence default methods make a mix-in style of reuse possible in Java.
Mixin interfaces and traits in Scala (you can think of a trait as something like an abstract class)
Scala has a rich type system, and traits are the part of it that implements mix-in behavior. As the name suggests, a trait usually represents a distinct feature or aspect that is orthogonal to the responsibility of a concrete type, or at least of a certain instance.
trait Singer {
  def sing(): Unit = { println("singing …") }
  // more methods
}
class Birds extends Singer
Here, Birds has mixed in all methods of the Trait into its own definition as if class Birds would have defined method sing() on its own.
Usually extends is used to inherit from a superclass, but here we also use extends to mix in the (first) trait. Any following traits are mixed in using the keyword with:
class Person
class Actor extends Person with Singer
// or, mixing in traits only (assuming a trait Performer is also defined):
class Actor extends Singer with Performer
Scala also allows mixing in a trait dynamically when creating a new instance of a class.
For instance, when creating Person instances, not every Person needs to be able to sing:
class Person {
  def tell(): Unit = { println("Human") }
}
val singingPerson = new Person with Singer
singingPerson.sing
Related files:
BIDMach/src/main/scala/BIDMach/Learner.scala (+--)
BIDMach/src/main/scala/BIDMach/mixins/Mixin.scala (+-)
BIDMach/src/main/scala/BIDMach/models/
BIDMach/src/main/scala/BIDMach/datasources/DataSource.scala (+-)
abstract class DataSource(val opts:DataSource.Opts = new DataSource.Options) extends Serializable{}
object DataSource{}
BIDMach/src/main/scala/BIDMach/datasources/IteratorDS.scala (++++++++++--------) (support for Spark)
// Datasource designed to work with Iterators as provided by Spark.
// We assume the iterator returns pairs from a Sequencefile of (StringWritable, MatIO)
class IteratorDS(override val opts:IteratorDS.Opts = new IteratorDS.Options) extends DataSource(opts){}
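The idea of an iterator-backed datasource can be sketched generically (the names and the Array[Float] payload below are hypothetical stand-ins; the real IteratorDS consumes (StringWritable, MatIO) pairs and produces BIDMat matrices):

```scala
// Minimal sketch of a datasource that pulls records from an iterator, the way
// IteratorDS pulls key/value pairs handed to it by a Spark partition.
class IteratorSource(it: Iterator[(String, Array[Float])]) {
  def hasNext: Boolean = it.hasNext
  // Return the next payload, dropping the key (IteratorDS unpacks the value side)
  def next(): Array[Float] = it.next()._2
}

object IteratorSourceDemo {
  def main(args: Array[String]): Unit = {
    val src = new IteratorSource(Iterator("k1" -> Array(1f, 2f), "k2" -> Array(3f)))
    while (src.hasNext) println(src.next().mkString(","))
  }
}
```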
BIDMat/src/main/scala/BIDMat/HDFSIO.scala (+)
trait HDFSIOtrait {
  // write/readMat/ND/Mats/NDs(): shorthand for the paired read/write methods
}
trait MatIOtrait { def get:Array[Mat]}
BIDMat/src/main/scala/BIDMat/HMat.scala (++++++++----------)
//other changes in BIDMat are mainly adding "extends/with Serializable"
//change def checkHDFSloaded = {} commit NOV17th
BIDMach_Spark/src/main/scala/BIDMat/HDFSIO.scala (+++++++---) (simplified interface)
class HDFSIO extends HDFSIOtrait {
  // read/writeLotsOfMat{}, appendFiles{}: shorthand for the implemented methods
}
BIDMach_Spark/src/main/scala/BIDMat/MatIO.scala (++++++++++++------) (support for Spark)
import org.apache.hadoop.io.Writable
class MatIO extends Writable with MatIOtrait {
  override def write(out: DataOutput): Unit = { /* dispatch to the matching save*Mat */ }
  override def readFields(in: DataInput): Unit = { /* dispatch to the matching load*Mat */ }
}
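The Writable contract MatIO follows can be sketched using only java.io, so it runs stand-alone (the real class implements org.apache.hadoop.io.Writable; IntPairIO and its fields are hypothetical stand-ins for MatIO):

```scala
import java.io._

class IntPairIO(var a: Int = 0, var b: Int = 0) {
  // Hadoop calls write() to serialize the value into the SequenceFile
  def write(out: DataOutput): Unit = { out.writeInt(a); out.writeInt(b) }
  // and readFields() to repopulate a (possibly reused) instance when reading back
  def readFields(in: DataInput): Unit = { a = in.readInt(); b = in.readInt() }
}

object WritableDemo {
  def main(args: Array[String]): Unit = {
    val bos = new ByteArrayOutputStream()
    new IntPairIO(3, 7).write(new DataOutputStream(bos))

    val p = new IntPairIO()
    p.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray)))
    println(s"${p.a} ${p.b}")  // 3 7
  }
}
```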
File IO for BIDMat on Hadoop is working now. We added a class in BIDMach_Spark to manage the sequence files.
A new script "bidmach" in BIDMat loads the appropriate jar files from a hadoop installation. (set HADOOP_HOME first)
(HDFS frequently used commands)
The steps to build it:
- install and setup Hadoop and start hdfs server
- build BIDMat first, and then copy BIDMat.jar into BIDMach_Spark/lib
- copy hadoop-common-nnn.jar from your hadoop installation and a lz4.jar into BIDMach_Spark/lib.
- run ./sbt package from the BIDMach_Spark directory. Copy BIDMatHDFS.jar into BIDMat/lib.
- run "bidmach" from the BIDMat directory
Try: (if you're on a Mac, this second step may fail because the file is empty. The fix is here)
saveFMat("hdfs://localhost:9000/somefilename", rand(10,10))
val a = loadFMat("hdfs://localhost:9000/somefilename")