Hadoop source code analysis: the Writable interface
1. Source code
package org.apache.hadoop.io;
import java.io.DataOutput;
import java.io.DataInput;
import java.io.IOException;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
@InterfaceAudience.Public
@InterfaceStability.Stable
public interface Writable {
...
}
2. Method details
The write() method
/**
 * Serialize the fields of this object to out.
 * @param out DataOutput to serialize this object into.
 * @throws IOException
 */
void write(DataOutput out) throws IOException;
The readFields() method
/**
 * @param in DataInput to deserialize this object from.
 * @throws IOException (an exception may be thrown)
 */
void readFields(DataInput in) throws IOException;
Deserialize the fields of this object from in. For efficiency, implementations should attempt to re-use storage in the existing object where possible.
A serializable object which implements a simple, efficient, serialization protocol, based on {@link DataInput} and {@link DataOutput}.
Any key or value type in the Hadoop Map-Reduce framework implements this interface.
Implementations typically implement a static read(DataInput) method which constructs
a new instance, calls readFields(DataInput) and returns the instance.
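In short:
- readFields() deserializes the fields of the object from in (the input stream). For efficiency, an implementation should reuse the storage of the existing object wherever it can.
- A Writable is a serializable object that implements a simple, efficient serialization protocol based on DataInput and DataOutput. Every key and value type in the Hadoop MapReduce framework must implement this interface.
- A typical implementation also provides a static read(DataInput) method, which constructs a new instance, calls readFields(DataInput) on it, and returns that instance.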
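The advice about reusing storage can be illustrated with a minimal sketch. It assumes a hypothetical stand-in interface, SimpleWritable, that mirrors the two methods of Writable so the code compiles without Hadoop on the classpath; the names ReuseDemo, Record, and sumCounters are likewise made up for illustration. Three records are serialized back to back and then read into a single reused instance rather than three new objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ReuseDemo {
    // Hypothetical stand-in for org.apache.hadoop.io.Writable (same two methods),
    // declared here only so the sketch runs without Hadoop.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    static class Record implements SimpleWritable {
        int counter;
        long timestamp;

        Record() {}

        Record(int counter, long timestamp) {
            this.counter = counter;
            this.timestamp = timestamp;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(counter);
            out.writeLong(timestamp);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Overwrites the fields of this object; no new object is allocated.
            counter = in.readInt();
            timestamp = in.readLong();
        }
    }

    // Writes three records, then reads them all back into ONE reused instance
    // and returns the sum of their counters.
    public static long sumCounters() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            for (int i = 1; i <= 3; i++) {
                new Record(i, 1000L * i).write(out);
            }
            out.close();

            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
            Record reused = new Record();
            long sum = 0;
            for (int i = 0; i < 3; i++) {
                reused.readFields(in); // same object every iteration
                sum += reused.counter;
            }
            in.close();
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumCounters());
    }
}
```

The design point is that readFields() mutates the receiver, so a loop over many records needs only one object. A caller that wants to keep a record beyond the next readFields() call has to copy it first.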
3. Example
public class MyWritable implements Writable {
// Some data
private int counter;
private long timestamp;
public void write(DataOutput out) throws IOException {
out.writeInt(counter);
out.writeLong(timestamp);
}
public void readFields(DataInput in) throws IOException {
counter = in.readInt();
timestamp = in.readLong();
}
public static MyWritable read(DataInput in) throws IOException {
MyWritable w = new MyWritable();
w.readFields(in);
return w;
}
}
4. Implementing classes
4.1 IntWritable
- Class description
A WritableComparable for ints.
That is, IntWritable is the WritableComparable type for Java's primitive type int.
- Constructors
IntWritable()
IntWritable(int value)
The first is a no-argument constructor; the second takes a single int argument.
The set() method
public void set(int value)
Set the value of this IntWritable.
The get() method
public int get()
Return the value of this IntWritable.
… Because IntWritable implements the WritableComparable interface, it also has other methods such as equals(), readFields(), and write(). They are not covered one by one here.
- Code example
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;
import java.io.*;
public class WritableDemo {
public static byte[] serialize(Writable writable) throws IOException {
ByteArrayOutputStream out = new ByteArrayOutputStream();
DataOutputStream dataOut = new DataOutputStream(out);
writable.write(dataOut);
dataOut.close();
return out.toByteArray();
}
public static byte[] deserialize(Writable writable,byte[] bytes) throws IOException {
ByteArrayInputStream in = new ByteArrayInputStream(bytes);
DataInputStream dataIn = new DataInputStream(in);
writable.readFields(dataIn);
dataIn.close();
return bytes;
}
public static void main(String[] args) throws IOException {
/* step 1: serialize the IntWritable object into bytes */
IntWritable inW = new IntWritable(163);
byte[] bytes = serialize(inW);
System.out.println("bytes.length: "+bytes.length);
/* step 2: use the bytes above as the input stream and deserialize them into outW */
IntWritable outW = new IntWritable();
deserialize(outW, bytes);
System.out.println(outW.get());
}
}
The serialize() helper wraps a java.io.ByteArrayOutputStream in a java.io.DataOutputStream (an implementation of java.io.DataOutput) to capture the bytes in the serialized stream.
The DataOutputStream class is described as follows:
A data output stream lets an application write primitive Java data types to an output stream in a portable way. An application can then use a data input stream to read the data back in.
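In other words, a data output stream lets an application write Java primitive types to an output stream in a portable format, and a data input stream can then be used to read them back.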
Since there is a DataOutputStream class, there is also a DataInputStream class:
A data input stream lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application uses a data output stream to write data that can later be read by a data input stream.
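As a rough illustration of the two stream classes working together, the sketch below writes an int and a long with DataOutputStream and reads them back with DataInputStream, using only java.io. The class and method names (DataStreamRoundTrip, encode, decode) are hypothetical and chosen for this example. It also shows why the serialized form of a single int is 4 bytes long: writeInt() always emits four bytes, high byte first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class DataStreamRoundTrip {
    // Writes an int (4 bytes) followed by a long (8 bytes) and returns the raw bytes.
    public static byte[] encode(int counter, long timestamp) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(counter);
            out.writeLong(timestamp);
            out.close();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the values back in the same order they were written.
    public static long[] decode(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            int counter = in.readInt();
            long timestamp = in.readLong();
            in.close();
            return new long[] {counter, timestamp};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = encode(163, 42L);
        System.out.println("bytes.length: " + bytes.length);
        long[] values = decode(bytes);
        System.out.println(values[0] + " " + values[1]);
    }
}
```

The reader must call the read methods in exactly the order the writer used, because the stream carries no field names or type tags. This is the same contract that write() and readFields() have to honor in a Writable implementation.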
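5. Summary
- The purpose of the Writable interface is to convert objects between their in-memory and serialized forms, that is, serialization and deserialization.
- Note that in any MapReduce code, if you want to write or read a custom type that MapReduce does not already provide, that type must implement the Writable interface and both of its methods; otherwise an error is raised. For an example of the error, see my blog 【】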