IT虾米网

Hadoop Source Code Analysis: The Writable Interface

qq123 | April 29, 2022 | Big Data


1. Source code

package org.apache.hadoop.io; 
 
import java.io.DataOutput; 
import java.io.DataInput; 
import java.io.IOException; 
 
import org.apache.hadoop.classification.InterfaceAudience; 
import org.apache.hadoop.classification.InterfaceStability; 
 
@InterfaceAudience.Public 
@InterfaceStability.Stable 
public interface Writable {
    ...
}

2. Method details

  • write() method
/**
 * Serialize the fields of this object to out.
 *
 * @param out DataOutput to serialize this object into.
 * @throws IOException if writing fails
 */
void write(DataOutput out) throws IOException;
  • readFields() method
/**
 * @param in DataInput to deserialize this object from.
 * @throws IOException if reading fails
 */
void readFields(DataInput in) throws IOException;

Deserialize the fields of this object from in. For efficiency, implementations should attempt to re-use storage in the existing object where possible.

A serializable object which implements a simple, efficient serialization protocol, based on DataInput and DataOutput.

Any key or value type in the Hadoop Map-Reduce framework implements this interface.

Implementations typically implement a static read(DataInput) method which constructs a new instance, calls readFields(DataInput), and returns the instance.

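  • readFields() deserializes the object's fields from the input stream in. For efficiency, an implementation should reuse the storage of the existing object wherever it can.
  • A Writable is a serializable object that implements a simple, efficient serialization protocol built on DataInput and DataOutput.
  • Every key and value type in the Hadoop MapReduce framework must implement this interface.
  • A typical implementation also provides a static read(DataInput) method that constructs a new instance, calls readFields(DataInput) on it, and returns that instance.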
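The advice about reusing storage can be made concrete with a small sketch. It is not taken from the Hadoop source: the Writable interface below is a local stand-in that mirrors org.apache.hadoop.io.Writable so the code compiles without Hadoop on the classpath, and Counter and ReuseDemo are hypothetical names used only for illustration. It reads several serialized records into one object instead of allocating a new object per record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Local stand-in mirroring org.apache.hadoop.io.Writable (assumption:
// used here only so the sketch runs without the Hadoop dependency).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record type holding a single int.
class Counter implements Writable {
    int value;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(value);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        value = in.readInt(); // overwrites the existing field, no new object
    }
}

class ReuseDemo {
    // Writes each value as one serialized Counter record.
    static byte[] encode(int... values) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            Counter c = new Counter();
            for (int v : values) {
                c.value = v;
                c.write(out);
            }
        }
        return buf.toByteArray();
    }

    // Reads `count` records back into ONE reused Counter and sums them.
    static long sum(byte[] bytes, int count) throws IOException {
        Counter reused = new Counter();
        long total = 0;
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes))) {
            for (int i = 0; i < count; i++) {
                reused.readFields(in);
                total += reused.value;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode(1, 2, 3);
        System.out.println(bytes.length + " bytes, sum = " + sum(bytes, 3));
    }
}
```

Because readFields() overwrites the fields of the object it is called on, the loop in sum() allocates a single Counter no matter how many records it reads. The same property means a caller that wants to keep a record after the next readFields() call has to copy it first.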

3. Example

  • Example
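    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.Writable;

    public class MyWritable implements Writable {
      // Some data
      private int counter;
      private long timestamp;

      // Fields are written in a fixed order...
      @Override
      public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
      }

      // ...and must be read back in exactly the same order.
      @Override
      public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
      }

      // Convenience factory: build a new instance and populate it from in.
      public static MyWritable read(DataInput in) throws IOException {
        MyWritable w = new MyWritable();
        w.readFields(in);
        return w;
      }
    }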

4. Implementing classes

4.1 IntWritable
  • Class description

A WritableComparable for ints.

IntWritable is the WritableComparable counterpart of Java's primitive type int.

  • Constructors
IntWritable()  
IntWritable(int value)  

The first is a no-argument constructor; the second takes a single int argument.

  • set() method
    public void set(int value) 
    Set the value of this IntWritable. 
  • get() method
    public int get() 
    Return the value of this IntWritable. 

… Because IntWritable implements the WritableComparable interface, it also has other methods such as equals(), readFields(), and write(). They are not covered one by one here.

  • Code example
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

import java.io.*;

public class WritableDemo {

    // Serializes any Writable into a byte array.
    public static byte[] serialize(Writable writable) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        writable.write(dataOut);
        dataOut.close();
        return out.toByteArray();
    }

    // Populates the given Writable from bytes; returns the input bytes unchanged.
    public static byte[] deserialize(Writable writable, byte[] bytes) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        DataInputStream dataIn = new DataInputStream(in);
        writable.readFields(dataIn);
        dataIn.close();
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        // Step 1: serialize an IntWritable into a byte array.
        IntWritable inW = new IntWritable(163);
        byte[] bytes = serialize(inW);
        System.out.println("bytes.length: " + bytes.length);

        // Step 2: deserialize those bytes into a fresh IntWritable, outW.
        IntWritable outW = new IntWritable();
        deserialize(outW, bytes);
        System.out.println(outW.get());
    }
}


The serialize() helper above wraps a java.io.ByteArrayOutputStream in a java.io.DataOutputStream (an implementation of java.io.DataOutput) to capture the bytes in the serialized stream.

The DataOutputStream class is described as follows:

A data output stream lets an application write primitive Java data types to an output stream in a portable way. An application can then use a data input stream to read the data back in.

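In other words, a data output stream lets an application write Java primitive types to an output stream in a portable way, and the application can later read that data back through a data input stream.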

Since there is a DataOutputStream class, there is also a matching DataInputStream:

A data input stream lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application uses a data output stream to write data that can later be read by a data input stream.
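To see what "portable" and "machine-independent" mean in practice, the sketch below uses only java.io and no Hadoop classes. DataStreamDemo and its helper methods are hypothetical names chosen for this illustration. It writes the int 163 with writeInt(), prints the resulting bytes in hex, and reads the value back with readInt().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataStreamDemo {
    // Encodes one int with DataOutputStream.writeInt (4 bytes, high byte first).
    static byte[] encodeInt(int value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(value);
        }
        return buf.toByteArray();
    }

    // Decodes one int with DataInputStream.readInt.
    static int decodeInt(byte[] bytes) throws IOException {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readInt();
        }
    }

    // Formats bytes as lowercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encodeInt(163);
        System.out.println(hex(bytes));       // 000000a3
        System.out.println(decodeInt(bytes)); // 163
    }
}
```

writeInt() always emits four bytes with the high byte first, regardless of the platform's native byte order, which is why the data can be read back on any machine. It also explains why the WritableDemo program reports a length of 4 for an IntWritable, assuming (as the Hadoop source suggests) that IntWritable.write() delegates to writeInt().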

5. Summary

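  • The purpose of the Writable interface is to convert objects to and from their serialized form (serialization <-> deserialization).
  • Note that in any MapReduce code, if you want to write or read a type that is not one of the types MapReduce already provides, that type must implement the Writable interface and its two methods. Otherwise the job fails with an error. For an example of the error, see my blog post 【】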
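As a rough illustration of the second point, here is what a custom type with a variable-length field might look like. This is a sketch under stated assumptions, not code from the Hadoop project: the Writable interface is again a local stand-in for org.apache.hadoop.io.Writable so the example runs on its own, and PageVisit with its url and hits fields is a hypothetical type. In a real job the class would import the Hadoop interface instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Local stand-in mirroring org.apache.hadoop.io.Writable (assumption:
// declared here only so the sketch compiles without Hadoop).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical custom value type: a URL plus a hit count.
class PageVisit implements Writable {
    private String url = "";
    private int hits;

    // No-argument constructor, so an empty instance can be created
    // first and then filled in by readFields().
    PageVisit() {}

    PageVisit(String url, int hits) {
        this.url = url;
        this.hits = hits;
    }

    String getUrl() { return url; }
    int getHits() { return hits; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);  // length-prefixed string
        out.writeInt(hits);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF(); // same order as in write()
        hits = in.readInt();
    }

    // Serializes `original` to bytes and reads it back into a new object.
    static PageVisit roundTrip(PageVisit original) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            original.write(out);
        }
        PageVisit copy = new PageVisit();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            copy.readFields(in);
        }
        return copy;
    }

    public static void main(String[] args) throws IOException {
        PageVisit copy = roundTrip(new PageVisit("/index.html", 42));
        System.out.println(copy.getUrl() + " " + copy.getHits());
    }
}
```

The two methods have to agree on field order and encoding, because the serialized form carries no field names. A type used as a key, as opposed to a value, would additionally need to be comparable, which is what the WritableComparable interface mentioned in section 4.1 is for.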
