The Journal of China Universities of Posts and Telecommunications ›› 2011, Vol. 18 ›› Issue (1): 121-128. doi: 10.1016/S1005-8885(10)60037-4

• Others •

Design of new format for mass data compression

Qin Jian-Cheng (覃健诚)1, Bai Zhong-Ying (白中英)2

  1. School of Computer Science, Beijing University of Posts and Telecommunications (2008 doctoral class)
    2.
  • Received: 2010-01-08 Revised: 2010-11-08 Online: 2011-02-28 Published: 2011-02-28
  • Contact: Qin Jian-Cheng E-mail: dragon_2k@21cn.com

Abstract:

In lossless compression, most traditional software falls short when handling mass data: its compression capability is limited by the data window size and by the design of the compression format. This paper presents a new compression format, the ‘CZ format’, which supports data window sizes of up to 4 GB and offers advantages for mass data compression. A shareware compressor named ‘ComZip’ is built on this format. Experimental results indicate that ComZip achieves a better compression ratio than WinZip, Bzip2 and WinRAR in most cases, especially when gigabytes or terabytes of data are compressed. ComZip also has the potential to outperform 7-zip in the future, once the data window size exceeds 128 MB.

Key words:

mass data coding, lossless compression, LZ77/LZSS algorithm, arithmetic coding
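
The abstract's point that compression is limited by the data window size can be illustrated with a toy LZ77/LZSS-style tokenizer. The sketch below is not the CZ format or ComZip's implementation; the function names `lzss_tokens` and `lzss_decode`, the greedy brute-force search, and the tiny window sizes are all assumptions chosen only for illustration. It shows that a repeated block lying farther back than the window must be re-emitted as literals, while a larger window can encode it as a single back-reference.

```python
def lzss_tokens(data: bytes, window: int, min_match: int = 3, max_match: int = 258):
    """Greedy LZSS-style tokenizer (illustrative, brute-force search).

    Returns a list of ("lit", byte) or ("match", distance, length) tokens.
    Only the last `window` bytes before the current position are searched.
    """
    tokens = []
    i, n = 0, len(data)
    while i < n:
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Matches may overlap the current position, as in LZ77.
            while (length < max_match and i + length < n
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lzss_decode(tokens) -> bytes:
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # copy byte by byte so overlaps work
    return bytes(out)


# A 64-byte block, 576 bytes of unrelated filler, then the same block again.
block = bytes(range(64))
filler = bytes(range(64, 256)) * 3
data = block + filler + block

small = lzss_tokens(data, window=256)    # repeat lies outside the window
large = lzss_tokens(data, window=4096)   # repeat lies inside the window

print("tokens with 256-byte window: ", len(small))
print("tokens with 4096-byte window:", len(large))
```

With the small window the second copy of the block is out of reach and is emitted byte by byte; with the large window it collapses into one match token. The same effect, at a far larger scale, is why a format with a bigger data window can find redundancy across gigabytes of input.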
