Monday, 18 January 2021

SAP HANA data compression types

Author: Techrelam, posted at 21:49


SAP HANA Data Compression

Data compression lets you use less storage space for the same amount of data, reduce cache memory consumption, and improve performance because of lower I/O demands. You can compress both large object (LOB) data and regular data.

With SAP HANA column store tables, data can be compressed by a factor of up to 11, which makes it cost-effective to store more data in the SAP HANA database. Column store tables also provide fast data access, search, and complex calculations.

The ratio of the uncompressed data size to the compressed data size is known as the compression factor. The compressed table size is the size the table occupies in the main memory of the SAP HANA database.
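As a quick worked example, the compression factor is simply the uncompressed size divided by the compressed size. The minimal Python sketch below computes it; the function name and the sizes are hypothetical and chosen only to illustrate the "up to 11 times" figure mentioned above.

```python
def compression_factor(uncompressed_bytes: int, compressed_bytes: int) -> float:
    """Return the ratio of uncompressed size to compressed size."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return uncompressed_bytes / compressed_bytes


# Hypothetical sizes: 11 GB of raw data occupying 1 GB in main memory.
uncompressed = 11_000_000_000
compressed = 1_000_000_000
print(f"Compression factor: {compression_factor(uncompressed, compressed):.1f}")
# prints "Compression factor: 11.0"
```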

SAP HANA supports the following data compression types:

DEFAULT
PREFIXED
RLE
CLUSTERED
INDIRECT
SPARSE

DEFAULT (Dictionary)

The standard column store dictionary approach already provides a significant space reduction, because the distinct column values are mapped to value ID numbers, which typically require much less space in memory. Dictionary compression is always used; in addition, any one of the other compression techniques described below can be applied on top of it. Typical scenario: general use.

(Figure: Dictionary compression)
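To make the dictionary idea concrete, here is a minimal Python sketch (not SAP HANA's actual implementation). It assumes a sorted dictionary of distinct values, replaces each column value with its position in that dictionary (its value ID), and computes how many bits one value ID needs. The function names and the sample column are hypothetical.

```python
def dictionary_encode(column):
    """Map each value to its position (value ID) in a sorted dictionary."""
    dictionary = sorted(set(column))
    value_id = {value: i for i, value in enumerate(dictionary)}
    value_ids = [value_id[v] for v in column]
    return dictionary, value_ids


def dictionary_decode(dictionary, value_ids):
    """Rebuild the original column from the dictionary and value ID array."""
    return [dictionary[i] for i in value_ids]


def bits_per_value_id(dictionary):
    """Bits needed to store one value ID (at least 1)."""
    return max(1, (len(dictionary) - 1).bit_length())


column = ["Berlin", "Paris", "Berlin", "Rome", "Paris", "Berlin"]
dictionary, value_ids = dictionary_encode(column)
print(dictionary)                     # ['Berlin', 'Paris', 'Rome']
print(value_ids)                      # [0, 1, 0, 2, 1, 0]
print(bits_per_value_id(dictionary))  # 2
```

Each city name is stored once, and every row only needs a 2-bit value ID instead of the full string.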

PREFIXED (Prefix encoding)

Identical values at the beginning of the value ID array are stored only once, together with the number of occurrences. Typical scenario: a single predominant column value.

(Figure: Prefix encoding compression)
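The following Python sketch illustrates prefix encoding on a value ID array: the leading run of identical value IDs is stored once with its count, and the rest of the array is kept as is. The function names and data are hypothetical; this shows the idea, not HANA's internal format.

```python
def prefix_encode(value_ids):
    """Store the leading run of identical value IDs once, plus its length."""
    if not value_ids:
        return None, 0, []
    first = value_ids[0]
    count = 1
    while count < len(value_ids) and value_ids[count] == first:
        count += 1
    return first, count, value_ids[count:]


def prefix_decode(prefix_value, prefix_count, rest):
    """Expand the stored prefix and append the remaining value IDs."""
    return [prefix_value] * prefix_count + rest


value_ids = [0, 0, 0, 0, 0, 1, 2, 0]
encoded = prefix_encode(value_ids)
print(encoded)  # (0, 5, [1, 2, 0])
```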

RLE (Run-length encoding)

Consecutive identical value IDs are replaced with a single instance of this value ID and its start position. Typical scenario: several frequent column values.
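A minimal Python sketch of run-length encoding as described above: each run is stored as a (value ID, start position) pair. Because only start positions are kept, the sketch assumes the total array length is known when decoding, so each run ends where the next one starts. Names and data are hypothetical.

```python
def rle_encode(value_ids):
    """Return (value_id, start_position) pairs, one per run."""
    runs = []
    for pos, vid in enumerate(value_ids):
        if not runs or runs[-1][0] != vid:
            runs.append((vid, pos))
    return runs


def rle_decode(runs, length):
    """Rebuild the array; each run ends where the next one starts."""
    value_ids = []
    for i, (vid, start) in enumerate(runs):
        end = runs[i + 1][1] if i + 1 < len(runs) else length
        value_ids.extend([vid] * (end - start))
    return value_ids


value_ids = [3, 3, 3, 1, 1, 2, 2, 2, 2]
runs = rle_encode(value_ids)
print(runs)  # [(3, 0), (1, 3), (2, 5)]
```

Nine value IDs shrink to three pairs; the more sorted or grouped the column is, the fewer runs remain.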

CLUSTERED (Cluster encoding)

The value ID array is cut into clusters of 1024 elements. If a cluster contains only occurrences of a single value, the cluster is replaced by a single occurrence of that value. Typical scenario: several frequent column values.
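The Python sketch below illustrates cluster encoding with the 1024-element clusters named above. To know which clusters were collapsed when decoding, it keeps one flag per cluster; that flag list is an assumption of this sketch, not a documented HANA structure. Names and data are hypothetical.

```python
CLUSTER_SIZE = 1024  # cluster length named in the text


def cluster_encode(value_ids, cluster_size=CLUSTER_SIZE):
    """Replace each single-valued cluster with one value ID.

    Returns one flag per cluster (True = collapsed) and the stored clusters.
    """
    collapsed = []
    clusters = []
    for start in range(0, len(value_ids), cluster_size):
        cluster = value_ids[start:start + cluster_size]
        if len(set(cluster)) == 1:
            collapsed.append(True)
            clusters.append([cluster[0]])
        else:
            collapsed.append(False)
            clusters.append(cluster)
    return collapsed, clusters


def cluster_decode(collapsed, clusters, length, cluster_size=CLUSTER_SIZE):
    """Expand collapsed clusters back to their full size."""
    value_ids = []
    for i, (flag, cluster) in enumerate(zip(collapsed, clusters)):
        if flag:
            # The last cluster may be shorter than cluster_size.
            size = min(cluster_size, length - i * cluster_size)
            value_ids.extend(cluster * size)
        else:
            value_ids.extend(cluster)
    return value_ids


# 2048 copies of value ID 7, then a mixed tail of 1000 elements.
value_ids = [7] * 2048 + [i % 3 for i in range(1000)]
collapsed, clusters = cluster_encode(value_ids)
print(collapsed)                      # [True, True, False]
print(sum(len(c) for c in clusters))  # 1002
```

Here 3048 value IDs are reduced to 1002 stored entries, because the first two clusters each hold only value ID 7.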

INDIRECT (Indirect encoding)

The value ID array is cut into clusters of 1024 elements. If a cluster contains only a few distinct value IDs, a cluster-specific dictionary is created so that each value ID can be represented with even fewer bits. Typical scenario: several frequent column values.
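A minimal Python sketch of indirect encoding: for each 1024-element cluster it builds a cluster-specific dictionary and uses it only when the local value IDs need fewer bits than the global ones. The "fewer bits" rule is an assumed stand-in for "only a few distinct value IDs"; the real threshold used by SAP HANA is not described here. Names and data are hypothetical.

```python
CLUSTER_SIZE = 1024  # cluster length named in the text


def bit_width(n_entries):
    """Bits needed to address n_entries distinct entries (at least 1)."""
    return max(1, (n_entries - 1).bit_length())


def indirect_encode(value_ids, global_dict_size, cluster_size=CLUSTER_SIZE):
    """Give a cluster its own small dictionary when that saves bits per value ID."""
    global_bits = bit_width(global_dict_size)
    encoded = []
    for start in range(0, len(value_ids), cluster_size):
        cluster = value_ids[start:start + cluster_size]
        local_dict = sorted(set(cluster))
        local_bits = bit_width(len(local_dict))
        if local_bits < global_bits:
            # Store positions in the cluster-specific dictionary instead of global IDs.
            position = {vid: i for i, vid in enumerate(local_dict)}
            encoded.append(("indirect", local_bits, local_dict, [position[v] for v in cluster]))
        else:
            # Too many distinct IDs: keep the cluster with global value IDs.
            encoded.append(("plain", global_bits, None, cluster))
    return encoded


def indirect_decode(encoded):
    """Translate local positions back to global value IDs."""
    value_ids = []
    for kind, _bits, local_dict, data in encoded:
        if kind == "indirect":
            value_ids.extend(local_dict[i] for i in data)
        else:
            value_ids.extend(data)
    return value_ids


# Global dictionary with 1000 entries (10-bit value IDs).
value_ids = [500, 501, 502] * 341 + [500]     # cluster 1: only 3 distinct IDs
value_ids += [i % 1000 for i in range(1024)]  # cluster 2: 1000 distinct IDs
encoded = indirect_encode(value_ids, global_dict_size=1000)
print([(kind, bits) for kind, bits, _, _ in encoded])
# [('indirect', 2), ('plain', 10)]
```

The first cluster drops from 10 to 2 bits per value ID. For simplicity the sketch ignores the space taken by the cluster-specific dictionaries themselves.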

SPARSE (Sparse encoding)

The most frequent value is removed from the value ID array. A bit vector indicates at which positions the value was removed. Typical scenario: a single predominant column value when the value ID array is not well clustered.
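The Python sketch below shows sparse encoding: the most frequent value ID is removed, a bit vector marks where it was (1 = removed), and only the other value IDs are kept. For readability the bit vector is a plain list of 0/1 integers rather than packed bits. Names and data are hypothetical.

```python
from collections import Counter


def sparse_encode(value_ids):
    """Remove the most frequent value ID and mark its positions in a bit vector."""
    if not value_ids:
        return None, [], []
    most_common, _ = Counter(value_ids).most_common(1)[0]
    bit_vector = [1 if v == most_common else 0 for v in value_ids]
    remaining = [v for v in value_ids if v != most_common]
    return most_common, bit_vector, remaining


def sparse_decode(most_common, bit_vector, remaining):
    """Reinsert the removed value wherever the bit vector is set."""
    it = iter(remaining)
    return [most_common if bit else next(it) for bit in bit_vector]


value_ids = [0, 4, 0, 0, 2, 0, 0, 0, 4, 0]
most_common, bit_vector, remaining = sparse_encode(value_ids)
print(most_common)  # 0
print(bit_vector)   # [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
print(remaining)    # [4, 2, 4]
```

Unlike prefix or run-length encoding, this works even when the predominant value is scattered across the array.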

Additionally, implicit compression of dictionary entries may be performed:

Compression of consecutive dictionary entries ("delta compression")
Representation of individual characters with a minimum number of bits
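Both ideas can be illustrated in Python. For delta compression, the sketch below uses front coding, one common way to compress consecutive entries of a sorted dictionary: each entry stores how many leading characters it shares with the previous entry plus the remaining suffix. For the second point, it computes the minimum number of bits needed to give each distinct character its own code. Both are assumed illustrations; SAP HANA's exact dictionary format is not described here, and the names and data are hypothetical.

```python
def common_prefix_length(a, b):
    """Number of leading characters shared by a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def front_encode(sorted_entries):
    """Store each entry as (shared prefix length, remaining suffix)."""
    encoded = []
    previous = ""
    for entry in sorted_entries:
        shared = common_prefix_length(previous, entry)
        encoded.append((shared, entry[shared:]))
        previous = entry
    return encoded


def front_decode(encoded):
    """Rebuild entries from the previous entry's prefix plus the stored suffix."""
    entries = []
    previous = ""
    for shared, suffix in encoded:
        entry = previous[:shared] + suffix
        entries.append(entry)
        previous = entry
    return entries


def bits_per_character(entries):
    """Minimum bits to give each distinct character in the entries its own code."""
    alphabet = set("".join(entries))
    return max(1, (len(alphabet) - 1).bit_length())


dictionary = ["Hamburg", "Hamm", "Hannover", "Heidelberg"]
print(front_encode(dictionary))
# [(0, 'Hamburg'), (3, 'm'), (2, 'nnover'), (1, 'eidelberg')]
print(bits_per_character(dictionary))  # 4 (14 distinct characters)
```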

Reference: SAP Note