With the SAP HANA database, SAP has made performance a top priority. This is very welcome, because long runtimes were always a nuisance when working with the software: it was not uncommon to wait several minutes for more complex tasks or reports.
The SAP HANA database achieves very good performance through concepts such as in-memory computing and column-oriented storage of database tables. The SAP NetWeaver ABAP platform has been enhanced accordingly to better exploit this potential. These enhancements include the AMDP framework, which allows SQLScript procedures to be called directly from ABAP code, and Core Data Services (CDS), which can now be used to model complex views on the NetWeaver system.
In addition, Open SQL, an elementary component of the ABAP language, has been enhanced with some important features. With these resources, the classic SAP applications were also specifically optimized for SAP HANA. SAP S/4HANA and SAP BW/4HANA are two variants of the ERP system and SAP Business Warehouse (BW) that have been specially adapted for use on SAP HANA.
What is SAP HANA?
SAP HANA is a database and application development platform. It provides many practical tools to cover a wide range of use cases. I would like to look at these different aspects in the following sections.
SAP HANA – a fast SQL database
The term HANA originally stood for High-Performance Analytic Appliance: a column-oriented, relational database using in-memory technology. Below, I briefly describe the most important techniques that contribute to its good performance.
Column Store – column-based storage
In SAP HANA, most data is stored in the so-called column store. This means that all values of a column are stored together. Figure 1.2 shows an example of how the data in Figure 1.1 is stored column by column.
Many operations can be performed much faster in individual columns than in an entire table. For example, aggregate functions only require the data of a single column at a time. Another advantage of column orientation is that each column can act as an index. When selecting column values, you do not have to search the entire table, but only this one column. Memory consumption is also significantly lower with column-oriented storage because of efficient compression for each column.
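The effect of the column layout on aggregates and selections can be illustrated with a minimal sketch. This is not SAP HANA's internal implementation, just the same small table held row-oriented and column-oriented, with a sum and a filter that each touch only a single column:

```python
# Minimal sketch (not HANA internals): the same table stored
# row-oriented and column-oriented, and operations on one column.

rows = [
    {"id": 1, "city": "Berlin",  "revenue": 100},
    {"id": 2, "city": "Hamburg", "revenue": 250},
    {"id": 3, "city": "Berlin",  "revenue": 175},
]

# Column store: each column's values are stored contiguously.
columns = {
    "id":      [r["id"] for r in rows],
    "city":    [r["city"] for r in rows],
    "revenue": [r["revenue"] for r in rows],
}

# An aggregate only needs to read the one column it operates on ...
total = sum(columns["revenue"])  # 525

# ... and a selection scans a single column like an index,
# yielding the positions of the matching records.
hits = [i for i, c in enumerate(columns["city"]) if c == "Berlin"]  # [0, 2]
```

In the row-oriented layout, both operations would have to walk over every field of every record; in the column layout they read exactly one contiguous list.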
Because of these advantages, SAP recommends storing tables in the column store for practically all use cases. For more information about the Column Store and Row Store storage types, see SAP Note 2222277, titled FAQ: SAP HANA Column Store and Row Store.
It goes without saying that a database is faster if the data does not first have to be read from the hard disk. Access to main memory is roughly 30 to 40 times faster than reading from an SSD. If the data is already in the CPU cache, the speed-up is roughly another factor of 10.
Disadvantages of in-memory technology
However, in-memory technology has two serious drawbacks:
- The cost of main memory is considerably higher than that of hard disk storage. This problem is mitigated on the one hand by continuously falling hardware prices, and on the other hand by the compression of the data (see the following section), which reduces the amount of main memory required.
- In the event of a power failure or hardware defect, the data in main memory is immediately lost. Since the data is not stored in parallel on hard disk, in contrast to the in-memory approaches of other database vendors, a different concept is necessary. SAP uses a combination of savepoints, which regularly write the contents of main memory to disk, and the transaction log, to which every changing database access is written. When the database restarts, the last savepoint is loaded and the subsequent changes are replayed from the transaction log. This ensures the persistence and consistency of the data at all times.
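The savepoint-plus-log recovery scheme can be sketched in a few lines. The class and method names below are purely illustrative, not HANA's actual data structures; the point is only the recovery rule: restored state = last savepoint + replay of all later log entries.

```python
# Hedged sketch of the savepoint/transaction-log idea. In a real
# database, savepoint and log live on disk and survive a crash;
# here they are plain attributes for illustration.

import copy

class MiniStore:
    def __init__(self):
        self.memory = {}     # in-memory data (lost on power failure)
        self.log = []        # transaction log (persisted on disk)
        self.savepoint = {}  # last savepoint (persisted on disk)

    def write(self, key, value):
        self.log.append((key, value))  # log first, then apply in memory
        self.memory[key] = value

    def take_savepoint(self):
        self.savepoint = copy.deepcopy(self.memory)
        self.log.clear()  # entries before the savepoint are no longer needed

    def recover(self):
        # Simulated restart: rebuild memory from savepoint + log replay.
        self.memory = copy.deepcopy(self.savepoint)
        for key, value in self.log:
            self.memory[key] = value

store = MiniStore()
store.write("a", 1)
store.take_savepoint()
store.write("b", 2)   # recorded only in memory and in the log
store.recover()       # after a crash: savepoint plus log replay
# store.memory == {"a": 1, "b": 2}
```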
In-memory technology not only speeds up read operations, because the data does not first have to be copied into main memory. Write operations are also considerably faster, partly because additional indexes are largely unnecessary.
The data is compressed column by column. Unlike the well-known ZIP procedure, no complex algorithms are used. Instead, several simple compression types are employed, each of which allows very efficient packing and unpacking. The standard algorithm is dictionary compression: each distinct value occurring in a column is assigned a numeric key as a 4-byte integer, and only these value keys need to be stored in the column.
This alone achieves good compression rates for most columns. Figure 1.3 shows an example of this procedure. Very high compression rates can be achieved, especially for columns with only a few distinct values.
Figure 1.3: Example of dictionary compression
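The principle behind dictionary compression can be demonstrated with a short sketch. The helper functions below are hypothetical illustrations, not HANA code: each distinct value in a column receives an integer key, and the column itself then stores only the small keys.

```python
# Illustrative dictionary compression: each distinct value gets an
# integer key; the stored column holds only the keys.

def dictionary_encode(column):
    dictionary = []  # position in this list = value key
    index = {}       # value -> key, for fast lookup while encoding
    encoded = []
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        encoded.append(index[value])
    return dictionary, encoded

def dictionary_decode(dictionary, encoded):
    # Unpacking is a simple key lookup per entry.
    return [dictionary[key] for key in encoded]

cities = ["Berlin", "Hamburg", "Berlin", "Berlin", "Hamburg"]
dictionary, encoded = dictionary_encode(cities)
# dictionary == ["Berlin", "Hamburg"], encoded == [0, 1, 0, 0, 1]
assert dictionary_decode(dictionary, encoded) == cities
```

The example also shows why columns with few distinct values compress so well: the five city strings shrink to a two-entry dictionary plus five small integers.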
|CPU load during packing and unpacking|
The obvious disadvantage is the CPU load incurred when packing and unpacking. There are two strategies to counter this:
- The packing of the data always takes place only during the delta merge operation (described in the following section, “Insert-Only”). This happens in parallel with other insert and read operations and therefore does not affect running applications.
- Unpacking is only ever done for the columns that are actually needed. For a join, for example, only the columns relevant to the join condition are unpacked initially. This allows the relevant records for the result to be determined quickly. The other columns are read, and thus unpacked, only afterwards; this work can be distributed very well across several processors in parallel. Figure 11.27 shows an example of the different database plan operators for a simple join: the plan operator JERequestedAttributes reads the corresponding attributes, which are then combined into result rows by JEAssembleResults.
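The second strategy, often called late materialization, can be sketched as follows. The two-step join below is a simplified illustration with made-up tables, not the actual plan operators: step 1 looks only at the join columns to find matching row positions, and step 2 fetches the remaining result columns only for those positions.

```python
# Hedged sketch of the "unpack only what you need" join strategy,
# on two small column-store tables.

orders = {"customer_id": [1, 2, 1, 3], "amount": [10, 20, 30, 40]}
customers = {"id": [1, 2], "name": ["Alice", "Bob"]}

# Step 1: evaluate the join condition using ONLY the join columns,
# collecting pairs of matching row positions.
positions = [
    (i, j)
    for i, cid in enumerate(orders["customer_id"])
    for j, oid in enumerate(customers["id"])
    if cid == oid
]

# Step 2: assemble the result rows, reading (and unpacking) the
# remaining columns only for the matching positions.
result = [
    (customers["name"][j], orders["amount"][i])
    for i, j in positions
]
# result == [("Alice", 10), ("Bob", 20), ("Alice", 30)]
```

Because step 2 is an independent lookup per result row and per column, it is the part that parallelizes well across processors.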
For the different compression methods and even more background information on the topic, see SAP Note 2112604 titled FAQ: SAP HANA Compression.
The SAP HANA database uses an insert-only approach for all changing operations. This does not mean that records cannot be updated or deleted. However, existing records in the main storage are never changed in place during operation. This memory area is optimized for read access and low memory consumption; changes here would be relatively expensive.
Instead, changes are written to the so-called delta storage (see Figure 1.4), an additional memory area optimized for write access. The data here is kept uncompressed. New records are inserted directly into the delta storage. When a record is deleted, it is marked for deletion in the main or delta storage. When an existing record is changed, the original record is marked for deletion and a new version is written to the delta storage.
The delta merge operation writes the changes from the delta storage back to the main storage. Records marked for deletion are removed, so that only one valid version of each record remains in the main storage. After the delta merge, the delta storage is empty.
The delta merge is usually triggered automatically, depending on the configuration of the database. For data transfer processes (DTP) in SAP BW, an explicit delta merge can be configured after execution.
Figure 1.4: Operations on Main and Delta Storage
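The interplay of main storage, delta storage, deletion marks, and the delta merge can be condensed into a small model. The class below is a deliberately simplified illustration (not HANA's real data structures): the main storage is never modified in place, every change lands in the delta storage, and the merge rebuilds the main storage without the marked records.

```python
# Illustrative insert-only model: main storage is read-only,
# changes go to the delta storage, deletions are only marked,
# and a delta merge produces a new, consolidated main storage.

class ColumnTable:
    def __init__(self):
        self.main = []        # read-optimized, never changed in place
        self.delta = []       # write-optimized, uncompressed
        self.deleted = set()  # (storage, position) pairs marked for deletion

    def insert(self, record):
        self.delta.append(record)

    def delete(self, storage, position):
        self.deleted.add((storage, position))

    def update(self, storage, position, record):
        # Insert-only: mark the old version, write a new one to delta.
        self.delete(storage, position)
        self.insert(record)

    def delta_merge(self):
        merged = [r for i, r in enumerate(self.main) if ("main", i) not in self.deleted]
        merged += [r for i, r in enumerate(self.delta) if ("delta", i) not in self.deleted]
        self.main, self.delta, self.deleted = merged, [], set()

t = ColumnTable()
t.main = ["rec1", "rec2"]
t.insert("rec3")
t.update("main", 0, "rec1_v2")  # marks rec1, puts the new version into delta
t.delta_merge()
# t.main == ["rec2", "rec3", "rec1_v2"], t.delta == []
```

Note that `update` never touches the main storage itself; only the merge, running in the background, consolidates the two areas.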
|Information about the table status|
You can use the views M_CS_TABLES and M_CS_COLUMNS to display the current state of a table or its columns. Here you can see, for example, how much data is currently in the delta and main storage.
For background on the delta merge process, see the document “How To – Delta Merge for SAP HANA and SAP BW powered by SAP HANA”.
The four techniques described above complement each other perfectly and compensate for each other’s weaknesses:
- The column-by-column storage allows for good compression of the data.
- Compression in turn reduces memory consumption, so less main memory is needed.
- The insert-only strategy with occasional delta merge means that the compression of the data does not affect the running applications.
- And last but not least, a high-performance database also allows you to dispense with additional indexes, aggregates and materialized views. This reduces the main memory requirements even further.
The combination of techniques makes SAP HANA a very powerful database. They are also described in detail in the SAP Notes mentioned above and in the blog by Werner Daehn (http://s-prs.de/v620800), which is well worth reading.