Treat unallowed characters in SAP BW/4HANA with SQLScript

Treat unallowed characters in SAP BW/4HANA with SQLScript

Veröffentlicht am 6. November 2020 von

Jörg Brandeis

| ABAP | AMDP | BW/4HANA | SQLScript |

With SQLScript, errors can be avoided elegantly and performantly during loading due to the unallowed (i.e. invalid) characters in SAP BW. This article provides an overview of the problem and outlines the solutions in AMDP transformation routines to clean up the data.


The problem

The allowed characters

In SAP BW, not all characters are allowed in the values of InfoObjects. By default, these are only the following characters:

!"%&'()*+,-./:;<=>?_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ

If you need more than this, which is likely to be an SAP customer for an international clientele, you have to set this separately in the BW Customizing Guide or in the RSKC transaction. There are three options that cannot be combined:

Single characters

Either you maintain here a ridiculously short list of 75 more characters. This will not be enough sooner or later, as no umlauts, accents or Asian characters are allowed in the standard.

ALL_CAPITAL

Or you can allow nearly all characters when you use here the ALL_CAPITAL constant (SAP Note 173241 - allowed characters in the BW) .

ALL_CAPITAL_PLUS_HEX

The third option is to use ALL_CAPITAL_PLUS_HEX in the customizing. SAP explicitly discourages this, as control characters are also allowed in the InfoObject values. This can cause problems with the front-end tools (SAP Note 1075403 - unallowed characters and ALL_CAPITAL). This option is intended only for the short-term maintenance of productive operations. If it has to be used, then the data should be cleaned up in a timely manner and then the value should be deferred.

Irrespective of the above settings, a characteristic value must never consist of a simple hash '#' and must not begin with a quotation mark '!'.

Treatment of invalid characters in ABAP

In ABAP transformation routines, there are different approaches to removing the invalid characters. In the blog of Denis Reis, for example, you will find ideas on how to do this manualy as long as you have a positive list of allowed characters (Denis Reis: Useful ABAP transformation routines).

Function

And of course there is more than one function module to do this. With RSKC_CHAVL_CHECK you can check the values against the current settings of the system. And with SCP_REPLACE_STRANGE_CHARS (SAP Note 1062237 - Transliteration of characters with SCP_REPLACE_STRANGE_CHARS) the suspicious characters can be replaced. But there can occur performance problems with this function module(SAP Note 2305674 - Performance of SCP_REPLACE_STRANGE_CHARS is bad).

Analysis of errors due to unallowed characters in SQLScript

If the loading or activation of data stops due to unallowed characters, then the first thing to do is to understand why the error happens. In the ABAP debugger there is the HEX-Code of each character visible.

With the scalar function UNICODE you can determine in SQLScript the corresponding Unicode number of a character ( SAP documentation UNICODE).

Now we only have to analyze every single letter of a faulty value. To do this, we create an anonymous block in the SQL console, in which we disassemble the suspicious strings using the SERIES_GENERATE_INTEGER function. The example goes to a table from the demo data model from my book and needs to be adopted to your needs. The relevant parts are provided with comments.

    DO BEGIN

    lt_suspect_data = SELECT title AS suspect --Replace with the column name
                        FROM tasks            --Replace by your table e.g. "/BIC/A...2"
                       WHERE id = 1;          --Restrict to the corrupt data
                        
                        
    SELECT suspect,
         element_number,
         SUBSTR(suspect, element_number, 1),
         UNICODE(SUBSTR(suspect, element_number, 1))
    FROM :lt_suspect_data
    CROSS JOIN SERIES_GENERATE_INTEGER( 1,1,60) 
    WHERE SUBSTR(suspect, element_number, 1) <> ''
    ORDER BY suspect, 
              element_number;
    END;

Non-Breaking Space

Although the above SAP note mentions above all the control characters as a source of error, there may be other characters that cause problems. In this way, for example, we identified the Unicode character 0160 in a project that did not want to be loaded. It is a so-called non-breaking space (NBSP) that also causes problems in other contexts(SAP notes on this).

Handling Unallowed Characters in SQLScript

Single characters

When it comes to replacing individual characters, the simplest method is the SQL-Function REPLACE. One difficulty with programming is that the character to be replaced is not on the keyboard. But if we know its Unicode character number, then we can use the SQL function NCHAR (<character number> ) to generate the character.

OutTab = SELECT ...
                REPLACE("/BIC/BUTXT", NCHAR(0160), ' ') AS "/BIC/BUTXT",
                ...
           FROM :InTab;

If you want to replace multiple characters in this way, you have to nest the REPLACE function. This reduces the clarity and thus the maintainability.

Replace with regular expressions

Regular expressions describe a pattern that can also be used to make substitutions. In SQLScript, there is the SQL-Function REPLACE_REGEXPR. In these patterns, Unicode characters (see also here) can be entered with the following notation:

x{<HEX-CharacterNumber>}

For the example of NBSP this is x{A0} for example. And with the square bracket you can also search elegantly for several Unicode characters at the same time.

OutTab = SELECT ...
                REPLACE_REGEXPR('[x{A0}\x{8239}]' IN title WITH ' ') AS "/BIC/BUTXT",
                ...
           FROM :InTab;

However, we still have to name all characters individually with their HEX character number.

All control characters

If all control characters HEX00 to HEX1F mentioned in SAP Note 1075403 are to be removed, this can also be done very elegantly with a regular expression. The character class [[:cntrl:]] includes exactly these characters ( Wikipedia RegularExpressions).

The corresponding query will look like this:

OutTab = SELECT ...
                REPLACE_REGEXPR('[[:cntrl:]]' IN title WITH '') AS "/BIC/BUTXT",
                ...
           FROM :InTab;

Or you can use the negation of the character class of all printable characters: [^[:print:]] . This expression already includes the NBSP.

OutTab = SELECT ...
                REPLACE_REGEXPR('[[:cntrl:]\x{00A0}]' IN title WITH '') AS "/BIC/BUTXT",
                ...
           FROM :InTab;

Warning - please test

Unfortunately, regular expressions are extremely fragile. A small thing can render the expression completely unusable. This means that we should always test our logic beforehand. I usually do this in the SQL console with a small example:

create table test_char(line nvarchar(100));

insert into test_char values ('ABCDE');
insert into test_char values ('AB'||nchar(160)||'CDE');
insert into test_char values (nchar(0010)||'ABCDE');
insert into test_char values ('AB'||nchar(10)||'CDE');
insert into test_char values ('ABCDE');

 SELECT line,
        REPLACE_REGEXPR('[[:cntrl:]\x{00A0}]' IN line WITH '#') AS "/BIC/BUTXT"
   from test_char;

Conclusion

There are also very elegant ways in SQLScript to remove the characters that are not allowed. However, in-depth SQLScript knowledge and careful testing are very important here.


I am pleased that this text will help you in your work in the projects. I would be happy about links, quotes with references and, above all, mentions on social media. And of course also about any form of feedback, if you have had other or complementary experiences.