VirtDB’s Mass Data Extractor for SAP
One of the main constraints on extracting large amounts of data from SAP ERP is the memory limitation of the SAP application server. When a query returns a large data set, SAP will by default try to read all of the data into memory, and the extraction fails with an “out of memory” error. Organizations faced with large-scale data extraction typically have ABAP programmers write a cyclic data-processing routine that breaks the SAP data into smaller batches and processes them in a loop rather than as a single data set.
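The batching pattern such cyclic routines implement can be sketched in a few lines of Python (a toy illustration with an in-memory SQLite table; the table name and package size are invented for the demo):

```python
import sqlite3

PACKAGE_SIZE = 1000  # invented package size for the demo

# Toy stand-in for a large SAP table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bseg_demo (belnr INTEGER, wrbtr REAL)")
conn.executemany(
    "INSERT INTO bseg_demo VALUES (?, ?)",
    [(i, i * 1.5) for i in range(5500)],
)

cursor = conn.execute("SELECT belnr, wrbtr FROM bseg_demo ORDER BY belnr")
packages = 0
while True:
    batch = cursor.fetchmany(PACKAGE_SIZE)  # only one package in memory at a time
    if not batch:
        break
    packages += 1
    # ... each package would be transformed and shipped to the target here ...

print(packages)  # 5500 rows in packages of 1000 -> 6 packages
```

The point is simply that memory usage is bounded by the package size rather than by the total size of the result set.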
The Mass Data Extractor removes the need for ABAP programmers to write such cyclic data processes and instead gives SAP users an interface within SAP ERP to extract large data sets automatically.
In previous posts (VirtDB on S4HANA, SAP data mart in Tableau 1, 2) we showcased how VirtDB supports the business in its need for self-service data extraction from SAP on an ad hoc basis. In this article we will explain how the VirtDB Mass Data Extractor – a featured component of our Data Unfolder – helps IT departments and appointed SAP administrators ship large amounts of data from SAP ERP to other target systems.
Start Mass Data Extractor (MDE) by running transaction /N/STAK/MDE on an SAP ERP or S/4HANA system with VirtDB already installed:
First of all, you have to specify the table or view you want to extract the data from.
We will export data from table BSEG, which is typically one of the largest tables in an SAP system.
Then you have to specify the selection criteria (filtering conditions) you want to apply in your query. In order to do this, simply start SE16 and display the selection screen for BSEG. Now we will export all records for company code 1000.
Here you can set up any kind of selection as usual, and then you have to save it as a variant, so we can reference this variant in the Mass Data Extractor.
There is also an option to set up incremental logic in SAP, either based on a date field (like creation/modification dates) or SAP’s change logs (Change Documents…). The incremental logic will be detailed in the second entry of this series.
An important setting is the package size, specifying how many records of the table/view will be processed at once. This way you can limit the memory usage of the extraction process.
The next step is to define the target system you want to load your data into. In this example we will use a Microsoft SQL Server as the target.
Naturally, you must enter the name of the table that will be created in MS SQL. You can also define the load logic, i.e. what happens with the results of each extraction run:
- overwrite the table in MS SQL
- append the new records to the contents of the existing MS SQL table
- merge the new records with the contents of the existing MS SQL table (delta logic based on the key fields of the table)
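The merge (delta) mode can be illustrated with a toy upsert (SQLite syntax here purely for illustration; on MS SQL Server the same logic would typically be expressed with a MERGE statement keyed on the table's key fields):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy target table with a single key field
conn.execute("CREATE TABLE target (belnr INTEGER PRIMARY KEY, wrbtr REAL)")
conn.execute("INSERT INTO target VALUES (1, 10.0), (2, 20.0)")

# New extraction result: key 2 has changed, key 3 is brand new
new_records = [(2, 25.0), (3, 30.0)]

# Merge mode: update rows whose key already exists, insert the rest
conn.executemany(
    "INSERT INTO target VALUES (?, ?) "
    "ON CONFLICT(belnr) DO UPDATE SET wrbtr = excluded.wrbtr",
    new_records,
)

rows = conn.execute("SELECT belnr, wrbtr FROM target ORDER BY belnr").fetchall()
print(rows)  # [(1, 10.0), (2, 25.0), (3, 30.0)]
```

Overwrite mode would instead drop or truncate the table first, and append mode would insert the new records unconditionally.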
Now you have to save these settings as a variant so you can schedule a background job:
After all details are set, press the “Schedule extraction” button and you will be directed to the standard SAP job scheduler.
Here you can set up any kind of periodic job (hourly, daily, every 5 minutes, etc.).
You can monitor the extractions using SM37, the standard SAP job monitor, or you can extract (with the Mass Data Extractor, of course) VirtDB’s detailed log table into your favorite BI tool and build a monitoring dashboard. We will cover this in one of our upcoming blog posts.
As you can see, the job has finished:
You can check out the standard job log and see that the extraction was carried out in packages of 100,000 records.
There is a detailed application log for each package so you can closely monitor the progress of an extraction job.
In total, 531,150 records were loaded into MS SQL in about 13 minutes. Considering that both the SAP server and the MS SQL server were running on the same Acer Aspire V Nitro laptop, this is pretty good. Keep in mind that this includes all transformations and conversions, so no additional steps are necessary in MS SQL.
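For rough intuition, those figures translate into the following throughput (approximate, since the 13 minutes is itself a rounded figure):

```python
# Rough throughput from the figures quoted above
records = 531_150
minutes = 13
per_second = records / (minutes * 60)
print(round(per_second))  # roughly 681 records per second
```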
Let’s see what happened in MS SQL! The table was created automatically using metadata from SAP (dates are dates, numbers are numbers and strings are strings etc.) and the required data was loaded into the table.
Our product also generates a view on top of the generated table with meaningful descriptions for each column. If you use this view as a data source for a BI tool, you won’t struggle with the technical names of the SAP table fields but will instead see a readable description (in any language you want!).
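Conceptually, such a view simply aliases the technical column names to their descriptions. A minimal sketch of the idea (the field-to-description mapping here is hand-picked for illustration; the real mapping comes from SAP’s data-dictionary metadata, and the view VirtDB generates may look different):

```python
# Hypothetical mapping from technical SAP field names to descriptions
fields = {
    "BUKRS": "Company Code",
    "BELNR": "Accounting Document Number",
    "GJAHR": "Fiscal Year",
}

# Build a CREATE VIEW statement that aliases each technical name
select_list = ",\n    ".join(
    f"[{tech}] AS [{desc}]" for tech, desc in fields.items()
)
ddl = f"CREATE VIEW dbo.V_BSEG AS\nSELECT\n    {select_list}\nFROM dbo.BSEG;"
print(ddl)
```

A BI tool pointed at `dbo.V_BSEG` then sees “Company Code” instead of the cryptic `BUKRS`.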
When you have to offload many tables or views from SAP on the same schedule – for example, regular backups of a list of tables or reports – you can use our batch scheduling interface. It lets you set up many extraction jobs on multiple SAP data sources with similar scheduling; this functionality will be presented in the third entry of this series.
Whether you want to give the business more agility by providing SAP users with self-service data management capabilities, letting them freely offload data to BI tools, or you prefer to restrict self-service to data warehouses/data marts so that IT controls the SAP dataflows, both approaches have their benefits. Self-service data supply means more agility for the business; central IT control over SAP dataflows means more security, robustness, and performance. VirtDB’s Data Unfolder is able to cater to both requirements.