Table of Contents

Preface
21
Why Read This Book?
21
Audience
22
Structure of the Book
23
Acknowledgments
28
Conclusion
29
PART I Getting Started
31
1 The Data Fabric for the Intelligent Enterprise
33
1.1 Data Fabric
34
1.1.1 Trends
35
1.1.2 Benefits
37
1.2 Data Orchestration
38
1.3 SAP Business Technology Platform
40
1.4 SAP Data Intelligence
43
1.5 Summary
50
2 Architecture and Capabilities
51
2.1 Genesis of SAP Data Intelligence
52
2.1.1 Features from SAP Leonardo Machine Learning Foundation
54
2.1.2 Evolution from SAP Data Hub to SAP Data Intelligence
58
2.2 SAP Data Intelligence Architecture
60
2.3 Deployment Options and Bring Your Own License Model
63
2.4 Kubernetes Cluster and Containers
68
2.4.1 Overview of Kubernetes
68
2.4.2 Kubernetes Cluster Architecture
75
2.4.3 Container Runtimes
78
2.4.4 Pods and Workloads
79
2.4.5 Resources and Policies
81
2.4.6 Kubernetes and SAP Data Intelligence
83
2.5 SAP Data Intelligence Launchpad
86
2.5.1 Persona-Based Application
86
2.5.2 Overview of Applications
88
2.6 Summary
91
3 Setup and Installation
93
3.1 Landscape Sizing
93
3.1.1 Sizing Various SAP Data Intelligence Components
94
3.1.2 Minimum Sizing and Initial Sizing for SAP Data Intelligence
95
3.1.3 Understanding the T-Shirt Sizing Approach
99
3.2 SAP Cloud Appliance Library
99
3.2.1 Getting Started with SAP Cloud Appliance Library
101
3.2.2 Deploying SAP Solutions in the Cloud
103
3.2.3 Activating and Creating Solution Instances
105
3.2.4 Security Considerations for SAP Cloud Appliance Library
106
3.3 On-Demand Cloud Provisioning and Instance Sizing
107
3.3.1 Sizing with SAP Cloud Appliance Library
108
3.3.2 Supported Cloud Providers for SAP Cloud Appliance Library
109
3.3.3 Understanding Costs and Payments
109
3.3.4 Backing Up, Restoring, and Terminating an Instance
112
3.4 Setting Up SAP Data Intelligence on SAP Cloud Appliance Library
113
3.4.1 Prerequisites for Cloud Provider Account
114
3.4.2 Connecting to SAP Cloud Appliance Library
122
3.4.3 Creating and Accessing the Solution
124
3.4.4 Accessing the Jump Box for Monitoring and Troubleshooting
136
3.4.5 Running the Solution
145
3.4.6 Access through Browser Using Local Hosts File
148
3.4.7 Personalization
149
3.5 SAP Data Intelligence 3.0 Installation On-Premise
150
3.5.1 Planning and Prerequisites for an On-Premise Installation
150
3.5.2 Modular Deployment with SLC Bridge
151
3.5.3 Installing SAP Data Intelligence with the Maintenance Planner and SLC Bridge
154
3.6 Summary
168
4 Using SAP Data Intelligence Applications
169
4.1 SAP Data Intelligence Launchpad Applications
169
4.2 Applications for Data Engineers
172
4.2.1 Connection Management
172
4.2.2 Metadata Explorer
174
4.2.3 Modeler
175
4.2.4 Customer Data Export
176
4.3 Applications for Data Scientists
177
4.3.1 ML Scenario Manager
177
4.3.2 Vora Tools
178
4.4 Applications for Modelers and Auditors
179
4.4.1 Monitoring Applications
180
4.4.2 Audit and System Logs
181
4.5 Applications for System Administrators
182
4.5.1 Policy Management
182
4.5.2 Handling Privileges
184
4.5.3 System Management
184
4.5.4 License Management
188
4.6 Summary
189
PART II Data Management, Orchestration, and Machine Learning
191
5 Metadata-Driven Data Governance
193
5.1 Metadata Explorer for Data Governance
194
5.1.1 Intelligent Information Management with the Discovery Dashboard
195
5.1.2 Metadata Crawlers to Explore, Classify, and Label Data Assets
196
5.1.3 Managing Metadata Data across a Connected System Landscape
196
5.2 Data Profiling to Understand Data
197
5.2.1 Profiling Data Sets from Connections
198
5.2.2 Profiling Actions and Monitor
198
5.2.3 Viewing Profile Fact Sheets
199
5.3 Managing Publications and Data Catalogs
202
5.3.1 Catalog of Published Data Sets
202
5.3.2 Automatic Tags and Hierarchical Tagging
207
5.3.3 Using Tags as Search Filters
211
5.3.4 Managing Publications in the Catalog
211
5.3.5 Lineage Depth Set in Publication Processing
214
5.4 Defining Data Quality Rules and Running Rulebooks
214
5.4.1 Rules Determining Business Data Compliance
215
5.4.2 Categories to Organize Business Rules
219
5.4.3 Using the Match Pattern Operator
220
5.4.4 Running and Monitoring Rulebooks
221
5.4.5 Business Glossary of Terms and Definitions
228
5.5 Data Lineage from Transformation History
230
5.5.1 Lineage Analyses for Tracing Data Sets to Sources
230
5.5.2 Lineage Extraction and Supported Sources
231
5.5.3 Understanding and Configuring the Lineage View
234
5.6 Summary
235
6 Modeling Data Processing Pipelines
237
6.1 Using the SAP Data Intelligence Modeler
237
6.1.1 Flow-Based Paradigm as a Network of Information
238
6.1.2 Data Pipeline Engine in the Flow-Based Modeler
239
6.1.3 Navigating the Modeler Panes and Toolbars
240
6.1.4 Built-In Operators
242
6.1.5 Creating and Validating Graphs
244
6.2 Creating and Managing Connections
250
6.2.1 Creating Connections
250
6.2.2 Connecting to Cloud Foundry
251
6.2.3 Managing Certificates
253
6.2.4 Authorizations for Connections
254
6.3 Self-Service Data Preparation with the Metadata Explorer
255
6.3.1 Preparing Data for Accurate Results and Better Insights
255
6.3.2 Self-Service Data Preparation with the Metadata Explorer
255
6.3.3 Transforming Structured Data Sets
256
6.3.4 Managing Data Preparation Actions
258
6.3.5 Processing Data Preparation Actions
259
6.4 Integrating, Processing, and Orchestrating Workflows
261
6.4.1 Graph Snippets as a Group of Operators
262
6.4.2 Working with Data Workflow Operators
264
6.4.3 Integrating SAP Cloud Applications
266
6.4.4 Change Data Capture Graph
267
6.4.5 Custom Operators
267
6.5 Scheduling and Monitoring Data Pipelines
270
6.5.1 Scheduling and Monitoring Data Pipelines
270
6.5.2 Trace Messages
272
6.5.3 Tracking Model Metrics
273
6.5.4 Kubernetes Dashboard and Cluster Logs
273
6.6 Summary
273
7 Creating Operators and Data Types
275
7.1 Creating Custom Operators
276
7.1.1 Visibility of Events
277
7.1.2 Compatibility of Port Types
277
7.1.3 Creating and Editing Operators
281
7.2 Implementing Runtime Operators
288
7.2.1 Subengines in SAP Data Intelligence Modeler
288
7.2.2 Working with Subengines to Create Operators
289
7.3 Creating Data Types
290
7.3.1 Predefined Global Scalar Types
291
7.3.2 Defining Your Own Custom Data Types
292
7.3.3 Leveraging Data Types in Graphs
293
7.4 Summary
293
8 Building Docker Images
295
8.1 Containers in Pods and Pods in Clusters
295
8.1.1 Delivery of Data-Driven Applications
295
8.1.2 Helm: Package Manager for Kubernetes
296
8.1.3 Dockerfiles: Predefined Runtime Environments
297
8.2 Assembling a Docker Image
298
8.2.1 Building Docker Images through Dockerfiles
298
8.2.2 Enhancing Docker Images with Different Package Managers
302
8.3 Dockerfile Inheritance
303
8.4 Using Docker with Python
305
8.5 Summary
308
9 Machine Learning
309
9.1 Machine Learning with SAP
310
9.1.1 Machine Learning Solutions in the SAP Landscape
311
9.1.2 TEI Methodology in Machine Learning
313
9.1.3 Transforming Business Use Cases with Machine Learning
318
9.1.4 Data-Driven Approach versus Traditional Rule-Based Approach
319
9.1.5 Machine Learning Tasks in Enterprise Contexts
321
9.1.6 Architectural Principles for Machine Learning
325
9.2 Machine Learning with SAP Data Intelligence
328
9.2.1 Scalable Data Pipelines in Complex Data Landscapes
329
9.2.2 Data and Algorithms as Assets for Machine Learning
331
9.2.3 Leveraging Open-Source Environments and Skills
331
9.3 Using the ML Scenario Manager
333
9.3.1 ML Scenario Manager Overview
333
9.3.2 Setting Up a Scenario in ML Scenario Manager
334
9.3.3 Integrating Hyperscale Data and Targets
339
9.3.4 Leveraging Scenario Templates for Machine Learning
340
9.3.5 Dockerfile Building and Grouping
345
9.3.6 Implementing TensorFlow Pipelines
347
9.3.7 Training and Deploying Models with New Versions
350
9.3.8 Metrics Explorer and Machine Learning Tracking SDK
360
9.3.9 Run Collection and Run Performance
363
9.3.10 Visualizing SAP Data Intelligence Metrics with SAP Analytics Cloud
363
9.4 ML Data Manager in Data Workspaces and Data Collections
365
9.4.1 Data Workspaces and Data Collections
365
9.4.2 Organizing Data Sets in Data Lakes
367
9.4.3 Curating a Data Collection
368
9.4.4 Registering a Data Set
369
9.5 Summary
371
10 Jupyter Notebook
373
10.1 Jupyter Notebook Fundamentals
374
10.1.1 Interactive Tool for Data Science Projects
374
10.1.2 Jupyter Notebook Dashboard and User Interface
379
10.1.3 Data Analysis in Jupyter Notebook
381
10.2 Working with SAP HANA Cloud
386
10.2.1 SAP HANA Cloud: Cloud Database as a Service
387
10.2.2 Exploring SAP HANA Cloud on an SAP BTP Trial Account
389
10.2.3 Understanding the SAP HANA Cockpit and SAP HANA Database Explorer
391
10.2.4 Using Jupyter Notebook in SAP BTP and Integration with SAP HANA Cloud
393
10.2.5 SAP Data Intelligence Connection
402
10.3 Data Science Experiments with Jupyter Notebook
405
10.3.1 SAP HANA Embedded Machine Learning
406
10.3.2 Machine Learning Core Operators
413
10.3.3 SAP HANA ML Training Operator
423
10.3.4 SAP HANA ML Inference Operator
425
10.4 JupyterLab as the Next-Gen Jupyter Notebook
430
10.4.1 JupyterLab: The Next-Gen User Interface with Built-In Libraries
431
10.4.2 Accessing Jupyter Notebook Artifacts from JupyterLab
434
10.4.3 SAP HANA Python Client API
436
10.5 Summary
437
11 SAP Data Intelligence Python SDK
439
11.1 Using SAP Data Intelligence Python SDK
440
11.1.1 Setting a Context in Jupyter Notebook
440
11.1.2 Data Lake API for SDL
441
11.1.3 Retrieving Machine Learning Scenario Metadata
443
11.1.4 Training Container Using the SDK
444
11.1.5 Executing and Deploying Pipelines
447
11.2 Accessing Artifacts Using Methods
448
11.3 Machine Learning Tracking SDK
450
11.3.1 Initializing Run for an Experiment
451
11.3.2 Grouping Runs in Run Collections
451
11.3.3 Analyzing Metrics and Logs
454
11.4 Summary
454
PART III Integration
457
12 Integrating with ABAP Systems
459
12.1 Integration Scenarios
459
12.1.1 Scenarios and Use Cases for Integration
460
12.1.2 ABAP Metadata in the Metadata Explorer
461
12.2 Provisioning Data from ABAP Systems
465
12.2.1 Exposing the CDS View
465
12.2.2 Connection Prerequisites for Data Extraction
466
12.2.3 Connecting On-Premise Systems with the Cloud Connector
467
12.3 Using Operators to Trigger Execution in an ABAP System
472
12.3.1 ABAP Operators to Trigger Function Modules or BAPIs
472
12.3.2 Prerequisites for ABAP Operators in Remote Systems
474
12.4 SAP BW/4HANA and SAP Data Intelligence Hybrid Data Virtualization
478
12.4.1 Prerequisites in SAP Business Warehouse
478
12.4.2 Using Connection Type HANA_DB
480
12.4.3 Authorization Check for Services
481
12.4.4 SAP BW Operator for Pipeline
484
12.5 Additional Connectivity
485
12.5.1 SAP Information Steward
485
12.5.2 SAP HANA for SQL Data Warehousing
489
12.6 Summary
495
13 Integrating with Non-SAP Systems
497
13.1 Non-SAP Cloud System Connectivity
497
13.1.1 Amazon S3
498
13.1.2 Amazon Redshift
500
13.1.3 Windows Azure Storage Blob
501
13.1.4 Microsoft Azure SQL Data Warehouse
502
13.1.5 Microsoft Azure Data Lake
503
13.1.6 Google Cloud Storage
506
13.1.7 Google BigQuery
508
13.1.8 IBM Cloud Storage
509
13.2 Non-SAP On-Premise System Connectivity
510
13.2.1 Oracle Relational Database Management System
510
13.2.2 Microsoft SQL Server
512
13.3 Summary
513
14 Integrating Big Data Workloads with SAP Vora
515
14.1 SAP Vora in Kubernetes Framework
516
14.1.1 System Management
516
14.1.2 SAP Vora Engine Architecture
517
14.1.3 Accessing SAP Vora User Interface
520
14.1.4 SAP Vora Data Preview
521
14.1.5 Using SQL Editor
522
14.1.6 Using SQL Scripts
523
14.2 Data Modeling in SAP Vora
524
14.2.1 Creating Database Schemas
524
14.2.2 Creating Partition Schemes
525
14.2.3 Creating Tables and Views
527
14.2.4 Creating Calculated Columns
532
14.2.5 Additional Functions for Views
533
14.3 Hierarchies in SAP Vora
536
14.3.1 SAP Vora SQL for Hierarchical Data Analysis
537
14.3.2 Using Adjacency Table to Render a Hierarchy
539
14.3.3 Caching Hierarchies with Materialized Views
539
14.4 Full-Text Search in SAP Vora
540
14.4.1 Text Analysis Graphs in Modeler
540
14.4.2 Linguistic and Semantic Analysis
541
14.4.3 Full-Text Search on a Document Collection
542
14.5 Summary
542
15 Integrating with SAP Data Warehouse Cloud
543
15.1 Overview of SAP Data Warehouse Cloud
543
15.1.1 SAP Cloud Services Ecosystem
544
15.1.2 Setting Up the Trial Tenant
546
15.2 Understanding Spaces
549
15.2.1 Spaces as Virtual Workspaces
549
15.2.2 Development in a Space
554
15.2.3 Managing Spaces
556
15.3 Exploring Connections and Using the Data Builder
561
15.3.1 Available Connection Types
561
15.3.2 Data Builder: Model to Business Catalog
562
15.3.3 Space-Aware Integrated Story Builder
566
15.4 Data Builder in SAP Data Warehouse Cloud versus Pipelines in SAP Data Intelligence
570
15.5 Summary
570
16 Integrating with SAP Analytics Cloud
571
16.1 Overview of SAP Analytics Cloud
571
16.1.1 Solution to Analyze, Plan, Predict, and Collaborate
572
16.1.2 Fundamental Components: Data, Models, and Stories
574
16.2 Use Operators: Read File, Formatter, and Producer
582
16.2.1 Read File Operator
583
16.2.2 Decode Table Operator
584
16.2.3 SAP Analytics Cloud Formatter
585
16.2.4 SAP Analytics Cloud Producer
586
16.3 Pipelines to Train, Predict, and Visualize Data
587
16.3.1 Using the Dataset API
587
16.3.2 Data Set Provision and Consumption
589
16.4 Summary
591
PART IV System Management, Security, and Operations
593
17 Administration
595
17.1 System Management Command-Line Client Reference
595
17.1.1 Command-Line Client for SAP Data Intelligence
596
17.1.2 Using the VCTL Tool: JavaScript Utility
597
17.1.3 Useful Commands for Command-Line Client
598
17.2 Administration Applications
599
17.2.1 Administrator Access
600
17.2.2 System Management
600
17.2.3 License Management
611
17.2.4 Connection Management
613
17.3 Monitoring the SAP Data Intelligence Modeler
616
17.3.1 Monitoring the Status of Graph Execution
616
17.3.2 Tracing Messages to Isolate Problems and Errors
621
17.3.3 Downloading Diagnostic Information for Graphs
623
17.4 SAP Data Intelligence System Logging
626
17.4.1 Kubernetes Cluster-Level Logging Mechanism
627
17.4.2 Browsing Application Logs in the Diagnostics Kibana Web User Interface
629
17.4.3 Aggregating Logs in External Logging Service
630
17.5 System Diagnostics
631
17.5.1 SAP Data Intelligence Diagnostics: Diagnostics Grafana
631
17.5.2 Kubernetes Cluster Metrics
633
17.5.3 Integrating Diagnostics with External APM Solution
635
17.6 Summary
637
18 Security
639
18.1 Approach to Data Protection
639
18.1.1 Business Semantics for Industry-Specific Legislations
640
18.1.2 Functions for Data Privacy Compliance
641
18.1.3 Security Features for Data Protection and Privacy
641
18.2 Authenticating Services and Users
642
18.2.1 Roles and Scope-Driven User Access Control
642
18.2.2 SAP BTP User Account and Authentication
644
18.2.3 Self-Signed Certificate Authority and TLS
649
18.2.4 Leveraging Policy Management for Access Control
649
18.2.5 Enabling Security Features on Kubernetes Cluster
657
18.3 Securely Connecting On-Premise Systems
658
18.3.1 Cloud Connector
658
18.3.2 Site-to-Site Virtual Private Network
659
18.3.3 Virtual Private Cloud Peering
659
18.4 Summary
659
19 Maintenance
661
19.1 Understanding Operational Modes or Run Levels
661
19.2 Switching the Platform to Maintenance Mode
662
19.2.1 Enabling or Disabling Maintenance Mode
663
19.2.2 Restarting SAP Data Intelligence Services
664
19.2.3 Setting Up a Remote Connection to SAP
664
19.3 Increasing System Management Persistent Volume Size
665
19.3.1 Persistent Volume Error Handling
665
19.3.2 Changing the Persistent Storage Size of the SAP Vora Disk Engine
667
19.3.3 Changing the Buffer and File Size of the SAP Vora Disk Engine
668
19.4 Performing Backups
668
19.5 Summary
671
20 Application Lifecycle Management
673
20.1 Version Control System
673
20.2 Git
674
20.2.1 Git Basics and Terminology
675
20.2.2 Git Integration and CI/CD Process
678
20.2.3 Setting Up Your Environment for Git Workflows
697
20.3 Continuous Integration and Continuous Delivery
707
20.3.1 Continuous Integration Best Practices
707
20.3.2 Leveraging SAP Solutions for CI/CD
712
20.4 DevOps Fundamentals and Tools
713
20.4.1 The Core Tenets of DevOps
715
20.4.2 Implement Tooling for DevOps
718
20.4.3 DevOps for Hybrid Architectures
719
20.5 SAP Data Intelligence as the MLOps Platform
723
20.5.1 Production Lifecycle of Machine Learning Models
724
20.5.2 MLOps Challenges
726
20.5.3 MLOps Capabilities
727
20.6 Migrating from SAP Leonardo Machine Learning Foundation
730
20.6.1 Bring Your Own Model
731
20.6.2 Migrating the Training Data
733
20.6.3 Adding the Training Data to a Data Lake
734
20.7 Summary
734
21 Business Content and Use Cases
737
21.1 Digital Transformation and SAP Data Intelligence
737
21.2 Business Content by Industry
740
21.3 Finance Use Cases
746
21.4 Supply Chain Use Cases
747
21.5 Manufacturing Use Cases
749
21.6 Summary
751
Appendices
753
A Outlook and Roadmap
753
A.1 Release Management
754
A.2 Recent Innovations
754
A.3 Roadmap Explorer
758
A.4 Future Outlook
759
B The Authors
763
Index
765