forked from apache/cloudberry-bootcamp
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
15 changed files
with
154 additions
and
131 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file was deleted.
Oops, something went wrong.
74 changes: 74 additions & 0 deletions
74
101-cbdb-tutorials/101-0-introduction-to-database-and-cloudberrydb-architecture.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
--- | ||
title: "Introduction to Database and CloudberryDB Architecture" | ||
--- | ||
|
||
# Lesson 0: Introduction to Database and CloudberryDB Architecture | ||
|
||
## Background: Database Concepts | ||
|
||
Before starting this tutorials, spend some time to get familiar with how (single instance) databases work. If you already have some knowledge and experience with Oracle, MySQL or especially Postgres - this is great. | ||
|
||
Databases (relational databases) are pieces of software that are used to store and manage/process data. Usually these databases are built with the client/server concept - the database is implemented as a server and multiple clients can connect and read or update the data. | ||
|
||
The clients usually use SQL language to access the data (or some dialect of the SQL language specification). The clients can be different implementations - proprietary client libraries or ODBC/JDBC compliant. | ||
|
||
Database data is usually stored in objects called tables. Tables have predefined structure (columns) and have zero or multiple rows. | ||
|
||
Tables can be grouped in logical entities called 'schemas' (or namespaces). | ||
|
||
Tables/schemas are located in a 'database' entity. Some database software supports multiple databases per instance (MySQL, Postgres), others support one database per instance (Oracle). | ||
|
||
Along with tables there are supporting objects such as indexes, sequences, views, etc. | ||
|
||
The database system needs to maintain some metadata - called the database catalog. The database catalog contains information about the data objects and supporting objects as well as anything else that needs to be stored on system level (user authentication, etc.). | ||
|
||
SQL (Structured Query Language) is a descriptive language, not imperative language. Therefore it describes what the user needs, not how to get it. When the user describes what he needs, the database need to decide how to get it. This process is called query optimization. The end result from this process is a query plan, which is a step by step instruction how to get the result. | ||
|
||
## Introduction to the Cloudberry Database Architecture | ||
|
||
Cloudberry Database is a massively parallel processing (MPP) database server with an architecture specially designed to manage large-scale analytic data warehouses and business intelligence workloads. | ||
|
||
MPP (also known as a shared nothing architecture) refers to systems with two or more processors that cooperate to carry out an operation, each processor with its own memory, operating system and disks. Cloudberry uses this high-performance system architecture to distribute the load of multi-terabyte data warehouses and all of a system's resources in parallel to process a query. | ||
|
||
Cloudberry Database is based on open-source PostgreSQL open-source. It is essentially several PostgreSQL database instances working together as one cohesive database management system (DBMS). It is based on PostgreSQL 14.4 kernel and in most cases it is very similar to PostgreSQL. Database users interact with Cloudberry Database as a regular PostgreSQL DBMS. | ||
|
||
In CloudberryDB, internals of PostgreSQL have been modified and optimized to support parallel structure of Cloudberry Database. For instance, system catalog, optimizer, query executor and transaction manager components have been modified and enhanced to be able to execute queries simultaneously across the parallel PostgreSQL database instances. CloudberryDB interconnect (the networking layer) enables communication between distinct PostgreSQL instances and allows the system to behave as one logical database. | ||
|
||
Cloudberry Database also includes features designed to optimize PostgreSQL for business intelligence (BI) workloads. For example, CloudberryDB has added parallel data loading (external tables), resource management, query optimizations and storage enhancements,. | ||
|
||
_Figure 1. High-Level Cloudberry Database Architecture_ | ||
|
||
 | ||
|
||
The following topics describe the components that make up a Cloudberry Database system and how they work together. | ||
|
||
### CloudberryDB Master(Coordinator) | ||
|
||
The Cloudberry Database master is the entry to the Cloudberry Database system, it accepts client connections, handle SQL queries and then distributs workload to the segment instances. | ||
|
||
Cloudberry Database end-users only interact with Cloudberry Database through master node as a typical PostgreSQL database. They connect to database using client such as psql or drivers like JDBC or ODBC. | ||
|
||
The master stores global system catalog. Global system catalog is set of system tables that contain metadata for Cloudberry Database itself. Master node does not contain any user table data; user table data resides only on segments. Master node would authenticate client connections, processe incoming SQL commands, distribute workloads among segments, collect the results returned by each segment and return the final results to the client. | ||
|
||
### CloudberryDB Segments | ||
|
||
Cloudberry Database segment instances are independent PostgreSQL databases that each of them store a portion of the data and perform the majority of query execution work. | ||
|
||
When a user connects to the database via the Cloudberry master and issues queries, accordingly execution plan would be distributed to each segment instance. For more information about query processes, see About Cloudberry Query Processing. | ||
|
||
The server that has segments running on it is called segment host. A segment host usually has two to eight Cloudberry segments running on it, the number depends on serveral factors, CPU cores, memory, disk, network interfaces or workloads. To get better performance from Cloudberry Database, it is suggested to distribute data and workloads evenly across segments so that execution plan could be finished across all segments and with no bottleneck. | ||
|
||
### CloudberryDB Interconnect | ||
|
||
The interconnect is the networking layer of the Cloudberry Database architecture. | ||
|
||
The interconnect refers to the inter-process communication mechanism in-between segments. By default, interconnect uses User Datagram Protocol (UDP) to send/receive messages over the network. Interconnect provide datagram verification and retransmission mechanism. Reliability is equivalent to Transmission Control Protocol (TCP), performance and scalability exceeds TCP. If user choose TCP in interconnect, Cloudberry would have limit around 1000 segment instances. With UDP and interconncet the limit does not exit. | ||
|
||
Now you can start the following lessons by clicking on the links: | ||
|
||
- [Lesson 1: Create Users and Roles](../101-cbdb-tutorials/101-1-create-users-and-roles.md) | ||
- [Lesson 2: Create and Prepare Database](../101-cbdb-tutorials/101-2-create-and-prepare-database.md) | ||
- [Lesson 3: Create Tables](../101-cbdb-tutorials/101-3-create-tables.md) | ||
- [Lesson 4: Data Loading](../101-cbdb-tutorials/101-4-data-loading.md) | ||
- [Lesson 5: Queries and Performance Tuning](../101-cbdb-tutorials/101-5-queries-and-performance-tuning.md) | ||
- [Lesson 6: Backup and Recovery Operations](../101-cbdb-tutorials/101-6-backup-and-recovery-operations.md) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
43 changes: 0 additions & 43 deletions
43
101-cbdb-tutorials/introduction-to-the-cloudberry-database-architecture.md
This file was deleted.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,10 @@ | ||
<!--  --> | ||
# CloudberryDB Performance Benchmark | ||
|
||
This tutorial will show you how to do the CloudberryDB performance by the CloudberryDB sandbox docker image. | ||
This tutorial will show you how to perform a CloudberryDB performance benchmark in the CloudberryDB Sandbox Docker image. The benchmark process consists of two parts: | ||
|
||
Including two parts, you can click to see more details: | ||
- [Part 1: TPCH benchmark](../103-cbdb-performance-benchmark/tpch.md), which is based on the benchmark tool `TPC-H`. | ||
- [Part 2: TPCDS benchmark](../103-cbdb-performance-benchmark/tpcds.md), which is based on the benchmark tool `Greenplum Database TPC-DS`. | ||
- [Part 1: TPC-H benchmark](../103-cbdb-performance-benchmark/tpch.md) | ||
- [Part 2: TPC-DS benchmark](../103-cbdb-performance-benchmark/tpcds.md) | ||
|
||
These benchmarks are designed to simulate real-world scenarios and measure the performance of decision support systems under various conditions. | ||
|
||
By completing this tutorial, you will gain a comprehensive understanding of CloudberryDB's performance capabilities and how to effectively benchmark its performance using industry-standard tools and techniques. |
Oops, something went wrong.