Separating Orca into a Postgres extension: purpose and implementation details
Introduction
Greengage Database is our fork of Greenplum Database. The main idea is to stay open source and to keep developing and improving the database. We are going to move Greengage Database to a newer Postgres release, delivering a richer feature set to all community users and our customers.
But this task is harder than it might sound. Previously, in order to implement the massively parallel processing (MPP) features of Greenplum Database, the core Postgres functionality was reworked tremendously. This makes bumping the Postgres version very difficult. For instance, moving from Postgres 9 to Postgres 12 demanded an enormous effort: almost 5 years of development between major releases.
To make the move to a new Postgres release smoother and less painful, we have committed to reducing the internal coupling between the Postgres core and the MPP features inside Greengage Database.
Postgres itself allows extending its functionality through a powerful and mature mechanism of extensions, which are capable of modifying standard Postgres behavior via a set of hooks. So, if the majority of MPP features could be organized as external extensions to a pure Postgres core, it would greatly simplify updating the core part.
With that in mind, we have started a large refactoring of the Greengage Database core. As part of it, we are going to separate the Postgres core from Greengage Database specific features, using the standard existing Postgres extension mechanisms.
Arenadata’s aim to decouple Greengage’s core from PostgreSQL will enable smoother upgrades to newer PostgreSQL versions. This will allow Greengage Database to benefit from the latest features and security enhancements of PostgreSQL while maintaining its MPP capabilities.
Separation of Orca
As the first step in functionality decoupling, we’ve started with Orca.
Orca is a powerful query planner within Greengage Database, designed to significantly enhance query performance and efficiency. It achieves this through a top-down optimization approach, cost-based optimization, advanced join optimization techniques, and extensibility.
By leveraging Orca, Greengage Database delivers exceptional performance and scalability for complex analytical workloads. Orca is the brain behind Greengage’s performance, responsible for optimizing query execution plans.
Before refactoring, Orca was deeply embedded in the Greengage Database core code.
Originally, Orca was developed as a standalone module, which was plugged into Greenplum Database. But at some point, it was placed into the Greenplum Database core. Orca sources resided in the Greenplum Database source code tree as a part of its backend. And as time passed, more and more bindings grew between Orca and the rest of Greenplum Database code. Before we could move Orca into an extension, we had to reorganize all the places where the core interacted directly with Orca.
After extensive analysis, we’ve come up with the following plan of action:
- Create a separate extension for Orca that is built into a shared library, and connect Orca via a planner hook implemented within that library. At this point, Orca’s code is still built and linked together with the main Postgres executable, but planning is performed via the hook.
- Refactor Greengage code to remove other direct couplings with Orca:
  - memory protection component;
  - explain component;
  - gp_optimizer functions;
  - optimizer GUC;
  - pg_hint_plan extension.
- Move all Orca source files into the extension.
- Move Orca-related GUCs into the extension.
- Fix any leftover issues.
- Profit!
With the sequence of steps above, during the overall feature development, we could stay in a rather stable state (meaning no major functionality is broken) and deliberately move step by step to our final goal.
Step 1. Create a new extension, connect Orca via a planner hook, and rework init procedures
All Greengage-specific extensions are located in the gpcontrib folder under the root of the project. We have added a new orca extension there. The extension is built only if the build is configured without the --disable-orca key. The extension produces a shared library. When the extension is installed, it also modifies the postgresql.conf.sample file by setting:
shared_preload_libraries = 'orca'
At this point, the extension contains the shared library init and deinit functions:
/*
 * planner_hook itself is defined in the Postgres core (planner.c).
 * The extension only keeps the previously installed hook.
 */
static planner_hook_type prev_planner = NULL;

void
_PG_init(void)
{
    /* The library must be preloaded by Postmaster */
    if (!process_shared_preload_libraries_in_progress)
        ereport(ERROR,
                (errcode(ERRCODE_INTERNAL_ERROR),
                 errmsg("This module can only be loaded via shared_preload_libraries")));

    /* The hook is installed on the coordinator only */
    if (!(IS_QUERY_DISPATCHER() && (GP_ROLE_DISPATCH == Gp_role)))
        return;

    prev_planner = planner_hook;
    planner_hook = orca_planner;
}

void
_PG_fini(void)
{
    planner_hook = prev_planner;
}
The init function installs PlannedStmt *orca_planner(Query *parse, int cursorOptions, ParamListInfo boundParams) as planner_hook, which is called from the Postgres planner() function. orca_planner() is the entry point for invoking the Orca planner. If Orca creates a plan, that plan is used for query execution. If Orca fails to produce a valid plan, orca_planner() calls standard_planner(). The hook is installed only on a coordinator instance, as the planning stage is performed on the coordinator. (Note: Strictly speaking, segments can also plan local queries dispatched by the coordinator. But Orca was never used to plan these auxiliary queries; only the Postgres planner was. So nothing changed from the segment’s point of view.)
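The save-and-chain pattern behind this hook can be modelled in plain C without any Postgres headers. The sketch below is only an illustration: Query and PlannedStmt are toy stand-ins, try_orca(), orca_init() and orca_fini() are hypothetical names, and the fallback order (previous hook first, then the standard planner) is an assumption based on the behavior described in this article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the real Postgres structures (assumption). */
typedef struct Query { bool orca_can_plan; } Query;
typedef struct PlannedStmt { int id; } PlannedStmt;

typedef PlannedStmt *(*planner_hook_type)(Query *parse);

/* In Postgres this variable is owned by the core; it is defined here
 * only to keep the sketch self-contained. */
planner_hook_type planner_hook = NULL;

static PlannedStmt standard_plan = { 1 };
static PlannedStmt orca_plan = { 2 };
static planner_hook_type prev_planner = NULL;

PlannedStmt *standard_planner(Query *parse)
{
    assert(parse != NULL);
    return &standard_plan;
}

/* Hypothetical helper: returns NULL when Orca cannot plan the query. */
static PlannedStmt *try_orca(Query *parse)
{
    return parse->orca_can_plan ? &orca_plan : NULL;
}

/* Hook body: use Orca's plan if there is one, otherwise fall back. */
PlannedStmt *orca_planner(Query *parse)
{
    PlannedStmt *plan = try_orca(parse);

    if (plan != NULL)
        return plan;
    if (prev_planner != NULL)
        return prev_planner(parse);
    return standard_planner(parse);
}

/* Mirrors how planner() in the core dispatches to the hook. */
PlannedStmt *planner(Query *parse)
{
    return planner_hook != NULL ? planner_hook(parse) : standard_planner(parse);
}

/* Install and uninstall, as _PG_init() and _PG_fini() do. */
void orca_init(void) { prev_planner = planner_hook; planner_hook = orca_planner; }
void orca_fini(void) { planner_hook = prev_planner; }
```

Before orca_init() every query goes to standard_planner(). After it, queries that Orca can handle get Orca’s plan, and the rest fall through to the standard path.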
And here comes the first challenge. To work with Orca, we first need to initialize it by calling the InitGPOPT() function. Previously, this was done in InitPostgres(), which is called for each backend at its start. But now we can rely only on _PG_init() inside the extension, which is called during Postmaster initialization. InitGPOPT() allocates memory for internal structures, and memory protection is enabled only in InitPostgres(). So we couldn’t call InitGPOPT() before memory protection was enabled, which ruled out _PG_init(), seemingly the most obvious place for it.
Another interesting thing is related to de-initialization. Orca’s de-init function, TerminateGPOPT(), was called from ShutdownPostgres(), which in turn is registered as a process-exit callback. We could create our own callback and register it with before_shmem_exit() in _PG_init(). But, as already mentioned, _PG_init() is called by Postmaster, and when Postmaster forks a new backend instance, all of its callbacks are cleared. So no callback registered in _PG_init() is called on backend exit.
To resolve these limitations, we’ve moved to lazy initialization of Orca from the planner hook. That is a valid place, as the planner can be invoked only by an already initialized coordinator backend when memory protection is properly configured and enabled. And, once the initialization is done, the de-initialization callback is also registered.
So, the overall simplified orca_planner() logic is described in the diagram below.
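The lazy initialization described above can be sketched as a small standalone model. Everything here is a simplification under stated assumptions: the callback array stands in for the per-backend list managed by before_shmem_exit(), init_orca() and terminate_orca() stand in for InitGPOPT() and TerminateGPOPT(), and the counters exist only to make the behavior observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EXIT_CALLBACKS 8

typedef void (*exit_callback)(void);

/* Toy model of the per-backend exit-callback list. */
static exit_callback exit_callbacks[MAX_EXIT_CALLBACKS];
static int n_exit_callbacks = 0;

static bool orca_initialized = false;
static int init_calls = 0;
static int terminate_calls = 0;

static void register_exit_callback(exit_callback cb)
{
    assert(n_exit_callbacks < MAX_EXIT_CALLBACKS);
    exit_callbacks[n_exit_callbacks++] = cb;
}

/* Runs the callbacks in reverse registration order, as on backend exit. */
void run_exit_callbacks(void)
{
    while (n_exit_callbacks > 0)
        exit_callbacks[--n_exit_callbacks]();
}

/* Stand-in for InitGPOPT(). */
static void init_orca(void) { init_calls++; }

/* Stand-in for TerminateGPOPT(). */
static void terminate_orca(void)
{
    terminate_calls++;
    orca_initialized = false;
}

/* Initialize on first use and register the matching cleanup right away. */
static void ensure_orca_initialized(void)
{
    if (orca_initialized)
        return;
    init_orca();
    register_exit_callback(terminate_orca);
    orca_initialized = true;
}

/* Entry point of the planner hook: runs inside an initialized backend. */
void orca_planner_entry(void)
{
    ensure_orca_initialized();
    /* ... planning would happen here ... */
}
```

Calling orca_planner_entry() many times initializes Orca once per backend, and the cleanup is registered in the same process that performed the initialization, so it survives the fork from Postmaster.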
After the steps above, we had a basis that allowed us to get rid of the other couplings between Orca and the Greengage Database core code.
Step 2. Refactor code
Step 2.1. Decouple Orca from memory protection component
Using Orca increases the memory usage of a Greengage backend process. The GPMemoryProtect_TrackStartupMemory() function contained a compile-time dependency on Orca that was needed to calculate the per-process committed memory size. Because this prevented moving all of Orca’s code into a shared library and making its work transparent to the GPDB core, we had to rework this part. Also, the old approach didn’t take into account other possible extensions that could affect memory usage.
To overcome the issues above, we have introduced a new GPMemoryProtect_RequestAddinStartupMemory(Size size) function. Now, an extension should call GPMemoryProtect_RequestAddinStartupMemory() in its _PG_init() if it affects the per-process startup committed memory. Such a call was added to Orca’s _PG_init(), and the Orca-related code was removed from GPMemoryProtect_TrackStartupMemory(). The total accumulated by all calls to GPMemoryProtect_RequestAddinStartupMemory() is now taken into account by GPMemoryProtect_TrackStartupMemory().
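The accumulate-then-track idea can be illustrated with a toy model. Only the name GPMemoryProtect_RequestAddinStartupMemory() comes from the text above. Its body here, the startup_memory_with_addins() helper, the two init functions and the byte figures are all hypothetical and serve only to show how independent requests add up.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t Size;

/* Hypothetical figures, for illustration only. */
#define ORCA_STARTUP_MEMORY  ((Size) 4096)
#define OTHER_STARTUP_MEMORY ((Size) 1024)

/* Sum of everything requested by preloaded extensions. */
static Size addin_startup_memory = 0;

/* Toy model of the new core function: each extension adds its share. */
void GPMemoryProtect_RequestAddinStartupMemory(Size size)
{
    assert(size <= (Size) -1 - addin_startup_memory); /* no overflow */
    addin_startup_memory += size;
}

/* Hypothetical helper showing what the tracking side would add up. */
Size startup_memory_with_addins(Size core_startup_memory)
{
    return core_startup_memory + addin_startup_memory;
}

/* What the _PG_init() of Orca and of some other extension might do. */
void orca_pg_init(void)
{
    GPMemoryProtect_RequestAddinStartupMemory(ORCA_STARTUP_MEMORY);
}

void other_extension_pg_init(void)
{
    GPMemoryProtect_RequestAddinStartupMemory(OTHER_STARTUP_MEMORY);
}
```

The core no longer needs to know which extensions exist: it only sees the accumulated total.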
Step 2.2. Decouple Orca from explain component
The explain component had two compile-time dependencies on Orca:
- when explaining in DXL format;
- when printing the name of the planner that created the plan.
DXL is an XML-based language used to encode all the information that goes into or comes out of Orca, such as the query to plan, finished plans, or metadata requested by Orca. The Greengage Database explain command can print the execution plan for a query in DXL format. But this functionality is specific to Orca, so the Orca extension is the proper place to encapsulate it.
Postgres provides the ExplainOneQuery_hook hook. We’ve used it to plug into the explain component, and all Orca-specific code related to DXL-format explain was moved to the extension. If DXL output is required, the orca_explain() function invokes Orca to create a plan and prints the plan in DXL format. If Orca isn’t able to create a plan for the query, an appropriate notification is shown. If DXL output is not required, we just call the previously registered ExplainOneQuery_hook or, if there is none, the standard explain routine.
Regarding the planner name, only two options were hardcoded in ExplainPrintPlan(): "Postgres-based planner" and "GPORCA", which is not extensible. The decision about which name to print was made based on the planGen field of the PlannedStmt structure. The field was an enum:
typedef enum PlanGenerator
{
    PLANGEN_PLANNER,   /* plan produced by the planner */
    PLANGEN_OPTIMIZER, /* plan produced by the optimizer */
} PlanGenerator;
The PLANGEN_PLANNER value was supposed to correspond to the standard Postgres planner, and the PLANGEN_OPTIMIZER value to Orca. This left no room for any other planner to plug in. Thus, this field was removed, and a new plannerName field of type const char * was added. Now, any external planner can set its name during planning, and the explain command will print it correctly.
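A minimal sketch of how a string field removes the closed set of planner names. The PlannedStmt here is a one-field stand-in, both helper functions are hypothetical, and two details are assumptions rather than facts from the refactoring: that an unset name falls back to "Postgres-based planner", and that the output line is prefixed with "Optimizer: ".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One-field stand-in for the real PlannedStmt (assumption). */
typedef struct PlannedStmt
{
    const char *plannerName; /* set by whichever planner built the plan */
} PlannedStmt;

/* Hypothetical helper: pick the label to print for a plan. */
const char *explain_planner_label(const PlannedStmt *stmt)
{
    assert(stmt != NULL);
    /* Assumed fallback when no external planner set a name. */
    return stmt->plannerName != NULL ? stmt->plannerName
                                     : "Postgres-based planner";
}

/* Hypothetical helper: format the line; returns snprintf()'s result. */
int format_optimizer_line(const PlannedStmt *stmt, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "Optimizer: %s", explain_planner_label(stmt));
}
```

With this shape, a third planner only has to assign its own string to plannerName; no enum in the core has to be extended.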
Step 2.3. Decouple Orca from gp_optimizer functions
There are several Orca-specific functions that can be called from SQL code, including enable_xform(), disable_xform(), and gp_opt_version().
The functions have become a part of the system catalog and are used in Greengage Database infrastructure tools, so they can’t be easily moved out of the core. In case Greengage Database is built without Orca, these functions should still exist but show an appropriate message if called.
To address this, the underlying implementations of DisableXform(), EnableXform(), and LibraryVersion() were moved into the Orca shared library. The enable_xform(), disable_xform(), and gp_opt_version() wrapper functions were updated: now they try to load the underlying functions from the shared library. The attempt to load a symbol is wrapped in a PG_TRY/PG_CATCH block, as the library may be missing. If it is missing, we catch the error and show an appropriate message.
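The wrapper pattern can be illustrated without a real dynamic loader. In the sketch below, lookup_symbol() is a hypothetical stand-in for symbol resolution, and a NULL result plays the role of the error caught by PG_CATCH. The returned strings are made up for the example and are not the actual messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef const char *(*version_fn)(void);

/* Toggles whether the toy "shared library" is installed. */
static bool orca_library_present = false;

/* Lives in the extension in the real layout; returns a made-up string. */
static const char *LibraryVersion(void)
{
    return "toy Orca library";
}

/* Hypothetical stand-in for the dynamic loader: NULL means "not found". */
static version_fn lookup_symbol(const char *library, const char *symbol)
{
    assert(library != NULL && symbol != NULL);
    if (orca_library_present &&
        strcmp(library, "orca") == 0 &&
        strcmp(symbol, "LibraryVersion") == 0)
        return LibraryVersion;
    return NULL;
}

/* Core-side wrapper: always exists, degrades gracefully without Orca. */
const char *gp_opt_version(void)
{
    version_fn fn = lookup_symbol("orca", "LibraryVersion");

    if (fn == NULL)
        return "Orca is not available";
    return fn();
}
```

The SQL-visible function stays in the catalog either way; only its behavior depends on whether the library can be found.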
Step 2.4. Decouple Orca from "optimizer" GUC
There are lots of GUCs related to Orca in Greengage Database, but the optimizer GUC deserves special attention. Its meaning was "enable or disable Orca". We couldn’t leave it untouched, because it also had a compile-time dependency on Orca.
As we were moving Orca out of the core, the logical step would have been to move this GUC out together with Orca. But historically, too many things rely on it, and moving it could blow up the delta and introduce new issues. So, at the current stage, we’ve left it in the core with updated semantics: it now defines whether an external planner in general is enabled. It is now the external planner’s responsibility to check this GUC and, if it is turned off, pass control to the standard planner.
Also, the default value of this GUC was changed to off. Now the external planner should set it to on in its _PG_init() via:
SetConfigOption("optimizer", "on", PGC_POSTMASTER, PGC_S_DYNAMIC_DEFAULT);
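The resulting contract between the core and an external planner can be modelled in a few lines. This is a toy sketch: the optimizer variable stands in for the GUC, external_planner_init() stands in for the SetConfigOption() call, and the Query type, the PlanSource enum and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the real Query structure (assumption). */
typedef struct Query { int id; } Query;

typedef enum { PLANNED_BY_STANDARD, PLANNED_BY_EXTERNAL } PlanSource;

/* Stand-in for the "optimizer" GUC; the core default is now off. */
static bool optimizer = false;

/* What the external planner does at load time to switch itself on. */
void external_planner_init(void)
{
    optimizer = true;
}

/* The external planner checks the GUC on every call. */
PlanSource external_planner_hook(const Query *parse)
{
    assert(parse != NULL);
    if (!optimizer)
        return PLANNED_BY_STANDARD; /* pass control to the standard planner */
    return PLANNED_BY_EXTERNAL;
}
```

A user who later turns the setting off gets the standard planner again without unloading the library.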
Step 2.5. Decouple Orca from pg_hint_plan extension
The pg_hint_plan extension also contained a compile-time dependency on Orca, because the plan_hint_hook hook was defined inside Orca. Leaving one extension dependent on another is undesirable, so we had to rework this as well.
We’ve moved the definition of plan_hint_hook from Orca into src/backend/optimizer/plan/planner.c. Although this hook is currently used only in the Orca extension, any planner that needs to get a hint list from an extension such as pg_hint_plan can now use it.
Additionally, we’ve made this hook typed: it now returns a pointer to HintState, which is what it actually returned all along. There is no reason to keep it generic and returning void *.
Step 3. Move source files
Once all the steps above had been done, we were able to move the major part of Orca source files into the gpcontrib/orca extension. This included:
- src/backend/gporca/* (here the core of Orca resided);
- src/backend/gpopt/* (here the "glue" to connect Orca with Greengage resided);
- src/backend/optimizer/plan/orca.c;
- src/include/optimizer/orca.h;
- src/include/gpopt/*.
Besides moving the source files, we’ve updated linter and code format tools (src/tools/fmt and src/tools/tidy) and moved them into gpcontrib/orca/tools, as they are used specifically for Orca.
Step 4. Move remaining Orca related GUCs
Apart from the optimizer GUC already mentioned above, there is a long list of GUCs used internally in Orca. So, once all Orca sources had been moved to the extension, these GUCs were moved there as well.
Known limitations
As seen in the sections above, the planner hook defined in the Orca extension passes control to a previously registered hook only if Orca didn’t manage to create a plan. That is because the other hook may call the standard planner, which we do not want. If Orca has created a plan, that plan is final and no further planning is needed.
But other extensions may want to register a planner hook not to create a new plan but, for example, to monitor planning or to assist the planner. One such example is the pg_hint_plan extension.
The same is true for the explain hook. If DXL output format is requested, all previously registered hooks will not take control.
The only viable solution, for now, is to have all such hooks invoked before Orca is invoked. Consequently, Orca (or any other similar external planner) should be the first shared library to be loaded, so it should appear first in shared_preload_libraries.
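Why load order matters can be shown with a toy hook chain. In this sketch, a hook installed later wraps the ones installed earlier, so it runs first. The Step enum, the trace array and the monitor extension are hypothetical, and the Orca-like hook is assumed to stop the chain whenever it produces a plan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { STANDARD, ORCA, MONITOR } Step;
typedef void (*planner_hook_type)(void);

#define TRACE_MAX 8
static Step trace[TRACE_MAX]; /* who ran, in order */
static int trace_len = 0;

planner_hook_type planner_hook = NULL;

static void record(Step s)
{
    assert(trace_len < TRACE_MAX);
    trace[trace_len++] = s;
}

static void standard_planner(void) { record(STANDARD); }

/* Call the previously installed hook, or the standard planner. */
static void call_next(planner_hook_type prev)
{
    if (prev != NULL)
        prev();
    else
        standard_planner();
}

/* Orca-like planner: the chain stops here when a plan is produced. */
static planner_hook_type orca_prev = NULL;
static bool orca_can_plan = true;

static void orca_hook(void)
{
    record(ORCA);
    if (!orca_can_plan)
        call_next(orca_prev);
}

void orca_load(void) { orca_prev = planner_hook; planner_hook = orca_hook; }

/* Monitoring-style extension: does its work, then always delegates. */
static planner_hook_type monitor_prev = NULL;

static void monitor_hook(void)
{
    record(MONITOR);
    call_next(monitor_prev);
}

void monitor_load(void) { monitor_prev = planner_hook; planner_hook = monitor_hook; }

void reset_hooks(void) { planner_hook = NULL; trace_len = 0; }
void plan_query(void) { call_next(planner_hook); }
```

Loading the Orca-like planner first and the monitor second yields the trace MONITOR, ORCA. In the opposite order the trace is just ORCA and the monitor never runs. In configuration terms this corresponds to a setting such as shared_preload_libraries = 'orca,pg_hint_plan', assuming libraries are loaded in the order listed.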
Conclusions
After performing all the steps described above, we’ve arrived at a version of Greengage where Orca can be plugged in and out by adding or removing the orca string in the shared_preload_libraries GUC and restarting the cluster. This change has significantly reduced code coupling and increased code cohesion in the product, which benefits software quality.
The key advantages of Orca separation:
- easier PostgreSQL upgrades;
- improved modularity and maintainability;
- potential for a more flexible and extensible Greengage Database platform.
After the refactoring, it is possible to easily add any other external planner, extending standard functionality of Greengage Database. Thus, it can give a strong boost to all Greengage community efforts.