Delete orphaned files in ADB

Overview

Orphaned files are data files that are not related to any database table. They can remain in the file system after a crash of some backend process associated with a transaction that creates a new table. Possible crash causes include hardware failures, lack of free disk space, and so on. As a result of successful recovery after failure, uncommitted changes are missing in the database (the new table does not exist in the system catalog), but data files of that table may remain on segment hosts. Over time, such abandoned files may increase in number, wasting valuable storage space. The process of searching for and deleting such files is time-consuming, and there are always risks associated with removing valuable data.

Starting with 6.29.0.3, ADB supports automatic tracking and deletion of orphaned files at the DBMS core level in case of the transaction interruption. In previous ADB versions, data files were automatically deleted for those transactions that were aborted normally (via ABORT), but orphaned files could be deleted only manually (for example, by searching them via functions and views of the gp_check_functions extension).

The new approach to tracking and deleting orphaned files that is implemented in ADB 6.29.0.3 is based on the following key concepts:

  • In case of a commit or rollback of a transaction that creates a table, the DBMS core, like before, independently determines how to process the table data files in normal mode. In case of ABORT, files are deleted; in case of COMMIT PREPARED they remain.

  • Only transactions in the TRANSACTION_STATUS_IN_PROGRESS status are considered as candidates for deleting orphaned files. The DBMS core suggests that transactions in that status can abort abnormally and leave orphaned files. Transactions in all other statuses are not considered, as they are processed in the standard way (see the previous point).

  • The decision to delete a certain orphaned file is automatically made at the recovery startup.

IMPORTANT
  • For more information on the causes of orphaned files, approaches to solving that problem in other DBMS, and implementation details from the ADB/Greengage DB team, see Deleting orphaned files in the technical blog.

  • Remember that installation of ADB 6.29.0.3 does not allow you to delete already existing orphaned files, but it will prevent them from being produced in the future.

Configuration

You can manage tracking and deleting orphaned files for interrupted transactions using the gp_track_pending_delete configuration parameter (GUC). The default parameter value is on, which means using the new approach implemented in ADB 6.29.0.3. If for any reason you need to disable that approach, change the parameter value to off. Note that the gp_track_pending_delete parameter value can only be modified by editing the postgresql.conf file on all segments using the gpconfig utility; and you need to restart ADB to apply your changes (reload of the configuration file is not sufficient).

The following is an example of how to change the gp_track_pending_delete value:

  1. Connect to the ADB master host and login under the default user name gpadmin:

    $ sudo su - gpadmin
  2. Get the current parameter value:

    $ gpconfig -s gp_track_pending_delete

    Result:

    Values on all segments are consistent
    GUC          : gp_track_pending_delete
    Master  value: on
    Segment value: on
  3. Change the parameter value on all segments:

    $ gpconfig -c gp_track_pending_delete -v off

    Result:

    20250827:10:10:29:2622827 gpconfig:bds-mdw:gpadmin-[INFO]:-completed successfully with parameters '-c gp_track_pending_delete -v off'
  4. Restart DBMS:

    $ gpstop -r -a
  5. Check the current parameter value again to ensure that your changes have been applied:

    $ gpconfig -s gp_track_pending_delete

    Command output:

    Values on all segments are consistent
    GUC          : gp_track_pending_delete
    Master  value: off
    Segment value: off
Found a mistake? Seleсt text and press Ctrl+Enter to report it