SSM rule usage examples

Overview

This article contains advanced examples of rule usage in SSM. Before running the examples, complete the steps below:

  1. Deploy SSM.

  2. Configure storage types, such as SSD, DISK, and ARCHIVE, in the hdfs-site.xml → dfs.datanode.data.dir parameter in ADCM.

    Storage types
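For reference, HDFS marks the storage type of each DataNode directory in dfs.datanode.data.dir with a bracketed prefix, and directories without a prefix are treated as DISK. The minimal Python sketch below only assembles such a value to show the format; the `build_data_dir` helper and the directory paths are hypothetical placeholders, and in practice the value is entered in ADCM.

```python
# Storage types used in this article; HDFS supports a few more (e.g. RAM_DISK).
STORAGE_TYPES = {"SSD", "DISK", "ARCHIVE"}


def build_data_dir(dirs: dict[str, str]) -> str:
    """Build a dfs.datanode.data.dir value from {directory: storage type}.

    Hypothetical helper for illustration; the paths are placeholders.
    """
    entries = []
    for path, storage_type in dirs.items():
        if storage_type not in STORAGE_TYPES:
            raise ValueError(f"unsupported storage type: {storage_type}")
        # The [TYPE] prefix tells the DataNode which storage type the directory has.
        entries.append(f"[{storage_type}]{path}")
    # Multiple directories are separated by commas.
    return ",".join(entries)


if __name__ == "__main__":
    print(build_data_dir({
        "/data/ssd1": "SSD",
        "/data/disk1": "DISK",
        "/data/archive1": "ARCHIVE",
    }))
    # [SSD]/data/ssd1,[DISK]/data/disk1,[ARCHIVE]/data/archive1
```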

Cache files

 
To move frequently accessed files to the cache, use a rule like the following:

file: accessCount(1min) > 0 and path matches "/user/sergei/demoCacheFile/*" | cache

With this rule active, SSM caches all files in the demoCacheFile directory which were accessed at least once during the last minute.
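To make the rule's condition concrete, here is a small Python model of it: a file qualifies if it was read at least once in the last 60 seconds and its path matches the pattern. This is not how SSM evaluates rules internally. The `FileStat` type and its list of access timestamps are hypothetical inputs, and `fnmatchcase` only approximates `matches` (its `*` also matches `/`, which may differ from SSM's own pattern semantics).

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class FileStat:
    path: str
    # Hypothetical input: UNIX timestamps (seconds) of past reads.
    access_times: list[float] = field(default_factory=list)


def access_count(f: FileStat, window_s: float, now: float) -> int:
    """Number of accesses within the last window_s seconds."""
    return sum(1 for t in f.access_times if now - window_s <= t <= now)


def files_to_cache(files: list[FileStat], pattern: str, now: float) -> list[str]:
    # Models: file: accessCount(1min) > 0 and path matches "<pattern>" | cache
    return [
        f.path
        for f in files
        if access_count(f, 60, now) > 0 and fnmatchcase(f.path, pattern)
    ]


if __name__ == "__main__":
    now = 1_000_000.0
    files = [
        FileStat("/user/sergei/demoCacheFile/f1.txt", [now - 10]),   # read 10 s ago
        FileStat("/user/sergei/demoCacheFile/f2.txt", [now - 300]),  # read 5 min ago
        FileStat("/user/sergei/demoCacheFile/f3.txt"),               # never read
        FileStat("/user/sergei/other/f4.txt", [now - 5]),            # other directory
    ]
    print(files_to_cache(files, "/user/sergei/demoCacheFile/*", now))
    # ['/user/sergei/demoCacheFile/f1.txt']
```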

For testing purposes, several files were created in the /user/sergei/demoCacheFile directory in HDFS. The following command lists them:

list -file /user/sergei/demoCacheFile
Action starts at Wed Feb 14 14:05:59 UTC 2024 : List /user/sergei/demoCacheFile
-rw-r--r--     3 sergei	sergei	           10 2024-02-14 14:04 /user/sergei/demoCacheFile/f1.txt
-rw-r--r--     3 sergei	sergei	           13 2024-02-14 14:04 /user/sergei/demoCacheFile/f2.txt
-rw-r--r--     3 sergei	sergei	           12 2024-02-14 14:04 /user/sergei/demoCacheFile/f3.txt

To check that the example works, try accessing one file by running the read action:

read -file /user/sergei/demoCacheFile/f1.txt

If there are no errors, you should be able to see that file on the Cluster → Data Temperature → Files in Cache page.

Files in cache

Move data between cold storage and hot storage

 

IMPORTANT
In order for this example to work, make sure you have configured the SSD and ARCHIVE storage types in HDFS.

To move files from cold storage to hot storage, use a rule like the following:

file: accessCount(1min) > 0 and path matches "/user/sergei/demoCold2Hot/*" | allssd

To check the storage type, call the checkstorage action in the Run Action pane:

checkstorage -file /user/sergei/demoCold2Hot/f1.txt

Before the rule is applied, the command output is:

File offset = 0, Block locations = {10.92.41.28:9866[DISK] 10.92.41.202:9866[DISK] 10.92.41.33:9866[DISK] }

After the file is read, the rule condition is met, SSM applies the rule, and the output changes to:

File offset = 0, Block locations = {10.92.41.33:9866[SSD] 10.92.41.28:9866[SSD] 10.92.41.202:9866[SSD] }

Now, you can also see the file on the Cluster → Data Temperature → Hot files page.

Hot files
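When checking many files, it can help to parse the checkstorage output instead of reading it by eye. The Python sketch below extracts the storage type of every replica from output in the format shown in this section and checks whether all replicas are on SSD. The regular expression is written against these sample lines only, so treat it as an assumption about the general output format.

```python
import re

# Matches replica entries such as "10.92.41.33:9866[SSD]".
REPLICA_RE = re.compile(r"([\w.\-]+):(\d+)\[([A-Z_]+)\]")


def storage_types(output: str) -> list[str]:
    """Storage type of every replica listed in checkstorage output."""
    return [m.group(3) for m in REPLICA_RE.finditer(output)]


def all_replicas_on(output: str, storage_type: str) -> bool:
    """True if at least one replica is listed and all of them use storage_type."""
    types = storage_types(output)
    return bool(types) and all(t == storage_type for t in types)


before = ("File offset = 0, Block locations = {10.92.41.28:9866[DISK] "
          "10.92.41.202:9866[DISK] 10.92.41.33:9866[DISK] }")
after = ("File offset = 0, Block locations = {10.92.41.33:9866[SSD] "
         "10.92.41.28:9866[SSD] 10.92.41.202:9866[SSD] }")
print(all_replicas_on(before, "SSD"), all_replicas_on(after, "SSD"))  # False True
```

Requiring at least one replica avoids reporting success on empty or unexpected output.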

To move files from hot storage to cold storage, use the rule below:

file: age > 3min and path matches "/user/sergei/demoHot2Cold/*" | archive
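The condition of this rule can be modeled in a few lines of Python: a file is selected for archiving once its age exceeds 3 minutes (180 seconds) and its path matches the pattern. In this sketch, `age` is assumed to be the time since the file's modification timestamp; check the SSM rule reference for the exact definition. The `mtimes` mapping and the `files_to_archive` helper are hypothetical.

```python
from fnmatch import fnmatchcase


def files_to_archive(mtimes: dict[str, float], pattern: str, now: float,
                     max_age_s: float = 180) -> list[str]:
    """Models: file: age > 3min and path matches "<pattern>" | archive.

    Assumption: age = now - modification time (UNIX seconds).
    """
    return [
        path
        for path, mtime in mtimes.items()
        if now - mtime > max_age_s and fnmatchcase(path, pattern)
    ]


now = 10_000.0
mtimes = {
    "/user/sergei/demoHot2Cold/old.txt": now - 600,  # 10 min old: archived
    "/user/sergei/demoHot2Cold/new.txt": now - 60,   # 1 min old: stays
    "/user/sergei/elsewhere/old.txt": now - 600,     # outside the pattern
}
print(files_to_archive(mtimes, "/user/sergei/demoHot2Cold/*", now))
# ['/user/sergei/demoHot2Cold/old.txt']
```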

Sync data

 
To synchronize files in a directory with another HDFS cluster, use a rule like the following:

file: path matches "/user/sergei/demoSyncSrc/*" | sync -dest hdfs://stikhomirov-adh1.ru-central1.internal/demoSyncDest

If the SSM cluster has a dedicated namespace configured, you can specify the namespace name instead of the host name in the rule above.

NOTE
For the sync action to work, SSM must be installed on at least one of the servers (source or destination). Note that the source directory does not have to be local, and the destination does not have to be on a remote server.
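To verify a sync, you need to know where each source file should appear on the destination. In the example in this section, /user/sergei/demoSyncSrc/f1.txt appears as /demoSyncDest/f1.txt, so the path relative to the source directory appears to be kept. The Python sketch below encodes that assumption; the `sync_target` helper is hypothetical, and nested directories are assumed to be mapped the same way.

```python
import posixpath


def sync_target(src_path: str, src_dir: str, dest: str) -> str:
    """Expected location of src_path under the sync destination (assumption)."""
    rel = posixpath.relpath(src_path, src_dir)
    if rel == "." or rel == ".." or rel.startswith("../"):
        raise ValueError(f"{src_path} is not a file inside {src_dir}")
    # Keep the path relative to the synced directory.
    return dest.rstrip("/") + "/" + rel


print(sync_target(
    "/user/sergei/demoSyncSrc/f1.txt",
    "/user/sergei/demoSyncSrc",
    "hdfs://stikhomirov-adh1.ru-central1.internal/demoSyncDest",
))
# hdfs://stikhomirov-adh1.ru-central1.internal/demoSyncDest/f1.txt
```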

To verify the result, compare the checksums of the source and destination files. To get the checksum of the source file, call the checksum action in the Run Action pane:

checksum -file /user/sergei/demoSyncSrc/f1.txt

The output for this particular file is:

/user/sergei/demoSyncSrc/f1.txt	MD5-of-0MD5-of-512CRC32C	00000200000000000000000049c19b14a1077d0a6fc95bee2db8914c

To get the checksum of the copy on the destination cluster, run the following command there:

$ hadoop fs -checksum /demoSyncDest/f1.txt

The checksum matches the one above:

/demoSyncDest/f1.txt    MD5-of-0MD5-of-512CRC32C        00000200000000000000000049c19b14a1077d0a6fc95bee2db8914c
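The comparison can also be scripted. This Python sketch splits each checksum line into path, algorithm, and value (assuming the path contains no whitespace) and compares only the algorithm and value, since the paths differ between clusters. Comparing the algorithm string as well makes sure both values were computed the same way.

```python
def parse_checksum(line: str) -> tuple[str, str, str]:
    """Split '<path> <algorithm> <checksum>' (tabs or spaces) into its fields."""
    path, algorithm, value = line.split()
    return path, algorithm, value


def checksums_match(src_line: str, dest_line: str) -> bool:
    # Paths differ between clusters, so only algorithm and value are compared.
    _, src_alg, src_val = parse_checksum(src_line)
    _, dest_alg, dest_val = parse_checksum(dest_line)
    return src_alg == dest_alg and src_val == dest_val


src = ("/user/sergei/demoSyncSrc/f1.txt\tMD5-of-0MD5-of-512CRC32C\t"
       "00000200000000000000000049c19b14a1077d0a6fc95bee2db8914c")
dest = ("/demoSyncDest/f1.txt    MD5-of-0MD5-of-512CRC32C        "
        "00000200000000000000000049c19b14a1077d0a6fc95bee2db8914c")
print(checksums_match(src, dest))  # True
```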

If the checksums don’t match or errors occur, open the Actions page and check the logs of your action.
