Phoenix performance tuning
This article contains recommendations on how to optimize the performance of Phoenix in ADH.
Configuration tuning
Connection
The parameters described below relate to connection behavior in Phoenix.
| Parameter | Description | Recommendation | 
|---|---|---|
| phoenix.query.timeoutMs | Defines how long a query is allowed to run before it is terminated on the client side. This prevents long-running or runaway queries from consuming excessive resources and causing performance issues | Set a relatively conservative value. This parameter has to be added manually in the Custom hbase-site.xml section of the HBase service configuration page | 
| phoenix.query.threadPoolSize | Defines the maximum number of threads in the pool dedicated to executing incoming Phoenix queries. Each thread handles a single query (or a portion of a complex query) | Start with the default value and run the typical workload, monitoring key metrics such as CPU utilization, query latency, and query queue depth. If the CPUs are underutilized, query latency is high, or the query queue depth is consistently high, increase the parameter by 25%. If the CPUs are maxed out and RAM is exhausted, decrease the parameter by 25%. After each adjustment, rerun the workload and monitor the metrics again. This parameter has to be added manually in the Custom hbase-site.xml section of the HBase service configuration page | 
| hbase.client.scanner.timeout.period | Defines how long a scanner may remain inactive before the Region server closes it | Start with the default value and run the typical workload. Search the logs for scanner timeout errors; if they appear regularly, increase the value. This parameter is located in the hbase-site.xml section of the HBase service configuration page | 
| zookeeper.session.timeout | Defines the timeout for ZooKeeper sessions. HBase (and therefore Phoenix) relies on ZooKeeper for coordination. If a Region server loses its ZooKeeper session, it can lead to connection issues and instability | In environments with unstable networks or high ZooKeeper load, consider increasing this value. However, a higher value means it takes longer to detect a truly failed Region server. Do not set this too low, as transient network hiccups can cause unnecessary disconnections. This value needs to be coordinated with the ZooKeeper configuration itself. This parameter is located in the hbase-site.xml section of the HBase service configuration page | 
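For reference, custom parameters are added in the Custom hbase-site.xml section as standard Hadoop-style property entries. A minimal sketch; the value shown is illustrative:
<property>
    <name>phoenix.query.timeoutMs</name>
    <!-- Illustrative value in milliseconds -->
    <value>600000</value>
</property>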
Query
The parameters described below relate to query performance in Phoenix.
| Parameter | Description | Recommendation | 
|---|---|---|
| phoenix.query.spoolThresholdBytes | Limits the amount of memory a query can use for sorting or grouping results before spilling data to disk for temporary storage. It is a critical parameter for queries that sort or aggregate large result sets (e.g. with ORDER BY or GROUP BY) | Start with the default value, run the typical workload, and monitor disk I/O and the frequency of spooling to disk. If any of those metrics are too high, double the parameter value, rerun the workload, and monitor the metrics again. This parameter has to be added manually in the Custom hbase-site.xml section of the HBase service configuration page | 
| phoenix.query.maxRows | Limits the maximum number of rows that a single query can return. If a query attempts to return more rows than this limit, Phoenix throws an exception | Start with the default value and run the typical workload. Monitor query execution and identify any queries that return unexpectedly large result sets or cause performance problems. Examine query logs to understand the typical result set sizes for different types of queries and identify any queries that consistently return a very large number of rows. Based on your findings, set a relatively conservative value that is high enough to accommodate most legitimate queries but low enough to prevent runaway queries from overwhelming the system. This parameter has to be added manually in the Custom hbase-site.xml section of the HBase service configuration page | 
| phoenix.query.resultset.cache.enabled | Defines whether client-side caching of query results is enabled. If enabled, Phoenix caches query results, allowing subsequent identical queries to be served from the cache instead of re-executing on the server | Enable this parameter if you have frequently executed, read-heavy queries over stable data. This can drastically improve read performance. However, it increases client memory consumption and introduces a potential for stale data if the underlying data changes frequently and the cache is not invalidated. Monitor client memory and data freshness. This parameter has to be added manually in the Custom hbase-site.xml section of the HBase service configuration page | 
| phoenix.query.optimizeMaxParallelScans | Defines the maximum number of parallel scans that Phoenix can initiate for certain types of queries (e.g. queries that involve multiple parallel scans across different regions or tables) | If you have many regions per table or a highly distributed dataset, increasing this value might improve performance for complex analytical queries that benefit from parallelism. Increase with caution, as a very high value can put significant load on the cluster metadata services and on the network. Monitor Region server and ZooKeeper activity. This parameter has to be added manually in the Custom hbase-site.xml section of the HBase service configuration page | 
| phoenix.mutate.batchSize | Specifies the number of mutation operations (UPSERT or DELETE statements) that are batched together before being sent to the server | Increase this value for higher write throughput if you are performing bulk loading or batch updates. Start by doubling the default value and monitor performance. If you experience client-side memory pressure, decrease the value. This parameter has to be added manually in the Custom hbase-site.xml section of the HBase service configuration page | 
| phoenix.mutate.maxSizeBytes | Defines the maximum size (in bytes) of a single mutation batch. This prevents very large mutations from overwhelming the Region servers | If you are inserting rows with very large values (e.g. large binary data) and the default limit is consistently exceeded, increase this value. Be careful not to set it too high, as this can lead to Region server issues. If you are seeing errors related to oversized mutations, decrease this value. This parameter has to be added manually in the Custom hbase-site.xml section of the HBase service configuration page | 
CAUTION: The last two properties (phoenix.mutate.batchSize and phoenix.mutate.maxSizeBytes) work together. The batch is sent when either limit is reached.
JDBC URL
The parameters described below can be specified in the JDBC URL string when connecting to Phoenix.
| Parameter | Description | Recommendation | 
|---|---|---|
| phoenix.query.client.fetchSize | Specifies the number of rows that the client requests from the server in each fetch operation during query execution. A larger fetch size reduces the number of network round trips, potentially improving performance for queries that return many rows | Increase the value for queries that retrieve a large number of rows to reduce network latency overhead. Decrease the value if client memory becomes an issue or if network conditions are poor, causing large fetches to time out. It is often beneficial to align this value with the typical result set size of your queries | 
| phoenix.statement.prefetchRows | Specifies the number of rows pre-fetched into the client local buffer. Pre-fetching lets the client iterate over results without waiting for additional server round trips | Specify a greater value for queries that sequentially scan large result sets; keep the value moderate if client memory is constrained | 
| phoenix.schema.isNamespaceMappingEnabled | Determines whether Phoenix should use HBase namespaces to represent Phoenix schemas | Set this parameter to true if you want Phoenix schemas to map to HBase namespaces. The value must be consistent with the server-side configuration | 
| phoenix.use.stats | Defines whether the use of statistics is enabled when planning queries | Leave this value unchanged, as statistics help Phoenix choose more efficient execution plans | 
| async | Defines whether asynchronous queries are enabled. This allows the client to submit queries and receive results later | Set this parameter to true if your application can benefit from submitting queries without blocking on the results | 
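A hedged example of a JDBC URL with such parameters appended after the ZooKeeper quorum (hosts, port, and values are illustrative):
jdbc:phoenix:zk1,zk2,zk3:2181:/hbase;phoenix.query.client.fetchSize=1000;phoenix.use.stats=true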
Region server
The parameters described below relate to Region server behavior with respect to Phoenix.
| Parameter | Description | Recommendation | 
|---|---|---|
| hbase.rpc.timeout | Defines the general timeout for RPC calls to HBase Region servers. If a client does not receive a response from a Region server within this time, the RPC call is considered failed. Phoenix uses RPC calls for virtually all operations | If you frequently see RPC timeout errors in the logs, consider increasing this value. This parameter has to be added manually in the Custom hbase-site.xml section of the HBase service configuration page | 
| phoenix.coprocessor.maxServerCacheBytes | Defines the server-side cache size (in bytes) for query results within the coprocessor on the Region servers. This can significantly improve performance for frequently executed queries with relatively small result sets | If you have many repeated queries, start with a modest value and increase it gradually while monitoring Region server memory usage. This parameter has to be added manually in the Custom hbase-site.xml section of the HBase service configuration page | 
| phoenix.index.builder.threads | Specifies the number of threads used to build global indexes. Global indexes are built asynchronously in the background, and this parameter controls the level of parallelism of index building | If you have a large number of rows to index and want to speed up index building, increase this value up to the number of available CPU cores on your Region servers. If index building puts too much load on the Region servers, decrease this value. This parameter has to be added manually in the Custom hbase-site.xml section of the HBase service configuration page | 
Database tuning
Row key design
The row key is the most important design consideration in HBase and, therefore, in Phoenix. A well-designed row key can significantly improve query performance and data locality. A poorly designed row key can lead to hotspots, slow queries, and overall system inefficiency.
Below is the list of key considerations for row key design:
- Query patterns. The row key should be chosen based on how you intend to query the data. Identify the most common and performance-critical queries and design the row key to optimize for those. Consider the following:
  - Range scans. If you frequently need to retrieve data within a specific range, the row key should include the date or a derivative of it as a prefix.
  - Point lookups. If you need to retrieve individual rows quickly based on a unique identifier, make that identifier the row key.
  - Prefix-based queries. If you often use queries based on a prefix of a specific attribute, include that attribute at the beginning of the row key.
- Hotspotting. Avoid row key designs that lead to hotspotting. It occurs when a disproportionate amount of data is written to or read from a single Region server, creating a bottleneck. Common causes:
  - Monotonically increasing keys. Using timestamps, sequence numbers, or other monotonically increasing values as the row key will direct all new writes to a single region, overloading that server.
  - Sequential keys. Similar to monotonically increasing keys, sequential keys can also cause hotspots if they are concentrated within a small range.
- Salting. This is a technique to mitigate hotspotting by adding a prefix to the row key to distribute writes across multiple regions.
- Pre-splitting. Pre-splitting involves creating multiple regions upfront to distribute the initial load more evenly. This is especially important for large datasets.
- Key length. Keep row keys as short as possible without sacrificing uniqueness or query performance. Shorter keys reduce storage space and improve scan performance.
- Composite keys. Phoenix supports composite row keys, which combine multiple columns into a single key. This is useful when you need to execute queries based on a combination of attributes.
  - Order of columns. The order of columns in the composite key is significant. Place the most frequently queried columns at the beginning.
  - Data types. Choose appropriate data types for each column in the composite key to optimize storage space and performance.
Example
Consider a table storing sensor data with the following columns:
- sensor_id
- timestamp
- temperature
- humidity
If you frequently query for sensor data within a specific time range, you might design the composite row key as a combination of sensor_id and timestamp.
However, if the number of sensors is small and writes are concentrated in specific time ranges, you will run into hotspotting.
In that case, you would need to salt the row key across some number of salt buckets.
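A minimal Phoenix DDL sketch of such a design; all names are illustrative, SALT_BUCKETS = 8 is an arbitrary starting point, and the timestamp column is named ts because TIMESTAMP is a data type keyword:
-- Hypothetical table with a composite row key, salted to avoid hotspotting.
CREATE TABLE sensor_data (
    sensor_id VARCHAR NOT NULL,
    ts TIMESTAMP NOT NULL,
    temperature DECIMAL,
    humidity DECIMAL
    CONSTRAINT pk PRIMARY KEY (sensor_id, ts)
) SALT_BUCKETS = 8;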
Column family design
In HBase and Phoenix, the data is stored in column families. Choosing the right column family design is crucial for optimizing storage efficiency, query performance, and overall system scalability.
Below is the list of key considerations for column family design:
- Locality of reference. Columns that are accessed together frequently should be grouped into the same column family. This improves read performance because HBase retrieves entire column families at once. Conversely, columns that are rarely accessed together should be placed in separate column families.
- Number of column families. Minimize the number of column families in a table. Each column family has associated overhead in terms of storage and memory. A small number of well-designed column families is generally preferable to a large number of fragmented families. Aim for a few (2-3) column families.
- Storage characteristics. Consider the storage characteristics of the data within each column family:
  - Data type. Columns with similar data types (e.g. strings, numbers, binaries) can be grouped together to optimize storage and compression.
  - Data size. Columns containing large objects (e.g. images, documents) should be placed in their own column family to avoid impacting the performance of other frequently accessed data.
- Schema evolution. Column families provide a degree of flexibility for schema evolution. You can add new columns to an existing column family without affecting other families. However, consider the impact of adding a large number of columns to a single column family, as it can affect performance.
- Column family naming. Choose descriptive and meaningful names for your column families, but keep them as short as possible.
- Compression. Column families can be configured with different compression algorithms. Choose the appropriate compression algorithm based on the data type and access patterns within each family.
- In-memory storage. You can configure a column family to be stored in memory for faster access. This is useful for frequently accessed data that is relatively small. However, be mindful of memory usage on your Region servers.
Example
Consider a table storing user profile information with the following columns: user_id, username, email, first_name, last_name, address, city, state, zip, and profile_picture. You might design the column families as follows:
- info: username, email, first_name, last_name
- address: address, city, state, zip
- media: profile_picture
This design groups the frequently accessed user information into the info column family, separates the address information into its own family, and places the large profile picture into a separate family to avoid impacting the performance of other queries.
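A hedged Phoenix DDL sketch of this layout, using the family.column notation to assign columns to families (all names are illustrative):
-- Three column families: info, address, and media.
CREATE TABLE user_profiles (
    user_id VARCHAR PRIMARY KEY,
    info.username VARCHAR,
    info.email VARCHAR,
    info.first_name VARCHAR,
    info.last_name VARCHAR,
    address.address VARCHAR,
    address.city VARCHAR,
    address.state VARCHAR,
    address.zip VARCHAR,
    media.profile_picture VARBINARY
);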
Indexing strategies
Phoenix provides powerful indexing capabilities to accelerate query performance on HBase data. Unlike traditional relational databases, where indexes are typically created on a small number of columns, Phoenix allows you to create indexes on any combination of columns, including expressions and functions. Choosing the right indexing strategy is crucial for optimizing query performance and minimizing index maintenance overhead.
There are two main types of indexes in Phoenix: global and local.
Global indexes
Global indexes are stored in a separate HBase table and cover all rows in the original table. When a query uses a global index, Phoenix retrieves the data from the index table instead of the original table.
Global indexes are best suited for read-heavy workloads where queries frequently filter on the indexed columns. They are updated synchronously during data writes. This can impact write performance, especially for large datasets.
Global indexes are not usable if the underlying table is not available.
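For large existing tables, the synchronous build cost can be deferred by creating a global index with the ASYNC keyword and populating it later with the IndexTool MapReduce job; a sketch with illustrative names:
-- Built later by the IndexTool job instead of synchronously at creation time.
CREATE INDEX my_global_index ON my_table (column1) ASYNC;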
Local indexes
Local indexes are stored on the same region servers as the original table data. When a query uses a local index, Phoenix retrieves the data from the index table and the original table within the same region server.
Local indexes are ideal for write-heavy workloads where queries need to filter on non-leading columns of the row key or columns that are not part of a global index. Because the index data lives on the same Region server as the table data, index updates do not require extra network hops, which minimizes the impact on write performance.
Local indexes are available even if the underlying table is not available.
Syntax examples for creating a global and a local index:
CREATE INDEX my_index ON my_table (column1, column2);
CREATE LOCAL INDEX my_local_index ON my_table (column1, column2);
Data compression
Data compression is a crucial technique for reducing storage space and improving I/O performance in HBase and Phoenix. By compressing data before it is stored, you can significantly reduce the amount of disk space required and the time it takes to read and write data. However, compression also introduces overhead due to the CPU cycles required to compress and decompress data. Choosing the right compression algorithm is a trade-off between storage savings, I/O performance, and CPU usage.
HBase supports several compression algorithms listed below, each with its own characteristics:
- NONE. No compression is used. This is the default algorithm. It provides the fastest read and write performance but consumes the most storage space.
- LZO. A fast and widely used compression algorithm that provides a good balance between compression ratio and CPU usage. Requires separate installation and configuration.
- GZIP. A popular compression algorithm that offers a high compression ratio but consumes more CPU cycles than LZO. Suitable for data that is not frequently accessed.
- Snappy. A fast compression algorithm that is designed for speed rather than high compression ratios. Ideal for data that is frequently accessed and requires low latency.
- LZ4. An extremely fast compression algorithm that offers lower compression ratios than the other algorithms.
The best compression algorithm for your data depends on several factors:
- Data type. Some compression algorithms are better suited for certain data types than others. For example, GZIP is often a good choice for text data, while Snappy is suitable for binary data.
- Access patterns. If your data is frequently accessed, choose a compression algorithm that offers low latency, such as Snappy or LZ4. If your data is rarely accessed, you can choose an algorithm that provides a high compression ratio, such as GZIP.
- CPU usage. Be mindful of the CPU overhead of compression. If your system is already CPU-bound, you may want to choose a less CPU-intensive algorithm, such as LZO or Snappy.
- Storage capacity. If your storage capacity is limited, choose a compression algorithm that provides a high compression ratio.
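Compression is configured per column family as an HBase table property, which Phoenix DDL can pass through; a hedged sketch with illustrative names:
-- SNAPPY is chosen here for frequently accessed, latency-sensitive data.
CREATE TABLE events (
    event_id VARCHAR PRIMARY KEY,
    payload VARCHAR
) COMPRESSION = 'SNAPPY';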
Data locality
Data locality is a key concept in HBase and Phoenix that refers to the proximity of data to the compute resources that need to access it. Maximizing data locality is crucial for achieving high performance because it reduces the amount of data that needs to be transferred over the network, which can be a significant bottleneck.
HBase stores data in regions, which are distributed across Region servers in the cluster. Each Region server is responsible for managing a subset of the data. When a client application requests data, HBase attempts to serve the request from the region server that is closest to the data.
Data locality is affected by the following factors:
- Row key design. The row key is the most important factor affecting data locality. A well-designed row key will group related data together on the same Region server, making it easier to access the data efficiently.
- Region splitting and merging. HBase automatically splits and merges regions based on data size and load. If regions are not split and merged appropriately, data locality can suffer. It is important to monitor region sizes and configure the region splitting and merging policies accordingly.
- Data placement. HBase attempts to place data on Region servers that are close to the data consumers. However, factors such as network topology and rack awareness can affect data placement.
- Compaction. Compaction is the process of merging and rewriting data files in HBase. Compaction can improve data locality by reorganizing data on disk.
- Bulk loading. Using bulk loading to load data into HBase allows you to pre-sort your data, which can increase data locality.
You can use the following strategies to improve data locality:
- Optimal row key design. A well-designed row key is essential for data locality. Choose a row key that groups related data together on the same Region server. Consider salting to prevent hotspotting while still maintaining locality for related data within the same salted region.
- Region pre-splitting. Pre-splitting the table into multiple regions can improve data locality by distributing the data across multiple Region servers from the start (see the sketch after this list).
- Rack awareness. Configure HBase to be aware of the rack topology of your cluster. This allows HBase to place data on Region servers that are in the same rack as the data consumers.
- HDFS configuration. Properly configure HDFS to ensure that data is stored close to the Region servers.
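A hedged example of pre-splitting at table creation time via Phoenix DDL; the split points are illustrative and should be derived from the actual key distribution:
-- Creates four regions split at three hypothetical key boundaries.
CREATE TABLE metrics (
    host VARCHAR NOT NULL,
    ts TIMESTAMP NOT NULL,
    value DOUBLE
    CONSTRAINT pk PRIMARY KEY (host, ts)
) SPLIT ON ('host-08', 'host-16', 'host-24');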
Query optimization techniques
Query execution plan
Understanding the query execution plan is fundamental to optimizing SQL queries in Phoenix. The execution plan outlines the steps Phoenix will take to retrieve and process the data. By analyzing this plan, you can identify potential bottlenecks and areas for improvement.
You can obtain the query execution plan using the EXPLAIN statement in Phoenix.
For example:
EXPLAIN SELECT * FROM users WHERE age > 30 AND city = 'New York';
Result example:
CLIENT 1-WAY FULL SCAN OVER USERS
    SERVER FILTER BY ("AGE" > 30 AND "CITY" = 'New York')
The execution plan is typically presented as a tree structure, with the root node representing the final result and the leaf nodes representing the data sources. Each node in the tree describes an operation that Phoenix will perform. Below is a table with the description of the execution plan operations.
| Operation | Description | 
|---|---|
| SERIAL | Indicates that the operation will be performed serially on a single region server. This is generally less efficient than parallel execution | 
| PARALLEL <n>-WAY | Indicates that the operation will be performed in parallel across multiple region servers, with <n> denoting the degree of parallelism | 
| FULL SCAN OVER <table_name> | Indicates a full table scan, meaning that Phoenix will read every row in the table to find the matching rows. Full table scans are generally inefficient, especially for large tables. A full scan over an index table is not necessarily inefficient | 
| RANGE SCAN OVER <table_name> <startRow, stopRow> | Indicates a range scan, where Phoenix will read only a subset of the rows in the table based on a range of row keys. This is more efficient than a full table scan, but it can still be inefficient if the range is larger than necessary. An index table will define its ranges differently from the original table | 
| POINT LOOKUP OVER <table_name> <rowKey> | Indicates a point lookup, where Phoenix will directly retrieve a single row based on its row key. This is the most efficient type of read operation | 
| CLIENT <op type> | Indicates that the query execution is happening on the client side | 
| JOIN <join type> | Indicates the type of the JOIN operation being performed | 
| FILTER <predicate> | Indicates that Phoenix is filtering the data based on a predicate (the conditions of the WHERE clause) | 
| AGGREGATE <aggregation function> | Indicates that Phoenix is performing an aggregation function (e.g. COUNT, SUM, AVG) | 
| SORT | Indicates that the results are being sorted. Sorting can be expensive, so avoid it if possible. If you can use an index to provide the required sort order, do so | 
Based on the analysis of the execution plan, you can take steps to improve query performance.
In the example above, you could create an index on the age and city columns like this:
CREATE INDEX idx_users_age_city ON users (age, city);
After creating the index, the execution plan might look like the following:
CLIENT 1-WAY RANGE SCAN OVER IDX_USERS_AGE_CITY [30,]
    SERVER FILTER BY "CITY" = 'New York'
SQL efficiency
Writing efficient SQL queries is paramount for optimal performance in Phoenix. While Phoenix leverages HBase for storage, inefficient SQL can still lead to slow query execution, even with properly indexed tables. This section focuses on crafting SQL queries that minimize the workload on Phoenix and maximize resource utilization.
Key principles for writing efficient SQL:
- Select only necessary columns. Explicitly specify the columns you need in the SELECT clause. Using the SELECT * clause retrieves all columns from the table, even if you only need a few. This increases the amount of data that needs to be read, transferred, and processed.
- Filter early and efficiently. Applying filters late in the query execution (e.g. after a full table scan) can be very inefficient. Use the WHERE clause to filter data as early as possible. Ensure that your WHERE clause leverages available indexes. Place the most restrictive conditions in the WHERE clause first.
- Use the correct data types. Inconsistent data types in comparisons and predicates can prevent Phoenix from using indexes effectively and may lead to type conversion overhead. Ensure that the data types in your SQL queries match the data types of the corresponding columns in the table schema.
- Leverage built-in functions efficiently. Some built-in functions can be more efficient than others. Understand the performance characteristics of different built-in functions and use those that are optimized for HBase/Phoenix.
- Avoid unnecessary subqueries. Subqueries can often be rewritten as joins, which can be more efficient in Phoenix. Correlated subqueries (subqueries that depend on the outer query) can be particularly problematic. Try to rewrite subqueries as joins whenever possible. Consider denormalization as an alternative if the subquery is used frequently.
- Limit the result set. Retrieving large result sets can put a strain on the client and network. Use the LIMIT clause to restrict the number of rows returned. If pagination is needed, use LIMIT and OFFSET in conjunction.
- Use COUNT(*) carefully. COUNT(*) on a large table without a WHERE clause can be very expensive, as it requires scanning the entire table. If you need to count all rows in a table, consider using a pre-aggregated count or a lightweight approach if the table is not excessively large. If possible, add a WHERE clause to restrict the count to a smaller subset of the data.
- Leverage UPSERT SELECT for efficient data updates and inserts. Inserting data row by row can be very inefficient. Use UPSERT SELECT to insert or update data in batches. This is especially useful for loading data from other tables or external sources (see the sketch after this list).
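A short sketch illustrating several of these principles; all table and column names are hypothetical:
-- Select only the needed columns, filter early, and cap the result set.
SELECT user_id, city
FROM users
WHERE city = 'New York'
LIMIT 100;
-- Batch-load with UPSERT SELECT instead of row-by-row inserts.
UPSERT INTO users_archive (user_id, city)
SELECT user_id, city
FROM users
WHERE created_date < TO_DATE('2020-01-01', 'yyyy-MM-dd');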
JOIN optimization
Joins in Phoenix can be relatively expensive operations compared to point lookups or range scans. Therefore, optimizing joins is crucial for achieving good performance when working with related data across multiple tables. HBase's distributed nature makes join performance a more significant concern than it is in traditional relational databases. This section offers strategies for optimizing joins in Phoenix, including choosing the right join type, leveraging data locality, and, when appropriate, avoiding joins altogether through denormalization.
Phoenix supports several join types and related strategies, each with its own performance characteristics:
- Star joins. Optimized for scenarios where one large table (the fact table) is joined with multiple smaller dimension tables. Phoenix attempts to broadcast only the parts of the dimension tables required by the query to all relevant Region servers (those containing data from the fact table in the join), allowing for efficient local joins. This is often the preferred join strategy in Phoenix when the data model and query patterns support it. Phoenix can typically determine whether a star join is beneficial based on the data statistics available to it.
- Broadcast joins. Suited specifically for scenarios where one table is genuinely small (fits comfortably in memory on each Region server) and is joined with one or more larger tables. Phoenix broadcasts the entire small table to all Region servers. This is different from a star join, where only the relevant parts of the dimension tables are sent. This approach can yield excellent performance because the join operations happen locally on each Region server, minimizing data transfer. Phoenix applies this optimization automatically when applicable.
- Sort-merge joins. While Phoenix supports sort-merge joins, they are generally less efficient than star or broadcast joins, especially in distributed environments like HBase. They involve sorting the input tables based on the join keys and then merging the sorted results. The performance bottleneck arises from the sorting overhead and the potential for data shuffling across the network. Phoenix typically resorts to sort-merge joins only when more efficient join strategies cannot be applied. This might occur if the tables are very large and cannot be effectively broadcast, or if the join keys are not conducive to a hash-based join.
Possible strategies for optimizing joins:
- Denormalization. Often the best optimization for joins is to avoid them entirely. Consider denormalizing data by embedding related information directly into the fact table or creating pre-joined views. For example, instead of joining the orders and customers tables to retrieve customer information with each order, you could include relevant customer details (e.g. customer name, city) directly in the orders table. However, denormalization can lead to data redundancy and potential consistency issues, so carefully weigh the pros and cons based on your specific use case.
- Data locality. Place related tables on the same Region servers to minimize network traffic during joins. This can be achieved through co-location strategies in HBase.
- Choosing the correct join order. The order in which tables are joined can significantly impact performance. Generally, it is best to start with the table that has the most restrictive filter and join it with smaller tables first. Filtering early reduces the amount of data that needs to be processed in subsequent join operations.
- Filtering before joining. Apply the WHERE clauses to filter data from each table before the JOIN operation (see the sketch after this list). Reducing the size of the tables being joined improves join performance.
- Broadcast joins for small tables. For joins involving a small dimension table and a large fact table, use a broadcast join. Broadcast joins copy the smaller table to all Region servers, allowing for local joins with the larger table. Phoenix may automatically choose a broadcast join when it detects a small table. You can often influence this decision using hints.
- Leverage the star join optimization. If your data model conforms to a star schema (one large fact table joined with multiple smaller dimension tables), take advantage of the Phoenix star join optimization.
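A hedged sketch of filtering before joining, using hypothetical orders and customers tables:
-- Restrict the larger table with a derived table before the join.
SELECT o.order_id, c.name
FROM (
    SELECT order_id, customer_id
    FROM orders
    WHERE order_date > TO_DATE('2024-01-01', 'yyyy-MM-dd')
) o
JOIN customers c ON o.customer_id = c.customer_id;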
Hints
Hints are directives that you can add to your SQL queries to influence the Phoenix query execution plan. They provide a way to guide the optimizer and potentially improve performance when the default execution plan is not optimal. While Phoenix automatically optimizes queries, hints allow you to override its decisions in specific situations. Use hints carefully, as they can make your queries less portable and harder to maintain.
Hints are placed after the SELECT, UPSERT, or DELETE keyword and have the following syntax:
SELECT /*+ hint1, hint2, ... */ column1, column2 FROM table_name WHERE condition;
Considerations when using hints:
- Use with caution. Hints override the default query optimizer, potentially masking underlying performance issues or leading to sub-optimal plans if used incorrectly.
- Test thoroughly. Always test the performance of your queries with and without hints to ensure that a hint actually improves performance.
- Retain maintainability. Hints can make your queries less portable and harder to maintain. Document the reasons for using a hint and monitor its effectiveness over time.
- Keep in mind the optimizer changes. The Phoenix query optimizer might change in future versions, rendering your hints ineffective or even detrimental. Regularly review and update your hints as needed.
- Understand the execution plan. Use the EXPLAIN statement to understand how your hints affect the query execution plan (see the examples after this list).
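Two hedged examples using hints that exist in Phoenix; the table and index names are hypothetical:
-- Ask the optimizer to use a specific index.
SELECT /*+ INDEX(users idx_users_age_city) */ user_id
FROM users
WHERE age > 30;
-- Force a sort-merge join instead of the default hash join.
SELECT /*+ USE_SORT_MERGE_JOIN */ o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;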
Query server
The Phoenix Query Server (PQS) provides a way to access Phoenix from clients that do not have direct access to the HBase cluster. It acts as a proxy, allowing clients to submit SQL queries over a thin JDBC driver, abstracting away the complexities of interacting directly with HBase. This approach is beneficial in scenarios where security restrictions, network configurations, or client-side dependencies prevent direct connections to the HBase cluster.
The Phoenix Query Server follows a client-server architecture:
- Client. The client application uses a thin JDBC driver to connect to the PQS. This driver is often more lightweight than the standard Phoenix JDBC driver because it does not need to handle direct HBase connections.
- Phoenix Query Server. The PQS is a standalone process that runs outside of the HBase cluster (although it is often co-located with HBase nodes). It receives SQL queries from clients, translates them into Phoenix queries, executes them against the HBase cluster, and returns the results to the client. The PQS handles tasks such as:
  - Authentication and authorization.
  - Query parsing and optimization (though it leverages the Phoenix internal optimizer).
  - Connection pooling to HBase.
  - Result set management.
- HBase cluster. The HBase cluster stores the data and executes the actual data access operations. The PQS connects to the HBase cluster using the standard Phoenix JDBC driver.
Benefits of using the Phoenix Query Server:
- Simplified client access. Clients can access Phoenix data using a standard JDBC interface, without needing to install the full Phoenix client libraries or have direct access to the HBase cluster.
- Security. The PQS provides a central point for authentication and authorization, allowing you to control access to Phoenix data. You can restrict direct access to the HBase cluster, enhancing security.
- Network isolation. The PQS can be deployed in a DMZ or other secure network zone, isolating the HBase cluster from direct client access. This is crucial for security in many organizations.
- Resource management. The PQS can manage connections to the HBase cluster, preventing clients from overloading the cluster with too many connections. It enables connection pooling and resource management.
- Language independence. Clients can be written in any language that supports JDBC, providing flexibility in application development.
- Thin JDBC driver. The JDBC client is lightweight, making it easy to deploy on various platforms.
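A hedged example of a thin-driver connection URL (the host is illustrative; 8765 is the conventional PQS port):
jdbc:phoenix:thin:url=http://pqs.example.com:8765;serialization=PROTOBUF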