Use filters in HBase
Overview
HBase in ADH offers a Thrift filter language. It allows you to filter the results of reading data from HBase with the get or scan operations. Filtering occurs on the server side, so it does not reduce the load on HBase itself. It does, however, reduce the load on the network, as less data is transmitted to the client. Filters can be used both through the Java API and in the HBase shell.
HBase filter syntax
Simple filter
A simple filter is specified by a string of the following kind:
"<FilterName> (<arg1>, <arg2>, ..., <argN>)"
where:
- <FilterName> is the name of one of the individual filters;
- <arg1>, <arg2>, etc. are the arguments of that filter.
Example of a simple filter usage in HBase shell:
scan 'articles', { FILTER => "ColumnPaginationFilter (2, 1)"}
The full set of arguments forms the filter condition. Different individual filters may require zero or more arguments.
Arguments may have the following types:
- string;
- integer;
- boolean.
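The syntax above can also be assembled programmatically. The following Ruby sketch (the HBase shell is JRuby-based) builds a simple filter string from a filter name and its arguments. The filter_string helper is hypothetical and not part of HBase; it assumes that string arguments are wrapped in single quotes, as in the examples in this article.

```ruby
# Hypothetical helper (not part of HBase): formats one simple filter string.
def filter_string(name, *args)
  formatted = args.map do |arg|
    # Strings are single-quoted; integers and booleans are written as-is.
    arg.is_a?(String) ? "'#{arg}'" : arg.to_s
  end
  "#{name} (#{formatted.join(', ')})"
end

puts filter_string('ColumnPaginationFilter', 2, 1)
puts filter_string('PrefixFilter', 'Roy')
puts filter_string('KeyOnlyFilter')
```

The resulting string can then be passed as the FILTER value of a scan or get command.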
Timestamps
Timestamps in HBase shell table scan results are shown in human-readable format. However, when a filter requires a timestamp as an argument value, it should be specified in Unix epoch format. To convert timestamps from one format to another, use online tools like EpochConverter.
NOTE
HBase requires millisecond precision for timestamps. You need to manually add milliseconds to the timestamps after conversion to Unix epoch format.
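As an alternative to an online converter, the conversion can be done in a few lines of Ruby. This is a minimal sketch: the to_epoch_millis helper is a hypothetical name, and it assumes that the timestamp is given in UTC with an explicit Z suffix (adjust the zone designator if your shell prints local time).

```ruby
require 'time'

# Hypothetical helper: converts an ISO 8601 timestamp to Unix epoch milliseconds.
# Assumption: the input carries a zone designator such as "Z" (UTC).
def to_epoch_millis(iso_timestamp)
  t = Time.iso8601(iso_timestamp)
  # Whole seconds scaled to milliseconds, plus the millisecond fraction.
  t.to_i * 1000 + t.usec / 1000
end

puts to_epoch_millis('2024-07-30T08:10:20Z')      # 1722327020000
puts to_epoch_millis('2024-07-30T08:10:19.297Z')  # 1722327019297
```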
Comparison operators and comparators
Comparison operators and comparators are used in filter arguments to compose conditions for lexicographic matches and comparisons.
The following comparison operators are used as filter arguments.
Syntax | Description |
---|---|
< | Less than |
> | Greater than |
= | Equal |
<= | Less than or equal |
>= | Greater than or equal |
!= | Not equal |
The following comparators are used as filter arguments.
Name | Description | Example |
---|---|---|
BinaryComparator | Lexicographically compares against the specified string | (>, 'binary:arc') matches everything lexicographically greater than arc |
BinaryPrefixComparator | Lexicographically compares against the specified string, but only up to the length of that string | (=, 'binaryprefix:bot') matches everything that begins with bot |
RegexStringComparator | Compares against the specified regular expression. Only the = and != operators are valid with this comparator | (!=, 'regexstring:be*ng') matches any string in which the regular expression be*ng finds no match |
SubStringComparator | Searches for the given substring, case insensitive. Only the = and != operators are valid with this comparator | (!=, 'substring:con') matches any string except those containing con |
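The comparator semantics can be modeled in plain Ruby to check what a given operator and comparator pair would match. This is only an illustrative sketch operating on ordinary strings, not HBase code; the compare and matches? helpers are hypothetical names, and the regular expression case assumes a "found anywhere in the string" match.

```ruby
# Plain-Ruby model of the comparator kinds described above; not HBase code.
# Returns a negative number, zero, or a positive number, like <=>.
def compare(kind, operand, value)
  case kind
  when 'binary'       then value <=> operand
  when 'binaryprefix' then value[0, operand.length] <=> operand
  when 'regexstring'  then value.match?(Regexp.new(operand)) ? 0 : 1
  when 'substring'    then value.downcase.include?(operand.downcase) ? 0 : 1
  else raise ArgumentError, "unknown comparator: #{kind}"
  end
end

# Applies a comparison operator to a value and a 'kind:operand' comparator string.
def matches?(operator, comparator, value)
  kind, operand = comparator.split(':', 2)
  result = compare(kind, operand, value)
  case operator
  when '<'  then result < 0
  when '<=' then result <= 0
  when '='  then result.zero?
  when '!=' then !result.zero?
  when '>=' then result >= 0
  when '>'  then result > 0
  else raise ArgumentError, "unknown operator: #{operator}"
  end
end

puts matches?('>', 'binary:arc', 'bar')              # true
puts matches?('=', 'binaryprefix:bot', 'bottle')     # true
puts matches?('!=', 'substring:con', 'Connecticut')  # false
```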
Compound filter
A compound filter consists of individual filters joined by logical operators. There are binary and unary logical operators. Binary operators join the filters to their left and right:
- AND: the key/value is returned only if it satisfies both filters;
- OR: the key/value is returned if it satisfies either filter.
Unary operators precede a filter and modify its behavior:
- SKIP: if any key/value pair in a row does not satisfy the filter condition, the entire row is skipped;
- WHILE: for a particular row, key/value pairs keep being emitted until one of them fails to satisfy the filter condition.
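The difference between the two unary operators can be illustrated with a plain Ruby model of a single row. This sketch only mirrors the descriptions above and is not HBase code; skip and while_match are hypothetical helper names, and the sample condition (values of at most three characters) is chosen purely for illustration.

```ruby
# One row modeled as an ordered list of [qualifier, value] pairs.
row = [['age', '62'], ['country', 'USA'], ['state', 'TX'], ['town', 'Dallas']]

# SKIP: drop the entire row if any pair fails the condition.
def skip(row, &condition)
  row.all?(&condition) ? row : []
end

# WHILE: emit pairs until the first one that fails the condition.
def while_match(row, &condition)
  row.take_while(&condition)
end

# Illustrative condition: the value is at most three characters long.
short_value = proc { |_qualifier, value| value.length <= 3 }

p skip(row, &short_value)         # the whole row is dropped
p while_match(row, &short_value)  # the pairs before 'town' are emitted
```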
To control the order in which individual filters are evaluated in a compound filter, use parentheses. Example:
(<Filter1> OR <Filter2>) AND (<Filter3> OR <Filter4>)
First, <Filter1> and <Filter2> are combined by the OR operator. Next, the same happens to <Filter3> and <Filter4>. Finally, the two results are combined by the AND operator.
Logical operators and parentheses have the following order of precedence, from highest to lowest:
- parentheses;
- SKIP and WHILE (share the same precedence);
- AND;
- OR.
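As an analogy only, Ruby's && and || operators have the same relative precedence as AND and OR in the order listed above, so they can be used to see how an unparenthesized expression groups. The sketch below does not exercise the HBase filter parser; it just checks the grouping over all combinations of three boolean values.

```ruby
# Ruby's && binds tighter than ||, mirroring AND over OR in the list above.
combos = [true, false].product([true, false], [true, false])

# "a OR b AND c" groups as "a OR (b AND c)" ...
same_as_and_first = combos.all? { |a, b, c| (a || b && c) == (a || (b && c)) }
# ... which is not the same as "(a OR b) AND c" for every input.
differs_from_or_first = combos.any? { |a, b, c| (a || b && c) != ((a || b) && c) }

puts same_as_and_first      # true
puts differs_from_or_first  # true
```

When the intended grouping differs from the default one, add parentheses explicitly, as in the example above.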
Example of a compound filter usage in HBase shell:
scan 'articles', { FILTER => "ColumnPaginationFilter (2, 1) AND PrefixFilter ('co')"}
Individual filters in HBase
The individual filters available in HBase are listed below along with their syntax. Example queries are executed against a test table loaded with values from the test file people.csv. This file is an extended version of the test file used in the Bulk loading via built-in MapReduce jobs article. You can follow the instructions in that article to create the same test table that is used here. If you do, make sure you change the command for the ImportTsv job and use the -Dimporttsv.columns=HBASE_ROW_KEY,basic:age,location:town,location:state,location:country flag instead of -Dimporttsv.columns=HBASE_ROW_KEY,basic:age. Change the file and table names accordingly if necessary.
The test table’s row keys are the names of the people. The first column family, basic, has a single qualifier, age. The second column family, location, has three qualifiers: country, state, and town. There are 997 rows in the table.
KeyOnlyFilter
Returns only the key component of each key/value pair; the values are returned empty. Takes no arguments.
Syntax:
"KeyOnlyFilter ()"
Command example:
scan 'people', { FILTER => "KeyOnlyFilter ()" }
Result (the last eight keys):
...
Zimmerman Gene column=basic:age, timestamp=2024-07-30T08:10:19.297, value=
Zimmerman Gene column=location:country, timestamp=2024-07-30T08:10:19.297, value=
Zimmerman Gene column=location:state, timestamp=2024-07-30T08:10:19.297, value=
Zimmerman Gene column=location:town, timestamp=2024-07-30T08:10:19.297, value=
Zimmerman Madge column=basic:age, timestamp=2024-07-30T08:10:19.297, value=
Zimmerman Madge column=location:country, timestamp=2024-07-30T08:10:19.297, value=
Zimmerman Madge column=location:state, timestamp=2024-07-30T08:10:19.297, value=
Zimmerman Madge column=location:town, timestamp=2024-07-30T08:10:19.297, value=
997 row(s)
Took 0.5022 seconds
FirstKeyOnlyFilter
Returns the first key/value pair from each row. Takes no arguments.
Syntax:
"FirstKeyOnlyFilter ()"
Command example:
scan 'people', { FILTER => "FirstKeyOnlyFilter ()" }
Result (the first and last five key/value pairs):
ROW COLUMN+CELL
Abbott Delia column=basic:age, timestamp=2024-07-30T08:10:19.297, value=62
Abbott Howard column=basic:age, timestamp=2024-07-30T08:10:19.297, value=24
Abbott Jack column=basic:age, timestamp=2024-07-30T08:10:19.297, value=29
Adams Clyde column=basic:age, timestamp=2024-07-30T08:10:19.297, value=29
Aguilar Myrtie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=23
...
Young Della column=basic:age, timestamp=2024-07-30T08:10:19.297, value=21
Young Josephine column=basic:age, timestamp=2024-07-30T08:10:19.297, value=29
Young Mattie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=39
Zimmerman Gene column=basic:age, timestamp=2024-07-30T08:10:19.297, value=35
Zimmerman Madge column=basic:age, timestamp=2024-07-30T08:10:19.297, value=46
997 row(s)
Took 0.1058 seconds
PrefixFilter
Returns all key/value pairs from the rows whose keys begin with the prefix specified by the argument. Takes one argument: the row key prefix.
Syntax:
"PrefixFilter ('Roy')"
Command example:
scan 'people', { FILTER => "PrefixFilter ('Roy')" }
Result:
ROW COLUMN+CELL
Roy Alfred column=basic:age, timestamp=2024-07-30T08:10:19.297, value=55
Roy Alfred column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Roy Alfred column=location:state, timestamp=2024-07-30T08:10:19.297, value=Chihuahua
Roy Alfred column=location:town, timestamp=2024-07-30T08:10:19.297, value=Juarez
Roy Belle column=basic:age, timestamp=2024-07-30T08:10:19.297, value=35
Roy Belle column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Roy Belle column=location:state, timestamp=2024-07-30T08:10:19.297, value=AZ
Roy Belle column=location:town, timestamp=2024-07-30T08:10:19.297, value=Nogales
Roy Lora column=basic:age, timestamp=2024-07-30T08:10:19.297, value=52
Roy Lora column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Roy Lora column=location:state, timestamp=2024-07-30T08:10:19.297, value=Yucatan
Roy Lora column=location:town, timestamp=2024-07-30T08:10:19.297, value=Uman
Roy Ronald column=basic:age, timestamp=2024-07-30T08:10:19.297, value=63
Roy Ronald column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Roy Ronald column=location:state, timestamp=2024-07-30T08:10:19.297, value=California Baja
Roy Ronald column=location:town, timestamp=2024-07-30T08:10:19.297, value=Mexicali
4 row(s)
Took 0.0252 seconds
ColumnPrefixFilter
Returns all key/value pairs from the columns whose qualifiers begin with the prefix specified by the argument. Takes one argument: the column qualifier prefix.
Syntax:
"ColumnPrefixFilter ('tow')"
Command example:
scan 'people', { FILTER => "ColumnPrefixFilter ('tow')" }
Since all rows of the test table have non-empty values for all columns, this filter returns either zero results if no column qualifier has the specified prefix, or one key/value pair per row (997 in total) if such a column qualifier exists. The command given above results in 997 key/value pairs, the first and last five of which are shown below:
ROW COLUMN+CELL
Abbott Delia column=location:town, timestamp=2024-07-30T08:10:19.297, value=Dallas
Abbott Howard column=location:town, timestamp=2024-07-30T08:10:19.297, value=Baton Rouge
Abbott Jack column=location:town, timestamp=2024-07-30T08:10:19.297, value=Juarez
Adams Clyde column=location:town, timestamp=2024-07-30T08:10:19.297, value=Lafayette
Aguilar Myrtie column=location:town, timestamp=2024-07-30T08:10:19.297, value=Uman
...
Young Della column=location:town, timestamp=2024-07-30T08:10:19.297, value=Chetumal
Young Josephine column=location:town, timestamp=2024-07-30T08:10:19.297, value=Redding
Young Mattie column=location:town, timestamp=2024-07-30T08:10:19.297, value=San Jose
Zimmerman Gene column=location:town, timestamp=2024-07-30T08:10:19.297, value=Tijuana
Zimmerman Madge column=location:town, timestamp=2024-07-30T08:10:19.297, value=Nogales
997 row(s)
Took 0.1360 seconds
MultipleColumnPrefixFilter
Returns all key/value pairs from the columns whose qualifiers begin with any of the prefixes specified by the arguments. Takes one or more arguments: prefixes of the column qualifiers. With a single argument, it works the same as the ColumnPrefixFilter filter.
Syntax:
"MultipleColumnPrefixFilter ('sta', 'tow')"
Command example:
scan 'people', { FILTER => "MultipleColumnPrefixFilter ('sta', 'tow')" }
The same reasoning as for the ColumnPrefixFilter filter applies here: since all rows of the test table have non-empty values for all columns, this filter returns either zero results if no column qualifier has any of the specified prefixes, or 997 key/value pairs for each qualifier that matches. If one of the specified prefixes is itself a prefix of another (e.g. to and tow), the matching column is still returned only once. The command given above results in 1994 key/value pairs, the first and last six of which are shown below:
ROW COLUMN+CELL
Abbott Delia column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Abbott Delia column=location:town, timestamp=2024-07-30T08:10:19.297, value=Dallas
Abbott Howard column=location:state, timestamp=2024-07-30T08:10:19.297, value=LA
Abbott Howard column=location:town, timestamp=2024-07-30T08:10:19.297, value=Baton Rouge
Abbott Jack column=location:state, timestamp=2024-07-30T08:10:19.297, value=Chihuahua
Abbott Jack column=location:town, timestamp=2024-07-30T08:10:19.297, value=Juarez
...
Young Mattie column=location:state, timestamp=2024-07-30T08:10:19.297, value=CA
Young Mattie column=location:town, timestamp=2024-07-30T08:10:19.297, value=San Jose
Zimmerman Gene column=location:state, timestamp=2024-07-30T08:10:19.297, value=California Baja
Zimmerman Gene column=location:town, timestamp=2024-07-30T08:10:19.297, value=Tijuana
Zimmerman Madge column=location:state, timestamp=2024-07-30T08:10:19.297, value=AZ
Zimmerman Madge column=location:town, timestamp=2024-07-30T08:10:19.297, value=Nogales
997 row(s)
Took 0.1360 seconds
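The key/value counts quoted for the prefix filters follow from simple arithmetic, which the Ruby sketch below reproduces. It is a model of the reasoning only, not HBase code: expected_pairs is a hypothetical helper, and it assumes, as stated above, that every one of the 997 rows has a value for each of the four qualifiers.

```ruby
QUALIFIERS = %w[age country state town].freeze  # qualifiers of the test table
ROW_COUNT = 997                                  # rows in the test table

# Number of key/value pairs expected when every row has every qualifier.
def expected_pairs(prefixes)
  # A qualifier is counted once even if several prefixes match it.
  matching = QUALIFIERS.select { |q| prefixes.any? { |p| q.start_with?(p) } }
  matching.size * ROW_COUNT
end

puts expected_pairs(%w[sta tow])  # 1994
puts expected_pairs(%w[to tow])   # 997
puts expected_pairs(%w[xyz])      # 0
```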
ColumnCountGetFilter
Returns the first N columns of a row, where N is the argument value. This filter only works correctly with single rows, so it should be used with the get command. Using this filter with the scan command does not lead to an error, but produces an incorrect result. Takes one argument: the number of columns to return.
Syntax:
"ColumnCountGetFilter (3)"
Command example:
get 'people', 'Abbott Delia', { FILTER => "ColumnCountGetFilter (3)" }
Result:
COLUMN CELL
basic:age timestamp=2024-07-30T08:10:19.297, value=62
location:country timestamp=2024-07-30T08:10:19.297, value=USA
location:state timestamp=2024-07-30T08:10:19.297, value=TX
1 row(s)
Took 0.6243 seconds
PageFilter
Returns the first N rows from each region of the table, where N is the argument value. Takes one argument: the number of rows to return per region.
Syntax:
"PageFilter (2)"
Command example:
scan 'people', { FILTER => "PageFilter (2)" }
Since the test table was created with five regions split at F, K, P, and W, this command returns two rows with four key/value pairs apiece from each region. Result:
ROW COLUMN+CELL
Abbott Delia column=basic:age, timestamp=2024-07-30T08:10:19.297, value=62
Abbott Delia column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Abbott Delia column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Abbott Delia column=location:town, timestamp=2024-07-30T08:10:19.297, value=Dallas
Abbott Howard column=basic:age, timestamp=2024-07-30T08:10:19.297, value=24
Abbott Howard column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Abbott Howard column=location:state, timestamp=2024-07-30T08:10:19.297, value=LA
Abbott Howard column=location:town, timestamp=2024-07-30T08:10:19.297, value=Baton Rouge
Farmer Alan column=basic:age, timestamp=2024-07-30T08:10:19.297, value=24
Farmer Alan column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Farmer Alan column=location:state, timestamp=2024-07-30T08:10:19.297, value=Quintana Roo
Farmer Alan column=location:town, timestamp=2024-07-30T08:10:19.297, value=Cancun
Farmer Dean column=basic:age, timestamp=2024-07-30T08:10:19.297, value=65
Farmer Dean column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Farmer Dean column=location:state, timestamp=2024-07-30T08:10:19.297, value=LA
Farmer Dean column=location:town, timestamp=2024-07-30T08:10:19.297, value=Baton Rouge
Keller Elmer column=basic:age, timestamp=2024-07-30T08:10:19.297, value=29
Keller Elmer column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Keller Elmer column=location:state, timestamp=2024-07-30T08:10:19.297, value=Quintana Roo
Keller Elmer column=location:town, timestamp=2024-07-30T08:10:19.297, value=Chetumal
Keller Flora column=basic:age, timestamp=2024-07-30T08:10:19.297, value=23
Keller Flora column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Keller Flora column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Keller Flora column=location:town, timestamp=2024-07-30T08:10:19.297, value=Houston
Padilla Ethan column=basic:age, timestamp=2024-07-30T08:10:19.297, value=61
Padilla Ethan column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Padilla Ethan column=location:state, timestamp=2024-07-30T08:10:19.297, value=NV
Padilla Ethan column=location:town, timestamp=2024-07-30T08:10:19.297, value=Novac
Padilla Scott column=basic:age, timestamp=2024-07-30T08:10:19.297, value=28
Padilla Scott column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Padilla Scott column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Padilla Scott column=location:town, timestamp=2024-07-30T08:10:19.297, value=Beaumont
Wade Janie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=25
Wade Janie column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Wade Janie column=location:state, timestamp=2024-07-30T08:10:19.297, value=California Baja
Wade Janie column=location:town, timestamp=2024-07-30T08:10:19.297, value=Ensenada
Wade Jerome column=basic:age, timestamp=2024-07-30T08:10:19.297, value=27
Wade Jerome column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Wade Jerome column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Wade Jerome column=location:town, timestamp=2024-07-30T08:10:19.297, value=Houston
10 row(s)
Took 0.0138 seconds
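The per-region behavior explains why a page size of 2 yields 10 rows rather than 2. The Ruby sketch below models it on a handful of row keys taken from the outputs in this article. It is an illustration under stated assumptions, not HBase code: region_index and page_filter are hypothetical helpers, and each region is assumed to start at its split point.

```ruby
SPLIT_POINTS = %w[F K P W].freeze  # split points of the test table

# Index of the region a row key falls into, given the split points.
def region_index(row_key)
  SPLIT_POINTS.count { |split| row_key >= split }
end

# Model of PageFilter: the page size applies within each region separately.
def page_filter(row_keys, page_size)
  row_keys.sort
          .group_by { |key| region_index(key) }
          .values
          .flat_map { |keys| keys.first(page_size) }
end

sample_keys = ['Abbott Delia', 'Abbott Howard', 'Abbott Jack', 'Farmer Alan',
               'Farmer Dean', 'Keller Elmer', 'Keller Flora', 'Padilla Ethan',
               'Padilla Scott', 'Roy Alfred', 'Wade Janie', 'Wade Jerome',
               'Young Della']

puts page_filter(sample_keys, 2)
```

To get a strict limit on the total number of rows, the client has to trim the combined result itself.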
ColumnPaginationFilter
For each row, skips the number of columns specified by the second argument (offset) and then returns the number of columns specified by the first argument (limit). Takes two arguments: limit and offset.
Syntax:
"ColumnPaginationFilter (2, 1)"
Command example:
scan 'people', { FILTER => "ColumnPaginationFilter (2, 1)" }
This command results in 1994 key/value pairs, the first and last six of which are shown below:
ROW COLUMN+CELL
Abbott Delia column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Abbott Delia column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Abbott Howard column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Abbott Howard column=location:state, timestamp=2024-07-30T08:10:19.297, value=LA
Abbott Jack column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Abbott Jack column=location:state, timestamp=2024-07-30T08:10:19.297, value=Chihuahua
...
Young Mattie column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Young Mattie column=location:state, timestamp=2024-07-30T08:10:19.297, value=CA
Zimmerman Gene column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Zimmerman Gene column=location:state, timestamp=2024-07-30T08:10:19.297, value=California Baja
Zimmerman Madge column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Zimmerman Madge column=location:state, timestamp=2024-07-30T08:10:19.297, value=AZ
997 row(s)
Took 1.0474 seconds
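The effect of the limit and offset arguments on a single row can be sketched in plain Ruby. This is a model only, not HBase code; paginate_columns is a hypothetical helper, and the column list is the test table's columns in the order they appear in the scan output.

```ruby
# Columns of one test-table row, in the order shown in the scan output.
columns = ['basic:age', 'location:country', 'location:state', 'location:town']

# Model of ColumnPaginationFilter (limit, offset) applied to a single row:
# skip `offset` columns, then keep at most `limit` columns.
def paginate_columns(columns, limit, offset)
  columns.drop(offset).first(limit)
end

p paginate_columns(columns, 2, 1)  # ["location:country", "location:state"]
```

With two columns returned for each of the 997 rows, the total is the 1994 key/value pairs mentioned above.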
InclusiveStopFilter
Scans the table row by row until it finds the row with the specified key, then returns all rows up to and including that row. Takes one argument: the row key.
Syntax:
"InclusiveStopFilter ('Allen Austin')"
Command example:
scan 'people', { FILTER => "InclusiveStopFilter ('Allen Austin')" }
Result:
ROW COLUMN+CELL
Abbott Delia column=basic:age, timestamp=2024-07-30T08:10:19.297, value=62
Abbott Delia column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Abbott Delia column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Abbott Delia column=location:town, timestamp=2024-07-30T08:10:19.297, value=Dallas
Abbott Howard column=basic:age, timestamp=2024-07-30T08:10:19.297, value=24
Abbott Howard column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Abbott Howard column=location:state, timestamp=2024-07-30T08:10:19.297, value=LA
Abbott Howard column=location:town, timestamp=2024-07-30T08:10:19.297, value=Baton Rouge
Abbott Jack column=basic:age, timestamp=2024-07-30T08:10:19.297, value=29
Abbott Jack column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Abbott Jack column=location:state, timestamp=2024-07-30T08:10:19.297, value=Chihuahua
Abbott Jack column=location:town, timestamp=2024-07-30T08:10:19.297, value=Juarez
Adams Clyde column=basic:age, timestamp=2024-07-30T08:10:19.297, value=29
Adams Clyde column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Adams Clyde column=location:state, timestamp=2024-07-30T08:10:19.297, value=LA
Adams Clyde column=location:town, timestamp=2024-07-30T08:10:19.297, value=Lafayette
Aguilar Myrtie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=23
Aguilar Myrtie column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Aguilar Myrtie column=location:state, timestamp=2024-07-30T08:10:19.297, value=Yucatan
Aguilar Myrtie column=location:town, timestamp=2024-07-30T08:10:19.297, value=Uman
Aguilar Terry column=basic:age, timestamp=2024-07-30T08:10:19.297, value=65
Aguilar Terry column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Aguilar Terry column=location:state, timestamp=2024-07-30T08:10:19.297, value=CA
Aguilar Terry column=location:town, timestamp=2024-07-30T08:10:19.297, value=Fresno
Alexander Derrick column=basic:age, timestamp=2024-07-30T08:10:19.297, value=46
Alexander Derrick column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Alexander Derrick column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Alexander Derrick column=location:town, timestamp=2024-07-30T08:10:19.297, value=Dallas
Alexander Gregory column=basic:age, timestamp=2024-07-30T08:10:19.297, value=54
Alexander Gregory column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Alexander Gregory column=location:state, timestamp=2024-07-30T08:10:19.297, value=LA
Alexander Gregory column=location:town, timestamp=2024-07-30T08:10:19.297, value=Lafayette
Alexander Leon column=basic:age, timestamp=2024-07-30T08:10:19.297, value=42
Alexander Leon column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Alexander Leon column=location:state, timestamp=2024-07-30T08:10:19.297, value=AZ
Alexander Leon column=location:town, timestamp=2024-07-30T08:10:19.297, value=Kingman
Allen Austin column=basic:age, timestamp=2024-07-30T08:10:19.297, value=34
Allen Austin column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Allen Austin column=location:state, timestamp=2024-07-30T08:10:19.297, value=NV
Allen Austin column=location:town, timestamp=2024-07-30T08:10:19.297, value=Primm
10 row(s)
Took 0.7167 seconds
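The stop condition can be modeled on a sorted list of row keys in plain Ruby. This sketch is not HBase code: inclusive_stop is a hypothetical helper, the sample keys are the first twelve row keys of the test table as they appear in the outputs in this article, and the model assumes that rows are visited in lexicographic key order.

```ruby
# Model of InclusiveStopFilter: keep rows up to and including the stop key.
def inclusive_stop(row_keys, stop_key)
  row_keys.sort.take_while { |key| key <= stop_key }
end

sample_keys = ['Abbott Delia', 'Abbott Howard', 'Abbott Jack', 'Adams Clyde',
               'Aguilar Myrtie', 'Aguilar Terry', 'Alexander Derrick',
               'Alexander Gregory', 'Alexander Leon', 'Allen Austin',
               'Allison Dustin', 'Alvarado Dominic']

result = inclusive_stop(sample_keys, 'Allen Austin')
puts result.size  # 10
puts result.last  # Allen Austin
```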
TimestampsFilter
Returns all key/value pairs with timestamps matching any of the timestamps specified by the arguments. Takes any number of arguments: timestamps.
Syntax:
"TimestampsFilter (1721203180857, 1721316861863)"
All the key/value pairs in the test table have the same timestamp, so for a more concise and illustrative output you might want to change a few of them using the put command:
put 'people', 'Young Mattie', 'location:town', 'Carlsbad', 1722327020000
put 'people', 'Young Mattie', 'location:state', 'NM', 1722327020000
put 'people', 'Yates Douglas', 'location:state', 'NM', 1722327020000
put 'people', 'Yates Douglas', 'location:town', 'Albuquerque', 1722327020000
Command example:
scan 'people', { FILTER => "TimestampsFilter (1722327020000)" }
Result:
ROW COLUMN+CELL
Yates Douglas column=location:state, timestamp=2024-07-30T08:10:20, value=NM
Yates Douglas column=location:town, timestamp=2024-07-30T08:10:20, value=Albuquerque
Young Mattie column=location:state, timestamp=2024-07-30T08:10:20, value=NM
Young Mattie column=location:town, timestamp=2024-07-30T08:10:20, value=Carlsbad
2 row(s)
Took 0.1001 seconds
RowFilter
Checks all rows and returns all key/value pairs of each row whose key satisfies the comparison specified by the arguments. This filter does not work with the get command. Takes two arguments: a comparison operator and a comparator.
Syntax:
"RowFilter (>=, 'binaryprefix:B')"
Command example:
scan 'people', { FILTER => "RowFilter (<, 'binaryprefix:B')" }
This filter returns all key/value pairs from the rows whose keys begin with a prefix lexicographically less than B. For our test table, this means all rows for people whose last names begin with A. Result:
ROW COLUMN+CELL
Abbott Delia column=basic:age, timestamp=2024-07-30T08:10:19.297, value=62
Abbott Delia column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Abbott Delia column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Abbott Delia column=location:town, timestamp=2024-07-30T08:10:19.297, value=Dallas
Abbott Howard column=basic:age, timestamp=2024-07-30T08:10:19.297, value=24
Abbott Howard column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Abbott Howard column=location:state, timestamp=2024-07-30T08:10:19.297, value=LA
Abbott Howard column=location:town, timestamp=2024-07-30T08:10:19.297, value=Baton Rouge
Abbott Jack column=basic:age, timestamp=2024-07-30T08:10:19.297, value=29
Abbott Jack column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Abbott Jack column=location:state, timestamp=2024-07-30T08:10:19.297, value=Chihuahua
Abbott Jack column=location:town, timestamp=2024-07-30T08:10:19.297, value=Juarez
Adams Clyde column=basic:age, timestamp=2024-07-30T08:10:19.297, value=29
Adams Clyde column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Adams Clyde column=location:state, timestamp=2024-07-30T08:10:19.297, value=LA
Adams Clyde column=location:town, timestamp=2024-07-30T08:10:19.297, value=Lafayette
Aguilar Myrtie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=23
Aguilar Myrtie column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Aguilar Myrtie column=location:state, timestamp=2024-07-30T08:10:19.297, value=Yucatan
Aguilar Myrtie column=location:town, timestamp=2024-07-30T08:10:19.297, value=Uman
Aguilar Terry column=basic:age, timestamp=2024-07-30T08:10:19.297, value=65
Aguilar Terry column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Aguilar Terry column=location:state, timestamp=2024-07-30T08:10:19.297, value=CA
Aguilar Terry column=location:town, timestamp=2024-07-30T08:10:19.297, value=Fresno
Alexander Derrick column=basic:age, timestamp=2024-07-30T08:10:19.297, value=46
Alexander Derrick column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Alexander Derrick column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Alexander Derrick column=location:town, timestamp=2024-07-30T08:10:19.297, value=Dallas
Alexander Gregory column=basic:age, timestamp=2024-07-30T08:10:19.297, value=54
Alexander Gregory column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Alexander Gregory column=location:state, timestamp=2024-07-30T08:10:19.297, value=LA
Alexander Gregory column=location:town, timestamp=2024-07-30T08:10:19.297, value=Lafayette
Alexander Leon column=basic:age, timestamp=2024-07-30T08:10:19.297, value=42
Alexander Leon column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Alexander Leon column=location:state, timestamp=2024-07-30T08:10:19.297, value=AZ
Alexander Leon column=location:town, timestamp=2024-07-30T08:10:19.297, value=Kingman
Allen Austin column=basic:age, timestamp=2024-07-30T08:10:19.297, value=34
Allen Austin column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Allen Austin column=location:state, timestamp=2024-07-30T08:10:19.297, value=NV
Allen Austin column=location:town, timestamp=2024-07-30T08:10:19.297, value=Primm
Allison Dustin column=basic:age, timestamp=2024-07-30T08:10:19.297, value=54
Allison Dustin column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Allison Dustin column=location:state, timestamp=2024-07-30T08:10:19.297, value=Yucatan
Allison Dustin column=location:town, timestamp=2024-07-30T08:10:19.297, value=Merida
Alvarado Dominic column=basic:age, timestamp=2024-07-30T08:10:19.297, value=63
Alvarado Dominic column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Alvarado Dominic column=location:state, timestamp=2024-07-30T08:10:19.297, value=California Baja
Alvarado Dominic column=location:town, timestamp=2024-07-30T08:10:19.297, value=Ensenada
Alvarado Maria column=basic:age, timestamp=2024-07-30T08:10:19.297, value=58
Alvarado Maria column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Alvarado Maria column=location:state, timestamp=2024-07-30T08:10:19.297, value=Chihuahua
Alvarado Maria column=location:town, timestamp=2024-07-30T08:10:19.297, value=Chihuahua
Alvarado Melvin column=basic:age, timestamp=2024-07-30T08:10:19.297, value=34
Alvarado Melvin column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Alvarado Melvin column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Alvarado Melvin column=location:town, timestamp=2024-07-30T08:10:19.297, value=Austin
Alvarado Timothy column=basic:age, timestamp=2024-07-30T08:10:19.297, value=27
Alvarado Timothy column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Alvarado Timothy column=location:state, timestamp=2024-07-30T08:10:19.297, value=CA
Alvarado Timothy column=location:town, timestamp=2024-07-30T08:10:19.297, value=San Francisco
Alvarez Bessie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=34
Alvarez Bessie column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Alvarez Bessie column=location:state, timestamp=2024-07-30T08:10:19.297, value=NV
Alvarez Bessie column=location:town, timestamp=2024-07-30T08:10:19.297, value=Las Vegas
Alvarez Bruce column=basic:age, timestamp=2024-07-30T08:10:19.297, value=60
Alvarez Bruce column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Alvarez Bruce column=location:state, timestamp=2024-07-30T08:10:19.297, value=CA
Alvarez Bruce column=location:town, timestamp=2024-07-30T08:10:19.297, value=Modesto
Alvarez Harvey column=basic:age, timestamp=2024-07-30T08:10:19.297, value=57
Alvarez Harvey column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Alvarez Harvey column=location:state, timestamp=2024-07-30T08:10:19.297, value=NV
Alvarez Harvey column=location:town, timestamp=2024-07-30T08:10:19.297, value=Fallon
Alvarez Jacob column=basic:age, timestamp=2024-07-30T08:10:19.297, value=49
Alvarez Jacob column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Alvarez Jacob column=location:state, timestamp=2024-07-30T08:10:19.297, value=Yucatan
Alvarez Jacob column=location:town, timestamp=2024-07-30T08:10:19.297, value=Merida
Anderson Lester column=basic:age, timestamp=2024-07-30T08:10:19.297, value=62
Anderson Lester column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Anderson Lester column=location:state, timestamp=2024-07-30T08:10:19.297, value=Sonora
Anderson Lester column=location:town, timestamp=2024-07-30T08:10:19.297, value=Hermosillo
Anderson Lucile column=basic:age, timestamp=2024-07-30T08:10:19.297, value=33
Anderson Lucile column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Anderson Lucile column=location:state, timestamp=2024-07-30T08:10:19.297, value=Sonora
Anderson Lucile column=location:town, timestamp=2024-07-30T08:10:19.297, value=Hermosillo
Anderson Ora column=basic:age, timestamp=2024-07-30T08:10:19.297, value=26
Anderson Ora column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Anderson Ora column=location:state, timestamp=2024-07-30T08:10:19.297, value=Sonora
Anderson Ora column=location:town, timestamp=2024-07-30T08:10:19.297, value=Nogales
Andrews Caleb column=basic:age, timestamp=2024-07-30T08:10:19.297, value=22
Andrews Caleb column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Andrews Caleb column=location:state, timestamp=2024-07-30T08:10:19.297, value=Quintana Roo
Andrews Caleb column=location:town, timestamp=2024-07-30T08:10:19.297, value=Cancun
Andrews Lucy column=basic:age, timestamp=2024-07-30T08:10:19.297, value=33
Andrews Lucy column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Andrews Lucy column=location:state, timestamp=2024-07-30T08:10:19.297, value=AZ
Andrews Lucy column=location:town, timestamp=2024-07-30T08:10:19.297, value=Kingman
Andrews Noah column=basic:age, timestamp=2024-07-30T08:10:19.297, value=63
Andrews Noah column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Andrews Noah column=location:state, timestamp=2024-07-30T08:10:19.297, value=California Baja
Andrews Noah column=location:town, timestamp=2024-07-30T08:10:19.297, value=Ensenada
Andrews Susan column=basic:age, timestamp=2024-07-30T08:10:19.297, value=48
Andrews Susan column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Andrews Susan column=location:state, timestamp=2024-07-30T08:10:19.297, value=Yucatan
Andrews Susan column=location:town, timestamp=2024-07-30T08:10:19.297, value=Merida
Armstrong Isabella column=basic:age, timestamp=2024-07-30T08:10:19.297, value=49
Armstrong Isabella column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Armstrong Isabella column=location:state, timestamp=2024-07-30T08:10:19.297, value=Sonora
Armstrong Isabella column=location:town, timestamp=2024-07-30T08:10:19.297, value=Hermosillo
Armstrong Marie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=25
Armstrong Marie column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Armstrong Marie column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Armstrong Marie column=location:town, timestamp=2024-07-30T08:10:19.297, value=Santa Fe
Arnold Bettie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=18
Arnold Bettie column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Arnold Bettie column=location:state, timestamp=2024-07-30T08:10:19.297, value=NV
Arnold Bettie column=location:town, timestamp=2024-07-30T08:10:19.297, value=Primm
Atkins Daisy column=basic:age, timestamp=2024-07-30T08:10:19.297, value=51
Atkins Daisy column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Atkins Daisy column=location:state, timestamp=2024-07-30T08:10:19.297, value=Quintana Roo
Atkins Daisy column=location:town, timestamp=2024-07-30T08:10:19.297, value=Chetumal
Atkins Gene column=basic:age, timestamp=2024-07-30T08:10:19.297, value=37
Atkins Gene column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Atkins Gene column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Atkins Gene column=location:town, timestamp=2024-07-30T08:10:19.297, value=Dallas
Austin Bertie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=57
Austin Bertie column=location:country, timestamp=2024-07-30T08:10:19.297, value=MEX
Austin Bertie column=location:state, timestamp=2024-07-30T08:10:19.297, value=California Baja
Austin Bertie column=location:town, timestamp=2024-07-30T08:10:19.297, value=Tijuana
Austin Eugene column=basic:age, timestamp=2024-07-30T08:10:19.297, value=64
Austin Eugene column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Austin Eugene column=location:state, timestamp=2024-07-30T08:10:19.297, value=LA
Austin Eugene column=location:town, timestamp=2024-07-30T08:10:19.297, value=Lafayette
Austin Travis column=basic:age, timestamp=2024-07-30T08:10:19.297, value=53
Austin Travis column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Austin Travis column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Austin Travis column=location:town, timestamp=2024-07-30T08:10:19.297, value=El Paso
34 row(s)
Took 0.6767 seconds
Checks all columns and returns all key/value pairs in a column if its family name satisfies the comparison specified by the arguments. Takes two arguments: a comparison operator and a comparator.
Syntax:
"FamilyFilter (<=, 'binaryprefix:c')"
Command example:
scan 'people', { FILTER => "FamilyFilter (<, 'binaryprefix:c')" }
This filter returns all key/value pairs whose column family name begins with something lexicographically less than c. In the test table, only the basic family qualifies, so the whole basic:age column is returned. Result (the first and last five rows):
ROW COLUMN+CELL
Abbott Delia column=basic:age, timestamp=2024-07-30T08:10:19.297, value=62
Abbott Howard column=basic:age, timestamp=2024-07-30T08:10:19.297, value=24
Abbott Jack column=basic:age, timestamp=2024-07-30T08:10:19.297, value=29
Adams Clyde column=basic:age, timestamp=2024-07-30T08:10:19.297, value=29
Aguilar Myrtie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=23
...
Young Della column=basic:age, timestamp=2024-07-30T08:10:19.297, value=21
Young Josephine column=basic:age, timestamp=2024-07-30T08:10:19.297, value=29
Young Mattie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=39
Zimmerman Gene column=basic:age, timestamp=2024-07-30T08:10:19.297, value=35
Zimmerman Madge column=basic:age, timestamp=2024-07-30T08:10:19.297, value=46
997 row(s)
Took 0.4836 seconds
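To see why the FamilyFilter example above keeps basic and drops location, the comparison can be modeled in a few lines of plain Python. This is only an illustrative sketch, not HBase code: the function name matches and the OPS table are hypothetical, and the model assumes that a binaryprefix comparator compares only the leading bytes of the value (as many bytes as the prefix contains) against the prefix.

```python
# Illustrative model only: plain Python, not the HBase API.
import operator

# The comparison operators of the filter language mapped to Python operators.
OPS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}

def matches(value: str, op: str, comparator: str) -> bool:
    """Return True if `value` satisfies `op` against a 'type:argument' comparator."""
    kind, _, arg = comparator.partition(":")
    raw, target = value.encode(), arg.encode()
    if kind == "binaryprefix":
        # Assumption: only the leading len(target) bytes of the value take part.
        raw = raw[: len(target)]
    elif kind != "binary":
        raise ValueError(f"comparator not modeled here: {kind}")
    # Python compares bytes objects lexicographically, byte by byte.
    return OPS[op](raw, target)

families = ["basic", "location"]
kept = [f for f in families if matches(f, "<", "binaryprefix:c")]
print(kept)  # ['basic']
```

The same helper covers the binary comparator used in the QualifierFilter and ValueFilter examples: matches("town", "=", "binary:town") is true, while any longer or shorter string fails the equality check.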
Checks all columns and returns all key/value pairs in a column if its qualifier name satisfies the comparison specified by the arguments. Takes two arguments: a comparison operator and a comparator.
Syntax:
"QualifierFilter (=, 'binary:town')"
Command example:
scan 'people', { FILTER => "QualifierFilter (=, 'binary:town')" }
This filter returns all key/value pairs whose column qualifier is exactly town. Result (the first and last five rows):
ROW COLUMN+CELL
Abbott Delia column=location:town, timestamp=2024-07-30T08:10:19.297, value=Dallas
Abbott Howard column=location:town, timestamp=2024-07-30T08:10:19.297, value=Baton Rouge
Abbott Jack column=location:town, timestamp=2024-07-30T08:10:19.297, value=Juarez
Adams Clyde column=location:town, timestamp=2024-07-30T08:10:19.297, value=Lafayette
Aguilar Myrtie column=location:town, timestamp=2024-07-30T08:10:19.297, value=Uman
...
Young Della column=location:town, timestamp=2024-07-30T08:10:19.297, value=Chetumal
Young Josephine column=location:town, timestamp=2024-07-30T08:10:19.297, value=Redding
Young Mattie column=location:town, timestamp=2024-07-30T08:10:20, value=Carlsbad
Zimmerman Gene column=location:town, timestamp=2024-07-30T08:10:19.297, value=Tijuana
Zimmerman Madge column=location:town, timestamp=2024-07-30T08:10:19.297, value=Nogales
997 row(s)
Took 0.2112 seconds
Returns all key/value pairs whose values satisfy the comparison specified by the arguments. Takes two arguments: a comparison operator and a comparator.
Syntax:
"ValueFilter (=, 'binary:TX')"
Command example:
scan 'people', { FILTER => "ValueFilter (=, 'binary:TX')" }
This filter returns the Texans from the test table, that is, all key/value pairs whose value is exactly TX. Result (the first and last five rows):
ROW COLUMN+CELL
Abbott Delia column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Alexander Derrick column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Alvarado Melvin column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Atkins Gene column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Austin Travis column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
...
Watkins Julian column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Welch Lela column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Willis Travis column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Wilson Grace column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
Wilson Nellie column=location:state, timestamp=2024-07-30T08:10:19.297, value=TX
112 row(s)
Took 0.0535 seconds
Searches each row for the column identified by the two mandatory arguments: column family and qualifier. If this column (the reference column) is found, returns all key/value pairs in that row that have the same timestamp as the reference column. Otherwise, nothing is returned for that row.
If the optional third boolean argument (dropDependentColumn) is specified, the reference column is either returned as well (false) or omitted (true).
Two more arguments can be specified: a comparison operator and a comparator. In this case, the column value must also satisfy the comparison specified by these arguments for the column to be treated as the reference column.
Takes two, three, or five arguments: column family (mandatory), column qualifier (mandatory), dropDependentColumn flag (boolean, optional), a comparison operator, and a comparator (the last two arguments can only be specified jointly).
Syntax:
"DependentColumnFilter ('location', 'town')"
"DependentColumnFilter ('location', 'town', true)"
"DependentColumnFilter ('location', 'town', true, =, 'binary:Carlsbad')"
Command example:
scan 'people', { FILTER => "DependentColumnFilter ('location', 'town', true, =, 'binary:Carlsbad')" }
This filter searches each row for the exact value Carlsbad in the location:town column and, if it is found, returns the other key/value pairs that share its timestamp, omitting the reference column itself. Result:
ROW COLUMN+CELL
Young Mattie column=location:state, timestamp=2024-07-30T08:10:20, value=NM
1 row(s)
Took 0.0269 seconds
This example relies on the tweaking done in the example for the TimestampsFilter. Without it, the filter will return nothing as the comparison condition will not be met.
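The selection rule of DependentColumnFilter can be sketched as a small in-memory model. The code below is plain Python rather than HBase code; the Cell type, the function name, and the integer timestamps are hypothetical, and the optional operator/comparator pair is simplified to an exact-value check (the equivalent of =, 'binary:...').

```python
# Illustrative model only: plain Python, not the HBase API.
from typing import NamedTuple

class Cell(NamedTuple):
    family: str
    qualifier: str
    timestamp: int  # hypothetical millisecond timestamps
    value: str

def dependent_column_filter(row, family, qualifier,
                            drop_dependent_column=False, required_value=None):
    """Keep the cells of one row that share a timestamp with the reference column."""
    def is_reference(cell):
        return cell.family == family and cell.qualifier == qualifier

    # Timestamps of the reference column (optionally restricted by value).
    stamps = {
        c.timestamp for c in row
        if is_reference(c) and (required_value is None or c.value == required_value)
    }
    # Keep cells with a matching timestamp; drop the reference column if asked.
    return [
        c for c in row
        if c.timestamp in stamps and not (drop_dependent_column and is_reference(c))
    ]

# One row in which only state and town share a timestamp (made-up values).
row = [
    Cell("basic", "age", 1000, "39"),
    Cell("location", "country", 1000, "USA"),
    Cell("location", "state", 2000, "NM"),
    Cell("location", "town", 2000, "Carlsbad"),
]
result = dependent_column_filter(row, "location", "town", True, "Carlsbad")
print([(c.family, c.qualifier) for c in result])  # [('location', 'state')]
```

If the reference column is absent, or its value differs from the required one, the set of timestamps is empty and the model returns nothing for the row.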
Searches each row for the reference column identified by four mandatory arguments: column family, column qualifier, comparison operator, and comparator. If a column with the specified family and qualifier is found and its value satisfies the comparison, all key/value pairs in the row are returned. If the column is found but its value does not satisfy the comparison, no key/value pairs are returned. If no reference column is found, all key/value pairs in the row are returned by default.
Two more boolean arguments can be specified jointly. If the first (setFilterIfMissing) is set to true (default is false) and no reference column is found, no key/value pairs in the row are returned. If the second (setLatestVersionOnly) is set to false (default is true), all versions of the reference column value are checked against the comparison, not only the latest one.
Takes four or six arguments: column family (mandatory), column qualifier (mandatory), comparison operator (mandatory), comparator (mandatory), and the setFilterIfMissing and setLatestVersionOnly flags (optional, specified jointly).
Syntax:
"SingleColumnValueFilter ('location', 'town', =, 'binaryprefix:Al')"
"SingleColumnValueFilter ('location', 'town', =, 'binaryprefix:Al', true, false)"
Command example:
scan 'people', { FILTER => "SingleColumnValueFilter ('location', 'town', =, 'binaryprefix:Al', true, false)" }
This filter searches each row for a value beginning with Al in the location:town column, checking all versions of that value. It returns all key/value pairs from a row only if a match is found. Result:
ROW COLUMN+CELL
Ball Nelle column=basic:age, timestamp=2024-07-30T08:10:19.297, value=43
Ball Nelle column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Ball Nelle column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Ball Nelle column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
Bell Leila column=basic:age, timestamp=2024-07-30T08:10:19.297, value=58
Bell Leila column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Bell Leila column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Bell Leila column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
Cohen John column=basic:age, timestamp=2024-07-30T08:10:19.297, value=28
Cohen John column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Cohen John column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Cohen John column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
Diaz Anne column=basic:age, timestamp=2024-07-30T08:10:19.297, value=57
Diaz Anne column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Diaz Anne column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Diaz Anne column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
Howard Florence column=basic:age, timestamp=2024-07-30T08:10:19.297, value=37
Howard Florence column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Howard Florence column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Howard Florence column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
Ingram Barbara column=basic:age, timestamp=2024-07-30T08:10:19.297, value=55
Ingram Barbara column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Ingram Barbara column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Ingram Barbara column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
Jefferson Charlie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=46
Jefferson Charlie column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Jefferson Charlie column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Jefferson Charlie column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
Kennedy Todd column=basic:age, timestamp=2024-07-30T08:10:19.297, value=44
Kennedy Todd column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Kennedy Todd column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Kennedy Todd column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
McGee Isabelle column=basic:age, timestamp=2024-07-30T08:10:19.297, value=48
McGee Isabelle column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
McGee Isabelle column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
McGee Isabelle column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
Page Victoria column=basic:age, timestamp=2024-07-30T08:10:19.297, value=30
Page Victoria column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Page Victoria column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Page Victoria column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
Phelps Lida column=basic:age, timestamp=2024-07-30T08:10:19.297, value=43
Phelps Lida column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Phelps Lida column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Phelps Lida column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
Phillips Helen column=basic:age, timestamp=2024-07-30T08:10:19.297, value=61
Phillips Helen column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Phillips Helen column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Phillips Helen column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
Reyes Marc column=basic:age, timestamp=2024-07-30T08:10:19.297, value=25
Reyes Marc column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Reyes Marc column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Reyes Marc column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
Roberts Clayton column=basic:age, timestamp=2024-07-30T08:10:19.297, value=52
Roberts Clayton column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Roberts Clayton column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Roberts Clayton column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
Ryan Curtis column=basic:age, timestamp=2024-07-30T08:10:19.297, value=58
Ryan Curtis column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Ryan Curtis column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Ryan Curtis column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
Spencer Lucinda column=basic:age, timestamp=2024-07-30T08:10:19.297, value=24
Spencer Lucinda column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Spencer Lucinda column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Spencer Lucinda column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
Woods Bessie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=47
Woods Bessie column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Woods Bessie column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Woods Bessie column=location:town, timestamp=2024-07-30T08:10:19.297, value=Albuquerque
Yates Douglas column=basic:age, timestamp=2024-07-30T08:10:19.297, value=35
Yates Douglas column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Yates Douglas column=location:state, timestamp=2024-07-30T08:10:20, value=NM
Yates Douglas column=location:town, timestamp=2024-07-30T08:10:20, value=Albuquerque
18 row(s)
Took 0.0263 seconds
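The row-level decision made by SingleColumnValueFilter can be summarized in a short model. This is a sketch in plain Python, not HBase code: the function name, the tuple layout, the integer timestamps, and the "No Town" row are hypothetical, the comparison is passed in as an ordinary Python predicate, and "all versions are checked" is modeled as "any version may satisfy the comparison", which is an assumption about the intended behavior.

```python
# Illustrative model only: plain Python, not the HBase API.
def single_column_value_filter(row, family, qualifier, predicate,
                               filter_if_missing=False, latest_version_only=True):
    """row is a list of (family, qualifier, timestamp, value) tuples.

    Returns either the whole row or an empty list.
    """
    # All stored versions of the reference column, newest first.
    versions = sorted(
        (c for c in row if c[0] == family and c[1] == qualifier),
        key=lambda c: c[2], reverse=True,
    )
    if not versions:
        # No reference column: keep the row unless filter_if_missing is set.
        return [] if filter_if_missing else list(row)
    candidates = versions[:1] if latest_version_only else versions
    if any(predicate(c[3]) for c in candidates):
        return list(row)
    return []

rows = {
    "Ball Nelle": [("basic", "age", 1, "43"), ("location", "country", 1, "USA"),
                   ("location", "state", 1, "NM"), ("location", "town", 1, "Albuquerque")],
    "Abbott Delia": [("basic", "age", 1, "62"), ("location", "state", 1, "TX"),
                     ("location", "town", 1, "Dallas")],
    "No Town": [("basic", "age", 1, "30")],  # hypothetical row without location:town
}

def starts_with_al(value):
    return value.startswith("Al")

counts = {
    key: len(single_column_value_filter(cells, "location", "town", starts_with_al,
                                        filter_if_missing=True,
                                        latest_version_only=False))
    for key, cells in rows.items()
}
print(counts)  # {'Ball Nelle': 4, 'Abbott Delia': 0, 'No Town': 0}
```

With the default filter_if_missing=False, the hypothetical "No Town" row would be returned in full, which is why the command example above passes true as the fifth argument.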
Works the same as SingleColumnValueFilter, except that the reference column is never returned in the results.
Syntax:
"SingleColumnValueExcludeFilter ('location', 'town', =, 'binaryprefix:Al')"
"SingleColumnValueExcludeFilter ('location', 'town', =, 'binaryprefix:Al', true, false)"
Command example:
scan 'people', { FILTER => "SingleColumnValueExcludeFilter ('location', 'town', =, 'binaryprefix:Al', true, false)" }
This filter searches each row for a value beginning with Al in the location:town column, checking all versions of that value. If a match is found, it returns the other key/value pairs from that row, but not the reference column itself. Result:
ROW COLUMN+CELL
Ball Nelle column=basic:age, timestamp=2024-07-30T08:10:19.297, value=43
Ball Nelle column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Ball Nelle column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Bell Leila column=basic:age, timestamp=2024-07-30T08:10:19.297, value=58
Bell Leila column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Bell Leila column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Cohen John column=basic:age, timestamp=2024-07-30T08:10:19.297, value=28
Cohen John column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Cohen John column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Diaz Anne column=basic:age, timestamp=2024-07-30T08:10:19.297, value=57
Diaz Anne column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Diaz Anne column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Howard Florence column=basic:age, timestamp=2024-07-30T08:10:19.297, value=37
Howard Florence column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Howard Florence column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Ingram Barbara column=basic:age, timestamp=2024-07-30T08:10:19.297, value=55
Ingram Barbara column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Ingram Barbara column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Jefferson Charlie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=46
Jefferson Charlie column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Jefferson Charlie column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Kennedy Todd column=basic:age, timestamp=2024-07-30T08:10:19.297, value=44
Kennedy Todd column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Kennedy Todd column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
McGee Isabelle column=basic:age, timestamp=2024-07-30T08:10:19.297, value=48
McGee Isabelle column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
McGee Isabelle column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Page Victoria column=basic:age, timestamp=2024-07-30T08:10:19.297, value=30
Page Victoria column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Page Victoria column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Phelps Lida column=basic:age, timestamp=2024-07-30T08:10:19.297, value=43
Phelps Lida column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Phelps Lida column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Phillips Helen column=basic:age, timestamp=2024-07-30T08:10:19.297, value=61
Phillips Helen column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Phillips Helen column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Reyes Marc column=basic:age, timestamp=2024-07-30T08:10:19.297, value=25
Reyes Marc column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Reyes Marc column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Roberts Clayton column=basic:age, timestamp=2024-07-30T08:10:19.297, value=52
Roberts Clayton column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Roberts Clayton column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Ryan Curtis column=basic:age, timestamp=2024-07-30T08:10:19.297, value=58
Ryan Curtis column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Ryan Curtis column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Spencer Lucinda column=basic:age, timestamp=2024-07-30T08:10:19.297, value=24
Spencer Lucinda column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Spencer Lucinda column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Woods Bessie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=47
Woods Bessie column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Woods Bessie column=location:state, timestamp=2024-07-30T08:10:19.297, value=NM
Yates Douglas column=basic:age, timestamp=2024-07-30T08:10:19.297, value=35
Yates Douglas column=location:country, timestamp=2024-07-30T08:10:19.297, value=USA
Yates Douglas column=location:state, timestamp=2024-07-30T08:10:20, value=NM
18 row(s)
Took 0.0646 seconds
Returns only key/value pairs whose column qualifier names fall within the range set by the arguments. Each range end can be a specific value or empty, in which case the range has no bound on that side. Each range end is followed by a boolean argument that defines whether the range end itself is included.
Takes four arguments: left range end, left inclusion flag, right range end, right inclusion flag. If a range end is empty, the value of its inclusion flag does not matter.
Syntax:
"ColumnRangeFilter ('', true, 'c', false)"
Command example:
scan 'people', { FILTER => "ColumnRangeFilter ('', true, 'c', false)" }
This filter returns only the basic:age column, since the next column qualifier in alphabetical order is country, which does not fall into the specified range. Result (the first and last five rows):
ROW COLUMN+CELL
Abbott Delia column=basic:age, timestamp=2024-07-30T08:10:19.297, value=62
Abbott Howard column=basic:age, timestamp=2024-07-30T08:10:19.297, value=24
Abbott Jack column=basic:age, timestamp=2024-07-30T08:10:19.297, value=29
Adams Clyde column=basic:age, timestamp=2024-07-30T08:10:19.297, value=29
Aguilar Myrtie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=23
...
Young Della column=basic:age, timestamp=2024-07-30T08:10:19.297, value=21
Young Josephine column=basic:age, timestamp=2024-07-30T08:10:19.297, value=29
Young Mattie column=basic:age, timestamp=2024-07-30T08:10:19.297, value=39
Zimmerman Gene column=basic:age, timestamp=2024-07-30T08:10:19.297, value=35
Zimmerman Madge column=basic:age, timestamp=2024-07-30T08:10:19.297, value=46
997 row(s)
Took 0.1788 seconds
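The range check behind the ColumnRangeFilter example above can be modeled as follows. This is an illustrative Python sketch rather than HBase code; the function name in_column_range is hypothetical, and the model assumes that an empty range end means no bound on that side, as described above.

```python
# Illustrative model only: plain Python, not the HBase API.
def in_column_range(qualifier, min_col, min_inclusive, max_col, max_inclusive):
    """Return True if `qualifier` lies within the range; '' means no bound."""
    q = qualifier.encode()
    if min_col:
        lo = min_col.encode()
        if q < lo or (q == lo and not min_inclusive):
            return False
    if max_col:
        hi = max_col.encode()
        if q > hi or (q == hi and not max_inclusive):
            return False
    return True

# Column qualifiers of the test table.
qualifiers = ["age", "country", "state", "town"]
selected = [q for q in qualifiers if in_column_range(q, "", True, "c", False)]
print(selected)  # ['age']
```

The qualifier country is rejected because a longer string that starts with c sorts after c itself in lexicographic byte order, so it lies beyond the right end of the range.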
Dynamic loading of custom filters
HBase in ADH supports dynamic loading of custom filters. To use this feature, specify the directory that contains the custom filter JAR files:
-
Go to ADCM UI and select your ADH cluster.
-
Navigate to Services → HBase → Primary configuration and toggle Show advanced.
-
Open the Custom hbase-site.xml section and click Add property.
-
For the field name, enter hbase.dynamic.jars.dir. For the field value, enter a path of your preference. A good example is ${hbase.rootdir}/lib. Click Apply.
-
Save the configuration by clicking Save → Create, and restart the service by clicking Actions → Reconfig and graceful restart.
Provided there are JAR files with the custom filters in the specified location, you should be able to use them both in HBase shell and via Java applications using the HBase API.