This topic describes the scenarios for which the sharding feature is suitable. The topic also describes how to use the feature.
Background information
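- Alias
An alias lets you change the table that stores your business data without changing your code. For example, if your business logic uses Table A and you need to store the data in Table B instead, you only need to modify the alias configuration.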
- Sharding
In most business scenarios, only data that is generated in the previous week is frequently accessed. You can create a table each week to store the latest data. This way, you can query hot data in an efficient manner. However, you must manually create and delete tables. This increases the complexity of the business logic.
Scenarios
- Manage time series data
If your business data is time-sensitive, multiple indexes can be created based on time. This reduces the size of a single index and improves query performance. During this process, you do not need to manually create or delete indexes.
- Rebuild an index
You can rebuild indexes. The rebuild operation does not affect existing index-based data queries. After the indexes are rebuilt, the new indexes are used to query data. During this process, you do not need to modify existing code.
Create an alias for an existing collection
curl "http://solrhost:8983/solr/admin/collections?action=CREATEALIAS&name=your_alias_name&collections=your_collection_name_A"
The preceding curl command creates an alias named your_alias_name that points to the collection your_collection_name_A. You can specify the your_alias_name alias in the business logic, and the kernel forwards query requests to the collection. If you want the alias to point to a different collection, such as your_collection_name_B, run the curl command to change the alias.
Change the alias
curl "http://solrhost:8983/solr/admin/collections?action=ALIASPROP&name=your_alias_name&collections=your_collection_name_B"
This way, you can query data in the new collection without the need to modify your business code.
Automatic table sharding
LindormSearch can automatically create collections based on the values of a time field. This helps you simplify your business logic. The following example shows how to configure the system to create a collection each week and automatically delete expired collections.
curl "http://solrhost:8983/solr/admin/collections?action=CREATEALIAS&name=test_router_alias&router.start=NOW-30DAYS/DAY&router.autoDeleteAge=/DAY-90DAYS&router.field=your_timestamp_l&router.name=time&router.interval=%2B7DAY&router.maxFutureMs=8640000000&create-collection.collection.configName=_indexer_default&create-collection.numShards=2"
Parameter | Value | Description |
---|---|---|
router.start | NOW-30DAYS/DAY | The start of the time range for the first collection. In this example, NOW-30DAYS/DAY indicates that the start time is 30 days before the current time. |
router.interval | +7DAY | The interval at which a new collection is created. In this example, a new collection is created every seven days. |
router.autoDeleteAge | /DAY-90DAYS | The time period in which a collection is retained before the collection is automatically deleted. In this example, the value indicates that the system automatically deletes collections that are stored for 90 days. The value of this parameter must be greater than the value of the router.start parameter. |
router.field | your_timestamp_l | The time field that is used to route data to collections. The business data that you write must contain this field with a valid value. For example, you can set the field to the value returned by System.currentTimeMillis(), which is the current system timestamp in milliseconds. |
router.maxFutureMs | 8640000000 | The maximum difference, in milliseconds, that is allowed between the value of the your_timestamp_l field and the current time. This parameter ensures that only the data that is generated within the specified time range is written to the collections. In this example, 8640000000 milliseconds equals 100 days, so only the data that is generated within the previous 100 days or the next 100 days can be written. |
create-collection.collection.configName | _indexer_default | The configuration set on which the collection depends. You can set this parameter to the name of your configuration set. For more information, see Update the configuration set. |
create-collection.numShards | 2 | The number of shards that are created for the collection. Default value: 2. |
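The table lists the parameter values unencoded, while the curl command writes +7DAY as %2B7DAY because a literal + in a query string is decoded as a space. The following is a minimal sketch, assuming Java 10 or later, of building the same request URL from the unencoded values. The class and method names are hypothetical. Note that URLEncoder also encodes / as %2F, which the server decodes back to the same value.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of LindormSearch or SolrJ.
final class RouterAliasUrl {

    // Joins the parameters into a query string and percent-encodes each value.
    static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }

    // The unencoded values from the table above, in the order used by the curl command.
    static Map<String, String> exampleParams() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "CREATEALIAS");
        p.put("name", "test_router_alias");
        p.put("router.start", "NOW-30DAYS/DAY");
        p.put("router.autoDeleteAge", "/DAY-90DAYS");
        p.put("router.field", "your_timestamp_l");
        p.put("router.name", "time");
        p.put("router.interval", "+7DAY");
        p.put("router.maxFutureMs", "8640000000");
        p.put("create-collection.collection.configName", "_indexer_default");
        p.put("create-collection.numShards", "2");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(build("http://solrhost:8983/solr/admin/collections", exampleParams()));
    }
}
```

Encoding every value programmatically avoids having to remember which characters are unsafe when you change a parameter such as router.interval.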
The preceding curl command configures the system to create a collection every seven days based on the value of the your_timestamp_l field. The time range of the first collection starts 30 days before the current time. The difference between the value of the your_timestamp_l field and the current time must be within 100 days. Collections that are older than 90 days are automatically deleted.
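To make the routing rule concrete, the following sketch models which collection a timestamp falls into under the settings above. It is an illustrative model only, not the LindormSearch implementation. The class and method names are hypothetical, the alias_yyyy-MM-dd naming pattern is assumed from the sample collection names in this topic, and rounding /DAY to midnight UTC is also an assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical model of time-based routing; not LindormSearch code.
final class TimeRoutingModel {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ISO_LOCAL_DATE.withZone(ZoneOffset.UTC);

    // Returns the assumed name of the collection whose time range contains docTime.
    static String route(String alias, Instant now, int startDaysAgo,
                        int intervalDays, long maxFutureMs, Instant docTime) {
        // router.start=NOW-30DAYS/DAY: go back 30 days, then round down to the day (UTC assumed).
        Instant firstStart = now.minus(startDaysAgo, ChronoUnit.DAYS).truncatedTo(ChronoUnit.DAYS);
        if (docTime.isBefore(firstStart)) {
            throw new IllegalArgumentException("timestamp is earlier than router.start");
        }
        // router.maxFutureMs: reject timestamps that are too far from the current time.
        if (Duration.between(now, docTime).abs().toMillis() > maxFutureMs) {
            throw new IllegalArgumentException("timestamp is too far from the current time");
        }
        // router.interval=+7DAY: each collection covers intervalDays days from its start.
        long slot = ChronoUnit.DAYS.between(firstStart, docTime) / intervalDays;
        Instant collectionStart = firstStart.plus(slot * intervalDays, ChronoUnit.DAYS);
        return alias + "_" + DAY.format(collectionStart);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2020-03-06T10:00:00Z");
        Instant docTime = Instant.parse("2020-03-07T12:00:00Z");
        // prints "test_router_alias_2020-03-04"
        System.out.println(route("test_router_alias", now, 30, 7, 8_640_000_000L, docTime));
    }
}
```

With the current time set to 2020-03-06, the first collection starts on 2020-02-05 and later collections start every seven days after that, which matches the sample collection names used in this topic.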
The following figure shows the created collections.
- The business data must contain a time field. The data type of the time field can be DATE or LONG.
- By default, a query on the alias searches all the collections. To query only specific collections, first obtain the list of all collections by running the curl command with the specified URL or by calling the API operation. Then, use the time value in each collection name to select the collections to query. For more information, see the sample code in the References section.
Delete an alias
To delete a common alias, run the following command:
curl "http://solrhost:8983/solr/admin/collections?action=DELETEALIAS&name=your_alias_name"
If an alias is used to automatically create collections, you must delete the collections after you delete the alias. To delete the alias and collections, perform the following steps:
- Query the collections that are created by using the alias.
curl "http://solrhost:8983/solr/admin/collections?action=LIST"
In this example, all the collections whose names start with test_router_alias are created by using the alias.
- Delete the alias.
curl "http://solrhost:8983/solr/admin/collections?action=DELETEALIAS&name=test_router_alias"
- Delete the collections.
curl "http://solrhost:8983/solr/admin/collections?action=DELETE&name=collection_name"
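The cleanup steps above can be sketched as a small planning routine that, given the collection names returned by the LIST action, produces the requests to send in order. This is a hedged illustration: the class and method names are hypothetical, and matching on the prefix alias + "_" is an assumption based on the sample collection names in this topic. Review the resulting list before you send any DELETE request, because an unrelated collection with the same prefix would also match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; it only builds URLs and sends nothing.
final class AliasCleanupPlan {

    // Returns the DELETEALIAS URL first, followed by one DELETE URL per matching collection.
    static List<String> plan(String baseUrl, String alias, List<String> allCollections) {
        List<String> urls = new ArrayList<>();
        urls.add(baseUrl + "?action=DELETEALIAS&name=" + alias);
        String prefix = alias + "_"; // assumed naming pattern: alias_yyyy-MM-dd
        for (String collection : allCollections) {
            if (collection.startsWith(prefix)) {
                urls.add(baseUrl + "?action=DELETE&name=" + collection);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> listed = Arrays.asList(
                "test_router_alias_2020-03-04",
                "test_router_alias_2020-02-26",
                "other_collection");
        for (String url : plan("http://solrhost:8983/solr/admin/collections", "test_router_alias", listed)) {
            System.out.println(url);
        }
    }
}
```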
References
Query a collection based on a time field of the LONG type
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.ClusterStateProvider;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.util.StrUtils;

import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class SolrDemo {
    // Parses the date and time suffix of a collection name, such as 2020-03-04 or 2020-03-04_12_30.
    private static final DateTimeFormatter DATE_TIME_FORMATTER = new DateTimeFormatterBuilder()
            .append(DateTimeFormatter.ISO_LOCAL_DATE).appendPattern("[_HH[_mm[_ss]]]")
            .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
            .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
            .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0)
            .toFormatter(Locale.ROOT).withZone(ZoneOffset.UTC);

    private static final String zkHost = "localhost:2181/solr";

    private CloudSolrClient cloudSolrClient;
    private ClusterStateProvider clusterStateProvider;

    public SolrDemo() {
        cloudSolrClient = new CloudSolrClient.Builder(
                Collections.singletonList(zkHost), Optional.empty()).build();
        cloudSolrClient.connect();
        clusterStateProvider = cloudSolrClient.getClusterStateProvider();
    }

    public void close() throws Exception {
        if (null != cloudSolrClient) {
            cloudSolrClient.close();
        }
    }

    private List<String> findCollection(String aliasName, long start, long end) {
        List<String> collections = new ArrayList<>();
        if (start > end) {
            return collections;
        }
        // Query collections based on the time range specified by the start and end parameters.
        // If aliasName is not the name of an actual collection, treat it as an alias.
        if (clusterStateProvider.getState(aliasName) == null) {
            // Query all collections that are created by using the alias specified by the aliasName parameter.
            // test_router_alias_2020-03-04, test_router_alias_2020-02-26, test_router_alias_2020-02-19, test_router_alias_2020-02-12, test_router_alias_2020-02-05
            List<String> aliasedCollections = clusterStateProvider.resolveAlias(aliasName);
            // Extract the date and time from the name of each collection.
            // 2020-03-04T00:00:00Z=test_router_alias_2020-03-04,
            // 2020-02-26T00:00:00Z=test_router_alias_2020-02-26,
            // 2020-02-19T00:00:00Z=test_router_alias_2020-02-19,
            // 2020-02-12T00:00:00Z=test_router_alias_2020-02-12,
            // 2020-02-05T00:00:00Z=test_router_alias_2020-02-05
            List<Map.Entry<Instant, String>> collectionsInstant = new ArrayList<>(aliasedCollections.size());
            for (String collectionName : aliasedCollections) {
                String dateTimePart = collectionName.substring(aliasName.length() + 1);
                Instant instant = DATE_TIME_FORMATTER.parse(dateTimePart, Instant::from);
                collectionsInstant.add(new AbstractMap.SimpleImmutableEntry<>(instant, collectionName));
            }
            // Find the required collections based on the query time.
            // This loop relies on the collections being listed from newest to oldest, as in the sample output above.
            Instant startI = Instant.ofEpochMilli(start);
            Instant endI = Instant.ofEpochMilli(end);
            for (Map.Entry<Instant, String> entry : collectionsInstant) {
                Instant colStartTime = entry.getKey();
                if (!endI.isBefore(colStartTime)) {
                    collections.add(entry.getValue());
                    System.out.println("find collection: " + entry.getValue());
                    // Stop at the first collection that starts at or before the query start time.
                    if (!startI.isBefore(colStartTime)) {
                        break;
                    }
                }
            }
        } else {
            collections.add(aliasName);
        }
        System.out.println("query " + collections);
        return collections;
    }

    public void run() throws Exception {
        // [2020-03-07 2020-03-10]
        long start = 1583538686312L;
        long end = 1583797886000L;
        String aliasName = "test_router_alias";
        String collections = StrUtils.join(findCollection(aliasName, start, end), ',');
        QueryResponse res = cloudSolrClient.query(collections, new SolrQuery("*:*"));
        for (SolrDocument sd : res.getResults()) {
            System.out.println(sd.get("id") + " " + sd.get("gmtCreate_l"));
        }
    }

    public static void main(String[] args) throws Exception {
        SolrDemo solrDemo = new SolrDemo();
        try {
            solrDemo.run();
        } finally {
            // Close the client exactly once, even if the query fails.
            solrDemo.close();
        }
    }
}