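ElasticSearch Query Notes: Table of Contents
The commonly used queries cover a lot of ground, so the notes are split into several chapters, as follows:
Queries based on precise conditions. They are fast and are the most frequently used kinds:
- term query
- terms query
- match_all query
- match query
- boolean match query
- multi_match query
- query by document id (single id)
- query by document ids (multiple ids)
Queries whose conditions are relatively fuzzy. They are comparatively slow and should be avoided in real-time search where possible, but they offer capabilities that nothing else can replace, so they are still needed:
- prefix query
- fuzzy query
- wildcard query
- range query
- regexp query
Miscellaneous commonly used ElasticSearch queries:
- deep-paging scroll query
- delete-by-query
- bool query
- boosting query
- filter query
- highlight query
ES aggregation queries (Aggregations):
- cardinality (distinct count) query
- range (range statistics) query
- extended_stats (statistical aggregation) query
ES geo (map search) queries:
- geo_distance query
- geo_bounding_box query
- geo_polygon query
Test-case project with the complete Java code
The Java code for the whole series is published as the CSDN resource "ElasticSearch常用查询的Java实现"; the project layout is shown in the figure below, and you are welcome to download it.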
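Deep-paging scroll query
from+size paging was covered earlier, so why is there also scroll+size deep paging? First, compare the two.
ES executes a from+size query in these steps:
- Tokenise the keywords supplied by the user;
- Look the tokens up in the inverted index to obtain the ids of the matching documents;
- Fetch the corresponding data from each shard, which is relatively slow;
- Sort the data by score, which is also relatively slow;
- Keep only the slice of the sorted results selected by from and size;
- Return the result.
Advantage: every request sees the latest data.
Disadvantage: showing another page of the same query with from+size repeats all of the steps above.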
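The cost described above can be illustrated with a minimal Java sketch. It is an in-memory stand-in, not Elasticsearch code: the class `FromSizeSketch`, the `page` helper and the sample values are all hypothetical, and a plain list of numbers takes the place of the shards. The point it shows is that every page request sorts the whole candidate set again before slicing it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory stand-in for from+size paging; not Elasticsearch API.
class FromSizeSketch {
    // Counts how many times the full candidate list has been sorted.
    static int sortCount = 0;

    // Sorts all values in descending order, then keeps the slice [from, from + size).
    static List<Double> page(List<Double> values, int from, int size) {
        List<Double> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.reverseOrder());
        sortCount++;
        int start = Math.min(from, sorted.size());
        int end = Math.min(from + size, sorted.size());
        return new ArrayList<>(sorted.subList(start, end));
    }

    public static void main(String[] args) {
        List<Double> fees = List.of(500.0, 6000.0, 2000.0, 4000.0);
        System.out.println(page(fees, 0, 2)); // first page
        System.out.println(page(fees, 2, 2)); // second page: the full sort runs again
        System.out.println("full sorts performed: " + sortCount);
    }
}
```

Two pages cost two full sorts here; with a large from value the discarded prefix grows as well, which is why this style of paging gets expensive for deep pages.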
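How ES executes a scroll+size query: the first request stores the ids of the matching documents in a search context, and each later page is read from that context until it is deleted or times out.
Requirement: scroll through the company records two per page, sorted by the fee field and then the moblie field, both in descending order.
The REST request is as follows:
#Step 1: scroll query; returns the first page and stores the matching ES document ids in a search context
#scroll=2m keeps the scroll context in memory for 2 minutes; without the scroll parameter no context is kept. When the context times out it is removed automatically, and steps 2 and 3 below then fail
#size is set to 2
#a scroll query can sort by any field; without an explicit sort, hits are ordered by _score (sorting by _doc is the cheapest order for a scroll)
POST /sms-logs-index/_search?scroll=2m
{
  "query": {
    "match_all": {}
  },
  "size": 2,
  "sort": [
    { "fee": { "order": "desc" } },
    { "moblie": { "order": "desc" } }
  ]
}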
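#Step 2: fetch the next page with the scroll id; run the same request again for each further page until the results are exhausted or the context times out
# scroll_id is the _scroll_id returned by the previous request
# scroll has to be given again to keep the context alive for another 2 minutes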
POST /_search/scroll
{
"scroll_id":"FGluY2x1ZGVfY29udGV4dF91dWlkDnF1ZXJ5VGhlbkZldGNoBRQtNDEtREhVQnZKaFZKTkZ3Z3VyRgAAAAAABIWAFmJWa2hfQ2g3UlF1bjBoMEVvWkZnbHcULXd0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU7xY3Si1RRmU0NlRzQ19mdkFtb0pMLVVRFGJsMS1ESFVCb3RTY3RrNUdnREVQAAAAAAABAqAWdmh6NmMzeXVUa1NFbVFYMjQ0S3dGZxRaVjUtREhVQnVPVGdEcnZ1Z0xKQgAAAAAAE8ZFFjdGSWx5WkpGVDkyZXA5OEtIQnlqcFEUX0F0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU8BY3Si1RRmU0NlRzQ19mdkFtb0pMLVVR"
,"scroll":"2m"
}
# Step 3: delete the scroll context held by ES
# If the first page already answers the question and the remaining pages are not needed, the context can be released early
DELETE /_search/scroll/FGluY2x1ZGVfY29udGV4dF91dWlkDnF1ZXJ5VGhlbkZldGNoBRQtNDEtREhVQnZKaFZKTkZ3Z3VyRgAAAAAABIWAFmJWa2hfQ2g3UlF1bjBoMEVvWkZnbHcULXd0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU7xY3Si1RRmU0NlRzQ19mdkFtb0pMLVVRFGJsMS1ESFVCb3RTY3RrNUdnREVQAAAAAAABAqAWdmh6NmMzeXVUa1NFbVFYMjQ0S3dGZxRaVjUtREhVQnVPVGdEcnZ1Z0xKQgAAAAAAE8ZFFjdGSWx5WkpGVDkyZXA5OEtIQnlqcFEUX0F0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU8BY3Si1RRmU0NlRzQ19mdkFtb0pMLVVR
The REST responses are as follows:
#Step 1: response of the scroll query
{
"_scroll_id" : "FGluY2x1ZGVfY29udGV4dF91dWlkDnF1ZXJ5VGhlbkZldGNoBRQtNDEtREhVQnZKaFZKTkZ3Z3VyRgAAAAAABIWAFmJWa2hfQ2g3UlF1bjBoMEVvWkZnbHcULXd0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU7xY3Si1RRmU0NlRzQ19mdkFtb0pMLVVRFGJsMS1ESFVCb3RTY3RrNUdnREVQAAAAAAABAqAWdmh6NmMzeXVUa1NFbVFYMjQ0S3dGZxRaVjUtREhVQnVPVGdEcnZ1Z0xKQgAAAAAAE8ZFFjdGSWx5WkpGVDkyZXA5OEtIQnlqcFEUX0F0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU8BY3Si1RRmU0NlRzQ19mdkFtb0pMLVVR",
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 12,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "5",
"_score" : null,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "24514635",
"moblie" : 18545427895,
"corpName" : "东东集团",
"smsContent" : "数据驱动,AI推动,新零售模型让你的购买更心怡!",
"state" : "1",
"opratorId" : "1",
"province" : "北京",
"ipAddr" : "10.254.19.45",
"replyTotal" : "1",
"fee" : "6000"
},
"sort" : [
6000.0
]
},
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "10",
"_score" : null,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "54784641",
"moblie" : 15625584654,
"corpName" : "勾股科技有限公司",
"smsContent" : "智能算法,智慧生活,勾股科技!",
"state" : "1",
"opratorId" : "2",
"province" : "杭州",
"ipAddr" : "10.215.19.45",
"replyTotal" : "6",
"fee" : "4000"
},
"sort" : [
4000.0
]
}
]
}
}
#Step 2: response when fetching the next page with the scroll id
{
"_scroll_id" : "FGluY2x1ZGVfY29udGV4dF91dWlkDnF1ZXJ5VGhlbkZldGNoBRQtNDEtREhVQnZKaFZKTkZ3Z3VyRgAAAAAABIWAFmJWa2hfQ2g3UlF1bjBoMEVvWkZnbHcULXd0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU7xY3Si1RRmU0NlRzQ19mdkFtb0pMLVVRFGJsMS1ESFVCb3RTY3RrNUdnREVQAAAAAAABAqAWdmh6NmMzeXVUa1NFbVFYMjQ0S3dGZxRaVjUtREhVQnVPVGdEcnZ1Z0xKQgAAAAAAE8ZFFjdGSWx5WkpGVDkyZXA5OEtIQnlqcFEUX0F0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU8BY3Si1RRmU0NlRzQ19mdkFtb0pMLVVR",
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 12,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "7",
"_score" : null,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "33656412674",
"moblie" : 18956451203,
"corpName" : "华丽网集团",
"smsContent" : "网络安全,华丽靠谱!",
"state" : "1",
"opratorId" : "3",
"province" : "上海",
"ipAddr" : "10.215.254.45",
"replyTotal" : "1",
"fee" : "2000"
},
"sort" : [
2000.0
]
},
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "11",
"_score" : null,
"_source" : {
"createDate" : "2020-09-22",
"senDate" : "2020-09-22",
"longCode" : "458744536",
"moblie" : 134625584654,
"corpName" : "星雨文化传媒",
"smsContent" : "魅力宣传,星雨传媒!",
"state" : "1",
"opratorId" : "3",
"province" : "杭州",
"ipAddr" : "10.289.19.45",
"replyTotal" : "6",
"fee" : "500"
},
"sort" : [
500.0
]
}
]
}
}
# Step 3: response when deleting the scroll context
{
"succeeded" : true,
"num_freed" : 5
}
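The three REST steps above can be pictured with a small Java sketch. It assumes nothing about how Elasticsearch implements scrolling internally: `ScrollSketch`, its `next` method and the sample values are hypothetical, and a sorted in-memory list stands in for the search context. It only illustrates the idea that the sort happens once and later pages are read from the stored snapshot.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model of a scroll cursor; not Elasticsearch API.
class ScrollSketch {
    private final List<Double> snapshot; // sorted once and kept, like the search context
    private int cursor = 0;              // position of the next unread entry

    ScrollSketch(List<Double> values) {
        snapshot = new ArrayList<>(values);
        snapshot.sort(Comparator.reverseOrder());
    }

    // Returns the next page from the stored snapshot; an empty list means the scroll is exhausted.
    List<Double> next(int size) {
        int end = Math.min(cursor + size, snapshot.size());
        List<Double> page = new ArrayList<>(snapshot.subList(cursor, end));
        cursor = end;
        return page;
    }

    public static void main(String[] args) {
        ScrollSketch scroll = new ScrollSketch(List.of(500.0, 6000.0, 2000.0, 4000.0));
        List<Double> page;
        while (!(page = scroll.next(2)).isEmpty()) {
            System.out.println(page);
        }
    }
}
```

Because the snapshot is fixed when the scroll starts, documents indexed afterwards do not show up in later pages, which is the trade-off against from+size.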
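The Java code is as follows:
// Imports for the RestHighLevelClient 7.x API; newer 7.x releases use org.elasticsearch.core.TimeValue,
// and org.junit.Test assumes JUnit 4
import java.io.IOException;
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.ClearScrollResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.junit.Test;

static RestHighLevelClient myClient = EsClient.getClient(); // client used to talk to ES
String index = "sms-logs-index";

@Test
public void scrollQuery() throws IOException {
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    // 2. Set how long the scroll context is kept alive
    request.scroll(TimeValue.timeValueMinutes(2L));
    // 3. Set the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.size(4);
    builder.sort("fee", SortOrder.DESC);
    builder.query(QueryBuilders.matchAllQuery());
    request.source(builder);
    // 4. Run the search; the response carries the scroll id and the first page
    SearchResponse response = myClient.search(request, RequestOptions.DEFAULT);
    String scrollId = response.getScrollId();
    System.out.println("----------------------- first page ----------------------------");
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
    while (true) {
        // 5. Create a SearchScrollRequest for the current scroll id
        SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
        // 6. Renew the lifetime of the scroll context
        scrollRequest.scroll(TimeValue.timeValueMinutes(2L));
        // 7. Fetch the next page
        SearchResponse scrollResp = myClient.scroll(scrollRequest, RequestOptions.DEFAULT);
        // The scroll id may change between requests, so always keep the latest one
        scrollId = scrollResp.getScrollId();
        // 8. Print the page if it has hits, otherwise leave the loop
        SearchHit[] hits = scrollResp.getHits().getHits();
        if (hits != null && hits.length > 0) {
            System.out.println("----------------------- next page ----------------------------");
            for (SearchHit hit : hits) {
                System.out.println(hit.getSourceAsMap());
            }
        } else {
            // 9. No more hits: stop
            System.out.println("----------------------- end ----------------------------");
            break;
        }
    }
    // 10. Create the ClearScrollRequest
    ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
    // 11. Set the scroll id to release
    clearScrollRequest.addScrollId(scrollId);
    // 12. Delete the scroll context
    ClearScrollResponse clearScrollResponse = myClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
    // 13. Print the outcome
    System.out.println("scroll deleted: " + clearScrollResponse.isSucceeded());
}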
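The result of running the Java code is shown in the figure below.
delete-by-query
Deletes a large number of documents selected by a term, match or other query.
Note: if the documents to delete make up most of the index, the reverse approach is recommended: create a new index, add only the documents you want to keep, and then query the new index directly.
Requirement: use a range query to find the company records whose fee is less than 0.2, then delete them.
The REST requests are as follows:
#Step 1: run a range query for fee below 0.2; the result shows 2 matching documents
POST /sms-logs-index/_search
{
  "query": {
    "range": {
      "fee": { "lt": 0.2 }
    }
  }
}
#Step 2: delete the matching documents with _delete_by_query
POST /sms-logs-index/_delete_by_query
{
  "query": {
    "range": {
      "fee": { "lt": 0.2 }
    }
  }
}
#Step 3: run the range query for fee below 0.2 again; nothing matches any more
POST /sms-logs-index/_search
{
  "query": {
    "range": {
      "fee": { "lt": 0.2 }
    }
  }
}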
The REST responses are as follows:
#Step 1: response of the range query for fee below 0.2, with 2 hits
# POST /sms-logs-index/_search
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "9",
"_score" : 1.0,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "5784320",
"moblie" : 15236964578,
"corpName" : "花花派",
"smsContent" : "花开花落,魅力女性,买花选我!",
"state" : "1",
"opratorId" : "1",
"province" : "上海",
"ipAddr" : "10.265.19.45",
"replyTotal" : "1",
"fee" : "0.1"
}
},
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "10201021",
"moblie" : 13026254898,
"corpName" : "上海智慧软件有限公司",
"smsContent" : "连接你我,智慧软件,让生活更美好",
"state" : "1",
"opratorId" : "1",
"province" : "上海",
"ipAddr" : "10.215.19.45",
"replyTotal" : "1",
"fee" : "0.1"
}
}
]
}
}
#Step 2: response of the _delete_by_query request
# POST /sms-logs-index/_delete_by_query
{
"took" : 107,
"timed_out" : false,
"total" : 2,
"deleted" : 2,
"batches" : 1,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
#Step 3: response of the repeated range query for fee below 0.2, with no hits left
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
To reproduce the same effect from Java, first put the two deleted documents back with the following REST requests:
PUT /sms-logs-index/_doc/1
{
"createDate":"2020-09-16"
,"senDate":"2020-09-16"
,"longCode":"10201021"
,"moblie":13026254898
,"corpName":"上海智慧软件有限公司"
,"smsContent":"连接你我,智慧软件,让生活更美好"
,"state":"1"
,"opratorId":"1"
,"province":"上海"
,"ipAddr":"10.215.19.45"
,"replyTotal":"1"
,"fee":"0.1"
}
PUT /sms-logs-index/_doc/9
{
"createDate":"2020-09-16"
,"senDate":"2020-09-16"
,"longCode":"5784320"
,"moblie":15236964578
,"corpName":"花花派"
,"smsContent":"花开花落,魅力女性,买花选我!"
,"state":"1"
,"opratorId":"1"
,"province":"上海"
,"ipAddr":"10.265.19.45"
,"replyTotal":"1"
,"fee":"0.1"
}
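The Java code is as follows:
// Imports for the RestHighLevelClient 7.x API; org.junit.Test assumes JUnit 4
import java.io.IOException;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.reindex.BulkByScrollResponse;
import org.elasticsearch.index.reindex.DeleteByQueryRequest;
import org.junit.Test;

static RestHighLevelClient myClient = EsClient.getClient(); // client used to talk to ES
String index = "sms-logs-index";

@Test
public void deleteByQuery() throws IOException {
    // 1. Create the DeleteByQueryRequest
    DeleteByQueryRequest request = new DeleteByQueryRequest(index);
    // 2. Set the query; unlike SearchRequest, the query is set directly with setQuery
    request.setQuery(QueryBuilders.rangeQuery("fee").lt(0.2));
    // 3. Execute the delete
    BulkByScrollResponse resp = myClient.deleteByQuery(request, RequestOptions.DEFAULT);
    // 4. Print the response
    System.out.println(resp.toString());
}
The result of running the Java code is shown in Figure 2.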
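bool query
A compound query that combines several query conditions with boolean logic:
- must: every condition listed under must has to match, like a logical AND;
- must_not: none of the conditions listed under must_not may match, like a logical NOT;
- should: at least one of the listed conditions has to match, like a logical OR.
Requirement: find the SMS content of companies whose province is 北京 or 杭州, whose operator id is not 2, and whose smsContent contains 魅力 or 推动.
Note that a small slip in the REST request can leave the should clause without effect. A faulty example:
#province is 北京 or 杭州
#operator id is not 2
#smsContent contains 魅力 or 推动
#bool query
POST /sms-logs-index/_search
{
  "query": {
    "bool": {
      "should": [
        { "terms": { "province": ["北京", "杭州"] } }
      ],
      "must_not": [
        { "term": { "opratorId": { "value": "2" } } }
      ],
      "must": [
        { "match": { "smsContent": { "query": "魅力 推动", "operator": "or" } } }
      ]
    }
  }
}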
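The result below also contains documents from 上海. All the other conditions are satisfied, but the should clause has had no effect: when a bool query contains must or filter next to should, the should clauses no longer mean "or"; they become optional, so a document is returned whether or not it matches them. A should clause is only mandatory when its own bool query has no must or filter clause.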
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : 2.0892315,
"hits" : [
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "5",
"_score" : 2.0892315,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "24514635",
"moblie" : 18545427895,
"corpName" : "东东集团",
"smsContent" : "数据驱动,AI推动,新零售模型让你的购买更心怡!",
"state" : "1",
"opratorId" : "1",
"province" : "北京",
"ipAddr" : "10.254.19.45",
"replyTotal" : "1",
"fee" : "6000"
}
},
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "12",
"_score" : 1.73617,
"_source" : {
"createDate" : "2020-09-22",
"senDate" : "2020-09-22",
"longCode" : "123546241",
"moblie" : 156625584654,
"corpName" : "哈雷天文用具公司",
"smsContent" : "天文研究,放心推动,哈雷天文!",
"state" : "1",
"opratorId" : "3",
"province" : "杭州",
"ipAddr" : "10.289.19.45",
"replyTotal" : "6",
"fee" : "500"
}
},
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "11",
"_score" : 1.6317747,
"_source" : {
"createDate" : "2020-09-22",
"senDate" : "2020-09-22",
"longCode" : "458744536",
"moblie" : 134625584654,
"corpName" : "星雨文化传媒",
"smsContent" : "魅力宣传,星雨传媒!",
"state" : "1",
"opratorId" : "3",
"province" : "杭州",
"ipAddr" : "10.289.19.45",
"replyTotal" : "6",
"fee" : "500"
}
},
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "9",
"_score" : 0.56260216,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "5784320",
"moblie" : 15236964578,
"corpName" : "花花派",
"smsContent" : "花开花落,魅力女性,买花选我!",
"state" : "1",
"opratorId" : "1",
"province" : "上海",
"ipAddr" : "10.265.19.45",
"replyTotal" : "1",
"fee" : "0.1"
}
},
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.2876821,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "87454120",
"moblie" : 13625789645,
"corpName" : "爱美化妆品有限公司",
"smsContent" : "魅力,势不可挡,爱美爱美",
"state" : "1",
"opratorId" : "1",
"province" : "上海",
"ipAddr" : "10.258.19.45",
"replyTotal" : "1",
"fee" : "200"
}
}
]
}
}
The correct REST request wraps the should clause in its own bool query and places that inside must:
#province is 北京 or 杭州
#operator id is not 2
#smsContent contains 魅力 or 推动
#bool query
POST /sms-logs-index/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "term": { "opratorId": { "value": "2" } } }
      ],
      "must": [
        { "match": { "smsContent": { "query": "魅力 推动", "operator": "or" } } },
        {
          "bool": {
            "should": [
              { "terms": { "province": ["北京", "杭州"] } }
            ]
          }
        }
      ]
    }
  }
}
The result is as follows:
{
"took" : 11,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.95882,
"hits" : [
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.95882,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "24514635",
"moblie" : 18545427895,
"corpName" : "东东集团",
"smsContent" : "数据驱动,AI推动,新零售模型让你的购买更心怡!",
"state" : "1",
"opratorId" : "1",
"province" : "北京",
"ipAddr" : "10.254.19.45",
"replyTotal" : "1",
"fee" : "6000"
}
},
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "11",
"_score" : 1.8187511,
"_source" : {
"createDate" : "2020-09-22",
"senDate" : "2020-09-22",
"longCode" : "458744536",
"moblie" : 134625584654,
"corpName" : "星雨文化传媒",
"smsContent" : "魅力宣传,星雨传媒!",
"state" : "1",
"opratorId" : "3",
"province" : "杭州",
"ipAddr" : "10.289.19.45",
"replyTotal" : "6",
"fee" : "500"
}
},
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "12",
"_score" : 1.73617,
"_source" : {
"createDate" : "2020-09-22",
"senDate" : "2020-09-22",
"longCode" : "123546241",
"moblie" : 156625584654,
"corpName" : "哈雷天文用具公司",
"smsContent" : "天文研究,放心推动,哈雷天文!",
"state" : "1",
"opratorId" : "3",
"province" : "杭州",
"ipAddr" : "10.289.19.45",
"replyTotal" : "6",
"fee" : "500"
}
}
]
}
}
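The Java code is as follows:
// Imports for the RestHighLevelClient 7.x API; org.junit.Test assumes JUnit 4
import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.Operator;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;

static RestHighLevelClient myClient = EsClient.getClient(); // client used to talk to ES
String index = "sms-logs-index";

@Test
public void BoolQuery() throws IOException {
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    // 2. Set the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
    // province is 北京 or 杭州 (a terms query under must, so the condition is mandatory)
    boolQuery.must(QueryBuilders.termsQuery("province", "北京", "杭州"));
    // operator id is not 2
    boolQuery.mustNot(QueryBuilders.termQuery("opratorId", 2));
    // smsContent contains 魅力 or 推动
    boolQuery.must(QueryBuilders.matchQuery("smsContent", "魅力 推动").operator(Operator.OR));
    builder.query(boolQuery);
    request.source(builder);
    // 3. Execute the query
    SearchResponse resp = myClient.search(request, RequestOptions.DEFAULT);
    // 4. Print the results
    for (SearchHit hit : resp.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
}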
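The difference between the faulty and the corrected bool request can be modelled in a few lines of plain Java. This is a sketch under simplifying assumptions, not Elasticsearch code: `BoolSketch`, `matches`, `flat` and `nested` are hypothetical names, substring checks stand in for the analysed match query, filter clauses and minimum_should_match are left out, and only the default rule is modelled (should is required only when the same bool has no must clause).

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of bool clause semantics; not Elasticsearch API.
class BoolSketch {
    // A document matches when every must clause matches, no must_not clause matches and,
    // only if there is no must clause, at least one should clause matches.
    static boolean matches(Map<String, String> doc,
                           List<Predicate<Map<String, String>>> must,
                           List<Predicate<Map<String, String>>> mustNot,
                           List<Predicate<Map<String, String>>> should) {
        boolean mustOk = must.stream().allMatch(p -> p.test(doc));
        boolean mustNotOk = mustNot.stream().noneMatch(p -> p.test(doc));
        boolean shouldRequired = must.isEmpty() && !should.isEmpty();
        boolean shouldOk = !shouldRequired || should.stream().anyMatch(p -> p.test(doc));
        return mustOk && mustNotOk && shouldOk;
    }

    static final Predicate<Map<String, String>> CONTENT =
            d -> d.get("smsContent").contains("魅力") || d.get("smsContent").contains("推动");
    static final Predicate<Map<String, String>> PROVINCE =
            d -> List.of("北京", "杭州").contains(d.get("province"));
    static final Predicate<Map<String, String>> OPERATOR_2 =
            d -> "2".equals(d.get("opratorId"));

    // should sits next to must: the province clause becomes optional.
    static boolean flat(Map<String, String> doc) {
        return matches(doc, List.of(CONTENT), List.of(OPERATOR_2), List.of(PROVINCE));
    }

    // should wrapped in its own bool inside must: the province clause is required.
    static boolean nested(Map<String, String> doc) {
        Predicate<Map<String, String>> provinceBool =
                d -> matches(d, List.of(), List.of(), List.of(PROVINCE));
        return matches(doc, List.of(CONTENT, provinceBool), List.of(OPERATOR_2), List.of());
    }

    public static void main(String[] args) {
        Map<String, String> shanghai = Map.of(
                "province", "上海", "opratorId", "1", "smsContent", "魅力,势不可挡,爱美爱美");
        System.out.println("flat:   " + flat(shanghai));
        System.out.println("nested: " + nested(shanghai));
    }
}
```

In this model the 上海 document passes the flat form and is rejected by the nested form, which mirrors the two result sets shown for the REST requests.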
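boosting query
A boosting query lets us influence the score of the documents a query returns.
- positive: only documents that match the positive query are included in the result set;
- negative: documents that match the positive query and also match the negative query have their score lowered;
- negative_boost: the coefficient applied to those documents; it must be below 1.0 to lower the score (ES expects a value between 0 and 1.0).
How the score of a query is computed, in outline:
- The more often the search keyword occurs in a document, the higher the score;
- The shorter the matching document, the higher the score;
- The search keywords are themselves tokenised; the more of the resulting tokens are matched in the index, the higher the score.
Requirement: find the documents whose smsContent field contains 魅力, and lower the score of those whose smsContent also contains 传媒.
First, look at the normal scores: the REST request below simply queries for documents whose smsContent contains 魅力.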
#Request
POST /sms-logs-index/_search
{
  "query": {
    "match": { "smsContent": "魅力" }
  }
}
#Result
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.6317746,
"hits" : [
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "11",
"_score" : 0.6317746,
"_source" : {
"createDate" : "2020-09-22",
"senDate" : "2020-09-22",
"longCode" : "458744536",
"moblie" : 134625584654,
"corpName" : "星雨文化传媒",
"smsContent" : "魅力宣传,星雨传媒!",
"state" : "1",
"opratorId" : "3",
"province" : "杭州",
"ipAddr" : "10.289.19.45",
"replyTotal" : "6",
"fee" : "500"
}
},
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "9",
"_score" : 0.56260216,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "5784320",
"moblie" : 15236964578,
"corpName" : "花花派",
"smsContent" : "花开花落,魅力女性,买花选我!",
"state" : "1",
"opratorId" : "1",
"province" : "上海",
"ipAddr" : "10.265.19.45",
"replyTotal" : "1",
"fee" : "0.1"
}
},
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.2876821,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "87454120",
"moblie" : 13625789645,
"corpName" : "爱美化妆品有限公司",
"smsContent" : "魅力,势不可挡,爱美爱美",
"state" : "1",
"opratorId" : "1",
"province" : "上海",
"ipAddr" : "10.258.19.45",
"replyTotal" : "1",
"fee" : "200"
}
}
]
}
}
The document whose smsContent contains both 魅力 and 传媒 currently has the highest score, 0.6317746, and ranks first. Next, the REST boosting request and its effect:
#boosting query
POST /sms-logs-index/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": { "smsContent": "魅力" }
      },
      "negative": {
        "match": { "smsContent": "传媒" }
      },
      "negative_boost": 0.2
    }
  }
}
#Result
{
"took" : 33,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.73050237,
"hits" : [
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "9",
"_score" : 0.73050237,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "5784320",
"moblie" : 15236964578,
"corpName" : "花花派",
"smsContent" : "花开花落,魅力女性,买花选我!",
"state" : "1",
"opratorId" : "1",
"province" : "上海",
"ipAddr" : "10.265.19.45",
"replyTotal" : "1",
"fee" : "0.1"
}
},
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.2876821,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "87454120",
"moblie" : 13625789645,
"corpName" : "爱美化妆品有限公司",
"smsContent" : "魅力,势不可挡,爱美爱美",
"state" : "1",
"opratorId" : "1",
"province" : "上海",
"ipAddr" : "10.258.19.45",
"replyTotal" : "1",
"fee" : "200"
}
},
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "11",
"_score" : 0.16375022,
"_source" : {
"createDate" : "2020-09-22",
"senDate" : "2020-09-22",
"longCode" : "458744536",
"moblie" : 134625584654,
"corpName" : "星雨文化传媒",
"smsContent" : "魅力宣传,星雨传媒!",
"state" : "1",
"opratorId" : "3",
"province" : "杭州",
"ipAddr" : "10.289.19.45",
"replyTotal" : "6",
"fee" : "500"
}
}
]
}
}
The score of that document (id 11) has dropped to 0.16375022, and it now ranks last.
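The effect of negative_boost can be written as a one-line rule. The Java sketch below is only an illustration of that rule: `BoostingSketch` and `boostedScore` are hypothetical names, the scores in `main` are made-up inputs rather than values computed by ES, and the range check simply follows the documented 0 to 1.0 range.

```java
// Hypothetical illustration of how negative_boost scales a score; not Elasticsearch API.
class BoostingSketch {
    // Keeps the positive score unless the negative query also matches,
    // in which case the score is multiplied by negativeBoost.
    static double boostedScore(double positiveScore, boolean matchesNegative, double negativeBoost) {
        if (negativeBoost < 0.0 || negativeBoost > 1.0) {
            throw new IllegalArgumentException("negativeBoost must be between 0 and 1.0");
        }
        return matchesNegative ? positiveScore * negativeBoost : positiveScore;
    }

    public static void main(String[] args) {
        // Made-up scores: one document matches only the positive query, the other matches both.
        System.out.println(boostedScore(0.7, false, 0.2));
        System.out.println(boostedScore(0.8, true, 0.2));
    }
}
```

A document that started with the higher score can therefore end up below one that matches only the positive query, which is the reordering seen in the response above.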
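The Java code is as follows:
// Imports for the RestHighLevelClient 7.x API; org.junit.Test assumes JUnit 4
import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoostingQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;

static RestHighLevelClient myClient = EsClient.getClient(); // client used to talk to ES
String index = "sms-logs-index";

@Test
public void boostingQuery() throws IOException {
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    // 2. Set the query: positive query first, negative query second
    SearchSourceBuilder builder = new SearchSourceBuilder();
    BoostingQueryBuilder boostingQuery = QueryBuilders.boostingQuery(
            QueryBuilders.matchQuery("smsContent", "魅力"),
            QueryBuilders.matchQuery("smsContent", "传媒")
    ).negativeBoost(0.2f);
    builder.query(boostingQuery);
    request.source(builder);
    // 3. Execute the query
    SearchResponse resp = myClient.search(request, RequestOptions.DEFAULT);
    // 4. Print the results
    for (SearchHit hit : resp.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
}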
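filter query
query: computes a relevance score for each document that matches the condition and sorts by that score; the results are not cached.
filter: selects the documents that match the condition without computing a score, and ES caches frequently used filters so that later lookups are fast.
If the condition is exact, i.e. the score of the matches does not matter, use filter; if the condition is loose and the results have to be ranked by score, use query.
When the score is not needed, filter performs better than query.
Requirement: use a filter to find the SMS content of companies whose smsContent contains 魅力 and whose fee is at most 400.
The REST request is as follows:
POST /sms-logs-index/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "smsContent": "魅力" } },
        { "range": { "fee": { "lte": 400 } } }
      ]
    }
  }
}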
In the REST response below, note that the _score of every hit is 0.0, which shows that no scoring took place:
{
"took" : 81,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "9",
"_score" : 0.0,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "5784320",
"moblie" : 15236964578,
"corpName" : "花花派",
"smsContent" : "花开花落,魅力女性,买花选我!",
"state" : "1",
"opratorId" : "1",
"province" : "上海",
"ipAddr" : "10.265.19.45",
"replyTotal" : "1",
"fee" : "0.1"
}
},
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.0,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "87454120",
"moblie" : 13625789645,
"corpName" : "爱美化妆品有限公司",
"smsContent" : "魅力,势不可挡,爱美爱美",
"state" : "1",
"opratorId" : "1",
"province" : "上海",
"ipAddr" : "10.258.19.45",
"replyTotal" : "1",
"fee" : "200"
}
}
]
}
}
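The Java code is as follows:
// Imports for the RestHighLevelClient 7.x API; org.junit.Test assumes JUnit 4
import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;

static RestHighLevelClient myClient = EsClient.getClient(); // client used to talk to ES
String index = "sms-logs-index";

@Test
public void filter() throws IOException {
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    // 2. Set the query conditions as filter clauses, so no score is computed
    SearchSourceBuilder builder = new SearchSourceBuilder();
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.filter(QueryBuilders.termQuery("smsContent", "魅力"));
    boolQueryBuilder.filter(QueryBuilders.rangeQuery("fee").lte(400));
    builder.query(boolQueryBuilder);
    request.source(builder);
    // 3. Execute the query
    SearchResponse resp = myClient.search(request, RequestOptions.DEFAULT);
    // 4. Print the results
    for (SearchHit hit : resp.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
}
The result of the Java filter code is shown in Figure 5.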
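highlight query
A highlight query returns the keywords you searched for wrapped in special markup, so that the user can see why a result was retrieved; the effect is shown in Figure 6.
The highlighted data is itself a field of the document; that field is returned to you separately in highlighted form.
ES provides a highlight property at the same level as query, with these settings:
- fragment_size: how many characters of highlighted text to return;
- pre_tags: the opening tag, for example <font color="red">;
- post_tags: the closing tag, for example </font>;
- fields: which fields to highlight.
Requirement: highlight the term 魅力 wherever it occurs in the smsContent field.
The REST request is as follows:
POST /sms-logs-index/_search
{
  "query": {
    "match": { "smsContent": "魅力" }
  },
  "highlight": {
    "fields": {
      "smsContent": {}
    },
    "pre_tags": "<font color='red'>",
    "post_tags": "</font>",
    "fragment_size": 10
  }
}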
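In the REST response below, the hits themselves are unchanged; each entry of the inner hits array gains a highlight element next to _source, and its content is the HTML markup used for highlighting. Copy the result into a .txt file, rename the file to .html and open it in Chrome to see the effect in Figure 7.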
{
"took" : 121,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.81875104,
"hits" : [
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "11",
"_score" : 0.81875104,
"_source" : {
"createDate" : "2020-09-22",
"senDate" : "2020-09-22",
"longCode" : "458744536",
"moblie" : 134625584654,
"corpName" : "星雨文化传媒",
"smsContent" : "魅力宣传,星雨传媒!",
"state" : "1",
"opratorId" : "3",
"province" : "杭州",
"ipAddr" : "10.289.19.45",
"replyTotal" : "6",
"fee" : "500"
},
"highlight" : {
"smsContent" : [
"<font color='red'>魅力</font>宣传,星雨传媒!"
]
}
},
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "9",
"_score" : 0.73050237,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "5784320",
"moblie" : 15236964578,
"corpName" : "花花派",
"smsContent" : "花开花落,魅力女性,买花选我!",
"state" : "1",
"opratorId" : "1",
"province" : "上海",
"ipAddr" : "10.265.19.45",
"replyTotal" : "1",
"fee" : "0.1"
},
"highlight" : {
"smsContent" : [
"花开花落,<font color='red'>魅力</font>女性,买花选我"
]
}
},
{
"_index" : "sms-logs-index",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.2876821,
"_source" : {
"createDate" : "2020-09-16",
"senDate" : "2020-09-16",
"longCode" : "87454120",
"moblie" : 13625789645,
"corpName" : "爱美化妆品有限公司",
"smsContent" : "魅力,势不可挡,爱美爱美",
"state" : "1",
"opratorId" : "1",
"province" : "上海",
"ipAddr" : "10.258.19.45",
"replyTotal" : "1",
"fee" : "200"
},
"highlight" : {
"smsContent" : [
"<font color='red'>魅力</font>,势不可挡,爱美爱美"
]
}
}
]
}
}
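What pre_tags and post_tags do to a matched term can be shown with plain string handling. The sketch below is not how Elasticsearch highlights internally: `HighlightSketch` and `highlight` are hypothetical names, the match is a literal substring rather than an analysed token, and fragment_size is ignored.

```java
// Hypothetical illustration of wrapping a keyword in highlight tags; not Elasticsearch API.
class HighlightSketch {
    // Wraps every literal occurrence of keyword in text with preTag and postTag.
    static String highlight(String text, String keyword, String preTag, String postTag) {
        if (keyword.isEmpty()) {
            return text; // nothing to highlight
        }
        return text.replace(keyword, preTag + keyword + postTag);
    }

    public static void main(String[] args) {
        String fragment = highlight("魅力宣传,星雨传媒!", "魅力", "<font color='red'>", "</font>");
        System.out.println(fragment);
    }
}
```

For the first hit above this produces the same fragment that ES returned in its highlight element; the original _source text stays untouched in both cases.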
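The Java code is as follows:
// Imports for the RestHighLevelClient 7.x API; org.junit.Test assumes JUnit 4
import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.junit.Test;

static RestHighLevelClient myClient = EsClient.getClient(); // client used to talk to ES
String index = "sms-logs-index";

@Test
public void highlightQuery() throws IOException {
    // 1. Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    // 2. Set the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.matchQuery("smsContent", "魅力"));
    // 2.1 Add highlighting: field name, fragment size, opening and closing tags
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("smsContent", 10).preTags("<font color='red'>").postTags("</font>");
    builder.highlighter(highlightBuilder);
    request.source(builder);
    // 3. Execute the query
    SearchResponse resp = myClient.search(request, RequestOptions.DEFAULT);
    // 4. Print the highlighted fragments of each hit
    for (SearchHit hit : resp.getHits().getHits()) {
        System.out.println(hit.getHighlightFields().get("smsContent"));
    }
}
The result of running the Java code is shown in Figure 8.
Reference for this article: https://blog.csdn.net/LXWalaz1s1s/article/details/108975817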