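ElasticSearch Query Notes: Contents

  The common query types are numerous, so these notes are split into several chapters:

  1. ElasticSearch Query Study Notes, Chapter 1: term, terms, match, and id queries

   These queries look documents up by precise conditions. They are fast and are the most commonly used query types:

  • term query
  • terms query
  • match_all query
  • match query
  • boolean match query
  • multi_match query
  • query by document id (single id)
  • query by document ids (multiple ids)
  2. ElasticSearch Query Study Notes, Chapter 2: prefix, fuzzy, wildcard, range, and regexp queries

  These queries use comparatively fuzzy conditions and are comparatively slow, so they are best avoided in real-time queries. Each still offers functionality that nothing else replaces, so they remain necessary:

  • prefix query
  • fuzzy query
  • wildcard query
  • range query
  • regexp query
  3. ElasticSearch Query Study Notes, Chapter 3: scroll, delete-by-query, bool, boosting, filter, and highlight queries

  A collection of commonly used miscellaneous ElasticSearch queries:

  • deep pagination with scroll
  • delete-by-query
  • bool query
  • boosting query
  • filter query
  • highlight query
  4. ElasticSearch Query Study Notes, Chapter 4: cardinality, range, and extended_stats aggregation queries

  The ES aggregation queries (Aggregations):

  • cardinality (distinct count) query
  • range (range statistics) query
  • extended_stats (statistical aggregation) query
  5. ElasticSearch Query Study Notes, Chapter 5: geo_distance, geo_bounding_box, and geo_polygon geo queries

  The ES map-search (geo) queries:

  • geo_distance query
  • geo_bounding_box query
  • geo_polygon query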

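Java test project for the whole series

  The Java code for all chapters is published as the CSDN resource "ElasticSearch常用查询的Java实现" (Java implementations of common ElasticSearch queries); the original post shows the project layout in a figure at this point.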

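Deep pagination with scroll

from+size pagination was covered earlier, so why is there also scroll+size deep pagination? Start by comparing the two.
With from+size, ES retrieves data in these steps:

  1. Tokenize the keywords the user supplied;
  2. Look the tokens up in the term dictionary (the inverted index) to obtain the ids of the matching documents;
  3. Fetch the corresponding data from each shard, which is relatively slow;
  4. Sort the data by score, which is also relatively slow;
  5. Use from and size to cut the requested page out of the sorted results;
  6. Return the result.
    Advantage: every request sees the latest data;
    Disadvantage: showing another page of the same query repeats all of the steps above.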

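With scroll+size, ES retrieves data as follows:

  1. Tokenize the keywords the user supplied;
  2. Look the tokens up in the term dictionary (the inverted index) to obtain the ids of the matching documents;
  3. Store the document ids in an ES search context held in memory;
  4. Fetch size documents from that context; ids that have been served are removed from the context;
  5. When the next page is needed, read on directly from the context;
  6. Repeat steps 4 and 5 until all the data has been fetched.
    Advantage: the result set is held in memory, so paging is fast; showing another page of the same query only repeats steps 4 and 5;
    Disadvantage: the context is a snapshot, so scroll is unsuitable for real-time use; when data is updated, the ids held in the context are not refreshed.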
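
  The difference between the two approaches can be sketched without ES at all. The following is a minimal illustration, not how ES is implemented: the class PagingSketch, its methods, and the list of fee values standing in for documents are all hypothetical. from+size re-sorts the live data on every call, while the scroll context sorts once and then serves pages from its snapshot.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PagingSketch {
    // from+size: every page request sorts the live index again, then cuts out the page.
    static List<Integer> fromSize(List<Integer> index, int from, int size) {
        List<Integer> sorted = new ArrayList<>(index);
        sorted.sort(Comparator.reverseOrder());
        int start = Math.min(from, sorted.size());
        int end = Math.min(from + size, sorted.size());
        return new ArrayList<>(sorted.subList(start, end));
    }

    // scroll: sort once, keep the snapshot as the "context", then hand out pages from it.
    static class ScrollContext {
        private final List<Integer> remaining;

        ScrollContext(List<Integer> index) {
            remaining = new ArrayList<>(index);   // snapshot taken when the scroll starts
            remaining.sort(Comparator.reverseOrder());
        }

        List<Integer> nextPage(int size) {
            int end = Math.min(size, remaining.size());
            List<Integer> page = new ArrayList<>(remaining.subList(0, end));
            remaining.subList(0, end).clear();    // served entries leave the context
            return page;
        }
    }

    public static void main(String[] args) {
        // Hypothetical index: each number is a fee value standing in for a document.
        List<Integer> index = new ArrayList<>(List.of(6000, 4000, 2000, 500, 200));
        ScrollContext ctx = new ScrollContext(index);
        System.out.println(ctx.nextPage(2));          // [6000, 4000]
        index.add(9000);                              // indexed after the scroll started
        System.out.println(ctx.nextPage(2));          // [2000, 500]: the snapshot is unchanged
        System.out.println(fromSize(index, 0, 2));    // [9000, 6000]: from+size sees new data
    }
}
```

  The sketch shows both trade-offs from the lists above: the scroll pages are cheap to produce but never see the document added later, whereas from+size pays for a full sort on every page and always reflects the current data.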

  Requirement: scroll through the company records two per page, sorted by the fee and moblie fields in descending order.

  The RESTful requests are as follows:

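# Step 1: scroll query. Returns the first page and stores the matching ES ids in a search context
# scroll=2m keeps the scroll context in memory for 2 minutes; without the scroll parameter no context is kept. Once the context times out it is deleted automatically, and steps 2 and 3 below then fail
# size is set to 2
# A scroll can sort on specified fields; when the order does not matter, sorting by _doc is the most efficient choice
POST /sms-logs-index/_search?scroll=2m
{
  "query": {
    "match_all": {}
  },
  "size": 2,
  "sort": [
    { "fee": { "order": "desc" } },
    { "moblie": { "order": "desc" } }
  ]
}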
 
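# Step 2: fetch the next page with the scroll id; run the same request again for each further page, until the data runs out or the context times out
# scroll_id is the _scroll_id returned by the previous request
# scroll must be specified again to keep the context in memory for another 2 minutes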
 
POST /_search/scroll 
{
    
  "scroll_id":"FGluY2x1ZGVfY29udGV4dF91dWlkDnF1ZXJ5VGhlbkZldGNoBRQtNDEtREhVQnZKaFZKTkZ3Z3VyRgAAAAAABIWAFmJWa2hfQ2g3UlF1bjBoMEVvWkZnbHcULXd0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU7xY3Si1RRmU0NlRzQ19mdkFtb0pMLVVRFGJsMS1ESFVCb3RTY3RrNUdnREVQAAAAAAABAqAWdmh6NmMzeXVUa1NFbVFYMjQ0S3dGZxRaVjUtREhVQnVPVGdEcnZ1Z0xKQgAAAAAAE8ZFFjdGSWx5WkpGVDkyZXA5OEtIQnlqcFEUX0F0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU8BY3Si1RRmU0NlRzQ19mdkFtb0pMLVVR" 
 ,"scroll":"2m" 
} 
 
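# Step 3: delete the scroll context held by ES
# If the first page already answers the question and the remaining pages are of no interest, the context can be deleted early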
DELETE /_search/scroll/FGluY2x1ZGVfY29udGV4dF91dWlkDnF1ZXJ5VGhlbkZldGNoBRQtNDEtREhVQnZKaFZKTkZ3Z3VyRgAAAAAABIWAFmJWa2hfQ2g3UlF1bjBoMEVvWkZnbHcULXd0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU7xY3Si1RRmU0NlRzQ19mdkFtb0pMLVVRFGJsMS1ESFVCb3RTY3RrNUdnREVQAAAAAAABAqAWdmh6NmMzeXVUa1NFbVFYMjQ0S3dGZxRaVjUtREhVQnVPVGdEcnZ1Z0xKQgAAAAAAE8ZFFjdGSWx5WkpGVDkyZXA5OEtIQnlqcFEUX0F0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU8BY3Si1RRmU0NlRzQ19mdkFtb0pMLVVR 
 

  The responses to the RESTful requests are as follows:

# Step 1: response of the scroll query
{
    
  "_scroll_id" : "FGluY2x1ZGVfY29udGV4dF91dWlkDnF1ZXJ5VGhlbkZldGNoBRQtNDEtREhVQnZKaFZKTkZ3Z3VyRgAAAAAABIWAFmJWa2hfQ2g3UlF1bjBoMEVvWkZnbHcULXd0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU7xY3Si1RRmU0NlRzQ19mdkFtb0pMLVVRFGJsMS1ESFVCb3RTY3RrNUdnREVQAAAAAAABAqAWdmh6NmMzeXVUa1NFbVFYMjQ0S3dGZxRaVjUtREhVQnVPVGdEcnZ1Z0xKQgAAAAAAE8ZFFjdGSWx5WkpGVDkyZXA5OEtIQnlqcFEUX0F0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU8BY3Si1RRmU0NlRzQ19mdkFtb0pMLVVR", 
  "took" : 7, 
  "timed_out" : false, 
  "_shards" : {
    
    "total" : 5, 
    "successful" : 5, 
    "skipped" : 0, 
    "failed" : 0 
  }, 
  "hits" : {
    
    "total" : {
    
      "value" : 12, 
      "relation" : "eq" 
    }, 
    "max_score" : null, 
    "hits" : [ 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "5", 
        "_score" : null, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "24514635", 
          "moblie" : 18545427895, 
          "corpName" : "东东集团", 
          "smsContent" : "数据驱动,AI推动,新零售模型让你的购买更心怡!", 
          "state" : "1", 
          "opratorId" : "1", 
          "province" : "北京", 
          "ipAddr" : "10.254.19.45", 
          "replyTotal" : "1", 
          "fee" : "6000" 
        }, 
        "sort" : [ 
          6000.0 
        ] 
      }, 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "10", 
        "_score" : null, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "54784641", 
          "moblie" : 15625584654, 
          "corpName" : "勾股科技有限公司", 
          "smsContent" : "智能算法,智慧生活,勾股科技!", 
          "state" : "1", 
          "opratorId" : "2", 
          "province" : "杭州", 
          "ipAddr" : "10.215.19.45", 
          "replyTotal" : "6", 
          "fee" : "4000" 
        }, 
        "sort" : [ 
          4000.0 
        ] 
      } 
    ] 
  } 
} 
# Step 2: response for the next scroll page
 
{
    
  "_scroll_id" : "FGluY2x1ZGVfY29udGV4dF91dWlkDnF1ZXJ5VGhlbkZldGNoBRQtNDEtREhVQnZKaFZKTkZ3Z3VyRgAAAAAABIWAFmJWa2hfQ2g3UlF1bjBoMEVvWkZnbHcULXd0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU7xY3Si1RRmU0NlRzQ19mdkFtb0pMLVVRFGJsMS1ESFVCb3RTY3RrNUdnREVQAAAAAAABAqAWdmh6NmMzeXVUa1NFbVFYMjQ0S3dGZxRaVjUtREhVQnVPVGdEcnZ1Z0xKQgAAAAAAE8ZFFjdGSWx5WkpGVDkyZXA5OEtIQnlqcFEUX0F0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU8BY3Si1RRmU0NlRzQ19mdkFtb0pMLVVR", 
  "took" : 8, 
  "timed_out" : false, 
  "_shards" : {
    
    "total" : 5, 
    "successful" : 5, 
    "skipped" : 0, 
    "failed" : 0 
  }, 
  "hits" : {
    
    "total" : {
    
      "value" : 12, 
      "relation" : "eq" 
    }, 
    "max_score" : null, 
    "hits" : [ 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "7", 
        "_score" : null, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "33656412674", 
          "moblie" : 18956451203, 
          "corpName" : "华丽网集团", 
          "smsContent" : "网络安全,华丽靠谱!", 
          "state" : "1", 
          "opratorId" : "3", 
          "province" : "上海", 
          "ipAddr" : "10.215.254.45", 
          "replyTotal" : "1", 
          "fee" : "2000" 
        }, 
        "sort" : [ 
          2000.0 
        ] 
      }, 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "11", 
        "_score" : null, 
        "_source" : {
    
          "createDate" : "2020-09-22", 
          "senDate" : "2020-09-22", 
          "longCode" : "458744536", 
          "moblie" : 134625584654, 
          "corpName" : "星雨文化传媒", 
          "smsContent" : "魅力宣传,星雨传媒!", 
          "state" : "1", 
          "opratorId" : "3", 
          "province" : "杭州", 
          "ipAddr" : "10.289.19.45", 
          "replyTotal" : "6", 
          "fee" : "500" 
        }, 
        "sort" : [ 
          500.0 
        ] 
      } 
    ] 
  } 
} 
 
# Step 3: response for deleting the scroll context
{
    
  "succeeded" : true, 
  "num_freed" : 5 
} 
 

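  The Java code is as follows:

    // Imports assume the Elasticsearch 7.x high-level REST client and JUnit 4;
    // EsClient is the author's own helper class. The fields and method belong inside a test class.
    import java.io.IOException;
    import org.elasticsearch.action.search.ClearScrollRequest;
    import org.elasticsearch.action.search.ClearScrollResponse;
    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.action.search.SearchScrollRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.unit.TimeValue;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.elasticsearch.search.sort.SortOrder;
    import org.junit.Test;

    static RestHighLevelClient myClient = EsClient.getClient();  // client used to talk to ES
    String index = "sms-logs-index";

    @Test
    public void scrollQuery() throws IOException {
        // 1. Create the SearchRequest
        SearchRequest request = new SearchRequest(index);

        // 2. Set the scroll keep-alive time
        request.scroll(TimeValue.timeValueMinutes(2L));

        // 3. Set the query conditions
        SearchSourceBuilder builder = new SearchSourceBuilder();
        builder.size(4);
        builder.sort("fee", SortOrder.DESC);
        builder.query(QueryBuilders.matchAllQuery());
        request.source(builder);

        // 4. Run the search; the response carries the scrollId and the first page
        SearchResponse response = myClient.search(request, RequestOptions.DEFAULT);
        String scrollId = response.getScrollId();
        System.out.println("-----------------------first page----------------------------");
        for (SearchHit hit : response.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());
        }

        while (true) {
            // 5. Loop: create a SearchScrollRequest for the current scrollId
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);

            // 6. Renew the keep-alive time of the scroll context
            scrollRequest.scroll(TimeValue.timeValueMinutes(2L));

            // 7. Run the request; always continue with the most recent scrollId
            SearchResponse scrollResp = myClient.scroll(scrollRequest, RequestOptions.DEFAULT);
            scrollId = scrollResp.getScrollId();

            // 8. Print the page if it has data, otherwise leave the loop
            SearchHit[] hits = scrollResp.getHits().getHits();
            if (hits != null && hits.length > 0) {
                System.out.println("-----------------------next page----------------------------");
                for (SearchHit hit : hits) {
                    System.out.println(hit.getSourceAsMap());
                }
            } else {
                // 9. No more data: exit the loop
                System.out.println("-----------------------end----------------------------");
                break;
            }
        }

        // 10. Create the ClearScrollRequest
        ClearScrollRequest clearScrollRequest = new ClearScrollRequest();

        // 11. Set the scrollId to release
        clearScrollRequest.addScrollId(scrollId);

        // 12. Delete the scroll context
        ClearScrollResponse clearScrollResponse = myClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);

        // 13. Print the outcome
        System.out.println("scroll deleted: " + clearScrollResponse.isSucceeded());
    }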

  The output of the Java code is shown in Figure 1.

Figure 1. Result of the scroll deep-pagination query implemented in Java

delete-by-query

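Deletes a large number of documents selected by a query such as term or match.
Note: if the documents to delete make up most of the index, it is better to work the other way round: create a new index, add the documents that should be kept to it, and then query the new index directly.

  Requirement: use a range query to find the company records whose fee is less than 0.2, then delete them.

  The RESTful requests are as follows: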

# Step 1: range query for the company records with fee less than 0.2; the response shows 2 documents
POST /sms-logs-index/_search
{
  "query": {
    "range": {
      "fee": {
        "lt": 0.2
      }
    }
  }
}

# Step 2: delete the matching documents with delete_by_query
POST /sms-logs-index/_delete_by_query
{
  "query": {
    "range": {
      "fee": {
        "lt": 0.2
      }
    }
  }
}

# Step 3: run the range query for fee less than 0.2 again; nothing is left
POST /sms-logs-index/_search
{
  "query": {
    "range": {
      "fee": {
        "lt": 0.2
      }
    }
  }
}
 
 
 
 
 

  The responses to the RESTful requests are as follows:

# Step 1: response of the range query for fee less than 0.2; it returns 2 documents
# POST /sms-logs-index/_search
{
    
  "took" : 2, 
  "timed_out" : false, 
  "_shards" : {
    
    "total" : 5, 
    "successful" : 5, 
    "skipped" : 0, 
    "failed" : 0 
  }, 
  "hits" : {
    
    "total" : {
    
      "value" : 2, 
      "relation" : "eq" 
    }, 
    "max_score" : 1.0, 
    "hits" : [ 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "9", 
        "_score" : 1.0, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "5784320", 
          "moblie" : 15236964578, 
          "corpName" : "花花派", 
          "smsContent" : "花开花落,魅力女性,买花选我!", 
          "state" : "1", 
          "opratorId" : "1", 
          "province" : "上海", 
          "ipAddr" : "10.265.19.45", 
          "replyTotal" : "1", 
          "fee" : "0.1" 
        } 
      }, 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "1", 
        "_score" : 1.0, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "10201021", 
          "moblie" : 13026254898, 
          "corpName" : "上海智慧软件有限公司", 
          "smsContent" : "连接你我,智慧软件,让生活更美好", 
          "state" : "1", 
          "opratorId" : "1", 
          "province" : "上海", 
          "ipAddr" : "10.215.19.45", 
          "replyTotal" : "1", 
          "fee" : "0.1" 
        } 
      } 
    ] 
  } 
} 
 
# Step 2: response of the delete_by_query request
# POST /sms-logs-index/_delete_by_query
{
    
  "took" : 107, 
  "timed_out" : false, 
  "total" : 2, 
  "deleted" : 2, 
  "batches" : 1, 
  "version_conflicts" : 0, 
  "noops" : 0, 
  "retries" : {
    
    "bulk" : 0, 
    "search" : 0 
  }, 
  "throttled_millis" : 0, 
  "requests_per_second" : -1.0, 
  "throttled_until_millis" : 0, 
  "failures" : [ ] 
} 
 
# Step 3: response of the repeated range query for fee less than 0.2; no documents remain
{
    
  "took" : 1, 
  "timed_out" : false, 
  "_shards" : {
    
    "total" : 5, 
    "successful" : 5, 
    "skipped" : 0, 
    "failed" : 0 
  }, 
  "hits" : {
    
    "total" : {
    
      "value" : 0, 
      "relation" : "eq" 
    }, 
    "max_score" : null, 
    "hits" : [ ] 
  } 
} 
 

  To keep the example reproducible, first restore the two records that were just deleted with the following RESTful requests:

 
PUT /sms-logs-index/_doc/1 
{
    
 
"createDate":"2020-09-16" 
,"senDate":"2020-09-16" 
,"longCode":"10201021" 
,"moblie":13026254898 
,"corpName":"上海智慧软件有限公司" 
,"smsContent":"连接你我,智慧软件,让生活更美好" 
,"state":"1" 
,"opratorId":"1" 
,"province":"上海" 
,"ipAddr":"10.215.19.45" 
,"replyTotal":"1" 
,"fee":"0.1" 
} 
 
 
PUT /sms-logs-index/_doc/9 
{
    
 
"createDate":"2020-09-16" 
,"senDate":"2020-09-16" 
,"longCode":"5784320" 
,"moblie":15236964578 
,"corpName":"花花派" 
,"smsContent":"花开花落,魅力女性,买花选我!" 
,"state":"1" 
,"opratorId":"1" 
,"province":"上海" 
,"ipAddr":"10.265.19.45" 
,"replyTotal":"1" 
,"fee":"0.1" 
} 

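  The Java code is as follows:

    // Imports assume the Elasticsearch 7.x high-level REST client and JUnit 4;
    // EsClient is the author's own helper class. The fields and method belong inside a test class.
    import java.io.IOException;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.index.reindex.BulkByScrollResponse;
    import org.elasticsearch.index.reindex.DeleteByQueryRequest;
    import org.junit.Test;

    static RestHighLevelClient myClient = EsClient.getClient();  // client used to talk to ES
    String index = "sms-logs-index";

    @Test
    public void deleteByQuery() throws IOException {
        // 1. Create the DeleteByQueryRequest
        DeleteByQueryRequest request = new DeleteByQueryRequest(index);

        // 2. Set the query; note that this differs from how a SearchRequest is given its query
        request.setQuery(QueryBuilders.rangeQuery("fee").lt(0.2));

        // 3. Run the delete
        BulkByScrollResponse resp = myClient.deleteByQuery(request, RequestOptions.DEFAULT);

        // 4. Print the response
        System.out.println(resp.toString());
    }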
 
 

  The result of the Java code is shown in Figure 2.

Figure 2. Response of delete-by-query implemented in Java

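bool query

A compound query that combines several query conditions with boolean logic:

- must: every condition listed under must has to match, similar to a logical AND;
- must_not: none of the conditions listed under must_not may match, similar to a logical NOT;
- should: at least one of the listed conditions has to match, similar to a logical OR.

  Requirement: find the SMS content of companies whose province is 北京 (Beijing) or 杭州 (Hangzhou), whose operator id is not 2, and whose smsContent contains 魅力 or 推动.

  Note that a small slip in the RESTful request can make should ineffective. An incorrect example: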

# Province is 北京 or 杭州
# Operator id is not 2
# smsContent contains 魅力 or 推动
# bool query
POST /sms-logs-index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "province": ["北京", "杭州"]
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "opratorId": {
              "value": "2"
            }
          }
        }
      ],
      "must": [
        {
          "match": {
            "smsContent": {
              "query": "魅力 推动",
              "operator": "or"
            }
          }
        }
      ]
    }
  }
}

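  The results include documents from 上海 (Shanghai): every other condition is satisfied, but the should condition has had no effect. When a bool query also contains a must or filter clause, its should clauses no longer mean "at least one has to match" (minimum_should_match then defaults to 0); they become optional and only influence the score. To keep the province condition mandatory, the should clause has to be nested inside must.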

{
    
  "took" : 3, 
  "timed_out" : false, 
  "_shards" : {
    
    "total" : 5, 
    "successful" : 5, 
    "skipped" : 0, 
    "failed" : 0 
  }, 
  "hits" : {
    
    "total" : {
    
      "value" : 5, 
      "relation" : "eq" 
    }, 
    "max_score" : 2.0892315, 
    "hits" : [ 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "5", 
        "_score" : 2.0892315, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "24514635", 
          "moblie" : 18545427895, 
          "corpName" : "东东集团", 
          "smsContent" : "数据驱动,AI推动,新零售模型让你的购买更心怡!", 
          "state" : "1", 
          "opratorId" : "1", 
          "province" : "北京", 
          "ipAddr" : "10.254.19.45", 
          "replyTotal" : "1", 
          "fee" : "6000" 
        } 
      }, 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "12", 
        "_score" : 1.73617, 
        "_source" : {
    
          "createDate" : "2020-09-22", 
          "senDate" : "2020-09-22", 
          "longCode" : "123546241", 
          "moblie" : 156625584654, 
          "corpName" : "哈雷天文用具公司", 
          "smsContent" : "天文研究,放心推动,哈雷天文!", 
          "state" : "1", 
          "opratorId" : "3", 
          "province" : "杭州", 
          "ipAddr" : "10.289.19.45", 
          "replyTotal" : "6", 
          "fee" : "500" 
        } 
      }, 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "11", 
        "_score" : 1.6317747, 
        "_source" : {
    
          "createDate" : "2020-09-22", 
          "senDate" : "2020-09-22", 
          "longCode" : "458744536", 
          "moblie" : 134625584654, 
          "corpName" : "星雨文化传媒", 
          "smsContent" : "魅力宣传,星雨传媒!", 
          "state" : "1", 
          "opratorId" : "3", 
          "province" : "杭州", 
          "ipAddr" : "10.289.19.45", 
          "replyTotal" : "6", 
          "fee" : "500" 
        } 
      }, 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "9", 
        "_score" : 0.56260216, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "5784320", 
          "moblie" : 15236964578, 
          "corpName" : "花花派", 
          "smsContent" : "花开花落,魅力女性,买花选我!", 
          "state" : "1", 
          "opratorId" : "1", 
          "province" : "上海", 
          "ipAddr" : "10.265.19.45", 
          "replyTotal" : "1", 
          "fee" : "0.1" 
        } 
      }, 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "4", 
        "_score" : 0.2876821, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "87454120", 
          "moblie" : 13625789645, 
          "corpName" : "爱美化妆品有限公司", 
          "smsContent" : "魅力,势不可挡,爱美爱美", 
          "state" : "1", 
          "opratorId" : "1", 
          "province" : "上海", 
          "ipAddr" : "10.258.19.45", 
          "replyTotal" : "1", 
          "fee" : "200" 
        } 
      } 
    ] 
  } 
} 
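
  The behaviour above can be reproduced with plain predicates. This is a minimal sketch, not the ES implementation: the class BoolSketch and its helper bool(...) are hypothetical, filter clauses are left out, province names are romanized, and the English words charm and drive stand in for 魅力 and 推动. The only rule it models is that a should list is required when there is no must clause and optional when there is one.

```java
import java.util.List;
import java.util.function.Predicate;

public class BoolSketch {
    // Hypothetical stand-in for a document: only the fields the query touches.
    static final class Doc {
        final String province, opratorId, smsContent;
        Doc(String province, String opratorId, String smsContent) {
            this.province = province;
            this.opratorId = opratorId;
            this.smsContent = smsContent;
        }
    }

    // Simplified bool clause: all must match, no mustNot matches,
    // and at least minimumShouldMatch of the should clauses match.
    static Predicate<Doc> bool(List<Predicate<Doc>> must,
                               List<Predicate<Doc>> mustNot,
                               List<Predicate<Doc>> should) {
        return d -> {
            // Modelled default: 1 when should stands alone, 0 once a must clause exists.
            int minimumShouldMatch = must.isEmpty() && !should.isEmpty() ? 1 : 0;
            long shouldHits = should.stream().filter(p -> p.test(d)).count();
            return must.stream().allMatch(p -> p.test(d))
                    && mustNot.stream().noneMatch(p -> p.test(d))
                    && shouldHits >= minimumShouldMatch;
        };
    }

    static final Predicate<Doc> PROVINCE =
            d -> d.province.equals("Beijing") || d.province.equals("Hangzhou");
    static final Predicate<Doc> OPERATOR_2 = d -> d.opratorId.equals("2");
    static final Predicate<Doc> CONTENT =
            d -> d.smsContent.contains("charm") || d.smsContent.contains("drive");

    // Failing layout: should sits beside must, so it becomes optional.
    static final Predicate<Doc> FLAT =
            bool(List.of(CONTENT), List.of(OPERATOR_2), List.of(PROVINCE));

    // Working layout: the should clause is wrapped in its own bool under must.
    static final Predicate<Doc> NESTED =
            bool(List.of(CONTENT, bool(List.of(), List.of(), List.of(PROVINCE))),
                 List.of(OPERATOR_2), List.of());

    public static void main(String[] args) {
        Doc shanghai = new Doc("Shanghai", "1", "charm for everyone");
        Doc beijing = new Doc("Beijing", "1", "AI drive");
        System.out.println(FLAT.test(shanghai));    // true: the province condition is ignored
        System.out.println(NESTED.test(shanghai));  // false: the province condition is enforced
        System.out.println(NESTED.test(beijing));   // true
    }
}
```

  In the nested layout the inner bool has no must clause of its own, so its should list is mandatory again, and because the inner bool is itself a must clause of the outer query, the province condition becomes required.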
 

  The correct RESTful request nests the should clause inside must:

   
# Province is 北京 or 杭州
# Operator id is not 2
# smsContent contains 魅力 or 推动
# bool query
POST /sms-logs-index/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "term": {
            "opratorId": {
              "value": "2"
            }
          }
        }
      ],
      "must": [
        {
          "match": {
            "smsContent": {
              "query": "魅力 推动",
              "operator": "or"
            }
          }
        },
        {
          "bool": {
            "should": [
              {
                "terms": {
                  "province": ["北京", "杭州"]
                }
              }
            ]
          }
        }
      ]
    }
  }
}
 

  The result is as follows:

{
    
  "took" : 11, 
  "timed_out" : false, 
  "_shards" : {
    
    "total" : 5, 
    "successful" : 5, 
    "skipped" : 0, 
    "failed" : 0 
  }, 
  "hits" : {
    
    "total" : {
    
      "value" : 3, 
      "relation" : "eq" 
    }, 
    "max_score" : 1.95882, 
    "hits" : [ 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "5", 
        "_score" : 1.95882, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "24514635", 
          "moblie" : 18545427895, 
          "corpName" : "东东集团", 
          "smsContent" : "数据驱动,AI推动,新零售模型让你的购买更心怡!", 
          "state" : "1", 
          "opratorId" : "1", 
          "province" : "北京", 
          "ipAddr" : "10.254.19.45", 
          "replyTotal" : "1", 
          "fee" : "6000" 
        } 
      }, 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "11", 
        "_score" : 1.8187511, 
        "_source" : {
    
          "createDate" : "2020-09-22", 
          "senDate" : "2020-09-22", 
          "longCode" : "458744536", 
          "moblie" : 134625584654, 
          "corpName" : "星雨文化传媒", 
          "smsContent" : "魅力宣传,星雨传媒!", 
          "state" : "1", 
          "opratorId" : "3", 
          "province" : "杭州", 
          "ipAddr" : "10.289.19.45", 
          "replyTotal" : "6", 
          "fee" : "500" 
        } 
      }, 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "12", 
        "_score" : 1.73617, 
        "_source" : {
    
          "createDate" : "2020-09-22", 
          "senDate" : "2020-09-22", 
          "longCode" : "123546241", 
          "moblie" : 156625584654, 
          "corpName" : "哈雷天文用具公司", 
          "smsContent" : "天文研究,放心推动,哈雷天文!", 
          "state" : "1", 
          "opratorId" : "3", 
          "province" : "杭州", 
          "ipAddr" : "10.289.19.45", 
          "replyTotal" : "6", 
          "fee" : "500" 
        } 
      } 
    ] 
  } 
} 
 

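  The Java code is as follows:

    // Imports assume the Elasticsearch 7.x high-level REST client and JUnit 4;
    // EsClient is the author's own helper class. The fields and method belong inside a test class.
    import java.io.IOException;
    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.index.query.BoolQueryBuilder;
    import org.elasticsearch.index.query.Operator;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.junit.Test;

    static RestHighLevelClient myClient = EsClient.getClient();  // client used to talk to ES
    String index = "sms-logs-index";

    @Test
    public void boolQuery() throws IOException {
        // 1. Create the SearchRequest
        SearchRequest request = new SearchRequest(index);

        // 2. Set the query conditions
        SearchSourceBuilder builder = new SearchSourceBuilder();
        BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
        // Province is 北京 or 杭州 (placed under must so that it is mandatory)
        boolQuery.must(QueryBuilders.termsQuery("province", "北京", "杭州"));

        // Operator id is not 2
        boolQuery.mustNot(QueryBuilders.termQuery("opratorId", 2));

        // smsContent contains 魅力 or 推动
        boolQuery.must(QueryBuilders.matchQuery("smsContent", "魅力 推动").operator(Operator.OR));

        builder.query(boolQuery);
        request.source(builder);

        // 3. Run the query
        SearchResponse resp = myClient.search(request, RequestOptions.DEFAULT);

        // 4. Print the results
        for (SearchHit hit : resp.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());
        }
    }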
 

Figure 3. Result of the bool query implemented in Java

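boosting query

A boosting query lets us influence the score of the documents a query returns.

  • positive: only documents that match the positive query are included in the result set;
  • negative: if a document matches the positive query and also matches the negative query, its score is reduced;
  • negative_boost: the factor applied to the score of such documents, a number between 0 and 1.0.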
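
  The effect of the three parameters on a single document can be written as one small function. This is a minimal sketch under simplifying assumptions: the class BoostingSketch, the method boostedScore, and the scores passed in are hypothetical, and the positive score is taken as given instead of being computed from the text.

```java
public class BoostingSketch {
    // Score of a document that already matched the positive query.
    // A match on the negative query multiplies the score by negativeBoost;
    // otherwise the positive score is returned unchanged.
    static double boostedScore(double positiveScore, boolean matchesNegative, double negativeBoost) {
        if (negativeBoost < 0.0 || negativeBoost > 1.0) {
            throw new IllegalArgumentException("negativeBoost must be between 0 and 1.0");
        }
        return matchesNegative ? positiveScore * negativeBoost : positiveScore;
    }

    public static void main(String[] args) {
        // Hypothetical scores: the first document also matches the negative query.
        System.out.println(boostedScore(0.8, true, 0.2));
        System.out.println(boostedScore(0.5, false, 0.2));
    }
}
```

  With a factor of 0.2 the first document ends up below the second even though its positive score is higher, which is the reordering a boosting query is meant to produce.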

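How the score is computed, in outline:

  • The more often the search keyword appears in a document, the higher the score;
  • The shorter the matching document content, the higher the score;
  • The search keywords are tokenized too; the more of those tokens match in the term dictionary, the higher the score.

  Requirement: find the documents whose smsContent contains 魅力, and lower the score of those whose smsContent also contains 传媒.

  First look at the ordinary scores, using a plain RESTful query for the documents whose smsContent contains 魅力: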

# Request
POST /sms-logs-index/_search
{
  "query": {
    "match": {
      "smsContent": "魅力"
    }
  }
}

# Response
{
    
  "took" : 3, 
  "timed_out" : false, 
  "_shards" : {
    
    "total" : 5, 
    "successful" : 5, 
    "skipped" : 0, 
    "failed" : 0 
  }, 
  "hits" : {
    
    "total" : {
    
      "value" : 3, 
      "relation" : "eq" 
    }, 
    "max_score" : 0.6317746, 
    "hits" : [ 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "11", 
        "_score" : 0.6317746, 
        "_source" : {
    
          "createDate" : "2020-09-22", 
          "senDate" : "2020-09-22", 
          "longCode" : "458744536", 
          "moblie" : 134625584654, 
          "corpName" : "星雨文化传媒", 
          "smsContent" : "魅力宣传,星雨传媒!", 
          "state" : "1", 
          "opratorId" : "3", 
          "province" : "杭州", 
          "ipAddr" : "10.289.19.45", 
          "replyTotal" : "6", 
          "fee" : "500" 
        } 
      }, 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "9", 
        "_score" : 0.56260216, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "5784320", 
          "moblie" : 15236964578, 
          "corpName" : "花花派", 
          "smsContent" : "花开花落,魅力女性,买花选我!", 
          "state" : "1", 
          "opratorId" : "1", 
          "province" : "上海", 
          "ipAddr" : "10.265.19.45", 
          "replyTotal" : "1", 
          "fee" : "0.1" 
        } 
      }, 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "4", 
        "_score" : 0.2876821, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "87454120", 
          "moblie" : 13625789645, 
          "corpName" : "爱美化妆品有限公司", 
          "smsContent" : "魅力,势不可挡,爱美爱美", 
          "state" : "1", 
          "opratorId" : "1", 
          "province" : "上海", 
          "ipAddr" : "10.258.19.45", 
          "replyTotal" : "1", 
          "fee" : "200" 
        } 
      } 
    ] 
  } 
} 
 
 
 

  The document whose smsContent contains both 魅力 and 传媒 (id 11) currently has the highest score, 0.6317746, and ranks first. Next comes the RESTful boosting query and its effect:

# boosting query
POST /sms-logs-index/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "smsContent": "魅力"
        }
      },
      "negative": {
        "match": {
          "smsContent": "传媒"
        }
      },
      "negative_boost": 0.2
    }
  }
}

# Response
{
    
  "took" : 33, 
  "timed_out" : false, 
  "_shards" : {
    
    "total" : 5, 
    "successful" : 5, 
    "skipped" : 0, 
    "failed" : 0 
  }, 
  "hits" : {
    
    "total" : {
    
      "value" : 3, 
      "relation" : "eq" 
    }, 
    "max_score" : 0.73050237, 
    "hits" : [ 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "9", 
        "_score" : 0.73050237, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "5784320", 
          "moblie" : 15236964578, 
          "corpName" : "花花派", 
          "smsContent" : "花开花落,魅力女性,买花选我!", 
          "state" : "1", 
          "opratorId" : "1", 
          "province" : "上海", 
          "ipAddr" : "10.265.19.45", 
          "replyTotal" : "1", 
          "fee" : "0.1" 
        } 
      }, 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "4", 
        "_score" : 0.2876821, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "87454120", 
          "moblie" : 13625789645, 
          "corpName" : "爱美化妆品有限公司", 
          "smsContent" : "魅力,势不可挡,爱美爱美", 
          "state" : "1", 
          "opratorId" : "1", 
          "province" : "上海", 
          "ipAddr" : "10.258.19.45", 
          "replyTotal" : "1", 
          "fee" : "200" 
        } 
      }, 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "11", 
        "_score" : 0.16375022, 
        "_source" : {
    
          "createDate" : "2020-09-22", 
          "senDate" : "2020-09-22", 
          "longCode" : "458744536", 
          "moblie" : 134625584654, 
          "corpName" : "星雨文化传媒", 
          "smsContent" : "魅力宣传,星雨传媒!", 
          "state" : "1", 
          "opratorId" : "3", 
          "province" : "杭州", 
          "ipAddr" : "10.289.19.45", 
          "replyTotal" : "6", 
          "fee" : "500" 
        } 
      } 
    ] 
  } 
} 
 
 

  The score of document 11 has dropped to 0.16375022, and it now ranks last.

  The Java code is as follows:

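    // Imports assume the Elasticsearch 7.x high-level REST client and JUnit 4;
    // EsClient is the author's own helper class. The fields and method belong inside a test class.
    import java.io.IOException;
    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.index.query.BoostingQueryBuilder;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.junit.Test;

    static RestHighLevelClient myClient = EsClient.getClient();  // client used to talk to ES
    String index = "sms-logs-index";

    @Test
    public void boostingQuery() throws IOException {
        // 1. Create the SearchRequest
        SearchRequest request = new SearchRequest(index);

        // 2. Set the query conditions: positive query first, negative query second
        SearchSourceBuilder builder = new SearchSourceBuilder();
        BoostingQueryBuilder boostingQuery = QueryBuilders.boostingQuery(
                QueryBuilders.matchQuery("smsContent", "魅力"),
                QueryBuilders.matchQuery("smsContent", "传媒")
        ).negativeBoost(0.2f);

        builder.query(boostingQuery);
        request.source(builder);

        // 3. Run the query
        SearchResponse resp = myClient.search(request, RequestOptions.DEFAULT);

        // 4. Print the results
        for (SearchHit hit : resp.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());
        }
    }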
 

Figure 4. Result of the boosting query implemented in Java

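filter query

query: computes a relevance score for each matching document from the query conditions and sorts by that score; the results are not cached.
filter: finds the documents that match the conditions without computing a score, and ES caches frequently used filters so that later lookups are fast.
If the conditions are precise and the score of the matches does not matter, prefer filter; if the conditions are loose and the results have to be ranked by score, use query.
When the score is not needed, filter performs better than query.

  Requirement: use a filter to find the SMS content of companies whose smsContent contains 魅力 and whose fee is at most 400.

  The RESTful request is as follows: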

POST /sms-logs-index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "smsContent": "魅力"
          }
        },
        {
          "range": {
            "fee": {
              "lte": 400
            }
          }
        }
      ]
    }
  }
}
 
 

  The response is shown below. Note that every _score is 0.0, which shows that no score was computed:

{
    
  "took" : 81, 
  "timed_out" : false, 
  "_shards" : {
    
    "total" : 5, 
    "successful" : 5, 
    "skipped" : 0, 
    "failed" : 0 
  }, 
  "hits" : {
    
    "total" : {
    
      "value" : 2, 
      "relation" : "eq" 
    }, 
    "max_score" : 0.0, 
    "hits" : [ 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "9", 
        "_score" : 0.0, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "5784320", 
          "moblie" : 15236964578, 
          "corpName" : "花花派", 
          "smsContent" : "花开花落,魅力女性,买花选我!", 
          "state" : "1", 
          "opratorId" : "1", 
          "province" : "上海", 
          "ipAddr" : "10.265.19.45", 
          "replyTotal" : "1", 
          "fee" : "0.1" 
        } 
      }, 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "4", 
        "_score" : 0.0, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "87454120", 
          "moblie" : 13625789645, 
          "corpName" : "爱美化妆品有限公司", 
          "smsContent" : "魅力,势不可挡,爱美爱美", 
          "state" : "1", 
          "opratorId" : "1", 
          "province" : "上海", 
          "ipAddr" : "10.258.19.45", 
          "replyTotal" : "1", 
          "fee" : "200" 
        } 
      } 
    ] 
  } 
} 
 

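  The Java code is as follows:

    // Imports assume the Elasticsearch 7.x high-level REST client and JUnit 4;
    // EsClient is the author's own helper class. The fields and method belong inside a test class.
    import java.io.IOException;
    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.index.query.BoolQueryBuilder;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.junit.Test;

    static RestHighLevelClient myClient = EsClient.getClient();  // client used to talk to ES
    String index = "sms-logs-index";

    @Test
    public void filter() throws IOException {
        // 1. Create the SearchRequest
        SearchRequest request = new SearchRequest(index);

        // 2. Set the query conditions: both clauses run as filters, so no score is computed
        SearchSourceBuilder builder = new SearchSourceBuilder();
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        boolQueryBuilder.filter(QueryBuilders.termQuery("smsContent", "魅力"));
        boolQueryBuilder.filter(QueryBuilders.rangeQuery("fee").lte(400));
        builder.query(boolQueryBuilder);
        request.source(builder);

        // 3. Run the query
        SearchResponse resp = myClient.search(request, RequestOptions.DEFAULT);

        // 4. Print the results
        for (SearchHit hit : resp.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());
        }
    }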

  The result of the Java filter implementation is shown in Figure 5.

Figure 5: Result of the filter query in Java
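
To make the semantics of the two filter clauses concrete, here is a minimal plain-Java sketch with no Elasticsearch dependency. It is an illustration only, not how ES evaluates filters: a real term query looks up an exact token in the inverted index, whereas this sketch uses a simple substring check. The class FilterSketch, the type SmsLog, and the methods filterIds and sampleDocs are hypothetical names; the three sample documents reuse the ids, smsContent values, and fee values from this article's sample data. A document passes only if both conditions hold, and no relevance score is involved, which matches the _score of 0.0 in the response above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilterSketch {

    /** Hypothetical stand-in for one document; only the fields the filter needs. */
    static final class SmsLog {
        final String id;
        final String smsContent;
        final double fee;

        SmsLog(String id, String smsContent, double fee) {
            this.id = id;
            this.smsContent = smsContent;
            this.fee = fee;
        }
    }

    /** Returns the ids of documents that contain the term AND have fee <= maxFee. */
    static List<String> filterIds(List<SmsLog> docs, String term, double maxFee) {
        List<String> ids = new ArrayList<>();
        for (SmsLog doc : docs) {
            // Assumption: substring matching stands in for the term lookup.
            if (doc.smsContent.contains(term) && doc.fee <= maxFee) {
                ids.add(doc.id);
            }
        }
        return ids;
    }

    /** Sample documents copied from the article's data (ids 11, 9, and 4). */
    static List<SmsLog> sampleDocs() {
        return Arrays.asList(
                new SmsLog("11", "魅力宣传,星雨传媒!", 500),
                new SmsLog("9", "花开花落,魅力女性,买花选我!", 0.1),
                new SmsLog("4", "魅力,势不可挡,爱美爱美", 200));
    }

    public static void main(String[] args) {
        // Document 11 is dropped because its fee (500) exceeds 400.
        System.out.println(filterIds(sampleDocs(), "魅力", 400)); // prints [9, 4]
    }
}
```

The surviving ids, 9 and 4, are the same two documents that the filter query returns above.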

highlight query

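A highlight query returns the keywords you searched for wrapped in a distinctive style, so users can see why a result was retrieved; Figure 6 shows the effect.
The highlighted data is itself a field of the document; that field is returned separately in highlighted form.
ES provides a highlight property that sits at the same level as query.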

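  • fragment_size: the size, in characters, of each highlighted fragment that is returned;
  • pre_tags: the opening tag, for example <font color="red">;
  • post_tags: the closing tag, for example </font>;
  • fields: which fields to highlight.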

Figure 6: What a highlight query looks like
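
To make the role of pre_tags and post_tags concrete, the following is a minimal plain-Java sketch. It is not Elasticsearch's highlighter: it only wraps each literal occurrence of a search term in the two tags, and fragment_size is deliberately not modeled. The class HighlightSketch and the method wrap are hypothetical names introduced for illustration.

```java
class HighlightSketch {

    /**
     * Wraps every literal occurrence of term in text with preTag and postTag.
     * Assumption: plain substring matching; ES matches analyzed tokens instead.
     */
    static String wrap(String text, String term, String preTag, String postTag) {
        if (term.isEmpty()) {
            return text; // nothing to highlight
        }
        return text.replace(term, preTag + term + postTag);
    }

    public static void main(String[] args) {
        String pre = "<font color='red'>";
        String post = "</font>";
        // Sample sentence taken from the article's data (document 11).
        System.out.println(wrap("魅力宣传,星雨传媒!", "魅力", pre, post));
        // prints <font color='red'>魅力</font>宣传,星雨传媒!
    }
}
```

The printed string has the same shape as the fragments ES returns in the highlight part of each hit: the original field text with the matched term enclosed by the configured tags.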

  Requirement: highlight the keyword 魅力 wherever it appears in the smsContent field.

  The REST request is as follows:

 
POST /sms-logs-index/_search
{
  "query": {
    "match": {
      "smsContent": "魅力"
    }
  },
  "highlight": {
    "fields": {
      "smsContent": {}
    },
    "pre_tags": "<font color='red'>",
    "post_tags": "</font>",
    "fragment_size": 10
  }
}
 
 

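  The response to the REST request is shown below. Highlighting does not change the returned documents themselves; instead, each entry in the inner hits array gains a highlight object alongside _source, and its content is the HTML markup used for highlighting. Copy the result into a .txt file, rename the extension to .html, and open the file in Chrome to see the effect shown in Figure 7.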

{
    
  "took" : 121, 
  "timed_out" : false, 
  "_shards" : {
    
    "total" : 5, 
    "successful" : 5, 
    "skipped" : 0, 
    "failed" : 0 
  }, 
  "hits" : {
    
    "total" : {
    
      "value" : 3, 
      "relation" : "eq" 
    }, 
    "max_score" : 0.81875104, 
    "hits" : [ 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "11", 
        "_score" : 0.81875104, 
        "_source" : {
    
          "createDate" : "2020-09-22", 
          "senDate" : "2020-09-22", 
          "longCode" : "458744536", 
          "moblie" : 134625584654, 
          "corpName" : "星雨文化传媒", 
          "smsContent" : "魅力宣传,星雨传媒!", 
          "state" : "1", 
          "opratorId" : "3", 
          "province" : "杭州", 
          "ipAddr" : "10.289.19.45", 
          "replyTotal" : "6", 
          "fee" : "500" 
        }, 
        "highlight" : {
    
          "smsContent" : [ 
            "<font color='red'>魅力</font>宣传,星雨传媒!" 
          ] 
        } 
      }, 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "9", 
        "_score" : 0.73050237, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "5784320", 
          "moblie" : 15236964578, 
          "corpName" : "花花派", 
          "smsContent" : "花开花落,魅力女性,买花选我!", 
          "state" : "1", 
          "opratorId" : "1", 
          "province" : "上海", 
          "ipAddr" : "10.265.19.45", 
          "replyTotal" : "1", 
          "fee" : "0.1" 
        }, 
        "highlight" : {
    
          "smsContent" : [ 
            "花开花落,<font color='red'>魅力</font>女性,买花选我" 
          ] 
        } 
      }, 
      {
    
        "_index" : "sms-logs-index", 
        "_type" : "_doc", 
        "_id" : "4", 
        "_score" : 0.2876821, 
        "_source" : {
    
          "createDate" : "2020-09-16", 
          "senDate" : "2020-09-16", 
          "longCode" : "87454120", 
          "moblie" : 13625789645, 
          "corpName" : "爱美化妆品有限公司", 
          "smsContent" : "魅力,势不可挡,爱美爱美", 
          "state" : "1", 
          "opratorId" : "1", 
          "province" : "上海", 
          "ipAddr" : "10.258.19.45", 
          "replyTotal" : "1", 
          "fee" : "200" 
        }, 
        "highlight" : {
    
          "smsContent" : [ 
            "<font color='red'>魅力</font>,势不可挡,爱美爱美" 
          ] 
        } 
      } 
    ] 
  } 
} 
 

Figure 7: Highlighted results rendered in the browser
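
The manual copy-and-rename step described above can also be scripted. The sketch below uses only the Java standard library: it takes highlight fragments (hard-coded here from the response above) and writes them into a small HTML file that a browser can open. The class HighlightHtmlSketch, the method toHtml, and the output file name highlight-demo.html are all hypothetical choices for illustration; in a real program the fragments would come from the search response rather than from literals.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class HighlightHtmlSketch {

    /** Builds a minimal HTML page with one paragraph per highlight fragment. */
    static String toHtml(List<String> fragments) {
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html>\n");
        html.append("<html><head><meta charset=\"UTF-8\"></head><body>\n");
        for (String fragment : fragments) {
            // Fragments already contain the pre/post tags, so they are inserted as-is.
            html.append("<p>").append(fragment).append("</p>\n");
        }
        html.append("</body></html>\n");
        return html.toString();
    }

    public static void main(String[] args) throws IOException {
        // Fragments copied from the highlight sections of the response above.
        List<String> fragments = Arrays.asList(
                "<font color='red'>魅力</font>宣传,星雨传媒!",
                "花开花落,<font color='red'>魅力</font>女性,买花选我",
                "<font color='red'>魅力</font>,势不可挡,爱美爱美");

        Path out = Paths.get("highlight-demo.html"); // hypothetical output file
        Files.write(out, toHtml(fragments).getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

Writing the file as UTF-8 and declaring the same charset in the page keeps the Chinese text readable when the file is opened in a browser.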

  The Java code is as follows:

    static RestHighLevelClient myClient = EsClient.getClient();  // client used to operate ES
    String index = "sms-logs-index";

    @Test
    public void highlightQuery() throws IOException {
        // 1. Create the SearchRequest
        SearchRequest request = new SearchRequest(index);

        // 2. Build the query
        SearchSourceBuilder builder = new SearchSourceBuilder();
        builder.query(QueryBuilders.matchQuery("smsContent", "魅力"));

        // 2.1 Add highlighting: field name, fragment size of 10, and the wrapping tags
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("smsContent", 10).preTags("<font color='red'>").postTags("</font>");

        builder.highlighter(highlightBuilder);
        request.source(builder);

        // 3. Execute the search
        SearchResponse resp = myClient.search(request, RequestOptions.DEFAULT);

        // 4. Print the highlighted field of each hit
        for (SearchHit hit : resp.getHits().getHits()) {
            System.out.println(hit.getHighlightFields().get("smsContent"));
        }
    }
 

  The output of the Java code is shown in Figure 8.

Figure 8: Highlight query implemented in Java

Reference for this article: https://blog.csdn.net/LXWalaz1s1s/article/details/108975817