gb hibernate search 使用

1. 介绍

介绍如何使用hibernate search提供全文检索服务,地理位置服务

默认使用本地lucene提供服务，可配置为使用elasticsearch。

因hibernate search发布版本的限制，本地lucene使用5.5版本，支持elasticSearch 5.6.8版本。本地lucene可使用luke-6.0.0-luke-release查看管理索引。

2. 使用

2.1. 使用gradle 插件

2.1.1. 添加插件

gradle中

implementation ('org.yunchen.gb:gb-plugin-hibernate-search:1.2.0')

3. 配置

配置application.yml文件:

gb:
  hibernatesearch:
    rebuildIndexOnStart: false        (1)
    rebuildIndexConfig:
      batchSizeToLoadObjects: 30
      threadsForSubsequentFetching: 8
      threadsToLoadObjects: 4
      threadsForIndexWriter: 3
      cacheMode: NORMAL # NORMAL,IGNORE,GET,PUT,REFRESH
    throwOnEmptyQuery: false
    #closureInBootstrap: hibernateSearchInitClousre (2)

hibernate:
  configClass: org.yunchen.gb.plugin.hibernate.search.context.HibernateSearchMappingContextConfiguration
  search:
    analyzer: org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer #默认使用中文切词器
    default:
      elasticsearch:
        host:  http://localhost:9200            (3)
        required_index_status: yellow #green    (4)
      #indexmanager: elasticsearch              (5)
      directory_provider: filesystem
      #indexBase: /path/to/your/indexes
      #indexBaseJndiName: java:comp/env/luceneIndexBase

1	是否在系统启动时重建所有的倒排索引
2	开启此项，则重建索引工作安装不再读取上面的配置，改为从startup启动类中的hibernateSearchInitClousre方法获取
3	elasticsearch的地址
4	elasticsearch状态,单节点时状态是yellow，集群时是green；因此开发环境建议使用yellow，生产环境使用green
5	开启此项后，全文检索服务从本地的lucene改为网络的elasticsearch

在Startup类中添加方法：

@Order(0)    (1)
@Transactional
@GbBootstrap
class Startup {
    .....
    public Closure hibernateSearchInitClousre(){   (2)
        Closure initClosure={
            rebuildIndexOnStart {
                batchSizeToLoadObjects 30
                threadsForSubsequentFetching 8
                threadsToLoadObjects 4
                threadsForIndexWriter 3
                cacheMode org.hibernate.CacheMode.NORMAL
            }
            normalizer(name: 'lowercase') {
                filter org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory
                filter org.apache.lucene.analysis.core.LowerCaseFilterFactory
            }
        }
        return initClosure;
    }
}

1	添加order注解，顺序值不能小于0。从而确保索引重建等工作在系统初始化数据前完成
2	添加hibernateSearchInitClousre方法

在yml中配置了closureInBootstrap: hibernateSearchInitClousre后，激活此功能。自定义的analysis，normalizer 可在此处配置。

4. 功能

插件会在系统启动后，为每个配置了静态search闭包的类，增加两个search方法，一个是类的静态search方法，一个是实例的search方法，都会返回HibernateSearchApi类型的对象。

HibernateSearchApi类的方法：

方法名	参数	描述
list	闭包	获取符合条件列表
count	闭包	获取符合条件数目
criteria	闭包	获取criteria
withTransaction	闭包	设置事务环境
createIndexAndWait	闭包	创建索引
getIndexedProperties		获取domain的静态search闭包的配置
enableHighlighter	String preTag = null，String postTag = null	开启高亮功能
maxResults	int maxResults	设置最大返回记录数
offset	int offset	设置分页偏移值
projection	String… projection	设置返回projection项
sort	String field，String order = ASC，type = null	设置排序
getAnalyzer		获取切词器
getHighlighter		获取高亮设置类
index		强制创建索引
purge		删除此条记录的索引
purgeAll		删除全部记录的索引
filter	String filterName	添加filter过滤器
filter	Map<String，Object> filterParams	添加filter过滤器
filter	String filterName，Map<String，Object> filterParams)	添加filter过滤器
below	String filterName，Map<String，Object> filterParams)	添加filter过滤器
below	String field，below，Map optionalParams = [:]	设置某字段低于固定值的条件
above	String field，above，Map optionalParams = [:]	设置某字段高于固定值的条件
between	String field，from，to，Map optionalParams = [:]	设置某字段在固定区间值的条件
keyword	String field，matching，Map optionalParams = [:]	添加查询
fuzzy	String field，matching，Map optionalParams = [:]	添加模糊查询
wildcard	String field，matching，Map optionalParams = [:]	添加查询
simpleQueryString	String queryString，Map optionalParams = [:]，String field，String… fields	添加查询
spatial	String field = null，double latitude，double longitude，double radius	添加地理位置距离点固定半径的查询

方法名

参数

描述

list

闭包

获取符合条件列表

count

闭包

获取符合条件数目

criteria

闭包

获取criteria

withTransaction

闭包

设置事务环境

createIndexAndWait

闭包

创建索引

getIndexedProperties

获取domain的静态search闭包的配置

enableHighlighter

String preTag = null，String postTag = null

开启高亮功能

maxResults

int maxResults

设置最大返回记录数

offset

int offset

设置分页偏移值

projection

String… projection

设置返回projection项

sort

String field，String order = ASC，type = null

设置排序

getAnalyzer

获取切词器

getHighlighter

获取高亮设置类

index

强制创建索引

purge

删除此条记录的索引

purgeAll

删除全部记录的索引

filter

String filterName

添加filter过滤器

filter

Map<String，Object> filterParams

添加filter过滤器

filter

String filterName，Map<String，Object> filterParams)

添加filter过滤器

below

String filterName，Map<String，Object> filterParams)

添加filter过滤器

below

String field，below，Map optionalParams = [:]

设置某字段低于固定值的条件

above

String field，above，Map optionalParams = [:]

设置某字段高于固定值的条件

between

String field，from，to，Map optionalParams = [:]

设置某字段在固定区间值的条件

keyword

String field，matching，Map optionalParams = [:]

添加查询

fuzzy

String field，matching，Map optionalParams = [:]

添加模糊查询

wildcard

String field，matching，Map optionalParams = [:]

添加查询

simpleQueryString

String queryString，Map optionalParams = [:]，String field，String… fields

添加查询

spatial

String field = null，double latitude，double longitude，double radius

添加地理位置距离点固定半径的查询

5. 使用

5.1. 提供全文检索功能

5.1.1. domain类中设置

@Entity
@JsonIgnoreProperties(["errors", "metaClass", "dirty", "attached", "dirtyPropertyNames", "handler", "target", "session", "entityPersisters", "hibernateLazyInitializer", "initialized", "proxyKey", "children"])
class ExampleAggregateRoot {
        String author
        String body
        Date publishedDate
        String summary
        String title
        Status status
        Double price
        Integer someInteger

        List<ExampleCategory> categories = [];

        static enum Status {
                DISABLED, PENDING, ENABLED
        }
        static hasMany = [categories: ExampleCategory]
        static constraints = {
                body size:0..8000
        }
        static search = {
                // fields
                author index: 'yes', boost: 5.9  //自定义切词器 analyzer: 'ngram'
                body termVector: 'with_positions'
                publishedDate date: 'day', sortable: true
                summary boost: 5.9
                title index: 'yes'
                status index: 'yes'
                categories indexEmbedded: [includeEmbeddedObjectId: true, depth: 1]
                price numeric: 2, analyze: false
//                someInteger index: 'yes', bridge: ['class': PaddedIntegerBridge, params: ['padding': 10]]
        }
}

5.1.2. 服务类中查询

            String wildcardSearch = request.getParameter('q').toLowerCase().trim() + "*"
            int count=ExampleAggregateRoot.search().count{
                should{
                    wildcard "body", wildcardSearch
                    wildcard "title", wildcardSearch
                    wildcard "summary", wildcardSearch
                }
            }
            HibernateSearchApi api=ExampleAggregateRoot.search();
            List list=api.enableHighlighter().list{
                //projection "author", "body"
                //below "publishedDate",  request.getParameter('dateTo')
                //above "publishedDate",  request.getParameter('dateFrom')
                //mustNot {
                //    keyword "status", Status.DISABLED
                //}
                //fuzzy "description", "mi search"
                //simpleQueryString 'war + (peace | harmony)', 'title'
                //simpleQueryString 'war + (peace | harmony)', 'title', 'description'
                //simpleQueryString 'war peace', [withAndAsDefaultOperator: true], 'title'
                //simpleQueryString 'war + (peace | harmony)', ['title':2.0, 'description':0.5]
                should{
                    wildcard "body", wildcardSearch
                    wildcard "title", wildcardSearch
                    wildcard "summary", wildcardSearch
                }
                sort "publishedDate", "asc",Long
                maxResults pageParams.max
                offset pageParams.offset
            }

5.2. 高亮显示结果

承接上面的例子：

            Highlighter highlighter=api.getHighlighter()
            Analyzer analyzer
            if(GbSpringUtils.getConfiginfo("hibernate.search.default.indexmanager")?.equalsIgnoreCase("elasticsearch")){
                analyzer=new StandardAnalyzer()
            }else{
                analyzer=api.getAnalyzer()
            }
            list.each{one->
                one.body=highlighter.getBestFragment(analyzer,'body',one.body?:'')?:one.body
                one.title=highlighter.getBestFragment(analyzer,'title',one.title?:'')?:one.title
                one.summary=highlighter.getBestFragment(analyzer,'summary',one.summary?:'')?:one.summary
            }

5.3. 提供地理位置服务

5.3.1. domain类中设置

示例多个地理位置的情况：

@Spatial
@Spatial(name="work",  spatialMode = SpatialMode.HASH)
@Entity
@Title(zh_CN = "用户地理信息")
@JsonIgnoreProperties(["errors", "metaClass", "dirty", "attached", "dirtyPropertyNames", "handler", "target", "session", "entityPersisters", "hibernateLazyInitializer", "initialized", "proxyKey", "children"])
class UserInfo {
    String username
    @Latitude
    Double homeLatitude;
    @Longitude
    Double homeLongitude;
    @Latitude(of="work")
    Double workLatitude;
    @Longitude(of="work")
    Double workLongitude;
    static search = {
        username index: 'yes'
    }
}

5.3.2. 初始化数据

        new UserInfo(username: 'work1-上海-宁波',homeLongitude: 121.48,homeLatitude:31.22 ,workLongitude: 121.56,workLatitude: 29.86 ).save(flush:true)
        new UserInfo(username: 'work2-北京-石家庄',homeLongitude: 116.46,homeLatitude:39.92 ,workLongitude: 114.48,workLatitude:38.03 ).save(flush:true)
        new UserInfo(username: 'work3-东营-威海',homeLongitude: 118.49,homeLatitude:37.46 ,workLongitude:122.1,workLatitude: 37.5).save(flush:true)
        new UserInfo(username: 'work4-深圳-珠海',homeLongitude: 114.07, homeLatitude: 22.62,workLongitude: 113.52,workLatitude: 22.3).save(flush:true)

5.3.3. 服务类中查询

    @ResponseBody
    public Map spatial(PageParams pageParams,String field,double latitude,double longitude,double radius ){
        Map map=[:]
        int count=UserInfo.search().count{
            spatial('work',latitude,longitude,radius)
        }
        List list=UserInfo.search().list{
            spatial('work',latitude,longitude,radius)
            //sort "id", "asc",String
            maxResults pageParams.max
            offset pageParams.offset
        }
        map.total=count
        map.rows=list
        return map;
    }

访问服务获取结果：

    //距离嘉兴128公里的work有几个？
    http://localhost:8080/hibernateSearch/spatial?field=work&radius=128&latitude=30.77&longitude=120.76
    //距离嘉兴1280公里的work有几个？
    http://localhost:8080/hibernateSearch/spatial?field=work&radius=1280&latitude=30.77&longitude=120.76

5.4. 提供针对附件的自动检索功能

内置的TikaBridge可以自动扫描domain类中的byte[],String，uri类型的字段，进行抽取附件内容进行全文检索的功能。

5.4.1. 配置build.gradle 引入tika parser

因为tika导出的包过多，会造成windows下超过8190字符的问题，所以建议项目中自己引入tika包，并注意cmd字符过长的问题。

compile 'org.apache.tika:tika-parsers:1.22'

5.4.2. 配置domain类

@Entity
@Title(zh_CN = "TIKA示例")
@JsonIgnoreProperties(["errors", "metaClass", "dirty", "attached", "dirtyPropertyNames", "handler", "target", "session", "entityPersisters", "hibernateLazyInitializer", "initialized", "proxyKey", "children"])
class TikaDemo {
    String fileName
    @TikaBridge()
    //byte[] fileData                         (1)
    String filePath                           (2)
    static constraints = {
        fileName(size:0..200)
        //fileData(size: 0..100*1024*1024)
        filePath(size: 0..1000)
    }
    static search = {
        fileName index:'yes'
        //fileData index:'yes'
        filePath index:'yes'
    }
}

1	可配置byte[]类型保存文件数据
2	也可配置String类型保存文件路径

5.4.3. 初始化数据

        File file=new File("/usr/docFile/file")
        file.eachFile {one->
            if(one.isFile()){
                new TikaDemo(fileName: one.name,filePath: one.path).save(flush:true);
            }
        }

5.4.4. 服务类检索

与普通的全文检索方式相同。

    @ResponseBody
    public Map tika(PageParams pageParams,String str){
        Map map=[:]
        int count= TikaDemo.search().count{
            simpleQueryString str, 'filePath'
        }
        List list=TikaDemo.search().list{
            simpleQueryString str, 'filePath'
            sort "id", "asc",long
            maxResults pageParams.max
            offset pageParams.offset
        }
        map.total=count
        map.rows=list
        return map;
    }