I am working with SQL Server and Apache Solr, and matters with no class code still appear when I apply an EXISTS filter.

  • Problem:

In SQL Server, we store trademarkclass_code and trademarkclass_description as multi-value columns. To maintain ordering and prevent NULL values, a zero-width space (\u200B or NCHAR(8203)) was inserted instead of NULL.

This results in Solr treating these fields as non-empty, even when they should be empty. When I filter for records where trademarkclass_code EXISTS, it includes matters that only contain \u200B, making the filter ineffective.

  • SQL Server Example:

Data is stored like this:

2034\u200B ABCHELLO\u200B\u200B

Which is equivalent to:

2034 ABCHELLO

  1. \u200B is invisible but still stored as data.
  2. When sent to Solr, it is indexed as non-empty, causing incorrect search results.
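To illustrate why the placeholder counts as data, here is a minimal Java sketch. The class and the helper `isEffectivelyEmpty` are hypothetical names introduced for illustration, not part of the code base in the question. It shows that the usual emptiness checks do not treat \u200B as blank on current JDKs, and one way a check could be written that does.

```java
final class ZeroWidthCheck {
    static final char ZWSP = '\u200B';

    /** True when the value is null or holds nothing but whitespace and zero-width spaces. */
    static boolean isEffectivelyEmpty(String value) {
        if (value == null) {
            return true;
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c != ZWSP && !Character.isWhitespace(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String placeholder = "\u200B";
        System.out.println(placeholder.isEmpty());           // false: length is 1
        System.out.println(placeholder.trim().isEmpty());    // false: trim() only strips chars <= U+0020
        System.out.println(Character.isWhitespace(ZWSP));    // false: U+200B is a format character
        System.out.println(isEffectivelyEmpty(placeholder)); // true
    }
}
```

Because `StringUtils.isNotBlank` also relies on `Character.isWhitespace`, a value that contains only \u200B passes that check as well.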

  • Solr Query Filter Code:

Below is the relevant part of the Java method that constructs the Solr filter query (the method signature is not shown):

    if (CollectionUtils.isNotEmpty(matterFieldsFilters)) {
        // Resolve the Solr field name once and apply it to every filter.
        String solrFieldName = FipSolrMatterField.fromName(fieldName).getSolrName().toLowerCase();

        for (SolrMatterFieldsFilter searchField : matterFieldsFilters) {
            searchField.setName(solrFieldName);
        }

        StringBuilder queryStringBuilder = new StringBuilder();
        for (SolrMatterFieldsFilter mFieldsFilter : matterFieldsFilters) {
            SolrMatterFieldsOperator operation = SolrMatterFieldsOperator.fromOperatorId(mFieldsFilter.getOperatorId());

            if (operation == SolrMatterFieldsOperator.EXISTS) {
                // Matches any indexed value, including one that holds only \u200B.
                queryStringBuilder.append("(").append(mFieldsFilter.getName()).append(":[* TO *])");
            } else if (operation == SolrMatterFieldsOperator.NOT_EXISTS) {
                queryStringBuilder.append("(-").append(mFieldsFilter.getName()).append(":[* TO *])");
            } else if (StringUtils.isNotBlank(mFieldsFilter.getFieldValue())) {
                // Value filters: escape the user input before building the clause.
                String fieldValue = escapeQueryCharacters(mFieldsFilter.getFieldValue());
                queryStringBuilder.append(buildSolrMatterFieldsQueryStr(mFieldsFilter.getName(), fieldValue, true, true, false));
            }
        }
        return queryStringBuilder.toString();
    }
    return null;
    }

  • Attempted Fixes:

- Removing \u200B from SQL Server before indexing:

    UPDATE TrademarkClass SET trademarkclass_code = REPLACE(trademarkclass_code, NCHAR(8203), '')

✅ This works but requires modifying existing data.
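If changing the stored data is undesirable, a similar cleanup could in principle happen in the indexing code instead. The following is a minimal sketch, assuming the raw values reach Java as a `List<String>` before they are added to the Solr document; `ClassCodeSanitizer` and `realValues` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class ClassCodeSanitizer {
    private static final String ZWSP = "\u200B";

    /** Strips zero-width spaces from each value and drops entries that end up blank. */
    static List<String> realValues(List<String> rawValues) {
        List<String> cleaned = new ArrayList<>();
        if (rawValues == null) {
            return cleaned;
        }
        for (String raw : rawValues) {
            if (raw == null) {
                continue;
            }
            String value = raw.replace(ZWSP, "").trim();
            if (!value.isEmpty()) {
                cleaned.add(value);
            }
        }
        return cleaned;
    }
}
```

If the returned list is empty, the field can simply be left out of the document, so that `[* TO *]` no longer matches it. Dropping placeholders changes the position of the remaining values, so if the code and description lists must stay aligned by position, the cleaned values may be better indexed into a separate, filter-only field while the original field is kept for display.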

- Excluding \u200B in the Java code before passing the value to Solr. I modified this branch:

    } else if (StringUtils.isNotBlank(mFieldsFilter.getFieldValue())
            && !mFieldsFilter.getFieldValue().contains("\u200B")) {

✅ This prevents filtering based on \u200B but doesn't remove it from the index.
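The change above only affects value filters; the EXISTS branch still uses `[* TO *]`. One query-side alternative that might work without reindexing is a regular-expression clause that requires at least one character other than \u200B. This is a sketch under assumptions: the field is an untokenized string field, each value is indexed as its own term, and `ExistsFilterSketch` with its methods is a hypothetical helper. Regex queries can be slow on large indexes, so this would need testing against the real schema.

```java
final class ExistsFilterSketch {
    private static final String ZWSP = "\u200B";

    /** Clause matching documents with at least one value that contains a character other than U+200B. */
    static String existsIgnoringZwsp(String solrField) {
        return "(" + solrField + ":/.*[^" + ZWSP + "].*/)";
    }

    /** Clause matching documents with no such value; "*:*" gives the negation a set to subtract from. */
    static String notExistsIgnoringZwsp(String solrField) {
        return "(*:* -" + solrField + ":/.*[^" + ZWSP + "].*/)";
    }
}
```

For example, `existsIgnoringZwsp("trademarkclass_code")` yields a clause that could replace the `[* TO *]` string in the EXISTS branch. The `*:*` in the negative form is there because a purely negative clause inside parentheses generally matches nothing in Solr's standard query parser.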

  • Questions:
  1. What is the best way to ensure Solr treats \u200B values as empty or NULL?

  2. Is there a better way to handle filtering so that EXISTS works correctly?

  3. Should we use a different placeholder (like '-' or '[EMPTY]') instead of \u200B?

Any suggestions or best practices would be greatly appreciated!
