Escaping search queries for Google's full-text search service

2024/10/6 18:35:27

This is a cross-post of https://groups.google.com/d/topic/google-appengine/97LY3Yfd_14/discussion

I'm working with the new full-text search service in GAE 1.6.6 and I'm having trouble figuring out how to correctly escape my query strings before I pass them off to the search index. The docs mention that certain characters need to be escaped (namely the numeric operators), but they don't specify how the query parser expects the string to be escaped.

The issue I'm having is two-fold:

  1. Failing to escape the crap out of many characters (more than those that are hinted at in the docs) will cause the parser to raise a QueryException.
  2. When I've escaped the query to the point it won't raise, the numeric operators (>, <, >=, <=) no longer parse correctly (not factored into the search).

I set up a test where I feed string.printable into my_index.search() and found that it would raise QueryException on each of the "printable" control characters, which I'm now stripping out, as well as on seemingly innocent characters such as the asterisk, comma, parentheses, braces, and tilde. None of these are mentioned in the docs as needing to be escaped.
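A probe along those lines could be sketched as follows. This is only an illustration: find_rejected_chars and fake_search are hypothetical names, and fake_search is a stand-in that rejects a few characters so the sketch runs without App Engine. In the real test you would pass my_index.search and the QueryException type instead.

```python
import string


def find_rejected_chars(search, error_type, candidates=string.printable):
    """Return the characters for which search(ch) raises error_type."""
    rejected = []
    for ch in candidates:
        try:
            search(ch)
        except error_type:
            rejected.append(ch)
    return rejected


# Hypothetical stand-in for my_index.search: it rejects a handful of
# characters so the probe can be exercised outside App Engine.
def fake_search(query):
    if query in '*,(){}~':
        raise ValueError('cannot parse %r' % query)


print(find_rejected_chars(fake_search, ValueError))
```

Probing one character at a time keeps a failure attributable to a single character, which is harder to see when the whole of string.printable is sent in one query.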

So far I've tried:

  • cgi.escape()
  • saxutils.escape() with a mapping of ASCII characters to URL-encoded equivalents (e.g. , -> %2C)
  • saxutils.escape() with a mapping of ASCII characters to HTML entity character codes (e.g. &#123;)
  • urllib.quote_plus()

I've gotten the best results so far using URL-style (%NN) replacements, but >, <, >=, and <= still fail to yield the expected results from the index. Also, and this doesn't really seem to have anything to do with the escaping issue, using NOT in front of a field = value type query doesn't seem to work as advertised either.

tl;dr

How should I be escaping my queries before sending them to the search service so that the parser doesn't raise QueryException and my query yields expected results?

Answer

As briefly explained in the documentation, the query parameter is a string that should conform to our query language, which we should document better.

For now, I recommend wrapping your queries (or at least some of the words/terms) in double quotes. That way you can pass all printable characters except " and \. The following example shows how.

import string
from google.appengine.api.search import Query

# Remove '"' and '\' from the printable characters, then wrap the rest in double quotes.
Query('"%s"' % string.printable.replace('"', '').replace('\\', ''))

You can even pass non-printable characters:

Query('"%s"' % ''.join(chr(i) for i in xrange(128)).replace('"','').replace('\\', ''))

EDIT: Note that anything enclosed in double quotes is an exact match; that is, "foo bar" would match ...foo bar... but not ...bar foo...
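The quoting approach can be wrapped in a small helper. This is a minimal sketch, not part of the search API: quote_term is a hypothetical name, and it simply drops " and \ as in the examples above rather than escaping them.

```python
def quote_term(text):
    """Wrap text in double quotes after dropping '"' and '\\'."""
    # These are the two characters the quoted form cannot carry,
    # so they are removed rather than escaped.
    cleaned = text.replace('"', '').replace('\\', '')
    return '"%s"' % cleaned


print(quote_term('foo, bar*'))
```

Since quoted text is matched exactly, it probably makes sense to quote only the free-text terms and leave any numeric operator expressions outside the quotes.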
