The primary goal of developing GIN indexes was
to support highly scalable full-text search in
PostgreSQL, but full-text searches often return
very large result sets. Moreover, this usually happens when the query
contains very frequent words, in which case the large result set is not
even useful. Reading many tuples from disk and sorting them can take a
long time, which is unacceptable in production. (Note that the index
search itself is very fast.)
To facilitate controlled execution of such queries,
GIN has a configurable soft upper limit on the
number of rows returned: the
gin_fuzzy_search_limit configuration parameter.
It is set to 0 (meaning no limit) by default.
If a non-zero limit is set, then the returned set is a subset of
the whole result set, chosen at random.
"Soft" means that the actual number of returned results
could differ somewhat from the specified limit, depending on the query
and the quality of the system's random number generator.
From experience, values in the thousands (e.g., 5000 to 20000)
work well.
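To illustrate why the limit is only approximate, the following Python sketch models a soft limit as independent random sampling. It assumes, as a simplification, that each matching row is kept with probability limit / N, where N is the total number of matches. fuzzy_subset is a hypothetical helper, not part of PostgreSQL, and the server's actual sampling logic may differ in detail.

```python
import random


def fuzzy_subset(rows, limit, rng=None):
    """Toy model of a soft result limit (hypothetical helper, not PostgreSQL code).

    Assumption: each row is kept independently with probability limit / len(rows),
    so the expected result size equals `limit`, but the actual size varies.
    """
    if limit <= 0 or len(rows) <= limit:
        # 0 means "no limit"; result sets already under the limit are returned whole.
        return list(rows)
    if rng is None:
        rng = random.Random()
    p = limit / len(rows)
    # Keep the original order; each row survives an independent coin flip.
    return [r for r in rows if rng.random() < p]


rows = list(range(1_000_000))
sample = fuzzy_subset(rows, 10_000, random.Random(42))
print(len(sample))  # close to 10000, but usually not exactly 10000
```

Because each row is an independent coin flip in this model, the result size follows a binomial distribution centered on the limit; with a million matches and a limit of 10000, the count typically lands within a few hundred of 10000, which is the sense in which the limit is "soft".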