repoze.catalog Change History

0.7.1 (2009-09-27)

- Minimally get docs into shape for first PyPI release; no
  functionality changes from 0.7.0.

0.7.0 (2009-08-03)

- Fixed bug in ``DocumentMap.add`` which left orphan mappings for previous
  addresses when adding an existing docid with a new address.

- Added the ability to sort by text relevance. Use the name of the text
  index as the ``sort_index`` in the query.
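
The ``DocumentMap.add`` fix noted above can be pictured with a minimal sketch. ``TinyDocumentMap`` and its two dictionaries are hypothetical stand-ins, not the real ``repoze.catalog.document.DocumentMap``; the sketch only assumes that re-adding a docid under a new address should drop the reverse mapping for the old address.

```python
class TinyDocumentMap:
    """Hypothetical two-way docid <-> address map (illustration only)."""

    def __init__(self):
        self.docid_to_address = {}
        self.address_to_docid = {}

    def add(self, address, docid):
        # If this docid was previously stored under another address,
        # remove that stale reverse mapping so it is not orphaned.
        old_address = self.docid_to_address.get(docid)
        if old_address is not None and old_address != address:
            if self.address_to_docid.get(old_address) == docid:
                del self.address_to_docid[old_address]
        self.docid_to_address[docid] = address
        self.address_to_docid[address] = docid
        return docid


docmap = TinyDocumentMap()
docmap.add("/site/a", 1)
docmap.add("/site/b", 1)  # same docid, new address
print(sorted(docmap.address_to_docid))
```

Without the cleanup step, ``/site/a`` would still resolve to docid 1 even though the docid now lives at ``/site/b``.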

0.6.2 (2009-04-15)

- Add metadata-related APIs to ``repoze.catalog.document.DocumentMap``:
  ``add_metadata``, ``remove_metadata``, ``get_metadata``.
  "Metadata" is a freeform set of key/value pairs related to a docid.
  See the API documentation for more information.

0.6.1 (2009-02-25)

- Fixed constructor inheritance issues which kept ``repoze.catalog``
  from working under Python 2.6.  Note that this change involved removing
  the ``*args, **kw`` arguments from index constructors:  those values were
  never used, but had (bogus) tests.

0.6.0 (2009-02-16)

- N-Best ascending fieldindex sort was being chosen incorrectly when
  there was no limit.  Symptom: ``RuntimeError, 'n-best used without

0.5.9 (2009-02-16)

- Add ``reindex_doc`` method as an alias for ``index_doc`` to both
  CatalogFieldIndex and CatalogKeywordIndex (for performance,
  ``index_doc`` for both indexes has special case code for reindexing).

0.5.8 (2009-02-16)

- Speed up path2 index attribute search by using __getitem__ rather
  than .get in some places.

- Override textindex reindex_doc method: calling index_doc only
  instead of calling unindex_doc and then index_doc is much more

0.5.7 (2009-02-14)

- Attributes returned to attribute checker were not correct.

0.5.6 (2009-02-12)

- Add "attribute discriminator" and "attribute checker" support to
  path2 index.  If an index is created with an attribute
  discriminator, when any object is indexed, the value of the
  attribute will be stored in the path index.  The path index will
  know that that attribute belongs to a particular path.  Later, when
  the "attribute checker" feature of the ``apply`` or ``search``
  method is used, a user-supplied attribute checker function will be
  able to filter the result set returned by the index.  This is used
  by the author primarily to support security-filtered searches of a
  path index.  It is not otherwise documented.

0.5.5 (2009-02-11)

- Add a ``reindex_doc`` method to the catalog and to the ``common``
  shared index base class.  The catalog's ``reindex_doc`` calls each
  index's ``reindex_doc`` method when called.  The common shared index
  base class implementation unindexes the docid and then subsequently
  indexes the document using the docid.  This method can be overridden
  for specific indexes to do something different on a reindex.

- ``repoze.catalog.indexes.path2.CatalogPathIndex2`` now takes an
  extra argument to its search method named ``include_path``.  If this
  is true, the docid set returned will include the docid for the path
  specified by the path query parameter.  The ``apply`` method of the
  index allows for the specification of the ``include_path`` as a
  dictionary member in an ``apply`` call which specifies the query as
  a dictionary.

- Give ``path2.CatalogPathIndex2`` index a better ``reindex_doc``
  method than the default.

- The CatalogKeywordIndex's ``apply`` method mutated the query passed
  in if it was a dict.  To fix, we override the ``apply`` method from
  the zope.index implementation.

- Added a Range class importable as ``repoze.catalog.Range``.  The
  Range class should be used to represent a range query to a
  CatalogFieldIndex.  The old style of passing a 2-tuple to represent
  a range is still supported, but will eventually be removed in favor
  of requiring a Range object to represent a range query.  A Range
  object can be instantiated as ``Range(start, end)``.

- It is now possible to pass a sequence of items to the
  CatalogFieldIndex ``apply`` method.  When a sequence of terms that
  is passed in is *not* a tuple with two items in it (the previous API
  representing a range, which is deprecated), it will be considered a
  query for multiple terms.  The docids returned for each term will be
  unioned together to form the result.

- It is now possible to pass a dictionary to the CatalogFieldIndex
  ``apply`` method.  When a dictionary is passed, the member of the
  dictionary named ``query`` is treated as the query.  It may be a
  single term, a sequence of terms, or a Range.  An additional
  dictionary member named ``operator`` may also be specified: when
  this is specified, it must be one of ``or`` or ``and`` (the default
  is ``or``).  If the query specifies multiple terms, and the operator
  is ``or``, the results will be unioned; if the query specifies
  multiple terms and the operator is ``and``, the results will be
  intersected.
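
The query forms accepted by the field index ``apply`` method can be summarized in a rough sketch. Everything here is an assumption made for illustration: the ``Range`` class is a stand-in, ``apply_query`` is a hypothetical helper, the index is a plain dict mapping a value to a set of docids, and range ends are treated as inclusive. It is not the library's implementation.

```python
class Range:
    """Stand-in for the Range class described above (illustration only)."""

    def __init__(self, start, end):
        self.start = start
        self.end = end


def apply_query(index, query):
    """Resolve a query against a toy index mapping value -> set of docids."""
    operator = "or"
    if isinstance(query, dict):
        operator = query.get("operator", "or")
        query = query["query"]
    if isinstance(query, tuple) and len(query) == 2:
        # Old-style 2-tuple: treated as a range, per the deprecated API.
        query = Range(*query)
    if isinstance(query, Range):
        # Assumption: both ends of the range are inclusive.
        matches = [ids for value, ids in index.items()
                   if query.start <= value <= query.end]
        return set().union(*matches)
    terms = list(query) if isinstance(query, (list, tuple)) else [query]
    per_term = [index.get(term, set()) for term in terms]
    if operator == "or":
        return set().union(*per_term)
    if operator == "and":
        return set.intersection(*per_term) if per_term else set()
    raise ValueError("operator must be 'or' or 'and'")


index = {1: {10}, 2: {11}, 3: {12}}
print(apply_query(index, {"query": [1, 3], "operator": "or"}))
```

Because a field index stores one value per document, an ``and`` over several distinct terms comes back empty in this toy model.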

0.5.4 (2009-02-05)


- A newer path index implementation importable as
  ``repoze.catalog.path2.CatalogPathIndex2`` has been added as another
  index type.  The path2 index type is an improvement inasmuch as it
  actually uses a graph to represent structure instead of the "levels"
  scheme pioneered within Zope2 (and used by
  ``repoze.catalog.path.CatalogPathIndex``). By eye, the "levels"
  scheme looks like it can return the wrong results for any given path
  for a sufficiently dense tree.

- Catalog indexes must now supply an ``apply_intersect`` method; it
  receives a query and a set of docids (the result intersection "so
  far").  It should have the same sort of return value as the
  ``apply`` method.  Indexes which inherit from
  ``common.CatalogIndex`` will inherit a default implementation.

- It is now possible to specify index query/merge order within a
  catalog query.  See ``Index Query/Merge Order`` in the docs.
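
A minimal sketch of how ``apply_intersect`` and query ordering might fit together. ``ToyIndex`` and ``search`` are hypothetical names, docid sets are plain Python sets, and ``None`` is assumed to mean "no intersection computed yet"; the default implementation in ``common.CatalogIndex`` may differ in its details.

```python
class ToyIndex:
    """Hypothetical index: maps a query term to a set of docids."""

    def __init__(self, postings):
        self.postings = postings

    def apply(self, query):
        return set(self.postings.get(query, ()))

    def apply_intersect(self, query, docids):
        # Same kind of return value as apply(), narrowed by the
        # intersection computed so far (None means "no results yet").
        result = self.apply(query)
        if docids is None:
            return result
        return result & docids


def search(indexes, queries, order):
    """Query the named indexes in the given order, intersecting as we go."""
    docids = None
    for name in order:
        docids = indexes[name].apply_intersect(queries[name], docids)
        if not docids:
            break  # nothing left to narrow
    return docids if docids is not None else set()


indexes = {
    "color": ToyIndex({"red": {1, 2, 3}}),
    "size": ToyIndex({"large": {2, 3, 4}}),
}
result = search(indexes, {"color": "red", "size": "large"}, ["size", "color"])
print(result)
```

Querying the most selective index first keeps the running intersection small, which is the usual reason to control the order.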

0.5.3 (2009-01-05)


- Better detection of when to use fwdscan on ascending sorts in field

- Better detection of when to use nbest vs. timsort on ascending sorts
  in field indexes.

0.5.2 (2009-01-04)


- Allow a new catalog search method keyword: ``sort_type``.  For
  ascending sorts, this can be one of ``nbest``, ``fwscan``, or
  ``timsort``.  For descending sorts, only ``nbest`` and ``timsort``
  are supported.  This argument allows fine-grained control of what
  algorithm should be chosen to perform sorting within FieldIndex

- Better automatic detection of which sort algorithm to use (when it's
  not supplied via ``sort_type``) based on empirical testing.

- Depend on zope.index 3.5.0 rather than any earlier version
  (repoze.catalog fixes migrated upstream in zope.index 3.5.0).

- Add 'sortbench' script to test various field index sort strategies
  (requires 'benchmark' extra to create charts).
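
The three ``sort_type`` names can be illustrated with a rough sketch of the general techniques they suggest. ``sort_docids`` is a hypothetical helper and ``values`` is a plain dict from docid to indexed value; the choice of ``heapq.nsmallest`` for ``nbest`` and a full scan in value order for ``fwscan`` are assumptions, not a description of the library's code.

```python
import heapq


def sort_docids(docids, values, limit=None, sort_type="timsort"):
    """Sort docids ascending by values[docid] using the named strategy."""
    if sort_type == "timsort":
        # Sort the whole result set, then cut it to the limit.
        return sorted(docids, key=values.__getitem__)[:limit]
    if sort_type == "nbest":
        if limit is None:
            raise ValueError("nbest needs a limit")
        # Keep only the `limit` smallest entries without a full sort.
        return heapq.nsmallest(limit, docids, key=values.__getitem__)
    if sort_type == "fwscan":
        # Walk the index in value order and pick out matching docids.
        # A real index would keep this ordering precomputed; sorting
        # here just keeps the sketch short.
        found = []
        for docid in sorted(values, key=values.__getitem__):
            if limit is not None and len(found) >= limit:
                break
            if docid in docids:
                found.append(docid)
        return found
    raise ValueError("unknown sort_type: %r" % (sort_type,))


values = {1: 30, 2: 10, 3: 20, 4: 5}
docids = {1, 2, 3}
print(sort_docids(docids, values, limit=2, sort_type="nbest"))
```

All three strategies return the same ordering; they differ only in how much work they do for a given result size, limit, and index size.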

Bug Fixes

- Prevent the potential for a zero division error when attempting to
  sort an empty set of results.

0.5.1 (2008-12-31)


- Optimize the choice of fieldindex sort strategy.

- Speed up keyword index merges slightly.

- Fix a bug in the return value of the catalog: it would try to return
  the minimum of the number of docs or the limit even if there was no

Bug Fixes

- Sean Upton pointed out that the document map code artificially
  limited the number of documents to half the number that it could
  actually handle.

0.5 (2008-11-10)


- Add path index.

- Speed up keyword index 'and' (intersection) queries nominally by
  sorting intersected sets from smallest-to-largest first.

- Benchmarking suite provided by Chris Rossi.

- Add a "facet" index
  (``repoze.catalog.indexes.facet.CatalogFacetIndex``).  This index is
  much like a keyword index, but unlike a keyword index it contains a
  facet list (a sequence of known colon-separated values) and accepts
  values that are sequences of colon-separated terms.  Each term is
  split on its colons, forming a sequence of categories, then each
  concatenation of the categories is indexed.  For example, if you
  indexed a document as ``['style:gucci:handbag']``, and the facet
  list contained ``'style'``, ``'style:gucci'`` and
  ``'style:gucci:handbag'``, the document would be indexed three
  times: as ``style``, as ``style:gucci`` and as
  ``style:gucci:handbag``.  Querying a facet index returns a set of
  document ids that match the facets passed in.  A facet index also
  has a ``counts`` method which, given a set of document ids,
  returns a dictionary containing "further constraint information" for
  use in a narrowing UI.  This count implementation is not meant for
  very large-scale sites; it is naive.
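
The facet expansion described above can be sketched as follows. ``facet_keys`` is a hypothetical helper, and the sketch assumes that "each concatenation" means each leading prefix of the colon-separated categories and that only prefixes present in the facet list are indexed.

```python
def facet_keys(values, facet_list):
    """Return the facets a document would be indexed under."""
    known = set(facet_list)
    keys = set()
    for value in values:
        categories = value.split(":")
        # Rebuild each leading concatenation: a, a:b, a:b:c, ...
        for depth in range(1, len(categories) + 1):
            candidate = ":".join(categories[:depth])
            if candidate in known:
                keys.add(candidate)
    return keys


facets = ["style", "style:gucci", "style:gucci:handbag"]
keys = facet_keys(["style:gucci:handbag"], facets)
print(sorted(keys))
```

With the facet list from the example, the single value ``style:gucci:handbag`` expands to all three facets.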

0.4 (2008-10-06)


- Speed up keyword index 'or' (union) queries by using a single
  IFBTree.multiunion instead of multiple calls to IFBTree.union; this
  is most helpful for speeding up 'or' queries where there are lots of
  terms in the query sequence.


- Add ``overview`` page.

0.3 (2008-10-04)


- Add ``repoze.catalog.document.DocumentMap`` class, which provides a
  mechanism to map "addresses" (paths) to document ids.


- Add API documentation for catalog and document map.

Backwards incompatibilities

- Rename ``searchResults`` method to ``search``.

- Removed ``updateIndex`` and ``updateIndexes`` methods of catalog.

- All index implementations moved into ``repoze.catalog.indexes``.

- All interfaces moved to ``repoze.catalog.interfaces``.

0.2 (2008-09-26)

- Provide ``sort_index`` capability.

0.1 (2008-07-26)

- Initial release.

