For a given <package>, search for P<package>,
and then extract all the terms from the document
you find (this part could also be implemented by
just looking up the description and applying the
same algorithms to it as in parse-packages; I haven't
benchmarked the two approaches against each other).
Then make an OR search with these terms. Ignore all
results for <package> itself, since these are obvious ;)
The first few matches of the rest are usually
packages very similar to the one we started with.
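
The steps above can be sketched roughly as follows. This is a Python
illustration only, not the site's Perl/Xapian code: the function name
similar_packages, the in-memory descriptions dict and the whitespace
term splitting are all assumptions made for the example, and the score
is a plain count of shared terms rather than Xapian's relevance weight.

```python
from collections import Counter


def similar_packages(name, descriptions, limit=3):
    """Rank other packages by the number of terms shared with `name`.

    `descriptions` is a hypothetical dict mapping package name to its
    description text; it stands in for the real search index.
    """
    # Hypothetical term extraction: lowercase words, no stemming.
    terms = set(descriptions[name].lower().split())

    scores = Counter()
    for other, text in descriptions.items():
        if other == name:
            continue  # the package itself is the obvious match, skip it
        # OR semantics: a single shared term is enough to match,
        # and more shared terms mean a higher score.
        shared = terms & set(text.lower().split())
        if shared:
            scores[other] = len(shared)

    # Best matches first.
    return [pkg for pkg, _ in scores.most_common(limit)]
```

With a real index the term list would come from the stored document
and the ranking from the search backend; only the overall flow
(extract terms, OR them, drop the package itself) is taken from the
description above.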
Packages::Dispatcher: create from code from dispatcher.pl
This is currently only a copy with some small formatting changes.
But this should benefit our mod_perl cleanness and I intend to
wrap that up somewhat nicer in the future, too.
Remove Packages::Template->trailer which essentially forced
any format to have a foot.tmpl even if it made no sense for them.
Now formats that want something like that can handle it themselves.
Fix handling of languages in the footer so that we get a list
of available translations again. Currently this only reflects
DDTP translations since the po stuff hasn't been re-enabled since
the switch to templates.
Fulltext search: Greatly improve by using a more fuzzy approach
Most of this was done at the suggestion of Enrico Zini <enrico@enricozini.org>
Using OP_OR instead of OP_AND as default can actually lead to better
matches because the ones found with OP_AND often aren't actually
the most relevant ones. This is especially true when using more than
two keywords.
Accordingly sort by relevance on the result page.
Improve indexing:
+ Add both the unstemmed and the stemmed description to the index;
this will increase the relevance of exact matches. Only the
latter is indexed with positional information
+ Really index debtags and the package name
This only adds support for displaying the description
itself on the show page, not the short description in
dependencies or in search results.
The latter is more complicated now since we store the
short description in packages_small exactly because these
places are performance critical, and having to
access one more database in these places is bad...
cron.d/120syntrans: retrieve DDTP descriptions again
They are now on the official mirrors and there seems to be a real chance
that the project gets revived again. So start supporting translated
descriptions again.
templates/html/show.tmpl: Fix bug in generation of "Files" column in download table
b688b16487ef2ca8ad7861d7c20da16a9f3f4448 changed the loop to use a local
variable instead of implicit declaration but didn't change the
use of the contents_avail hash key. This broke the "Files" column
so that it never displayed a file link.
templates/html/show.tmpl: Make "Version" column of download table more readable
Previously it was sometimes difficult to spot differences in the versions,
especially if they are only in the Debian revision. Color code the
background so that one can spot differences more easily. All up-to-date
versions are indicated with a green background. If the version differs
from the latest version only in the Debian revision, use yellow.
If they differ in the upstream revision, use red.
Inspired by discussion with Neil Williams <codehelp debian org>
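
The colour rule above can be sketched like this. It is a Python
illustration, not the template code: split_version and version_colour
are hypothetical names, and the sketch only tests the parts for
equality instead of doing a full Debian version comparison. It relies
on the Debian revision being the part after the last hyphen.

```python
def split_version(version):
    """Split a Debian version string into (upstream part, Debian revision).

    The upstream part keeps a leading epoch if there is one. Versions
    without a hyphen (native packages) get an empty revision.
    """
    upstream, sep, revision = version.rpartition("-")
    if not sep:
        # No hyphen at all: the whole string is the upstream version.
        return version, ""
    return upstream, revision


def version_colour(version, latest):
    """Return the background colour for `version` relative to `latest`."""
    if version == latest:
        return "green"   # up to date
    if split_version(version)[0] == split_version(latest)[0]:
        return "yellow"  # differs only in the Debian revision
    return "red"         # differs in the upstream version
```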
templates/html/search.tmpl: Add suite as class for search results
And use this to display "unofficial" suites a bit smaller than the
others. Move the CSS information to a new css file packages-site.css
since this information will probably be very site specific.
templates/html/show.tmpl: Add experimental tabbing for content
Since the content tends to get very long and confusing, let the
user switch between "Description" (which includes the list of
binaries for source packages and the list of tags), "Dependencies",
and "Download".
Some binary packages are built from official Debian
sources but have no version in the Debian archive
(e.g. libc0.1). Fix searching the source package for
those. Rather hackish solution, needs cleanup.
Get this information by 1) exposing the archive in the
%downloads hash 2) using config/mirrors.tmpl to determine
whether this archive is an unofficial port
extract_changelogs: Merge cron job changes from old site
Merge the new daily cron job scripts from the old site that
split the extract_changelogs from the rest of the
cron job. This avoids delaying the rest of the cron job for
the changelog extraction. Using two different lock files
also makes the whole site update more robust.
packages.d.o is for end users who are probably not interested in
the various special: tags that are more useful for debtags maintenance.
(This can be moved to the template if ever the need arises to display these
tags)
Use Xapian as Backend for fulltext search in descriptions.
Introduces new do_xapian_search function to be able to switch
to the old do_fulltext_search in case of problems.
static: Add a simple index page and use ttree to build and install
Add a simple index page to static that we can use as homepage if we don't
want to redirect to an external search form like we do on the official
packages.debian.org.
Also use ttree to build and install the contents of static. This allows
using TT for preprocessing. The first example of such a file is the
added index file.
Don't try to download any non-US information anymore.
Don't delete the various non-US hacks from the parsing and displaying
code yet. This can be done as a second step.
Don't look up the description on the debtags vocabulary in the script
but in the template. Split facet and tag in the script so that we can
merge together multiple tags with the same facet.
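
A minimal sketch of the facet/tag split, in Python rather than the
script's Perl. The function name group_by_facet is made up for the
example; it assumes tags are written as "facet::tag" and simply skips
anything that does not have that form.

```python
def group_by_facet(tags):
    """Group debtags of the form 'facet::tag' by their facet.

    Returns a dict mapping each facet to the list of its tags, in the
    order they were given, so a template can print one line per facet.
    """
    grouped = {}
    for full_tag in tags:
        facet, sep, tag = full_tag.partition("::")
        if not sep:
            continue  # not a facet::tag pair; ignored in this sketch
        grouped.setdefault(facet, []).append(tag)
    return grouped
```

The description lookup in the vocabulary is left out here on purpose,
since that part moves to the template.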
Improve error handling:
- Make it possible to control the returned HTTP code
- Always use the html error template, we have no others anyway
- Try to avoid mixing the apache generated and our own error
messages (this needs more work)
Move a remaining HTML message from DoShow.pm to show.tmpl
(this commit contaminates the config.tmpl with the packages.debian.net
settings, but I couldn't care less atm ;)
Experimental support for url and vcs-*. Not sure if the latter really
belongs to p.d.o or if using it in the PTS should be enough.
The former is only used for source packages as many binary packages
include the url in the description, too