X-Git-Url: https://git.deb.at/?a=blobdiff_plain;f=BACKEND;h=445a45db391b046c22b775402eea2e8026bb78b9;hb=e178006b4cca636a5aa78b526e35753895b885b3;hp=cde9ef4d4e4fd554057022d7eb7934a1e1d590c8;hpb=bcbeaca96ba2c409e061b64807b986e4b8464192;p=deb%2Fpackages.git

diff --git a/BACKEND b/BACKEND
index cde9ef4..445a45d 100644
--- a/BACKEND
+++ b/BACKEND
@@ -20,14 +20,14 @@ Generated by means of Packages.gz files:
 | value: \0 separated tuples of "archive suite arch component section priority version shortdescription"
 | (so you can split on spaces in 8 pieces, but need to not split further
 | because shortdescription can have spaces)
-| arch can also be 'virtual', with c/s/p/v being undefined then, and
-| shortdescription being a space-separated list of packages providing
-| the package that is the key
 Notes:
  - maybe add did right before shortdescription?
- - TODO: make sure for each (archive,suite), newest package is shown
-   first, and all newest versions for each such section is first, so
-   that one can efficiently lookup just the newest entry for a given
-   (archive,suite)
+ - for each suite, newest package is shown first, and (suite,
+   architecture) is unique, the newest one is chosen. Once you find
+   the right suite, you know you've got the newest, once you found
+   your (suite,arch), you know you've found the only unique such entry
+ - The very first element is different (TODO: maybe should be
+   different DB then?), a \01 separated hash of suite -> provided-by,
+   like "suite1\01prov1 prov2\01suite2\01prov1"
 | package_postfixes.db:
 | key: a postfix string of a package name
@@ -93,3 +93,24 @@ Generated by means of Sources.gz files:
 | - files: \01 separated list of "md5 size filename"
 Note: different key from packages_all, is that needed?
+*********************************************************
+Generated by means of Contents-$arch.gz files:
+*********************************************************
+
+This one is tricky, because it deals with about 1G of raw uncompressed data
+per suite. Not all data is updated every day though, so dealing with that
+efficiently pays off.
+
+Each sourcefile will create a filelists_$suite_$arch.db, with prefix
+compression. The last updated one will have a symlink from _all.db to it, to
+help filelist queries for 'all' packages.
+
+reverse_$suite_$arch.txt will be the reversed pathnames for that file,
+lowercased, sorted, with packagename:arch following it.
+
+For each suite, the suite-wide indices can then be updated by reading the 11
+or so reverse_$suite_$arch.txt in sorted order with sort -m. Same pathnames
+can be put together, and stored in reverse_$suite.db; filenames are then also
+incidentally coming by grouped uniquely (but reverse sorted, not normal sorted),
+and can be written out linearly to filenames_$suite.txt
+
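The merge step described in the Contents section can be sketched in Python as follows. This is only an illustration under stated assumptions, not code from packages.git: the function names (reverse_entries, merge_suite, filenames) are hypothetical, entries are modelled as in-memory (reversed path, "package:arch") pairs rather than lines of reverse_$suite_$arch.txt, and heapq.merge stands in for sort -m. The sample paths and package names are made up.

```python
import heapq
from itertools import groupby


def reverse_entries(contents, arch):
    """Return sorted (reversed lowercased path, "package:arch") pairs for one arch.

    Stands in for one reverse_$suite_$arch.txt; the pair layout is an assumption.
    """
    return sorted((path.lower()[::-1], f"{pkg}:{arch}") for path, pkg in contents)


def merge_suite(per_arch_lists):
    """Merge already-sorted per-arch lists (as `sort -m` would) and
    put entries with the same reversed pathname together."""
    merged = heapq.merge(*per_arch_lists)
    for rev_path, group in groupby(merged, key=lambda entry: entry[0]):
        yield rev_path, [owner for _, owner in group]


def filenames(grouped):
    """Yield each basename once, in the order the merged stream produces it.

    In a reversed path the basename comes first, so equal basenames are
    adjacent. Assumes every path has a directory part (contains a "/").
    """
    for rev_name, _ in groupby(grouped, key=lambda item: item[0].split("/", 1)[0]):
        yield rev_name[::-1]


# Made-up sample data for two architectures of one suite.
amd64 = reverse_entries([("usr/bin/perl", "perl"), ("usr/bin/ls", "coreutils")], "amd64")
i386 = reverse_entries([("usr/bin/perl", "perl"), ("usr/lib/foo/ls", "foo")], "i386")

index = list(merge_suite([amd64, i386]))
for rev_path, owners in index:
    print(rev_path, owners)
print(list(filenames(index)))
```

Because the inputs are sorted on the reversed pathname, one linear pass is enough both to collect all owners of a path and to emit each filename once. The filenames come out ordered by their reversed spelling ("perl" before "ls" here), which matches the "reverse sorted, not normal sorted" remark.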