X-Git-Url: https://git.deb.at/?p=deb%2Fpackages.git;a=blobdiff_plain;f=BACKEND;h=11f14091cf21eae8474d9dbcb902228c395030e7;hp=cde9ef4d4e4fd554057022d7eb7934a1e1d590c8;hb=404e70717590f860d3db41bef40957e6df651324;hpb=bcbeaca96ba2c409e061b64807b986e4b8464192

diff --git a/BACKEND b/BACKEND
index cde9ef4..11f1409 100644
--- a/BACKEND
+++ b/BACKEND
@@ -20,14 +20,14 @@ Generated by means of Packages.gz files:
 | value: \0 separated tuples of "archive suite arch component section priority version shortdescription"
 | (so you can split on spaces in 8 pieces, but need to not split further
 | because shortdescription can have spaces)
-| arch can also be 'virtual', with c/s/p/v being undefined then, and
-| shortdescription being a space-separated list of packages providing
-| the package that is the key
 Notes:
  - maybe add did right before shortdescription?
- - TODO: make sure for each (archive,suite), newest package is shown
-   first, and all newest versions for each such section is first, so
-   that one can efficiently lookup just the newest entry for a given
-   (archive,suite)
+ - for each suite, newest package is shown first, and (suite,
+   architecture) is unique, the newest one is chosen. Once you find
+   the right suite, you know you've got the newest, once you find
+   your (suite,arch), you know you've found the only unique such entry
+ - The very first element is different (TODO: maybe should be
+   different DB then?), a \01 separated hash of suite -> provided-by,
+   like "suite1\01prov1 prov2\01suite2\01prov1"
 | package_postfixes.db:
 | key: a postfix string of a package name
@@ -41,11 +41,6 @@ Generated by means of Packages.gz files:
 | key: "packagename version arch"
 | value: a unique description id, did
 
-| descriptions.txt:
-| on each line:
-| description with strange characters mangled for proper substring
-| searching, linenumber being the did
-
 | descriptions.db:
 | key: did
 | value: description, first line being short, the rest being long [no
@@ -93,3 +88,24 @@ Generated by means of Sources.gz files:
 | - files: \01 separated list of "md5 size filename"
 
 Note: different key from packages_all, is that needed?
+*********************************************************
+Generated by means of Contents-$arch.gz files:
+*********************************************************
+
+This one is tricky, because it deals with about 1G of raw uncompressed data
+per suite. Not all data is updated every day though, so dealing with that
+efficiently pays off.
+
+Each sourcefile will create a filelists_$suite_$arch.db, with prefix
+compression. The last updated one will have a symlink from _all.db to it, to
+help filelist queries for 'all' packages.
+
+reverse_$suite_$arch.txt will be the reversed pathnames for that file,
+lowercased, sorted, with packagename:arch following it.
+
+For each suite, the suite-wide indices can then be updated by reading the 11
+or so reverse_$suite_$arch.txt in sorted order with sort -m. Same pathnames
+can be put together, and stored in reverse_$suite.db; filenames are then also
+incidentally coming by grouped uniquely (but reverse sorted, not normal sorted),
+and can be written out linearly to filenames_$suite.txt
+
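
The suite-wide merge described in the Contents section can be sketched
roughly as follows, in Python. This is only an illustration under stated
assumptions, not the code used by the packages scripts: the tab separator
between the reversed pathname and packagename:arch is a guess (the text
does not name one), heapq.merge stands in for `sort -m`, a plain dict
stands in for reverse_$suite.db, and the function names are hypothetical.

```python
import heapq
from itertools import groupby


def reverse_entry(path, package, arch):
    """Build one line: reversed lowercased path, a tab, then package:arch.

    The tab separator is an assumption made for this sketch.
    """
    return f"{path.lower()[::-1]}\t{package}:{arch}"


def merge_reverse_lists(per_arch_lines):
    """Merge already-sorted per-architecture line lists, like `sort -m`.

    Returns (index, filenames). index maps each reversed path to the list
    of package:arch owners (a stand-in for reverse_$suite.db). filenames
    holds each distinct basename once, in the order the merge yields them,
    which is sorted by the reversed name, not by the name itself.
    """
    index = {}
    filenames = []
    merged = heapq.merge(*per_arch_lines)
    for rpath, group in groupby(merged, key=lambda line: line.split("\t", 1)[0]):
        # Identical pathnames from different architectures end up together.
        index[rpath] = [line.split("\t", 1)[1] for line in group]
        # The reversed basename is the part of the reversed path before
        # the first slash, so equal basenames arrive next to each other.
        name = rpath.split("/", 1)[0][::-1]
        if not filenames or filenames[-1] != name:
            filenames.append(name)
    return index, filenames


if __name__ == "__main__":
    i386 = sorted([
        reverse_entry("usr/bin/Foo", "foo", "i386"),
        reverse_entry("usr/share/doc/foo/README", "foo", "i386"),
    ])
    amd64 = sorted([
        reverse_entry("usr/bin/foo", "foo", "amd64"),
        reverse_entry("usr/share/doc/bar/README", "bar", "amd64"),
    ])
    index, filenames = merge_reverse_lists([i386, amd64])
    for rpath, owners in index.items():
        print(rpath, owners)
    print(filenames)
```

Because the inputs are merged lazily, no per-architecture list has to be
held in memory as a whole, which seems to be the point of using sort -m
on roughly 1G of data per suite.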