diff options
Diffstat (limited to 'deskutils/py-paperless/files/patch-docs-guesswork.rst')
-rw-r--r-- | deskutils/py-paperless/files/patch-docs-guesswork.rst | 37 |
1 files changed, 37 insertions, 0 deletions
diff --git a/deskutils/py-paperless/files/patch-docs-guesswork.rst b/deskutils/py-paperless/files/patch-docs-guesswork.rst new file mode 100644 index 000000000000..7b43aa3fcdc7 --- /dev/null +++ b/deskutils/py-paperless/files/patch-docs-guesswork.rst @@ -0,0 +1,37 @@ +--- docs/guesswork.rst.orig 2019-01-27 13:48:05 UTC ++++ docs/guesswork.rst +@@ -54,6 +54,34 @@ filename as described above. + + .. _dateparser: https://github.com/scrapinghub/dateparser/blob/v0.7.0/docs/usage.rst#settings + ++Transforming filenames for parsing ++---------------------------------- ++Some devices can't produce filenames that can be parsed by the default ++parser. By configuring the option ``PAPERLESS_FILENAME_PARSE_TRANSFORMS`` in ++``paperless.conf`` one can add transformations that are applied to the filename ++before it's parsed. ++ ++The option contains a list of dictionaries of regular expressions (key: ++``pattern``) and replacements (key: ``repl``) in JSON format, which are ++applied in order by passing them to ``re.subn``. Transformation stops ++after the first match, so at most one transformation is applied. The general ++syntax is ++ ++.. code:: python ++ ++ [{"pattern":"pattern1", "repl":"repl1"}, {"pattern":"pattern2", "repl":"repl2"}, ..., {"pattern":"patternN", "repl":"replN"}] ++ ++The example below is for a Brother ADS-2400N, a scanner that allows ++different names to different hardware buttons (useful for handling ++multiple entities in one instance), but insists on adding ``_<count>`` ++to the filename. ++ ++.. code:: python ++ ++ # Brother profile configuration, support "Name_Date_Count" (the default ++ # setting) and "Name_Count" (use "Name" as tag and "Count" as title). ++ PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}] ++ + .. _guesswork-content: + + Reading the Document Contents |