Diffstat (limited to 'conduits/docconduit/bmkSpecification.txt')
-rw-r--r-- conduits/docconduit/bmkSpecification.txt 199
1 files changed, 199 insertions, 0 deletions
diff --git a/conduits/docconduit/bmkSpecification.txt b/conduits/docconduit/bmkSpecification.txt
new file mode 100644
index 0000000..f8a68d9
--- /dev/null
+++ b/conduits/docconduit/bmkSpecification.txt
@@ -0,0 +1,199 @@
+KPilot PalmDoc Conduit bookmark Specification
+=============================================
+
+(c) 2003 Reinhold Kainhofer, reinhold@kainhofer.com
+
+This document is licensed under the FDL (Free Documentation License)
+as published by the FSF. Any version of the FDL can be applied
+at your convenience.
+
+
+
+
+The PalmDoc conduit has three ways to indicate bookmarks for a text:
+ -) Inline tags of the form <* bookmarkname *>
+ -) Endtags of the form <bookmarkname> at the end of the document
+ -) Regular expressions in a separate textname.bmk file
+ (textname.bmk is the filename of the text with the .txt replaced by .bmk)
+
+
+In the design of the .bmk file, I tried to stay close to the
+syntax of MakeDocJ bookmark files, but it turned out that I
+needed to extend the syntax a little. Also, MakeDocJ uses Java
+regular expressions, while the PalmDoc conduit uses Qt's QRegExp,
+which differs slightly (especially concerning the ^ and $
+patterns as well as backreferences). So if you used MakeDocJ,
+the .bmk file syntax will be quite familiar, but you will still
+have to adapt your bookmark files to Qt regular expressions
+instead of Java regular expressions.
+
+
+
+1) INLINE TAGS
+
+Whenever a tag of the form <* someText *> appears in the text,
+this sequence is removed from the text, and a bookmark is set
+there with the bookmark name "someText" (the part between the
+<* and the *>).
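The inline-tag rule can be sketched in a few lines of Python. This is only an illustration, not the conduit's code (the conduit itself uses Qt); the function name extract_inline_tags is hypothetical, and the sketch assumes that whitespace around the name is dropped and that a tag never spans a line break.

```python
import re

# Assumed tag shape: "<*", optional spaces, the name, optional spaces, "*>".
# "." does not match a newline, so a tag may not span lines (an assumption).
INLINE_TAG = re.compile(r"<\*\s*(.*?)\s*\*>")

def extract_inline_tags(text):
    """Return (clean_text, bookmarks); bookmarks are (offset, name) pairs."""
    pieces = []
    bookmarks = []
    clean_len = 0   # length of the cleaned text built so far
    last_end = 0    # end of the previous tag in the original text
    for m in INLINE_TAG.finditer(text):
        chunk = text[last_end:m.start()]
        pieces.append(chunk)
        clean_len += len(chunk)
        # The bookmark points at the place where the tag was removed.
        bookmarks.append((clean_len, m.group(1)))
        last_end = m.end()
    pieces.append(text[last_end:])
    return "".join(pieces), bookmarks
```

Offsets refer to the text after the tags have been removed, so a later tag is not shifted by the tags before it.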
+
+
+2) ENDTAGS
+
+If the text ends with tags of the form <someText>, the string
+in angle brackets is used as the bookmark name, and a bookmark is
+set wherever it appears in the text.
+After the > any amount of whitespace is allowed, but no other
+characters such as letters, numbers, or punctuation. Also, no line
+break may occur inside the angle brackets. The conduit searches the
+text from the end, and if it finds a line break inside a <...>
+sequence, the tag and everything before it are assumed to belong
+to the text and do not form a bookmark tag.
+Between endtags any amount of whitespace (spaces, tabs, line
+feeds etc.) is allowed.
+
+As an example, assume you have a text ending in:
+... the bad guy was punished, and they lived happily
+ever after!
+<Tag with
+line feed>
+ <bad guy> <princess>
+<married>
+
+The conduit starts at the end and ignores all whitespace between
+the tags, so it finds the tags "married", "princess", and "bad guy".
+The "Tag with line feed" contains a line feed, so it is assumed to
+belong to the text.
+Assume now you have a text ending in:
+... the bad guy was punished, and they lived happily
+ever after!
+<bad guy> The End <princess>
+<married>
+
+Here, only "married" and "princess" are found as bookmarks. Because
+of the letters before the "princess" tag, the search for
+bookmarks ends at the letter "d" of "The End" (the conduit starts
+from the end and moves backward until it finds some text that
+cannot be read as an endtag).
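The backward search for endtags described above might look roughly like this in Python. It is a sketch under assumptions: the name split_endtags is hypothetical, and the conduit's real implementation may treat edge cases (empty tags, nested brackets) differently.

```python
import re

# An endtag: "<name>" with no line break inside, followed only by
# whitespace up to the very end of the text.
END_TAG = re.compile(r"<([^<>\n]+)>\s*\Z")

def split_endtags(text):
    """Peel endtags off the end of text; return (remaining_text, names)."""
    names = []
    while True:
        m = END_TAG.search(text)
        if m is None:
            break               # whatever is left belongs to the text
        names.append(m.group(1))
        text = text[:m.start()]  # continue with the text before this tag
    return text, names
```

Run on the two sample endings above, the sketch yields "married", "princess", "bad guy" for the first and only "married", "princess" for the second, in the order they are found from the end.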
+
+
+
+
+3) REGULAR EXPRESSIONS IN A SEPARATE FILE
+
+This is by far the most complex way to specify bookmarks, but
+it is also the most powerful.
+If you have a text with filename "My fairy tale.txt", the
+bookmarks will be specified in a file called "My fairy tale.bmk"
+(just the text filename with the .txt replaced by .bmk). This
+file contains the bookmark definitions, one in each line. Lines
+starting with a # are seen as comments, and empty lines are also
+ignored.
+
+
+In the .bmk file, each bookmark line has one of the following forms
+(I will explain all fields later on). Fields in [..] are optional:
+
+bmkName
+bmkPosition, bmkName
++, bmkPatternRegExp[, bmkNameAsString[, firstIncludedBmk[, lastIncludedBmk]]]
++, bmkPatternRegExp[, bmkNameIndexOfSubexpression[, firstIncludedBmk[, lastIncludedBmk]]]
+-, bmkPatternRegExp[, bmkNameAsString]
+-, bmkPatternRegExp[, bmkNameIndexOfSubexpression]
+
+ If the first field is a string, it is used as the bookmark name
+and pattern to search for.
+ If the first field is a number, it means the position of the
+bookmark, and the second field is the name of the bookmark.
+ If the first field is either + or -, the second field gives
+a regular expression that is used to find the position of the
+bookmark. If the first field is a -, the search is done only
+once and only the first match will be added as bookmark. If
+the first field is a +, the search is done until the regular
+expression can no longer be found (the fourth and fifth fields
+can be used to include only a certain range of hits). If there
+is a third field, and it is a string, it gives the name of the
+bookmark as a regular expression (i.e. \1 is replaced by the
+first subexpression of the search, where subexpressions are
+specified by round brackets in the regexp of the second field).
+If there is a third field, and it is a number, it gives the index
+of the subexpression of bmkPatternRegExp that is used as the
+bookmark name.
+If there is no third field, the whole matched text will be used
+as bookmark name.
+The optional fourth and fifth fields can be used to set bookmarks
+only after the first few occurrences of the regexp in the text, and
+to stop the search after the expression has been found a certain
+number of times.
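To make the field rules concrete, here is a rough Python sketch of a line parser. Rule and parse_bmk_line are hypothetical names, not part of the conduit. The sketch assumes that fields are split at every comma (so a pattern containing a comma is not supported), that the pattern field is taken verbatim including spaces, and that name and number fields are trimmed.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Rule:
    kind: str                            # "literal", "position", "first" (-) or "all" (+)
    pattern: Optional[str] = None        # search text or regular expression
    position: Optional[int] = None       # fixed offset for "position" rules
    name: Union[str, int, None] = None   # name template, subexpression index, or None
    first: Optional[int] = None          # first hit to include ("+" rules only)
    last: Optional[int] = None           # last hit to include ("+" rules only)

def parse_bmk_line(line):
    """Parse one .bmk line; return None for comments and empty lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split(",")
    head = fields[0].strip()
    if head in ("+", "-") and len(fields) >= 2:
        rule = Rule(kind="all" if head == "+" else "first", pattern=fields[1])
        if len(fields) >= 3:
            third = fields[2].strip()
            # A number selects a subexpression, anything else is a name template.
            rule.name = int(third) if third.isdigit() else third
        if head == "+":
            if len(fields) >= 4:
                rule.first = int(fields[3])
            if len(fields) >= 5:
                rule.last = int(fields[4])
        return rule
    if head.isdigit() and len(fields) >= 2:
        name = ",".join(fields[1:]).strip()
        return Rule(kind="position", position=int(head), name=name)
    # Anything else: the whole line is both the search text and the name.
    return Rule(kind="literal", pattern=line, name=line)
```

Whether the real conduit trims whitespace in exactly these places is not stated in this specification, so treat that part as a guess.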
+
+
+
+If the PDB->PC sync is set up to store the bookmarks in a bookmark file,
+it will create a file "My fairy tale.bm" (no "k") with entries of the form
+position,bmkName
+The .bmk file will be used if it exists, but if no .bmk file exists, the .bm file
+will be used. This way you can override the bookmark settings, while
+at the same time the PDB->TXT sync does not destroy your possibly
+existing .bmk file.
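The precedence between the two files can be expressed as a short lookup. This is a minimal sketch; bookmark_file_for is a hypothetical helper and only mirrors the rule stated above (.bmk first, .bm as fallback).

```python
from pathlib import Path

def bookmark_file_for(text_path):
    """Return the bookmark file to use for a text file, or None."""
    text_path = Path(text_path)
    # The hand-written .bmk file wins; the generated .bm file is the fallback.
    for suffix in (".bmk", ".bm"):
        candidate = text_path.with_suffix(suffix)
        if candidate.is_file():
            return candidate
    return None
```

For "My fairy tale.txt" this checks "My fairy tale.bmk" and then "My fairy tale.bm" in the same directory.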
+
+
+
+Examples:
+
+1) Imagine you have a line like:
+frog princess
+In this case, the text is searched for "frog princess", and a
+bookmark is set whenever "frog princess" occurs in the text.
+The name of each of these bookmarks will be "frog princess".
+
+2) A bookmark line:
+55, Bookmark at offset 55
+Here, a bookmark will be set at offset 55 (55th character of
+the text), and it will have the name "Bookmark at offs" (truncated
+to 16 characters).
+
+3) A bookmark line
+-,Chapter \d+
+causes a bookmark to be set at the first occurrence of "Chapter XXX",
+where XXX denotes one or more digits. The bookmark name will be
+"Chapter XXX" (XXX replaced by the actual digits).
+
+4) A bookmark line
++,Chapter \d+
+causes bookmarks to be set wherever "Chapter XXX" (XXX being one
+or more digits) appears in the text. The bookmark name will again
+be "Chapter XXX", but the search does not stop after the first hit.
+
+5) A bookmark line
++,\n\s*(Chapter \d+)\D+, 1
+causes a bookmark to be set whenever a new line starts with
+"Chapter XXX" (whitespace is allowed before the "Chapter"), and
+uses the first subexpression in (..) as the bookmark name. If you
+have a passage
+ Chapter 15: here it starts
+The regular expression will match, so a bookmark will be set there
+and the subexpression "Chapter 15" (which matches the (Chapter \d+) )
+will be used as bookmark text.
+
+6) A bookmark line
++,\n\s*Part (\d+),\1\. part
+sets a bookmark whenever a line starts with "Part XXX". The XXX
+will be stored as the first matched subexpression. The third field
+"\1\. part" is the regular expression for the bookmark name, where
+\1 is replaced by the first matched subexpression of the search (XXX
+in this case). So if a line starts with " Part 17: ", the bookmark
+name will be "17. part".
+
+7) A bookmark line
++,Table (\d+): ,\1\. Tabelle,5,25
+will match whenever "Table XXX: " appears in the text, and the bookmark
+name will be "XXX. Tabelle". However, the fourth field means that the
+first four hits are ignored (the 5th hit is the first hit to be included
+as a bookmark), and the fifth field means that all further hits after the
+25th will be ignored, too.
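Examples 3 to 7 can be imitated with the following Python sketch. It uses Python's re module rather than QRegExp, so ^, $ and backreferences may behave differently from the conduit. The names expand_name and find_bookmarks are hypothetical, and placing the bookmark at the start of the match is an assumption.

```python
import re

def expand_name(template, match):
    r"""Replace \1..\9 by the matched subexpression; unescape other \x to x."""
    def repl(m):
        if m.group(1):
            return match.group(int(m.group(1))) or ""
        return m.group(2)
    return re.sub(r"\\([1-9])|\\(.)", repl, template)

def find_bookmarks(text, pattern, name=None, first=1, last=None, only_first=False):
    """Return (offset, name) pairs for the hits of pattern in text."""
    bookmarks = []
    for hit, m in enumerate(re.finditer(pattern, text), start=1):
        if hit < first:
            continue                      # skip hits before the fourth field
        if last is not None and hit > last:
            break                         # stop after the fifth field
        if name is None:
            label = m.group(0)            # no third field: whole match
        elif isinstance(name, int):
            label = m.group(name)         # number: index of the subexpression
        else:
            label = expand_name(name, m)  # string: name template
        bookmarks.append((m.start(), label))
        if only_first:
            break                         # "-" rules keep only the first hit
    return bookmarks
```

A "-" line corresponds to only_first=True, a "+" line to the default. For example 7 the call would be find_bookmarks(text, r"Table (\d+): ", r"\1\. Tabelle", first=5, last=25).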
+
+8) In law texts, I use a regular expression
++,\n *(§\.? *\d+[a-z]?\.?) +, 1
+to search for all paragraphs starting like "§. 15. " or " §23 ", and set
+a bookmark there using only the part from the § to the last digit or the
+full stop after the last digit (the pattern between the round brackets).
+In these two cases the bookmark names will be "§. 15." and "§23".