author: Thierry Thomas <thierry@FreeBSD.org> 2005-05-15 15:54:36 +0000
committer: Thierry Thomas <thierry@FreeBSD.org> 2005-05-15 15:54:36 +0000
commit: 71fa9a677947f008497476914e6148689b6caee9
tree: 365200d11a4408487d58d2dc1de81e24c896c74f /sysutils/uniutils/pkg-descr
parent: Upgrade to 1.2.3.
Add unidesc 2.12, Unicode Description Utilities.
Unidesc consists of four programs for finding out what is in a Unicode file. They are useful when working with Unicode files when one doesn't know the writing system, doesn't have the necessary font, needs to inspect invisible characters, needs to find out whether characters have been combined or in what order they occur, or needs statistics on which characters occur.
Diffstat (limited to 'sysutils/uniutils/pkg-descr')
-rw-r--r-- sysutils/uniutils/pkg-descr 23
1 files changed, 23 insertions, 0 deletions
diff --git a/sysutils/uniutils/pkg-descr b/sysutils/uniutils/pkg-descr
new file mode 100644
index 000000000000..1144e261299f
--- /dev/null
+++ b/sysutils/uniutils/pkg-descr
@@ -0,0 +1,23 @@
+Unidesc consists of four programs for finding out what is in a Unicode file.
+They are useful when working with Unicode files when one doesn't know the
+writing system, doesn't have the necessary font, needs to inspect invisible
+characters, needs to find out whether characters have been combined or in what
+order they occur, or needs statistics on which characters occur.
+
+uniname defaults to printing the character offset of each character, its byte
+offset, its hex code value, its encoding, the glyph itself, and its name.
+
+unidesc reports the character ranges to which different portions of the text
+belong. It can also be used to identify Unicode encodings (e.g. UTF-16be)
+flagged by magic numbers.
+
+unihist generates a histogram of the characters in its input, which must be
+encoded in UTF-8 Unicode. By default, for each character it prints the
+frequency of the character as a percentage of the total, the absolute number of
+tokens in the input, the UTF-32 code in hexadecimal, and, if the character is
+displayable, the glyph itself as UTF-8 Unicode.
+
+ExplicateUTF8 is intended for debugging or for learning about Unicode. It
+determines and explains the validity of a sequence of bytes as a UTF8 encoding.
+
+WWW: http://www.cis.upenn.edu/~wjposer/unidesc.html
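
The per-character report described for uniname and the histogram described for unihist can be approximated with Python's standard library. This is only a rough sketch of the kind of output the description mentions, not the uniutils implementation; the function names `describe` and `histogram` are hypothetical, and the column order and formatting are assumptions.

```python
import unicodedata
from collections import Counter


def describe(text):
    """Return one row per character, loosely modelled on the uniname description:
    (character offset, byte offset, code point in hex, UTF-8 bytes in hex,
    the glyph, the Unicode character name)."""
    rows = []
    byte_offset = 0
    for char_offset, ch in enumerate(text):
        encoded = ch.encode("utf-8")
        rows.append((
            char_offset,
            byte_offset,
            f"{ord(ch):06X}",
            encoded.hex(" ").upper(),
            ch,
            # Unnamed characters (e.g. most control codes) get a placeholder.
            unicodedata.name(ch, "<no name>"),
        ))
        byte_offset += len(encoded)
    return rows


def histogram(text):
    """Return one row per distinct character, loosely modelled on the unihist
    description: (percentage of total, absolute count, code point in hex, glyph).
    Rows are ordered from most to least frequent."""
    counts = Counter(text)
    total = len(text)
    return [
        (100.0 * n / total, n, f"{ord(ch):06X}", ch)
        for ch, n in counts.most_common()
    ]


if __name__ == "__main__":
    sample = "na\u00efve"  # contains U+00EF, which takes two bytes in UTF-8
    for row in describe(sample):
        print(*row, sep="\t")
    for pct, n, code, glyph in histogram(sample):
        print(f"{pct:6.2f}%\t{n}\t{code}\t{glyph}")
```

Character offsets and byte offsets diverge as soon as the input contains a character that needs more than one byte in UTF-8, which is why the description lists both.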