<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/udf/unicode.c, branch v4.19</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>udf: Add support for decoding UTF-16 characters</title>
<updated>2018-04-19T14:00:48+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2018-04-16T16:46:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8a0cdef1619ead670a2cba876a3cb97e41aa5e83'/>
<id>8a0cdef1619ead670a2cba876a3cb97e41aa5e83</id>
<content type='text'>
Add support to decode characters outside of Base Multilingual Plane of
UTF-16 encoded in CS0 charset of UDF.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add support to decode characters outside of Base Multilingual Plane of
UTF-16 encoded in CS0 charset of UDF.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udf: Add support for encoding UTF-16 characters</title>
<updated>2018-04-19T14:00:48+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2018-04-16T15:30:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ef2e18f1fa3958f8ee1a38acaebb6991d49dce18'/>
<id>ef2e18f1fa3958f8ee1a38acaebb6991d49dce18</id>
<content type='text'>
Add support to store characters outside of Base Multilingual Plane of
UTF-16 in CS0 encoding of UDF.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add support to store characters outside of Base Multilingual Plane of
UTF-16 in CS0 encoding of UDF.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udf: Push sb argument to udf_name_[to|from]_CS0()</title>
<updated>2018-04-19T14:00:48+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2018-04-16T14:57:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d504adc29142755edda4ef0f24ec81b7088564a4'/>
<id>d504adc29142755edda4ef0f24ec81b7088564a4</id>
<content type='text'>
Push superblock argument to udf_name_[to|from]_CS0() functions so that
we can decide about character conversion functions there.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Push superblock argument to udf_name_[to|from]_CS0() functions so that
we can decide about character conversion functions there.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udf: Convert ident strings to proper charset</title>
<updated>2018-04-19T14:00:48+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2018-04-16T13:44:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e966fc8d9953167fe7c29495495436846467a5d2'/>
<id>e966fc8d9953167fe7c29495495436846467a5d2</id>
<content type='text'>
iocharset= mount option specifies the character set used on *console*
(not on disk). So even dstrings from VRS need to be converted from CS0
to the specified charset and not always UTF-8. This is barely user
visible as those strings are shown only in UDF debug messages.

CC: Andrew Gabbasov &lt;andrew_gabbasov@mentor.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
iocharset= mount option specifies the character set used on *console*
(not on disk). So even dstrings from VRS need to be converted from CS0
to the specified charset and not always UTF-8. This is barely user
visible as those strings are shown only in UDF debug messages.

CC: Andrew Gabbasov &lt;andrew_gabbasov@mentor.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udf: Use UTF-32 &lt;-&gt; UTF-8 conversion functions from NLS</title>
<updated>2018-04-19T14:00:48+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2018-04-12T15:06:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b8a41c44a4ed8bad89b91584a7c7e4610c4b8c88'/>
<id>b8a41c44a4ed8bad89b91584a7c7e4610c4b8c88</id>
<content type='text'>
Instead of implementing our own functions converting to and from UTF-8,
use the ones provided by NLS.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of implementing our own functions converting to and from UTF-8,
use the ones provided by NLS.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udf: Fix leak of UTF-16 surrogates into encoded strings</title>
<updated>2018-04-18T14:34:55+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2018-04-12T15:22:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=44f06ba8297c7e9dfd0e49b40cbe119113cca094'/>
<id>44f06ba8297c7e9dfd0e49b40cbe119113cca094</id>
<content type='text'>
OSTA UDF specification does not mention whether the CS0 charset in case
of two bytes per character encoding should be treated in UTF-16 or
UCS-2. The sample code in the standard does not treat UTF-16 surrogates
in any special way but on systems such as Windows which work in UTF-16
internally, filenames would be treated as being in UTF-16 effectively.
In Linux it is more difficult to handle characters outside of Base
Multilingual plane (beyond 0xffff) as NLS framework works with 2-byte
characters only. Just make sure we don't leak UTF-16 surrogates into the
resulting string when loading names from the filesystem for now.

CC: stable@vger.kernel.org # &gt;= v4.6
Reported-by: Mingye Wang &lt;arthur200126@gmail.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
OSTA UDF specification does not mention whether the CS0 charset in case
of two bytes per character encoding should be treated in UTF-16 or
UCS-2. The sample code in the standard does not treat UTF-16 surrogates
in any special way but on systems such as Windows which work in UTF-16
internally, filenames would be treated as being in UTF-16 effectively.
In Linux it is more difficult to handle characters outside of Base
Multilingual plane (beyond 0xffff) as NLS framework works with 2-byte
characters only. Just make sure we don't leak UTF-16 surrogates into the
resulting string when loading names from the filesystem for now.

CC: stable@vger.kernel.org # &gt;= v4.6
Reported-by: Mingye Wang &lt;arthur200126@gmail.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udf: Fix signed/unsigned format specifiers</title>
<updated>2017-10-17T10:00:58+00:00</updated>
<author>
<name>Steve Magnani</name>
<email>steve.magnani@digidescorp.com</email>
</author>
<published>2017-10-12T13:48:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fcbf7637e6647e00de04d4b2e05ece2484bb3062'/>
<id>fcbf7637e6647e00de04d4b2e05ece2484bb3062</id>
<content type='text'>
Fix problems noted in compilion with -Wformat=2 -Wformat-signedness.
In particular, a mismatch between the signedness of a value and the
signedness of its format specifier can result in unsigned values being
printed as negative numbers, e.g.:

  Partition (0 type 1511) starts at physical 460, block length -1779968542

...which occurs when mounting a large (&gt; 1 TiB) UDF partition.

Changes since V1:
* Fixed additional issues noted in udf_bitmap_free_blocks(),
  udf_get_fileident(), udf_show_options()

Signed-off-by: Steven J. Magnani &lt;steve@digidescorp.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix problems noted in compilion with -Wformat=2 -Wformat-signedness.
In particular, a mismatch between the signedness of a value and the
signedness of its format specifier can result in unsigned values being
printed as negative numbers, e.g.:

  Partition (0 type 1511) starts at physical 460, block length -1779968542

...which occurs when mounting a large (&gt; 1 TiB) UDF partition.

Changes since V1:
* Fixed additional issues noted in udf_bitmap_free_blocks(),
  udf_get_fileident(), udf_show_options()

Signed-off-by: Steven J. Magnani &lt;steve@digidescorp.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udf: Fix conversion of 'dstring' fields to UTF8</title>
<updated>2016-04-25T13:18:50+00:00</updated>
<author>
<name>Andrew Gabbasov</name>
<email>andrew_gabbasov@mentor.com</email>
</author>
<published>2016-04-25T11:19:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c26f6c61578852f679787d555e6d07804e1f5f14'/>
<id>c26f6c61578852f679787d555e6d07804e1f5f14</id>
<content type='text'>
Commit 9293fcfbc1812a22ad5ce1b542eb90c1bbe01be1
("udf: Remove struct ustr as non-needed intermediate storage"),
while getting rid of 'struct ustr', does not take any special care
of 'dstring' fields and effectively use fixed field length instead
of actual string length, encoded in the last byte of the field.

Also, commit 484a10f49387e4386bf2708532e75bf78ffea2cb
("udf: Merge linux specific translation into CS0 conversion function")
introduced checking of the length of the string being converted,
requiring proper alignment to number of bytes constituing each
character.

The UDF volume identifier is represented as a 32-bytes 'dstring',
and needs to be converted from CS0 to UTF8, while mounting UDF
filesystem. The changes in mentioned commits can in some cases
lead to incorrect handling of volume identifier:
- if the actual string in 'dstring' is of maximal length and
does not have zero bytes separating it from dstring encoded
length in last byte, that last byte may be included in conversion,
thus making incorrect resulting string;
- if the identifier is encoded with 2-bytes characters (compression
code is 16), the length of 31 bytes (32 bytes of field length minus
1 byte of compression code), taken as the string length, is reported
as an incorrect (unaligned) length, and the conversion fails, which
in its turn leads to volume mounting failure.

This patch introduces handling of 'dstring' encoded length field
in udf_CS0toUTF8 function, that is used in all and only cases
when 'dstring' fields are converted. Currently these cases are
processing of Volume Identifier and Volume Set Identifier fields.
The function is also renamed to udf_dstrCS0toUTF8 to distinctly
indicate that it handles 'dstring' input.

Signed-off-by: Andrew Gabbasov &lt;andrew_gabbasov@mentor.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 9293fcfbc1812a22ad5ce1b542eb90c1bbe01be1
("udf: Remove struct ustr as non-needed intermediate storage"),
while getting rid of 'struct ustr', does not take any special care
of 'dstring' fields and effectively use fixed field length instead
of actual string length, encoded in the last byte of the field.

Also, commit 484a10f49387e4386bf2708532e75bf78ffea2cb
("udf: Merge linux specific translation into CS0 conversion function")
introduced checking of the length of the string being converted,
requiring proper alignment to number of bytes constituing each
character.

The UDF volume identifier is represented as a 32-bytes 'dstring',
and needs to be converted from CS0 to UTF8, while mounting UDF
filesystem. The changes in mentioned commits can in some cases
lead to incorrect handling of volume identifier:
- if the actual string in 'dstring' is of maximal length and
does not have zero bytes separating it from dstring encoded
length in last byte, that last byte may be included in conversion,
thus making incorrect resulting string;
- if the identifier is encoded with 2-bytes characters (compression
code is 16), the length of 31 bytes (32 bytes of field length minus
1 byte of compression code), taken as the string length, is reported
as an incorrect (unaligned) length, and the conversion fails, which
in its turn leads to volume mounting failure.

This patch introduces handling of 'dstring' encoded length field
in udf_CS0toUTF8 function, that is used in all and only cases
when 'dstring' fields are converted. Currently these cases are
processing of Volume Identifier and Volume Set Identifier fields.
The function is also renamed to udf_dstrCS0toUTF8 to distinctly
indicate that it handles 'dstring' input.

Signed-off-by: Andrew Gabbasov &lt;andrew_gabbasov@mentor.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udf: Merge linux specific translation into CS0 conversion function</title>
<updated>2016-02-09T12:05:23+00:00</updated>
<author>
<name>Andrew Gabbasov</name>
<email>andrew_gabbasov@mentor.com</email>
</author>
<published>2016-01-15T08:44:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=484a10f49387e4386bf2708532e75bf78ffea2cb'/>
<id>484a10f49387e4386bf2708532e75bf78ffea2cb</id>
<content type='text'>
Current implementation of udf_translate_to_linux function does not
support multi-bytes characters at all: it counts bytes while calculating
extension length, when inserting CRC inside the name it doesn't
take into account inter-character boundaries and can break into
the middle of the character.

The most efficient way to properly support multi-bytes characters is
merging of translation operations directly into conversion function.
This can help to avoid extra passes along the string or parsing
the multi-bytes character back into unicode to find out it's length.

Signed-off-by: Andrew Gabbasov &lt;andrew_gabbasov@mentor.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Current implementation of udf_translate_to_linux function does not
support multi-bytes characters at all: it counts bytes while calculating
extension length, when inserting CRC inside the name it doesn't
take into account inter-character boundaries and can break into
the middle of the character.

The most efficient way to properly support multi-bytes characters is
merging of translation operations directly into conversion function.
This can help to avoid extra passes along the string or parsing
the multi-bytes character back into unicode to find out it's length.

Signed-off-by: Andrew Gabbasov &lt;andrew_gabbasov@mentor.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udf: Remove struct ustr as non-needed intermediate storage</title>
<updated>2016-02-09T12:05:23+00:00</updated>
<author>
<name>Andrew Gabbasov</name>
<email>andrew_gabbasov@mentor.com</email>
</author>
<published>2016-01-15T08:44:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9293fcfbc1812a22ad5ce1b542eb90c1bbe01be1'/>
<id>9293fcfbc1812a22ad5ce1b542eb90c1bbe01be1</id>
<content type='text'>
Although 'struct ustr' tries to structurize the data by combining
the string and its length, it doesn't actually make much benefit,
since it saves only one parameter, but introduces an extra copying
of the whole buffer, serving as an intermediate storage. It looks
quite inefficient and not actually needed.

This commit gets rid of the struct ustr by changing the parameters
of some functions appropriately.

Also, it removes using 'dstring' type, since it doesn't make much
sense too.

Just using the occasion, add a 'const' qualifier to udf_get_filename
to make consistent parameters sets.

Signed-off-by: Andrew Gabbasov &lt;andrew_gabbasov@mentor.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Although 'struct ustr' tries to structurize the data by combining
the string and its length, it doesn't actually make much benefit,
since it saves only one parameter, but introduces an extra copying
of the whole buffer, serving as an intermediate storage. It looks
quite inefficient and not actually needed.

This commit gets rid of the struct ustr by changing the parameters
of some functions appropriately.

Also, it removes using 'dstring' type, since it doesn't make much
sense too.

Just using the occasion, add a 'const' qualifier to udf_get_filename
to make consistent parameters sets.

Signed-off-by: Andrew Gabbasov &lt;andrew_gabbasov@mentor.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</pre>
</div>
</content>
</entry>
</feed>
