Attributes Affecting Text Layout
Yesterday I started a radical revamp of the Hebrew Reader code. I had previously noticed that occasionally a vowel marking (a niqqud) would appear a little shifted to one side or the other. Nothing drastic, but definitely noticeable and ugly.
I began to suspect that the problem was caused by some custom attributes I was assigning to the text to mark verses and segments. In other words, I was using NSMutableAttributedString’s addAttribute:value:range: and the like to assign my own keys and values to bits of text.
Apple’s documentation says:
You may assign any name/value pair you wish to a range of characters, in addition to the standard attributes described in the [Apple docs.]
Nevertheless I suspected that my custom attributes were causing these layout problems. So I wrote a little test application that basically goes through all of the text displayed in an NSTextView and applies an arbitrary attribute to each uncomposed character. So vowel markings get their own attributes separate from the letters they are attached to.
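NSTextStorage *ts = [textView textStorage];
NSRange r = {0, 1};
// Walk the text one UTF-16 unit at a time, so each vowel marking gets an attribute of its own.
for (; NSMaxRange(r) <= [ts length]; r.location++)
{
    // NSUInteger is 64 bits wide on modern systems, so cast and use %lu rather than %u.
    [ts addAttribute:[NSString stringWithFormat:@"Nonsense%lu", (unsigned long)r.location]
               value:[NSString stringWithFormat:@"Gibberish%lu", (unsigned long)r.location]
               range:r];
}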
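This code converts this:
[Image: the sample text before the attributes are applied, with the vowel markings positioned correctly]
into this:
[Image: the same text after the attributes are applied, with the vowel markings shifted]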
So I posted a question to the cocoa-dev mailing list and promptly got an answer back from Douglas Davidson, a developer at Apple whose name pops up a whole lot in connection with the Cocoa text system. He suggested I try applying the attributes to composed character ranges only. That is, making sure the ranges include both the characters and all of the markings that modify them.
Apple supplies the following two NSString instance methods to help the developer find what the ranges of composed character sequences are:
- (NSRange)rangeOfComposedCharacterSequenceAtIndex:(NSUInteger)anIndex
- (NSRange)rangeOfComposedCharacterSequencesForRange:(NSRange)range
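For comparison with my test loop, here is a rough sketch of how the same nonsense attributes could be applied one composed character sequence at a time, using the first of those methods. The function name TagComposedSequences is made up for illustration, and I am assuming the text storage is not being edited by anything else while the loop runs.

```objc
#import <Cocoa/Cocoa.h>

// Hypothetical helper (not part of Cocoa): gives every composed character
// sequence in the text storage its own arbitrary attribute.
static void TagComposedSequences(NSTextStorage *ts)
{
    NSString *text = [ts string];
    NSUInteger length = [text length];
    NSUInteger index = 0;

    [ts beginEditing];
    while (index < length)
    {
        // The returned range covers the base letter and every mark attached to it.
        NSRange r = [text rangeOfComposedCharacterSequenceAtIndex:index];
        [ts addAttribute:[NSString stringWithFormat:@"Nonsense%lu", (unsigned long)r.location]
                   value:[NSString stringWithFormat:@"Gibberish%lu", (unsigned long)r.location]
                   range:r];
        // Jump to the start of the next sequence instead of the next UTF-16 unit.
        index = NSMaxRange(r);
    }
    [ts endEditing];
}
```

Calling TagComposedSequences([textView textStorage]) would stand in for the original loop. The only real difference is that a letter and its niqqud always end up inside the same attribute range.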
Sure enough, making sure that attribute ranges only contain whole composed character sequences solves the problem. But is this correct behavior on the part of the text system? Probably not, especially for custom attributes that shouldn’t affect layout anyway (like my verse markings). In fact, Mr. Davidson suggested I file a bug report on this issue. (Which I will do soon…) But he also suggested that a good rule of thumb is to treat the composed character sequences as the actual characters, since that is what the user will perceive to be a single character.
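For attributes that span longer stretches of text, such as verse markings, that rule of thumb might look something like the sketch below. It widens an arbitrary range to whole composed character sequences before the attribute is applied. The helper name AddAttributeToComposedRange and the @"HRVerse" key in the usage line are invented for this example, and the range passed in is assumed to lie within the bounds of the text.

```objc
#import <Cocoa/Cocoa.h>

// Hypothetical helper (not part of Cocoa): applies a custom attribute only
// after widening the range so it never splits a letter from its marks.
static void AddAttributeToComposedRange(NSTextStorage *ts,
                                        NSString *name,
                                        id value,
                                        NSRange range)
{
    // Assumes `range` is within the string; an out-of-bounds range raises an exception.
    NSRange safe = [[ts string] rangeOfComposedCharacterSequencesForRange:range];
    [ts addAttribute:name value:value range:safe];
}
```

A call such as AddAttributeToComposedRange(ts, @"HRVerse", [NSNumber numberWithInt:3], verseRange) would then mark a verse without ever cutting through a composed character. The trade-off is that the attribute may cover a character or two more than the range that was asked for.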
