mirror of
				https://github.com/ether/etherpad-lite.git
				synced 2025-10-31 16:21:11 +01:00 
			
		
		
		
	
		
			
				
	
	
		
			134 lines
		
	
	
		
			5.6 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			134 lines
		
	
	
		
			5.6 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| 
 | |
| 
 | |
| Copied from the old Etherpad. Found in /infrastructure/ace/
 | |
| 
 | |
| Goals:
 | |
| 
 | |
| - no unicode (for efficient escaping, sightliness)
 | |
| - efficient operations for ACE and collab (attributed text, etc.)
 | |
| - good for time-slider
 | |
| - good for API
 | |
| - line-ending aware
 | |
| X more coherent (deleting or styling text merging with insertion)
 | |
| - server-side syntax highlighting?
 | |
| - unify author map with attribute pool
 | |
| - unify attributed text with changeset rep
 | |
| - not: reversible
 | |
| - force final newline of document to be preserved
 | |
| 
 | |
| - Unicode bad:
 | |
|   - ugly (hard to read)
 | |
|   - more complex to parse
 | |
|   - harder to store and transmit correctly
 | |
|   - doesn't save all that much space anyway
 | |
|   - blows up in size when string-escaped
 | |
|   - embarrassing for API
 | |
| 
 | |
| 
 | |
| # Attributes:
 | |
| 
 | |
| An "attribute" is a (key,value) pair such as (author,abc123456) or
 | |
| (bold,true).  Sometimes an attribute is treated as an instruction to
 | |
| add that attribute, in which case an empty value means to remove it.
 | |
| So (bold,) removes the "bold" attribute.  Attributes are interned and
 | |
| given numeric IDs, so the number "6" could represent "(bold,true)",
 | |
| for example.  This mapping is stored in an attribute "pool" which may
 | |
| be shared by multiple changesets.
 | |
| 
 | |
| Entries in the pool must be unique, so that attributes can be compared
 | |
| by their IDs.  Attribute names cannot contain commas.
 | |
| 
 | |
| A changeset looks something like the following:
 | |
| 
 | |
| Z:5g>1|5=2p=v*4*5+1$x
 | |
| 
 | |
| With the corresponding pool containing these entries:
 | |
| 
 | |
| ...
 | |
| 4 -> (author,1059348573)
 | |
| 5 -> (bold,true)
 | |
| ...
 | |
| 
 | |
| This changeset, together with the pool, represents inserting
 | |
| a bold letter "x" into the middle of a line.  The string consists of:
 | |
| 
 | |
| - a letter Z (the "magic character" and format version identifier)
 | |
| - a series of opcodes (punctuation) and numeric values in base 36 (the
 | |
|   alphanumerics)
 | |
| - a dollar sign ($)
 | |
| - a string of characters used by insertion operations (the "char bank")
 | |
| 
 | |
| If we separate out the operations and convert the numbers to base 10, we get:
 | |
| 
 | |
| Z :196 >1 |5=97 =31 *4 *5 +1 $"x"
 | |
| 
 | |
| Here are descriptions of the operations, where capital letters are variables:
 | |
| 
 | |
| ":N" : Source text has length N (must be first op)
 | |
| ">N" : Final text is N (positive) characters longer than source text (must be second op)
 | |
| "<N" : Final text is N (positive) characters shorter than source text (must be second op)
 | |
| ">0" : Final text is same length as source text
 | |
| "+N" : Insert N characters from the bank, none of them newlines
 | |
| "-N" : Skip over (delete) N characters from the source text, none of them newlines
 | |
| "=N" : Keep N characters from the source text, none of them newlines
 | |
| "|L+N" : Insert N characters from the source text, containing L newlines.  The last
 | |
|          character inserted MUST be a newline, but not the (new) document's final newline.  
 | |
| "|L-N" : Delete N characters from the source text, containing L newlines. The last
 | |
|          character inserted MUST be a newline, but not the (old) document's final newline.
 | |
| "|L=N" : Keep N characters from the source text, containing L newlines.  The last character
 | |
|          kept MUST be a newline, and the final newline of the document is allowed.
 | |
| "*I"   : Apply attribute I from the pool to the following +, =, |+, or |= command.
 | |
|          In other words, any number of * ops can come before a +, =, or | but not
 | |
| 	 between a | and the corresponding + or =.
 | |
|          If +, text is inserted having this attribute.  If =, text is kept but with
 | |
| 	 the attribute applied as an attribute addition or removal.
 | |
| 	 Consecutive attributes must be sorted lexically by (key,value) with key
 | |
| 	 and value taken as strings.  It's illegal to have duplicate keys
 | |
| 	 for (key,value) pairs that apply to the same text.  It's illegal to
 | |
| 	 have an empty value for a key in the case of an insertion (+), the
 | |
| 	 pair should just be omitted.
 | |
| 
 | |
| Characters from the source text that aren't accounted for are assumed to be kept
 | |
| with the same attributes.
 | |
| 
 | |
| Additional Constraints:
 | |
| 
 | |
| - Consecutive +, -, and = ops of the same type that could be combined are not allowed.
 | |
|   Whether combination is possible depends on the attributes of the ops and whether
 | |
|   each is multiline or not.  For example, two multiline deletions can never be
 | |
|   consecutive, nor can any insertion come after a non-multiline insertion with the
 | |
|   same attributes.
 | |
| - "No-op" ops are not allowed, such as deleting 0 characters.  However, attribute
 | |
|   applications that don't have any effect are allowed.
 | |
| - Characters at the end of the source text cannot be explicitly kept with no changes;
 | |
|   if the change doesn't affect the last N characters, those "keep" ops must be left off.
 | |
| - In any consecutive sequence of insertions (+) and deletions (-) with no keeps (=),
 | |
|   the deletions must come before the insertions.
 | |
| - The document text before and after will always end with a newline.  This policy avoids
 | |
|   a lot of special-casing of the end of the document.  If a final newline is
 | |
|   always added when importing text and removed when exporting text, then the
 | |
|   changeset representation can be used to process text files that may or may not
 | |
|   have a final newline.
 | |
| 
 | |
| Attribution string:
 | |
| 
 | |
| An "attribution string" is a series of inserts with no deletions or keeps.
 | |
| For example, "*3+8|1+5" describes the attributes of a string of length 13,
 | |
| where the first 8 chars have attribute 3 and the next 5 chars have no
 | |
| attributes, with the last of these 5 chars being a newline.  Constraints
 | |
| apply similar to those affecting changesets, but the restriction about
 | |
| the final newline of the new document being added doesn't apply.
 | |
| 
 | |
| Attributes in an attribution string cannot be empty, like "(bold,)", they should
 | |
| instead be absent.
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| -------
 | |
| Considerations:
 | |
| 
 | |
| - composing changesets/attributions with different pools
 | |
| - generalizing "applyToAttribution" to make "mutateAttributionLines" and "compose"
 |