LRM YAML Support

YAML Parser

LRM supports YAML 1.2 and uses the SnakeYAML 1.33 engine to parse files. More information is available on the SnakeYAML and YAML project sites.

Supported YAML syntax

The following syntax is supported by LRM.

Ordered Map

 # Explicitly typed ordered map (dictionary).
 Bestiary: !!omap
   - aardvark: African pig-like ant eater.
   - anteater: South-American ant eater.
   - anaconda: South-American constrictor snake. 
 # Flow style
 Numbers1: !!omap [ 1: one, 2: two, 
        3: three ]
 
 #Implicit ordered map
 Numbers2:
   - {1: one}
   - {2: two}
   - {3: three}
 
 #Block style ordered map with duplicate keys
 Block style tasks:
   - meeting: Meeting with team.
   - meeting: Meeting with boss.
   - break: Lunch break.
   - meeting: Meeting with client.
 
 #Flow style ordered map with duplicate keys
 Flow tasks: [ meeting: Meeting with team, meeting: Meeting with boss ]

LRM parsing for Ordered Map

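Sequences have a separator of _^a^_, with the 1-based position of the item inserted after the a (for example, _^a1^_). Nested keys have a separator of _^o^_.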

The key/value pairs created by LRM are:

Key Value
Bestiary_^a1^__^o^_aardvark African pig-like ant eater.
Bestiary_^a2^__^o^_anteater South-American ant eater.
Bestiary_^a3^__^o^_anaconda South-American constrictor snake.
Numbers1_^a1^__^o^_1 one
Numbers1_^a2^__^o^_2 two
Numbers1_^a3^__^o^_3 three
Numbers2_^a1^__^o^_1 one
Numbers2_^a2^__^o^_2 two
Numbers2_^a3^__^o^_3 three
Block style tasks_^a1^__^o^_meeting Meeting with team.
Block style tasks_^a2^__^o^_meeting Meeting with boss.
Block style tasks_^a3^__^o^_break Lunch break.
Block style tasks_^a4^__^o^_meeting Meeting with client.
Flow tasks_^a1^__^o^_meeting Meeting with team
Flow tasks_^a2^__^o^_meeting Meeting with boss
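
The key construction shown in the table above can be sketched in a few lines of Python. This is only an illustration of the naming scheme, not LRM's implementation: the function name flatten is hypothetical, the input is assumed to be an already-parsed structure of dicts and lists, and an ordered map is modeled as a list of single-entry dicts so that duplicate keys survive. Quote characters, which LRM keeps in the value, are not modeled here.

```python
def flatten(node, prefix=""):
    """Flatten nested dicts/lists into (key, value) pairs with LRM-style separators."""
    pairs = []
    if isinstance(node, dict):
        for name, child in node.items():
            # The _^o^_ separator is only added once there is something before it.
            key = f"{prefix}_^o^_{name}" if prefix else str(name)
            pairs.extend(flatten(child, key))
    elif isinstance(node, list):
        # Sequence items are numbered from 1, as in the tables on this page.
        for position, child in enumerate(node, start=1):
            pairs.extend(flatten(child, f"{prefix}_^a{position}^_"))
    else:
        pairs.append((prefix, node))
    return pairs


# Ordered map modeled as a list of single-entry dicts (assumption of this sketch).
example = {
    "Block style tasks": [
        {"meeting": "Meeting with team."},
        {"meeting": "Meeting with boss."},
        {"break": "Lunch break."},
    ]
}

for key, value in flatten(example):
    print(key, "->", value)
```

Because the position is part of the key, the two meeting entries receive different keys even though their names are identical.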

Lists

 twobytwotable:
 - - 'a1'
   - "a2"
 - - b1
   - b2
 twobytwotable2:
 - [a1, a2]
 - ['b1', "b2"]
 list:
 - - 'a1'
 - - "b1"
 - - c1

LRM parsing for Lists

Sequences have a separator of _^a^_

The key/value pairs created by LRM are:

Key Value
twobytwotable_^a1^__^a1^_ 'a1'
twobytwotable_^a1^__^a2^_ "a2"
twobytwotable_^a2^__^a1^_ b1
twobytwotable_^a2^__^a2^_ b2
twobytwotable2_^a1^__^a1^_ a1
twobytwotable2_^a1^__^a2^_ a2
twobytwotable2_^a2^__^a1^_ 'b1'
twobytwotable2_^a2^__^a2^_ "b2"
list_^a1^__^a1^_ 'a1'
list_^a2^__^a1^_ "b1"
list_^a3^__^a1^_ c1
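
When reading these keys back, for example in a log or a translation file, it can help to split them into their parts. The following is a minimal sketch that assumes the separators appear exactly as in the tables on this page; split_key is a hypothetical helper, not an LRM function.

```python
import re

# Matches a sequence separator such as _^a2^_ or the nested-key separator _^o^_.
_SEPARATOR = re.compile(r"_\^(?:a(\d+)|o)\^_")


def split_key(lrm_key):
    """Split a key into names (str) and 1-based sequence positions (int)."""
    path = []
    cursor = 0
    for match in _SEPARATOR.finditer(lrm_key):
        name = lrm_key[cursor:match.start()]
        if name:
            path.append(name)
        if match.group(1) is not None:
            # Only the sequence separator carries a position.
            path.append(int(match.group(1)))
        cursor = match.end()
    if lrm_key[cursor:]:
        path.append(lrm_key[cursor:])
    return path


print(split_key("twobytwotable_^a2^__^a1^_"))
```

For twobytwotable_^a2^__^a1^_ the result names the list and then the row and column positions, which matches the first item of the second row in the YAML example.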

Literal Blocks

 # Multiple-line strings can be written as a 'literal block' (using |).
  literal_block: |
     'This entire block of text will be the value of the 'literal_block' key,
     with line breaks being preserved.
     
     The literal continues until de-dented, and the leading indentation is
     stripped.
  
         Any lines that are 'more-indented' keep the rest of their indentation -
         these lines will be indented by 4 spaces.'

LRM parsing for Literal Block

The key/value pairs created by LRM are:

Key Value
literal_block 'This entire block of text will be the value of the 'literal_block' key,
with line breaks being preserved.

The literal continues until de-dented, and the leading indentation is
stripped.

Any lines that are 'more-indented' keep the rest of their indentation -
these lines will be indented by 4 spaces.'
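
The indentation rule described in the example (leading indentation is stripped, more-indented lines keep the remainder) behaves much like Python's textwrap.dedent. The sketch below is only an analogy for well-formed blocks; it does not call a YAML parser and the sample text is made up.

```python
import textwrap

# Raw block content as it might appear under "literal_block: |" (sample text).
raw_block = (
    "    The literal continues until de-dented.\n"
    "\n"
    "        More-indented lines keep their extra indentation.\n"
)

# dedent removes the indentation shared by all non-blank lines.
value = textwrap.dedent(raw_block)
print(value)
```

The first line loses its four leading spaces, the blank line is preserved, and the more-indented line keeps four of its eight spaces.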

Unordered Set

 Block style:
   login : Log In
   logout : "Log Out"
   name  : 'Name'
 Flow style: { login: 'Log In', logout: Log Out, name: "Name" }

LRM parsing for Unordered Set

Nested keys have a separator of _^o^_.

The key/value pairs created by LRM are:

Key Value
Block style_^o^_login Log In
Block style_^o^_logout "Log Out"
Block style_^o^_name 'Name'
Flow style_^o^_login 'Log In'
Flow style_^o^_logout Log Out
Flow style_^o^_name "Name"

Multi-line Keys

 # Keys can also be complex, like multi-line objects
 # We use ? followed by a space to indicate the start of a complex key.
 ? |
   This is a key
   that has multiple lines
 : "and this is its value"

LRM parsing for Multi-line Keys

The key/value pairs created by LRM are:

Key Value
This is a key
that has multiple lines "and this is its value"

JSON style

 # Since YAML is a superset of JSON, you can also write JSON-style maps and
 # sequences:
 json_map: {"key": "value"}
 json_seq: [3, 2, 1, 'takeoff']
 and quotes are optional: {key: [3, 2, 1, takeoff]}

LRM parsing for JSON style

Nested keys have a separator of _^o^_. Sequences have a separator of _^a^_

The key/value pairs created by LRM are:

Key Value
json_map_^o^_key "value"
json_seq_^a1^_ 3
json_seq_^a2^_ 2
json_seq_^a3^_ 1
json_seq_^a4^_ 'takeoff'
and quotes are optional_^o^_key_^a1^_ 3
and quotes are optional_^o^_key_^a2^_ 2
and quotes are optional_^o^_key_^a3^_ 1
and quotes are optional_^o^_key_^a4^_ takeoff

Topmost List

A topmost list is a list that does not have a parent key. If a file contains a topmost list, then no other syntax type can be included.

 - key: login
   msg: "Log In"
 - key: logout
   msg: 'Log Out'

LRM parsing for Topmost list

Nested keys have a separator of _^o^_. Sequences have a separator of _^a^_

The key/value pairs created by LRM are:

Key Value
_^a1^__^o^_key login
_^a1^__^o^_msg "Log In"
_^a2^__^o^_key logout
_^a2^__^o^_msg 'Log Out'

Anchors

 # YAML also has a handy feature called 'anchors', which let you easily duplicate
 # content across your document. Both of these keys will have the same value:
 anchored_content: &anchor_name This string will appear as the value of two keys.
 other_anchor: *anchor_name
 
 # Anchors can be used to duplicate/inherit properties
 base: &base
   name: Everyone has same name
   
 foo: &foo
   <<: *base
   age: foo age
   
 bar: &bar
   <<: *base
   age: bar age
 
 # foo and bar would also have name: Everyone has same name

LRM parsing for Anchors

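Nested keys have a separator of _^o^_. Only the anchored definitions appear in the output: the alias other_anchor and the name entries merged into foo and bar through <<: *base are not listed as separate keys.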

The key/value pairs created by LRM are:

Key Value
anchored_content This string will appear as the value of two keys.
base_^o^_name Everyone has same name
foo_^o^_age foo age
bar_^o^_age bar age

Language Tags

A language tag is a top-level key, at the top of the file, that identifies the language of the file and contains all other keys.

 en:
   Block style:
     login : Log In
     logout : "Log Out"
     name  : 'Name'
   Flow style: { login: 'Log In', logout: Log Out, name: "Name" }

The pattern of the language tag is configurable

In the above example, 'en' is a language-only language tag, which is the default pattern. The language tag pattern is configurable through the config_lrm_info.properties file and must match the pattern used in your resource files. All resource files must use the same pattern consistently, or 'MISSING_KEY' errors may occur.

LRM ignores the YAML language tag when determining the resource keys

If a YAML file contains nested keys or arrays, then LRM flattens out the structure so that a unique key can be created. For example, each field name within a nested object has a separator of _^o^_. The language tag is not included in the creation of the unique key. The following is an example of the unique LRM resource keys created from the above YAML file.

The key/value pairs created by LRM are:

Key Value
Block style_^o^_login Log In
Block style_^o^_logout "Log Out"
Block style_^o^_name 'Name'
Flow style_^o^_login 'Log In'
Flow style_^o^_logout Log Out
Flow style_^o^_name "Name"
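
The effect of ignoring the language tag can be sketched as follows. The regular expression is a hypothetical stand-in for a language-only pattern (two lowercase letters); the actual pattern syntax used in config_lrm_info.properties is not shown on this page, and strip_language_tag is an illustrative name only.

```python
import re

# Hypothetical language-only pattern, e.g. "en" or "fr" (assumption of this sketch).
LANGUAGE_ONLY = re.compile(r"[a-z]{2}")


def strip_language_tag(document, pattern=LANGUAGE_ONLY):
    """Return the content under a single top-level language tag, else the document."""
    if isinstance(document, dict) and len(document) == 1:
        tag, content = next(iter(document.items()))
        if isinstance(tag, str) and pattern.fullmatch(tag):
            return content
    return document


parsed = {"en": {"Block style": {"login": "Log In"}}}
print(strip_language_tag(parsed))
```

After the tag is removed, the remaining structure starts at Block style, which is why the keys in the table above do not begin with en.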

Global Tags

A global tag is a static top-level key that is in all files but should be ignored by Localyzer when determining keys to be sent out for translation. Example:

    ProjectLogin:
       nestedKey:
          login: 'Log In'
          logout: "Log Out"
          name: Name
       

Global tags are configurable

In the above example, 'ProjectLogin' is a global tag and needs to be added to the global.tags attribute of the config_lrm_info.properties file. There can be multiple global tags as long as they are at the top of the file. In the above example, both ProjectLogin and nestedKey could be global tags. Global tags are expected to be in all files, both base and target.
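
One way to picture how global tags are skipped is the sketch below. It assumes that each global tag is the only key at its level and that the configured tags are available as a Python set; strip_global_tags is a hypothetical helper, and the way LRM reads global.tags is not shown here.

```python
def strip_global_tags(document, global_tags):
    """Descend through leading keys that are listed as global tags."""
    node = document
    while isinstance(node, dict) and len(node) == 1:
        tag, content = next(iter(node.items()))
        if tag not in global_tags:
            break
        node = content
    return node


parsed = {
    "ProjectLogin": {
        "nestedKey": {"login": "Log In", "logout": "Log Out", "name": "Name"}
    }
}

# With one global tag, nestedKey is still part of the remaining structure.
print(strip_global_tags(parsed, {"ProjectLogin"}))
# With both tags configured, only the three translatable entries remain.
print(strip_global_tags(parsed, {"ProjectLogin", "nestedKey"}))
```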

Unsupported YAML syntax

The following syntax is not supported by LRM. If a file contains unsupported syntax then an error will occur when reading the file.

Set format

A set is an unordered collection of nodes such that no two nodes are equal.

 baseball players: !!set
   ? Mark McGwire
   ? Sammy Sosa
   ? Ken Griffey
 # Flow style
 baseball teams: !!set { Boston Red Sox, Detroit Tigers, New York Yankees }

Sequence format

A sequence is a collection indexed by sequential integers starting with zero.

 Block style: !!seq
 - Mercury
 - Pluto
 
 Flow style: !!seq [ Mercury,  Pluto ]

Multiple Documents

A file that contains multiple documents, indicated by '---' and '...', is not supported.

 ---
 doc1: value1
 ...
 ---
 doc2: value2
 ...
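
A file can be checked for this case before it is handed to LRM. The following heuristic is a sketch, not part of LRM: it only counts lines that consist of '---' at the start of the line, and it treats more than one such line as a sign of multiple documents. The function name is hypothetical.

```python
def looks_multi_document(text):
    """Return True if the text contains more than one '---' document start line."""
    starts = 0
    for line in text.splitlines():
        # Only an unindented marker on its own line is counted.
        if line.rstrip() == "---":
            starts += 1
    return starts > 1


sample = "---\ndoc1: value1\n...\n---\ndoc2: value2\n...\n"
print(looks_multi_document(sample))
```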

Blank Values

An empty string, such as "", is allowed, but a blank value is not.

 this_is_valid: ""
 this_is_invalid:
 this_is_valid2: ''
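
Blank values can be located ahead of time with a small check. This sketch assumes the file has already been parsed into dicts and lists by a YAML parser that represents a blank value as None and an empty string as ""; find_blank_values is a hypothetical helper.

```python
def find_blank_values(node, path=()):
    """Return the paths of entries whose value is missing (None); "" is accepted."""
    blanks = []
    if isinstance(node, dict):
        for name, child in node.items():
            blanks.extend(find_blank_values(child, path + (name,)))
    elif isinstance(node, list):
        for position, child in enumerate(node, start=1):
            blanks.extend(find_blank_values(child, path + (position,)))
    elif node is None:
        blanks.append(path)
    return blanks


parsed = {"this_is_valid": "", "this_is_invalid": None}
print(find_blank_values(parsed))
```

Only this_is_invalid is reported, because the empty string is a real value while the blank entry has none.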

Multi-lines that are not literal strings

 key1: 'this\n
    is \n
   a test'

Folded Style (>)

 folded_style: >
     This entire block of text will be the value of 'folded_style', but this
     time, all newlines will be replaced with a single space.
     
     Blank lines, like above, are converted to a newline character.
      
       'More-indented' lines keep their newlines, too -
       this text will appear over two lines.

Mixing sequence with simple key/values

A topmost sequence and simple key/value pairs cannot be in the same file. See Topmost List.

 - key: login
   msg: "Log In"
 - key: logout
   msg: 'Log Out'
 simplekey: Simple value