LRM html Support

From Lingoport Wiki

Examples of .html or .htm files, or files with another extension that uses the html parser type

1. 
<html>
  <p>This is some text in an html file</p>
</html>

2.  
<body>
  <p>This is an html text fragment</p>
</body>

html parser type

valid html syntax

Files that use the html parser are expected to have valid html syntax.

htm/html uses the html parser type

When defining a project that contains LRM standard .html or .htm resource files, there is no need to define a <parser-type>, because the html parser is always used for these extensions.

unique file extension needs to define html parser type

If files with a non-standard (unique) extension contain valid html, then the <parser-type> for that extension should be set to html in the project definition file.

XHTML - well-formed xml needs to define html parser type

A well-formed XML document, such as a .dita file, can be parsed with the html parser type. If the file instead contains well-defined keys and values, as in a .resx or strings.xml file, then the xml parser type should be used.
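
As an illustration of the .dita case, the following is a minimal sketch of a resource extension that routes .dita files to the html parser. It mirrors the .myext entry in the project definition example on this page. It assumes that .dita is not a standard LRM extension, and the pattern and encoding values are simply copied from that example, so they are placeholders to adjust for the actual project layout.

```xml
<resource-extensions>
  <resource-extension>
    <!-- Assumption: .dita is not a standard LRM extension, so parser-type must be set explicitly -->
    <extension>dita</extension>
    <parser-type>html</parser-type>
    <!-- Assumption: values below are copied from the html example on this page; adjust as needed -->
    <file-name-pattern>*-l_c_v</file-name-pattern>
    <use-pattern-on-dflt-locale>0</use-pattern-on-dflt-locale>
    <file-location-pattern>l_c_v</file-location-pattern>
    <use-location-pattern-on-dflt-locale>1</use-location-pattern-on-dflt-locale>
    <base-file-encoding>UTF-8</base-file-encoding>
    <localized-file-encoding>UTF-8</localized-file-encoding>
    <parameter-regex-pattern></parameter-regex-pattern>
  </resource-extension>
</resource-extensions>
```

With a definition like this, each .dita file would be treated as a single unit of text, in the same way as an .html file.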

LRM interaction with html parser type files

Number of keys in file is 1

Every file parsed with the html parser has exactly 1 key, named key1. The value that corresponds to this key is the entire content of the file. Because there are no individual key/value pairs, html parsed files cannot be instrumented (that is, used in our InContext Reviewer/Translation product).

Prep kit files are always full file

If the checksum of the base file has changed, the file is sent out in the next prep kit for all target locales. Because the file contains only 1 key, the entire file is sent out for translation.

File can be pseudo-localized and number of words counted

Since LRM is able to parse the text portion of html parsed files, the files can be pseudo-localized and the number of words counted.

Example of Project Definition for Resources

The following is an example of html resource file definitions. See resource extensions for more information.

  <resource-extensions>
   <resource-extension>
     <!-- parser-type not needed since .html is a standard LRM extension that maps to the html parser type -->
     <extension>html</extension>
     <file-name-pattern>*-l_c_v</file-name-pattern>
     <use-pattern-on-dflt-locale>0</use-pattern-on-dflt-locale>
     <file-location-pattern>l_c_v</file-location-pattern>
     <use-location-pattern-on-dflt-locale>1</use-location-pattern-on-dflt-locale>
     <base-file-encoding>UTF-8</base-file-encoding>
     <localized-file-encoding>UTF-8</localized-file-encoding>
     <parameter-regex-pattern></parameter-regex-pattern>
   </resource-extension>
   <resource-extension>
     <!-- parser-type not needed since .htm is a standard LRM extension that maps to the html parser type -->
     <extension>htm</extension>
     <file-name-pattern>*-l_c_v</file-name-pattern>
     <use-pattern-on-dflt-locale>0</use-pattern-on-dflt-locale>
     <file-location-pattern>l_c_v</file-location-pattern>
     <use-location-pattern-on-dflt-locale>1</use-location-pattern-on-dflt-locale>
     <base-file-encoding>UTF-8</base-file-encoding>
     <localized-file-encoding>UTF-8</localized-file-encoding>
     <parameter-regex-pattern></parameter-regex-pattern>
   </resource-extension>
   <resource-extension>
     <!-- parser-type is required because .myext is not a standard LRM extension -->
     <extension>myext</extension>
     <parser-type>html</parser-type>
     <file-name-pattern>*-l_c_v</file-name-pattern>
     <use-pattern-on-dflt-locale>0</use-pattern-on-dflt-locale>
     <file-location-pattern>l_c_v</file-location-pattern>
     <use-location-pattern-on-dflt-locale>1</use-location-pattern-on-dflt-locale>
     <base-file-encoding>UTF-8</base-file-encoding>
     <localized-file-encoding>UTF-8</localized-file-encoding>
     <parameter-regex-pattern></parameter-regex-pattern>
   </resource-extension>
  </resource-extensions>