OpenGATE Contents | GATE development concepts: Strings

Problem description

Managing text is a major feature that all programming languages and development frameworks should provide. Character strings (or just STRING variables) are present in all C and C++ libraries, but how their content is encoded and which dependencies are required is highly implementation-specific.

Solution

The GATE Framework defines its own string structure (gate_string_t) to ensure a stable and interchangeable API that can carry string contents between multiple languages and C or C++ dialects.

  1. GATE strings are sequences of consecutive BYTE characters; the managing structure stores a pointer to the first character and a length value.
    Such strings do not need to end with a NUL character. If a NUL character is appended to the BYTE buffer, it is NOT counted in the length value.
  2. gate_string_t instances can manage static or external data by holding only the pointer-length pair. They can also contain an additional reference-counted string buffer that keeps track of dynamically allocated content.
  3. Strings are dynamically created within a String Builder instance (gate_strbuilder_t) where any kind of manipulation and appending of content is fully supported. When all required manipulations are applied, the results of a string builder can be transferred into a GATE string structure.
  4. All GATE string instances are immutable and are not allowed to be modified after their creation. The only legal access direction is to read their content.
  5. The one and only text encoding within a GATE string structure is UTF-8. Whenever encoding or formatting operations are required, the contents of a string need to be UTF-8 encoded from construction onward. Note: To improve performance, input data for strings is not validated separately; bytes are simply taken and processed. But when it comes to string conversion or operating system communication, only UTF-8 content can be handled correctly.
  6. String contents can be shared by following methods:
    • copy creation: dynamically allocates a new byte-by-byte copy of the source string
    • cloning: shares a dynamically created string by incrementing its reference counter
      OR: creates a new dynamically allocated string in case of an unmanaged source
    • duplication: duplicates only the string reference of the source. If the source was dynamically allocated, the duplicate shares the content by reference counting. If the source was unmanaged, the duplicate just copies the unmanaged pointers without any further handling.
  7. It is explicitly allowed to create new shared string references from existing ones, where the new string references only a subset of the full string buffer (but holds a reference count on the full buffer).
    e.g.: gate_string_substr() does no unnecessary copying, it shares the string-buffer of its source but updates its pointer-length pair to address only the desired part of the original string.
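
The pointer-length pair, the optional reference-counted buffer, and the copy-free substring described above can be illustrated with a small stand-alone sketch. It does not use the GATE headers and does not reproduce the real gate_string_t layout: every name in it (sketch_buffer_t, sketch_string_t, sketch_copy, sketch_duplicate, sketch_substr, sketch_release, sketch_demo) is hypothetical and chosen only for illustration, and it ignores thread safety.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted buffer; NOT the real GATE layout. */
typedef struct sketch_buffer {
    size_t refcount;   /* number of strings sharing this buffer */
    char   data[];     /* the bytes themselves (C99 flexible array member) */
} sketch_buffer_t;

/* Hypothetical string: a pointer-length pair plus an optional buffer. */
typedef struct sketch_string {
    const char*      str;     /* first byte of the visible text */
    size_t           length;  /* byte count, a trailing NUL is not counted */
    sketch_buffer_t* buffer;  /* NULL for static or unmanaged content */
} sketch_string_t;

/* Unmanaged string: only wraps existing bytes, allocates nothing. */
sketch_string_t sketch_static(const char* text)
{
    sketch_string_t s = { text, strlen(text), NULL };
    return s;
}

/* Copy creation: allocates a new managed byte-by-byte copy. */
int sketch_copy(sketch_string_t* dst, const char* text, size_t length)
{
    sketch_buffer_t* buf = malloc(sizeof(sketch_buffer_t) + length + 1);
    if (buf == NULL) return -1;
    buf->refcount = 1;
    memcpy(buf->data, text, length);
    buf->data[length] = '\0';   /* convenience terminator, not counted */
    dst->str = buf->data;
    dst->length = length;
    dst->buffer = buf;
    return 0;
}

/* Duplication: shares a managed buffer, or just copies unmanaged pointers. */
void sketch_duplicate(sketch_string_t* dst, const sketch_string_t* src)
{
    *dst = *src;
    if (dst->buffer != NULL) ++dst->buffer->refcount;
}

/* Substring: same buffer, narrowed pointer-length pair, no byte copying. */
void sketch_substr(sketch_string_t* dst, const sketch_string_t* src,
                   size_t offset, size_t length)
{
    if (offset > src->length) offset = src->length;
    if (length > src->length - offset) length = src->length - offset;
    sketch_duplicate(dst, src);
    dst->str += offset;
    dst->length = length;
}

/* Release: drops one reference and frees the buffer with the last one. */
void sketch_release(sketch_string_t* s)
{
    if (s->buffer != NULL && --s->buffer->refcount == 0) free(s->buffer);
    s->str = NULL;
    s->length = 0;
    s->buffer = NULL;
}

/* Usage: the substring keeps the shared buffer alive after its source is gone. */
void sketch_demo(void)
{
    sketch_string_t text, tail;
    if (sketch_copy(&text, "Hello world", 11) != 0) return;
    sketch_substr(&tail, &text, 6, 5);   /* views "world" inside the buffer */
    assert(tail.buffer->refcount == 2);
    sketch_release(&text);               /* buffer survives, tail still holds it */
    assert(tail.length == 5 && memcmp(tail.str, "world", 5) == 0);
    sketch_release(&tail);               /* last reference frees the buffer */
}
```

Because the strings are treated as immutable, sharing one buffer between the full string and its substring is safe in this sketch: no holder can change bytes that another holder still reads.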

C Example

#include <gate/strings.h>

int main()
{
  gate_strbuilder_t builder = GATE_INIT_EMPTY;
  gate_string_t dynamic_text = GATE_INIT_EMPTY;
  gate_string_t suffix_text = GATE_INIT_EMPTY;
  gate_size_t position;
  /* static non-allocated string: */
  gate_string_t static_text = 
    GATE_STRING_INIT_STATIC("world");
  
  /* build dynamic string-buffer: */
  gate_strbuilder_create(&builder, 0);
  gate_strbuilder_append_cstr(&builder, "Hello ");
  gate_strbuilder_append_string(&builder, &static_text);
  gate_strbuilder_append_cstr(&builder, " from ");
  gate_strbuilder_append_int32(&builder, 42);
  gate_strbuilder_append_cstr(&builder, " other realms");
  
  /* use dynamic string: */
  gate_strbuilder_to_string(&builder, &dynamic_text);
  gate_strbuilder_release(&builder);
  
  /* extract everything that follows "world": */
  position = gate_string_pos(&dynamic_text, &static_text, 0);
  if(position != GATE_STR_NPOS)
  {  
    gate_string_substr(&suffix_text, &dynamic_text, 
      position + gate_string_length(&static_text), GATE_STR_NPOS);
  }
  
  /* cleanup */
  gate_string_release(&suffix_text);
  gate_string_release(&dynamic_text);

  return 0;
}

C++ Example

#include <gate/strings.hpp>

int main()
{
  using namespace gate;

  // static non-allocated string:
  static String const staticText = 
    String::createStatic("world");
  StringBuilder builder;
  
  // build dynamic string-buffer:
  builder << "Hello " << staticText
          << " from " << 42 << " other realms";
                  
  String dynamicText = builder.toString();
  size_t position = dynamicText.positionOf(staticText);
  if(position != String::npos)
  {
    // extract everything that follows "world":
    String suffix = dynamicText.substr(
          position + staticText.length());
  }
  
  return 0;
}