The Standard C++ Locale

A version of the following article appeared in Dr. Dobb's Journal.


by Nathan C. Myers
<http://www.cantrip.org/>

 

When your users' language matches your own, it is not hard to make your programs use it correctly in menus, dates, messages, and sorted lists. Software travels, though, and these days you and most potential users of your code may have only a machine language in common. The market for your code becomes immensely larger if it can accommodate their needs, too.

The description of where your program is running, and of your user's preferences, is called its locale. Keeping the locale separate from the program code so that it can be changed easily is called internationalization. In C the locale describes only what is common to an entire country, but the C++ locale is more flexible.

Challenges

In supporting internationalization, the C++ Standard Library confronts many challenges. Your server programs may have many clients around the world, so the standard library must support more than one locale per program. You use (or will soon use) multi-threading, so it must be re-entrant. The ways your users' cultures and preferences differ are unlimited, so it must be extensible. Because you may not (yet) care about locales, it must be ignorable. Finally, because it is standard, it must be efficient, easy, and safe to use.

The answer in the Standard C++ library to all these requirements uses the full power of the language, including new standard features not yet implemented in all compilers [as of 1997; all current compilers handle this code without trouble]. This article suggests how the C++ locale library may be used. It also shows how it might be implemented, so you can use the same techniques in your own programs.

The Locale Object

A locale in the C (not C++) standard describes a character encoding (with sorting rules) and the formats for a few value types: numbers, dates, money. These represent only the edge of a vast continent of real cultural and personal choices; others commonly expressed include time zone, measurement system, paper size, window colors and fonts, citizenship, letterhead, e-mail address, sex, and shoe size. No standard or standards can list all your preferences. The C++ standard only provides (as examples) those categories found in the traditional C locale libraries--but you can extend it.

The key to understanding the C++ locale library is the facet. A facet, informally, is a class interface for a service that may be obtained from a locale object. For example, the Standard C++ library facet num_put<> formats numeric values, and collate<> provides an ordering for string values. A facet is also an object contained in a locale. Each locale object contains a set of facet objects to provide these services.

The C++ locale is a simple object that can be (efficiently) passed around, copied, and assigned just like any built-in value. Functions that take a locale argument can declare a default argument value, locale(), which is a copy of the current global locale; this allows you to omit the argument and get reasonable behavior. Each iostream instance keeps a locale object on hand, for use by the operators >> and <<. These measures give the locale facilities a low profile so they won't intrude where their more powerful features are not needed.
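As a sketch of the default-argument convention, here is a hypothetical function, upcase(), that is not part of any library. A caller can pass a locale explicitly, or leave it off and get a copy of the global locale, which is the classic "C" locale unless the program has called std::locale::global().

```cpp
#include <locale>
#include <string>

// Hypothetical helper: upper-cases s using the ctype<char> facet of loc.
// Callers who omit the second argument get a copy of the global locale.
std::string upcase(std::string s, std::locale const& loc = std::locale())
{
    for (char& c : s)
        c = std::toupper(c, loc);  // the <locale> overload of toupper
    return s;
}
```

A call such as upcase("hello") needs no knowledge of locales at all; code that cares can write upcase(name, someLocale).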

A Date Class Example

How does this look, in code? Example 1 is a simple Date class.


Example 1: A simple Date class


  // file: date.h
  #include <iosfwd> // for istream, ostream
  #include <ctime>  // for struct tm

  namespace ChronLib {
  class Date {
    long day;  // days since 1752-09-14
   public:
    enum Month { jan, feb, mar, apr, may, jun, 
                 jul, aug, sep, oct, nov, dec };
  
    Date(int year, Month month, int day);
    void asCLibTime(struct std::tm*) const;  
  };

  std::ostream&
  operator<< (std::ostream& os, Date const& date);

  std::istream&
  operator>> (std::istream& is, Date& date);
  }


The standard headers <iosfwd> and <ctime> declare the standard names used in the example.

Nothing about locales or facets is visible here; just a constructor, stream operators, and a single member function, asCLibTime(), which fills a Standard C library struct tm for communication with other libraries. (See your strftime() manual page for more information about struct tm.) Thus, you don't need to know anything about locales to be able to use class Date.

The formats your users expect to see for dates vary all over the map. If you coded a format into operators >> and <<, you would leave most of your potential users dissatisfied. Instead, when you write those operators, you can delegate the formatting to the locale object kept by the stream. Example 3 demonstrates this.

An Example Program

How can your users control the format produced by an operator<< that uses locales? Example 2 may be the simplest possible example.


Example 2: A program that uses your preferred locale


  #include <iostream>  /* for cout */
  #include <locale>    /* for locale */
  #include "date.h" 
  
  int main()
  {
    using ChronLib::Date;
    std::cout.imbue( std::locale("") );
    std::cout << Date(1942, Date::dec, 7) << std::endl;
    return 0;
  }


The standard headers, <iostream> and <locale>, declare the names std::locale, std::cout, and std::endl, used here, and "date.h" has the declarations from Example 1.

The constructor call std::locale("") creates a locale object that represents the user's preferences. The standard doesn't say what this means, but on many systems the library substitutes whatever is found in an environment variable (often LANG or LC_ALL) in place of the empty string. A common name for the American locale, for example, is "en_US". (On POSIX systems you can type "locale -a" to list the names of supported locales.)

The call to std::cout.imbue() installs the newly constructed locale in the ostream std::cout, for use by the various operators <<. The next line uses Date's operator<< (declared in Example 1) which delegates the work to a facet of the locale that it obtains from std::cout.
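The locale constructor reports an unsupported name by throwing std::runtime_error, and on some systems std::locale("") throws too when the environment variable names a locale that isn't installed. A minimal sketch of a fallback, using a hypothetical helper named localeOrClassic():

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the named locale if the library supports it,
// otherwise the classic "C" locale. The empty name "" means the user's
// preferences, as in Example 2.
std::locale localeOrClassic(std::string const& name)
{
    try {
        return std::locale(name);       // throws std::runtime_error if name is not valid
    } catch (std::runtime_error const&) {
        return std::locale::classic();  // always available
    }
}
```

With it, the imbue() call in Example 2 could read std::cout.imbue(localeOrClassic("")); and the program would still run where the user's locale is misconfigured.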

Using Facets

To use a facet of a locale, you call the Standard C++ library global function template std::use_facet<>(), found in the standard header <locale>. Figure 1 shows its declaration.


Figure 1: The standard template use_facet<>()


  namespace std {
    template <class Facet>
      Facet const&  use_facet(locale const& loc);
  };


For a facet class Stats with a member function shoeSize(), for example, and a locale object named loc, a call would look like:

  std::use_facet<Stats>(loc).shoeSize();

This syntax for calling a function template, where you supply the template argument explicitly instead of letting the compiler deduce it from the function arguments, is called "explicit template argument specification", and it is not yet implemented in all compilers. It resembles the syntax for the new cast expressions, such as dynamic_cast<>, and in fact use_facet<>() acts as a safe cast. In the example above, the resulting reference is used immediately to call the member function Stats::shoeSize() on the instance of Stats stored in the locale object loc.
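The same pattern works with the standard facets, which every locale contains. In this sketch the function names are hypothetical; numpunct<char>::decimal_point() and ctype<char>::toupper() are standard members:

```cpp
#include <locale>

// Hypothetical helpers: each looks up a standard facet with an explicit
// template argument and calls a member on the returned reference at once.
char decimalPoint(std::locale const& loc)
{
    return std::use_facet< std::numpunct<char> >(loc).decimal_point();
}

char upperCase(std::locale const& loc, char c)
{
    return std::use_facet< std::ctype<char> >(loc).toupper(c);
}
```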

An Example operator<<

Example 3 is a complete implementation of the operator<< declared for class Date in Example 1. It uses the standard facet time_put<char>.


Example 3: Operator<< for class Date


  // date_insert.C
  #include <ctime>    /* for struct tm */
  #include <ostream>  /* for ostream */
  #include <locale>
  #include "date.h" 

  namespace ChronLib {
  std::ostream&
  operator<<(std::ostream& os, Date const& date)
  {
    std::ostream::sentry  cerberus(os);            //1
    if (!bool(cerberus)) return os;                //2
    std::tm tmbuf; date.asCLibTime(&tmbuf);        //3
    std::time_put<char> const&  timeFacet = 
      std::use_facet< std::time_put<char> >( os.getloc() );  //4
    if (timeFacet.put(
          os,os,os.fill(),&tmbuf,'x').failed())    //5
      os.setstate(os.badbit);
    return os;                                     //6
  }
  }


A lot is going on here. The lines marked //1 and //2 create and check a standard "ostream::sentry" object. (This is a new class in the standard iostream library; its constructor prepares the ostream for output. In a multi-threaded environment it might lock the stream.) Line //3 fills in the local struct variable tmbuf with the components of the date argument.

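The interesting part follows: In line //4, os.getloc() obtains the locale object kept by the ostream argument os, and the call to use_facet<>() gets a reference to the standard facet time_put<char> in that locale. In line //5, the call to time_put<char>::put actually writes the characters out to the stream os and returns an output iterator whose member failed() reports any write error. (The first argument, os, is converted to an ostreambuf_iterator that writes to the stream; the second supplies the stream's format flags and locale; the rest are the fill character, the struct tm to format, and 'x', which asks for the locale's preferred date representation, as "%x" does for strftime().) Line //6 returns the stream os; as the function exits, the ostream::sentry object is destroyed (perhaps unlocking the stream).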

Reflect on what this means. The header "date.h" didn't mention locales, but because of this code hidden in operator<<, a couple of lines in main() let you format dates appropriately for users anywhere in the world. (Without those lines in main(), you get the default "C" locale behavior.)
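To try the technique without the rest of ChronLib, here is a self-contained sketch. Its Date class is a hypothetical stand-in that stores year, month, and day directly rather than a day count, and formatDate() is a hypothetical helper; operator<< follows Example 3 step for step. In the classic "C" locale the 'x' format means month/day/two-digit year, so formatting December 7, 1942 there should give "12/07/42".

```cpp
#include <ctime>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

namespace Sketch {

// Hypothetical stand-in for ChronLib::Date; month is 0-based, like struct tm.
class Date {
    int year_, month_, day_;
public:
    Date(int year, int month, int day) : year_(year), month_(month), day_(day) {}
    void asCLibTime(std::tm* t) const {
        *t = std::tm();             // zero every field first
        t->tm_year = year_ - 1900;  // struct tm counts years from 1900
        t->tm_mon  = month_;
        t->tm_mday = day_;
    }
};

std::ostream& operator<<(std::ostream& os, Date const& date)
{
    std::ostream::sentry cerberus(os);  // prepare the stream for output
    if (!cerberus) return os;
    std::tm tmbuf;
    date.asCLibTime(&tmbuf);
    std::time_put<char> const& timeFacet =
        std::use_facet< std::time_put<char> >(os.getloc());
    // 'x': the locale's preferred date representation, as %x in strftime().
    if (timeFacet.put(os, os, os.fill(), &tmbuf, 'x').failed())
        os.setstate(std::ios_base::badbit);
    return os;
}

// Hypothetical helper: format a date under a given locale.
std::string formatDate(Date const& d, std::locale const& loc)
{
    std::ostringstream out;
    out.imbue(loc);
    out << d;
    return out.str();
}

}  // namespace Sketch
```

Passing std::locale("") instead of the classic locale shows the user's own date format, with no change to Date.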

Your Own Facet

The standard facets are designed so you can derive from them to get finer control of locale behavior. Such a derived facet inherits the interface of the base facet, but you can override its virtual members to change its behavior.
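For example, here is a sketch of a facet derived from the standard numpunct<char> that groups digits in threes with commas. The class name ThousandsCommas and the helper withCommas() are hypothetical; do_thousands_sep() and do_grouping() are the protected virtual members that numpunct<char> provides for overriding:

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: inherits numpunct<char>'s interface (and its id),
// overriding only the separator character and the grouping rule.
class ThousandsCommas : public std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }  // groups of three
};

// Hypothetical helper: print n using a copy of the classic locale whose
// numpunct<char> facet has been replaced; the locale owns the new facet.
std::string withCommas(long n)
{
    std::ostringstream out;
    out.imbue(std::locale(std::locale::classic(), new ThousandsCommas));
    out << n;
    return out.str();
}
```

Because ThousandsCommas inherits numpunct<char>::id, installing it replaces the locale's numpunct<char> facet, and the standard num_put facet uses it without knowing about the derived class.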

Derivation is not the only way you can extend a locale. You can make your own facet, and construct a locale to hold it. Example 4 is the sample Stats facet mentioned earlier.


Example 4: The sample Stats facet


  // stats.h
  #include <locale>

  class Stats  : public std::locale::facet {
   public:
    static std::locale::id id;
  
    Stats(int ss)        : shoeSize_(ss) {}
    int shoeSize() const { return shoeSize_; }
  
   private:
    Stats(Stats&);           // not defined: no copying
    void operator=(Stats&);  // not defined: no assignment
  
    int shoeSize_;
  };
  
  // stats.C
  #include "stats.h"
  std::locale::id Stats::id;
  

What makes the class Stats a facet? It's derived from locale::facet, it has a public static member named id of type locale::id, and its member functions are declared const. That's all. It does not need a default constructor, copy constructor, or assignment operator. (The copy constructor and assignment operator are declared private and never defined, so that anybody who tries to copy a Stats object will get a compile or link error.)

A facet class instance is only useful as part of a locale. Example 5 shows one way to install a facet instance as part of a locale.


Example 5: Using the Stats facet


  std::locale  aLocale( std::locale(), new Stats(48) );
  int s = std::use_facet<Stats>(aLocale).shoeSize();


The first statement constructs a locale object named aLocale as a copy of the current global locale, with the addition of a newly created Stats facet. (In a real program you would probably get the Stats constructor argument from a file.) It uses one of the locale class's template constructors (see Figure 5), which deduces the facet type from the pointer argument. (Support for template constructors, as for other member templates, is a recent addition to the language, and is not yet implemented in all compilers.) The locale takes ownership of the facet object, so you never need to delete it, and its memory can't leak. The second statement uses the facet, as in the earlier examples.
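Putting Examples 4 and 5 together gives a sketch that compiles on its own. It differs slightly from Example 4: the constructor is explicit, and the private copy declarations are dropped because the current standard library's locale::facet is already non-copyable. The helper withShoeSize() is hypothetical:

```cpp
#include <locale>

// Essentially the Stats facet of Example 4.
class Stats : public std::locale::facet {
public:
    static std::locale::id id;  // identifies the facet type

    explicit Stats(int ss) : shoeSize_(ss) {}
    int shoeSize() const { return shoeSize_; }

private:
    int shoeSize_;
};
std::locale::id Stats::id;

// Hypothetical helper: a copy of the global locale plus a Stats facet.
// The locale takes ownership of the new Stats object.
std::locale withShoeSize(int size)
{
    return std::locale(std::locale(), new Stats(size));
}
```

Copies of the resulting locale share the same Stats object, so std::use_facet<Stats>() returns the same reference for each copy.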

Facets are most useful if they are standard, so that you can use them without preparing, or making your users prepare, data files for every language. The most useful of any new facets you invent can be published and standardized, independently of the C++ Standard; then the data files for each language can be collected and posted on the internet for use by anyone's programs.

Under the Hood

How does all this work? It can all be implemented in ordinary C++. You can use the same techniques in your own programs.

First, the locale object itself is efficient to copy and assign because it really contains only a pointer, as in Figure 2.


Figure 2: Standard C++ locale implementation


  class locale {
   public:
    class facet;
    class id;
  
   ~locale() 
      { if (imp_->refs_-- == 0) delete imp_; }
    locale() 
      : imp_(__global_imp) { ++imp_->refs_; }
    locale(locale const& other)
      : imp_(other.imp_)   { ++imp_->refs_; }
    template <class Facet>
      locale(locale const& other, Facet* f);
    explicit locale(char const* name);
    // other constructors
    locale& operator=(locale const& l);
  
    template <class Facet>
      friend Facet const&  use_facet(locale const&);
  
   private:
    struct imp {
      size_t refs_; 
      vector<facet*> facets_;
      imp(const imp&);
     ~imp();
    };
    imp* imp_;
  };


(Only the members used in examples above are listed here.) The only non-const member function is assignment, so all copies can share the same "implementation vector", pointed to by member imp_. The copy constructor copies imp_ and bumps the reference count; the default constructor copies a pointer to a global instance, the same way. (This definition uses one new language feature not mentioned yet: the constructor from "char const *" is declared "explicit" so the compiler will not use it as an implicit conversion.)

The facet base class, locale::facet, shown in Figure 3, is also reference-counted.


Figure 3: locale::facet base class definition


  class locale::facet {
    friend class locale;
    friend class locale::imp;
    size_t refs_;
   protected:
    explicit facet(int refs = 0);
    virtual ~facet();
  };


It has a virtual destructor so that when the count goes to zero, the locale can safely destroy an instance of any class derived from it. (This definition depends on another recent language feature: a nested class can now be defined outside the containing class.)

Note that the examples here follow the convention that a reference count value of zero implies a single reference. This gives any static instances an initial use count of one before any static constructors have been executed. This property is often useful, though it is not actually used in the code presented here.
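The sharing scheme of Figure 2 and this counting convention can be sketched in a small, hypothetical Handle class that is not part of the library. It also shows an assignment operator like the one Figure 2 declares but does not define; counting the incoming reference before releasing the old one keeps self-assignment safe:

```cpp
#include <cstddef>

// Hypothetical handle: all copies share one Rep, and refs_ == 0 means
// exactly one Handle refers to it (the article's convention).
class Handle {
    struct Rep {
        std::size_t refs_ = 0;  // zero means one reference
        int value_ = 0;         // stands in for the shared facet vector
    };
    Rep* rep_;

    void release() {
        if (rep_->refs_-- == 0)  // we held the last reference
            delete rep_;
    }

public:
    Handle() : rep_(new Rep) {}
    Handle(Handle const& other) : rep_(other.rep_) { ++rep_->refs_; }
    Handle& operator=(Handle const& other) {
        ++other.rep_->refs_;  // count the new reference first
        release();            // then drop the old one
        rep_ = other.rep_;
        return *this;
    }
    ~Handle() { release(); }

    std::size_t useCount() const { return rep_->refs_ + 1; }
    bool sharesWith(Handle const& other) const { return rep_ == other.rep_; }
};
```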

The only tricky bit is in the class locale::id, in Figure 4.


Figure 4: locale::id class definition


  class locale::id {
    friend class locale;
    template <class Facet>
      friend Facet const&  use_facet(locale const&);  // reads index_
    size_t index_;
    static size_t mark_;
  };


Recall that each facet type contains a static member of type locale::id. Thus, there is one static instance per facet type. The default constructor id() (carefully) does not initialize the member index_. Because "static constructors" are called at times that are (for most practical purposes) random, they may be called after the value has already been used, so it is essential that initialization not depend on a constructor. The members of each static instance are reliably set to zero by the program loader, and remain zero until they are set to something else. When does the member index_ get set?

Figure 5 shows the definition of the locale template constructor used back in Example 5.


Figure 5: locale template constructor


  template <class Facet>
    locale::locale(
      locale const& other, Facet* f)
  {
    imp_ = new imp(*other.imp_);
    imp_->refs_ = 0;  // one reference
  
    size_t& index = Facet::id.index_;
    if (!index) 
      index = ++Facet::id.mark_;
    if (index >= imp_->facets_.size())
      imp_->facets_.resize(index+1);
  
    ++f->facet::refs_;
    facet*& fpr = imp_->facets_[index];
    if (fpr) --fpr->refs_;
    fpr = f;
  }


The constructor begins by copying the implementation vector from other, which increments all the facet reference counts. Then it sets Facet::id.index_ to assign the facet an identity if it has none yet, and (if necessary) grows the new vector to fit. Finally, it installs the new facet, being careful to keep the reference counts right. Thus, the id::index_ member is zero until it is actually used, and it is considered used only when a locale object exists which contains the facet that owns it. (This code is not thread-safe; the thread-safe version would be a bit harder to read, but otherwise similar.)
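The lazy numbering can be tried in isolation. In this sketch FacetId, FacetA, FacetB, and indexOf() are hypothetical stand-ins for locale::id, two facet types, and the first lines of the Figure 5 constructor; like Figure 5, it is not thread-safe:

```cpp
#include <cstddef>

// Hypothetical stand-in for locale::id. It has no constructor, so static
// instances are simply zero-initialized before the program starts.
struct FacetId {
    std::size_t index_;        // zero until first use
    static std::size_t mark_;  // last index handed out
};
std::size_t FacetId::mark_ = 0;

// Two hypothetical facet types, each with its own static id.
struct FacetA { static FacetId id; };
struct FacetB { static FacetId id; };
FacetId FacetA::id;
FacetId FacetB::id;

// Assign an index on first use, as the Figure 5 constructor does.
template <class Facet>
std::size_t indexOf()
{
    std::size_t& index = Facet::id.index_;
    if (!index)
        index = ++FacetId::mark_;
    return index;
}
```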

Notice that this template constructor, and use_facet<>(), can be instantiated only if the Facet parameter really qualifies as a facet in every way; otherwise, you get a compile or link error. Hence, the library enforces its own interface requirements.

The Function Template use_facet<>()

The template use_facet, declared in Figure 1 and called in several of the examples, is (finally) defined in Figure 6.


Figure 6: The Function Template use_facet<>()


  template <class Facet>
    inline Facet const& 
    use_facet(locale const& loc)
  {
    size_t index = Facet::id.index_;
    locale::facet* fp;
    if (index >= loc.imp_->facets_.size() ||
        (fp = loc.imp_->facets_[index]) == 0)
      throw bad_cast();
    return static_cast<Facet const&>(*fp);
  }


If the facet has not yet been assigned an identity, or if no instance of it (or anything derived from it) is found in the argument locale, use_facet<>() throws an exception. (The test, here, is tricky: if index is bigger than the vector, or if the pointer at offset index is zero, then the facet is not present; the pointer at offset zero, corresponding to an uninitialized facet index, is always zero.)
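A sketch of the same behavior with the real library, assuming a hypothetical facet named Units that no locale contains until one is installed. The standard also provides has_facet<>(), which answers the question without the exception:

```cpp
#include <locale>
#include <typeinfo>  // std::bad_cast

// Hypothetical facet that is absent from every locale unless installed.
class Units : public std::locale::facet {
public:
    static std::locale::id id;
    bool metric() const { return true; }
};
std::locale::id Units::id;

// Hypothetical helper: probe for Units through use_facet<>() and the
// std::bad_cast it throws when the facet is missing.
bool hasUnitsByCast(std::locale const& loc)
{
    try {
        std::use_facet<Units>(loc);
        return true;
    } catch (std::bad_cast const&) {
        return false;
    }
}
```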

I have omitted definitions of assignment operators and destructors because they are not very interesting. I also have omitted the definition of the locale constructor from a string (as used in Example 2), because it would not fit in a magazine article.

Conclusion

The Standard C++ locale library offers much more than what is presented here. Still, the most interesting facets are yet to be designed. The C++ Standard committee is closing up shop; it remains for people like you, working with POSIX and ad hoc internet interest groups, to standardize bindings for what now clog the "preferences" menu of every interactive application. Perhaps the most pressing need is for a standard time zone facet which can check the current version of a timezone database out on the internet (such as the "TZ" database at ftp://elsie.nci.nih.gov/pub/).


[Thank you to all my reviewers, but particularly to Chris Lopez and John Gilson.]

Bio:
Nathan developed the facilities found in chapter 22 of the Draft C++ Standard mainly so he would be able to write portable C++ programs without bothering about locales. He can be reached via his web page, http://www.cantrip.org/.

 

Send email: ncm-nospam@cantrip.org
Copyright ©1997 by Nathan Myers. All Rights Reserved. URL: <http://www.cantrip.org/locale.html>