At work I posed an innocent question about displaying large integers (like 12345678) with commas as thousands separators (like 12,345,678). As I found out, this was quite a loaded question.
My question had to do with implementing it in Python. I couldn't find a built-in method, so I wrote up this awful one-liner:
>>> x = 2493085724309857243980
>>> print ",".join( [str(x/(10**i))[-3:] for i in range(3*10,1,-3) if x/(10**i)>0] + [str(x)[-3:]] )
2,493,085,724,309,857,243,980
It works for numbers with up to ten commas, which is big but not general. Nor does it work with floats (in fact, it breaks in a fantastic display of numbers).
I also assumed commas were appropriate. They're not always.
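For comparison, here is a minimal sketch of the same idea without the ten-comma limit. It is written for Python 3 rather than the Python 2 used in the transcript above, and the name group_thousands and its sep parameter are made up for illustration. It only handles integers, and the caller still has to choose the separator.

```python
def group_thousands(n, sep=","):
    """Group the decimal digits of integer n in threes, joined by sep."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    groups = []
    # Peel off three digits at a time, starting from the right.
    while digits:
        groups.append(digits[-3:])
        digits = digits[:-3]
    return sign + sep.join(reversed(groups))


print(group_thousands(2493085724309857243980))
print(group_thousands(1234567, sep="."))
```

Passing sep="." gives the dotted style used in parts of Europe, but hard-coding the choice in the caller only moves the problem around.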
My question came back with this response:
Doesn't python have a printf like function that is handled at the bytecode level? The best (simplest and efficient) way to do it is the interpreter/compiler level.
So I did some research, and found out how to do it. C's printf has a way of separating the thousands places with a comma when the ' (apostrophe) modifier is applied to i, d, f, etc. Except Python's printf parser doesn't understand it!
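A quick way to see the rejection for yourself, sketched in Python 3 (the helper name try_apostrophe_flag is invented for this example). The last line uses the "," format specifier, which arrived in later Python releases (2.7 and 3.1) and always inserts a comma regardless of locale.

```python
def try_apostrophe_flag(value):
    """Try C's ' flag in a %-format; return the result or the error text."""
    try:
        return "%'d" % value
    except ValueError as exc:
        # The % operator treats ' as an unknown conversion character.
        return "ValueError: %s" % exc


print(try_apostrophe_flag(1234567))

# Locale-independent alternative available in newer Pythons.
print(format(1234567, ","))
```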
After some research I found that since countries separate their long numbers differently (Europeans might write 1.234.567,89 while Americans write 1,234,567.89 -- that's why the central question of this text is "loaded"), UNIX provides "locales" to tune the standard output of various things. From locale(7):
A locale is a set of language and cultural rules. These cover aspects such as language for messages, different character sets, lexigraphic conventions, etc. A program needs to be able to determine its locale and act accordingly to be portable to different cultures.
Back to Python:
>>> import locale
>>> locale.format("%d", 3245452, 1)
'3245452'
Oops. It seems that the default locale has no thousands separator defined:
>>> locale.localeconv()["thousands_sep"]
''
... so you have to switch to a locale that does:
>>> locale.setlocale(locale.LC_NUMERIC, 'en_US.ISO8859-1')
'en_US.ISO8859-1'
>>> locale.localeconv()["thousands_sep"]
','
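Locale names vary from system to system, so a script may want to check what a locale defines before relying on it. Here is a rough Python 3 sketch; the helper thousands_sep_for is a made-up name, and it assumes that returning None is an acceptable way to report a locale that isn't installed.

```python
import locale


def thousands_sep_for(name):
    """Return the thousands separator of locale `name`,
    or None if that locale cannot be selected on this system."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, name)
    except locale.Error:
        return None
    try:
        return locale.localeconv()["thousands_sep"]
    finally:
        # Put the numeric locale back the way we found it.
        locale.setlocale(locale.LC_NUMERIC, saved)


print(repr(thousands_sep_for("C")))
print(repr(thousands_sep_for("en_US.ISO8859-1")))
```

The second line prints None on machines where that locale isn't installed.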
Now things work:
>>> locale.format("%d", 3245452, 1)
'3,245,452'
>>> locale.format("%d", 324545278968968698, 1)
'324,545,278,968,968,698'
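On current Python 3 the same thing is spelled a little differently: locale.format was removed in Python 3.12, and locale.format_string with grouping=True does the job. The sketch below assumes one of a few en_US locale names is installed; the function name format_grouped and the candidate list are illustrative guesses, not something every system will have.

```python
import locale


def format_grouped(n, candidates=("en_US.UTF-8", "en_US.ISO8859-1", "en_US")):
    """Format integer n with the first candidate locale that is installed.

    If none of the candidates can be selected, the current locale is
    used, which may produce no separators at all.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)  # remember the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_NUMERIC, name)
                break
            except locale.Error:
                continue  # not installed here, try the next name
        return locale.format_string("%d", n, grouping=True)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)  # restore


print(format_grouped(3245452))
print(format_grouped(324545278968968698))
```

Restoring the previous setting matters because setlocale changes process-wide state, which can surprise other code running in the same program.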
Doing things in C is semantically identical:
#include <stdio.h>
#include <locale.h>

int main(void)
{
    int i = 1234567;

    /* Print i with default locale, C */
    printf("%'15d (%s)\n", i, setlocale(LC_NUMERIC, NULL));

    /* Switch locale for numerics, and print i */
    setlocale(LC_NUMERIC, "en_US.iso88591");
    printf("%'15d (%s)\n", i, setlocale(LC_NUMERIC, NULL));

    return 0;
}
It outputs:
        1234567 (C)
      1,234,567 (en_US.iso88591)
On the topic of internationalization, Joel Spolsky has an essay worth reading: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
https://michal.guerquin.com/locales.html, updated 2004-12-02 01:36 EST