The default GSM alphabet character set (DGCS), as defined in the GSM 03.38 standard, supports only a limited set of characters because it is 7-bit encoded.
If you want to send other characters you will need to encode messages in UCS2, which is 16-bit encoded and allows you to send messages in a variety of character sets, such as Chinese, Arabic and Russian, as well as accented characters not present in the DGCS.
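As a rough illustration of the 7-bit versus UCS2 decision, the sketch below checks whether a text fits the GSM 03.38 default alphabet. The names `GSM7_BASIC`, `GSM7_EXTENSION` and `needs_ucs2` are hypothetical, and treating the extension table (characters such as € and [ ]) as 7-bit-safe is an assumption; a given platform may handle those characters differently.

```python
# Basic character set of the GSM 03.38 default alphabet (ESC is left out
# because it is only a prefix for the extension table, not a character).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)

# Extension table characters (assumption: the platform accepts these as
# 7-bit; each one costs two septets on the air interface).
GSM7_EXTENSION = "^{}\\[~]|€\f"

_ALLOWED = set(GSM7_BASIC) | set(GSM7_EXTENSION)


def needs_ucs2(text: str) -> bool:
    """Return True if any character falls outside the default alphabet."""
    return any(ch not in _ALLOWED for ch in text)
```

For example, `needs_ucs2("café")` is false because é is part of the default alphabet, while `needs_ucs2("fête")` is true because ê is not.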
If Content Providers (CPs) don’t know how to encode messages in UCS2, they should use the Digital Interconnect UTF8 interface (HTTP only). They will need to take a message written in UTF8 on their operating system and encode it in Base64, which forces the UTF8 encoding, before submitting it to our platform. The A2P platform then checks whether any character falls outside the DGCS; if at least one does, the system encodes the message in UCS2.
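The Base64 step can be sketched as follows. This only shows the encoding itself; the function name `encode_for_utf8_interface` is hypothetical, and how the resulting string is passed in the HTTP request (parameter name, URL) is not stated here, so it is left out.

```python
import base64


def encode_for_utf8_interface(text: str) -> str:
    """Encode the message as UTF-8 bytes, then Base64 those bytes.

    The result is plain ASCII, so it survives transport regardless of
    the character encoding used by the sending system.
    """
    utf8_bytes = text.encode("utf-8")
    return base64.b64encode(utf8_bytes).decode("ascii")


def decode_from_utf8_interface(payload: str) -> str:
    """Inverse operation, useful for checking what was actually sent."""
    return base64.b64decode(payload).decode("utf-8")
```

A round trip such as `decode_from_utf8_interface(encode_for_utf8_interface("Привет"))` should return the original text unchanged.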
Because UCS2 uses 2 bytes per character and an SMS carries 140 bytes, a UCS2 message holds a maximum of 70 characters. CPs will need to control the message length, or they can use the SplitText parameter, in which case concatenated messages are generated when the message length exceeds 140 bytes.
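The length arithmetic can be sketched as below. It assumes that concatenation uses the standard 6-byte user data header with an 8-bit reference number, which leaves 134 bytes, or 67 UCS2 characters, per part; whether SplitText uses exactly this header is an assumption, and a 16-bit reference number (7-byte header) would leave 66. The function and constant names are hypothetical.

```python
import math

SMS_PAYLOAD_BYTES = 140      # user data available in one SMS
UCS2_BYTES_PER_CHAR = 2      # each UCS2 character is 16 bits
CONCAT_UDH_BYTES = 6         # assumption: 8-bit reference concatenation header


def ucs2_units(text: str) -> int:
    """Count 16-bit units; characters outside the BMP (e.g. emoji) take two."""
    return len(text.encode("utf-16-be")) // UCS2_BYTES_PER_CHAR


def ucs2_parts(text: str) -> int:
    """Number of SMS parts needed to carry the text in UCS2."""
    units = ucs2_units(text)
    single_limit = SMS_PAYLOAD_BYTES // UCS2_BYTES_PER_CHAR  # 70
    if units <= single_limit:
        return 1
    per_part = (SMS_PAYLOAD_BYTES - CONCAT_UDH_BYTES) // UCS2_BYTES_PER_CHAR  # 67
    return math.ceil(units / per_part)
```

Under these assumptions a 70-character message fits in one part, while a 71-character message needs two, each limited to 67 characters.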
For Latin-alphabet languages, the normal practice for CPs is NOT to use characters outside the DGCS. End users are usually accustomed to receiving messages with missing accents.
With UCS2 encoding, the maximum of 70 characters is often reduced by other features that use the header. Is there a best practice for UCS2 character count planning that can be safely used in most circumstances, allowing for header variance as needed? I have in mind particularly the impact of concatenation. Is it best to plan on using only, say, 67 characters? 65? What is a usually safe character count to advise message content creators?