Re: CSCvi87952 - Issues with incoming messages detecting as Japanese

p-mason · 04-28-2025

We only recently started running into this situation because we were aiming to implement a Japanese version of our mail banner. In many cases it works, but there are also a lot of cases where a message in English is detected as Japanese.

I'm glad that support pointed out this bug after I opened a ticket, but I'm concerned for two reasons:

This bug report was created 7 years ago. What has been going on?
The report doesn't define what counts as a "short message." And even a number of messages of reasonable length are still erroneously detected as Japanese.
Even supposing the message is short: if there's no need for a longer message, why would we ask an outside party to make theirs longer? And why would it be detected as Japanese when no Japanese characters are present?

Is there any sort of status or timeline?

adena783moze · 05-02-2025

Hello,

1. A 7-Year-Old Bug Still Unresolved
If a bug report has been open for 7 years, that typically signals one of three things:

Low prioritization by the product team.

Complexity or ambiguity in reproducing or resolving the issue.

Inadequate ownership, possibly because the feature depends on a third-party or outdated module.

It's fair to ask the vendor:

Has this bug been triaged or put into a backlog?

Is there any plan to rework or replace the underlying language detection engine?

2. Vague Definition of "Short Message"
A reliable language detection engine should work even on short texts, or gracefully fall back when confidence is low. The fact that it fails without a clear guideline on what constitutes a "short" message suggests:

The algorithm relies too heavily on heuristics (like word frequency).

There's no minimum confidence threshold or fallback to English.

For example, a message like “Hi, please see the attached” might be mistaken for Japanese simply because it’s too short to match against enough English features.

3. False Positives with No Japanese Characters
This is the most critical failure. A language detector that misidentifies English as Japanese when no Japanese characters are present likely suffers from one of the following:

Misweighted language models, especially for character sets that overlap with Latin punctuation.

A default fallback to Japanese when detection confidence is low (possibly due to regional settings or past behavior).
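To illustrate the two safeguards mentioned in points 2 and 3 (a minimum confidence threshold with a fallback to English, and a sanity check that Japanese characters are actually present), here is a rough sketch of a post-detection guard. It assumes you have access to the detector's label and a confidence score, which the gateway may not expose, and it makes no claim about how the product's detector actually works. The function names, the "ja"/"en" labels, and both threshold values are hypothetical and would need tuning.

```python
# Hypothetical post-detection guard; not part of any Cisco product or API.
# Thresholds and language labels ("ja", "en") are illustrative assumptions.

# Unicode ranges commonly used in Japanese text.
JAPANESE_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (kanji)
    (0xFF66, 0xFF9F),  # Halfwidth Katakana
)


def japanese_char_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that fall in a Japanese script range."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(
        1 for c in chars
        if any(lo <= ord(c) <= hi for lo, hi in JAPANESE_RANGES)
    )
    return hits / len(chars)


def resolve_language(detected: str, confidence: float, text: str,
                     min_confidence: float = 0.8,
                     min_japanese_ratio: float = 0.1,
                     fallback: str = "en") -> str:
    """Accept the detector's label only if it is confident and, for Japanese,
    the text actually contains Japanese characters; otherwise fall back."""
    # Low-confidence results (typical for very short messages) fall back.
    if confidence < min_confidence:
        return fallback
    # A "ja" verdict with (almost) no Japanese characters is treated as a false positive.
    if detected == "ja" and japanese_char_ratio(text) < min_japanese_ratio:
        return fallback
    return detected


if __name__ == "__main__":
    print(resolve_language("ja", 0.95, "Hi, please see the attached"))
    print(resolve_language("ja", 0.95, "添付ファイルをご確認ください"))
    print(resolve_language("ja", 0.40, "添付ファイルをご確認ください"))
```

With these example inputs the short English message resolves to "en" despite a confident "ja" label, the Japanese sentence stays "ja", and the same Japanese sentence at low confidence falls back to "en". Note that the kanji range is shared with Chinese, so the character check can only reject a Japanese verdict, not confirm one on its own.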