KEncodingProber Class

Provides encoding detection (probing) capabilities.

Header: #include <KEncodingProber>
CMake: find_package(KF6 REQUIRED COMPONENTS Codecs)
target_link_libraries(mytarget PRIVATE KF6::Codecs)

Public Types

enum ProberState { FoundIt, NotMe, Probing }
enum ProberType { None, Universal, Arabic, Baltic, CentralEuropean, …, WesternEuropean }

Public Functions

KEncodingProber(KEncodingProber::ProberType proberType = Universal)
float confidence() const
(since 4.2.2) QByteArray encoding() const
KEncodingProber::ProberState feed(QByteArrayView data)
void reset()
void setProberType(KEncodingProber::ProberType proberType)
KEncodingProber::ProberState state() const

Static Public Members

QString nameForProberType(KEncodingProber::ProberType proberType)
KEncodingProber::ProberType proberTypeForName(const QString &lang)

Detailed Description

Probes the encoding of raw data only. If it cannot determine the encoding with certainty, it returns the most likely encoding guessed so far.

A Unicode probe is always performed, regardless of the ProberType.

Call feed() repeatedly with successive chunks of data until the ProberState changes to FoundIt or NotMe, or until confidence() returns a value you find acceptable.

Intended lifetime of the object: one instance per ProberType.

Typical use:

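#include <KEncodingProber>

QByteArray data, moredata;
// ... fill data and moredata with the raw bytes to probe ...
KEncodingProber prober(KEncodingProber::ChineseSimplified);
prober.feed(data);
prober.feed(moredata);
QByteArray encoding;
if (prober.confidence() > 0.6)
    encoding = prober.encoding();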

At least 256 characters are needed before the ProberState can change from Probing to FoundIt. If you have fewer characters to probe, decide for yourself whether to accept the encoding guessed so far, based on confidence().

Member Type Documentation

enum KEncodingProber::ProberState

Constant | Value | Description
KEncodingProber::FoundIt | 0 | The encoding has been found with certainty.
KEncodingProber::NotMe | 1 | The data is certainly not in any of the encodings supported by the current ProberType.
KEncodingProber::Probing | 2 | More data is needed to make a decision.

enum KEncodingProber::ProberType

Constant | Value
KEncodingProber::None | 0
KEncodingProber::Universal | 1
KEncodingProber::Arabic | 2
KEncodingProber::Baltic | 3
KEncodingProber::CentralEuropean | 4
KEncodingProber::ChineseSimplified | 5
KEncodingProber::ChineseTraditional | 6
KEncodingProber::Cyrillic | 7
KEncodingProber::Greek | 8
KEncodingProber::Hebrew | 9
KEncodingProber::Japanese | 10
KEncodingProber::Korean | 11
KEncodingProber::NorthernSaami | 12
KEncodingProber::Other | 13
KEncodingProber::SouthEasternEurope | 14
KEncodingProber::Thai | 15
KEncodingProber::Turkish | 16
KEncodingProber::Unicode | 17
KEncodingProber::WesternEuropean | 18

Member Function Documentation

KEncodingProber::KEncodingProber(KEncodingProber::ProberType proberType = Universal)

The default ProberType is Universal (detect all possible encodings).

float KEncodingProber::confidence() const

Returns the confidence (sureness) of the encoding guessed so far, in the range 0.0 to 0.99. This value is not very reliable for single-byte encodings.

[since 4.2.2] QByteArray KEncodingProber::encoding() const

Returns a QByteArray with the name of the best encoding it has guessed so far

This function was introduced in 4.2.2.

KEncodingProber::ProberState KEncodingProber::feed(QByteArrayView data)

This is the main method of the class: it feeds data to the prober.

Returns the ProberState after probing the fed data.

[static] QString KEncodingProber::nameForProberType(KEncodingProber::ProberType proberType)

Maps a ProberType to its language string.

proberType: the prober type.

Returns the language string.

[static] KEncodingProber::ProberType KEncodingProber::proberTypeForName(const QString &lang)

Returns the ProberType for lang (e.g. proberTypeForName("Chinese Simplified") returns KEncodingProber::ChineseSimplified).

void KEncodingProber::reset()

Resets the prober's internal state and data.

void KEncodingProber::setProberType(KEncodingProber::ProberType proberType)

Changes the current prober's ProberType and resets the prober.

proberType: the new type.

KEncodingProber::ProberState KEncodingProber::state() const

Returns the prober's current ProberState