Need help understanding unicode and findrx
Hi

I think I've found a bug in my understanding and was hoping for an upgrade :D

I'm using findrx to find text in the window of another application and then using Windows messages like EM_SETSEL, together with outp, to update the text in that application. It seems to work fine in ANSI mode, but I think I want it to run in Unicode mode (since this seems more general). Everything works fine until certain characters appear in the text (like an n with a tilde, ñ). When this happens the selections are off by the number of 'special' characters that occur in the text (it is as if each of these characters counts as two). If the encodings were inconsistent between Unicode and ANSI this would make sense, so maybe all I need to know is how to make them consistent.
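
To make the "counts as two" suspicion concrete, here is a small Python sketch (not QM code). It assumes that in Unicode mode the text is held as UTF-8 bytes and that findrx reports byte offsets; neither assumption is confirmed here, but both would explain the numbers. The sample text and variable names are made up for illustration.

```python
# Illustration only (Python, not QM).
# Assumption: Unicode mode stores the text as UTF-8 bytes, while ANSI mode
# uses a single-byte code page (cp1252 is used here as an example).
text = "ññññ\nIMPRESSION:\nNodule in the left lung"
marker = "IMPRESSION:"

ansi_bytes = text.encode("cp1252")  # every character here is one byte
utf8_bytes = text.encode("utf-8")   # each 'ñ' becomes two bytes

ansi_offset = ansi_bytes.find(marker.encode("cp1252"))
utf8_offset = utf8_bytes.find(marker.encode("utf-8"))

# The byte offset in UTF-8 is larger by the number of 'ñ' before the marker.
print(ansi_offset, utf8_offset, utf8_offset - ansi_offset)
```

If that is what happens, the offsets from the search are correct for the UTF-8 string itself but not for anything that counts characters rather than bytes.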

Here is a simple function that displays the text matching a regular expression from another application:

Function TestReplaceUnicodeAnsi
Code:
function'int int'hwndre str'findthis

str windowContents.getwintext(hwndre); if(!windowContents.len) ret -3
windowContents.findreplace("[]" "[10]")
str findString = findthis

ARRAY(CHARRANGE) a
int flag = 4
int isFound = findrx(windowContents findString 0 flag&3|4|8|32 a)
out F"isFound = {isFound}"
if (isFound)
,for _i 0 a.len
,,out F"a[{_i}] min ={a[_i 0].cpMin},  max={a[_i 0].cpMax}"
,,int nc = a[_i 0].cpMax - a[_i 0].cpMin
,,str substr.get(windowContents a[_i 0].cpMin nc)
,,out F"Selected string is '{substr}'"

The function is invoked like this:
Macro RunTestReplaceUnicodeAnsi
Code:
int w=win("PowerScribe 360 | Reporting" "WindowsForms10.Window.8.app.0.3ce0bb8_r13_ad1")
int c=child("" "*.RICHEDIT50W.*" w 0x0 "wfName=rtbReport") ;;editable text

str findThis =  "(?m)(?<=IMPRESSION:)\S?\s{{0,2}\w?"

int testResult = TestReplaceUnicodeAnsi(c findThis)

out F"TestResult is {testResult}"

If the original text contains this string:
Quote:
ññññ

IMPRESSION:
Nodule in the left lung

then the output in the console when Unicode is not selected in QM is:
Quote:
isFound = 1
a[0] min =1209, max=1211
Selected string is '
N'
TestResult is 0
which is what I want - the first character after "IMPRESSION:" and some whitespace.

If I run the same code after turning Unicode on in Tools->Options, I get:
Quote:
Unicode:

isFound = 1
a[0] min =1213, max=1215
Selected string is '
N'
TestResult is 0
which is also fine: I get exactly the selected string I want. Since there are four 'ñ' characters, the offsets differ by 4. But when I use these offsets to update the text in the other window (say I want to highlight it), the selection is misaligned:
Code:
SendMessage hwndre EM_SETSEL a[0 i].cpMin a[0 i].cpMax
This works just fine in the non-Unicode case, but if Unicode is enabled in Tools->Options the offsets are off.
Code:
IsWindowUnicode(hwndre)
returns true so I am assuming that I should be using Unicode.

My confused question: is there a way to generate the offsets in Unicode mode so that they stay consistent when I select text for pasting or highlighting? Or do I have a bad mental model of how this works? Thanks.
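
For reference, this is the kind of conversion I imagine might be needed, sketched in Python rather than QM. It assumes the search offsets are UTF-8 byte offsets and that the rich edit control counts positions in UTF-16 code units; I have not confirmed either. The helper name `utf8_offset_to_utf16` is hypothetical.

```python
# Illustration only (Python, not QM).
# Assumptions: offsets from the search are byte offsets into UTF-8 text,
# and the target control expects offsets in UTF-16 code units.

def utf8_offset_to_utf16(utf8_bytes: bytes, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset to a UTF-16 code-unit offset.

    The offset must fall on a character boundary, otherwise decoding
    the prefix raises UnicodeDecodeError.
    """
    prefix = utf8_bytes[:byte_offset].decode("utf-8")
    # Each UTF-16 code unit is two bytes, so halve the encoded length.
    return len(prefix.encode("utf-16-le")) // 2


data = "ññññ\nIMPRESSION:\nNodule in the left lung".encode("utf-8")

# Byte range of the newline plus 'N' that follows "IMPRESSION:".
cp_min_bytes = data.find(b"IMPRESSION:") + len(b"IMPRESSION:")
cp_max_bytes = cp_min_bytes + 2

sel_min = utf8_offset_to_utf16(data, cp_min_bytes)
sel_max = utf8_offset_to_utf16(data, cp_max_bytes)
print(cp_min_bytes, cp_max_bytes, "->", sel_min, sel_max)
```

In this sample the converted range is smaller than the byte range by 4, one for each 'ñ' before the match, which is the same shift I see between the two modes.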

