PDA

View Full Version : Solved: Something Else worth discussing



stanl
09-09-2007, 08:35 AM
How SQL works with SQL Server's nVarcharMAX and Unicode.:dunno Stan

[I've found some frustration here]

stanl
09-10-2007, 11:58 AM
Arrrrgghhh:banghead: it appears that statistically, 90% of the questions I post go unanswered. So I 'spose you can create a new rating for me as DUMBF___K.

To clarify the first post. Assume you can input Unicode {let's say html converted to Chinese via Google.translate} into an nVarchar MAX field... However, when you extract the data in SQL or to XML and use any string functions, the Unicode appears to be internally converted to ANSI. But in your text box etc... you want the data to appear as Unicode translated to the charset.

Hope this makes sense

Stan

malik641
09-10-2007, 12:50 PM
Hang in there, Stan. I saw this thread when you first posted it and I would like to at least replicate what you're getting and (hopefully) help you with this.

I'm not saying I'm going to save the day. Just letting you know I've been subscribed to this thread since it started (I've just been a bit busy...plus I have to read up on Unicode as I don't fully understand it...then try to put this code into a nVarCharMAX column in SQL Server 2005 and figure out what's going on). And anyone who can join in is more than welcome!

stanl
09-11-2007, 06:31 AM
Hang in there, Stan.

No problem... I was just goofing with the Aaargghhh! theng:friends:

This issue can tend to get complicated real fast, but let me start with a real-word example from another forum: the user needed to convert sections of their Excel invoices into the Japenese charset. I posted a simple script that passed the translation through google.translate {which actually worked}. So the issue soon moved to storing the Japenese as unicode on SQL Server 2005 free editions - in an Nvarchar field.

What I found was that retrieval of the field into with scripting screen functions appeared to perform an auto-convert to ansi, which displayed crap [or utf-8 that would not render properly]. I felt that perhaps using the ADODB.Stream object was a solution... and that is where I got lost.

Consider the attached file unitest.txt [in the zip]. If you issue a type command from a Cmd window [dos] the file will show as all ???? - unicode]. Open it in Notepad and it should display with the Russian charset as unicode [it is from Pushkin]... You can cut/paste from notepad into Excel and it still is correct... but now, assuming the text was in an Nvarchar field, how to programatically perform the same.

Hope this makes sense :banghead:

stanl
09-11-2007, 06:42 AM
[oh] here is the zip

Tommy
09-11-2007, 09:32 AM
Hi stanl,
If you do a search on:
Issues Specific to the Double-Byte Character Set (DBCS)

It may help in resoling some of your problems or I may have just missed the boat again. ;)

stanl
09-11-2007, 10:00 AM
Hi stanl,
If you do a search on:
Issues Specific to the Double-Byte Character Set (DBCS)

It may help in resoling some of your problems or I may have just missed the boat again. ;)

no, maybe I missed it. Can you explain what DBCS has to do with unicode and charsets; I always thought it was reserved for EDCIDC?

I do plan to post a worksheet to further illustrate my confusion. Stan

Tommy
09-11-2007, 10:57 AM
I don't know what a EDCIDC is, I did a search and it had something to do with IBM Websphere z/OS.

DBCS is Double Byte Character Set
SBCS which is a Single .....
The English version of Visual Basic is SBCS, so when you run code it converts the unicode/dbcs to sbcs and back for you. I guess what I am getting at is what are you actually inserting into the database (unicode or ascii) and what are you getting back? is the test server on a dbcs?

What is the name of the character set sent(original?) and the one used for display?

stanl
09-11-2007, 11:51 AM
I guess what I am getting at is what are you actually inserting into the database (unicode or ascii)

There lies the conundrum. The issue is not so much with inserting unicode into Nvarchar or NvarCharMax, but retireving the data and rendering it properly.

I have attached a pretty worthless Excel workbook to illustrate. The 'russian' text is in the texbox. The sub [executed from the button] makes several tries to render the translated text, all of which show the basic ANSI ?????

I even try it in Explorer [which works with other scripting languages].

Also, in the xls is a rendering if you were to Copy/Paste from the Excel text box into Google Translate...

Get my drift.... this may be one of the reasons VBA may die out as more code needs to be international - ized.

Just my .02 Stan

stanl
09-12-2007, 01:38 AM
I don't know what a EDCIDC

My bad... I meant EBCIDC. As for the Server, I'm just playing with SQL Server 2005 [Lite]. I would run a test like


'Open ADO Connection
cConn = "Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Data Source=.\SQLEXPRESS;Initial Catalog=NCS"
Set oConn = CreateObject("ADODB.Connection")
oConn.CommandTimeOut=0
oConn.Open cConn

'Create 2 test Tables, one to hold Unicode
cSQL = "CREATE TABLE [dbo].[N_MaxTest] ( [NMaxVar] NVARCHAR(MAX) NOT NULL );"
oConn.Execute cSQL, ,adExecuteNoRecords
cSQL = "CREATE TABLE [dbo].[MaxTest] ( [MaxVar] VARCHAR(MAX) NOT NULL );"
oConn.Execute cSQL, ,adExecuteNoRecords

text = "This is a Test" & vbcrlf

'insert into both tables
cSQL="INSERT INTO [dbo].[N_MaxTest] ( [NMaxVar] ) VALUES (REPLICATE(CAST(N'" & text & "' AS NVARCHAR(MAX)), 100));"
oConn.Execute cSQL, ,adExecuteNoRecords
cSQL="INSERT INTO [dbo].[MaxTest] ( [MaxVar] ) VALUES (REPLICATE(CAST('" & text & "' AS VARCHAR(MAX)), 100));"
oConn.Execute cSQL, ,adExecuteNoRecords
oConn.Close


and I expected to have the same data in ANSI and Unicode, so that I might retrieve and compare. But both would come out as ANSI, same length.

So, I am assuming I need to set a Code Page [I think 65001 is the code page for utf-8], or have some sort of unicode/ansi conversion method for string data.:dunno Stan

P.S. What I would really like to do is insert the unitest.txt [I attached in a zip to a previous post in this thread] into an NVARCHAR field as Unicode.

stanl
09-12-2007, 04:18 AM
I can mark this 'solved'. What I was missing is that functions like len() will give the same results for Unicode or ANSI.


cSQL = "SELECT len(NMaxVar) AS NMaxLen, NMaxVar FROM [dbo].[N_MaxTest];"
'or
cSQL = "SELECT len(MaxVar) AS MaxLen, MaxVar FROM [dbo].[MaxTest];"

'will have the same length


What I needed was a ByteCount, which I am not sure is native to VBA:dunno .

I then resolved the Russian Unicode by using an ADODB.Stream - which inserted into an NvarcharMax field nicely. Stan

Tommy
09-12-2007, 06:06 AM
What I needed was a ByteCount, which I am not sure is native to VBA:dunno .

LenB()?

stanl
09-12-2007, 08:29 AM
LenB()?

Isn't that for non-unicode strings? Hasn't it also been decremented?

Tommy
09-12-2007, 10:40 AM
From Help (VB 6)

Use the LenB function with byte data contained in a string, as in double-byte character set (DBCS) languages. Instead of returning the number of characters in a string, LenB returns the number of bytes used to represent that string. With user-defined types, LenB returns the in-memory size, including any padding between elements.

stanl
09-12-2007, 11:35 AM
I might have been thinking LeftB$, becuase LenB is VB6, not VBA [again I coild be wrong]. I think the real issue is unicode data needs to be turned into a bytearray.:dunno Stan

I appreciate your sticking with this thread.

Tommy
09-12-2007, 11:55 AM
LOL simple example that works in VBA acad, excel, word ...

Sub aa()
Dim mC() As Byte, mD As String
Dim mA As String
mA = "jhjhjh"
ReDim mC(LenB(mA))
MsgBox LenB(mA)
MsgBox Len(mA)
mC = mA
mD = mC
End Sub

There are some string functions - ChrW ChrB AscB AscW that are used for unicode coding.


I appreciate your sticking with this thread.

LOL I'm in and out, here today, back next week. Not trying to spam you. ;)

stanl
09-12-2007, 12:03 PM
LOL I'm in and out, here today, back next week. Not trying to spam you. ;)

Good! when you return we can take on a real SQLServer example:friends:

Tommy
09-12-2007, 12:13 PM
LOL I'll have to brush up but sounds like fun ;D

SQL 2005 express or pro
Access 2000-2003
PostgreSql ver 2.5-7.2