
strange characters input from text file



johnshaw
07-27-2005, 08:17 AM
I have a VBA Excel macro that takes text from txt files to add to another text file. Part of the code is:

Open FilePath For Input As #3
Do Until EOF(3)
    Line Input #3, texttowrite
    MsgBox texttowrite 'this is just to test
    Print #2, texttowrite
Loop

I have built the text files that I open as #3 using WordPad, Notepad, and MS Word (saved as .txt). Whichever program I use, the very first line of the text (as found in file #2 and displayed in the MsgBox) starts with three strange characters: ï»¿

Those characters do not show up when I view the file I read from (opened as #3) in Notepad, WordPad, etc. However, they do show up in the file I am writing to (#2) in every program, as well as in the MsgBox I added for debugging.

The characters are not on any other line. Apart from those characters at the start of the first line, there are no problems. The program builds HTML web pages from data and files listed in an Excel spreadsheet and works perfectly except for the ï»¿ at the start of the first lines.

Where do the characters come from?

Jacob Hilderbrand
07-27-2005, 08:50 AM
I'm not sure where they come from. Perhaps you can attach the file that causes this problem. When you use a message box to display the text, do those characters appear at the start of the first line? If so, you can just use an If statement to skip writing them to the new file.

johnshaw
07-27-2005, 09:31 AM
Right now I have a quick workaround: I put something I don't need in the first line, and before the Do Until EOF loop I read that first line and ignore it. Then I start copying. But I don't like having to add a wasted line at the beginning of each file. Most importantly, I am extremely curious about those three characters and where they come from.

mark007
07-27-2005, 06:30 PM
My first thought is: are you sure it's a plain text file and not a Unicode or rich-text file? Make sure you use Save As and save the file as text only.

:)

johnshaw
07-27-2005, 07:44 PM
Yes, the files are saved as plain text. Several files have no problem. Others have the characters "ï»¿" at the very beginning of the file, but nowhere else in the file. These characters are not visible when I open the file in Notepad, WordPad, MS Word, or FrontPage. However, when I do the Line Input and display the result in a MsgBox, the characters show up. Then, when I print the input to another file, the characters show up in that file in all of the above programs.

My current workaround is to read the first line and replace every character in it whose character code is greater than 131 with a blank.

But I am still curious about the source of the characters.

Jacob Hilderbrand
07-27-2005, 08:15 PM
Can you post the code you are using and attach the file that has the problem? Perhaps we can spot the problem.

Also, to follow up on what Mark said: open one of the problem files and copy all the text into Notepad (not WordPad). Then save it and try your macro on the new file. Does the problem persist?

mark007
07-28-2005, 02:25 AM
Please attach an example of a file for which this is happening for us to take a look at. Odd initial characters are almost always headers to specify the encoding.

:)

johnshaw
07-28-2005, 07:33 AM
Mark007,
I have attached a zip file with the text file I am using (navcol.txt) and another text file with the Excel macro.

Thanks,
John

Norie
07-28-2005, 07:56 AM
John

When I open your files both of them have strange characters (?) in the first line.

johnshaw
07-28-2005, 08:06 AM
Norie,
What are you using to open them? I just opened them (opened the uploaded files from my message) with notepad and didn't see any strange characters. I do see the characters at the start when I use the macro in micro.txt in Excel.
Thanks

Norie
07-28-2005, 08:11 AM
I used notepad.

mark007
07-28-2005, 08:16 AM
Your files are both encoded as UTF-8 and therefore start with the byte order mark EF BB BF (in hex). When you create the file in Notepad and click Save As, make sure that the encoding in the second dropdown is set to ANSI rather than UTF-8.

:)
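To check for the mark from VBA rather than with a hex editor, one option is to open the file in Binary mode and compare its first three bytes with EF BB BF. This is a minimal sketch; `HasUtf8Bom` is a hypothetical helper name, not a built-in VBA function.

```vba
' Returns True if the file at filePath starts with the UTF-8 byte order mark EF BB BF.
' Hypothetical helper for illustration; files shorter than three bytes return False.
Function HasUtf8Bom(ByVal filePath As String) As Boolean
    Dim fileNum As Integer
    Dim header(0 To 2) As Byte

    fileNum = FreeFile
    Open filePath For Binary Access Read As #fileNum
    If LOF(fileNum) >= 3 Then
        ' Read the first three bytes into the fixed-size array.
        Get #fileNum, 1, header
        HasUtf8Bom = (header(0) = &HEF And header(1) = &HBB And header(2) = &HBF)
    End If
    Close #fileNum
End Function
```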

johnshaw
07-28-2005, 08:53 AM
Mark007,


Thanks. That's it. I don't know how I got UTF-8 on the files I was using. I tried again, the default in Notepad was ANSI, and the characters were not there when I read the file using the Excel macro. Then I tried saving as UTF-8 and got the characters. I am not sure how I ended up saving the earlier files as UTF-8.

Since the files the macro will be using will come from many sources I have added a filter to remove inappropriate characters from the first line.

Do you know how low, in hex, the codes might be for any type of encoding? I will filter out characters below a certain value.

Thanks,
John
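Because Line Input reads the file as ANSI text, on a system whose ANSI code page is single-byte (Windows-1252 is assumed here) the three bytes EF BB BF come back as the characters Chr(239), Chr(187) and Chr(191), which display as ï»¿. A sketch that removes exactly that sequence, rather than every character past some threshold, might look like this; `StripUtf8Bom` is a hypothetical name.

```vba
' Removes a UTF-8 byte order mark from the start of a line read with Line Input.
' Assumes a single-byte ANSI code page (e.g. Windows-1252), where the bytes
' EF BB BF arrive as the three characters Chr(239), Chr(187), Chr(191).
Function StripUtf8Bom(ByVal firstLine As String) As String
    Dim bom As String
    bom = Chr$(239) & Chr$(187) & Chr$(191)

    If Left$(firstLine, 3) = bom Then
        ' Keep everything after the three mark characters.
        StripUtf8Bom = Mid$(firstLine, 4)
    Else
        StripUtf8Bom = firstLine
    End If
End Function
```

In the loop from the first post, it would be applied only to the first line read from #3, before the Print #2. Matching the exact three-character sequence leaves legitimate accented characters (é, ü and so on) in the first line untouched, which a blanket threshold filter would not.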

mark007
07-28-2005, 09:01 AM
Here's a list of possible byte order marks for different formats:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_42jv.asp

:)
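For reference, the common byte order marks are EF BB BF (UTF-8), FF FE (UTF-16 little-endian), FE FF (UTF-16 big-endian), FF FE 00 00 (UTF-32 little-endian) and 00 00 FE FF (UTF-32 big-endian). Since the UTF-32 big-endian mark starts with zero bytes, a filter based on a single threshold value cannot catch every mark. The sketch below reports which mark, if any, a file starts with; `DetectBom` and its return labels are hypothetical. The UTF-32 LE test has to come before UTF-16 LE because both begin with FF FE.

```vba
' Reports which Unicode byte order mark, if any, a file starts with.
' Returns "UTF-8", "UTF-16 LE", "UTF-16 BE", "UTF-32 LE", "UTF-32 BE", or "" if none.
Function DetectBom(ByVal filePath As String) As String
    Dim fileNum As Integer
    Dim b(0 To 3) As Byte
    Dim n As Long, i As Long

    fileNum = FreeFile
    Open filePath For Binary Access Read As #fileNum
    ' Read at most the first four bytes; unread positions stay 0.
    n = LOF(fileNum)
    If n > 4 Then n = 4
    For i = 0 To n - 1
        Get #fileNum, i + 1, b(i)
    Next i
    Close #fileNum

    ' Longer marks first: UTF-32 LE also begins with FF FE.
    If n >= 4 And b(0) = &HFF And b(1) = &HFE And b(2) = 0 And b(3) = 0 Then
        DetectBom = "UTF-32 LE"
    ElseIf n >= 4 And b(0) = 0 And b(1) = 0 And b(2) = &HFE And b(3) = &HFF Then
        DetectBom = "UTF-32 BE"
    ElseIf n >= 3 And b(0) = &HEF And b(1) = &HBB And b(2) = &HBF Then
        DetectBom = "UTF-8"
    ElseIf n >= 2 And b(0) = &HFF And b(1) = &HFE Then
        DetectBom = "UTF-16 LE"
    ElseIf n >= 2 And b(0) = &HFE And b(1) = &HFF Then
        DetectBom = "UTF-16 BE"
    Else
        DetectBom = ""
    End If
End Function
```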

johnshaw
07-28-2005, 10:53 AM
Mark,
Thanks. I will set my filter to cut out those codes.
John

MWE
08-01-2005, 05:43 AM
John: if your problem is now resolved, mark the thread "solved"

johnshaw
08-01-2005, 09:22 AM
MWE, Thanks. I didn't know about that. And thanks to Mark007 for solving the problem.

Hisam
07-20-2017, 02:17 AM
I have the same problem, but I am working with UTF-8 CSV files on purpose. When my macro merges them, the first data entry from the second and each following file in the merged document (a UTF-8 file) starts with these strange characters, and I cannot find a way to get rid of them. The output file must not contain these characters, because the other system rejects it and the procedure stops with an error. Probably this happens because the macro copies in each file's header that marks it as UTF-8. Is it possible to solve this?
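One way to approach this, sketched here under assumptions rather than as a tested fix: copy each file byte-for-byte in Binary mode, so the UTF-8 content is never re-encoded, and skip a leading EF BB BF on every file after the first. `MergeUtf8Files` and its parameters are hypothetical names; the sketch assumes each input file ends with a line break and that any repeated CSV header rows are handled separately.

```vba
' Concatenates UTF-8 files byte-for-byte into outPath, dropping the EF BB BF
' byte order mark from every input except (optionally) the first.
' Hypothetical helper; assumes every input file ends with a line break.
Sub MergeUtf8Files(inPaths As Variant, ByVal outPath As String, _
                   Optional ByVal keepFirstBom As Boolean = True)
    Dim outNum As Integer, inNum As Integer
    Dim i As Long, startPos As Long, fileLen As Long
    Dim buf() As Byte
    Dim head(0 To 2) As Byte

    ' Binary mode does not truncate an existing file, so delete it first.
    If Len(Dir$(outPath)) > 0 Then Kill outPath
    outNum = FreeFile
    Open outPath For Binary Access Write As #outNum

    For i = LBound(inPaths) To UBound(inPaths)
        inNum = FreeFile
        Open inPaths(i) For Binary Access Read As #inNum
        fileLen = LOF(inNum)
        startPos = 1
        If fileLen >= 3 Then
            Get #inNum, 1, head
            If head(0) = &HEF And head(1) = &HBB And head(2) = &HBF Then
                ' Skip the mark unless this is the first file and we keep it.
                If Not (i = LBound(inPaths) And keepFirstBom) Then startPos = 4
            End If
        End If
        If fileLen >= startPos Then
            ' Copy the remaining bytes unchanged, appending at the current position.
            ReDim buf(0 To fileLen - startPos)
            Get #inNum, startPos, buf
            Put #outNum, , buf
        End If
        Close #inNum
    Next i

    Close #outNum
End Sub
```

The `keepFirstBom` switch decides whether the merged file itself keeps the first file's mark; setting it to False would be the thing to try if the receiving system also rejects a mark at the very start.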

Paul_Hossler
07-20-2017, 05:48 AM
Welcome to VBX

It's better to start your own question instead of tacking it on to a 12-year-old thread.

If you can attach a small example CSV it would be easier to see what's going on

Use [Go Advanced] at the bottom right and then the paperclip icon.

SamT
07-20-2017, 11:22 AM
Paul, When you're done, can you Close this old thread? Thanks. Sam.

Paul_Hossler
07-23-2017, 07:56 AM
@Hisam -- if you still have a question, please start a new thread

Since this one is SO old, I'm going to close it