Read Image PDF Attachment Using MODI OCR, Then Extract Certain Text and Use as Filename



nickj
09-26-2012, 08:14 AM
Hi, wondering if anyone can help me?

I'm a newbie when it comes to programming. After extensive research on the subject I've come up with a few bits of code, and I was wondering if there is someone out there who can help me get this to work!

Basically, what I want to do is set up an Outlook rule that saves PDF attachments as they come into my inbox, but before saving them performs OCR on them, as they are image PDFs. It should then save the now text-readable PDF, using the words found within the file between "Purchase Order:" and "Job Number:" as the filename.

I realise MODI does not support PDF, as it's a Microsoft add-on, but I can open PDFs manually in MODI, so I'm guessing this can be automated as well?

I've used this code to successfully save image PDF files to a folder based on an Outlook rule:

Public Sub saveattachtoDisk(itm As Outlook.MailItem)
    Dim objAtt As Outlook.Attachment
    Dim saveFolder As String
    saveFolder = "c:\temp\"   ' already ends with a backslash
    For Each objAtt In itm.Attachments
        ' no extra "\" needed between folder and file name
        objAtt.SaveAsFile saveFolder & objAtt.DisplayName
        Set objAtt = Nothing
    Next
End Sub

This next snippet of code is supposed to run OCR using MODI:

Function GetOCRText(TheFile As String) As String
    On Error GoTo PROC_ERR
    If TheFile = "" Then Exit Function
    Dim MyDoc As Object ' MODI.document
    Dim MyLayout As Object ' MODI.Layout
    Dim TheWord As Object ' MODI.Word
    Dim Result As String
    Set MyDoc = CreateObject("MODI.document") ' New MODI.document
    MyDoc.Create TheFile
    MyDoc.Images(0).OCR
    Set MyLayout = MyDoc.Images(0).Layout
    For Each TheWord In MyLayout.Words
        Result = Result & " " & TheWord.Text
    Next TheWord
    Result = Result & vbCrLf & vbCrLf
    GetOCRText = Result
    Set MyLayout = Nothing
    MyDoc.Close False
    Set MyDoc = Nothing
PROC_ERR:
    ' Errors jump here; the function then returns without reporting them
End Function

Please could someone point me in the right direction or give me some code to work with.

Many Thanks

nickj
09-26-2012, 08:21 AM
Just a quick note to add:

I'm on Windows XP using Outlook 2007 and the Microsoft Office Document Imaging 12.0 Type Library.

nickj
10-01-2012, 03:14 AM
Hi, is there anyone out there that can help me with this?

Many Thanks :)

Crocus Crow
10-04-2012, 01:51 PM
> Hi, wondering if anyone can help me?

Why have you posted the same question in another thread in this forum?

> Basically, what I want to do is set up an Outlook rule that saves PDF attachments as they come into my inbox, but before saving them performs OCR on them, as they are image PDFs. It should then save the now text-readable PDF, using the words found within the file between "Purchase Order:" and "Job Number:" as the filename.

Try the following code to OCR the file and extract the file name as you describe. I think you'll need to write code which saves the .pdf file attachment to a temporary file/folder before running the code below on it, because it does the OCR on a local file. The code uses early binding, so you must set a reference to the MODI library in your VBA project.

Sub Test()
    Dim purchaseOrderFileName As String
    purchaseOrderFileName = Get_Purchase_Order("c:\folder1\folder2\attachment.pdf")
End Sub

Function Get_Purchase_Order(fileName As String) As String

    Dim MDoc As MODI.Document
    Dim MLayout As MODI.Layout
    Dim MWord As MODI.Word
    Dim OCRtext As String
    Dim p1 As Long, p2 As Long

    Set MDoc = New MODI.Document

    MDoc.Create fileName
    MDoc.Images(0).OCR

    Set MLayout = MDoc.Images(0).Layout
    OCRtext = ""
    For Each MWord In MLayout.Words
        OCRtext = OCRtext & " " & MWord.Text
    Next
    MDoc.Close False

    Get_Purchase_Order = ""

    p1 = InStr(OCRtext, "Purchase Order:")
    If p1 > 0 Then
        p1 = p1 + Len("Purchase Order:")
        p2 = InStr(p1, OCRtext, "Job Number:")
        If p2 > 0 Then Get_Purchase_Order = Mid(OCRtext, p1, p2 - p1)
    End If

    Set MLayout = Nothing
    Set MDoc = Nothing

End Function
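
Note that Get_Purchase_Order only returns the extracted text; it does not rename or save anything on disk. A minimal sketch of a rename step is below. It assumes the Get_Purchase_Order function from this post is in the same module and that MODI has released the file by the time the function returns. Clean_File_Name and Rename_To_Purchase_Order are hypothetical names made up for illustration, and the list of replaced characters is just the set Windows rejects in file names.

```vba
' Hypothetical helper: make OCR text safe to use as a Windows file name.
Function Clean_File_Name(rawText As String) As String
    Dim badChars As Variant, c As Variant
    Dim result As String

    result = Trim(rawText)
    ' Characters Windows does not allow in file names
    badChars = Array("\", "/", ":", "*", "?", """", "<", ">", "|")
    For Each c In badChars
        result = Replace(result, c, "_")
    Next
    Clean_File_Name = result
End Function

' Hypothetical wrapper: OCR the file, then rename it in place.
Sub Rename_To_Purchase_Order(fullPath As String)
    Dim newName As String
    Dim slashPos As Long, dotPos As Long

    newName = Clean_File_Name(Get_Purchase_Order(fullPath))
    If newName = "" Then Exit Sub          ' nothing found between the markers

    slashPos = InStrRev(fullPath, "\")
    dotPos = InStrRev(fullPath, ".")
    If dotPos <= slashPos Then Exit Sub    ' no file extension to carry over

    ' Name raises a run-time error if the target file already exists
    Name fullPath As Left(fullPath, slashPos) & newName & Mid(fullPath, dotPos)
End Sub
```

Usage would be something like Rename_To_Purchase_Order "c:\temp\attachment.tif", called after the attachment has been saved.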

nickj
10-05-2012, 06:54 AM
Hi Crocus Crow,

Thank you for the snippet of code. To answer your question, the reason I reposted the question was that I didn't get any response on this thread, so I decided to post it on a thread that was relevant.

I have tried to run your code but I get an error:

Run-time error '-959967229 (c6c81003)': file is empty or corrupted

I then tried changing the file extension in line 3 of your code to .tif, and then I started getting the error:

Run-time error '-959966950 (c6c8111a)': IO error

Finally, when you say that the code uses early binding, does that mean I need to set a reference to the MODI library in the code, or is selecting the reference in the Tools menu in VBA enough?

Many Thanks

Nick

Crocus Crow
10-07-2012, 12:38 PM
> I have tried to run your code but I get an error: Run-time error '-959967229 (c6c81003)': file is empty or corrupted

Does the file OCR successfully when you do it manually in MODI (the MS Office Document Imaging application)? If it does, then the code should also work and OCR the text successfully.

> I then tried changing the file extension in line 3 of your code to .tif, and then I started getting the error: Run-time error '-959966950 (c6c8111a)': IO error

The code should work with .tif and .jpg files, amongst others. As above, try the file manually in MODI.

> Finally, when you say that the code uses early binding, does that mean I need to set a reference to the MODI library in the code, or is selecting the reference in the Tools menu in VBA enough?

Early binding means the code uses named MODI object data types (instead of the generic VBA Object type), as in the following lines:

Dim MDoc As MODI.Document
Dim MLayout As MODI.Layout
Dim MWord As MODI.Word
Set MDoc = New MODI.Document

Therefore you must set a reference to the library in the VBA editor via the Tools - References menu; otherwise VBA won't recognise the MODI data types and will give an error. Ticking the reference there is enough; nothing extra is needed in the code itself.
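
For comparison, here is a minimal late-bound sketch of the same OCR loop. It declares everything as Object and creates the document with CreateObject, the same way the GetOCRText snippet in the first post does, so it compiles without the reference being ticked (MODI still has to be installed on the machine). The function name Get_OCR_Text_Late_Bound is made up for illustration.

```vba
' Late binding: no Tools - References entry needed, but no IntelliSense
' and no compile-time type checking either.
Function Get_OCR_Text_Late_Bound(fileName As String) As String
    Dim MDoc As Object      ' MODI.Document at run time
    Dim MLayout As Object   ' MODI.Layout at run time
    Dim MWord As Object     ' MODI.Word at run time
    Dim OCRtext As String

    Set MDoc = CreateObject("MODI.Document")
    MDoc.Create fileName
    MDoc.Images(0).OCR

    Set MLayout = MDoc.Images(0).Layout
    For Each MWord In MLayout.Words
        OCRtext = OCRtext & " " & MWord.Text
    Next
    MDoc.Close False

    Get_OCR_Text_Late_Bound = OCRtext

    Set MLayout = Nothing
    Set MDoc = Nothing
End Function
```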

nickj
10-31-2012, 05:24 AM
Hi Crocus Crow, thank you so much for your feedback! I didn't receive an email saying you had replied to my post, so I thought it had not been responded to!

I have since seen your post, thank you. I have managed to get rid of the errors by using a .tif extension. The only problem now is that the code runs without errors but does not change the filename of attachment.tif; it remains the same! Why would this be?

Please be advised that I want to run this code every time the following snippet of code runs. How would I achieve this? Here is the snippet:

Public Sub saveattachtoDisk(itm As Outlook.MailItem)
    Dim objAtt As Outlook.Attachment
    Dim saveFolder As String
    saveFolder = "c:\temp\"   ' already ends with a backslash
    For Each objAtt In itm.Attachments
        ' no extra "\" needed between folder and file name
        objAtt.SaveAsFile saveFolder & objAtt.DisplayName
        Set objAtt = Nothing
    Next
End Sub

Charlize
10-31-2012, 07:18 AM
This code will do what you want. There is no error checking for duplicate names!
You could use the Dir function to count the number of files and add a sequential number to the filename.
Be aware that it will save all attachments, including pictures used in signatures.

Public Sub saveattachtoDisk(itm As Outlook.MailItem)
    'attachment
    Dim objAtt As Outlook.Attachment
    'number of attachments
    Dim Attcount As Long
    'savefolder (already ends with a backslash)
    Dim saveFolder As String
    saveFolder = "c:\temp\"
    'if no attachments, skip
    If itm.Attachments.Count <> 0 Then
        'loop through attachments
        For Attcount = 1 To itm.Attachments.Count
            Set objAtt = itm.Attachments.Item(Attcount)
            objAtt.SaveAsFile saveFolder & objAtt.DisplayName
            Set objAtt = Nothing
        Next Attcount
    End If
End Sub

Charlize
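
The two caveats above (duplicate names, and signature pictures being saved too) could be handled roughly as follows. This is only a sketch under assumptions: it filters on the attachment's FileName extension, and it uses Dir to look for an existing file and appends a counter until the name is free. Unique_Path, saveFilteredAttachToDisk and the WANTED_EXT constant are hypothetical names for illustration; change WANTED_EXT to whatever file type you actually receive.

```vba
' Hypothetical helper: return a path in folderPath that is not in use yet.
Function Unique_Path(folderPath As String, baseName As String, ext As String) As String
    Dim candidate As String
    Dim n As Long

    candidate = folderPath & baseName & ext
    ' Dir returns "" when no file matches the path
    Do While Dir(candidate) <> ""
        n = n + 1
        candidate = folderPath & baseName & "_" & n & ext
    Loop
    Unique_Path = candidate
End Function

Public Sub saveFilteredAttachToDisk(itm As Outlook.MailItem)
    Const SAVE_FOLDER As String = "c:\temp\"
    Const WANTED_EXT As String = ".pdf"     ' assumption: only this type is wanted
    Dim objAtt As Outlook.Attachment
    Dim baseName As String

    For Each objAtt In itm.Attachments
        ' Skip anything that is not the wanted type, e.g. signature images
        If LCase(Right(objAtt.FileName, Len(WANTED_EXT))) = WANTED_EXT Then
            baseName = Left(objAtt.FileName, Len(objAtt.FileName) - Len(WANTED_EXT))
            objAtt.SaveAsFile Unique_Path(SAVE_FOLDER, baseName, WANTED_EXT)
        End If
    Next
End Sub
```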

nickj
10-31-2012, 07:48 AM
Hey Charlize, thank you so much for your feedback :) I'm wanting to use your saveattachtoDisk code above together with the Test / Get_Purchase_Order code that Crocus Crow posted earlier in this thread, so they work together.

How can I get these two pieces of code to work with one another? I'm a programming newbie, so you're probably laughing at me right now :P

Basically, in a nutshell, what I'm trying to achieve is this: save a PDF attachment from Outlook into a folder, then do OCR on the saved image PDF file using MODI (Microsoft Office Document Imaging), and then save the now text-readable PDF with a file name extracted from the text of the file between the words "Purchase Order:" and "Job Number:". The macro would be called from within Outlook using a "run a script" mail rule. If you need a template of the image PDF file, I can provide it.

Any help or pointers would be great.

Many Thanks

Nick

evanpan
03-03-2016, 12:07 AM
Hi, Nick.
Thanks for sharing that code. I will check it later and send you feedback soon.

SamT
03-03-2016, 09:38 AM
> Why have you posted the same question in another thread in this forum?

Please post a link here to that thread so I can delete that post.