PDA

View Full Version : VBA code to find fraudulent activity in data



mleblanc001
10-14-2016, 06:58 AM
Hi All you VBA experts,
It's a pleasure to be part of an elite VBA programming group. I've been creating some code with the help of Forums like this one but after doing some research, I found that this Forum is the elite of the elite, so I'm trying it out for the first time. I don't know if this is the best place to post my request for help and if it's not please let me know. I've been trying to find where to post my VBA question but couldn't find out any other way to do it and if this isn't the way to do it, if someone can tell me how, I would greatly appreciate it. Here's my question. I have an excel sheet with about 125000 lines in it. I would like a VBA code to search the sheet through a set of criteria to see if data matches the criteria and post the results on another sheet. I've included the test file for your review and the Criteria is as follows:


1) Same "Employee Numb" used with different name / spelling in same "Emp Code"


2) Same "Employee Numb" used with different date of birth in same "Emp Code"


3) Same "Employee Numb" used with different name / spelling in more than one "Emp Code"


4) Same "Employee Numb" used with different date of birth in more than one "Emp Code"


5) For all "Employee Numb" that worked as Cook AND had hours on Waitress or Waiters


6a) For all "Employee Numb" and Names that worked multiple positions on same "Trans Date" in same "Emp Code".


6b). Same as 7a but in multiple "Emp Code".


7). List of all "Emp Code" with people with same last name


8) List of all "Emp Code" with multiple people at same address


9) People at same address who worked in multiple "Emp Code".


10) People in other "position desc" other than Chef listed at same address as the Chef in an "Emp Code"



Thank you so much in advance for your help in this issue.:)

SamT
10-14-2016, 08:59 AM
VBA code to search the sheet through a set of criteria to see if data matches the criteria and post the results on another sheet.

How should the results be formatted? IMO, a different report sheet for each "test."

However, it would be much faster if the data were sorted differently for each test to make the test itself faster.

Where and how will the corrections be made? The reports must facilitate that process. That might be a different process for different tests.

In any case, it will take some time for the complete analysis.



This looks like a Database output. Depending. . . If so, it might be better to have the database perform the analysis.



@ All,
Any IDEAS?

snb
10-14-2016, 09:51 AM
Sub M_snb()
Sheet1.Columns(30).ClearContents
Sheet1.Columns(3).AdvancedFilter 2, , Sheet1.Cells(1, 30), True

sn = Sheet1.Cells(1, 30).CurrentRegion

Sheet1.Range("B1,F1:J1,L1:N1").EntireColumn.Hidden = True
With Cells(1).CurrentRegion

For j = 2 To UBound(sn)
.AutoFilter 3, sn(j, 1)
If .Columns(1).SpecialCells(12).Count > 2 Then Stop
.AutoFilter
Next
End With
End Sub

mleblanc001
10-14-2016, 11:54 AM
Wow, this is way beyond my knowledge of VBA and it works great. I do have other programming languages under my belt, C++, DBASE III and COBOL but I'm new to VBA. I think I understand part of it, such as the clear contents of 30 columns and I believe that the current region means the array of cells between column 1 to 30 and I understand the hidden cells but I don't understand much of the rest. Would it be possible to REM Comments on the For/Next loop such as what is the For j = 2 To UBound(sn), SpecialCells.Count >2. Thank you so much for your Guru help, it is greatly appreciated.

SamT
10-14-2016, 04:38 PM
Columns(Index number).Clear 'Range("AD:AD")

Sheet1.Cells(Row number, Column Index) 'Range("AD1")

CurrentRegion: A Block of cells bounded by empty cells. An island of non-empty cells.

Set sn to a memory array of those Range Objects

Note:

sn = Sheet1.Cells(1, 30).CurrentRegion .Value
Set sn = an Array of the Values in the cells

Depending on the Option Base declaration at the top of the module, Arrays index from either zero or 1

LBound(Array) is therefore zero or 1 and Ubound(array) is either (Array count - one) or Array Count where the count is of the first dimension.


If .Columns(1).SpecialCells(12).Count > 2
In the VBIDE, press F2 and Search for "SpecialCells". Under Range, see Function SpecialCells(Type As XlCellType, [Value]) As Range. At the bottom, doubleClick XLCellType and browse the constants to see which is "12".

XLCellType is an Enumeration



There is controversy whether to use the Value, (12) or the Constant ("xlCellTypeVisible.")

MS Promises that if the Value ever changes, the Constant will still work.
vs
Across MS Office Applications, the Value, if used, is the same, but the Constant Name will vary.

While in the Object Browser, (F2,) In the Upper left dropdown, Select "Excel." In the lower Left Pane, scroll down till you see the Enumerations. (The Icon is two yellow squares with lines.) Click on an Enumeration to see all the Constants it contains. Click any Constant to see its Value below.

While in the VBIDE, see Tools >> Options for helpful Programming settings.

snb
10-15-2016, 03:57 AM
But if you have no clue whatsoever of VBA you'd better adapt your ambitions.
It's never a good idea to use VBA (or any other) code you do not fully comprehend/master.

mleblanc001
10-15-2016, 04:52 AM
Hi snb,
It's not that I don't have a clue about VBA, I do. I have created some pretty good vba scripts, I definitely don't have the expertise you have and that's why I'm practising to increase my expertise level. As the saying goes, practice make perfect. I truly appreciate your help on this. Thanks again.

mleblanc001
10-15-2016, 04:59 AM
Hi SamT, Thank you for your explanation of the code. I'll review this information to better understand how the code is reacting. Greatly appreciated. :-)

mleblanc001
10-17-2016, 11:21 AM
Is it possible to paste the filtered results (only) to the next empty row in another worksheet as it goes through the For / Next loop? I've given it a try but it seem to paste all the data and not the filtered data. Any help on this would be greatly appreciated. Thank you.

mleblanc001
10-18-2016, 05:53 AM
Is it possible to paste the filtered results (only) to the next empty row in another worksheet as it goes through the For / Next loop? I've given it a try but it seem to paste all the data and not the filtered data only. This is what I have come up with but it copies all data rather than copy only the filtered data. Any help on this would be greatly appreciated. Thank you.


Sub filter()
Sheet2.Columns(30).ClearContents
Sheet2.Columns(3).AdvancedFilter 2, , Sheet2.Cells(1, 30), True
sn = Sheet2.Cells(1, 30).CurrentRegion
Rem hide columns
'Sheet1.Range("B1,F1:J1,L1:N1").EntireColumn.Hidden = False
With Cells(1).CurrentRegion
For j = 2 To UBound(sn)
.Autofilter 3, sn(j, 1)
If .Columns(1).SpecialCells(12).Count > 2 Then MsgBox "Review then press OK"
.Autofilter
'This is what I have come up with to copy and paste to separate worksheet
'but it copies all data.
Application.ScreenUpdating = False
With Sheets("Result")
Rem Sheet2.Cells(1, 30).CurrentRegion.Copy
Rem Intersect(.UsedRange, .UsedRange.Offset(1)).SpecialCells(12).Copy
Intersect(.UsedRange, .UsedRange.Offset(1)).SpecialCells(xlCellTypeVisible).Copy
Sheets("Data").Cells(Rows.Count, 1).End(xlUp).Offset(1).PasteSpecial xlPasteValues
Sheets("Result").AutoFilterMode = False
Application.ScreenUpdating = True
End With
Next
End With
End Sub

Paul_Hossler
10-18-2016, 08:04 AM
I was thinking along different lines

My half done macro generates a Summary sheet like this

17359


Before I got too far along, I thought I'd see if it was worth contnuing

I'm a little fuzzy on some of the more complicated critera, but the basic concept seems to work OK


17358

mleblanc001
10-18-2016, 09:41 AM
Hi Paul,
I don't see a macro in your attached file, maybe I'm not looking at the right place. However, your criteria and summary sheets are on the right track. My actual data is 100,000+ rows long which is why I need a macro which will syphon through all that data and give me the desired results. The code that snb gave me works perfectly, however, I would like to copy the results of the filter to another worksheet. I would also like to put in some business rules such as if they are cooks they can't work hours as waiters, etc. but first I must resolve the copying to another worksheet first. It looks to me like you are building a business rules macro which is ok with me, it would be easier to add or remove business rules that way. Your help with this would be greatly appreciated and thank you for taking the time to look at this for me. :-)

Paul_Hossler
10-18-2016, 09:47 AM
17364

1. The macro is in Module1

2. I think that your business rules are more than a filter can handle

SamT
10-18-2016, 11:42 AM
This is just my opinion:

With 125K lines or around 30K employees, you need to start from scratch and design a system that can handle that load.

I don't know if the small example you provided at first is truly representative of your actual sheet, but . . . there is no relation between Worker_ID and Emp_Code and any other Field (Header) on the sheet. For that reason, I must ignore them in this essay.

This means that Emp_Numb, which is reasonably unique, is the Primary Key for all other tables. This is the first discrepancy that must be repaired, (in house,) before any other work can be efficiently done.

Once we can assume that each employee has only one unique Emp_Numb, all the rest is very straight forward coding.

While you have been concentrating on the data, I have been looking at the Table Structure. Below is both an analysis and a suggestion for a different Workbook Data Base structure.



Emp_Fields


POSition_Code
position_desc

Transaction Table Fields

PAYable_code
payable_t4


Emp_Numb


Emp_Numb

trans_date

1
15


emp_code

M

Manager

Emp_Numb

2
22.5


worker_id

B
Baker

data_type

3
30


last_name

C
Cook

total_hrs

4
34


first_name

SC
Second Cook

POS_Code


5
50


DOB

W
Waiter

PAY_Code

6
210


address

--
Waitress




7
238


city

D
Dishwasher



8
315


prov_code










PAY RATE






















Basically, four different Tables or Sheets, of which only the Transaction Table sees much movement.

Since the example book you gave was populated by a Database Query, Why don't you fix the Database, or at least fix the Query?

mleblanc001
10-18-2016, 12:35 PM
I truly appreciate your opinion and yes the Emp_Numb is unique but repeats for every transaction. I was able to adapt your script to the data I have and it worked for the most part and your code was much easier to understand than the previous code. I'm fairly new at VBA coding but I think I can now take a crack at completing the rest of the criterion. Thank you so much for helping me on this, it will be very useful. Best regards

SamT
10-18-2016, 02:59 PM
yes the Emp_Numb is unique but repeats for every transaction.
That is why the Transactions need to be in a separate table. For every Transaction, The combination of Trans Date, Emp Number, and Pos Desc is unique.

While a Data Structure like the suggested one looks cumbersome to the human eye, it is very easy to use with any code or Excel analysis function. It is also very easy to use Excel's DB style Code on.

All data that code must work with needs to be well Structured.

mleblanc001
10-19-2016, 06:38 AM
Actually, the data I received were created from SQL code (I didn't write the code) applied against system tables see below. Are you saying that I could use VBA code against the system tables to do the same thing that the SQL code would do?



SELECT


b.ed_code, b.worker_id, b.sin_no, a.sin_dup_count, b.last_name, b.first_name,


b.apt_no,b.address,b.city,b.prov_code, b.postal_code, b.birth_date,


c.data_type,c.position_id,c.position_desc,c.trans_date,c.total_hrs,c.payabl e_t4,c.kilometers,c.payable_expenses,c.payable_expenses_oth,


fieldhq_ind


FROM


( select sin_no, count(distinct ed_code) sin_dup_count


from worker


where sin_no <> 0


group by sin_no


HAVING COUNT(distinct ed_code) > 1 ) a,


worker b,


(


select


ed_code,worker_id,data_type,position_id,position_desc,trans_date,total_hrs, payable_t4,kilometers,payable_expenses,payable_expenses_oth, fieldhq_ind


from


( SELECT


e.ed_code, e.worker_id, e.position_id, f.desc_e position_desc, CASE WHEN temp_query.requisition_paid >= '2015-11-27 13:39:10.383' THEN 1 ELSE 0 END as fieldhq_ind,


temp_query.data_type, temp_query.trans_date, temp_query.total_hrs, temp_query.payable_t4, temp_query.kilometers, temp_query.payable_expenses, temp_query.payable_expenses_oth


FROM


( SELECT ed_code, worker_id, position_id, '1-Timesheet' data_type, worked_date trans_date, id, total_hrs, payable payable_t4, NULL kilometers, NULL payable_expenses, NULL payable_expenses_oth, requisition_paid


FROM timesheets WHERE requisition_paid IS NOT NULL


UNION ALL


SELECT ed_code, worker_id, position_id, '2-Training' data_type, training_date trans_date, 1, 2.0 total_hrs, payable payable_t4, NULL kilometers, NULL payable_expenses, NULL payable_expenses_oth, requisition_paid


FROM positions_worked pw WHERE attended_training ='Y' and requisition_paid IS NOT NULL


UNION ALL


SELECT ed_code, worker_id, position_id, '3-Expenses (Kilometers)' data_type, expense_date trans_date, id, NULL total_hrs, NULL payable_t4, kilometers, payable payable_expenses, NULL payable_expenses_oth, requisition_paid


FROM expenses WHERE requisition_paid IS NOT NULL AND tariff_id IN (41,50)


UNION ALL


SELECT ed_code, worker_id, position_id, '4-Other Expenses (Cell Phone)' data_type, expense_date trans_date, id, NULL total_hrs, NULL payable_t4, NULL kilometers,NULL payable_expenses, payable payable_expenses_oth, requisition_paid


FROM expenses WHERE requisition_paid IS NOT NULL AND tariff_id NOT IN (41,50)


) temp_query,


positions_worked e, positions f


WHERE e.ed_code = temp_query.ed_code


AND e.worker_id = temp_query.worker_id


AND e.position_id = temp_query.position_id


AND e.position_id = f.position_id


) cc


) c


WHERE a.sin_no = b.sin_no


AND b.ed_code *= c.ed_code


AND b.worker_id *= c.worker_id


ORDER BY b.sin_no, b.ed_code, b.worker_id, c.data_type,c.position_id,c.trans_date;

SamT
10-19-2016, 09:38 AM
VBA for Excel can talk directly to the database using a Structured Query Language. It can put the SQL results directly into a worksheet.

The problem for us here at VBAX is that we don't know why you are trying to do all this. If I had to guess, I would say that you are in the HR Dept and want to clean up the Employee files and records. OR, you are in the Accounting Dept and need to accurately calculate payroll. OR, you are management and see the need for efficiency. As a last guess, you are performing an audit. Or even, all the above.

Without that knowledge, none of us can suggest the best way to do what you really need to do.

I don't speak SQL, but the most basic SQL Query is:
SELECT, (or return,) certain Records, (or certain Fields in Records,) WHERE certain Fields meet certain values, FROM certain Tables.

The SQL Query you posted has very complex "certains", but that is all it does.

When using Excel to retrieve Data from a DB, think of the Workbook and its Sheets as Reports, or summaries. Then design a Workbook for the Reported Information you need to see, with each grouping of information on its own Worksheet, as per my suggestion in post #14, with at least two sheets, Employee Info and Transactions.

That workbook will be your ultimate goal. In the meantime, you need to find all the discrepancies as listed in your post # 1. Again design a Workbook with each type of discrepancy listing on a separate sheet. For example



Headers ->
Same EMP_Numb
Last Name
First Name
Emp_Code


Table and Field ->
b.sin_no
b.last_name
b.first_name
b.ed_code





Headers ->
Last Name
First Name
Multiple EMP_Numb


Table and Field ->
b.last_name
b.first_name
b.sin_no





Headers ->
Same EMP_Numb
DOB
Emp_Code


Table and Field ->
b.sin_no
b.birth_date
b.ed_code





Headers ->
Last Name
First Name
Emp_Numb
Cook & Waiter/Waitress


Table and Field ->
b.last_name
b.first_name
b.sin_no
c.position_desc



Having great experience straightening out Data messes, I can tell you that it is a step by step process with decisions on direction after each step.

When you have thought about all this, we can close this thread and you can open a new thread in the SQL forum (http://www.vbaexpress.com/forum/forumdisplay.php?94-SQL), and provide a link there back to this thread so our SQL experts can have the background.

If you look below, you can see that 17 other members are following this thread and I recognize a couple of real SQL experts there. :) They are just waiting for a well defined problem to solve.

mleblanc001
10-19-2016, 10:19 AM
You are close about what my background is. I'm a business/financial analyst with some education both in accounting and programming languages so I try to make things easier by using Excel to its fullest. Right now, I have a business analysis contract and have this data analysis to do with a great deal of data (approx. 125000 lines) to review. The information is sensitive(secret level) which is why I can't divulge the real data. I really appreciate your Forum as it gives me ideas on how to resolve problems. Wow the power of the electronic superhighway.:)

As they say a little knowledge can be dangerous.:) I started by doing simple formulas in Excel to eventually complex formulas. Now I'm intrigued by the power of VBA in Excel. The problem with VBA is that part of the coding is not simple English statements which makes it harder to understand where it's going. For example, I'm having a problem right now understanding the "cells.(iwork, 18.value)". I know that the dimension (DIM) has been defined as long but what is the 18. I believe that cells. is a range object and the iwork may refer to row count and 18 may refer to column count but I'm now sure. I've tried googling it but "iwork" is coming up as apple information. Could you shed some light?

At some point, I will probably look into some SQL Forums to play around and attempt to understand that language as well. For now, it's all VBA. :)

SamT
10-19-2016, 11:01 AM
iwork has to be a variable. 18.Value? :dunno

I put together a small KB on Databases for you

It may have lost some minor details, as I had to delete a lot of javascript and decorative images to get it small enough to upload.



Right now, I have a business analysis contract and have this data analysis to do with a great deal of data (approx. 125000 lines) to review.

Sometimes it is still easier to do some things by hand.
Example:
Delete all columns but Emp_Numb, and first and last names.
Filter for unique Emp_Numbs. Be sure to not filter in place. Pick a different column.
Sort Uniques by Last > First > Emp_Numb.
With VBA, separate out the Duplicate names and Write a sub to replace all 'wrong' Numbs in the master file.
Ditto with misspelled Names.

The result is a master file with all unique names and Emp_Numbs. From here on, always use the Emp_Numb in future work. It is the Primary Key.

At this point I would create a sub-master filtered on unique Emp_numbs, with no Transaction Columns.

With a copy of the sub-master, delete all columns but Emp_Nmb and Address. Sort on Address. Write a VBA Script to remove all lines with unique addresses.

Repeat this process with other criteria as needed, producing several files or Sheets with different Reports.

While I would only use Emp_numb as a Key and would add back the other info in the final report to the client, it may be easier to leave the name columns in the working tables.

mleblanc001
10-19-2016, 12:13 PM
Wow, thank you so much for taking the time to communicate with me and give me all this great information. I really truly appreciate it. I'll take a look at the attachment. Thank you

mleblanc001
10-19-2016, 12:20 PM
Hi SamT
I tried to open the zipped file but it doesn't work and when I view the information, it's not legible. Do I have to open it with a specific software? :dunno

Please advise

SamT
10-19-2016, 02:19 PM
I didn't think it would matter, but the original extension was .7z, created with 7-Zip.

I copied them out my own Knowledge base, but I can find the web links if you need them.

mleblanc001
10-19-2016, 06:32 PM
You've do so much, I can't ask you to find the web links but thank you for offering.

SamT
10-20-2016, 06:48 AM
Not exhaustive, but a decent introduction to the terminology.

Database design - Wikipedia, the free encyclopedia (https://en.wikipedia.org/wiki/Database_design)
Database design basics - Access (https://support.office.com/en-us/article/Database-design-basics-EB2159CF-1E30-401A-8084-BD4F9C9CA1F5)
Introduction to database design | Normalized Data Model (http://www.datanamic.com/support/lt-dez005-introduction-db-modeling.html)
Ten Common Database Design Mistakes - Simple Talk (https://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/)

mleblanc001
10-20-2016, 02:41 PM
Thank you very much SamT. Greatly appreciated.