PDA

View Full Version : Extracting the URLs from a tweet



1819
01-19-2017, 03:30 PM
I have cobbled together code which works but it's extremely slow and the PC sounds as if it wants to die when it runs it.

The objective is copy and paste, from a tweet hyperlink, the http URL and the pic URL to other columns.

For example, take a random tweet: "Secrets for crafting the perfect headline http://buff.ly/2dFmHJF pic.twitter.com/u8cE252Gur"

We would end up with the existing hyperlink in one column, http://buff.ly/2dFmHJF in another column, and "pic.twitter.com/u8cE252Gur" in a third column.

In addition to being slow and clumsy, my code also cannot discriminate between "pic.twitter" and "picture" or "Olympics", which means that a lot of the results are garbage.

Also the 1:300000 ranges should actually dynamically restrict the range to the last used row.

An added complication is that some tweets contain more than 1 URL. I had no idea how to tackle that when I did this code.

So I'd be really grateful for help in streamlining this.



'*************EXTRACT URLS FROM THE TWEET***********************************

'It seems we need to add an additional space before http and pic for the URLs to be captured intact

Dim Lrow As Long
Dim rngC As Range

With ActiveSheet
Lrow = .Cells(Rows.Count, 1).End(xlUp).Row
For Each rngC In .Range("A1:A" & Lrow)
rngC.Replace what:="http", replacement:=" http"
Next
End With

With ActiveSheet
Lrow = .Cells(Rows.Count, 1).End(xlUp).Row
For Each rngC In .Range("A1:A" & Lrow)
rngC.Replace what:="pic", replacement:=" pic"
Next
End With


'This extracts the URLs for http and pic addresses and places them in rows M and N. It then replaces the formulas with values.

Range("M1:M300000").FormulaR1C1 = "=RIGHT(C[-12],LEN(C[-12])-FIND("" http"",C[-12]))"

With Range("m1").CurrentRegion
.Cells.Copy
.Cells.PasteSpecial xlPasteValues
.Cells(1).Select
End With
Application.CutCopyMode = False

Range("N1:N300000").FormulaR1C1 = "=LEFT(C[-1],FIND("" "",C[-1]&"" "")-1)"

With Range("n1").CurrentRegion
.Cells.Copy
.Cells.PasteSpecial xlPasteValues
.Cells(1).Select
End With
Application.CutCopyMode = False

Range("O1:O300000").FormulaR1C1 = "=RIGHT(C[-14],LEN(C[-14])-FIND("" pic"",C[-14]))"

With Range("O1").CurrentRegion
.Cells.Copy
.Cells.PasteSpecial xlPasteValues
.Cells(1).Select
End With
Application.CutCopyMode = False

'This strips errors left over from the formulas

Cells.Replace "#VALUE!", "", xlWhole

'This reinserts the extra spaces put in before http and pic.

Range("A1:A300000").Replace what:=" http", _
replacement:="http", _
LookAt:=xlPart, _
SearchOrder:=xlByRows, _
MatchCase:=False, _
SearchFormat:=False, _
ReplaceFormat:=False

Range("A1:A300000").Replace what:=" pic", _
replacement:="pic", _
LookAt:=xlPart, _
SearchOrder:=xlByRows, _
MatchCase:=False, _
SearchFormat:=False, _
ReplaceFormat:=False

SamT
01-19-2017, 09:43 PM
VBA Rules:
All http URLs start with "http://" and end with... the next space? "pic.twi"? What?

All Image URLs start with "pic.twitter.com/" and end with... what? The next Space? The next non-AlphaNumeric Character? :dunno

1819
01-20-2017, 02:26 AM
Thanks Sam. Yes, http URLs start with "http://" or https://* and usually end with the next space.

Image URLs start with "pic.twitter.com/" and usually end with the next space.

Except often they are at the end of the string (so no space or character afterwards).

snb
01-20-2017, 05:15 AM
Please post a sample of 20 elements.
You should use arrays to do the splitting: avoid worksheet interaction.

1819
01-20-2017, 06:26 AM
Please post a sample of 20 elements.
You should use arrays to do the splitting: avoid worksheet interaction.

A sample workbook attached (see sheet1).

Arrays sounds a great approach.

18091

snb
01-20-2017, 08:32 AM
In your file:
In the codemodule of sheet1:


Sub M_snb()
sn = Cells(1).CurrentRegion.Resize(, 3)
For j = 1 To UBound(sn)
If InStr(sn(j, 1), "http://") Then sn(j, 2) = "http://" & Split(Split(Replace(sn(j, 1), Chr(160), " "), "http://")(1))(0)
If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = "pic.twitter." & Split(sn(j, 1), "pic.twitter.")(1)
Next

Cells(30, 1).Resize(UBound(sn), 3) = sn
End Sub

or

Sub M_snb()
Columns(1).Replace Chr(160), " "
Columns(1).Replace "pic.", " pic."

sn = Cells(1).CurrentRegion.Resize(, 3)
For j = 1 To UBound(sn)
If InStr(sn(j, 1), "http://") Then sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0)
If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = Filter(Split(sn(j, 1)), "pic.twitter.")(0)
Next

Cells(30, 1).Resize(UBound(sn), 3) = sn
End Sub

1819
01-20-2017, 09:53 AM
Thank you. I am getting Compile error - variable not defined at line:
sn = Cells(1).CurrentRegion.Resize(, 3).

How could that be fixed?

Thanks.

SamT
01-20-2017, 10:48 AM
Dim sn As Variant
Dim j as Long

Dim sn
Dim j

Dim sn, j

1819
01-20-2017, 11:19 AM
Thanks, Sam - that's terrific.

How would the code need to change, so that instead of splitting the "http*" and "pic*), they were cut from the original text.

So the result would be:



#AmyGoodman of #DemocracyNow is facing riot charges in ND for a #FacebookLive post in Sept.
http://www.prnewsonline.com/democracy-now-facebook-live
pic.twitter.com/E8xWQ1mnc3


#antibullyingweek: We’ve joined forces w/ @ditchthelabel. Our interactive infographic looks at hate speech online
http://bit.ly/2fRo6uN
pic.twitter.com/2wPMjxfSn3


#Apple Anticipation: Buzz Building for Tomorrow’s Event—But Who's Buying?
http://buff.ly/2bWCFtL
pic.twitter.com/Qy7Ig8fg2X


#Apple Anticipation: Buzz Building for Tuesday's Event—But Who's Buying?
http://ln.is/bulldogreporter.com/TyJh7



#Apple maintains position as most relevant brand via @BulldogReporter:
http://ht.ly/WAIo305YBsw



#Apple responds to diversity criticism: "We had a Canadian" onstage at iPhone 7 event @MelanieHannah for @mic
http://ow.ly/j8aJ3049dIC



#Are We Social Fakes? Only 11% of Millennials Believe People Are Authentic on Social @Cynny … #PR #Publicrelations
http://ln.is/bulldogreporter.com/P5yXX



#ArtificialIntelligence-powered malware is coming - and it's going to be terrifying #AI #cybersecurity #IoT
http://read.bi/2cYOZym
pic.twitter.com/Hd2qTbYOK3


#Asia the best placed to drive e-commerce innovation @Accenture's Simon Eaves, @CampaignAsia
http://ow.ly/gwOb30429w1
pic.twitter.com/RTfneM3MJ6


#AskanAnalyst: See how @FR314 unleashes the power of Brandwatch Audiences -
http://brnw.ch/2dorGzv
pic.twitter.com/RtAum9R9FY


#AskAnExpert: @G_Price tells us what it takes to uncover real social insights -
http://brnw.ch/2d0HuDK
pic.twitter.com/FW4f9on37z


#AskAnExpert: @Kit_Smith has essential advice on how brands can expertly position themselves -
http://brnw.ch/2cNd89c
pic.twitter.com/27l87sIBG7


#AskAnExpert: @NathalieNahai looks at the psychology that drives social media behavior -
http://brnw.ch/2d73Ln3
pic.twitter.com/iQFapbkzMn


#AskAnExpert: Brand loyalty, we all want it but how do we get it? Use these tips to build a solid foundation -
http://brnw.ch/2e10ovN
pic.twitter.com/u78zPzzOvY


#AskAnExpert: Disconnects in data can be an effective way of identifying insights -
http://brnw.ch/2d0HuDK
pic.twitter.com/I0uedq2YgY


#AskAnExpert: How to guide your customers' perceptions of your brand with a brand positioning statement -
http://brnw.ch/2cNd89c
pic.twitter.com/4T69cmRcXq


#Australia first developed print market in world to kill audits–advertisers say nothing Tim Burrowes, @Mumbrellanews
http://mumbrel.la/2i2UgqC



#Australia hasn't had a recession in 25 years--what the rest of the world can learn @wef
http://wef.ch/2cSW3I1
pic.twitter.com/7s77EwMWX3


#AutumnStatement – Business should communicate its 'red lines' to Government now: @MarkHenryGlover @NewingtonComms
http://bit.ly/2fIF9y4
pic.twitter.com/HBuMHTuDfD


#AWCIC16 Cardiff Model for Reducing Violence using A&E Data link Via Prof Shepherd
http://bit.ly/2gkrSPq
pic.twitter.com/HOZDrMg3tY


#B2B Marketers See Customer Experience As An Exciting Opportunity. So What Are Their #CX Priorities?
http://bit.ly/2fwXkqU



#Baidu launches medical #chatbot to help Chinese doctors diagnose patients @jjvincent for @verge
http://ow.ly/3BOR305cjvE



#BehindTheHeadlines with @LauraBQuigley VP of #Comms @CMAphysicians: | #branding #PR #Media
http://bit.ly/2fz3KcP
pic.twitter.com/G73GNQgOhv


#BehindtheHeadlines With Nathan Friedman (@nathanf99)
http://bit.ly/2ffjxNL
pic.twitter.com/L743dIaGYV

snb
01-20-2017, 12:56 PM
Keep it simple: remove 'Option Explicit'.

Sub M_snb()
Columns(1).Replace Chr(160), " "
Columns(1).Replace "pic.", " pic."

sn = Cells(1).CurrentRegion.Resize(, 3)
For j = 1 To UBound(sn)
sn(j, 1)= trim(Split(sn(j, 1), "http://")(0))
sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0)
If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = Filter(Split(sn(j, 1)), "pic.twitter.")(0)
Next

Cells(30, 1).Resize(UBound(sn), 3) = sn
End Sub

snb
01-20-2017, 12:58 PM
see #10

1819
01-20-2017, 02:01 PM
Thanks snb, but when using the code in comment #10, I am getting an error: "Run time error 9 Subscript out of range", and the debug points to this line:
sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0).

snb
01-20-2017, 02:20 PM
In that case you'll have to adapt the code.

1819
01-20-2017, 02:24 PM
I have tried to reply to my own question in comment #9 (How would the code need to change, so that instead of splitting the "http*" and "pic*), they were cut from the original text) with this code, adapted from http://stackoverflow.com/questions/23806741/remove-value-in-one-cell-from-string-in-another-cell-vba, but I've got the ranges wrong (and maybe other things beside). Grateful for any help.



Option Explicit

Sub M_snb()

Dim sn As Variant
Dim j As Long

Columns(1).Replace Chr(160), " "
Columns(1).Replace "pic.", " pic."

sn = Cells(1).CurrentRegion.Resize(, 3)
For j = 1 To UBound(sn)
If InStr(sn(j, 1), "http") Then sn(j, 2) = Filter(Split(sn(j, 1)), "http")(0)
If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = Filter(Split(sn(j, 1)), "pic.twitter.")(0)
Next

Cells(30, 1).Resize(UBound(sn), 3) = sn

sn = Cells(1).CurrentRegion.Resize(, 3)
For j = 1 To UBound(sn)

Dim rLongString As Range
Dim rShortString As Range
Dim I As Long
Dim LastRow As Long

Set rLongString = Range("A30:A" & LastRow)
Set rShortString = Range("B30:C" & LastRow)

For I = 1 To rLongString.Count
rLongString(I).Replace what:=rShortString(I), replacement:="", lookat:=xlPart
Next I
Next

End Sub

snb
01-21-2017, 05:37 AM
Sub M_snb()
Columns(1).Replace Chr(160), " "
Columns(1).Replace "pic.", " pic."

sn = Cells(1).CurrentRegion.Resize(, 3)
For j = 1 To UBound(sn)
if instr(sn(j,1),"http://") then
sn(j, 1)= trim(Split(sn(j, 1), "http://")(0))
sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0)
end if
If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = Filter(Split(sn(j, 1)), "pic.twitter.")(0)
Next

Cells(30, 1).Resize(UBound(sn), 3) = sn
End Sub

1819
01-21-2017, 05:55 AM
Sub M_snb()
Columns(1).Replace Chr(160), " "
Columns(1).Replace "pic.", " pic."

sn = Cells(1).CurrentRegion.Resize(, 3)
For j = 1 To UBound(sn)
if instr(sn(j,1),"http://") then
sn(j, 1)= trim(Split(sn(j, 1), "http://")(0))
sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0)
end if
If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = Filter(Split(sn(j, 1)), "pic.twitter.")(0)
Next

Cells(30, 1).Resize(UBound(sn), 3) = sn
End Sub

Thanks snb. I am getting Runtime error 9: Subscript out of range at this line:

sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0)

Please could you advise a fix?

snb
01-21-2017, 06:41 AM
Not with the data in the file you posted.

1819
01-21-2017, 07:57 AM
Thanks snb. My error message won't go away. I've tried to source a cure, but it seems to be one of those errors with lots of possible causes.

I am on Excel 2010. Don't know if that makes a difference.

snb
01-21-2017, 08:11 AM
It has only got to do with the content of your data, not with the VBA code.