View Full Version : Extracting the URLs from a tweet
I have cobbled together code which works but it's extremely slow and the PC sounds as if it wants to die when it runs it.
The objective is copy and paste, from a tweet hyperlink, the http URL and the pic URL to other columns.
For example, take a random tweet: "Secrets for crafting the perfect headline http://buff.ly/2dFmHJF pic.twitter.com/u8cE252Gur"
We would end up with the existing hyperlink in one column, http://buff.ly/2dFmHJF in another column, and "pic.twitter.com/u8cE252Gur" in a third column.
In addition to being slow and clumsy, my code also cannot discriminate between "pic.twitter" and "picture" or "Olympics", which means that a lot of the results are garbage.
Also the 1:300000 ranges should actually dynamically restrict the range to the last used row.
An added complication is that some tweets contain more than 1 URL. I had no idea how to tackle that when I did this code.
So I'd be really grateful for help in streamlining this.
'*************EXTRACT URLS FROM THE TWEET***********************************
'It seems we need to add an additional space before http and pic for the URLs to be captured intact
Dim Lrow As Long
Dim rngC As Range
With ActiveSheet
Lrow = .Cells(Rows.Count, 1).End(xlUp).Row
For Each rngC In .Range("A1:A" & Lrow)
rngC.Replace what:="http", replacement:=" http"
Next
End With
With ActiveSheet
Lrow = .Cells(Rows.Count, 1).End(xlUp).Row
For Each rngC In .Range("A1:A" & Lrow)
rngC.Replace what:="pic", replacement:=" pic"
Next
End With
'This extracts the URLs for http and pic addresses and places them in rows M and N. It then replaces the formulas with values.
Range("M1:M300000").FormulaR1C1 = "=RIGHT(C[-12],LEN(C[-12])-FIND("" http"",C[-12]))"
With Range("m1").CurrentRegion
.Cells.Copy
.Cells.PasteSpecial xlPasteValues
.Cells(1).Select
End With
Application.CutCopyMode = False
Range("N1:N300000").FormulaR1C1 = "=LEFT(C[-1],FIND("" "",C[-1]&"" "")-1)"
With Range("n1").CurrentRegion
.Cells.Copy
.Cells.PasteSpecial xlPasteValues
.Cells(1).Select
End With
Application.CutCopyMode = False
Range("O1:O300000").FormulaR1C1 = "=RIGHT(C[-14],LEN(C[-14])-FIND("" pic"",C[-14]))"
With Range("O1").CurrentRegion
.Cells.Copy
.Cells.PasteSpecial xlPasteValues
.Cells(1).Select
End With
Application.CutCopyMode = False
'This strips errors left over from the formulas
Cells.Replace "#VALUE!", "", xlWhole
'This reinserts the extra spaces put in before http and pic.
Range("A1:A300000").Replace what:=" http", _
replacement:="http", _
LookAt:=xlPart, _
SearchOrder:=xlByRows, _
MatchCase:=False, _
SearchFormat:=False, _
ReplaceFormat:=False
Range("A1:A300000").Replace what:=" pic", _
replacement:="pic", _
LookAt:=xlPart, _
SearchOrder:=xlByRows, _
MatchCase:=False, _
SearchFormat:=False, _
ReplaceFormat:=False
VBA Rules:
All http URLs start with "http://" and end with... the next space? "pic.twi"? What?
All Image URLs start with "pic.twitter.com/" and end with... what? The next Space? The next non-AlphaNumeric Character? :dunno
Thanks Sam. Yes, http URLs start with "http://" or https://* and usually end with the next space.
Image URLs start with "pic.twitter.com/" and usually end with the next space.
Except often they are at the end of the string (so no space or character afterwards).
Please post a sample of 20 elements.
You should use arrays to do the splitting: avoid worksheet interaction.
Please post a sample of 20 elements.
You should use arrays to do the splitting: avoid worksheet interaction.
A sample workbook attached (see sheet1).
Arrays sounds a great approach.
18091
In your file:
In the codemodule of sheet1:
Sub M_snb()
sn = Cells(1).CurrentRegion.Resize(, 3)
For j = 1 To UBound(sn)
If InStr(sn(j, 1), "http://") Then sn(j, 2) = "http://" & Split(Split(Replace(sn(j, 1), Chr(160), " "), "http://")(1))(0)
If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = "pic.twitter." & Split(sn(j, 1), "pic.twitter.")(1)
Next
Cells(30, 1).Resize(UBound(sn), 3) = sn
End Sub
or
Sub M_snb()
Columns(1).Replace Chr(160), " "
Columns(1).Replace "pic.", " pic."
sn = Cells(1).CurrentRegion.Resize(, 3)
For j = 1 To UBound(sn)
If InStr(sn(j, 1), "http://") Then sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0)
If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = Filter(Split(sn(j, 1)), "pic.twitter.")(0)
Next
Cells(30, 1).Resize(UBound(sn), 3) = sn
End Sub
Thank you. I am getting Compile error - variable not defined at line:
sn = Cells(1).CurrentRegion.Resize(, 3).
How could that be fixed?
Thanks.
Dim sn As Variant
Dim j as Long
Dim sn
Dim j
Dim sn, j
Thanks, Sam - that's terrific.
How would the code need to change, so that instead of splitting the "http*" and "pic*), they were cut from the original text.
So the result would be:
#AmyGoodman of #DemocracyNow is facing riot charges in ND for a #FacebookLive post in Sept.
http://www.prnewsonline.com/democracy-now-facebook-live
pic.twitter.com/E8xWQ1mnc3
#antibullyingweek: We’ve joined forces w/ @ditchthelabel. Our interactive infographic looks at hate speech online
http://bit.ly/2fRo6uN
pic.twitter.com/2wPMjxfSn3
#Apple Anticipation: Buzz Building for Tomorrow’s Event—But Who's Buying?
http://buff.ly/2bWCFtL
pic.twitter.com/Qy7Ig8fg2X
#Apple Anticipation: Buzz Building for Tuesday's Event—But Who's Buying?
http://ln.is/bulldogreporter.com/TyJh7
#Apple maintains position as most relevant brand via @BulldogReporter:
http://ht.ly/WAIo305YBsw
#Apple responds to diversity criticism: "We had a Canadian" onstage at iPhone 7 event @MelanieHannah for @mic
http://ow.ly/j8aJ3049dIC
#Are We Social Fakes? Only 11% of Millennials Believe People Are Authentic on Social @Cynny … #PR #Publicrelations
http://ln.is/bulldogreporter.com/P5yXX
#ArtificialIntelligence-powered malware is coming - and it's going to be terrifying #AI #cybersecurity #IoT
http://read.bi/2cYOZym
pic.twitter.com/Hd2qTbYOK3
#Asia the best placed to drive e-commerce innovation @Accenture's Simon Eaves, @CampaignAsia
http://ow.ly/gwOb30429w1
pic.twitter.com/RTfneM3MJ6
#AskanAnalyst: See how @FR314 unleashes the power of Brandwatch Audiences -
http://brnw.ch/2dorGzv
pic.twitter.com/RtAum9R9FY
#AskAnExpert: @G_Price tells us what it takes to uncover real social insights -
http://brnw.ch/2d0HuDK
pic.twitter.com/FW4f9on37z
#AskAnExpert: @Kit_Smith has essential advice on how brands can expertly position themselves -
http://brnw.ch/2cNd89c
pic.twitter.com/27l87sIBG7
#AskAnExpert: @NathalieNahai looks at the psychology that drives social media behavior -
http://brnw.ch/2d73Ln3
pic.twitter.com/iQFapbkzMn
#AskAnExpert: Brand loyalty, we all want it but how do we get it? Use these tips to build a solid foundation -
http://brnw.ch/2e10ovN
pic.twitter.com/u78zPzzOvY
#AskAnExpert: Disconnects in data can be an effective way of identifying insights -
http://brnw.ch/2d0HuDK
pic.twitter.com/I0uedq2YgY
#AskAnExpert: How to guide your customers' perceptions of your brand with a brand positioning statement -
http://brnw.ch/2cNd89c
pic.twitter.com/4T69cmRcXq
#Australia first developed print market in world to kill audits–advertisers say nothing Tim Burrowes, @Mumbrellanews
http://mumbrel.la/2i2UgqC
#Australia hasn't had a recession in 25 years--what the rest of the world can learn @wef
http://wef.ch/2cSW3I1
pic.twitter.com/7s77EwMWX3
#AutumnStatement – Business should communicate its 'red lines' to Government now: @MarkHenryGlover @NewingtonComms
http://bit.ly/2fIF9y4
pic.twitter.com/HBuMHTuDfD
#AWCIC16 Cardiff Model for Reducing Violence using A&E Data link Via Prof Shepherd
http://bit.ly/2gkrSPq
pic.twitter.com/HOZDrMg3tY
#B2B Marketers See Customer Experience As An Exciting Opportunity. So What Are Their #CX Priorities?
http://bit.ly/2fwXkqU
#Baidu launches medical #chatbot to help Chinese doctors diagnose patients @jjvincent for @verge
http://ow.ly/3BOR305cjvE
#BehindTheHeadlines with @LauraBQuigley VP of #Comms @CMAphysicians: | #branding #PR #Media
http://bit.ly/2fz3KcP
pic.twitter.com/G73GNQgOhv
#BehindtheHeadlines With Nathan Friedman (@nathanf99)
http://bit.ly/2ffjxNL
pic.twitter.com/L743dIaGYV
Keep it simple: remove 'Option Explicit'.
Sub M_snb()
Columns(1).Replace Chr(160), " "
Columns(1).Replace "pic.", " pic."
sn = Cells(1).CurrentRegion.Resize(, 3)
For j = 1 To UBound(sn)
sn(j, 1)= trim(Split(sn(j, 1), "http://")(0))
sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0)
If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = Filter(Split(sn(j, 1)), "pic.twitter.")(0)
Next
Cells(30, 1).Resize(UBound(sn), 3) = sn
End Sub
Thanks snb, but when using the code in comment #10, I am getting an error: "Run time error 9 Subscript out of range", and the debug points to this line:
sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0).
In that case you'll have to adapt the code.
I have tried to reply to my own question in comment #9 (How would the code need to change, so that instead of splitting the "http*" and "pic*), they were cut from the original text) with this code, adapted from http://stackoverflow.com/questions/23806741/remove-value-in-one-cell-from-string-in-another-cell-vba, but I've got the ranges wrong (and maybe other things beside). Grateful for any help.
Option Explicit
Sub M_snb()
Dim sn As Variant
Dim j As Long
Columns(1).Replace Chr(160), " "
Columns(1).Replace "pic.", " pic."
sn = Cells(1).CurrentRegion.Resize(, 3)
For j = 1 To UBound(sn)
If InStr(sn(j, 1), "http") Then sn(j, 2) = Filter(Split(sn(j, 1)), "http")(0)
If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = Filter(Split(sn(j, 1)), "pic.twitter.")(0)
Next
Cells(30, 1).Resize(UBound(sn), 3) = sn
sn = Cells(1).CurrentRegion.Resize(, 3)
For j = 1 To UBound(sn)
Dim rLongString As Range
Dim rShortString As Range
Dim I As Long
Dim LastRow As Long
Set rLongString = Range("A30:A" & LastRow)
Set rShortString = Range("B30:C" & LastRow)
For I = 1 To rLongString.Count
rLongString(I).Replace what:=rShortString(I), replacement:="", lookat:=xlPart
Next I
Next
End Sub
Sub M_snb()
Columns(1).Replace Chr(160), " "
Columns(1).Replace "pic.", " pic."
sn = Cells(1).CurrentRegion.Resize(, 3)
For j = 1 To UBound(sn)
if instr(sn(j,1),"http://") then
sn(j, 1)= trim(Split(sn(j, 1), "http://")(0))
sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0)
end if
If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = Filter(Split(sn(j, 1)), "pic.twitter.")(0)
Next
Cells(30, 1).Resize(UBound(sn), 3) = sn
End Sub
Sub M_snb()
Columns(1).Replace Chr(160), " "
Columns(1).Replace "pic.", " pic."
sn = Cells(1).CurrentRegion.Resize(, 3)
For j = 1 To UBound(sn)
if instr(sn(j,1),"http://") then
sn(j, 1)= trim(Split(sn(j, 1), "http://")(0))
sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0)
end if
If InStr(sn(j, 1), "pic.twitter.") Then sn(j, 3) = Filter(Split(sn(j, 1)), "pic.twitter.")(0)
Next
Cells(30, 1).Resize(UBound(sn), 3) = sn
End Sub
Thanks snb. I am getting Runtime error 9: Subscript out of range at this line:
sn(j, 2) = Filter(Split(sn(j, 1)), "http://")(0)
Please could you advise a fix?
Not with the data in the file you posted.
Thanks snb. My error message won't go away. I've tried to source a cure, but it seems to be one of those errors with lots of possible causes.
I am on Excel 2010. Don't know if that makes a difference.
It has only got to do with the content of your data, not with the VBA code.
Powered by vBulletin® Version 4.2.5 Copyright © 2025 vBulletin Solutions Inc. All rights reserved.