Thread: Extract author and date from web page

  1. #1

    Extract author and date from web page

    Using VBA, how do you extract author and date information from a news URL? For example, column A holds a list of URLs, and I want to extract the author name of each news article into column B (adjacent to its URL) and the publication date into column C. Sample URLs and the expected output are as follows:

    Column A
    https://www.latimes.com/california/s...pped-raped-her
    https://www.latimes.com/world-nation...ts-to-migrants
    https://www.nytimes.com/2019/04/04/w...a-burundi.html
    https://www.aljazeera.com/news/2018/...062552098.html

    Column B
    Alejandra Reyes-Velarde
    Patrick J. McDonnell
    Milan Schreuer
    None

    Column C
    09/16/2019
    09/02/2019
    04/04/2019
    11/21/2018

    Any help would be appreciated

  2. #2

    Powershell

    Hi, try this in PowerShell:

    $userAgent = "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
    $url = "https://www.latimes.com/california/story/2019-09-16/traffic-stop-passenger-tells-deputies-driver-kidnapped-raped-her"
    # iwr is the built-in alias for Invoke-WebRequest
    $Ret = (iwr $url -UserAgent $userAgent)
    $Ret.statuscode
    # Show the 200 characters starting at the first occurrence of 'email'
    $P1 = $Ret.content.indexof('email')
    $Author = $Ret.Content.Substring($P1,200)
    $Author

  3. #3
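    A little improvement:

    $userAgent = "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
    $url = "https://www.latimes.com/california/story/2019-09-16/traffic-stop-passenger-tells-deputies-driver-kidnapped-raped-her"
    # iwr is the built-in alias for Invoke-WebRequest
    $Ret = (iwr $url -UserAgent $userAgent)
    $Ret.statuscode
    # Find the start of the "author" array and the first closing brace after it
    $P1 = $Ret.content.indexof('"author":[')
    $P2 = $Ret.content.indexof('}',$P1)
    # Skip the 10 characters of '"author":[' and take everything through that '}'
    $Author = $Ret.Content.Substring($P1+10,$P2-$P1-9)
    $Author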
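
    The same idea can be wrapped in a small function that returns just the name and a date formatted like column C. This is only a sketch under assumptions: Get-ArticleInfo is a made-up helper name, the "datePublished" key is a guess at how the page stores its date (only the "author":[ marker comes from the code above), and the sample string is invented so the function can be tried without a network call.

```powershell
# Hypothetical helper (not from the thread): pulls the first author name and
# the publish date out of page HTML, assuming JSON metadata is embedded in it.
function Get-ArticleInfo {
    param([Parameter(Mandatory)][string]$Html)

    $author = 'None'
    $date   = 'None'

    # Assumption: the author appears as "author":[{ ... "name":"..." ... }]
    if ($Html -match '"author"\s*:\s*\[\s*\{[^}]*?"name"\s*:\s*"([^"]+)"') {
        $author = $Matches[1]
    }

    # Assumption: the date appears as "datePublished":"<ISO 8601 timestamp>"
    if ($Html -match '"datePublished"\s*:\s*"([^"]+)"') {
        $parsed = [DateTimeOffset]::MinValue
        $ok = [DateTimeOffset]::TryParse(
            $Matches[1],
            [cultureinfo]::InvariantCulture,
            [System.Globalization.DateTimeStyles]::None,
            [ref]$parsed)
        # Keep the page's own UTC offset so the calendar date does not shift
        if ($ok) {
            $date = $parsed.ToString('MM/dd/yyyy', [cultureinfo]::InvariantCulture)
        }
    }

    # Falls back to 'None' for anything that was not found
    [pscustomobject]@{ Author = $author; Date = $date }
}

# Invented sample input, not real page content
$sample = '{"author":[{"@type":"Person","name":"Jane Doe"}],"datePublished":"2019-09-16T10:30:00-07:00"}'
Get-ArticleInfo -Html $sample
```

    On a live page you would pass $Ret.Content instead of $sample. Sites that lay out their metadata differently will simply come back as None, so each site may need its own pattern.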

  4. #4
    I'm not very familiar with this language. Can anyone provide something that is simple?
