Perl crawler on a JavaScript website

peterchan75

Supremacy Member
Joined
Apr 26, 2003
Messages
6,719
Reaction score
529
Hi All,

My current setup has Perl launch Excel and run a macro that uses IE to download a text file, which Perl then parses. I would like to do everything in Perl.
1. Use WWW::Mechanize::PhantomJS. Is this the correct path? The IE and Firefox Mechanize modules seem outdated.
2. I am having trouble finding the download button.
This is the CSS selector:
widget-research-and-reports-download.website-template-widget:nth-child(4) > div:nth-child(2) > button:nth-child(3)
How do I put the CSS selector into the statement below?
my @button = $mech->selector();
$mech->click($button[0]);
I tried various CSS selectors, but the error message said the element is not visible on the page.
3. Once I get past clicking the download button, how do I handle saving the file?

Thanks in advance.

Update:
No go!
Using PhantomJS 2.1.1.
I couldn't take a screen capture of the website.
The screen capture says the current browser is not supported.
 

davidktw

Arch-Supremacy Member
Joined
Apr 15, 2010
Messages
13,547
Reaction score
1,301
I suggest you try the Chrome approach: https://metacpan.org/pod/WWW::Mechanize::Chrome

Driving a more popular browser in headless mode will give you better compatibility when parsing and rendering web pages.

As for your other questions, I will wait until I have access to a full-fledged system before I can help further, unless you or someone else has found a solution by then :)

A "not visible" error may mean you need to scroll the page so that the element is shown in the viewport.
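
In the meantime, here is a rough, untested sketch of how the Chrome approach could cover all three questions: find the button by its CSS selector, scroll it into view, click it, and wait for the file to land in a download folder. It assumes WWW::Mechanize::Chrome, Log::Log4perl and a local Chrome are installed. The helper names (wait_for_download, download_report), the "downloads" folder and the timeouts are made up for illustration; the selector is the one from the question. Whether the click really triggers a plain file download on that site is also an assumption.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;

# CSS selector copied from the question.
my $SELECTOR = 'widget-research-and-reports-download.website-template-widget:nth-child(4)'
             . ' > div:nth-child(2) > button:nth-child(3)';

# Hypothetical helper: poll $dir until a finished file shows up or $timeout
# seconds pass. Chrome keeps partial downloads under a .crdownload suffix,
# so those are skipped. Assumes $dir starts out empty.
sub wait_for_download {
    my ($dir, $timeout) = @_;
    my $deadline = time + $timeout;
    while (1) {
        opendir my $dh, $dir or die "Cannot open $dir: $!";
        my @done = sort grep {
            -f File::Spec->catfile($dir, $_) && !/\.crdownload\z/
        } readdir $dh;
        closedir $dh;
        return File::Spec->catfile($dir, $done[0]) if @done;
        return if time >= $deadline;
        sleep 1;
    }
}

# Hypothetical helper: open $url in headless Chrome, click the download
# button and return the path of the saved file (undef if nothing arrives).
sub download_report {
    my ($url, $dir) = @_;

    # Loaded lazily so the helper above works without these modules.
    require Log::Log4perl;
    require WWW::Mechanize::Chrome;
    Log::Log4perl->easy_init(Log::Log4perl::Level::to_priority('ERROR'));

    my $mech = WWW::Mechanize::Chrome->new(headless => 1);
    $mech->set_download_directory($dir);    # expects an absolute path
    $mech->get($url);

    # The widget may be rendered by JavaScript after page load, so retry
    # for up to 30 seconds (arbitrary limit).
    my @buttons;
    for (1 .. 30) {
        @buttons = grep { defined } $mech->selector($SELECTOR, any => 1);
        last if @buttons;
        sleep 1;
    }
    die "Download button not found\n" unless @buttons;

    # Scroll the button into the viewport before clicking, in case the
    # "not visible" error comes from it being off-screen.
    $mech->eval(sprintf q{document.querySelector('%s').scrollIntoView(); true},
                $SELECTOR);

    $mech->click({ dom => $buttons[0] });
    return wait_for_download($dir, 60);
}

# Runs only when a URL is passed on the command line.
if (@ARGV) {
    my $dir = File::Spec->rel2abs('downloads');
    mkdir $dir unless -d $dir;
    my $file = download_report($ARGV[0], $dir);
    print defined $file ? "Saved $file\n" : "No download appeared\n";
}
```

Run it with the page address as the only argument. If the button never turns up, dumping $mech->content to a file will show whether the widget was rendered at all.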
 

peterchan75

Hi davidktw,
Thanks for your time.
The website that I am planning to automate is a hard one.
I am looking at Selenium::Remote::Driver, and this module seems recent. It has modules for all browsers. I am using ActiveState Perl, and ppm can only install some old drivers. It is hard to get newer CPAN modules into ActiveState.

What are your thoughts on Strawberry Perl? Any issues with module installation? Thanks.
 

davidktw

Hi peter, I don't work extensively with Perl installations on Windows, so I can't tell you which distro has better package management.

Given that one can now run Linux subsystems within Windows 10, why don't you give that a try? Perl is more at home on Unix. That is where it was born :)

I think you can adopt a more flexible stance by using Node and Perl together. A JavaScript environment may be a more natural place to host both the backend and the web drivers for the browser's environment.
 