HWZ Forums


Python and web scraping

23-05-2017, 10:06 PM   #1
Senior Member
 
Join Date: Jul 2009
Posts: 761

Why do most people choose Python for web scraping?

By comparison, web scraping with Java, C#, C++ or PHP is practically unheard of, or at least much rarer.

Perl has some web scraping tools, but from my casual observation it is still not as popular as Python.
~Dragonite~ likes this.
Posted by u0206397
23-05-2017, 10:44 PM   #2
Supremacy Member
 
Join Date: Apr 2010
Posts: 9,699
Java and C/C++ are tedious for web scraping because every change to the target site means editing and recompiling, which makes for a slow feedback loop. Scripting languages allow a much more responsive edit-and-run cycle.

In my opinion, Perl makes one of the best scraping tools because of its powerful regex support, its highly dynamic features and the large collection of CPAN modules. Unfortunately it is not one of the easiest languages to master unless you put real effort into learning it.
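For comparison, Python's built-in `re` module uses largely Perl-style regex syntax, so the same pattern-based extraction carries over. A minimal sketch, assuming a made-up HTML fragment and a hypothetical helper `extract_prices`; regexes like this break easily when the markup changes, which is why an HTML parser is usually preferred for anything non-trivial:

```python
import re

# Hypothetical HTML fragment standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li class="item">Widget <span class="price">$19.99</span></li>
  <li class="item">Gadget <span class="price">$5.00</span></li>
</ul>
"""

# Perl-style pattern: capture the digits after the (escaped) dollar sign.
PRICE_RE = re.compile(r'<span class="price">\$(\d+\.\d{2})</span>')


def extract_prices(html_text):
    """Return every price found in the fragment, in document order."""
    return [float(m) for m in PRICE_RE.findall(html_text)]


print(extract_prices(SAMPLE_HTML))  # [19.99, 5.0]
```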

Python is gaining ground thanks to its ease of use and its recent momentum in the community, so it gets more exposure. I suspect its popularity comes not from its web scraping capabilities as such, but from the availability of analysis libraries. Data scraped from the web often feeds into analysis, such as sentiment analysis for a specific use case. Python has the advantage there, so why not handle both jobs with the same tool?
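As a rough illustration of scraping and analysing in one tool, the sketch below pulls paragraph text out of HTML with the standard library's `html.parser` and scores it with a toy word list. `ParagraphCollector`, `sentiment` and the `POSITIVE`/`NEGATIVE` sets are hypothetical placeholders; real sentiment analysis would use a dedicated library or model.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text content of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data


# Toy lexicon (hypothetical); a real analysis would use a proper library.
POSITIVE = {"good", "great", "fast", "reliable"}
NEGATIVE = {"bad", "slow", "painful", "broken"}


def sentiment(text):
    """Positive word count minus negative word count."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


page = "<div><p>Great library, fast and reliable.</p><p>Docs are bad and slow to load.</p></div>"
collector = ParagraphCollector()
collector.feed(page)
scores = [sentiment(p) for p in collector.paragraphs]
print(scores)  # [3, -2]
```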

If you are really into web scraping, I would say node.js is a very good fit, because it interprets and executes scripts immediately. There are also a couple of really good headless browsers, such as PhantomJS and SlimerJS, which load a site dynamically (including content rendered by JavaScript) so you can then extract information from the rendered page.
ashhong and ~Dragonite~ like this.

Last edited by davidktw; 24-05-2017 at 06:03 AM.
Posted by davidktw
11-06-2017, 06:56 PM   #3
Moderator
 
Join Date: Jul 2002
Posts: 3,919
For web scraping, a dynamically typed language makes handling JSON requests and responses A LOT easier, which rules out Go, Java, C, C++ and C#.
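To illustrate the point about dynamic typing, a minimal sketch: `json.loads` turns a response body into plain dicts and lists that can be walked directly, with no classes declared up front. The payload and its field names are made up for the example; in practice the string would come from an HTTP response.

```python
import json

# Hypothetical API response body; in practice this comes from an HTTP call.
RAW = """
{"results": [
  {"name": "Widget", "price": 19.99, "tags": ["tools"]},
  {"name": "Gadget", "price": 5.0}
]}
"""

data = json.loads(RAW)  # nested dicts and lists, no schema classes needed

# Walk the structure directly; .get() supplies a default for optional fields.
for item in data["results"]:
    print(item["name"], item["price"], item.get("tags", []))
# Widget 19.99 ['tools']
# Gadget 5.0 []
```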

Error handling in node.js is a complete joke, and so is the callback hell of its async model. PHP was designed for embedding web apps in HTML; sure, you can scrape with it by pulling in libcurl, but it will still be a painful experience. Perl works, but these days Python does everything Perl can do in a simpler manner, and everyone is gravitating towards it.

Scraping is just a joy in Python. The requests library is easy to use; there is BeautifulSoup for parsing HTML (forms included); regex is available for quick pattern matching when you get HTML back instead of JSON; and for advanced cases where you need to simulate user input there is Selenium. To run many requests concurrently, use gevent with monkey-patching, which makes the async handling transparent without going through node.js callback hell.
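BeautifulSoup is a third-party package, so the sketch below uses only the standard library's `html.parser` to show the same idea: collecting a form's `<input>` fields (including a hidden token) so they can be filled in and submitted. `FormFieldParser`, the sample form and its field names are hypothetical; with BeautifulSoup the collection step would typically be a `find_all("input")` call.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldParser(HTMLParser):
    """Collect the action URL and named <input> fields of a <form>."""

    def __init__(self):
        super().__init__()
        self.fields = {}      # input name -> default value
        self.action = None    # the form's action attribute, if any
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


# Hypothetical login form with a hidden CSRF token.
FORM_HTML = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
  <input type="submit" value="Log in">
</form>
"""

form = FormFieldParser()
form.feed(FORM_HTML)
form.fields["username"] = "alice"      # fill in the visible field
body = urlencode(form.fields)          # POST body, hidden token kept
print(form.action, body)  # /login csrf_token=abc123&username=alice
```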
~Dragonite~ likes this.
Posted by tangent314
19-06-2018, 07:38 PM   #4
Junior Member
 
Join Date: Feb 2017
Posts: 73
It seems Python is much preferred because it is relatively easy to set up threads, and it has good libraries that make web scraping fast and reliable.
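A minimal sketch of the thread-based approach, assuming a stand-in `fake_fetch` function (hypothetical) in place of a real blocking HTTP call such as `urllib.request.urlopen` or `requests.get`, so that it runs offline. Because the work is I/O-bound, the threads spend most of their time waiting, and Python releases the GIL while they wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    """Hypothetical stand-in for a blocking HTTP GET."""
    time.sleep(0.2)  # simulate network latency
    return f"<html>{url}</html>"


def fetch_all(urls, fetch=fake_fetch, max_workers=8):
    """Fetch every URL on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


urls = [f"https://example.com/page/{i}" for i in range(8)]
start = time.perf_counter()
pages = fetch_all(urls)
elapsed = time.perf_counter() - start
print(len(pages), f"{elapsed:.2f}s")
```

With eight simulated requests of 0.2 s each, the threaded version should finish in roughly the time of one request rather than the sum of all eight.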
Posted by albertchan659
20-08-2018, 10:23 PM   #5
Junior Member
 
Join Date: Aug 2018
Posts: 18
Beautiful Soup ftw!! haha
Posted by Alibarbar
15-02-2019, 10:17 PM   #6
Junior Member
 
Join Date: Jan 2019
Posts: 16
Simply because Scrapy is prevalent in the web scraping community. There are not many open-source scraping libraries in other languages that are as robust and easy to use.
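To show what a framework like Scrapy takes off your hands, here is a bare-bones breadth-first crawler using only the standard library. It is not Scrapy's API: `crawl`, `LinkParser` and the in-memory `SITE` dict are hypothetical, and it skips everything Scrapy layers on top (concurrent scheduling, retries, throttling, item pipelines). It does show the core loop: fetch a page, extract links, resolve them to absolute URLs and skip any already seen.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "site" so the sketch runs offline.
SITE = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b">B</a> <h1>Page A</h1>',
    "https://example.com/b": '<a href="/">Home</a> <h1>Page B</h1>',
}


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML or None."""
    seen = {start_url}           # duplicate filter
    queue = deque([start_url])   # crawl frontier
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        page = fetch(url)
        if page is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(page)
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


print(crawl("https://example.com/", SITE.get))
# ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```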
Posted by imgroot