What Is Web Scraping and How Could It Put Passwords and Other Personal Data at Risk?

Information is extremely valuable nowadays. While many organizations gather it legally, there are always cybercriminals who use the same methods to obtain sensitive information that is supposed to be out of their reach. One of the most popular ways to collect data from the Internet is called web scraping. If the term does not sound familiar, we encourage you to read the rest of this blog post to find out what it is and what the security risks of web scraping are.

What is web scraping?

Web scraping can be confused with web crawling, but they are not the same thing. According to Techopedia.com, web scraping is a term used to describe the variety of methods used for collecting information from the Internet. The site also says the data collection is usually done with software that gathers various pieces of information from different websites by simulating user browsing behavior. In other words, it makes it possible to mine specific pieces of data, such as weather reports or market prices, from various websites. However, due to the potential security risks of web scraping, it is considered a controversial method.

Web crawling, on the other hand, means automatically downloading all of the data on the targeted web pages. During this process, applications called crawlers locate and follow all the hyperlinks the visited pages contain. This method is often used for creating an easily searchable index or database, which could, for example, be used to build a search engine.
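To make the difference more concrete, here is a minimal Python sketch that uses only the standard library's html.parser module. It is an illustration under stated assumptions, not a description of how any particular tool works: the sample page, the "price" class, and the "data-item" attribute are all made up for this example. The scraping part pulls out specific values (prices), while the crawling part only collects the hyperlinks a crawler would visit next.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this over HTTP.
SAMPLE_PAGE = """
<html><body>
  <h1>Market prices</h1>
  <ul>
    <li class="price" data-item="coffee">4.20</li>
    <li class="price" data-item="tea">3.10</li>
  </ul>
  <a href="/weather">Weather reports</a>
  <a href="/archive/2018">Archive</a>
</body></html>
"""


class PageParser(HTMLParser):
    """Collects price entries (scraping) and hyperlinks (crawling)."""

    def __init__(self):
        super().__init__()
        self.prices = {}           # scraped data: item name -> price
        self.links = []            # hyperlinks a crawler would follow next
        self._current_item = None  # item whose price text we are inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "price":
            # Assumed markup: the item name lives in a data-item attribute.
            self._current_item = attrs.get("data-item")
        elif tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_data(self, data):
        # Only text inside a price element is treated as a price.
        if self._current_item is not None and data.strip():
            self.prices[self._current_item] = float(data.strip())

    def handle_endtag(self, tag):
        if tag == "li":
            self._current_item = None


parser = PageParser()
parser.feed(SAMPLE_PAGE)
print(parser.prices)  # the specific data a scraper is after
print(parser.links)   # the links a crawler would queue up
```

A real scraper would repeat the extraction step across many pages, and a real crawler would download each collected link and parse it in turn.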

How and why do companies employ web scraping?

Web scraping might be used in marketing, finance, and many other industries that need to analyze data from different web pages, such as information related to the industry or data associated with the company's clients, competitors, and so on. In most cases, the information the organization seeks is gathered with the help of specific applications called web scrapers. Collecting data this way is more efficient. For instance, if you visited the websites containing the information you are interested in and copied and pasted the needed data manually, it would take a lot of time. Not to mention, some of the information could be accidentally misplaced, making the collected data inaccurate. Thus, it is easier to employ a dedicated web scraper that does all the work for its user. There are free, open-source, and commercial web scrapers. For example, one of the most frequently recommended free tools is probably Octoparse.

Could hackers use web scraping to obtain sensitive data?

Unfortunately, as handy as these data-collecting methods might appear to be, computer security specialists say web scraping carries security risks. Cybercriminals might abuse this method for malicious purposes. For example, it could be employed to dramatically increase the load on the targeted web page. A web scraper may send many more requests than an actual user typically would, and as a result, cybercriminals might be able to carry out a denial-of-service attack. Besides, hackers might design web scrapers that bypass the targeted websites' security measures to obtain sensitive data that is supposed to be kept secret.
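The difference in request volume between a scraper and an ordinary visitor is also something a website could use to limit the load. The Python sketch below shows one common approach, a sliding-window rate limiter. It is only an illustration: the class name, the limit of 5 requests per 10 seconds, and the example client addresses are assumptions chosen for this post, and timestamps are passed in by hand so the behavior is easy to follow.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> recent request times

    def allow(self, client_id, now):
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many recent requests: reject or slow down
        hits.append(now)
        return True


# Hypothetical limit: 5 requests per 10 seconds per client.
limiter = SlidingWindowLimiter(max_requests=5, window_seconds=10.0)

# A scraper-like client sends 8 requests in under one second.
burst = [limiter.allow("203.0.113.7", now=0.1 * i) for i in range(8)]

# A human-like client sends one request every 3 seconds.
steady = [limiter.allow("198.51.100.2", now=3.0 * i) for i in range(8)]

print(burst)
print(steady)
```

In this sketch, the burst client is cut off after its fifth request, while the slower client is never blocked. A limit like this does not stop a patient scraper that spreads its requests out, so it addresses the load problem rather than the data exposure problem.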

One of the most recent examples of how web scraping can compromise user data is the Facebook data scandal related to Cambridge Analytica. It started when 270,000 users downloaded an application named "This Is Your Digital Life." By using Facebook's login feature, these users revealed not only their own private information but also data about their friends. As a result, the application's web scrapers managed to gather information from around 87 million social media users. Later on, this collection of data was shared with Cambridge Analytica, even though doing so violated Facebook's terms of service.

Sadly, this was not the only scandal, as the company announced unpleasant news again in April 2018. This time, Facebook's team learned that cybercriminals might have obtained data from up to two billion users via web scraping. The problem was that hackers used the social network's user search feature to identify individuals and gather data from their profiles, for example, a user's full name, profile photograph, workplace, date of birth, and other information they chose to make public or were requested to submit by Facebook. Needless to say, in the hands of scammers such data could be used for phishing attacks and other malicious activities. Therefore, the company had no other choice but to shut down the feature that allowed searching for users by email address or telephone number.

All in all, it seems web scraping might be an easy way to steal private information from web pages that do not apply the necessary safety measures. Hopefully, the examples discussed above will encourage more companies handling enormous amounts of sensitive data to take the security risks of web scraping more seriously and look for ways to prevent hackers from stealing their users' or clients' information. Of course, there are things all of us can do to ensure at least some of our private information does not end up in the hands of scammers, identity thieves, and other cybercriminals.

Naturally, the smartest thing to do when creating a new profile or account is to provide only the minimum required information. What's more, users should always consider whether the data they plan on making public could be used to harm them in any way, especially if it gets combined with pieces of information already available to everyone. This goes for both the information you put on your profile and the data you agree to share with other applications and websites.

By Foley

August 30, 2018

Password Security