Londonchiropracter.com

This domain is available to be leased

Menu
Menu

Why web scraping is vital to democracy

Posted on December 28, 2020 by admin

The fruits of web scraping — using code to harvest data and information from websites — are all around us.

People build scrapers that can find every Applebee’s on the planet or collect congressional legislation and votes or track fancy watches for sale on fan websites. Businesses use scrapers to manage their online retail inventory and monitor competitors’ prices. Lots of well-known sites use scrapers to do things like track airline ticket prices and job listings. Google is essentially a giant, crawling web scraper.

Scrapers are also the tools of watchdogs and journalists, which is why The Markup filed an amicus brief in a case before the U.S. Supreme Court this week that threatens to make scraping illegal.

The case itself—Van Buren v. United States—is not about scraping but rather a legal question regarding the prosecution of a Georgia police officer, Nathan Van Buren, who was bribed to look up confidential information in a law enforcement database. Van Buren was prosecuted under the Computer Fraud and Abuse Act (CFAA), which prohibits unauthorized access to a computer network such as computer hacking, where someone breaks into a system to steal information (or, as dramatized in the 1980s classic movie “WarGames,” potentially start World War III).

In Van Buren’s case, since he was allowed to access the database for work, the question is whether the court will broadly define his troubling activities as “exceeding authorized access” to extract data, which is what would make it a crime under the CFAA. And it’s that definition that could affect journalists.

Or, as Justice Neil Gorsuch put it during Monday’s oral arguments, lead in the direction of “perhaps making a federal criminal of us all.”

Investigative journalists and other watchdogs often use scrapers to illuminate issues big and small, from tracking the influence of lobbyists in Peru by harvesting the digital visitor logs for government buildings to monitoring and collecting political ads on Facebook. In both of those instances, the pages and data scraped are publicly available on the internet—no hacking necessary—but sites involved could easily change the fine print on their terms of service to label the aggregation of that information “unauthorized.” And the U.S. Supreme Court, depending on how it rules, could decide that violating those terms of service is a crime under the CFAA.

“A statute that allows powerful forces like the government or wealthy corporate actors to unilaterally criminalize newsgathering activities by blocking these efforts through the terms of service for their websites would violate the First Amendment,” The Markup wrote in our brief.

What sort of work is at risk? Here’s a roundup of some recent journalism made possible by web scraping:

  • The COVID tracking project, from The Atlantic, collects and aggregates data from around the country on a daily basis, serving as a means of monitoring where testing is happening, where the pandemic is growing, and the racial disparities in who’s contracting and dying from the virus.
  • This project, from Reveal, scraped extremist Facebook groups and compared their membership rolls to those of law enforcement groups on Facebook—and found a lot of overlap.
  • The Markup’s recent investigation into Google’s search results found that it consistently favors its own products, leaving some websites from which the web giant itself scrapes information struggling for visitors and, therefore, ad revenue. The U.S. Department of Justice cited the issue in an antitrust lawsuit against the company.
  • In Copy, Paste, Legislate, USA Today found a pattern of cookie-cutter laws, pushed by special interest groups, circulating in legislatures around the country.

This article was originally published on The Markup and was republished under the Creative Commons Attribution-NonCommercial-NoDerivatives license.

Source

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Recent Posts

  • Trump says Anthropic Pentagon deal is ‘possible’, weeks after blacklisting the company as a national security risk
  • Samsung and IKEA just made the $6 smart home real, and your TV is already the hub
  • OpenAI recruits Cognizant and CGI to take Codex into enterprise software shops worldwide
  • Lovable left thousands of projects exposed for 48 days, and the vibe coding security crisis is only getting worse
  • Humble emerges from stealth with $24M and a cableless autonomous electric truck built to go dock-to-dock

Recent Comments

    Archives

    • April 2026
    • March 2026
    • February 2026
    • January 2026
    • December 2025
    • September 2025
    • August 2025
    • July 2025
    • June 2025
    • May 2025
    • April 2025
    • March 2025
    • February 2025
    • January 2025
    • December 2024
    • November 2024
    • October 2024
    • September 2024
    • August 2024
    • July 2024
    • June 2024
    • May 2024
    • April 2024
    • March 2024
    • February 2024
    • January 2024
    • December 2023
    • November 2023
    • October 2023
    • September 2023
    • August 2023
    • July 2023
    • June 2023
    • May 2023
    • April 2023
    • March 2023
    • February 2023
    • January 2023
    • December 2022
    • November 2022
    • October 2022
    • September 2022
    • August 2022
    • July 2022
    • June 2022
    • May 2022
    • April 2022
    • March 2022
    • February 2022
    • January 2022
    • December 2021
    • November 2021
    • October 2021
    • September 2021
    • August 2021
    • July 2021
    • June 2021
    • May 2021
    • April 2021
    • March 2021
    • February 2021
    • January 2021
    • December 2020
    • November 2020
    • October 2020

    Categories

    • Uncategorized

    Meta

    • Log in
    • Entries feed
    • Comments feed
    • WordPress.org
    ©2026 Londonchiropracter.com | Design: Newspaperly WordPress Theme