
Make a web page screenshot service

Posted seven months ago

I'll walk you step by step through making a service that takes screenshots of web pages and returns them as images.

First, let's assume you have google-chrome or chromium-browser installed; both should work the same way. These browsers have command-line options that let you capture a screenshot in headless mode.

chromium-browser --headless \
    --disable-gpu \
    --no-sandbox \
    --screenshot=out.png \
    --window-size=1280,900 \
    --virtual-time-budget=1000 \
    --user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" \
    https://cbc.ca/news

Explanation of parameters

--headless	Run the browser in headless mode, without a visible window.
--disable-gpu	Using the GPU in headless mode sometimes causes problems, so we disable it.
--no-sandbox	The sandbox often causes problems with headless execution as well.
--screenshot=<filename>	Save the screenshot to this file.
--window-size=<width>,<height>	Set the width and height of the browser window, in pixels.
--virtual-time-budget=<ms>	Let the page run for this many milliseconds of virtual time before taking the screenshot, to give frontend frameworks time to execute.
--user-agent="<user agent>"	Set a custom user agent, since many sites will not work with the default one used in headless mode.
url	The URL of the site to screenshot.
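If you take screenshots often, it can help to wrap the command above in a small shell function. This is a minimal sketch assuming bash; the function name `take_screenshot` and the `CHROME_BIN` override variable are hypothetical names made up for this example, not part of Chrome or of the repository below. It passes exactly the flags from the table and falls back from chromium-browser to google-chrome.

```bash
#!/usr/bin/env bash
# take_screenshot URL [OUTFILE] [WIDTH,HEIGHT] [BUDGET_MS]
# Hypothetical helper: wraps the chromium-browser command shown above.
# CHROME_BIN is a made-up override variable, not a Chrome option.
take_screenshot() {
  local url="${1:-}"
  local out="${2:-out.png}"
  local size="${3:-1280,900}"
  local budget="${4:-1000}"
  local ua="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36"
  local bin="${CHROME_BIN:-}"

  if [ -z "$url" ]; then
    echo "usage: take_screenshot URL [OUTFILE] [WIDTH,HEIGHT] [BUDGET_MS]" >&2
    return 2
  fi

  # Use whichever browser is installed; both accept the same flags.
  if [ -z "$bin" ]; then
    if command -v chromium-browser >/dev/null 2>&1; then
      bin=chromium-browser
    elif command -v google-chrome >/dev/null 2>&1; then
      bin=google-chrome
    else
      echo "take_screenshot: neither chromium-browser nor google-chrome found" >&2
      return 1
    fi
  fi

  # Same flags as the command above, with file, size and wait time filled in.
  "$bin" --headless \
    --disable-gpu \
    --no-sandbox \
    --screenshot="$out" \
    --window-size="$size" \
    --virtual-time-budget="$budget" \
    --user-agent="$ua" \
    "$url"
}
```

For example, `take_screenshot https://cbc.ca/news cbc.png 1280,2000` captures a taller view of the same page. Running it as `CHROME_BIN=echo take_screenshot ...` prints the assembled command instead of launching a browser, which is a cheap way to check the flags.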

Making a service

Let's package it as a Docker image that we can use. I won't go into the details, since I simply asked an AI to write it for me. Clone the repository it made:

git clone https://github.com/smhanov/screenshot-service
cd screenshot-service
make build
make run

This builds and runs your screenshot service. Then you can get a screenshot of any web site by going to this URL:

http://localhost:5000/screenshot?url=https://cbc.ca/news

It also accepts other query parameters for the various browser options.
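If the page you want to capture has its own query string, it should be URL-encoded before being passed as `url=`; otherwise a `&` in the target URL would be read as a separate parameter of the screenshot service. The sketch below assumes bash, curl, the service listening on localhost:5000 as above, and that the service decodes its query string the way web frameworks normally do. `urlencode` and `fetch_screenshot` are hypothetical helper names, and only the `url` parameter shown above is used.

```bash
#!/usr/bin/env bash
# urlencode STRING
# Hypothetical helper: percent-encodes everything except the RFC 3986
# unreserved characters (A-Z a-z 0-9 - . _ ~).
urlencode() {
  local LC_ALL=C   # index the string by bytes rather than characters
  local s="${1:-}" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      # "'$c" makes printf use the character's numeric code.
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# fetch_screenshot TARGET_URL [OUTFILE]
# Hypothetical helper: asks the service started by `make run`
# (assumed to listen on localhost:5000) for a screenshot of TARGET_URL.
fetch_screenshot() {
  local target="${1:-}"
  local out="${2:-screenshot.png}"
  if [ -z "$target" ]; then
    echo "usage: fetch_screenshot URL [OUTFILE]" >&2
    return 2
  fi
  curl --fail --silent --show-error \
    --output "$out" \
    "http://localhost:5000/screenshot?url=$(urlencode "$target")"
}
```

For example, `fetch_screenshot 'https://example.com/search?q=a&page=2' result.png` keeps `page=2` inside the target URL. If you'd rather not encode by hand, curl can do it: `curl -G --data-urlencode "url=$target" http://localhost:5000/screenshot -o out.png` sends the same request.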

Steve Hanov makes a living working on Rhymebrain.com, rapt.ink, www.websequencediagrams.com, and Zwibbler.com. He lives in Waterloo, Canada.

