whatsinlaion.com Help

Overview

This site is built on LAION’s improved_aesthetics_4.5plus dataset, indexed by domain name, the total number of urls per domain name, and the aesthetic score of each image url.

Totals View

A table view of subdomains and the number of image urls each one contributes to subsets of LAION5B. Each url was parsed for its root address, and the counts were pre-totaled.
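As a rough illustration of that pre-totaling step, here is a minimal Python sketch. It assumes the "root address" is simply the host name of each url; the function names and the sample urls are hypothetical and are not taken from the site's actual code or data.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the host name of a url, or '' if the url has none."""
    return urlsplit(url).hostname or ""


def total_by_host(urls):
    """Count how many image urls belong to each host name."""
    return Counter(h for h in map(host_of, urls) if h)


# Hypothetical sample urls, for illustration only.
sample_urls = [
    "https://i.pinimg.com/originals/a.jpg",
    "https://i.pinimg.com/736x/b.jpg",
    "http://i0.wp.com/example/c.png",
]

totals = total_by_host(sample_urls)

# List hosts from the largest total to the smallest.
for host, count in totals.most_common():
    print(host, count)
```

Totaling once ahead of time means the table can be read straight from stored counts instead of re-scanning every url on each page view.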

Clicking a domain in the left-hand column opens the detailed view for that domain. Clicking the number in the right-hand column opens that domain’s image gallery.

Domain Search

To avoid the heavy processing and indexing that substring matching requires, search terms are matched only against the beginning of each word in the url.

For example, the url s3-ap-southeast-2.amazonaws.com is broken into:

s3   ap   southeast   2   amazonaws   com

This address would be found by the queries: s3 sou amazon

But it would not be found by the queries: east aws because in both cases letters precede the search term within the word (southeast, amazonaws).

(I will explore different indexing options in the future, but that is the working functionality at the moment.)
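The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the site's actual index: it assumes words are separated by any non-alphanumeric character and that each query is a single term. The function names are hypothetical.

```python
import re


def split_words(host: str) -> list[str]:
    """Break a host name into words at every non-alphanumeric character."""
    return [w for w in re.split(r"[^a-z0-9]+", host.lower()) if w]


def matches(host: str, term: str) -> bool:
    """True if any word in the host name begins with the search term."""
    term = term.lower()
    return any(word.startswith(term) for word in split_words(host))


host = "s3-ap-southeast-2.amazonaws.com"
print(split_words(host))

# The first three terms start a word; the last two only appear mid-word.
for term in ["s3", "sou", "amazon", "east", "aws"]:
    print(term, matches(host, term))
```

Because only the start of each word is compared, a lookup like this can be served by an ordinary prefix index, which is much cheaper than scanning every url for an arbitrary substring.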

Domains are presented in descending order of their total number of image urls in the improved_aesthetics_4.5plus dataset. Each query returns at most the first 40,000 results, to limit processing time. (A search for ‘com’, for example, matches over 5 million results.)

Please be patient with longer running queries and be as specific as possible.

Finding domain information

Many platforms use CDNs and alias server names that don’t correlate to their main address. Pinterest, for example, uses the domain ‘i.pinimg.com’ for hosting image content.

If you’re looking to see where a specific image is located (to tell if a particular website or service was scraped, for instance), you may need to first discover the host’s address.

Method 1 - Right Click -> Open Image in New Tab

This will open the image in a new browser tab where you may see the web address.

The section of the url that follows ‘http://’ or ‘https://’, up to the next ‘/’, is the domain where the image is stored.

Method 2 - Developer Tools

Some websites disable right-clicking on images, but the address is still available via the browser’s developer tools. The following video demonstrates how to find the image url if right-click access is disabled.

https://youtu.be/weVF9vjAFKc?si=aGfWabqPVe8i8NPn&t=53

Gallery View

The gallery view presents images in order of highest to lowest aesthetic score, filtered by domain (for example, i0.wp.com).

Raising or lowering the ‘Low’ and ‘High’ values and then pressing the ‘Update’ button will limit the range of aesthetic scores visible on the page.

Each page will attempt to load 30 images at a time. Scrolling down will load more images, up to a maximum of 180, after which a ‘Load More’ button becomes visible.
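To make the filtering and loading behaviour concrete, here is a small Python sketch. It is an assumption-laden illustration rather than the site's code: the Image class, the function names, the inclusive treatment of the Low/High bounds, and the sample data are all hypothetical. Only the numbers 30 and 180 come from the description above.

```python
from dataclasses import dataclass

PAGE_SIZE = 30       # images requested per load
AUTO_LOAD_MAX = 180  # images shown before 'Load More' appears


@dataclass
class Image:
    url: str
    aesthetic_score: float


def gallery(images, low, high):
    """Images whose score lies in [low, high], highest score first."""
    kept = [im for im in images if low <= im.aesthetic_score <= high]
    return sorted(kept, key=lambda im: im.aesthetic_score, reverse=True)


def batches(ranked, page_size=PAGE_SIZE, limit=AUTO_LOAD_MAX):
    """Yield successive batches of page_size until limit images are shown."""
    for start in range(0, min(len(ranked), limit), page_size):
        yield ranked[start:min(start + page_size, limit)]


# Hypothetical data: 200 images with scores from 4.50 up to 6.49.
images = [Image(f"https://example.com/{i}.jpg", 4.5 + i / 100) for i in range(200)]

ranked = gallery(images, low=4.5, high=6.5)    # full range
narrowed = gallery(images, low=5.0, high=6.0)  # after pressing 'Update'
loaded = list(batches(ranked))

print(len(ranked), len(narrowed), [len(b) for b in loaded])
```

Narrowing the Low/High range is what lets a different slice of a large domain fit inside the first 180 images.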

Remember, these are only 180 of the 1,379,851,932 images contained in this dataset! I encourage you to pick interesting sites to narrow the Gallery view, and to adjust the Low/High aesthetic values, to get a better sense of the imagery spread across the dataset.

Image Card View

Clicking on an image in the Gallery View will reveal an image popup with more detail from the dataset, as well as some navigational functions.

Card Header

Domain - the top left corner displays the host of the image

- navigate to the detail view of this domain

- navigate to the gallery view of this domain

Score - the number in the top right corner comes from the aesthetic_score column of the dataset

- set the ‘high’ number of the gallery view to be equal to this score

Card Image

Url - the red text is the url of this image found inside the dataset. Clicking the link here will open the image in a new tab. Reminder - the images loaded here are not hosted by this site, but accessed via a direct link to the host.

Alt text - below the image, in a dashed border, is the ‘alt=’ text used to describe the image.

prev / next - view the previous or next image loaded in the gallery

visually similar - open a new tab to a CLIP (image recognition) based search on haveibeentrained.com

search alt text - open a new tab to search using the alt text of the image on haveibeentrained.com 

Help! No images are shown for...

Image requests that return an error code, such as 404, are hidden from the user at this time. The url may no longer be valid, or the website may be blocking traffic linked from outside domains.

Images that are above a certain NSFW score threshold are also omitted from display.

Additional Functionality

The features of this website are focused on discovery of image hosts and quick navigation of the content based on aesthetic ranking. Direct searching for image urls via the visual content, filename, or alt text tags is not available at this time. For additional functionality I recommend the following:

For search queries of specific keywords related to the text description of an image, see haveibeentrained.com. This is very effective for finding works related to artist, celebrity, and intellectual property names/trademarks.

For CLIP (image recognition) based search, see also haveibeentrained.com or LAION’s CLIP front end at https://rom1504.github.io/clip-retrieval/

For full SQL based queries of the top 12 million ranked images, see https://laion-aesthetic.datasette.io/