First Blog - Pup

9 November 2022 · 291 words · 2 mins

What I learned today #

Today I came across a fantastic tool called pup.

From GitHub:

Pup is “a command line tool for processing HTML. It reads from stdin, prints to stdout, and allows the user to filter parts of the page using CSS selectors.”

It is basically an HTML parser, and it makes scraping websites dead simple.

What I did #

I used pup in a simple script that fetches all the links on a website, shows them as a list to choose from, and then opens the chosen one in a browser.

Gif

Tools used #

curl, pup, grep, dmenu and the Brave browser.

Walkthrough #

So first, let’s get a website. I’ll use mine and curl it:

curl -sL https://kshitijaucharmal.github.io/main

Now pipe that into pup with the --color flag. This just makes the output look better:

curl -sL https://kshitijaucharmal.github.io/main | pup --color

We want all the links now. As you might know, links in HTML live in the <a> tag, with the target in its href attribute. In pup, the selector a picks those tags and attr{href} prints that attribute:

curl -sL https://kshitijaucharmal.github.io/main | pup --color 'a attr{href}'

Output #

#main-content
/
/main
/blog
/
/main/
/tags/personal/
#what-this-website-is-about
https://gohugo.io/
#my-youtube-channel
https://youtube.com/@artificialcode
#online-projects
https://kshitijaucharmal.github.io/gridworld
https://kshitijaucharmal.github.io/NEAT-JS
https://narutotheboss.itch.io/bishop-challenge
#passion-projects-on-github--gitlab
https://github.com/kshitijaucharmal/2048
https://github.com/kshitijaucharmal/WaveFunctionCollapse
https://github.com/kshitijaucharmal/gridworld
https://github.com/kshitijaucharmal/GridWorld-Processing
https://github.com/kshitijaucharmal/NEAT-JS
https://github.com/kshitijaucharmal/NEAT-Algorithm
https://github.com/kshitijaucharmal/bishop-challenge
https://github.com/kshitijaucharmal/Reverse-Shell
https://github.com/PlumPeach
https://github.com/kshitijaucharmal/Genetic-Sentences
https://github.com/kshitijaucharmal/KMeans-Visualization
https://github.com/kshitijaucharmal/Lorenz-Equation
https://github.com/kshitijaucharmal/Flocking
https://github.com/kshitijaucharmal/Boids
https://www.facebook.com/sharer/sharer.php?u=https://kshitijaucharmal.github.io/main/&amp;quote=Kshitij%27s%20website
https://twitter.com/intent/tweet/?url=https://kshitijaucharmal.github.io/main/&amp;text=Kshitij%27s%20website
https://pinterest.com/pin/create/bookmarklet/?url=https://kshitijaucharmal.github.io/main/&amp;description=Kshitij%27s%20website
https://reddit.com/submit/?url=https://kshitijaucharmal.github.io/main/&amp;resubmit=true&amp;title=Kshitij%27s%20website
mailto:?body=https://kshitijaucharmal.github.io/main/&amp;subject=Kshitij%27s%20website
#the-top
https://gohugo.io/
https://git.io/hugo-congo
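
One thing worth noticing in this output: the share links contain `&amp;` where a real URL would have a plain `&`, presumably because pup HTML-escapes what it prints. A browser needs the plain form, so a small filter could be added to the pipeline. This is only a minimal sketch: `unescape_amp` is a name made up here, and it handles `&amp;` and nothing else.

```shell
# unescape_amp: hypothetical helper that turns the HTML entity &amp; back into &.
# Other entities (&quot;, &#39;, ...) are deliberately left untouched.
unescape_amp() {
    sed 's/&amp;/\&/g'
}

# Sample line copied from the output above
printf '%s\n' 'mailto:?body=https://kshitijaucharmal.github.io/main/&amp;subject=Kshitij%27s%20website' |
    unescape_amp
# prints: mailto:?body=https://kshitijaucharmal.github.io/main/&subject=Kshitij%27s%20website
```

In the pipeline it would sit right after pup, as in `... | pup 'a attr{href}' | unescape_amp`. pup’s help text also lists a `-p` / `--plain` option described as not escaping HTML, which may make the filter unnecessary.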

Let’s grep for the lines starting with http to keep just the absolute links, then pipe them to dmenu:

curl -sL https://kshitijaucharmal.github.io/main | pup --color 'a attr{href}' | grep '^http' | dmenu -i -l 10
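
The href list above has `https://gohugo.io/` in it twice, and a duplicate like that ends up in the menu too. A possible tweak, sketched below, is to drop repeated lines while keeping the original order. `external_links` is a hypothetical name; the awk one-liner is a common idiom for order-preserving de-duplication.

```shell
# external_links: hypothetical filter for the pipeline.
# Keeps only lines starting with http and drops repeats, preserving order.
external_links() {
    grep '^http' | awk '!seen[$0]++'
}

# A few lines taken from the pup output, fed in by hand
printf '%s\n' \
    '#main-content' \
    '/blog' \
    'https://gohugo.io/' \
    'https://youtube.com/@artificialcode' \
    '#the-top' \
    'https://gohugo.io/' |
    external_links
# prints:
# https://gohugo.io/
# https://youtube.com/@artificialcode
```

It would replace the plain grep step: `... | pup 'a attr{href}' | external_links | dmenu -i -l 10`.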

Now you can wrap the whole thing in a command substitution and ask brave to open whatever you pick!

brave $(curl -sL https://kshitijaucharmal.github.io/main | pup --color 'a attr{href}' | grep '^http' | dmenu -i -l 10)

That’s it.

Finished Script #
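
The script itself isn’t reproduced here, so what follows is only a sketch of how the steps from the walkthrough could be put together, not the original file. The function name `open_link_from`, the `url` and `choice` variables and the empty-selection check are additions for illustration. The commands are the ones from the walkthrough, minus --color, since the output goes to grep rather than the terminal.

```shell
#!/bin/sh
# Sketch of the finished script. open_link_from is a hypothetical name.
# Needs curl, pup, grep, dmenu and brave on the PATH.

open_link_from() {
    url=$1

    # Fetch the page, pull out every href, keep absolute links, ask the user
    choice=$(curl -sL "$url" | pup 'a attr{href}' | grep '^http' | dmenu -i -l 10)

    # dmenu prints nothing if the menu is dismissed; don't launch brave then
    if [ -z "$choice" ]; then
        return 1
    fi

    brave "$choice"
}
```

Calling `open_link_from https://kshitijaucharmal.github.io/main` at the bottom of the file should then behave like the one-liner, except that closing the menu without picking anything no longer starts the browser.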