>>102093805
thank you, knew it was missing the -L flag for the file download
i fed it into the markov chain again asking it to download in parallel and it seems to be working so far
#!/bin/bash
set -eu

base_url="https://hackspace.raspberrypi.com/issues"

fetch_download_link() {
    local issue_number=$1
    local response download_link
    response=$(curl -s "${base_url}/${issue_number}/pdf/download")
    download_link=$(echo "$response" | grep -oP '(?<=<iframe src=")(.*?)(?=")' | sed 's|^|https://magpi.raspberrypi.com|')
    if [ -n "$download_link" ]; then
        echo "$download_link"
    else
        echo "No download link found for issue #$issue_number" >&2
    fi
}

download_file() {
    local url=$1
    local filename
    filename=$(basename "$url")
    if [ ! -f "$filename" ]; then
        echo "Downloading $filename from $url"
        curl -LO "$url"
    else
        echo "File '$filename' already exists, skipping download."
    fi
}

export -f fetch_download_link
export -f download_file
export base_url

# mapfile avoids the word-splitting/globbing hazard of links=($(...))
mapfile -t download_links < <(seq 1 81 | parallel -j 10 fetch_download_link)

echo "Download links:"
printf '%s\n' "${download_links[@]}"

echo "Starting downloads..."
printf '%s\n' "${download_links[@]}" | parallel -j 4 download_file
maybe i could use raw libressl netcat to fetch the pages instead, curl has bugs often, wget probably too
fastest way to download a file with shell? (i know if you download a single file in a browser it downloads it via multiple connections in chunks)
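aria2c does the browser-style chunked thing out of the box (`aria2c -x 4 <url>`), but you can sketch the same idea with plain curl and `--range`. assumes the server sends Content-Length and honors Range requests, and the function/file names are made up:

```shell
# split [0, size) into N byte ranges, printed one per line as "start-end"
ranges() {
    local size=$1 chunks=$2
    local chunk_size=$(( (size + chunks - 1) / chunks ))
    local i start end
    for ((i = 0; i < chunks; i++)); do
        start=$(( i * chunk_size ))
        end=$(( start + chunk_size - 1 ))
        (( end >= size )) && end=$(( size - 1 ))
        echo "${start}-${end}"
    done
}

# fetch each range on its own connection, then stitch the parts together
chunked_get() {
    local url=$1 chunks=${2:-4} size r i=0
    # total size from a HEAD request (assumes Content-Length is present)
    size=$(curl -sIL "$url" | tr -d '\r' | awk 'tolower($1)=="content-length:"{s=$2} END{print s}')
    for r in $(ranges "$size" "$chunks"); do
        # zero-pad part names so the glob below reassembles them in order
        curl -s --range "$r" -o "part.$(printf '%03d' "$i")" "$url" &
        i=$(( i + 1 ))
    done
    wait
    cat part.* > "$(basename "$url")" && rm part.*
}
```

whether this is actually faster depends on the server; single-connection curl already saturates most home links.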
for pdfs, i would use https://github.com/QubesOS/qubes-app-linux-pdf-converter which converts them into a series of images of the pages with no executable code so they can be viewed safely
then alongside that i would keep the original pdf and extract all embedded files and code from it into a folder next to it
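the same "rasterized pages + extracted attachments next to the original" layout can be approximated with poppler-utils (`pdftoppm` / `pdfdetach`). this is NOT the qubes converter itself (no disposable-VM isolation), just the same idea, and `flatten_pdf` is a made-up helper name; assumes poppler-utils is installed:

```shell
flatten_pdf() {
    local pdf=$1
    [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; return 1; }
    local abs dir
    abs=$(realpath "$pdf")
    dir="${pdf%.pdf}.extracted"
    mkdir -p "$dir"
    pdftoppm -png "$abs" "$dir/page"             # one PNG per page
    ( cd "$dir" && pdfdetach -saveall "$abs" )   # embedded attachments, if any
}
```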
then generate a hash (probably blake3 because it's fast, https://github.com/BLAKE3-team/BLAKE3) to verify the correct file was downloaded
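the BLAKE3 repo ships a `b3sum` CLI that works like the coreutils hashers (record with `b3sum file`, verify with `b3sum --check`). a minimal record/verify sketch, falling back to sha256sum when b3sum isn't installed; the function names and manifest filename are made up:

```shell
# pick blake3 if available, otherwise sha256 so the sketch still runs
hasher=sha256sum
command -v b3sum >/dev/null 2>&1 && hasher=b3sum

record_hash() {    # append the file's hash to a manifest
    "$hasher" "$1" >> checksums.txt
}

verify_hashes() {  # re-hash everything in the manifest, nonzero exit on mismatch
    "$hasher" --check checksums.txt
}
```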