I'm no mechanic and no coder either, but here's a PowerShell script I quickly hacked together on the way home. For anyone who wants to try it out:
- Copy the code below and paste it into Microsoft PowerShell ISE, or download the .txt, rename it to .ps1 and open it with ISE
- Adjust the variables
  - $pageFrom
  - $pageTo
  - $baseURL
- Run it, wait about 1 min - voilà
  - it creates an .html with all images from the defined $baseURL
  - it does not download any images! It only generates HTML code
  - click an image to view it full size (new tab)
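To show how the three variables work together, here is a minimal sketch of the URL expansion the script does in its main loop. The helper name `Get-PageUrl` is hypothetical and is not part of the script below; it only illustrates the placeholder replacement.

```powershell
# Hypothetical helper (not in the original script): expands the page placeholder
# into one URL per page, the same way the script's main loop does.
function Get-PageUrl {
    param(
        [string]$BaseUrl,
        [int]$PageFrom,
        [int]$PageTo,
        [string]$PlaceHolder = '{%PAGE_INDEX%}'
    )
    for ($i = $PageFrom; $i -le $PageTo; $i++) {
        # String.Replace does a literal (non-regex) replacement
        $BaseUrl.Replace($PlaceHolder, [string]$i)
    }
}

$urls = @(Get-PageUrl -BaseUrl 'https://www.toeff-forum.ch/thread/8230-der-eure-neuesten-anschaffungen-thread/?pageNo={%PAGE_INDEX%}' -PageFrom 1 -PageTo 3)
$urls
```

For a quick test run it is probably worth setting $pageTo to a small number first, since every page is one web request.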
<#
.SYNOPSIS
A WoltLab Board Thread Image Grabber script
.NOTES
This is a quick n dirty approach!
.TODO
- move code to a function
- change init vars into function params
.AUTHOR
SISTöfflibueb
#>
# init
$timer = [Diagnostics.Stopwatch]::StartNew()
[int]$pageFrom = 1
[int]$pageTo = 256
[string]$placeHolderPageIndex = "{%PAGE_INDEX%}"
[string]$baseURL = "https://www.toeff-forum.ch/thread/8230-der-eure-neuesten-anschaffungen-thread/?pageNo={%PAGE_INDEX%}" # note place holder {%PAGE_INDEX%}
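# log and output files.
# created in the script's folder, or in the current folder if the script has not been saved yet
# ($PSScriptRoot is empty when the code is pasted into ISE and run unsaved)
$outputFolder = if ($PSScriptRoot) { $PSScriptRoot } else { (Get-Location).Path }
$logFile = Join-Path -Path $outputFolder -ChildPath "ParseWebPage.log"
$htmlFile = Join-Path -Path $outputFolder -ChildPath "ParseWebPage.html"
$htmlGallery = Join-Path -Path $outputFolder -ChildPath "ParseWebPageGallery.html"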
# write dummy, non-W3C-conformant gallery html file
[string]$htmlCode = "<!DOCTYPE html>`n" +
    "<html><head>`n" +
    "<title>SiSTöfflibuebs WoltLab Thread Image Viewer</title>`n" +
    "</head>`n" +
    "<body>"
Out-File -FilePath $htmlGallery -InputObject $htmlCode
$htmlCode = "<h1>base url: $baseUrl</h1>"
Out-File -FilePath $htmlGallery -InputObject $htmlCode -Append
$htmlCode = "<h1>page from: $pageFrom</h1>"
Out-File -FilePath $htmlGallery -InputObject $htmlCode -Append
$htmlCode = "<h1>page to: $pageTo</h1>"
Out-File -FilePath $htmlGallery -InputObject $htmlCode -Append
# loop through all required pages ...
for ($pageIndex = $pageFrom; $pageIndex -le $pageTo; $pageIndex++) {
    # build url
    [string]$url = $baseURL.Replace($placeHolderPageIndex, $pageIndex)
    # get web page
    Write-Host "loading url $url (page $pageIndex/$pageTo)..."
    # Invoke-WebRequest throws on HTTP errors, so reset the response and catch the exception;
    # otherwise the previous page's response would be parsed again
    $WebResponse = $null
    try {
        $WebResponse = Invoke-WebRequest $url -UseBasicParsing
    } catch {
        Write-Host "request failed: $($_.Exception.Message)" -ForegroundColor Red
    }
    if ($WebResponse.StatusCode -eq 200) {
        # page successfully accessed
        Write-Host "web page successfully accessed ($($WebResponse.StatusCode))!" -ForegroundColor Green
        # write to gallery file
        $htmlCode = "<!-- images of $url -->"
        Out-File -FilePath $htmlGallery -InputObject $htmlCode -Append
        $htmlCode = "<h1><a href=""$url"" target=""_blank"">images of Page $pageIndex</a></h1>"
        Out-File -FilePath $htmlGallery -InputObject $htmlCode -Append
        # parse urls
        Write-Host ""
        Write-Host "extracting href-image urls ..." # WoltLab Board specific urls
        $srcrpattern = '(?i)href="(.*?)"'
        #$srcrpattern = '(?i)src="(.*?)"'
        #$srcrpattern = '(?i)src="(.*?)\battachment\b(.*?)"'
        $srcs = ([regex]$srcrpattern ).Matches($WebResponse.RawContent) | ForEach-Object { $_.Groups[1].Value }
        Write-Host ""
        $imageUrls = New-Object System.Collections.ArrayList
        foreach ($src in $srcs) {
            # look for attachment urls
            if ($src.ToLower().Contains("attachment")) {
                # strip the thumbnail suffix to get the full-size image url
                $src = $src.replace("/?thumbnail=1","")
                [void]$imageUrls.Add($src)
                Write-Host $src
                # write to gallery file: thumbnail that links to the full-size image
                $htmlCode = "<a href=""$src"" target=""_blank""> <img src=""$($src)?thumbnail=1""></a>"
                Out-File -FilePath $htmlGallery -InputObject $htmlCode -Append
            }
        }
        # TODO - copy/pasted from code block above!
        Write-Host ""
        Write-Host "extracting src-image urls ..." # WoltLab Board specific urls
        $srcrpattern = '(?i)src="(.*?)"'
        #$srcrpattern = '(?i)src="(.*?)"'
        #$srcrpattern = '(?i)src="(.*?)\battachment\b(.*?)"'
        $srcs = ([regex]$srcrpattern ).Matches($WebResponse.RawContent) | ForEach-Object { $_.Groups[1].Value }
        Write-Host ""
        $imageUrls = New-Object System.Collections.ArrayList
        foreach ($src in $srcs) {
            # look for attachment urls
            if ($src.ToLower().Contains("attachment")) {
                # strip the thumbnail suffix to get the full-size image url
                $src = $src.replace("/?thumbnail=1","")
                [void]$imageUrls.Add($src)
                Write-Host $src
                # write to gallery file: thumbnail that links to the full-size image
                $htmlCode = "<a href=""$src"" target=""_blank""> <img src=""$($src)?thumbnail=1""></a>"
                Out-File -FilePath $htmlGallery -InputObject $htmlCode -Append
            }
        }
        # keep a copy of the last successfully loaded page (overwritten on every iteration)
        Out-File -FilePath $htmlFile -InputObject $WebResponse.Content
        <#
        foreach ($imageUrl in $imageUrls) {
            start $imageUrl
        }
        #>
        #Write-Host "Parsed HTML"
        #$WebResponse.ParsedHtml.querySelector("img").src
        <#
        $imageUrls = $WebResponse.Links | Where-Object {
            $_.href -like "*attachment*"
        } | Select-Object href
        #>
        <#
        $WebResponse.Images | Where-Object {
            $_.name -like "* Value*"
        } | Select-Object title, class, src
        #>
    } else {
        # failed to access web page
        Write-Host "failed to access web page $($WebResponse.StatusCode)!" -ForegroundColor Red
    }
} # for each page
$htmlCode = "</body></html>"
Out-File -FilePath $htmlGallery -InputObject $htmlCode -Append
Write-Host ""
Write-Host "script runtime: $( [Math]::Round($timer.Elapsed.TotalSeconds,0)) seconds" -ForegroundColor Green
$timer.Stop()
start $htmlGallery
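
The two TODOs in the script header (move the code into a function, turn the init variables into parameters) and the copy/pasted href/src blocks could be tackled roughly like this. It is only a sketch: the function name `Get-AttachmentUrl`, its parameters and the example.org sample HTML are made up for illustration and are not part of the script above.

```powershell
# Hypothetical helper: returns the attachment URLs found in href="..." and
# src="..." attributes of an HTML string, with the thumbnail suffix stripped.
function Get-AttachmentUrl {
    param(
        [string]$Html,
        [string[]]$Attribute = @('href', 'src')
    )
    foreach ($name in $Attribute) {
        # same pattern as in the script, built once per attribute name
        $pattern = '(?i)' + [regex]::Escape($name) + '="(.*?)"'
        foreach ($match in [regex]::Matches($Html, $pattern)) {
            $value = $match.Groups[1].Value
            # same filter as in the script: keep only attachment urls
            if ($value.ToLower().Contains('attachment')) {
                $value.Replace('/?thumbnail=1', '')
            }
        }
    }
}

# made-up sample markup, only to show the call
$sample = '<a href="https://example.org/attachment/1-a/">x</a>' +
          '<img src="https://example.org/attachment/2-b/?thumbnail=1">' +
          '<a href="https://example.org/other/">y</a>'
$found = @(Get-AttachmentUrl -Html $sample)
$found
```

In the main loop this would be called once per page with `$WebResponse.RawContent`, which would replace both copy/pasted blocks.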
Planned improvements:
- don't show duplicate images
- maybe show X characters before and/or after the image as hover text
By the way, it also works for the jokes thread!
Since SERVER likes things fancy, I first tried it with GreaseMonkey, so that I could use the gallery feature directly and get a nice display. But the DOM was never completely loaded, so I dropped that. I don't know whether you can write your own code modules for WoltLab; a port to JS, Python or Ruby would be no problem, and then it could be plugged straight into the board as an add-in. But it's enough for my purposes: I can now browse all the purchases nicely and get inspired. Thanks
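
The first item could be sketched as follows. It assumes PowerShell 5 or newer (for the `::new()` syntax), and the helper name `Select-UniqueUrl` is hypothetical. The hover text is not covered here.

```powershell
# Hypothetical helper: passes each url through only the first time it is seen,
# so the order of the images in the gallery stays the same.
function Select-UniqueUrl {
    param([string[]]$Url)
    $seen = [System.Collections.Generic.HashSet[string]]::new()
    foreach ($u in $Url) {
        # HashSet.Add returns $false if the value is already in the set
        if ($seen.Add($u)) { $u }
    }
}

# made-up urls, only to show the call
$unique = @(Select-UniqueUrl -Url @(
    'https://example.org/attachment/1-a',
    'https://example.org/attachment/2-b',
    'https://example.org/attachment/1-a'
))
$unique
```

To remove duplicates across the whole thread and not just within one page, the set would have to be created once before the page loop instead of inside the function.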